Tidyverse

Logistic Regression: Part II - Varietal adoption dataset

Binary classifier using categorical predictor

Let’s say we have two variables: a farmer’s survey response on willingness to adopt an improved rice variety (Yes/No), and whether that farmer was trained earlier in agricultural input management (trained/untrained).

Read in the data and notice the summary.

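library(dplyr) # provides %>%, select() and mutate_if()

# read the full survey data, converting character columns to factors
rice_data <- readxl::read_xlsx(here::here("content", "blog", "data", "rice_variety_adoption.xlsx")) %>% 
  mutate_if(.predicate = is.character, .funs = as.factor)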

rice_variety_adoption <- readxl::read_xlsx(here::here("content", "blog", "data", "rice_variety_adoption.xlsx")) %>%
  select(improved_variety_adoption, training) %>% 
  # convert data to suitable factor type for analysis.
  mutate_if(is.character, as.factor)

head(rice_variety_adoption) # now we have data
## # A tibble: 6 × 2
##   improved_variety_adoption training
##   <fct>                     <fct>   
## 1 No                        No      
## 2 Yes                       No      
## 3 No                        No      
## 4 Yes                       No      
## 5 Yes                       No      
## 6 No                        No

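As a basic descriptive step, construct one-way and two-way cross-tabulations showing the count in each category. The counts matter because, with a binary response and a categorical predictor, the logistic regression fit is determined entirely by the cell counts of this table, much like a contingency-table analysis.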
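The cross-tabulation step can be sketched as follows. This is only an illustration: the `toy` data frame below is a hypothetical stand-in for the survey data (its values are made up), and the column names are borrowed from the output above.

```r
# Hypothetical toy responses standing in for the survey data (values are made up)
toy <- data.frame(
  improved_variety_adoption = factor(c("No", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes")),
  training = factor(c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"))
)

# one-way table: count of farmers in each training category
one_way <- table(toy$training)

# two-way table: training (rows) by adoption (columns)
two_way <- xtabs(~ training + improved_variety_adoption, data = toy)

# row-wise proportions: share of adopters within each training group
row_props <- prop.table(two_way, margin = 1)

# odds ratio of adoption (trained vs. untrained) straight from the cell counts
odds_ratio <- (two_way["Yes", "Yes"] / two_way["Yes", "No"]) /
  (two_way["No", "Yes"] / two_way["No", "No"])

# the same quantity recovered from a logistic regression fit
fit <- glm(improved_variety_adoption ~ training, family = binomial, data = toy)
exp(coef(fit)[["trainingYes"]])
```

With a single categorical predictor, the exponentiated coefficient from `glm()` should agree with the odds ratio computed by hand from the two-way table, which is one way to see why the counts are worth inspecting first.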

Logistic Regression: Part I - Fundamentals

Likelihood theory

Probit models were among the first non-linear models used to analyze non-normal data. In an early example of probit regression, Bliss (1934) describes an experiment in which nicotine is applied to aphids and the proportion killed is recorded. In an appendix to a paper Bliss wrote a year later (Bliss, 1935), Fisher (1935) outlines the use of maximum likelihood to obtain estimates of the probit model.

Tidytuesday: Claremont Run, X-men Characters

X-Men characters

Exploring the data dictionary

Table 1: Data summary

Name                     Piped data
Number of rows           308
Number of columns        9
Column type frequency:
  character              8
  numeric                1
Group variables          None

Variable type: character

Making Summary Tables in R

Background

Table output is one of the richest and most satisfying features of R. R Markdown offers plenty of package support for creating, formatting, and presenting tables beautifully. That is extremely useful on the one hand, but on the other it can be daunting to choose among the various packages when formatting your table. I have a bunch of suggestions and a list of options here to help you out of that dilemma.

String tip: vectorized pattern replacement

Example case

Suppose you have a bunch of really filthy names, the kind that make you puke… You can go about fixing those with the help of stringi and stringr.

Let’s say the following character vector hosts those filthy names.

filthy <- c("Grains %", "Moisture (gm/kg)", "Plant height (cm)", "White   spaces", "White space  (filth%)")
filthy
## [1] "Grains %"              "Moisture (gm/kg)"      "Plant height (cm)"    
## [4] "White   spaces"        "White space  (filth%)"

Now, to get rid of the filth, use string manipulation.
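One possible cleanup is sketched below using stringr. The replacement rules in `rules` and the target snake_case style are assumptions made for illustration, not necessarily the cleaning the post goes on to apply. Passing a named vector to `str_replace_all()` applies every pattern/replacement pair to every element, which is what makes the replacement vectorized.

```r
library(stringr)

filthy <- c("Grains %", "Moisture (gm/kg)", "Plant height (cm)",
            "White   spaces", "White space  (filth%)")

# hypothetical cleaning rules: names are regex patterns, values are replacements
rules <- c(
  "%"     = " percent", # spell out the percent sign
  "[()/]" = " "         # turn brackets and slashes into spaces
)

clean <- str_replace_all(filthy, rules) # apply every rule to every name
clean <- str_squish(clean)              # trim ends and collapse repeated whitespace
clean <- str_to_lower(clean)            # lower-case everything
clean <- str_replace_all(clean, " ", "_") # join words with underscores
clean
```

The rules are applied in the order given, so a rule whose replacement introduces text matched by a later pattern would be rewritten again; the two rules here do not interact.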

Expressing timestamp data in calendar

Unlike composing text memos and keeping track of them, calendar graphics are a highly effective visual aid for taking notes and summarizing them. Well, we have all used a calendar, one way or another, in our lifetimes.

Calendar-based graphics let you grasp things accurately at the very first glance. For example, it is very easy to relate one activity in a period to another when they are laid out linearly with precise graduations. Calendar graphics do exactly that: some features (usually tiles) provide the graduation, each representing a fixed interval of time (e.g., a day). Combined with text, this allows unlimited freedom to narrate specific intervals.
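The tile idea can be sketched as a data layout in base R: one row per day, with a weekday column and a week-row index that together give each tile's position, plus a text column for the narration. The month, the note, and the column names below are hypothetical and chosen only for illustration.

```r
# Hypothetical month chosen only for illustration
dates <- seq(as.Date("2021-02-01"), as.Date("2021-02-28"), by = "day")
lt <- as.POSIXlt(dates)

cal <- data.frame(
  date = dates,
  wday = lt$wday, # tile column: 0 = Sunday, ..., 6 = Saturday
  # tile row: week of the month, with weeks starting on Sunday
  week_row = (lt$mday - 1 + lt$wday[1]) %/% 7 + 1,
  note = "", # free text attached to a day
  stringsAsFactors = FALSE
)

# attach a narration to one specific interval (a single day here)
cal$note[cal$date == as.Date("2021-02-14")] <- "hypothetical note"

head(cal)
```

A plotting layer could then map `wday` and `week_row` to the x and y positions of day tiles (for instance with `ggplot2::geom_tile()`) and draw `note` as text on top.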

Tidyverse and tidbits

Ideas surrounding tidy evaluation

  1. R code is a tree

    Every expression in R can be broken down into a form represented by a tree. For instance, at the top of the tree there is “a function call”, followed by its branches: first child = the function name itself, other children = the function arguments. Complex calls have multiple levels of branching.

  2. The code tree can be captured by quoting

    expr() quotes your (the function developer’s) expression
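The two ideas above can be sketched with rlang. The expression being quoted and the variable `x` are arbitrary examples picked for illustration.

```r
library(rlang)

# quote an expression instead of evaluating it
e <- expr(mean(x + 1, na.rm = TRUE))

# the captured code is a call: a tree that can be inspected like a list
parts <- as.list(e)
parts[[1]] # first child: the function name, `mean`
parts[[2]] # second child: another call, `x + 1`, i.e. a deeper branch
parts$na.rm # a named argument

# the nested call has the same shape: function first, then its arguments
as.list(e[[2]])

# the quoted code can be evaluated later, once `x` exists
x <- c(1, NA, 3)
result <- eval(e)
```

Nothing is computed when `expr()` is called, so `x` does not need to exist until `eval()` runs.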