Logistic Regression: Part II - Varietal adoption dataset
Binary classifier using a categorical predictor
Suppose we have two variables from a farmer survey: each farmer's willingness to adopt an improved rice variety (Yes/No), and whether the farmer had earlier been trained in agricultural input management (trained/untrained, recorded as Yes/No in the training column).
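With one binary response and one binary predictor, the model can be sketched as follows. The notation (\(p_i\), \(x_i\), \(\beta_0\), \(\beta_1\)) is introduced here for illustration and is not taken from the survey itself:

$$
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_i,
\qquad
x_i = \begin{cases} 1 & \text{if farmer } i \text{ was trained} \\ 0 & \text{otherwise} \end{cases}
$$

Here \(p_i\) is the probability that farmer \(i\) answers Yes to adopting the improved variety. Under this coding, \(e^{\beta_0}\) is the odds of adoption among untrained farmers, and \(e^{\beta_1}\) is the odds ratio of adoption for trained versus untrained farmers.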
Read in the data and inspect the first few rows.
```r
library(dplyr)

# Read the survey once and convert every character column to a factor
rice_data <- readxl::read_xlsx(here::here("content", "blog", "data", "rice_variety_adoption.xlsx")) %>%
  mutate(across(where(is.character), as.factor))

# Keep only the response and the single categorical predictor
rice_variety_adoption <- rice_data %>%
  select(improved_variety_adoption, training)

head(rice_variety_adoption) # now we have data
```
```
## # A tibble: 6 × 2
##   improved_variety_adoption training
##   <fct>                     <fct>
## 1 No                        No
## 2 Yes                       No
## 3 No                        No
## 4 Yes                       No
## 5 Yes                       No
## 6 No                        No
```
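Converting the Yes/No columns to factors affects how the model is parameterised. R sorts factor levels alphabetically, so "No" becomes the first level of both columns, and `glm()` with `family = binomial` treats the first level of a factor response as failure. The sketch below uses a few made-up values (the vectors `adoption` and `training` are hypothetical, not the survey data) to show the level order, the implied 0/1 outcome, and the dummy column that the predictor becomes.

```r
# Hypothetical values for illustration, NOT the rice survey data
adoption <- factor(c("No", "Yes", "No", "Yes", "Yes", "No"))
training <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))

# factor() and as.factor() sort levels alphabetically, so "No" is the first level
levels(adoption)

# glm(..., family = binomial) treats the first level as failure and any other
# level as success, so the model describes P(adoption == "Yes").
# The 0/1 outcome this corresponds to:
as.integer(adoption != levels(adoption)[1])

# The predictor enters the model as an intercept plus a 0/1 dummy column "trainingYes"
model.matrix(~ training)
```

If "Yes" should instead be the reference category, `relevel()` can change the first level before fitting.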
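As a basic descriptive step, construct one-way and two-way cross-tabulations showing the count in each category. This is worthwhile because, with a binary response and a single categorical predictor, the data reduce to a table of counts. The logistic regression can be fitted from these counts alone, much as a chi-square test of independence works on the same contingency table.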
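A minimal sketch of the tabulation and its link to the model, using invented counts (the data frame `toy` and its numbers are hypothetical, not the rice survey results). It builds the one-way and two-way tables with base R `table()`, computes the log-odds and log odds ratio directly from the 2 × 2 table, and checks that `glm()` gives the same coefficients whether it is fitted on individual rows or only on the per-group success/failure counts.

```r
# Hypothetical counts for illustration only, NOT the rice survey results:
# 30 untrained/No, 10 untrained/Yes, 15 trained/No, 25 trained/Yes
n <- c(30, 10, 15, 25)
toy <- data.frame(
  improved_variety_adoption = factor(rep(c("No", "Yes", "No", "Yes"), times = n)),
  training                  = factor(rep(c("No", "No", "Yes", "Yes"), times = n))
)

# One-way tables: count in each category of each variable
table(toy$improved_variety_adoption)
table(toy$training)

# Two-way table: adoption (rows) by training (columns), with margins added
tab <- table(adoption = toy$improved_variety_adoption, training = toy$training)
addmargins(tab)

# Log-odds of adoption among untrained farmers, and the log odds ratio
log_odds_untrained <- log(tab["Yes", "No"] / tab["No", "No"])
log_odds_ratio <- log((tab["Yes", "Yes"] / tab["No", "Yes"]) /
                      (tab["Yes", "No"]  / tab["No", "No"]))

# Logistic regression on the individual rows...
fit <- glm(improved_variety_adoption ~ training, data = toy, family = binomial)

# ...and on the 2 x 2 counts alone (successes and failures per training group)
counts <- data.frame(
  training = factor(c("No", "Yes")),
  yes = c(tab["Yes", "No"], tab["Yes", "Yes"]),
  no  = c(tab["No", "No"],  tab["No", "Yes"])
)
fit_counts <- glm(cbind(yes, no) ~ training, data = counts, family = binomial)

# The three columns agree: intercept = log(10/30), slope = log(5)
cbind(from_table = c(log_odds_untrained, log_odds_ratio),
      glm_rows   = coef(fit),
      glm_counts = coef(fit_counts))
```

Because the predictor has only two levels, this model is saturated: its fitted probabilities equal the observed adoption proportions in each training group.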