Skip to contents

This vignette emphasizes how to analyze survey data based on Kobo, leveraging both the svy_*() family of functions and a Kobo tool.

Proportion for select ones

As of version v0.0.3, kobo_select_one() uses only the information present in the dataset, and then retrieve labels from the Kobo tool. It means that it does not provide lines for responses that have not been chosen.


# With the labels, wonderful "var_label" and "var_value_label"
# Choices is optional
# Survey is mandatory
kobo_select_one(design, 
                vars = c("h_2_type_latrine", "admin1"), 
                survey, 
                choices, 
                group = "milieu")

# For all select ones in the survey sheet
kobo_select_one_all(design, survey, choices)

Proportion for select multiples

This function is deliberately conservative for the following:

  • if a choice exists in the Kobo tool, but not in the dataset, it is removed from the calculation;
  • if a choice exists in the dataset, but not in the Kobo tool, it will not be taken into account; for example, if a choice has been added and recoded during cleaning, the Kobo tool must be updated beforehand (which goes hand in hand with the good practice of having an up-to-date Kobo tool that can be used as a dictionary of variables;
  • input a filtered survey sheet with the variables corresponding to the data (main, hh roster, education loop, etc.).
# With the labels, note the "choices_sep" argument
# that allows for choosing the choice separator in the database
# either a "/" or "." or a "_" , etc.
# It still only accepts one variable
# Arg 'vars' can take a vector of select_multiple variables
kobo_select_multiple(design, c("e_typ_ecole", "e_typ_ecole"), survey, choices, choices_sep = "_")

# For all select multiples
kobo_select_multiple_all(design, survey, choices, choices_sep = "_")

Mean and median for numeric variables (decimal, integer, calculate)

# Mean for one or several numeric variables
kobo_mean(design, c("c_total_3_17_femmes", "e_abandont_3a_4a_fille"), survey)

# Median for one or several numeric variables
kobo_median(design, "f_5_depenses_ba", survey, group = "milieu")

# Do the same for all variables
kobo_mean_all(design, survey)
kobo_median_all(design,survey)

Ratio for numeric variables (decimal, integer, calculate)

kobo_ratio(design, nums = "e_abandont_3a_4a_fille", denoms = "c_total_3_17_femmes", survey = survey)

Quick automation

The auto_kobo_analysis() function runs all the above functions but kobo_ratio() at once.

While all these functions provide a quick workflow for analyzing survey data, the recommend way is to provide a data analysis plan and use functions kobo_analysis() or kobo_analysis_dap() (see below), which allows for finer analyses (e.g. providing ratios, labels of indicators, etc., beyond types as defined in the Kobo tool.)

Make your own analysis

# Calculate a mean
kobo_analysis(design, analysis = "mean", vars = c("c_total_3_17_femmes", "e_abandont_3a_4a_fille"), survey)

# Calculate a median
kobo_analysis(design, analysis = "median", vars = "f_5_depenses_ba", survey)

# Calculate a ratio proportion
kobo_analysis(design, analysis = "ratio", vars = c("e_abandont_3a_4a_fille" = "c_total_3_17_femmes"), survey)

# Calculate a select_one proportion
kobo_analysis(design, analysis = "select_one", vars = c("h_2_type_latrine", "admin1"), survey, choices, na_rm = F)

# Calculate a select_multiple proportion
kobo_analysis(design, analysis = "select_multiple", vars = "e_typ_ecole", survey, choices)

Make your own analysis using a data analysis plan

Necessary columns are: analysis, vars, na_rm. Other arguments are to be passed for the whole data analysis plan, e.g. group, level, vartype, etc. If there are other columns, for instance useful for reporting such as the indicator name or the sector, it is kept. The function runs as is:

  • separate the dataframe to lists by analysis type
  • map out the analysis for each type
  • bind all
  • left_join the other columns

It should contain only one variable to analyze per row (or in the case of a ratio the two variables to calculate the ratio from separated by a comma). The package contains an example:

analysis_dap
#>                sector
#> 1 General information
#> 2            Expenses
#> 3           Education
#> 4          Sanitation
#> 5           Education
#> 6           Education
#>                                                                       indicator
#> 1                       Mean of the number of school-aged girl in the household
#> 2                                                       Median of food expenses
#> 3 A ratio that does not really mean anything and is a bad example for education
#> 4                                            % of households by type of latrine
#> 5       % of households by type of school that children go to (private, public)
#> 6       % of households by type of school that children go to (private, public)
#>                                          var        analysis na_rm
#> 1                        c_total_3_17_femmes            mean   yes
#> 2                            f_5_depenses_ba          median   yes
#> 3 e_abandont_3a_4a_fille,c_total_3_17_femmes           ratio   yes
#> 4                           h_2_type_latrine      select_one    no
#> 5                                e_typ_ecole select_multiple   yes
#> 6                                e_typ_ecole select_multiple    no
#>                                         subset
#> 1              Household with school-aged girl
#> 2                                         <NA>
#> 3   Maybe a subset that does not mean anything
#> 4                                         <NA>
#> 5 Households with at least a school-aged child
#> 6                                         <NA>

Then, to run the analysis, do the following:

# Default
kobo_analysis_from_dap(design, analysis_dap, survey, choices, choices_sep = "_")

# Grouped and confidence level of 0.99
kobo_analysis_from_dap(design, analysis_dap, survey, choices, group = "milieu", level = 0.99, choices_sep = "_")