--- title: "Using nettskjemar" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using nettskjemar} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: sentence --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` Nettskjemar connects to version 2 of the [nettskjema api](https://nettskjema.no/), and the main functionality here is to download data from a form into R. Once you have [created a nettskjema api user](auth_setup.html), and set up your Renvironment locally, you can start accessing your forms. ## General recommendations While functions to download data also have the option to turn off the codebook, i.e. return the data with the original questions as column names, this is **not recommended**. Working with data in R in this format is very unpredictable, and we cannot guarantee that the functions in this package will act as expected. Therefore, you are **highly advised** if you are using this package, to turn on the codebook in the Nettskjema portal for your form, and setting up a codebook for the entire form. You can toggle the codebook for a form by going to the Nettskjema portal and entering your form. Then proceed to "Settings" and then "General settings", and make sure §Codebook activated" is set to "Yes". ## [Tidyverse](https://www.tidyverse.org/) compatible The data returns in this package are developed to be tidyverse-compatible. This means that those who are familiar with tidyverse, should find working with the data as retrieved from this package fairly easy. If you want to learn about the tidyverse and how to use is, there are excellent resources for that on the [Tidyverse webpage](https://www.tidyverse.org/learn/). ## Download submissions Perhaps at the core of nettskjemar is the ability to download submission answers to a form into a tibble (variation of a data.frame). ```{r} library(nettskjemar) nettskjema_get_data(123823) ``` Form 123823 has 4 responses to download. # A tibble: 4 × 18 form_id submission_id attachment_1 attachment_2 checkbox checkbox_matrix… 1 123823 16785801 NA NA 1;2 1;2 2 123823 16779763 -9j5dzy.jpg amm_mowinckel_… 2 1 3 123823 16509317 Screenshot 2021-01-22… NA 1 1 4 123823 16508664 marius.jpeg NA 1 1;2 # … with 12 more variables: checkbox_matrix_2 , date , datetime , # dropdown , freetext , number_decimal , number_integer , # radio , radio_matrix_1 , radio_matrix_2 , slider , time There are many arguments that can be set that give you control over the data extraction. By default, the data as set by the codebook is retrieved by `use_codebook = TRUE`, this can be set to `FALSE` which would retrieve the full-text information. If the codebook has not been set up, initial download will fail because there is none, and the user must toggle this themselves in the nettskjema-portal. ```{r} nettskjema_get_data(123823, use_codebook = FALSE) ``` Form 123823 has 3 responses to download. Error: `select()` doesn't handle lists. Run `rlang::last_error()` to see where the error occurred. This specific test form includes attachments, which we know currently cannot be accessed without a codebook. ### Incremental downloads #### By date If you are incrementally checking the data from your form, you don't have to download the entire catalogue at the same time. While in most cases, the data requests are very fast, for forms with a large number of responses, incremental downloads might be more efficient. ```{r} nettskjema_get_data(123823, from_date = "2021-10-20") ``` Form 123823 has 2 responses to download. # A tibble: 2 × 18 form_id submission_id attachment_1 attachment_2 checkbox checkbox_matrix_1 1 123823 16785801 NA NA 1;2 1;2 2 123823 16779763 -9j5dzy.jpg amm_mowinckel_300.png 2 1 # … with 12 more variables: checkbox_matrix_2 , date , datetime , # dropdown , freetext , number_decimal , number_integer , # radio , radio_matrix_1 , radio_matrix_2 , slider , time #### By submission id Another way to incrementally get data is by submission id. In a similar way as with `from_date`, `from_submission` allows you to specify from which submission ID on the data should be retrieved. ```{r} nettskjema_get_data(123823, from_submission = 16779763) ``` Form 123823 has 1 responses to download. # A tibble: 1 × 16 form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 date datetime 1 123823 16785801 1;2 1;2 1;2 07.10.2021 15.10.2… # … with 9 more variables: dropdown , freetext , number_decimal , # number_integer , radio , radio_matrix_1 , radio_matrix_2 , # slider , time ### Controlling checkbox output The Nettskjema survey tool includes the possibility to create checkboxes, i.e. giving the respondents the ability to select several options within a question. How this returned as data is not clear cut. The default behavior of Nettskjema portal is to create one enumerated column per checkbox, with the context of the column cells being the codebook value. The default behavior of this package is to return the checkboxes as character strings with options selected separated by semi-colon (`;`). ```{r} # To start inspecting the data more library(dplyr) # These are the defaults, they don't need to be set # They are just highlighted here nettskjema_get_data(123823, checkbox_type = "string", checkbox_delim = ";") %>% # this data has all checkbox questions coded with names like "checkbox" select(form_id, submission_id, starts_with("checkbox")) ``` Form 123823 has 4 responses to download. # A tibble: 4 × 5 form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 1 123823 16785801 1;2 1;2 1;2 2 123823 16779763 2 1 NA 3 123823 16509317 1 1 1 4 123823 16508664 1 1;2 1 these can be separated into rows if wanted, using tidyverse syntax. ```{r} nettskjema_get_data(123823, checkbox_type = "string", checkbox_delim = ";") %>% select(form_id, submission_id, starts_with("checkbox")) %>% separate_rows(checkbox) ``` Form 123823 has 4 responses to download. # A tibble: 5 × 5 form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 1 123823 16785801 1 1;2 1;2 2 123823 16785801 2 1;2 1;2 3 123823 16779763 2 1 NA 4 123823 16509317 1 1 1 5 123823 16508664 1 1;2 1 Another way is to request the checkbox data returned as list columns ```{r} nettskjema_get_data(123823, checkbox_type = "list") %>% select(form_id, submission_id, starts_with("checkbox")) ``` Form 123823 has 4 responses to download. # A tibble: 4 × 5 form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 1 123823 16785801 2 123823 16779763 3 123823 16509317 4 123823 16508664 Similar type action for list columns as for string with `separate_rows` is to `unnest` the list column. ```{r} nettskjema_get_data(123823, checkbox_type = "list") %>% select(form_id, submission_id, starts_with("checkbox")) %>% unnest(checkbox) ``` Form 123823 has 4 responses to download. # A tibble: 5 × 5 form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 1 123823 16785801 1 2 123823 16785801 2 3 123823 16779763 2 4 123823 16509317 1 5 123823 16508664 1 The last option is to return the data where each checkbox is a column, with a binary indicator showing if the option was selected (`1`) or not (`0`). ```{r} nettskjema_get_data(123823, checkbox_type = "columns") %>% select(form_id, submission_id, starts_with("checkbox")) ``` Form 123823 has 4 responses to download. # A tibble: 4 × 8 form_id submission_id checkbox_1 checkbox_2 checkbox_matrix_1_1 checkbox_matrix_1_2 1 123823 16785801 1 1 1 1 2 123823 16779763 0 1 1 0 3 123823 16509317 1 0 1 0 4 123823 16508664 1 0 1 1 # … with 2 more variables: checkbox_matrix_2_1 , checkbox_matrix_2_2 There is a gotcha with this last option. Currently, there is no way to indicate values that should actually be `NA`, i.e. if the question is optional there is no way to know if lack of selection means the item was explicitly not selected or someone just skipped the question. ## Getting an overview of responses If you want a quick idea of what your data contains, we recommend using the `skim()` function from the {skimr} package. ```{r} dt <- nettskjema_get_data(123823) skimr::skim(dt) ``` ``` ── Data Summary ──────────────────────── Values Name dt Number of rows 4 Number of columns 18 _______________________ Column type frequency: character 17 numeric 1 ________________________ Group variables None ── Variable type: character ───────────────────────────────────────────────────────────── skim_variable n_missing complete_rate min max empty n_unique whitespace 1 submission_id 0 1 8 8 0 4 0 2 attachment_1 1 0.75 11 37 0 3 0 3 attachment_2 3 0.25 21 21 0 1 0 4 checkbox 0 1 1 3 0 3 0 5 checkbox_matrix_1 0 1 1 3 0 2 0 6 checkbox_matrix_2 1 0.75 1 3 0 2 0 7 date 1 0.75 10 10 0 2 0 8 datetime 1 0.75 16 16 0 2 0 9 dropdown 0 1 1 1 0 2 0 10 freetext 0 1 3 9 0 4 0 11 number_decimal 1 0.75 3 5 0 2 0 12 number_integer 1 0.75 1 2 0 3 0 13 radio 0 1 1 1 0 2 0 14 radio_matrix_1 0 1 1 1 0 2 0 15 radio_matrix_2 0 1 1 1 0 2 0 16 slider 1 0.75 1 1 0 3 0 17 time 1 0.75 5 5 0 1 0 ── Variable type: numeric ─────────────────────────────────────────────────────────────── skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 1 form_id 0 1 123823 0 123823 123823 123823 123823 123823 hist 1 ▁▁▇▁▁ ``` If you want a quick idea of data types and missing values, the `vis_dat()` function from the {visdat} package is a great graphical tool. ```{r} visdat::vis_dat(dt) ``` ```{r eval = TRUE, echo = FALSE, out.width="100%", fig.alt="Image of the data where observations are on y, and clumns on x. This is a tile plot, where colours of the tiles corresponds to data types. In this plot, grey is NA, red is character of blue is integer."} knitr::include_graphics("static/vis_dat.png") ```