Using nettskjemar

Nettskjemar connects to version 2 of the nettskjema api, and the main functionality here is to download data from a form into R. Once you have created a nettskjema api user, and set up your Renvironment locally, you can start accessing your forms.

General recommendations

While functions to download data also have the option to turn off the codebook, i.e. return the data with the original questions as column names, this is not recommended. Working with data in R in this format is very unpredictable, and we cannot guarantee that the functions in this package will act as expected.

Therefore, you are highly advised if you are using this package, to turn on the codebook in the Nettskjema portal for your form, and setting up a codebook for the entire form. You can toggle the codebook for a form by going to the Nettskjema portal and entering your form. Then proceed to “Settings” and then “General settings”, and make sure §Codebook activated” is set to “Yes”.

Tidyverse compatible

The data returns in this package are developed to be tidyverse-compatible. This means that those who are familiar with tidyverse, should find working with the data as retrieved from this package fairly easy. If you want to learn about the tidyverse and how to use is, there are excellent resources for that on the Tidyverse webpage.

Download submissions

Perhaps at the core of nettskjemar is the ability to download submission answers to a form into a tibble (variation of a data.frame).

library(nettskjemar)

nettskjema_get_data(123823)
Form 123823 has 4 responses to download.
# A tibble: 4 × 18
  form_id submission_id attachment_1           attachment_2    checkbox checkbox_matrix…
    <dbl> <chr>         <chr>                  <chr>           <chr>    <chr>           
1  123823 16785801      NA                     NA              1;2      1;2             
2  123823 16779763      -9j5dzy.jpg            amm_mowinckel_… 2        1               
3  123823 16509317      Screenshot 2021-01-22… NA              1        1               
4  123823 16508664      marius.jpeg            NA              1        1;2             
# … with 12 more variables: checkbox_matrix_2 <chr>, date <chr>, datetime <chr>,
#   dropdown <chr>, freetext <chr>, number_decimal <chr>, number_integer <chr>,
#   radio <chr>, radio_matrix_1 <chr>, radio_matrix_2 <chr>, slider <chr>, time <chr>

There are many arguments that can be set that give you control over the data extraction. By default, the data as set by the codebook is retrieved by use_codebook = TRUE, this can be set to FALSE which would retrieve the full-text information. If the codebook has not been set up, initial download will fail because there is none, and the user must toggle this themselves in the nettskjema-portal.

nettskjema_get_data(123823, use_codebook = FALSE)
Form 123823 has 3 responses to download.
 Error: `select()` doesn't handle lists.
Run `rlang::last_error()` to see where the error occurred. 

This specific test form includes attachments, which we know currently cannot be accessed without a codebook.

Incremental downloads

By date

If you are incrementally checking the data from your form, you don’t have to download the entire catalogue at the same time. While in most cases, the data requests are very fast, for forms with a large number of responses, incremental downloads might be more efficient.

nettskjema_get_data(123823, from_date = "2021-10-20")
Form 123823 has 2 responses to download.
# A tibble: 2 × 18
  form_id submission_id attachment_1 attachment_2          checkbox checkbox_matrix_1
    <dbl> <chr>         <chr>        <chr>                 <chr>    <chr>            
1  123823 16785801      NA           NA                    1;2      1;2              
2  123823 16779763      -9j5dzy.jpg  amm_mowinckel_300.png 2        1                
# … with 12 more variables: checkbox_matrix_2 <chr>, date <chr>, datetime <chr>,
#   dropdown <chr>, freetext <chr>, number_decimal <chr>, number_integer <chr>,
#   radio <chr>, radio_matrix_1 <chr>, radio_matrix_2 <chr>, slider <chr>, time <chr>

By submission id

Another way to incrementally get data is by submission id. In a similar way as with from_date, from_submission allows you to specify from which submission ID on the data should be retrieved.

nettskjema_get_data(123823, from_submission = 16779763)
Form 123823 has 1 responses to download.
# A tibble: 1 × 16
  form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2 date       datetime
    <dbl> <chr>         <chr>    <chr>             <chr>             <chr>      <chr>   
1  123823 16785801      1;2      1;2               1;2               07.10.2021 15.10.2…
# … with 9 more variables: dropdown <chr>, freetext <chr>, number_decimal <chr>,
#   number_integer <chr>, radio <chr>, radio_matrix_1 <chr>, radio_matrix_2 <chr>,
#   slider <chr>, time <chr>

Controlling checkbox output

The Nettskjema survey tool includes the possibility to create checkboxes, i.e. giving the respondents the ability to select several options within a question. How this returned as data is not clear cut. The default behavior of Nettskjema portal is to create one enumerated column per checkbox, with the context of the column cells being the codebook value. The default behavior of this package is to return the checkboxes as character strings with options selected separated by semi-colon (;).

# To start inspecting the data more
library(dplyr)

# These are the defaults, they don't need to be set
# They are just highlighted here
nettskjema_get_data(123823, 
                    checkbox_type = "string", 
                    checkbox_delim = ";") %>% 
  # this data has all checkbox questions coded with names like "checkbox"
  select(form_id, submission_id, starts_with("checkbox"))
Form 123823 has 4 responses to download.
# A tibble: 4 × 5
  form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
    <dbl> <chr>         <chr>    <chr>             <chr>            
1  123823 16785801      1;2      1;2               1;2              
2  123823 16779763      2        1                 NA               
3  123823 16509317      1        1                 1                
4  123823 16508664      1        1;2               1    

these can be separated into rows if wanted, using tidyverse syntax.

nettskjema_get_data(123823, 
                    checkbox_type = "string", 
                    checkbox_delim = ";") %>% 
  select(form_id, submission_id, starts_with("checkbox")) %>% 
  separate_rows(checkbox)
Form 123823 has 4 responses to download.
# A tibble: 5 × 5
  form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
    <dbl> <chr>         <chr>    <chr>             <chr>            
1  123823 16785801      1        1;2               1;2              
2  123823 16785801      2        1;2               1;2              
3  123823 16779763      2        1                 NA               
4  123823 16509317      1        1                 1                
5  123823 16508664      1        1;2               1       

Another way is to request the checkbox data returned as list columns

nettskjema_get_data(123823, 
                    checkbox_type = "list") %>% 
  select(form_id, submission_id, starts_with("checkbox")) 
Form 123823 has 4 responses to download.
# A tibble: 4 × 5
  form_id submission_id checkbox  checkbox_matrix_1 checkbox_matrix_2
    <dbl> <chr>         <list>    <list>            <list>           
1  123823 16785801      <chr [2]> <chr [2]>         <chr [2]>        
2  123823 16779763      <chr [1]> <chr [1]>         <chr [1]>        
3  123823 16509317      <chr [1]> <chr [1]>         <chr [1]>        
4  123823 16508664      <chr [1]> <chr [2]>         <chr [1]>       

Similar type action for list columns as for string with separate_rows is to unnest the list column.

nettskjema_get_data(123823, 
                    checkbox_type = "list") %>% 
  select(form_id, submission_id, starts_with("checkbox")) %>% 
  unnest(checkbox)
Form 123823 has 4 responses to download.
# A tibble: 5 × 5
  form_id submission_id checkbox checkbox_matrix_1 checkbox_matrix_2
    <dbl> <chr>         <chr>    <list>            <list>           
1  123823 16785801      1        <chr [2]>         <chr [2]>        
2  123823 16785801      2        <chr [2]>         <chr [2]>        
3  123823 16779763      2        <chr [1]>         <chr [1]>        
4  123823 16509317      1        <chr [1]>         <chr [1]>        
5  123823 16508664      1        <chr [2]>         <chr [1]>         

The last option is to return the data where each checkbox is a column, with a binary indicator showing if the option was selected (1) or not (0).

nettskjema_get_data(123823, 
                    checkbox_type = "columns") %>% 
  select(form_id, submission_id, starts_with("checkbox"))
Form 123823 has 4 responses to download.
# A tibble: 4 × 8
  form_id submission_id checkbox_1 checkbox_2 checkbox_matrix_1_1 checkbox_matrix_1_2
    <dbl> <chr>              <int>      <int>               <int>               <int>
1  123823 16785801               1          1                   1                   1
2  123823 16779763               0          1                   1                   0
3  123823 16509317               1          0                   1                   0
4  123823 16508664               1          0                   1                   1
# … with 2 more variables: checkbox_matrix_2_1 <int>, checkbox_matrix_2_2 <int>    

There is a gotcha with this last option. Currently, there is no way to indicate values that should actually be NA, i.e. if the question is optional there is no way to know if lack of selection means the item was explicitly not selected or someone just skipped the question.

Getting an overview of responses

If you want a quick idea of what your data contains, we recommend using the skim() function from the {skimr} package.

dt <- nettskjema_get_data(123823)
skimr::skim(dt)
── Data Summary ────────────────────────
                           Values
Name                       dt    
Number of rows             4     
Number of columns          18    
_______________________          
Column type frequency:           
  character                17    
  numeric                  1     
________________________         
Group variables            None  

── Variable type: character ─────────────────────────────────────────────────────────────
   skim_variable     n_missing complete_rate   min   max empty n_unique whitespace
 1 submission_id             0          1        8     8     0        4          0
 2 attachment_1              1          0.75    11    37     0        3          0
 3 attachment_2              3          0.25    21    21     0        1          0
 4 checkbox                  0          1        1     3     0        3          0
 5 checkbox_matrix_1         0          1        1     3     0        2          0
 6 checkbox_matrix_2         1          0.75     1     3     0        2          0
 7 date                      1          0.75    10    10     0        2          0
 8 datetime                  1          0.75    16    16     0        2          0
 9 dropdown                  0          1        1     1     0        2          0
10 freetext                  0          1        3     9     0        4          0
11 number_decimal            1          0.75     3     5     0        2          0
12 number_integer            1          0.75     1     2     0        3          0
13 radio                     0          1        1     1     0        2          0
14 radio_matrix_1            0          1        1     1     0        2          0
15 radio_matrix_2            0          1        1     1     0        2          0
16 slider                    1          0.75     1     1     0        3          0
17 time                      1          0.75     5     5     0        1          0

── Variable type: numeric ───────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate   mean    sd     p0    p25    p50    p75   p100
1 form_id               0             1 123823     0 123823 123823 123823 123823 123823
  hist 
1 ▁▁▇▁▁

If you want a quick idea of data types and missing values, the vis_dat() function from the {visdat} package is a great graphical tool.

visdat::vis_dat(dt)

Image of the data where observations are on y, and clumns on x. This is a tile plot, where colours of the tiles corresponds to data types. In this plot, grey is NA, red is character of blue is integer.