Emorie D Beck
| Assignment Weights | Percent | 
|---|---|
| Class Participation | 20% | 
| Response Papers + Visualizations | 20% | 
| Final Project Proposal | 10% | 
| Class Presentation | 20% | 
| Final Project | 30% | 
| Total | 100% | 
tidyverse

- dplyr (data manipulation)
- tidyr (data transformation and reshaping)
Data Manipulation in dplyr
- %>%: The pipe. Read as “and then.”
- filter(): Pick observations (rows) by their values.
- select(): Pick variables (columns) by their names.
- arrange(): Reorder the rows.
- group_by(): Implicitly split the data set into groups based on the values of one or more variables (columns).
- mutate(): Create new variables with functions of existing variables.
- summarize() / summarise(): Collapse many values down to a single summary.

Although each of these functions is powerful alone, they are incredibly powerful in conjunction with one another. So below, I’ll briefly introduce each function, then link them all together using an example of basic data cleaning and summary.
%>%
%>% is wonderful. It makes coding intuitive. Often in coding, you need to use so-called nested functions. For example, you might want to round a number after taking the square root of 43.
The issue with this comes whenever we need to do a series of operations on a data set or other type of object. In such cases, if we run it in a single call, then we have to start in the middle and read our way out.
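Here is a minimal sketch of what nested calls look like in R. The vector x in the second example is made up for illustration. Both calls have to be read from the innermost function outward.

```r
# Nested version: read from the inside out.
# sqrt(43) runs first; round(..., 2) is then applied to its result.
rounded_root <- round(sqrt(43), 2)
print(rounded_root)  # [1] 6.56

# A longer (made-up) series of steps is harder to read when nested:
# square roots, then the mean, then rounding, all written inside out
x <- c(4, 9, 16, 43)
nested_result <- round(mean(sqrt(x)), 2)
print(nested_result)  # [1] 3.89
```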
The pipe solves this by allowing you to read from left to right (or top to bottom). The easiest way to think of it is that each call of %>% reads and operates as “and then.” So with the rounded square root of 43, for example:
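One way to write these computations with the pipe is sketched below. It assumes dplyr is installed; dplyr re-exports %>% from magrittr. The vector x is made up for illustration.

```r
library(dplyr)  # dplyr re-exports the magrittr pipe, %>%

# "Take 43, and then take the square root, and then round to 2 places"
rounded_root <- 43 %>% sqrt() %>% round(2)
print(rounded_root)  # [1] 6.56

# A made-up multi-step example, read top to bottom
x <- c(4, 9, 16, 43)
piped_result <- x %>%
  sqrt() %>%  # and then take square roots
  mean() %>%  # and then average them
  round(2)    # and then round to 2 decimal places
print(piped_result)  # [1] 3.89
```

R 4.1 and later also include a native pipe, |>, which behaves the same way for simple chains like these (e.g., 43 |> sqrt() |> round(2)).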
filter()
Oftentimes, when conducting research (experiments or otherwise), there are observations (people, specific trials, etc.) that you don’t want to include. Let’s look at the first six rows of the bfi data (from the psych package):
# A tibble: 6 × 28
     A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1    E2    E3
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     2     4     3     4     4     2     3     3     4     4     3     3     3
2     2     4     5     2     5     5     4     4     3     4     1     1     6
3     5     4     5     4     4     4     5     4     2     5     2     4     4
4     4     4     6     5     5     4     4     3     5     5     5     3     4
5     2     3     3     4     5     4     4     5     3     2     2     2     5
6     6     6     5     6     5     6     6     6     1     3     2     1     6
# … with 15 more variables: E4 <int>, E5 <int>, N1 <int>, N2 <int>, N3 <int>,
#   N4 <int>, N5 <int>, O1 <int>, O2 <int>, O3 <int>, O4 <int>, O5 <int>,
#   gender <int>, education <int>, age <int>
But this isn’t quite right. We still have folks below 12. The beauty of filter() is that you can string together AND (& or a comma) and OR (|) statements when there is more than one condition, such as up to 18 AND at least 12.
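Here is a small sketch of combining conditions. It uses a hypothetical people data frame as a stand-in for bfi. The ages are invented, so the results will not match the slides.

```r
library(dplyr)

# Hypothetical toy data standing in for bfi (ids, ages, genders are made up)
people <- tibble(
  id     = 1:6,
  age    = c(10, 12, 15, 18, 21, 35),
  gender = c(1, 2, 2, 1, 2, 1)
)

# An upper bound alone still keeps the 10-year-old
people %>% filter(age <= 18)

# AND: comma-separated conditions must all be TRUE (same as using &)
teens <- people %>% filter(age <= 18, age >= 12)
print(teens)

# OR: | keeps rows where either condition is TRUE
extremes <- people %>% filter(age < 12 | age > 30)
print(extremes)
```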
Got it!
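The comparison operators <, >, <=, and >= all work inside filter(), as do == (exactly equal to) and %in% (matches any value in a set).
For the exercises below, the numeric education codes in the bfi data frame have been converted to strings (e.g., “HS”, “Some College”).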
Now let’s try a few things:
1. Create a data set with only individuals with some college (==).
2. Create a data set with only people age 18 (==).
3. Create a data set with individuals with some college or above (%in%).
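Here is one possible set of answers, sketched on a hypothetical students data frame. The education labels mirror those in the bfi outputs on later slides, but the rows are invented. The third exercise assumes “some college or above” means Some College, College, or Higher Degree.

```r
library(dplyr)

# Hypothetical toy data; labels mirror bfi's recoded education, rows are made up
students <- tibble(
  id        = 1:6,
  age       = c(18, 18, 25, 30, 42, 17),
  education = c("HS", "Some College", "Some College", "College",
                "Higher Degree", NA)
)

# 1. Only individuals with some college (==)
some_college <- students %>% filter(education == "Some College")

# 2. Only people age 18 (==)
age_18 <- students %>% filter(age == 18)

# 3. Some college or above (%in% matches any value in the set)
college_plus <- students %>%
  filter(education %in% c("Some College", "College", "Higher Degree"))
```

Rows with a missing (NA) education are dropped by both == and %in% here.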
select()
If filter() is for pulling certain observations (rows), then select() is for pulling certain variables (columns).
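Here is a quick sketch of the main ways to choose columns. It uses a hypothetical survey data frame with bfi-style names, and all values are made up.

```r
library(dplyr)

# Hypothetical toy data with bfi-style column names (values are made up)
survey <- tibble(
  A1 = 1:3, A2 = 4:6, C1 = 7:9, C2 = 1:3,
  gender = c(1, 2, 1), age = c(20, 35, 50)
)

# Keep columns by bare (unquoted) name
kept <- survey %>% select(C1, C2, age)

# Remove columns with a minus sign; A1:A2 is a range of adjacent columns
no_agree <- survey %>% select(-(A1:A2), -gender)

# Helper function: keep every column whose name starts with "C"
c_items <- survey %>% select(starts_with("C"))

# Caution: starts_with() ignores case by default, so this also drops "age"
no_a <- survey %>% select(-starts_with("A"))
```

Watch for that last case in bfi itself. Dropping the Agreeableness items with -starts_with("A") would silently remove age as well, whereas a range like -(A1:A5) only touches the items.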
In the bfi data, most extraneous variables have already been removed, so instead, we’ll imagine we don’t want to use any indicators of Agreeableness (A1-A5) and that we aren’t interested in gender.
With select(), there are a few ways to choose variables. We can name the ones we want to keep with bare (unquoted) names, name the ones we want to remove (prefixed with -), or use any of a number of select() helper functions.
# A tibble: 2,800 × 22
     C1    C2    C3    C4    C5    E1    E2    E3    E4    E5    N1    N2    N3
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     2     3     3     4     4     3     3     3     4     4     3     4     2
2     5     4     4     3     4     1     1     6     4     3     3     3     3
3     4     5     4     2     5     2     4     4     4     5     4     5     4
4     4     4     3     5     5     5     3     4     4     4     2     5     2
5     4     4     5     3     2     2     2     5     4     5     2     3     4
6     6     6     6     1     3     2     1     6     5     6     3     5     2
# … with 2,794 more rows, and 9 more variables: N4 <int>, N5 <int>, O1 <int>,
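#   O2 <int>, O3 <int>, O4 <int>, O5 <int>, education <chr>, age <int>

select() helper functions:
- starts_with(): names that begin with a given prefix
- ends_with(): names that end with a given suffix
- contains(): names that contain a literal string
- matches(): names that match a regular expression
- num_range(): names like x1, x2, x3 built from a prefix and a numeric range
- one_of(): names in a character vector (superseded by any_of() and all_of())
- all_of(): names in a character vector, all of which must exist

arrange()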
Similar to the base R sort() function (or order(), for data frames), arrange() is the tidyverse version, and it plays nicely with other tidyverse functions.
So in our previous examples, we could also arrange() our data by age or education, rather than simply filtering. (Or as we’ll see later, we can do both!)
We can also arrange by multiple columns, like if we wanted to sort by gender then education:
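Here is a sketch of sorting by one column, in descending order, and by several columns. It uses a hypothetical people data frame with invented values.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
people <- tibble(
  id        = 1:5,
  age       = c(30, 18, 45, 18, 22),
  gender    = c(2, 1, 2, 2, 1),
  education = c("HS", "College", "Below HS", "Some College", "HS")
)

# Ascending by age (the default)
by_age <- people %>% arrange(age)

# Descending order with desc()
people %>% arrange(desc(age))

# Multiple columns: sort by gender, then by education within gender
by_gender_edu <- people %>% arrange(gender, education)
```

Because education is a character column here, it sorts alphabetically rather than from lowest to highest level. Converting it to a factor whose levels are in the desired order would make arrange() follow that order instead.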
Much of the power of dplyr functions lies in the split-apply-combine method.
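Given a set of data, we:
1. split it into groups,
2. apply a function within each group, and
3. combine the results back together.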
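To make the three steps concrete, here is a base R sketch of split-apply-combine on made-up ages. dplyr's group_by() and summarize() fold the same three steps into a single pipeline.

```r
# Hypothetical toy data: ages for people at three education levels
ages      <- c(17, 19, 30, 34, 50, 40)
education <- c("HS", "HS", "College", "College",
               "Higher Degree", "Higher Degree")

# Split: one vector of ages per education level
pieces <- split(ages, education)

# Apply: compute the mean within each piece
piece_means <- lapply(pieces, mean)

# Combine: collapse the list back into one named vector
combined <- unlist(piece_means)
print(combined)
```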
group_by()
The group_by() function is the “split” of the method.
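Here is a small sketch of what group_by() does (and does not) change, using hypothetical data. group_vars() and n_groups() are dplyr helpers for inspecting the grouping. The .add argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
people <- tibble(
  age       = c(17, 19, 30, 34, 50),
  gender    = c(1, 2, 1, 2, 1),
  education = c("HS", "HS", "College", "College", "Higher Degree")
)

# group_by() keeps every row; it only records the grouping
by_edu <- people %>% group_by(education)
group_vars(by_edu)  # "education"
n_groups(by_edu)    # 3

# A second group_by() replaces the first grouping...
by_gender <- by_edu %>% group_by(gender)

# ...unless .add = TRUE, which adds to the existing groups
by_both <- by_edu %>% group_by(gender, .add = TRUE)

# ungroup() removes the grouping entirely
flat <- ungroup(by_both)
```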
So imagine that we wanted to group_by() education levels to get average ages at each level
# A tibble: 2,800 × 8
# Groups:   education [6]
     C1    C2    C3    C4    C5   age gender education   
  <int> <int> <int> <int> <int> <int>  <int> <chr>       
1     2     3     3     4     4    16      1 <NA>        
2     5     4     4     3     4    18      2 <NA>        
3     4     5     4     2     5    17      2 <NA>        
4     4     4     3     5     5    17      2 <NA>        
5     4     4     5     3     2    17      1 <NA>        
6     6     6     6     1     3    21      2 Some College
# … with 2,794 more rows
To remove the grouping, use the ungroup() function:

bfi %>%
  select(starts_with("C"), age, gender, education) %>%
  group_by(education) %>%
  ungroup() %>%
  print(n = 6)

# A tibble: 2,800 × 8
     C1    C2    C3    C4    C5   age gender education   
  <int> <int> <int> <int> <int> <int>  <int> <chr>       
1     2     3     3     4     4    16      1 <NA>        
2     5     4     4     3     4    18      2 <NA>        
3     4     5     4     2     5    17      2 <NA>        
4     4     4     3     5     5    17      2 <NA>        
5     4     4     5     3     2    17      1 <NA>        
6     6     6     6     1     3    21      2 Some College
# … with 2,794 more rows

Multiple group_by() calls overwrite previous calls:
bfi %>%
  select(starts_with("C"), age, gender, education) %>%
  group_by(education) %>%
  group_by(gender, age) %>%
  print(n = 6)

# A tibble: 2,800 × 8
# Groups:   gender, age [115]
     C1    C2    C3    C4    C5   age gender education   
  <int> <int> <int> <int> <int> <int>  <int> <chr>       
1     2     3     3     4     4    16      1 <NA>        
2     5     4     4     3     4    18      2 <NA>        
3     4     5     4     2     5    17      2 <NA>        
4     4     4     3     5     5    17      2 <NA>        
5     4     4     5     3     2    17      1 <NA>        
6     6     6     6     1     3    21      2 Some College
# … with 2,794 more rows

mutate()
mutate() is one of your “apply” functions.
With mutate(), the resulting data frame will have the same number of rows you started with.
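Here is a sketch on made-up data showing that a grouped mutate() keeps every row, plus an ungrouped recode. The recode uses dplyr's case_when(), swapped in as an alternative to the plyr::mapvalues() call used on these slides. The 1 = Male, 2 = Female coding is taken from those slides.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
people <- tibble(
  age       = c(17, 19, 30, 34, 50),
  gender    = c(1, 2, 1, 2, 1),
  education = c("HS", "HS", "College", "College", "Higher Degree")
)

# Grouped mutate(): the mean is computed within each education level,
# then repeated on every row of that level, so no rows are lost
with_means <- people %>%
  group_by(education) %>%
  mutate(age_by_edu = mean(age, na.rm = TRUE)) %>%
  ungroup()

# Ungrouped mutate(): recode gender codes into labels
recoded <- people %>%
  mutate(gender_cat = case_when(
    gender == 1 ~ "Male",
    gender == 2 ~ "Female"
  ))
```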
To demonstrate, let’s add a column that indicates the average age within each education group:
bfi %>%
  select(starts_with("C"), age, gender, education) %>%
  arrange(education) %>%
  group_by(education) %>% 
  mutate(age_by_edu = mean(age, na.rm = TRUE)) %>%
  print(n = 6)

# A tibble: 2,800 × 9
# Groups:   education [6]
     C1    C2    C3    C4    C5   age gender education age_by_edu
  <int> <int> <int> <int> <int> <int>  <int> <chr>          <dbl>
1     6     6     3     4     5    19      1 Below HS        25.1
2     4     3     5     3     2    21      1 Below HS        25.1
3     5     5     5     2     2    17      1 Below HS        25.1
4     5     5     4     1     1    18      1 Below HS        25.1
5     4     5     4     3     3    18      1 Below HS        25.1
6     3     2     3     4     6    18      2 Below HS        25.1
# … with 2,794 more rows
mutate() is also super useful even when you aren’t grouping
We can create a new category
bfi %>%
  select(starts_with("C"), age, gender, education) %>%
  mutate(gender_cat = plyr::mapvalues(gender, c(1,2), c("Male", "Female")))

# A tibble: 2,800 × 9
      C1    C2    C3    C4    C5   age gender education    gender_cat
   <int> <int> <int> <int> <int> <int>  <int> <chr>        <chr>     
 1     2     3     3     4     4    16      1 <NA>         Male      
 2     5     4     4     3     4    18      2 <NA>         Female    
 3     4     5     4     2     5    17      2 <NA>         Female    
 4     4     4     3     5     5    17      2 <NA>         Female    
 5     4     4     5     3     2    17      1 <NA>         Male      
 6     6     6     6     1     3    21      2 Some College Female    
 7     5     4     4     2     3    18      1 <NA>         Male      
 8     3     2     4     2     4    19      1 HS           Male      
 9     6     6     3     4     5    19      1 Below HS     Male      
10     6     5     6     2     1    17      2 <NA>         Female    
# … with 2,790 more rows
We could also just overwrite it:
bfi %>%
  select(starts_with("C"), age, gender, education) %>%
  mutate(gender = plyr::mapvalues(gender, c(1,2), c("Male", "Female")))

# A tibble: 2,800 × 8
      C1    C2    C3    C4    C5   age gender education   
   <int> <int> <int> <int> <int> <int> <chr>  <chr>       
 1     2     3     3     4     4    16 Male   <NA>        
 2     5     4     4     3     4    18 Female <NA>        
 3     4     5     4     2     5    17 Female <NA>        
 4     4     4     3     5     5    17 Female <NA>        
 5     4     4     5     3     2    17 Male   <NA>        
 6     6     6     6     1     3    21 Female Some College
 7     5     4     4     2     3    18 Male   <NA>        
 8     3     2     4     2     4    19 Male   HS          
 9     6     6     3     4     5    19 Male   Below HS    
10     6     5     6     2     1    17 Female <NA>        
# … with 2,790 more rows

summarize() / summarise()
summarize() is one of your “apply” functions. Unlike mutate(), it collapses each group down to a single row.

# group_by() education
bfi %>%
  select(starts_with("C"), age, gender, education) %>%
  arrange(education) %>%
  group_by(education) %>% 
  summarize(age_by_edu = mean(age, na.rm = TRUE))

# A tibble: 6 × 2
  education     age_by_edu
  <chr>              <dbl>
1 Below HS            25.1
2 College             33.0
3 Higher Degree       35.3
4 HS                  31.5
5 Some College        27.2
6 <NA>                18.0
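Here is a sketch of summarize() on hypothetical data. It shows that each group collapses to one row and that several summaries can be computed at once. n() counts the rows in each group.

```r
library(dplyr)

# Hypothetical toy data, including one missing age
people <- tibble(
  age       = c(17, 19, 30, 34, NA),
  education = c("HS", "HS", "College", "College", "College")
)

# One row per education level, with several summaries side by side
edu_summary <- people %>%
  group_by(education) %>%
  summarize(
    n          = n(),                      # rows in the group (NA included)
    age_by_edu = mean(age, na.rm = TRUE),  # mean ignoring the missing age
    age_sd     = sd(age, na.rm = TRUE)     # standard deviation, same rule
  )
print(edu_summary)
```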
Data Wrangling in tidyr
tidyr
- pivot_longer(), which takes a “wide” format data frame and makes it long.
- pivot_wider(), which takes a “long” format data frame and makes it wide.

Joins (technically dplyr functions, not tidyr)
- full_join(), which merges all rows in either data frame
- inner_join(), which merges rows whose keys are present in both data frames
- left_join(), which “prioritizes” the first data set
- right_join(), which “prioritizes” the second data set

(See also: anti_join() and semi_join().)
tidyr Functions

pivot_longer()
(Formerly gather().) Makes wide data long, based on a key.
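Here is a minimal sketch of both the one-key and two-key cases on a tiny, made-up wide data frame whose column names mimic bfi. For the two-key case it uses names_sep = 1, which splits each name after its first character. That is one way to split names like A1, not necessarily the one used on these slides.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per person, one column per item
wide <- tibble(
  SID = c("1", "2"),
  age = c(16, 18),
  A1  = c(2, 2),
  A2  = c(4, NA),
  C1  = c(2, 5)
)

# One key, one value: every item column becomes a row
long_one_key <- wide %>%
  pivot_longer(
    cols           = c(A1, A2, C1),
    names_to       = "item",
    values_to      = "values",
    values_drop_na = TRUE  # drops the missing A2 for SID "2"
  )

# Two keys, one value: split "A1" into trait "A" and item number "1"
long_two_keys <- wide %>%
  pivot_longer(
    cols      = c(A1, A2, C1),
    names_to  = c("trait", "item_num"),
    names_sep = 1,  # numeric position: break after the first character
    values_to = "values"
  )
```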
- data: the data, blank if piped
- cols: columns to be made long, chosen with the same syntax as select()
- names_to: name(s) of key column(s) in the new long data frame (string or string vector)
- values_to: name of the values column in the new long data frame (string)
- names_sep: separator in column headers, if multiple keys
- values_drop_na: drop missing cells (similar to na.rm = TRUE)

pivot_longer(): Basic Application

Let’s start with an easy one (one key, one value):
# A tibble: 69,492 × 6
  SID   gender education   age item  values
  <chr>  <int> <chr>     <int> <chr>  <int>
1 1          1 <NA>         16 A1         2
2 1          1 <NA>         16 A2         4
3 1          1 <NA>         16 A3         3
4 1          1 <NA>         16 A4         4
5 1          1 <NA>         16 A5         4
6 1          1 <NA>         16 C1         2
7 1          1 <NA>         16 C2         3
8 1          1 <NA>         16 C3         3
# … with 69,484 more rows

pivot_longer(): More Advanced Application

Now a harder one (two keys, one value):
# A tibble: 69,492 × 7
  SID   gender education   age trait item_num values
  <chr>  <int> <chr>     <int> <chr> <chr>     <int>
1 1          1 <NA>         16 A     1             2
2 1          1 <NA>         16 A     2             4
3 1          1 <NA>         16 A     3             3
4 1          1 <NA>         16 A     4             4
5 1          1 <NA>         16 A     5             4
6 1          1 <NA>         16 C     1             2
7 1          1 <NA>         16 C     2             3
8 1          1 <NA>         16 C     3             3
# … with 69,484 more rows

pivot_wider()
(Formerly spread().) Makes long data wide, based on a key.
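Here is a sketch going the other way, on a made-up long data frame with the same trait/item_num layout as bfi_long. The values are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per person x item
long <- tibble(
  SID      = c("1", "1", "1", "2", "2", "2"),
  trait    = c("A", "A", "C", "A", "A", "C"),
  item_num = c("1", "2", "1", "1", "2", "1"),
  values   = c(2, 4, 2, 2, 5, 5)
)

# Two keys joined into one column name each: A_1, A_2, C_1
wide_items <- long %>%
  pivot_wider(
    names_from  = c(trait, item_num),
    values_from = values,
    names_sep   = "_"
  )

# Drop the item key so each trait has several values per person;
# values_fn = mean averages those duplicates into one score per trait
wide_traits <- long %>%
  select(-item_num) %>%
  pivot_wider(
    names_from  = trait,
    values_from = values,
    values_fn   = mean
  )
```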
- data: the data, blank if piped
- names_from: name(s) of the key column(s) in the long data frame whose values become the new column names
- names_sep: separator used to join names, if multiple keys
- names_glue: a glue specification for custom names built from multiple keys
- values_from: name(s) of the column(s) in the long data frame that hold the values
- values_fn: function applied to cells with duplicate labels

pivot_wider(): Basic Application

# A tibble: 2,800 × 29
   SID   gender education    age    A1    A2    A3    A4    A5    C1    C2    C3
   <chr>  <int> <chr>      <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1 1          1 <NA>          16     2     4     3     4     4     2     3     3
 2 2          2 <NA>          18     2     4     5     2     5     5     4     4
 3 3          2 <NA>          17     5     4     5     4     4     4     5     4
 4 4          2 <NA>          17     4     4     6     5     5     4     4     3
 5 5          1 <NA>          17     2     3     3     4     5     4     4     5
 6 6          2 Some Coll…    21     6     6     5     6     5     6     6     6
 7 7          1 <NA>          18     2     5     5     3     5     5     4     4
 8 8          1 HS            19     4     3     1     5     1     3     2     4
 9 9          1 Below HS      19     4     3     6     3     3     6     6     3
10 10         2 <NA>          17     2     5     6     6     5     6     5     6
# … with 2,790 more rows, and 17 more variables: C4 <int>, C5 <int>, E1 <int>,
#   E2 <int>, E3 <int>, E4 <int>, E5 <int>, N1 <int>, N2 <int>, N3 <int>,
#   N4 <int>, N5 <int>, O1 <int>, O2 <int>, O3 <int>, O4 <int>, O5 <int>

pivot_wider(): More Advanced

bfi_long %>%
  pivot_wider(
    names_from = c("trait", "item_num")
    , values_from = "values"
    , names_sep = "_"
  )

# A tibble: 2,800 × 29
   SID   gender education    age   A_1   A_2   A_3   A_4   A_5   C_1   C_2   C_3
   <chr>  <int> <chr>      <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1 1          1 <NA>          16     2     4     3     4     4     2     3     3
 2 2          2 <NA>          18     2     4     5     2     5     5     4     4
 3 3          2 <NA>          17     5     4     5     4     4     4     5     4
 4 4          2 <NA>          17     4     4     6     5     5     4     4     3
 5 5          1 <NA>          17     2     3     3     4     5     4     4     5
 6 6          2 Some Coll…    21     6     6     5     6     5     6     6     6
 7 7          1 <NA>          18     2     5     5     3     5     5     4     4
 8 8          1 HS            19     4     3     1     5     1     3     2     4
 9 9          1 Below HS      19     4     3     6     3     3     6     6     3
10 10         2 <NA>          17     2     5     6     6     5     6     5     6
# … with 2,790 more rows, and 17 more variables: C_4 <int>, C_5 <int>,
#   E_1 <int>, E_2 <int>, E_3 <int>, E_4 <int>, E_5 <int>, N_1 <int>,
#   N_2 <int>, N_3 <int>, N_4 <int>, N_5 <int>, O_1 <int>, O_2 <int>,
#   O_3 <int>, O_4 <int>, O_5 <int>

pivot_wider(): A Little More Advanced

bfi_long %>%
  select(-item_num) %>%
  pivot_wider(
    names_from = "trait"
    , values_from = "values"
    , names_sep = "_"
    , values_fn = mean
  )

# A tibble: 2,800 × 9
   SID   gender education      age     A     C     E     N     O
   <chr>  <int> <chr>        <int> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 1          1 <NA>            16   3.4   3.2  3.4    2.8   3.8
 2 2          2 <NA>            18   3.6   4    3      3.8   3.2
 3 3          2 <NA>            17   4.4   4    3.8    3.6   3.6
 4 4          2 <NA>            17   4.8   4.2  4      2.8   3.6
 5 5          1 <NA>            17   3.4   3.6  3.6    3.2   3.2
 6 6          2 Some College    21   5.6   4.4  4      3     3.8
 7 7          1 <NA>            18   4     3.6  4.2    1.4   3.8
 8 8          1 HS              19   2.8   3    3.2    4.2   3.4
 9 9          1 Below HS        19   3.8   4.8  3.75   3.6   5  
10 10         2 <NA>            17   4.8   4    3.6    4.2   3.6
# … with 2,790 more rows

dplyr Functions

_join() Functions
- full_join()
- inner_join()
- left_join()
- right_join()

bfi_only <- bfi %>% 
  rownames_to_column("SID") %>%
  select(SID, matches("[0-9]"))
bfi_only %>% print(n = 6)

# A tibble: 2,800 × 26
  SID      A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1    E2
  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1         2     4     3     4     4     2     3     3     4     4     3     3
2 2         2     4     5     2     5     5     4     4     3     4     1     1
3 3         5     4     5     4     4     4     5     4     2     5     2     4
4 4         4     4     6     5     5     4     4     3     5     5     5     3
5 5         2     3     3     4     5     4     4     5     3     2     2     2
6 6         6     6     5     6     5     6     6     6     1     3     2     1
# … with 2,794 more rows, and 13 more variables: E3 <int>, E4 <int>, E5 <int>,
#   N1 <int>, N2 <int>, N3 <int>, N4 <int>, N5 <int>, O1 <int>, O2 <int>,
#   O3 <int>, O4 <int>, O5 <int>

full_join()
Most simply, we can put those back together keeping all observations.
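The bfi split is large, so here is a tiny, made-up pair of data frames (dem and items, both hypothetical) that makes it easy to see which rows each join keeps. Passing by = "SID" explicitly avoids relying on dplyr to guess the shared key column.

```r
library(dplyr)

# Hypothetical demographics and item data that only partly overlap
dem   <- tibble(SID = c("1", "2", "3"), age = c(16, 18, 17))
items <- tibble(SID = c("2", "3", "4"), A1 = c(4, 5, 2))

joined_full  <- full_join(dem, items, by = "SID")   # SIDs 1-4, NA where missing
joined_inner <- inner_join(dem, items, by = "SID")  # SIDs 2 and 3 only
joined_left  <- left_join(dem, items, by = "SID")   # every SID in dem
joined_right <- right_join(dem, items, by = "SID")  # every SID in items

# Filtering joins: keep only dem's columns, based on whether SID matches
unmatched <- anti_join(dem, items, by = "SID")  # dem rows with no match
matched   <- semi_join(dem, items, by = "SID")  # dem rows with a match
```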
# A tibble: 2,800 × 29
  SID      A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1    E2
  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1         2     4     3     4     4     2     3     3     4     4     3     3
2 2         2     4     5     2     5     5     4     4     3     4     1     1
3 3         5     4     5     4     4     4     5     4     2     5     2     4
4 4         4     4     6     5     5     4     4     3     5     5     5     3
5 5         2     3     3     4     5     4     4     5     3     2     2     2
6 6         6     6     5     6     5     6     6     6     1     3     2     1
# … with 2,794 more rows, and 16 more variables: E3 <int>, E4 <int>, E5 <int>,
#   N1 <int>, N2 <int>, N3 <int>, N4 <int>, N5 <int>, O1 <int>, O2 <int>,
#   O3 <int>, O4 <int>, O5 <int>, education <chr>, gender <int>, age <int>

# A tibble: 2,800 × 29
  SID      A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1    E2
  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1         2     4     3     4     4     2     3     3     4     4     3     3
2 2         2     4     5     2     5     5     4     4     3     4     1     1
3 3         5     4     5     4     4     4     5     4     2     5     2     4
4 4         4     4     6     5     5     4     4     3     5     5     5     3
5 5         2     3     3     4     5     4     4     5     3     2     2     2
6 6         6     6     5     6     5     6     6     6     1     3     2     1
# … with 2,794 more rows, and 16 more variables: E3 <int>, E4 <int>, E5 <int>,
#   N1 <int>, N2 <int>, N3 <int>, N4 <int>, N5 <int>, O1 <int>, O2 <int>,
#   O3 <int>, O4 <int>, O5 <int>, gender <int>, education <chr>, age <int>

inner_join()
We can also keep only the rows present in both data frames:
bfi_dem %>%
  filter(row_number() %in% 1:1700) %>%
  inner_join(
    bfi_only %>%
      filter(row_number() %in% 1200:2800)
  ) %>%
  print(n = 6)

# A tibble: 501 × 29
  SID   education   gender   age    A1    A2    A3    A4    A5    C1    C2    C3
  <chr> <chr>        <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1200  Some Colle…      2    18     1     5     6     5     5     5     6     5
2 1201  College          2    29     1     5     6     5     5     2     1     4
3 1202  Higher Deg…      1    46     2     5     6     5     6     6     6     6
4 1203  Higher Deg…      1    58     5     4     4     4     5     4     4     5
5 1204  Higher Deg…      2    38     1     4     6     6     6     4     4     5
6 1205  Higher Deg…      2    27     2     3     1     1     1     4     2     2
# … with 495 more rows, and 17 more variables: C4 <int>, C5 <int>, E1 <int>,
#   E2 <int>, E3 <int>, E4 <int>, E5 <int>, N1 <int>, N2 <int>, N3 <int>,
#   N4 <int>, N5 <int>, O1 <int>, O2 <int>, O3 <int>, O4 <int>, O5 <int>

left_join()
Or all rows present in the left (first) data frame, perhaps if it’s a subset of people with complete data
# A tibble: 2,577 × 29
  SID   education   gender   age    A1    A2    A3    A4    A5    C1    C2    C3
  <chr> <chr>        <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 6     Some Colle…      2    21     6     6     5     6     5     6     6     6
2 8     HS               1    19     4     3     1     5     1     3     2     4
3 9     Below HS         1    19     4     3     6     3     3     6     6     3
4 11    Below HS         1    21     4     4     5     6     5     4     3     5
5 15    Below HS         1    17     4     5     2     2     1     5     5     5
6 23    Higher Deg…      1    68     1     5     6     5     6     4     3     2
# … with 2,571 more rows, and 17 more variables: C4 <int>, C5 <int>, E1 <int>,
#   E2 <int>, E3 <int>, E4 <int>, E5 <int>, N1 <int>, N2 <int>, N3 <int>,
#   N4 <int>, N5 <int>, O1 <int>, O2 <int>, O3 <int>, O4 <int>, O5 <int>

right_join()
Or all rows present in the right (second) data frame, such as I do when I join a codebook with raw data
# A tibble: 2,800 × 29
  SID   education   gender   age    A1    A2    A3    A4    A5    C1    C2    C3
  <chr> <chr>        <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 6     Some Colle…      2    21     6     6     5     6     5     6     6     6
2 8     HS               1    19     4     3     1     5     1     3     2     4
3 9     Below HS         1    19     4     3     6     3     3     6     6     3
4 11    Below HS         1    21     4     4     5     6     5     4     3     5
5 15    Below HS         1    17     4     5     2     2     1     5     5     5
6 23    Higher Deg…      1    68     1     5     6     5     6     4     3     2
# … with 2,794 more rows, and 17 more variables: C4 <int>, C5 <int>, E1 <int>,
#   E2 <int>, E3 <int>, E4 <int>, E5 <int>, N1 <int>, N2 <int>, N3 <int>,
#   N4 <int>, N5 <int>, O1 <int>, O2 <int>, O3 <int>, O4 <int>, O5 <int>

PSC 290 - Data Visualization