Week 8 - Functional Tables and Figures

Emorie D Beck

Outline

  1. Questions on Homework
  2. Tables
  3. Figures
  4. (GitHub)

APA Tables

In psychology, we must work within the confines of APA style. Although these guidelines have been updated, the style guide remains quite similar to earlier guidelines with respect to tables.

But psychology research is heterogeneous and expectations for modern tables require combining multiple models in creative ways.

Small tweaks to data or model arguments can spell disaster for creating a table. It’s easy to make mistakes in copying values or matching different models to their respective rows and columns.

Thankfully, the increasing popularity of R has been coupled with more methods for creating a reproducible workflow that includes tables.

Tables in R

In this workshop, we will directly cover 3 different use cases, while a few others will be included in supplementary materials.

Personally, I favor non-automated tools, so we will cover the following packages:
- kable + kableExtra (html and LaTeX)
- papaja

Using these packages will build on earlier tutorials using tidyr, dplyr, workflow, and purrr and round out our discuss on data presentation using ggplot2.

For less flexible but more accessible tables see:
- apaTable
- sjPlot
- corrplot

Important Tools

Although it doesn’t cover all models, the broom and broom.mixed family of packages will provide easy to work with estimates of nearly all types of models and will also provide the model terms that are ideal for most APA tables, including estimates, standard errors, and confidence intervals.

lavaan models are slightly more complicated, but it’s actually relatively easy to deal with them (and how to extract their terms), assuming that you understand the models you are running.

Data

The data we’re going to use are from the teaching sample from the German Socioeconomic Panel Study. These data have been pre-cleaned (see earlier workshop on workflow and creating guidelines for tips).

The data we’ll use fall into three categories:
1. Personality trait composites: Negative Affect, Positive Affect, Self-Esteem, CESD Depression, and Optimism. These were cleaned, reversed coded, and composited prior to being included in this final data set.
2. Outcomes: Moving in with a partner, marriage, divorce, and child birth. These were cleaned, coded as 1 (occurred) or 0 (did not occur) according to whether an outcome occurred for each individual or not after each possible measured personality year. Moreover, people who experienced these outcomes prior to the target personality year are excluded.
3. Covariates: Age, gender (0 = male, 1 = female, education (0 = high school or below, 1 = college, 2 = higher than college), gross wages, self-rated health, smoking (0 = never smoked 1 = ever smoked), exercise, BMI, religion, parental education, and parental occupational prestige (ISEI). Each of these were composited for all available data up to the measured personality years.

Data

(gsoep <- read_csv("week8-data.csv"))
# A tibble: 329,651 × 21
   p_year   SID Outcome   o_value Trait p_value   age education gender grsWages
    <dbl> <dbl> <chr>       <dbl> <chr>   <dbl> <dbl>     <dbl>  <dbl>    <dbl>
 1   2005   901 chldbrth        0 OP          4    33        NA      1   13597.
 2   2005   901 divorced        0 OP          4    33        NA      1   13597.
 3   2005   901 married         0 OP          4    33        NA      1   13597.
 4   2005   901 mvInPrtnr       0 OP          4    33        NA      1   13597.
 5   2005  1202 chldbrth        0 OP          2    71        NA      1    3900.
 6   2005  1202 divorced        0 OP          2    71        NA      1    3900.
 7   2005  1202 mvInPrtnr       0 OP          2    71        NA      1    3900.
 8   2005  2301 chldbrth        0 OP          3    38        NA      0   33429.
 9   2005  2301 divorced        0 OP          3    38        NA      0   33429.
10   2005  2301 mvInPrtnr       1 OP          3    38        NA      0   33429.
# ℹ 329,641 more rows
# ℹ 11 more variables: physhlthevnt <dbl>, SRhealth <dbl>, smokes <dbl>,
#   exercise <dbl>, BMI <dbl>, married <dbl>, numKids <dbl>, PhysFunc <dbl>,
#   religion <dbl>, parEdu <dbl>, parOccPrstg <dbl>

Basic: One DV/Outcome, Multiple Model Terms

We’ll start with a basic case, predicting who has a child from personality, both with and without control variables.

Becauce outcome variables are binary, we’ll use logistic regression.

The basic form of the model is: \(\log\Big(\frac{p_i}{1-p_i}\Big) = b_0 + b_1X_1 + b_2X_2 ... b_pXp\)

In other words, we’re predicting the log odds of having a child from a linear combination of predictor variables.

Set up the data frame

First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().

Set up the data frame

First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().

gsoep_nested1 <- gsoep %>%
  filter(Outcome == "chldbrth") %>%
  group_by(Trait, Outcome) %>%
  nest() %>%
  ungroup()

Set up the data frame

First, let’s pause and see what we have. We now have a data frame with 3 columns (Outcome, Trait, and data) and 4 rows. The data column is of class list, meaning it’s a “list column” that contains a tibble in each cell. This means that we can use purrr functions to run operations on each of these data frames individually but without having to copy and paste the same operation multiple times for each model we want to run.

gsoep_nested1
# A tibble: 5 × 3
  Outcome  Trait  data                  
  <chr>    <chr>  <list>                
1 chldbrth OP     <tibble [9,192 × 19]> 
2 chldbrth DEP    <tibble [28,634 × 19]>
3 chldbrth NegAff <tibble [16,992 × 19]>
4 chldbrth PA     <tibble [16,969 × 19]>
5 chldbrth SE     <tibble [7,934 × 19]> 

Run Models

To run the models, I like to write short functions that are easier to read than including a local function within a call to purrr::map(). Here, we’re just going to write a simple function to predict child birth from personality.

mod1_fun <- function(d){
  d$o_value <- factor(d$o_value)
  glm(o_value ~ p_value, data = d, family = binomial(link = "logit"))
}

gsoep_nested1 <- gsoep_nested1 %>%
  mutate(m = map(data, mod1_fun))

Run Models

Now, when we look at the nested frame, we see an additional column, which is also a list, but this column contains <glm> objects rather than tibbles.

gsoep_nested1
# A tibble: 5 × 4
  Outcome  Trait  data                   m     
  <chr>    <chr>  <list>                 <list>
1 chldbrth OP     <tibble [9,192 × 19]>  <glm> 
2 chldbrth DEP    <tibble [28,634 × 19]> <glm> 
3 chldbrth NegAff <tibble [16,992 × 19]> <glm> 
4 chldbrth PA     <tibble [16,969 × 19]> <glm> 
5 chldbrth SE     <tibble [7,934 × 19]>  <glm> 

Get Key Terms

Now that we have the models, we want to get our key terms. I’m a big fan of using the function tidy from the broom package to do this. Bonus because it plays nicely with purrr. Double bonus because it will give us confidence intervals, which I generally prefer over p-values and standard erorrs because I find them more informative.

Get Key Terms

Now that we have the models, we want to get our key terms. I’m a big fan of using the function tidy from the broom package to do this. Bonus because it plays nicely with purrr. Double bonus because it will give us confidence intervals, which I generally prefer over p-values and standard erorrs because I find them more informative.

gsoep_nested1 <- gsoep_nested1 %>%
  mutate(tidy = map(m, ~tidy(., conf.int = T)))

Get Key Terms

gsoep_nested1 <- gsoep_nested1 %>%
  mutate(tidy = map(m, ~tidy(., conf.int = T)))
gsoep_nested1
# A tibble: 5 × 5
  Outcome  Trait  data                   m      tidy            
  <chr>    <chr>  <list>                 <list> <list>          
1 chldbrth OP     <tibble [9,192 × 19]>  <glm>  <tibble [2 × 7]>
2 chldbrth DEP    <tibble [28,634 × 19]> <glm>  <tibble [2 × 7]>
3 chldbrth NegAff <tibble [16,992 × 19]> <glm>  <tibble [2 × 7]>
4 chldbrth PA     <tibble [16,969 × 19]> <glm>  <tibble [2 × 7]>
5 chldbrth SE     <tibble [7,934 × 19]>  <glm>  <tibble [2 × 7]>

Note that what I’ve used here is a local function, meaning that I’ve used the notation ~function(., arguments). The tilda tells R we want a local function, and the . tells R to use the mapped m column as the function input.

Now we have a fifth column, which is a list column called tidy that contains a tibble, just like the data column.

Creating a Table

Now we are finally ready to create a table! I’m going to use kable + kableExtra to do this in steps.

First, we’ll unnest the tidy column from our data frame. Before doing so, we will drop the data and m columns because they’ve done their work for now.

tidy1 <- gsoep_nested1 %>%
  select(-data, -m) %>%
  unnest(tidy)
tidy1
# A tibble: 10 × 9
   Outcome Trait term  estimate std.error statistic   p.value conf.low conf.high
   <chr>   <chr> <chr>    <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
 1 chldbr… OP    (Int…  -4.30      0.186     -23.1  1.84e-118  -4.67      -3.94 
 2 chldbr… OP    p_va…   0.518     0.0582      8.90 5.53e- 19   0.405      0.633
 3 chldbr… DEP   (Int…  -3.48      0.146     -23.8  3.86e-125  -3.76      -3.19 
 4 chldbr… DEP   p_va…   0.143     0.0364      3.92 9.00e-  5   0.0716     0.214
 5 chldbr… NegA… (Int…  -3.58      0.132     -27.2  2.43e-163  -3.84      -3.33 
 6 chldbr… NegA… p_va…   0.107     0.0471      2.27 2.33e-  2   0.0142     0.199
 7 chldbr… PA    (Int…  -5.79      0.219     -26.5  1.64e-154  -6.23      -5.37 
 8 chldbr… PA    p_va…   0.677     0.0554     12.2  2.91e- 34   0.569      0.786
 9 chldbr… SE    (Int…  -4.11      0.342     -12.0  2.79e- 33  -4.80      -3.46 
10 chldbr… SE    p_va…   0.0853    0.0583      1.46 1.43e-  1  -0.0264     0.202

As you can see, we now have lots of information about our model terms, which are already nicely indexed by Outcome and Trait combinations.

Creating a Table

But before we’re ready to create a table, we have to make a few considerations:

  • What is our target term? In this case “p_value” which is the change in log odds associated with a 1 unit increase/decrease in p_value.
  • How will we denote significance? In this case, we’ll use confidence intervals whose signs match. We’ll then bold these terms for our table.
  • What is the desired final structure for the table? I’d like columns for Trait, estimate (b), and confidence intervals (CI) formatted to two decimal places and bolded if significant. I’d also like a span header denoting that the outcome measure is child birth.

Creating a Table

But before we’re ready to create a table, we have to make a few considerations:

What is our target term? In this case “p_value” which is the change in log odds associated with a 1 unit increase/decrease in p_value.

tidy1 <- tidy1 %>% filter(term == "p_value")
tidy1
# A tibble: 5 × 9
  Outcome  Trait  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 chldbrth OP     p_va…   0.518     0.0582      8.90 5.53e-19   0.405      0.633
2 chldbrth DEP    p_va…   0.143     0.0364      3.92 9.00e- 5   0.0716     0.214
3 chldbrth NegAff p_va…   0.107     0.0471      2.27 2.33e- 2   0.0142     0.199
4 chldbrth PA     p_va…   0.677     0.0554     12.2  2.91e-34   0.569      0.786
5 chldbrth SE     p_va…   0.0853    0.0583      1.46 1.43e- 1  -0.0264     0.202

Creating a Table

How will we denote significance? In this case, we’ll use confidence intervals whose signs match. We’ll then bold these terms for our table.

tidy1 <- tidy1 %>% mutate(sig = ifelse(sign(conf.low) == sign(conf.high), "sig", "ns"))
tidy1
# A tibble: 5 × 10
  Outcome  Trait  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 chldbrth OP     p_va…   0.518     0.0582      8.90 5.53e-19   0.405      0.633
2 chldbrth DEP    p_va…   0.143     0.0364      3.92 9.00e- 5   0.0716     0.214
3 chldbrth NegAff p_va…   0.107     0.0471      2.27 2.33e- 2   0.0142     0.199
4 chldbrth PA     p_va…   0.677     0.0554     12.2  2.91e-34   0.569      0.786
5 chldbrth SE     p_va…   0.0853    0.0583      1.46 1.43e- 1  -0.0264     0.202
# ℹ 1 more variable: sig <chr>

Creating a Table

What is the desired final structure for the table? I’d like columns for Trait, estimate (b), and confidence intervals (CI) formatted to two decimal places and bolded if significant. I’d also like a span header denoting that the outcome measure is child birth.

Before we do this, though, we need to convert our log odds to odds ratios, using the exp() function.

tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) 
tidy1
# A tibble: 5 × 10
  Outcome  Trait  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 chldbrth OP     p_va…     1.68    0.0582      8.90 5.53e-19    1.50       1.88
2 chldbrth DEP    p_va…     1.15    0.0364      3.92 9.00e- 5    1.07       1.24
3 chldbrth NegAff p_va…     1.11    0.0471      2.27 2.33e- 2    1.01       1.22
4 chldbrth PA     p_va…     1.97    0.0554     12.2  2.91e-34    1.77       2.20
5 chldbrth SE     p_va…     1.09    0.0583      1.46 1.43e- 1    0.974      1.22
# ℹ 1 more variable: sig <chr>

Creating a Table

Now, we can format them.

tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, conf.low, conf.high), ~sprintf("%.2f", .)) 
tidy1
# A tibble: 5 × 10
  Outcome  Trait  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>  <chr> <chr>        <dbl>     <dbl>    <dbl> <chr>    <chr>    
1 chldbrth OP     p_va… 1.68        0.0582      8.90 5.53e-19 1.50     1.88     
2 chldbrth DEP    p_va… 1.15        0.0364      3.92 9.00e- 5 1.07     1.24     
3 chldbrth NegAff p_va… 1.11        0.0471      2.27 2.33e- 2 1.01     1.22     
4 chldbrth PA     p_va… 1.97        0.0554     12.2  2.91e-34 1.77     2.20     
5 chldbrth SE     p_va… 1.09        0.0583      1.46 1.43e- 1 0.97     1.22     
# ℹ 1 more variable: sig <chr>

sprintf() is my favorite base R formatting function. “%.2f” means I’m asking it to take a floating point number and include 2 digits after the “.” and 0 before. We can now see that the estimate, conf.low, and conf.high columns are of class <chr> instead of <dbl>.

Creating a Table

But now we need to create our confidence intervals.

tidy1 <- tidy1 %>%
  mutate(CI = sprintf("[%s, %s]", conf.low, conf.high))
tidy1
# A tibble: 5 × 11
  Outcome  Trait  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>  <chr> <chr>        <dbl>     <dbl>    <dbl> <chr>    <chr>    
1 chldbrth OP     p_va… 1.68        0.0582      8.90 5.53e-19 1.50     1.88     
2 chldbrth DEP    p_va… 1.15        0.0364      3.92 9.00e- 5 1.07     1.24     
3 chldbrth NegAff p_va… 1.11        0.0471      2.27 2.33e- 2 1.01     1.22     
4 chldbrth PA     p_va… 1.97        0.0554     12.2  2.91e-34 1.77     2.20     
5 chldbrth SE     p_va… 1.09        0.0583      1.46 1.43e- 1 0.97     1.22     
# ℹ 2 more variables: sig <chr>, CI <chr>

Creating a Table

And bold the significant confidence intervals and estimates.

tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, CI), ~ifelse(sig == "sig", sprintf("<strong>%s</strong>", .), .))
tidy1
# A tibble: 5 × 11
  Outcome  Trait  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <chr>  <chr> <chr>        <dbl>     <dbl>    <dbl> <chr>    <chr>    
1 chldbrth OP     p_va… <strong…    0.0582      8.90 5.53e-19 1.50     1.88     
2 chldbrth DEP    p_va… <strong…    0.0364      3.92 9.00e- 5 1.07     1.24     
3 chldbrth NegAff p_va… <strong…    0.0471      2.27 2.33e- 2 1.01     1.22     
4 chldbrth PA     p_va… <strong…    0.0554     12.2  2.91e-34 1.77     2.20     
5 chldbrth SE     p_va… 1.09        0.0583      1.46 1.43e- 1 0.97     1.22     
# ℹ 2 more variables: sig <chr>, CI <chr>

This reads as “for both the estimate and the CI columns, if the sig column is equal to”sig”, then let’s format it as bold using html. Otherwise, let’s leave it alone.” And indeed, we can see that the final result formats 3/4 rows.

Creating a Table

Thankfully, these can be achieved without considerable reshaping of the data, which is why we’ve started here, so we’re almost done. We just need to get rid of some unnecessary columnns.

tidy1 <- tidy1 %>%
  select(Trait, OR = estimate, CI)

Because we just have one target term and one outcome, we don’t need those columns, so we’re just keeping Trait, OR, which I renamed as such within in the select call, and CI.

Kabling a Table

Now let’s kable. You’ve likely used the kable() function from the knitr before. It’s a very useful and simple function in most occasions.

kable(tidy1)
Trait OR CI
OP <strong>1.68</strong> <strong>[1.50, 1.88]</strong>
DEP <strong>1.15</strong> <strong>[1.07, 1.24]</strong>
NegAff <strong>1.11</strong> <strong>[1.01, 1.22]</strong>
PA <strong>1.97</strong> <strong>[1.77, 2.20]</strong>
SE 1.09 [0.97, 1.22]

Kabling a Table

It will automatically generate the html code needed to create a table. But if we look closely at the code, it gives us some gobbledigook where we inputted html, so we need a way around that. I’m also going to throw in kable_styling(full_width = F) from the kableExtra package to help out here. It’s not doing much, but it will make the formatted table print in your Viewer.

kable(tidy1, escape = F) %>%
  kable_classic(full_width = F, html_font = "Times")
Trait OR CI
OP 1.68 [1.50, 1.88]
DEP 1.15 [1.07, 1.24]
NegAff 1.11 [1.01, 1.22]
PA 1.97 [1.77, 2.20]
SE 1.09 [0.97, 1.22]

Much better. But this still doesn’t look like an APA table, so let’s keep going.

Kabling a Table

  1. APA tables usually write out long names for our predictors, so let’s change those first. I’m going to create a reference tibble and use mapvalues() from the plyr function for this.
p_names <- tibble(
  old = c("NegAff", "PA", "SE", "OP", "DEP"),
  new = c("Negative Affect", "Positive Affect", "Self-Esteem", "Optimism", "Depression")
)

tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(., escape = F) %>%
  kable_classic(full_width = F, html_font = "Times")
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]

The combination of factor plus arrange here is super helpful for ordering your table.

Kabling a Table

  1. The alignment of the columns isn’t quite right. Let’s fix that. We’ll change the trait to right justified and b and CI to centered.
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        ) %>%
  kable_classic(full_width = F, html_font = "Times")
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]

Kabling a Table

  1. But we’re still missing our span header. There’s a great function in the kableExtra package for this add_header_above. This function takes a named vector as argument, where the elements of the vector refer to the number of columns the named element should span.
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(c(" " = 1, "Birth of a Child" = 2))
Birth of a Child
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]

Note that what the " " = 1 does is skip the Trait column. This is very useful because it let’s us not have a span header over every column.

Kabling a Table

  1. APA style requires we note how we denote significance and have a title, so let’s add a title and a note.
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        , caption = "<strong>Table 1</strong><br><em>Estimated Personality-Outcome Associations</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times")%>%
  add_header_above(c(" " = 1, "Birth of a Child" = 2)) %>%
  add_footnote(label = "Bold values indicate terms whose confidence intervals did not overlap with 0", notation = "none")
Table 1
Estimated Personality-Outcome Associations
Birth of a Child
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]
Bold values indicate terms whose confidence intervals did not overlap with 0

We did it!

A Quick Note: HTML v. LaTeX

When creating tables, I prefer using HTML when I need the resulting tables to be in HTML and LaTeX when I can place the tables in a PDF. The syntax using kable and kableExtra is the same with the following exceptions:

  1. The format argument in kable() would need to be set as format = "latex".
  2. The chunk option for a table to render would need to be set as {r, results = 'asis'}.
  3. Bolding would need to be done as \\textbf{}, rather than the html <strong></strong> tag.
  4. When using collapse_rows(), which we’ll get to later, you’d want to set the latex_hline argument to latex_hline = "none".

Extracting terms from lavaan

Most models in R can easily have terms extracted via broom or broom.mixed, but one major exception to this are SEM models in lavaan. Let’s step through an example using lavaan to show you some of my tricks for working with it.

Data

For this example, we’ll use a different data set from the SOEP that have the item level personality data. The code below reads it in, recodes, and reverse scores it.

rev_code <- c("Big5__A_coarse", "Big5__C_lazy", "Big5__E_reserved", "Big5__N_dealStress")
(gsoep2 <- read_csv("week8-data-2.csv") %>%
   mutate_at(vars(matches("Big5")), function(x) {x[x < 0] <- NA; x}) %>%
   mutate_at(vars(matches("LifeEvent")), ~mapvalues(., seq(-7,1), c(rep(NA, 5), 0, NA, NA, 1), warn_missing = F)) %>%
    mutate_at(
    vars(all_of(rev_code))
    , ~as.numeric(reverse.code(., keys = -1, mini = 1, maxi = 7))
    ))
# A tibble: 136,627 × 30
   Procedural__SID Procedural__household Demographic__DOB Demographic__Sex  year
             <dbl>                 <dbl>            <dbl>            <dbl> <dbl>
 1             901                    94             1951                2  2005
 2             901                    94             1951                2  2006
 3             901                    94             1951                2  2007
 4             901                    94             1951                2  2008
 5             901                    94             1951                2  2009
 6             901                    94             1951                2  2010
 7             901                    94             1951                2  2011
 8             901                    94             1951                2  2012
 9             901                    94             1951                2  2013
10             901                    94             1951                2  2014
# ℹ 136,617 more rows
# ℹ 25 more variables: Big5__C_thorough <dbl>, Big5__E_communic <dbl>,
#   Big5__A_coarse <dbl>, Big5__O_original <dbl>, Big5__N_worry <dbl>,
#   Big5__A_forgive <dbl>, Big5__C_lazy <dbl>, Big5__E_sociable <dbl>,
#   Big5__O_artistic <dbl>, Big5__N_nervous <dbl>, Big5__C_efficient <dbl>,
#   Big5__E_reserved <dbl>, Big5__A_friendly <dbl>, Big5__O_imagin <dbl>,
#   Big5__N_dealStress <dbl>, LifeEvent__ChldBrth <dbl>, …

Data

Now we’ll change it to long format for easier cleaning and reshaping

gsoep2_long <- gsoep2 %>%
  pivot_longer(
    cols = c(starts_with("Big5"), starts_with("Life"))
    , names_to = c("category", "item")
    , names_sep = "__"
    , values_to = "value"
    , values_drop_na = T
  ) 

Data Cleaning

  • Now, we need to reshape the Big Five data for lavaan
  • If you learn nothing else from this class, I want you to learn this trick I use to run the same model multiple times on different variables
  • We are going to run a second-order latent growth model of the BFI-S, which has three items per Big Five trait
  • The item numbers are arbitrary but consistent across traits (3) as are the years (2005, 2009, 2013)
  • So if we separate the trait information from the item and year, then we can run identical models across traits
gsoep2_lavaan <- gsoep2_long %>% 
  filter(category == "Big5") %>%
  separate(item, c("trait", "item"), sep = "_") %>%
  group_by(Procedural__SID, year, trait) %>%
  mutate(item = mapvalues(item, unique(item), 1:n())) %>%
  ungroup() %>%
  pivot_wider(
    names_from = c("item", "year")
    , names_prefix = "I"
    , values_from = "value"
  )
gsoep2_lavaan
# A tibble: 83,572 × 15
   Procedural__SID Procedural__household Demographic__DOB Demographic__Sex
             <dbl>                 <dbl>            <dbl>            <dbl>
 1             901                    94             1951                2
 2             901                    94             1951                2
 3             901                    94             1951                2
 4             901                    94             1951                2
 5             901                    94             1951                2
 6            1202                   124             1913                2
 7            1202                   124             1913                2
 8            1202                   124             1913                2
 9            1202                   124             1913                2
10            1202                   124             1913                2
# ℹ 83,562 more rows
# ℹ 11 more variables: category <chr>, trait <chr>, I1_2005 <dbl>,
#   I2_2005 <dbl>, I3_2005 <dbl>, I1_2009 <dbl>, I2_2009 <dbl>, I3_2009 <dbl>,
#   I1_2013 <dbl>, I2_2013 <dbl>, I3_2013 <dbl>

Second-Order Latent Growth Model

  • Because of how we set up the data, we only have to write this out once. This is extra helpful for SEM because the model syntax can get really long and it would be time consuming to have to write this out separately for each of the Big Five
mod <- '
  W1 =~ NA*I1_2005 + lambda1*I1_2005 + lambda2*I2_2005 + lambda3*I3_2005
  W2 =~ NA*I1_2009 + lambda1*I1_2009 + lambda2*I2_2009 + lambda3*I3_2009
  W3 =~ NA*I1_2013 + lambda1*I1_2013 + lambda2*I2_2013 + lambda3*I3_2013
  
  i =~ 1*W1 + 1*W2 + 1*W3
  s =~ -1*W1 + 0*W2 + 1*W3
  
  ## intercepts
  I1_2005 ~ t1*1
  I1_2009 ~ t2*1
  I1_2013 ~ t3*1
  
  I2_2005 ~ t1*1
  I2_2009 ~ t2*1
  I2_2013 ~ t3*1
  
  I3_2005 ~ t1*1
  I3_2009 ~ t2*1
  I3_2013 ~ t3*1
  
  ## correlated residuals across time
  I1_2005 ~~ I1_2009 + I1_2013
  I1_2009 ~~ I1_2013
  I2_2005 ~~ I2_2009 + I2_2013
  I2_2009 ~~ I2_2013
  I3_2005 ~~ I3_2009 + I3_2013
  I3_2009 ~~ I3_2013
  
  ## latent variable intercepts
  W1 ~ 0*1
  W2 ~ 0*1
  W3 ~ 0*1
  
  #model constraints for effect coding
  ## loadings must average to 1
  lambda1 == 3 - lambda2 - lambda3
  ## means must average to 0
  t1 == 0 - t2 - t3
  '

Second-Order Latent Growth Model

Now, we’ll write a little function that will run the model syntax across cross-sections of our data for each Big Five trait

lavaan_fun <- function(d){
  m <- growth(
    mod
    , data = d
    , missing = "fiml"
  )
  return(m)
}

Second-Order Latent Growth Model

Now, we can create those cross-sections using tidyr::nest() and then run the model using purrr::map(). Having the trait information as rows in our data frame and using list columns let’s us separate out the trait information from the item/year information we need to run the models.

gsoep_nested2 <- gsoep2_lavaan %>%
  group_by(category, trait) %>%
  nest() %>%
  ungroup() %>%
  mutate(m = map(data, lavaan_fun))
gsoep_nested2
# A tibble: 5 × 4
  category trait data                   m       
  <chr>    <chr> <list>                 <list>  
1 Big5     C     <tibble [16,716 × 13]> <lavaan>
2 Big5     E     <tibble [16,714 × 13]> <lavaan>
3 Big5     A     <tibble [16,718 × 13]> <lavaan>
4 Big5     O     <tibble [16,708 × 13]> <lavaan>
5 Big5     N     <tibble [16,716 × 13]> <lavaan>

Extracting Results

  • Summaries of SEM models are going to spit out a MUCH longer summary of results than the typical models we run and extract information from using broom/broom.mixed.
  • When I do SEM, I do make a table that includes all of these parameters for each model and put it in the supplement
  • But for the paper itself, we only want a subset of key model terms across all traits
summary(gsoep_nested2$m[[1]])
lavaan 0.6.15 ended normally after 81 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        44
  Number of equality constraints                    14

  Number of observations                         16716
  Number of missing patterns                        38

Model Test User Model:
                                                      
  Test statistic                               284.272
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  W1 =~                                               
    I1_2005 (lmb1)    1.053    0.001 1312.911    0.000
    I2_2005 (lmb2)    0.960    0.001  810.689    0.000
    I3_2005 (lmb3)    0.987    0.001 1142.417    0.000
  W2 =~                                               
    I1_2009 (lmb1)    1.053    0.001 1312.911    0.000
    I2_2009 (lmb2)    0.960    0.001  810.689    0.000
    I3_2009 (lmb3)    0.987    0.001 1142.417    0.000
  W3 =~                                               
    I1_2013 (lmb1)    1.053    0.001 1312.911    0.000
    I2_2013 (lmb2)    0.960    0.001  810.689    0.000
    I3_2013 (lmb3)    0.987    0.001 1142.417    0.000
  i =~                                                
    W1                1.000                           
    W2                1.000                           
    W3                1.000                           
  s =~                                                
    W1               -1.000                           
    W2                0.000                           
    W3                1.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .I1_2005 ~~                                          
   .I1_2009           0.029    0.010    3.073    0.002
   .I1_2013           0.046    0.011    4.130    0.000
 .I1_2009 ~~                                          
   .I1_2013           0.052    0.010    5.131    0.000
 .I2_2005 ~~                                          
   .I2_2009           0.775    0.025   31.118    0.000
   .I2_2013           0.685    0.029   23.661    0.000
 .I2_2009 ~~                                          
   .I2_2013           0.924    0.028   32.973    0.000
 .I3_2005 ~~                                          
   .I3_2009           0.148    0.011   13.042    0.000
   .I3_2013           0.139    0.013   11.011    0.000
 .I3_2009 ~~                                          
   .I3_2013           0.179    0.012   15.087    0.000
  i ~~                                                
    s                -0.004    0.005   -0.811    0.417

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .I1_2005   (t1)    0.443    0.089    4.952    0.000
   .I1_2009   (t2)   -0.027    0.005   -4.944    0.000
   .I1_2013   (t3)   -0.416    0.089   -4.650    0.000
   .I2_2005   (t1)    0.443    0.089    4.952    0.000
   .I2_2009   (t2)   -0.027    0.005   -4.944    0.000
   .I2_2013   (t3)   -0.416    0.089   -4.650    0.000
   .I3_2005   (t1)    0.443    0.089    4.952    0.000
   .I3_2009   (t2)   -0.027    0.005   -4.944    0.000
   .I3_2013   (t3)   -0.416    0.089   -4.650    0.000
   .W1                0.000                           
   .W2                0.000                           
   .W3                0.000                           
    i                 5.841    0.007  867.282    0.000
    s                 0.405    0.088    4.607    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .I1_2005           0.429    0.011   39.517    0.000
   .I2_2005           1.849    0.028   65.804    0.000
   .I3_2005           0.719    0.013   55.538    0.000
   .I1_2009           0.478    0.012   40.829    0.000
   .I2_2009           2.021    0.031   66.239    0.000
   .I3_2009           0.754    0.014   55.362    0.000
   .I1_2013           0.433    0.011   39.096    0.000
   .I2_2013           2.094    0.032   64.482    0.000
   .I3_2013           0.684    0.013   52.839    0.000
   .W1                0.144    0.018    7.990    0.000
   .W2                0.205    0.009   22.139    0.000
   .W3                0.084    0.018    4.696    0.000
    i                 0.380    0.009   44.413    0.000
    s                 0.057    0.010    5.808    0.000

Constraints:
                                               |Slack|
    lambda1 - (3-lambda2-lambda3)                0.000
    t1 - (0-t2-t3)                               0.000

Extracting Results

So what I do is to write a little function. I try to write it so that it can be used across multiple qualitatively different models if I’m running different (e.g., see second example later). To do that, I have (1) the function and (2) a tibble called terms that contains the paths I want to extract.

terms <- tribble(
  ~path,     ~new,
  "i~1", "Intercept",
  "s~1", "Slope",
  "i~~i", "Intercept Variance",
  "s~~s", "Slope Variance",
  "i~~s", "Intercept-Slope Covariance"
)

extract_fun <- function(m){
  parameterEstimates(m) %>%
    data.frame() %>%
    unite(path, lhs, op, rhs, sep = "") %>%
    filter(path %in% terms$path) %>%
    left_join(terms) %>%
    select(term = new, est, ci.lower, ci.upper, pvalue)
}

Extracting Results

Now we can run that function to extract the results we care about

gsoep_nested2 <- gsoep_nested2 %>%
  mutate(summary = map(m, extract_fun))
gsoep_nested2
# A tibble: 5 × 5
  category trait data                   m        summary     
  <chr>    <chr> <list>                 <list>   <list>      
1 Big5     C     <tibble [16,716 × 13]> <lavaan> <df [5 × 5]>
2 Big5     E     <tibble [16,714 × 13]> <lavaan> <df [5 × 5]>
3 Big5     A     <tibble [16,718 × 13]> <lavaan> <df [5 × 5]>
4 Big5     O     <tibble [16,708 × 13]> <lavaan> <df [5 × 5]>
5 Big5     N     <tibble [16,716 × 13]> <lavaan> <df [5 × 5]>

Formatting Results

The way that we need to format the data is pretty standard, so below are three functions I use when I need to format results from lavaan for putting into a table

round_fun <- function(x) if(!is.na(x)) if(abs(x) > .01) sprintf("%.2f", x) else sprintf("%.3f", x) else ""
pround_fun <- function(x) if(!is.na(x)) if(abs(x) > .01) sprintf("%.2f", x) else if(x >= .001) sprintf("%.3f", x) else "&lt; .001" else ""

format_fun <- function(d){
  d %>%
    mutate(sig = ifelse(pvalue < .05, "sig", "ns")) %>%
    rowwise() %>%
    mutate_at(vars(est, ci.lower, ci.upper), round_fun) %>%
    mutate_at(vars(pvalue), pround_fun) %>%
    ungroup() %>%
    mutate(CI = sprintf("[%s,%s]", ci.lower, ci.upper)) %>%
    mutate_at(vars(est, CI, pvalue), ~ifelse(sig == "sig" & !is.na(sig), sprintf("<strong>%s</strong>", .), .)) 
}

Formatting Results

Now let’s run that and see what it looks like

gsoep_tab <- gsoep_nested2 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() 
gsoep_tab
# A tibble: 25 × 9
   category trait term                est   ci.lower ci.upper pvalue sig   CI   
   <chr>    <chr> <chr>               <chr> <chr>    <chr>    <chr>  <chr> <chr>
 1 Big5     C     Intercept Variance  <str… 0.36     0.40     <stro… sig   <str…
 2 Big5     C     Slope Variance      <str… 0.04     0.08     <stro… sig   <str…
 3 Big5     C     Intercept-Slope Co… -0.0… -0.02    0.006    0.42   ns    [-0.…
 4 Big5     C     Intercept           <str… 5.83     5.85     <stro… sig   <str…
 5 Big5     C     Slope               <str… 0.23     0.58     <stro… sig   <str…
 6 Big5     E     Intercept Variance  <str… 0.66     0.71     <stro… sig   <str…
 7 Big5     E     Slope Variance      <str… 0.05     0.10     <stro… sig   <str…
 8 Big5     E     Intercept-Slope Co… 0.004 -0.010   0.02     0.57   ns    [-0.…
 9 Big5     E     Intercept           <str… 4.81     4.84     <stro… sig   <str…
10 Big5     E     Slope               <str… 0.04     0.16     <stro… sig   <str…
# ℹ 15 more rows

Formatting Results

We’re really close to being ready to make the table using kable, but we need to do a little reshaping and to remove some columns first.

gsoep_tab <- gsoep_tab %>%
  select(-ci.lower, -ci.upper, -sig, -category, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) 
gsoep_tab
# A tibble: 5 × 11
  term               C_est E_est A_est O_est N_est C_CI  E_CI  A_CI  O_CI  N_CI 
  <chr>              <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Intercept Variance <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
2 Slope Variance     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
3 Intercept-Slope C… -0.0… 0.004 0.009 -0.0… <str… [-0.… [-0.… [-0.… [-0.… <str…
4 Intercept          <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
5 Slope              <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…

Formatting Results

Unfortunately, pivot_wider() doesn’t really let us have too much control over the order of the columns, so we need to move them around so the estimates and CI’s for each trait are next to each other. There are many ways to do this, but I’m showing you two below:

ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")
gsoep_tab <- gsoep_tab %>%
  select(term, all_of(ord)) %>%
  # select(starts_with("E"), starts_with("A"), starts_with("C"), starts_with("N"), starts_with("O"))
  mutate(term = factor(term, terms$new)) %>%
  arrange(term)
gsoep_tab
# A tibble: 5 × 11
  term               E_est E_CI  A_est A_CI  C_est C_CI  N_est N_CI  O_est O_CI 
  <fct>              <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Intercept          <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
2 Slope              <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
3 Intercept Variance <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
4 Slope Variance     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
5 Intercept-Slope C… 0.004 [-0.… 0.009 [-0.… -0.0… [-0.… <str… <str… -0.0… [-0.…

Kabling the Table

Now, let’s go ahead and create the table using kable. Like before, we’re going to use add_header_above(), but I’m going to show you a trick that I use for it to make specifying it a little easier.

  • add_header_above() takes in a named vector as input where the values have to sum to the number of columns in the data frame (11)
hdr <- c(1, rep(2,5))
names(hdr) <- c(" ", "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism", "Openness")
gsoep_tab %>%
  kable(.
        , escape = F
        , align = c("r", rep("c", 10))
        , col.names = c("Term", rep(c("<em>est.</em>", "CI"), times = 5))
        , caption = "<strong>Table 2</strong><br><em>Big Five Personality Trait Trajectories from Latent Growth Models</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(hdr) %>%
  add_footnote(label = "Bold values indicate terms p < .05", notation = "none")

Kabling the Table

Table 2
Big Five Personality Trait Trajectories from Latent Growth Models
Extraversion
Agreeableness
Conscientiousness
Neuroticism
Openness
Term est. CI est. CI est. CI est. CI est. CI
Intercept 4.83 [4.81,4.84] 5.41 [5.39,5.42] 5.84 [5.83,5.85] 3.84 [3.82,3.86] 4.51 [4.49,4.53]
Slope 0.10 [0.04,0.16] 0.49 [0.34,0.65] 0.40 [0.23,0.58] -0.95 [-1.05,-0.86] -0.40 [-0.57,-0.24]
Intercept Variance 0.69 [0.66,0.71] 0.35 [0.33,0.37] 0.38 [0.36,0.40] 0.69 [0.66,0.72] 0.66 [0.63,0.69]
Slope Variance 0.07 [0.05,0.10] 0.03 [0.009,0.05] 0.06 [0.04,0.08] 0.07 [0.04,0.10] 0.07 [0.04,0.10]
Intercept-Slope Covariance 0.004 [-0.010,0.02] 0.009 [-0.003,0.02] -0.004 [-0.02,0.006] 0.02 [0.006,0.04] -0.007 [-0.02,0.009]
Bold values indicate terms p < .05

Second-Order Latent Growth Model: Slopes as Predictors

Now, let’s do a second example that is more complex and uses the intercepts and slopes we estimated to predict life outcomes. To do so, we first need to clean and prep the life event data and merge it back with our Big Five item-level data.

Note that in this case, we’re using long data two ways: for the Big Five traits and for the life events. In other words, we’re getting every combination of the two and are using inner_join() to make that happen!

gsoep2_lavaan <- gsoep2_long %>%
  filter(category == "LifeEvent") %>%
  group_by(Procedural__SID, Demographic__DOB, item) %>%
  summarize(le_value = max(value)) %>%
  ungroup() %>%
  mutate(age = 2005 - Demographic__DOB - 45) %>%
  rename(le = item) %>%
  inner_join(gsoep2_lavaan)
gsoep2_lavaan
# A tibble: 807,847 × 18
   Procedural__SID Demographic__DOB le      le_value   age Procedural__household
             <dbl>            <dbl> <chr>      <dbl> <dbl>                 <dbl>
 1             901             1951 ChldBr…        0     9                    94
 2             901             1951 ChldBr…        0     9                    94
 3             901             1951 ChldBr…        0     9                    94
 4             901             1951 ChldBr…        0     9                    94
 5             901             1951 ChldBr…        0     9                    94
 6             901             1951 ChldMv…        0     9                    94
 7             901             1951 ChldMv…        0     9                    94
 8             901             1951 ChldMv…        0     9                    94
 9             901             1951 ChldMv…        0     9                    94
10             901             1951 ChldMv…        0     9                    94
# ℹ 807,837 more rows
# ℹ 12 more variables: Demographic__Sex <dbl>, category <chr>, trait <chr>,
#   I1_2005 <dbl>, I2_2005 <dbl>, I3_2005 <dbl>, I1_2009 <dbl>, I2_2009 <dbl>,
#   I3_2009 <dbl>, I1_2013 <dbl>, I2_2013 <dbl>, I3_2013 <dbl>

Data Cleaning

Now let’s create the nested data frame.

gsoep_nested3 <- gsoep2_lavaan %>%
  drop_na(trait, le) %>%
  group_by(trait, le) %>%
  nest() %>%
  ungroup()
gsoep_nested3
# A tibble: 50 × 3
   le        trait data                  
   <chr>     <chr> <list>                
 1 ChldBrth  C     <tibble [16,716 × 16]>
 2 ChldBrth  E     <tibble [16,714 × 16]>
 3 ChldBrth  A     <tibble [16,718 × 16]>
 4 ChldBrth  O     <tibble [16,708 × 16]>
 5 ChldBrth  N     <tibble [16,716 × 16]>
 6 ChldMvOut C     <tibble [16,716 × 16]>
 7 ChldMvOut E     <tibble [16,714 × 16]>
 8 ChldMvOut A     <tibble [16,718 × 16]>
 9 ChldMvOut O     <tibble [16,708 × 16]>
10 ChldMvOut N     <tibble [16,716 × 16]>
# ℹ 40 more rows

Model Setup

We’re really just building on the second-order latent growth models we already ran. We already ran those and know that we have specified the models correctly / have enough variance to use the slopes and intercept to predict other things. So now we just need to add the new regression paths to that syntax to be able to run the models.

As with the model we specified before, by having the life event data long, we are able to do this just one time and have that work for all the life events. So using this trick, we can run 50 unique models with one set of model syntax!

Model Setup

Note that we have to modify the function that calls growth() because we have a categorical outcome that needs a different estimator.

mod2 <- '
le_value ~ i + age
le_value ~ s
'

mod2 <- paste(mod, mod2, collapse = "\n")

lavaan_fun <- function(d){
  m <- growth(
    mod2
    , data = d
    , ordered = "le_value"
    , estimator = 'WLSMV'
    , missing = "pairwise"
    , parallel = "multicore"
  )
  return(m)
}

gsoep_nested3 <- gsoep_nested3 %>%
  mutate(m = map(data, lavaan_fun))
# saveRDS(gsoep_nested3, "gsoep_nested3.RDS")

Extract Results

Remember how I said that the reason we created the terms tibble was because it would make extract_fun() more flexible / portable? Let’s see that in action. We’ll create a new one that lists only the key additional terms in this model and use that to extract those estimates.

gsoep_nested3 <- readRDS("gsoep_nested3.RDS")
terms <- tribble(
  ~path,     ~new,
  "le_value~i", "Intercept",
  "le_value~s", "Slope",
  "le_value~age", "Age"
)

gsoep_nested3 <- gsoep_nested3 %>%
  mutate(summary = map(m, extract_fun))

Formatting Results

Now we’re ready to format the results using the same steps as before. Note that I’m combining all them here because we’ve already stepped through them slowly:

  1. Remove other list columns and unnest()
  2. Format the results using format_fun()
  3. Remove unnecessary columns and pivot_wider()
  4. Change the order of the columns
  5. Factor the terms to help us arrange the rows
  6. Sort the rows by Life Event and then term

Formatting Results

  1. Remove other list columns and unnest()
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) 
gsoep_tab2
# A tibble: 150 × 7
   le       trait term           est ci.lower ci.upper   pvalue
   <chr>    <chr> <chr>        <dbl>    <dbl>    <dbl>    <dbl>
 1 ChldBrth C     Intercept  0.118     0.0524   0.185  0.000439
 2 ChldBrth C     Age       -0.0323   -0.0353  -0.0294 0       
 3 ChldBrth C     Slope      0.205    -0.311    0.722  0.436   
 4 ChldBrth E     Intercept  0.00541  -0.0422   0.0530 0.824   
 5 ChldBrth E     Age       -0.0323   -0.0353  -0.0294 0       
 6 ChldBrth E     Slope     -0.0455   -0.445    0.354  0.823   
 7 ChldBrth A     Intercept  0.0891   -0.0185   0.197  0.104   
 8 ChldBrth A     Age       -0.0323   -0.0353  -0.0294 0       
 9 ChldBrth A     Slope     -1.19     -3.39     1.00   0.287   
10 ChldBrth O     Intercept -0.0911   -0.140   -0.0425 0.000236
# ℹ 140 more rows

Formatting Results

  1. Format the results using format_fun()
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun()
gsoep_tab2
# A tibble: 150 × 9
   le       trait term      est             ci.lower ci.upper pvalue sig   CI   
   <chr>    <chr> <chr>     <chr>           <chr>    <chr>    <chr>  <chr> <chr>
 1 ChldBrth C     Intercept <strong>0.12</… 0.05     0.18     <stro… sig   <str…
 2 ChldBrth C     Age       <strong>-0.03<… -0.04    -0.03    <stro… sig   <str…
 3 ChldBrth C     Slope     0.21            -0.31    0.72     0.44   ns    [-0.…
 4 ChldBrth E     Intercept 0.005           -0.04    0.05     0.82   ns    [-0.…
 5 ChldBrth E     Age       <strong>-0.03<… -0.04    -0.03    <stro… sig   <str…
 6 ChldBrth E     Slope     -0.05           -0.44    0.35     0.82   ns    [-0.…
 7 ChldBrth A     Intercept 0.09            -0.02    0.20     0.10   ns    [-0.…
 8 ChldBrth A     Age       <strong>-0.03<… -0.04    -0.03    <stro… sig   <str…
 9 ChldBrth A     Slope     -1.19           -3.39    1.00     0.29   ns    [-3.…
10 ChldBrth O     Intercept <strong>-0.09<… -0.14    -0.04    <stro… sig   <str…
# ℹ 140 more rows

Formatting Results

  1. Remove unnecessary columns and pivot_wider()
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) 
gsoep_tab2
# A tibble: 30 × 12
   le        term    C_est E_est A_est O_est N_est C_CI  E_CI  A_CI  O_CI  N_CI 
   <chr>     <chr>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 ChldBrth  Interc… <str… 0.005 0.09  <str… -0.04 <str… [-0.… [-0.… <str… [-0.…
 2 ChldBrth  Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 3 ChldBrth  Slope   0.21  -0.05 -1.19 0.006 0.07  [-0.… [-0.… [-3.… [-0.… [-0.…
 4 ChldMvOut Interc… <str… 0.02  0.05  0.009 0.01  <str… [-0.… [-0.… [-0.… [-0.…
 5 ChldMvOut Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 6 ChldMvOut Slope   <str… <str… -1.62 <str… 0.09  <str… <str… [-4.… <str… [-0.…
 7 DadDied   Interc… 0.005 -0.02 -0.07 -0.04 0.04  [-0.… [-0.… [-0.… [-0.… [-0.…
 8 DadDied   Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 9 DadDied   Slope   -0.43 <str… -0.92 <str… 0.01  [-0.… <str… [-2.… <str… [-0.…
10 Divorce   Interc… <str… 0.05  -0.07 0.03  <str… <str… [-0.… [-0.… [-0.… <str…
# ℹ 20 more rows

Formatting Results

  1. Change the order of the columns
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) %>%
  select(le, term, all_of(ord))
gsoep_tab2
# A tibble: 30 × 12
   le        term    E_est E_CI  A_est A_CI  C_est C_CI  N_est N_CI  O_est O_CI 
   <chr>     <chr>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 ChldBrth  Interc… 0.005 [-0.… 0.09  [-0.… <str… <str… -0.04 [-0.… <str… <str…
 2 ChldBrth  Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 3 ChldBrth  Slope   -0.05 [-0.… -1.19 [-3.… 0.21  [-0.… 0.07  [-0.… 0.006 [-0.…
 4 ChldMvOut Interc… 0.02  [-0.… 0.05  [-0.… <str… <str… 0.01  [-0.… 0.009 [-0.…
 5 ChldMvOut Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 6 ChldMvOut Slope   <str… <str… -1.62 [-4.… <str… <str… 0.09  [-0.… <str… <str…
 7 DadDied   Interc… -0.02 [-0.… -0.07 [-0.… 0.005 [-0.… 0.04  [-0.… -0.04 [-0.…
 8 DadDied   Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 9 DadDied   Slope   <str… <str… -0.92 [-2.… -0.43 [-0.… 0.01  [-0.… <str… <str…
10 Divorce   Interc… 0.05  [-0.… -0.07 [-0.… <str… <str… <str… <str… 0.03  [-0.…
# ℹ 20 more rows

Formatting Results

  1. Factor the terms to help us arrange the rows
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) %>%
  select(le, term, all_of(ord)) %>%
  # select(starts_with("E"), starts_with("A"), starts_with("C"), starts_with("N"), starts_with("O"))
  mutate(term = factor(term, terms$new)) 
gsoep_tab2
# A tibble: 30 × 12
   le        term    E_est E_CI  A_est A_CI  C_est C_CI  N_est N_CI  O_est O_CI 
   <chr>     <fct>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 ChldBrth  Interc… 0.005 [-0.… 0.09  [-0.… <str… <str… -0.04 [-0.… <str… <str…
 2 ChldBrth  Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 3 ChldBrth  Slope   -0.05 [-0.… -1.19 [-3.… 0.21  [-0.… 0.07  [-0.… 0.006 [-0.…
 4 ChldMvOut Interc… 0.02  [-0.… 0.05  [-0.… <str… <str… 0.01  [-0.… 0.009 [-0.…
 5 ChldMvOut Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 6 ChldMvOut Slope   <str… <str… -1.62 [-4.… <str… <str… 0.09  [-0.… <str… <str…
 7 DadDied   Interc… -0.02 [-0.… -0.07 [-0.… 0.005 [-0.… 0.04  [-0.… -0.04 [-0.…
 8 DadDied   Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 9 DadDied   Slope   <str… <str… -0.92 [-2.… -0.43 [-0.… 0.01  [-0.… <str… <str…
10 Divorce   Interc… 0.05  [-0.… -0.07 [-0.… <str… <str… <str… <str… 0.03  [-0.…
# ℹ 20 more rows

Formatting Results

  1. Sort the rows by Life Event and then term
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) %>%
  select(le, term, all_of(ord)) %>%
  # select(starts_with("E"), starts_with("A"), starts_with("C"), starts_with("N"), starts_with("O"))
  mutate(term = factor(term, terms$new)) %>%
  arrange(le, term)
gsoep_tab2
# A tibble: 30 × 12
   le        term    E_est E_CI  A_est A_CI  C_est C_CI  N_est N_CI  O_est O_CI 
   <chr>     <fct>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 ChldBrth  Interc… 0.005 [-0.… 0.09  [-0.… <str… <str… -0.04 [-0.… <str… <str…
 2 ChldBrth  Slope   -0.05 [-0.… -1.19 [-3.… 0.21  [-0.… 0.07  [-0.… 0.006 [-0.…
 3 ChldBrth  Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 4 ChldMvOut Interc… 0.02  [-0.… 0.05  [-0.… <str… <str… 0.01  [-0.… 0.009 [-0.…
 5 ChldMvOut Slope   <str… <str… -1.62 [-4.… <str… <str… 0.09  [-0.… <str… <str…
 6 ChldMvOut Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
 7 DadDied   Interc… -0.02 [-0.… -0.07 [-0.… 0.005 [-0.… 0.04  [-0.… -0.04 [-0.…
 8 DadDied   Slope   <str… <str… -0.92 [-2.… -0.43 [-0.… 0.01  [-0.… <str… <str…
 9 DadDied   Age     <str… <str… <str… <str… <str… <str… <str… <str… <str… <str…
10 Divorce   Interc… 0.05  [-0.… -0.07 [-0.… <str… <str… <str… <str… 0.03  [-0.…
# ℹ 20 more rows

Kabling the Table

Now we’re ready to kable the table! This looks identical to before, but I’m going to show you one more trick using kableExtra::group_rows. Rather than having to count the rows and adding a bunch of manual calls, I always just use the data frame to count for me to avoid errors. Then, I can create the table and use a for loop to tack on all the grouped rows using that reference data frame. Much more flexible and less error prone!

hdr <- c(1, rep(2,5))
names(hdr) <- c(" ", "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism", "Openness")
rs <- gsoep_tab2 %>% group_by(le) %>% tally() %>% 
    mutate(end = cumsum(n), start = lag(end) + 1, start = ifelse(is.na(start), 1, start))
gsoep_tab2_kable <- gsoep_tab2 %>%
  select(-le) %>%
  kable(.
        , escape = F
        , align = c("r", rep("c", 10))
        , col.names = c("Term", rep(c("<em>est.</em>", "CI"), times = 5))
        , caption = "<strong>Table 3</strong><br><em>Big Five Personality Trait Trajectory Associations with Life Events from Latent Growth Models</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(hdr) %>%
  add_footnote(label = "Bold values indicate terms p < .05. ", notation = "none")

for(i in 1:nrow(rs)){
    gsoep_tab2_kable <- gsoep_tab2_kable %>% kableExtra::group_rows(rs$le[i], rs$start[i], rs$end[i])
}

Kabling the Table

Table 3
Big Five Personality Trait Trajectory Associations with Life Events from Latent Growth Models
Extraversion
Agreeableness
Conscientiousness
Neuroticism
Openness
Term est. CI est. CI est. CI est. CI est. CI
ChldBrth
Intercept 0.005 [-0.04,0.05] 0.09 [-0.02,0.20] 0.12 [0.05,0.18] -0.04 [-0.09,0.02] -0.09 [-0.14,-0.04]
Slope -0.05 [-0.44,0.35] -1.19 [-3.39,1.00] 0.21 [-0.31,0.72] 0.07 [-0.38,0.53] 0.006 [-0.41,0.42]
Age -0.03 [-0.04,-0.03] -0.03 [-0.04,-0.03] -0.03 [-0.04,-0.03] -0.03 [-0.04,-0.03] -0.03 [-0.04,-0.03]
ChldMvOut
Intercept 0.02 [-0.02,0.06] 0.05 [-0.06,0.16] 0.15 [0.09,0.21] 0.01 [-0.03,0.06] 0.009 [-0.04,0.05]
Slope -0.51 [-0.87,-0.15] -1.62 [-4.11,0.86] -0.87 [-1.37,-0.37] 0.09 [-0.26,0.44] -0.65 [-1.07,-0.23]
Age 0.005 [0.003,0.008] 0.005 [0.003,0.008] 0.005 [0.003,0.008] 0.005 [0.003,0.008] 0.005 [0.003,0.008]
DadDied
Intercept -0.02 [-0.07,0.03] -0.07 [-0.16,0.03] 0.005 [-0.06,0.07] 0.04 [-0.02,0.10] -0.04 [-0.09,0.02]
Slope -0.57 [-1.00,-0.15] -0.92 [-2.73,0.88] -0.43 [-0.90,0.04] 0.01 [-0.40,0.43] -0.66 [-1.13,-0.20]
Age -0.004 [-0.007,-0.001] -0.004 [-0.007,-0.001] -0.004 [-0.007,-0.001] -0.004 [-0.007,-0.001] -0.004 [-0.007,-0.001]
Divorce
Intercept 0.05 [-0.01,0.12] -0.07 [-0.20,0.05] 0.12 [0.03,0.21] 0.10 [0.02,0.19] 0.03 [-0.03,0.10]
Slope -0.35 [-0.84,0.15] 1.01 [-1.14,3.17] 0.35 [-0.35,1.04] -0.67 [-1.38,0.04] -0.15 [-0.66,0.36]
Age -0.007 [-0.01,-0.003] -0.007 [-0.01,-0.003] -0.007 [-0.01,-0.003] -0.007 [-0.01,-0.003] -0.007 [-0.01,-0.003]
Married
Intercept 0.05 [0.000,0.09] 0.03 [-0.08,0.14] 0.10 [0.04,0.17] -0.002 [-0.06,0.06] 0.04 [-0.01,0.08]
Slope -0.06 [-0.46,0.34] -1.26 [-3.47,0.95] 0.21 [-0.30,0.72] 0.05 [-0.39,0.49] -0.003 [-0.40,0.40]
Age -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03]
MomDied
Intercept -0.02 [-0.07,0.02] 0.05 [-0.05,0.16] 0.08 [0.01,0.15] 0.03 [-0.02,0.09] 0.05 [-0.005,0.09]
Slope -0.35 [-0.73,0.03] -1.07 [-3.04,0.89] -0.82 [-1.37,-0.26] -0.06 [-0.47,0.35] -0.02 [-0.38,0.34]
Age 0.007 [0.005,0.010] 0.007 [0.005,0.010] 0.007 [0.005,0.010] 0.007 [0.005,0.010] 0.007 [0.005,0.010]
MoveIn
Intercept 0.07 [0.02,0.12] 0.02 [-0.10,0.13] 0.06 [-0.01,0.13] 0.08 [0.02,0.15] 0.05 [-0.003,0.09]
Slope -0.61 [-1.12,-0.09] -1.29 [-3.65,1.07] 0.33 [-0.22,0.88] -0.28 [-0.79,0.23] -0.22 [-0.67,0.22]
Age -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03] -0.03 [-0.03,-0.03]
NewPart
Intercept 0.17 [0.10,0.24] -0.09 [-0.22,0.03] -0.13 [-0.22,-0.04] -0.03 [-0.11,0.05] 0.13 [0.06,0.20]
Slope 0.29 [-0.22,0.79] 1.21 [-0.23,2.65] 0.71 [0.24,1.18] -0.14 [-0.71,0.44] 0.10 [-0.47,0.67]
Age -0.04 [-0.04,-0.03] -0.04 [-0.04,-0.03] -0.04 [-0.04,-0.03] -0.04 [-0.04,-0.03] -0.04 [-0.04,-0.03]
PartDied
Intercept 0.04 [-0.02,0.10] 0.10 [-0.11,0.30] 0.02 [-0.07,0.10] 0.10 [0.02,0.18] -0.09 [-0.15,-0.02]
Slope 0.48 [-0.04,0.99] 3.32 [-1.64,8.27] -0.29 [-0.98,0.40] -0.23 [-0.79,0.33] -0.11 [-0.61,0.38]
Age 0.03 [0.03,0.03] 0.03 [0.03,0.03] 0.03 [0.03,0.03] 0.03 [0.03,0.03] 0.03 [0.03,0.03]
SepPart
Intercept 0.09 [0.04,0.14] -0.11 [-0.24,0.03] 0.04 [-0.02,0.11] 0.09 [0.02,0.16] 0.07 [0.02,0.12]
Slope -0.14 [-0.52,0.24] 1.92 [-1.05,4.90] 0.64 [0.07,1.21] -0.77 [-1.38,-0.15] 0.010 [-0.38,0.40]
Age -0.02 [-0.02,-0.02] -0.02 [-0.02,-0.02] -0.02 [-0.02,-0.02] -0.02 [-0.02,-0.02] -0.02 [-0.02,-0.02]
Bold values indicate terms p < .05.

Plots

The same functional, iterative approach also applies to creating plots! I teach a whole course on data visualization, so I’m not going to spend a ton of time on this. But I’ll show you an example for basic R models and for lavaan.

Base R GLM

Set up the data frame

First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().

outcomes <- tribble(
  ~old, ~new,
  "chldbrth", "Child Birth"
  , "divorced", "Divorced"
  , "married", "Married"
  , "mvInPrtner", "Move in with Partner"
)

gsoep_nested4 <- gsoep %>%
  mutate(age = age - 45) %>%
  group_by(Trait, Outcome) %>%
  nest() %>%
  ungroup()
# A tibble: 20 × 3
   Outcome   Trait  data                  
   <chr>     <chr>  <list>                
 1 chldbrth  OP     <tibble [9,192 × 19]> 
 2 divorced  OP     <tibble [10,140 × 19]>
 3 married   OP     <tibble [9,398 × 19]> 
 4 mvInPrtnr OP     <tibble [9,262 × 19]> 
 5 chldbrth  DEP    <tibble [28,634 × 19]>
 6 chldbrth  NegAff <tibble [16,992 × 19]>
 7 chldbrth  PA     <tibble [16,969 × 19]>
 8 chldbrth  SE     <tibble [7,934 × 19]> 
 9 divorced  DEP    <tibble [31,416 × 19]>
10 divorced  NegAff <tibble [19,053 × 19]>
11 divorced  PA     <tibble [19,027 × 19]>
12 divorced  SE     <tibble [8,930 × 19]> 
13 married   DEP    <tibble [29,203 × 19]>
14 married   NegAff <tibble [17,316 × 19]>
15 married   PA     <tibble [17,295 × 19]>
16 married   SE     <tibble [8,061 × 19]> 
17 mvInPrtnr DEP    <tibble [28,891 × 19]>
18 mvInPrtnr NegAff <tibble [17,016 × 19]>
19 mvInPrtnr PA     <tibble [16,990 × 19]>
20 mvInPrtnr SE     <tibble [7,932 × 19]> 

Run Models

Now, we’ll run the models predicting outcomes from personality x age interactions (and their lower order terms)

mod2_fun <- function(d){
  d$o_value <- factor(d$o_value)
  glm(o_value ~ p_value*age, data = d, family = binomial(link = "logit"))
}

gsoep_nested4 <- gsoep_nested4 %>%
  mutate(m = map(data, mod2_fun))
gsoep_nested4
# A tibble: 20 × 4
   Outcome   Trait  data                   m     
   <chr>     <chr>  <list>                 <list>
 1 chldbrth  OP     <tibble [9,192 × 19]>  <glm> 
 2 divorced  OP     <tibble [10,140 × 19]> <glm> 
 3 married   OP     <tibble [9,398 × 19]>  <glm> 
 4 mvInPrtnr OP     <tibble [9,262 × 19]>  <glm> 
 5 chldbrth  DEP    <tibble [28,634 × 19]> <glm> 
 6 chldbrth  NegAff <tibble [16,992 × 19]> <glm> 
 7 chldbrth  PA     <tibble [16,969 × 19]> <glm> 
 8 chldbrth  SE     <tibble [7,934 × 19]>  <glm> 
 9 divorced  DEP    <tibble [31,416 × 19]> <glm> 
10 divorced  NegAff <tibble [19,053 × 19]> <glm> 
11 divorced  PA     <tibble [19,027 × 19]> <glm> 
12 divorced  SE     <tibble [8,930 × 19]>  <glm> 
13 married   DEP    <tibble [29,203 × 19]> <glm> 
14 married   NegAff <tibble [17,316 × 19]> <glm> 
15 married   PA     <tibble [17,295 × 19]> <glm> 
16 married   SE     <tibble [8,061 × 19]>  <glm> 
17 mvInPrtnr DEP    <tibble [28,891 × 19]> <glm> 
18 mvInPrtnr NegAff <tibble [17,016 × 19]> <glm> 
19 mvInPrtnr PA     <tibble [16,990 × 19]> <glm> 
20 mvInPrtnr SE     <tibble [7,932 × 19]>  <glm> 

Generating Model Predictions

The next step is to get predictions from the model. It is also good practice to always get standard errors and/or confidence intervals of the estimates. Thankfully, with lm() and glm(), this is easy. Lavaan will be covered briefly and more complex model forms are covered in my data visualization class.

glm_pred_fun <- function(m){
  rng_p <- range(m$model$p_value, na.rm = T)
  frame <- crossing(
    age = c(-15, 0, 15)
    , p_value = seq(rng_p[1], rng_p[2], length.out = 30)
  )
  
  pred <- predict(m, newdata = frame, se.fit = T)[1:2]
  frame <- frame %>% 
    mutate(fit = pred$fit
           , se = pred$se.fit
           , lower = fit - 1.96*se
           , upper = fit + 1.96*se
           ) %>%
    mutate_at(vars(fit, lower, upper), exp)
}

Generating Model Predictions

Now we run the predictions

gsoep_nested4 <- gsoep_nested4 %>%
  mutate(pred = map(m, glm_pred_fun))
gsoep_nested4
# A tibble: 20 × 5
   Outcome   Trait  data                   m      pred             
   <chr>     <chr>  <list>                 <list> <list>           
 1 chldbrth  OP     <tibble [9,192 × 19]>  <glm>  <tibble [90 × 6]>
 2 divorced  OP     <tibble [10,140 × 19]> <glm>  <tibble [90 × 6]>
 3 married   OP     <tibble [9,398 × 19]>  <glm>  <tibble [90 × 6]>
 4 mvInPrtnr OP     <tibble [9,262 × 19]>  <glm>  <tibble [90 × 6]>
 5 chldbrth  DEP    <tibble [28,634 × 19]> <glm>  <tibble [90 × 6]>
 6 chldbrth  NegAff <tibble [16,992 × 19]> <glm>  <tibble [90 × 6]>
 7 chldbrth  PA     <tibble [16,969 × 19]> <glm>  <tibble [90 × 6]>
 8 chldbrth  SE     <tibble [7,934 × 19]>  <glm>  <tibble [90 × 6]>
 9 divorced  DEP    <tibble [31,416 × 19]> <glm>  <tibble [90 × 6]>
10 divorced  NegAff <tibble [19,053 × 19]> <glm>  <tibble [90 × 6]>
11 divorced  PA     <tibble [19,027 × 19]> <glm>  <tibble [90 × 6]>
12 divorced  SE     <tibble [8,930 × 19]>  <glm>  <tibble [90 × 6]>
13 married   DEP    <tibble [29,203 × 19]> <glm>  <tibble [90 × 6]>
14 married   NegAff <tibble [17,316 × 19]> <glm>  <tibble [90 × 6]>
15 married   PA     <tibble [17,295 × 19]> <glm>  <tibble [90 × 6]>
16 married   SE     <tibble [8,061 × 19]>  <glm>  <tibble [90 × 6]>
17 mvInPrtnr DEP    <tibble [28,891 × 19]> <glm>  <tibble [90 × 6]>
18 mvInPrtnr NegAff <tibble [17,016 × 19]> <glm>  <tibble [90 × 6]>
19 mvInPrtnr PA     <tibble [16,990 × 19]> <glm>  <tibble [90 × 6]>
20 mvInPrtnr SE     <tibble [7,932 × 19]>  <glm>  <tibble [90 × 6]>

Plot the Predictions

One of the greatest strengths of list-columns and using purrr is what I think of as the data accordion. With list-columns, I can unnest() whatever level the models were estimated at and then reaggregate using whatever combinations / groups make the most sense.

So here, for example, we ran models for all Big Five trait (5) x life outcome (10) combinations. But having 50 separate plots would be a little silly, so we’d much rather create one for each outcome. But I’m not going to write that code 10 times because that’d be a waste. So instead if I reaggregate the data, I can then use a function + purrr to generate each of the plots.

Plot the Predictions

gsoep_plot4 <- gsoep_nested4 %>%
  select(-data, -m) %>%
  unnest(pred) %>%
  group_by(Outcome) %>%
  nest() %>%
  ungroup()
gsoep_plot4
# A tibble: 4 × 2
  Outcome   data              
  <chr>     <list>            
1 chldbrth  <tibble [450 × 7]>
2 divorced  <tibble [450 × 7]>
3 married   <tibble [450 × 7]>
4 mvInPrtnr <tibble [450 × 7]>

Plot the Predictions

When we write code for plots in ggplot, we have a lot of things that we end up writing over and over, especially for theme elements. To get around this, I use this little function to modify all of my theme elements in a single line.

my_theme <- function(){
  theme_classic() + 
  theme(
    legend.position = "bottom"
    , legend.title = element_text(face = "bold", size = rel(1))
    , legend.text = element_text(face = "italic", size = rel(1))
    , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
    , axis.title = element_text(face = "bold", size = rel(1.2))
    , panel.grid.major = element_line(color = "grey90", linewidth = .2)
    , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
    , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
    , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
    , strip.background = element_rect(fill = "black")
    )
}

Plot the Predictions

Like I mentioned, once we reaggregate the data, we can then write a function to generate the plot for each outcome separately. Let’s start by building the basic code for the plot. We need it to:

  1. Have x = personality, y = predicted values, separate lines for moderator levels, and separate panels for different traits
  2. A line indicating personality-outcome associations (again separately across age groups) and a ribbon with the confidence interval around the prediction
plot_fun <- function(d){
  d %>%
    mutate(age_fac = factor(age, c(-15, 0, 15), c("30", "45", "60"))) %>%
    ggplot(aes(x = p_value, y = fit, color = age_fac)) +
    geom_ribbon(
      aes(ymin = lower, ymax = upper, fill = age_fac)
      , alpha = .4
    ) + 
    geom_line() + 
    facet_wrap(~Trait, scales = "free_x") + 
    my_theme()
}

Plot the Predictions

With our reaggregated data, we can easily just run the plots and view the results. This is getting there, but let’s make some additional modifications to get them publication ready.

gsoep_plot4 <- gsoep_plot4 %>%
  mutate(p = map(data, plot_fun))
gsoep_plot4
# A tibble: 4 × 3
  Outcome   data               p     
  <chr>     <list>             <list>
1 chldbrth  <tibble [450 × 7]> <gg>  
2 divorced  <tibble [450 × 7]> <gg>  
3 married   <tibble [450 × 7]> <gg>  
4 mvInPrtnr <tibble [450 × 7]> <gg>  
gsoep_plot4$p[[1]]

Plot the Predictions

Let’s add:

  1. full personality trait names
  2. the outcome in the title
  3. better axis and other labels
plot_fun <- function(d, outcome){
  out <- mapvalues(outcome, outcomes$old, outcomes$new, warn_missing = F)
  d %>%
    mutate(age_fac = factor(age, c(-15, 0, 15), c("30", "45", "60"))
           , Trait = factor(Trait, p_names$old, p_names$new)) %>%
    ggplot(aes(x = p_value, y = fit, color = age_fac)) +
    geom_ribbon(
      aes(ymin = lower, ymax = upper, fill = age_fac)
      , alpha = .4
    ) + 
    geom_line() + 
    labs(
      x = "Personality Trait Level"
      , y = "Predicted Value (CI)"
      , color = "Age"
      , fill = "Age"
      , title = sprintf("%s: Personality x Age Interaction", out)
    ) + 
    facet_wrap(~Trait, scales = "free_x") + 
    my_theme() + 
    theme(
      legend.position = c(.8, .25)
      
    )
  # ggsave(filename = sprintf("results/figures/%s.png", outcome), width = 6, height = 6)
  # ggsave(filename = sprintf("results/figures/%s.pdf", outcome), width = 6, height = 6)
}

Plot the Predictions

Now, we can run these and get the results.

gsoep_plot4 <- gsoep_plot4 %>%
  mutate(p = map2(data, Outcome, plot_fun))
gsoep_plot4
# A tibble: 4 × 3
  Outcome   data               p     
  <chr>     <list>             <list>
1 chldbrth  <tibble [450 × 7]> <gg>  
2 divorced  <tibble [450 × 7]> <gg>  
3 married   <tibble [450 × 7]> <gg>  
4 mvInPrtnr <tibble [450 × 7]> <gg>  
gsoep_plot4$p[[1]]

Lavaan Trajectories

Lavaan is trickier. Because of the way residual variance, etc. is calculted in SEM, it’s much less straightforward to estimate standard errors and confidence intervals of predictions (not to mention that lavaan just doesn’t make predictions easy). So instead, I just use some basic matrix algebra and bootstrapping to get these. If you aren’t comfortable with basic matrix algebra, you can just write a little function that uses terms individually.

lavaan_pred_fun <- function(m){
  coef <- coef(gsoep_nested3$m[[1]])[c("i~1", "s~1")]
  b <- bootstrapLavaan(m, parallel = "multicore", R = 100)
  frame <- tibble(intercept = 1, wave = seq(-1,1, length.out = 50))
  pred_boot <- function(x,y) bind_cols(frame, pred = as.vector(as.matrix(frame) %*% c(x,y)))
  frame <- frame %>%
    mutate(pred = as.vector(as.matrix(frame) %*% coef)) %>%
    left_join(
      tibble(sample = 1:100, pred = map2(b[,"i~1"], b[,"s~1"], pred_boot)) %>%
        unnest(pred) %>%
        group_by(wave) %>%
        summarize(lower = quantile(pred, probs = .025)
                  , upper = quantile(pred, probs = .975))
    )
}

Lavaan Trajectories

gsoep_nested2 <- gsoep_nested2 %>%
  mutate(pred = map(m, lavaan_pred_fun))
gsoep_nested2
# saveRDS(gsoep_nested2, file = "gsoep_nested2.RDS")
# A tibble: 5 × 7
  category trait data                   m        summary pred     all_term_tab  
  <chr>    <chr> <list>                 <list>   <list>  <list>   <list>        
1 Big5     C     <tibble [16,716 × 13]> <lavaan> <df>    <tibble> <kablExtr [1]>
2 Big5     E     <tibble [16,714 × 13]> <lavaan> <df>    <tibble> <kablExtr [1]>
3 Big5     A     <tibble [16,718 × 13]> <lavaan> <df>    <tibble> <kablExtr [1]>
4 Big5     O     <tibble [16,708 × 13]> <lavaan> <df>    <tibble> <kablExtr [1]>
5 Big5     N     <tibble [16,716 × 13]> <lavaan> <df>    <tibble> <kablExtr [1]>

Lavaan Trajectories

Once we’ve estimated the predictions, we can graph them just like we did for the previous example. (Again, a full discussion of plotting is beyond the scope of this course and is covered for a whole quarter in my data visualization class instead.)

gsoep_nested2 %>%
  select(-data, -m, -summary) %>%
  unnest(pred) %>%
  ggplot(aes(x = wave + 1, y = pred)) + 
    geom_ribbon(
      aes(ymin = lower, ymax = upper, fill = trait)
      , alpha = .4
    ) + 
    geom_line() + 
    scale_y_continuous(limits = c(1,7), breaks = seq(1,7,1)) + 
    scale_x_continuous(limits = c(-.25,2.25), breaks = 0:2, labels = c(2005, 2009, 2013)) + 
    labs(
        x = "Personality Trait Level"
        , y = "Predicted Value (CI)"
        , title = sprintf("Personality Trajectories")
      ) + 
    facet_wrap(~trait, scales = "free_x") + 
    my_theme() + 
    theme(legend.position = "none")

Lavaan Trajectories