Author

Emorie D Beck

Week 8 - Functional Tables and Figures

Loading required package: Matrix
This is lavaan 0.6-15
lavaan is FREE software! Please report any bugs.

Attaching package: 'lavaan'
The following object is masked from 'package:psych':

    cor2cov

Attaching package: 'lubridate'
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2     ✔ purrr   1.0.2
✔ tibble  3.2.1     ✔ dplyr   1.1.3
✔ tidyr   1.2.1     ✔ stringr 1.5.0
✔ readr   2.1.2     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ ggplot2::%+%()           masks psych::%+%()
✖ ggplot2::alpha()         masks psych::alpha()
✖ dplyr::arrange()         masks plyr::arrange()
✖ lubridate::as.difftime() masks base::as.difftime()
✖ purrr::compact()         masks plyr::compact()
✖ dplyr::count()           masks plyr::count()
✖ lubridate::date()        masks base::date()
✖ dplyr::desc()            masks plyr::desc()
✖ tidyr::expand()          masks Matrix::expand()
✖ dplyr::failwith()        masks plyr::failwith()
✖ dplyr::filter()          masks stats::filter()
✖ dplyr::group_rows()      masks kableExtra::group_rows()
✖ dplyr::id()              masks plyr::id()
✖ lubridate::intersect()   masks base::intersect()
✖ dplyr::lag()             masks stats::lag()
✖ dplyr::mutate()          masks plyr::mutate()
✖ tidyr::pack()            masks Matrix::pack()
✖ dplyr::rename()          masks plyr::rename()
✖ lubridate::setdiff()     masks base::setdiff()
✖ dplyr::summarise()       masks plyr::summarise()
✖ dplyr::summarize()       masks plyr::summarize()
✖ lubridate::union()       masks base::union()
✖ tidyr::unpack()          masks Matrix::unpack()

Outline

  1. Questions on Homework
  2. Tables
  3. Figures
  4. (GitHub)

APA Tables

In psychology, we must work within the confines of APA style. Although these guidelines have been updated, the style guide remains quite similar to earlier guidelines with respect to tables.

But psychology research is heterogeneous and expectations for modern tables require combining multiple models in creative ways.

Small tweaks to data or model arguments can spell disaster for creating a table. It’s easy to make mistakes in copying values or matching different models to their respective rows and columns.

Thankfully, the increasing popularity of R has been coupled with more methods for creating a reproducible workflow that includes tables.

Tables in R

In this workshop, we will directly cover 3 different use cases, while a few others will be included in supplementary materials.

Personally, I favor non-automated tools, so we will cover the following packages:
- kable + kableExtra (html and LaTeX)
- papaja

Using these packages will build on earlier tutorials using tidyr, dplyr, workflow, and purrr and round out our discuss on data presentation using ggplot2.

For less flexible but more accessible tables see:
- apaTable
- sjPlot
- corrplot

Important Tools

Although it doesn’t cover all models, the broom and broom.mixed family of packages will provide easy to work with estimates of nearly all types of models and will also provide the model terms that are ideal for most APA tables, including estimates, standard errors, and confidence intervals.

lavaan models are slightly more complicated, but it’s actually relatively easy to deal with them (and how to extract their terms), assuming that you understand the models you are running.

Data

The data we’re going to use are from the teaching sample from the German Socioeconomic Panel Study. These data have been pre-cleaned (see earlier workshop on workflow and creating guidelines for tips).

The data we’ll use fall into three categories:
1. Personality trait composites: Negative Affect, Positive Affect, Self-Esteem, CESD Depression, and Optimism. These were cleaned, reversed coded, and composited prior to being included in this final data set.
2. Outcomes: Moving in with a partner, marriage, divorce, and child birth. These were cleaned, coded as 1 (occurred) or 0 (did not occur) according to whether an outcome occurred for each individual or not after each possible measured personality year. Moreover, people who experienced these outcomes prior to the target personality year are excluded.
3. Covariates: Age, gender (0 = male, 1 = female, education (0 = high school or below, 1 = college, 2 = higher than college), gross wages, self-rated health, smoking (0 = never smoked 1 = ever smoked), exercise, BMI, religion, parental education, and parental occupational prestige (ISEI). Each of these were composited for all available data up to the measured personality years.

Code
(gsoep <- read_csv("week8-data.csv"))

Basic: One DV/Outcome, Multiple Model Terms

We’ll start with a basic case, predicting who has a child from personality, both with and without control variables.

Becauce outcome variables are binary, we’ll use logistic regression.

The basic form of the model is: \(\log\Big(\frac{p_i}{1-p_i}\Big) = b_0 + b_1X_1 + b_2X_2 ... b_pXp\)

In other words, we’re predicting the log odds of having a child from a linear combination of predictor variables.

Set up the data frame

First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().

Code
gsoep_nested1 <- gsoep %>%
  filter(Outcome == "chldbrth") %>%
  group_by(Trait, Outcome) %>%
  nest() %>%
  ungroup()

Next, let’s pause and see what we have. We now have a data frame with 3 columns (Outcome, Trait, and data) and 4 rows. The data column is of class list, meaning it’s a “list column” that contains a tibble in each cell. This means that we can use purrr functions to run operations on each of these data frames individually but without having to copy and paste the same operation multiple times for each model we want to run.

Code
gsoep_nested1

Run Models

To run the models, I like to write short functions that are easier to read than including a local function within a call to purrr::map(). Here, we’re just going to write a simple function to predict child birth from personality.

Code
mod1_fun <- function(d){
  d$o_value <- factor(d$o_value)
  glm(o_value ~ p_value, data = d, family = binomial(link = "logit"))
}

gsoep_nested1 <- gsoep_nested1 %>%
  mutate(m = map(data, mod1_fun))

Now, when we look at the nested frame, we see an additional column, which is also a list, but this column contains <glm> objects rather than tibbles.

Code
gsoep_nested1

Get Key Terms

Now that we have the models, we want to get our key terms. I’m a big fan of using the function tidy from the broom package to do this. Bonus because it plays nicely with purrr. Double bonus because it will give us confidence intervals, which I generally prefer over p-values and standard erorrs because I find them more informative.

Code
gsoep_nested1 <- gsoep_nested1 %>%
  mutate(tidy = map(m, ~tidy(., conf.int = T)))
gsoep_nested1

Note that what I’ve used here is a local function, meaning that I’ve used the notation ~function(., arguments). The tilda tells R we want a local function, and the . tells R to use the mapped m column as the function input.

Now we have a fifth column, which is a list column called tidy that contains a tibble, just like the data column.

Creating a Table

Now we are finally ready to create a table! I’m going to use kable + kableExtra to do this in steps.

First, we’ll unnest the tidy column from our data frame. Before doing so, we will drop the data and m columns because they’ve done their work for now.

Code
tidy1 <- gsoep_nested1 %>%
  select(-data, -m) %>%
  unnest(tidy)
tidy1

As you can see, we now have lots of information about our model terms, which are already nicely indexed by Outcome and Trait combinations.

But before we’re ready to create a table, we have to make a few considerations:

  • What is our target term? In this case “p_value” which is the change in log odds associated with a 1 unit increase/decrease in p_value.
  • How will we denote significance? In this case, we’ll use confidence intervals whose signs match. We’ll then bold these terms for our table.
  • What is the desired final structure for the table? I’d like columns for Trait, estimate (b), and confidence intervals (CI) formatted to two decimal places and bolded if significant. I’d also like a span header denoting that the outcome measure is child birth.

What is our target term?

In this case “p_value” which is the change in log odds associated with a 1 unit increase/decrease in p_value.

Code
tidy1 <- tidy1 %>% filter(term == "p_value")
tidy1

How will we denote significance?

In this case, we’ll use confidence intervals whose signs match. We’ll then bold these terms for our table.

Code
tidy1 <- tidy1 %>% mutate(sig = ifelse(sign(conf.low) == sign(conf.high), "sig", "ns"))
tidy1

What is the desired final structure for the table?

I’d like columns for Trait, estimate (b), and confidence intervals (CI) formatted to two decimal places and bolded if significant. I’d also like a span header denoting that the outcome measure is child birth.

Before we do this, though, we need to convert our log odds to odds ratios, using the exp() function.

Code
tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp) 
tidy1

Now, we can format them.

Code
tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, conf.low, conf.high), ~sprintf("%.2f", .)) 
tidy1

sprintf() is my favorite base R formatting function. “%.2f” means I’m asking it to take a floating point number and include 2 digits after the “.” and 0 before. We can now see that the estimate, conf.low, and conf.high columns are of class <chr> instead of <dbl>.

But now we need to create our confidence intervals.

Code
tidy1 <- tidy1 %>%
  mutate(CI = sprintf("[%s, %s]", conf.low, conf.high))
tidy1

And bold the significant confidence intervals and estimates.

Code
tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, CI), ~ifelse(sig == "sig", sprintf("<strong>%s</strong>", .), .))
tidy1

This reads as “for both the estimate and the CI columns, if the sig column is equal to”sig”, then let’s format it as bold using html. Otherwise, let’s leave it alone.” And indeed, we can see that the final result formats 3/4 rows.

Thankfully, these can be achieved without considerable reshaping of the data, which is why we’ve started here, so we’re almost done. We just need to get rid of some unnecessary columnns.

Code
tidy1 <- tidy1 %>%
  select(Trait, OR = estimate, CI)

Because we just have one target term and one outcome, we don’t need those columns, so we’re just keeping Trait, OR, which I renamed as such within in the select call, and CI.

Kabling a Table

Now let’s kable. You’ve likely used the kable() function from the knitr before. It’s a very useful and simple function in most occasions.

Code
kable(tidy1)
Trait OR CI
OP <strong>1.68</strong> <strong>[1.50, 1.88]</strong>
DEP <strong>1.15</strong> <strong>[1.07, 1.24]</strong>
NegAff <strong>1.11</strong> <strong>[1.01, 1.22]</strong>
PA <strong>1.97</strong> <strong>[1.77, 2.20]</strong>
SE 1.09 [0.97, 1.22]

It will automatically generate the html code needed to create a table. But if we look closely at the code, it gives us some gobbledigook where we inputted html, so we need a way around that. I’m also going to throw in kable_styling(full_width = F) from the kableExtra package to help out here. It’s not doing much, but it will make the formatted table print in your Viewer.

Code
kable(tidy1, escape = F) %>%
  kable_classic(full_width = F, html_font = "Times")
Trait OR CI
OP 1.68 [1.50, 1.88]
DEP 1.15 [1.07, 1.24]
NegAff 1.11 [1.01, 1.22]
PA 1.97 [1.77, 2.20]
SE 1.09 [0.97, 1.22]

Much better. But this still doesn’t look like an APA table, so let’s keep going.

  1. APA tables usually write out long names for our predictors, so let’s change those first. I’m going to create a reference tibble and use mapvalues() from the plyr function for this.
Code
p_names <- tibble(
  old = c("NegAff", "PA", "SE", "OP", "DEP"),
  new = c("Negative Affect", "Positive Affect", "Self-Esteem", "Optimism", "Depression")
)

tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(., escape = F) %>%
  kable_classic(full_width = F, html_font = "Times")
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]
  1. The alignment of the columns isn’t quite right. Let’s fix that. We’ll change the trait to right justified and b and CI to centered.
Code
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        ) %>%
  kable_classic(full_width = F, html_font = "Times")
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]
  1. But we’re still missing our span header. There’s a great function in the kableExtra package for this add_header_above. This function takes a named vector as argument, where the elements of the vector refer to the number of columns the named element should span.
Code
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(c(" " = 1, "Birth of a Child" = 2))
Birth of a Child
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]

Note that what the " " = 1 does is skip the Trait column. This is very useful because it let’s us not have a span header over every column.

  1. APA style requires we note how we denote significance and have a title, so let’s add a title and a note.
Code
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        , caption = "<strong>Table 1</strong><br><em>Estimated Personality-Outcome Associations</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times")%>%
  add_header_above(c(" " = 1, "Birth of a Child" = 2)) %>%
  add_footnote(label = "Bold values indicate terms whose confidence intervals did not overlap with 0", notation = "none")
Table 1
Estimated Personality-Outcome Associations
Birth of a Child
Trait OR CI
Negative Affect 1.11 [1.01, 1.22]
Positive Affect 1.97 [1.77, 2.20]
Self-Esteem 1.09 [0.97, 1.22]
Optimism 1.68 [1.50, 1.88]
Depression 1.15 [1.07, 1.24]
Bold values indicate terms whose confidence intervals did not overlap with 0

We did it!

A Quick Note: HTML v. LaTeX

When creating tables, I prefer using HTML when I need the resulting tables to be in HTML and LaTeX when I can place the tables in a PDF. The syntax using kable and kableExtra is the same with the following exceptions:

  1. The format argument in kable() would need to be set as format = "latex".
  2. The chunk option for a table to render would need to be set as {r, results = 'asis'}.
  3. Bolding would need to be done as \\textbf{}, rather than the html <strong></strong> tag.
  4. When using collapse_rows(), which we’ll get to later, you’d want to set the latex_hline argument to latex_hline = "none".

Extracting terms from lavaan

Most models in R can easily have terms extracted via broom or broom.mixed, but one major exception to this are SEM models in lavaan. Let’s step through an example using lavaan to show you some of my tricks for working with it.

lavaan Example 1:

Data

For this example, we’ll use a different data set from the SOEP that have the item level personality data. The code below reads it in, recodes, and reverse scores it.

Code
rev_code <- c("Big5__A_coarse", "Big5__C_lazy", "Big5__E_reserved", "Big5__N_dealStress")
(gsoep2 <- read_csv("week8-data-2.csv") %>%
   mutate_at(vars(matches("Big5")), function(x) {x[x < 0] <- NA; x}) %>%
   mutate_at(vars(matches("LifeEvent")), ~mapvalues(., seq(-7,1), c(rep(NA, 5), 0, NA, NA, 1), warn_missing = F)) %>%
    mutate_at(
    vars(all_of(rev_code))
    , ~as.numeric(reverse.code(., keys = -1, mini = 1, maxi = 7))
    ))

Now we’ll change it to long format for easier cleaning and reshaping

Code
gsoep2_long <- gsoep2 %>%
  pivot_longer(
    cols = c(starts_with("Big5"), starts_with("Life"))
    , names_to = c("category", "item")
    , names_sep = "__"
    , values_to = "value"
    , values_drop_na = T
  ) 

Data Cleaning

  • Now, we need to reshape the Big Five data for lavaan
  • If you learn nothing else from this class, I want you to learn this trick I use to run the same model multiple times on different variables
  • We are going to run a second-order latent growth model of the BFI-S, which has three items per Big Five trait
  • The item numbers are arbitrary but consistent across traits (3) as are the years (2005, 2009, 2013)
  • So if we separate the trait information from the item and year, then we can run identical models across traits
Code
gsoep2_lavaan <- gsoep2_long %>% 
  filter(category == "Big5") %>%
  separate(item, c("trait", "item"), sep = "_") %>%
  group_by(Procedural__SID, year, trait) %>%
  mutate(item = mapvalues(item, unique(item), 1:n())) %>%
  ungroup() %>%
  pivot_wider(
    names_from = c("item", "year")
    , names_prefix = "I"
    , values_from = "value"
  )
gsoep2_lavaan

Second-Order Latent Growth Model

  • Because of how we set up the data, we only have to write this out once. This is extra helpful for SEM because the model syntax can get really long and it would be time consuming to have to write this out separately for each of the Big Five
Code
mod <- '
  W1 =~ NA*I1_2005 + lambda1*I1_2005 + lambda2*I2_2005 + lambda3*I3_2005
  W2 =~ NA*I1_2009 + lambda1*I1_2009 + lambda2*I2_2009 + lambda3*I3_2009
  W3 =~ NA*I1_2013 + lambda1*I1_2013 + lambda2*I2_2013 + lambda3*I3_2013
  
  i =~ 1*W1 + 1*W2 + 1*W3
  s =~ -1*W1 + 0*W2 + 1*W3
  
  ## intercepts
  I1_2005 ~ t1*1
  I1_2009 ~ t2*1
  I1_2013 ~ t3*1
  
  I2_2005 ~ t1*1
  I2_2009 ~ t2*1
  I2_2013 ~ t3*1
  
  I3_2005 ~ t1*1
  I3_2009 ~ t2*1
  I3_2013 ~ t3*1
  
  ## correlated residuals across time
  I1_2005 ~~ I1_2009 + I1_2013
  I1_2009 ~~ I1_2013
  I2_2005 ~~ I2_2009 + I2_2013
  I2_2009 ~~ I2_2013
  I3_2005 ~~ I3_2009 + I3_2013
  I3_2009 ~~ I3_2013
  
  ## latent variable intercepts
  W1 ~ 0*1
  W2 ~ 0*1
  W3 ~ 0*1
  
  #model constraints for effect coding
  ## loadings must average to 1
  lambda1 == 3 - lambda2 - lambda3
  ## means must average to 0
  t1 == 0 - t2 - t3
  '

Now, we’ll write a little function that will run the model syntax across cross-sections of our data for each Big Five trait

Code
lavaan_fun <- function(d){
  m <- growth(
    mod
    , data = d
    , missing = "fiml"
  )
  return(m)
}

Now, we can create those cross-sections using tidyr::nest() and then run the model using purrr::map(). Having the trait information as rows in our data frame and using list columns let’s us separate out the trait information from the item/year information we need to run the models.

Code
gsoep_nested2 <- gsoep2_lavaan %>%
  group_by(category, trait) %>%
  nest() %>%
  ungroup() %>%
  mutate(m = map(data, lavaan_fun))
gsoep_nested2

Extracting Results

  • Summaries of SEM models are going to spit out a MUCH longer summary of results than the typical models we run and extract information from using broom/broom.mixed.
  • When I do SEM, I do make a table that includes all of these parameters for each model and put it in the supplement
  • But for the paper itself, we only want a subset of key model terms across all traits
Code
summary(gsoep_nested2$m[[1]])
lavaan 0.6.15 ended normally after 81 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        44
  Number of equality constraints                    14

  Number of observations                         16716
  Number of missing patterns                        38

Model Test User Model:
                                                      
  Test statistic                               284.272
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  W1 =~                                               
    I1_2005 (lmb1)    1.053    0.001 1312.911    0.000
    I2_2005 (lmb2)    0.960    0.001  810.689    0.000
    I3_2005 (lmb3)    0.987    0.001 1142.417    0.000
  W2 =~                                               
    I1_2009 (lmb1)    1.053    0.001 1312.911    0.000
    I2_2009 (lmb2)    0.960    0.001  810.689    0.000
    I3_2009 (lmb3)    0.987    0.001 1142.417    0.000
  W3 =~                                               
    I1_2013 (lmb1)    1.053    0.001 1312.911    0.000
    I2_2013 (lmb2)    0.960    0.001  810.689    0.000
    I3_2013 (lmb3)    0.987    0.001 1142.417    0.000
  i =~                                                
    W1                1.000                           
    W2                1.000                           
    W3                1.000                           
  s =~                                                
    W1               -1.000                           
    W2                0.000                           
    W3                1.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .I1_2005 ~~                                          
   .I1_2009           0.029    0.010    3.073    0.002
   .I1_2013           0.046    0.011    4.130    0.000
 .I1_2009 ~~                                          
   .I1_2013           0.052    0.010    5.131    0.000
 .I2_2005 ~~                                          
   .I2_2009           0.775    0.025   31.118    0.000
   .I2_2013           0.685    0.029   23.661    0.000
 .I2_2009 ~~                                          
   .I2_2013           0.924    0.028   32.973    0.000
 .I3_2005 ~~                                          
   .I3_2009           0.148    0.011   13.042    0.000
   .I3_2013           0.139    0.013   11.011    0.000
 .I3_2009 ~~                                          
   .I3_2013           0.179    0.012   15.087    0.000
  i ~~                                                
    s                -0.004    0.005   -0.811    0.417

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .I1_2005   (t1)    0.443    0.089    4.952    0.000
   .I1_2009   (t2)   -0.027    0.005   -4.944    0.000
   .I1_2013   (t3)   -0.416    0.089   -4.650    0.000
   .I2_2005   (t1)    0.443    0.089    4.952    0.000
   .I2_2009   (t2)   -0.027    0.005   -4.944    0.000
   .I2_2013   (t3)   -0.416    0.089   -4.650    0.000
   .I3_2005   (t1)    0.443    0.089    4.952    0.000
   .I3_2009   (t2)   -0.027    0.005   -4.944    0.000
   .I3_2013   (t3)   -0.416    0.089   -4.650    0.000
   .W1                0.000                           
   .W2                0.000                           
   .W3                0.000                           
    i                 5.841    0.007  867.282    0.000
    s                 0.405    0.088    4.607    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .I1_2005           0.429    0.011   39.517    0.000
   .I2_2005           1.849    0.028   65.804    0.000
   .I3_2005           0.719    0.013   55.538    0.000
   .I1_2009           0.478    0.012   40.829    0.000
   .I2_2009           2.021    0.031   66.239    0.000
   .I3_2009           0.754    0.014   55.362    0.000
   .I1_2013           0.433    0.011   39.096    0.000
   .I2_2013           2.094    0.032   64.482    0.000
   .I3_2013           0.684    0.013   52.839    0.000
   .W1                0.144    0.018    7.990    0.000
   .W2                0.205    0.009   22.139    0.000
   .W3                0.084    0.018    4.696    0.000
    i                 0.380    0.009   44.413    0.000
    s                 0.057    0.010    5.808    0.000

Constraints:
                                               |Slack|
    lambda1 - (3-lambda2-lambda3)                0.000
    t1 - (0-t2-t3)                               0.000

So what I do is to write a little function. I try to write it so that it can be used across multiple qualitatively different models if I’m running different (e.g., see second example later). To do that, I have (1) the function and (2) a tibble called terms that contains the paths I want to extract.

Code
terms <- tribble(
  ~path,     ~new,
  "i~1", "Intercept",
  "s~1", "Slope",
  "i~~i", "Intercept Variance",
  "s~~s", "Slope Variance",
  "i~~s", "Intercept-Slope Covariance"
)

extract_fun <- function(m){
  parameterEstimates(m) %>%
    data.frame() %>%
    unite(path, lhs, op, rhs, sep = "") %>%
    filter(path %in% terms$path) %>%
    left_join(terms) %>%
    select(term = new, est, se, ci.lower, ci.upper, pvalue)
}

Now we can run that function to extract the results we care about

Code
gsoep_nested2 <- gsoep_nested2 %>%
  mutate(summary = map(m, extract_fun))
gsoep_nested2

Formatting Results

The way that we need to format the data is pretty standard, so below are three functions I use when I need to format results from lavaan for putting into a table

Code
round_fun <- function(x) if(!is.na(x)) if(abs(x) > .01) sprintf("%.2f", x) else sprintf("%.3f", x) else ""
pround_fun <- function(x) if(!is.na(x)) if(abs(x) > .01) sprintf("%.2f", x) else if(x >= .001) sprintf("%.3f", x) else "&lt; .001" else ""

format_fun <- function(d){
  d %>%
    mutate(sig = ifelse(pvalue < .05, "sig", "ns")) %>%
    rowwise() %>%
    mutate_at(vars(est, ci.lower, ci.upper), round_fun) %>%
    mutate_at(vars(pvalue), pround_fun) %>%
    ungroup() %>%
    mutate(CI = sprintf("[%s,%s]", ci.lower, ci.upper)) %>%
    mutate_at(vars(est, CI, pvalue), ~ifelse(sig == "sig" & !is.na(sig), sprintf("<strong>%s</strong>", .), .)) 
}

Now let’s run that and see what it looks like

Code
gsoep_tab <- gsoep_nested2 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() 
gsoep_tab

We’re really close to being ready to make the table using kable, but we need to do a little reshaping and to remove some columns first.

Code
gsoep_tab <- gsoep_tab %>%
  select(-ci.lower, -ci.upper, -sig, -category, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) 
gsoep_tab

Unfortunately, pivot_wider() doesn’t really let us have too much control over the order of the columns, so we need to move them around so the estimates and CI’s for each trait are next to each other. There are many ways to do this, but I’m showing you two below:

Code
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")
gsoep_tab <- gsoep_tab %>%
  select(term, all_of(ord)) %>%
  # select(starts_with("E"), starts_with("A"), starts_with("C"), starts_with("N"), starts_with("O"))
  mutate(term = factor(term, terms$new)) %>%
  arrange(term)
gsoep_tab

Kabling the Table

Now, let’s go ahead and create the table using kable. Like before, we’re going to use add_header_above(), but I’m going to show you a trick that I use for it to make specifying it a little easier. - add_header_above() takes in a named vector as input where the values have to sum to the number of columns in the data frame (11)

Code
hdr <- c(1, rep(2,5))
names(hdr) <- c(" ", "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism", "Openness")
gsoep_tab %>%
  kable(.
        , escape = F
        , align = c("r", rep("c", 10))
        , col.names = c("Term", rep(c("<em>est.</em>", "CI"), times = 5))
        , caption = "<strong>Table 2</strong><br><em>Big Five Personality Trait Trajectories from Latent Growth Models</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(hdr) %>%
  add_footnote(label = "Bold values indicate terms p < .05", notation = "none")
Table 2
Big Five Personality Trait Trajectories from Latent Growth Models
Extraversion
Agreeableness
Conscientiousness
Neuroticism
Openness
Term est. CI est. CI est. CI est. CI est. CI
Intercept 5.84 [5.83,5.85]
Intercept 4.83 [4.81,4.84]
Intercept 5.41 [5.39,5.42]
Intercept 4.51 [4.49,4.53]
Intercept 3.84 [3.82,3.86]
Slope 0.40 [0.23,0.58]
Slope 0.10 [0.04,0.16]
Slope 0.49 [0.34,0.65]
Slope -0.40 [-0.57,-0.24]
Slope -0.95 [-1.05,-0.86]
Intercept Variance 0.38 [0.36,0.40]
Intercept Variance 0.69 [0.66,0.71]
Intercept Variance 0.35 [0.33,0.37]
Intercept Variance 0.66 [0.63,0.69]
Intercept Variance 0.69 [0.66,0.72]
Slope Variance 0.06 [0.04,0.08]
Slope Variance 0.07 [0.05,0.10]
Slope Variance 0.03 [0.009,0.05]
Slope Variance 0.07 [0.04,0.10]
Slope Variance 0.07 [0.04,0.10]
Intercept-Slope Covariance -0.004 [-0.02,0.006]
Intercept-Slope Covariance 0.004 [-0.010,0.02]
Intercept-Slope Covariance 0.009 [-0.003,0.02]
Intercept-Slope Covariance -0.007 [-0.02,0.009]
Intercept-Slope Covariance 0.02 [0.006,0.04]
Bold values indicate terms p < .05

lavaan Example 2

Second-Order Latent Growth Model: Slopes as Predictors

Now, let’s do a second example that is more complex and uses the intercepts and slopes we estimated to predict life outcomes. To do so, we first need to clean and prep the life event data and merge it back with our Big Five item-level data.

Note that in this case, we’re using long data two ways: for the Big Five traits and for the life events. In other words, we’re getting every combination of the two and are using inner_join() to make that happen!

Code
gsoep2_lavaan <- gsoep2_long %>%
  filter(category == "LifeEvent") %>%
  group_by(Procedural__SID, Demographic__DOB, item) %>%
  summarize(le_value = max(value)) %>%
  ungroup() %>%
  mutate(age = 2005 - Demographic__DOB - 45) %>%
  rename(le = item) %>%
  inner_join(gsoep2_lavaan)
gsoep2_lavaan

Data Cleaning

Now let’s create the nested data frame.

Code
gsoep_nested3 <- gsoep2_lavaan %>%
  drop_na(trait, le) %>%
  group_by(trait, le) %>%
  nest() %>%
  ungroup()
gsoep_nested3

Model Setup

We’re really just building on the second-order latent growth models we already ran. We already ran those and know that we have specified the models correctly / have enough variance to use the slopes and intercept to predict other things. So now we just need to add the new regression paths to that syntax to be able to run the models.

As with the model we specified before, by having the life event data long, we are able to do this just one time and have that work for all the life events. So using this trick, we can run 50 unique models with one set of model syntax!

Note that we have to modify the function that calls growth() because we have a categorical outcome that needs a different estimator.

Code
mod2 <- '
le_value ~ i + age
le_value ~ s
'

mod2 <- paste(mod, mod2, collapse = "\n")

lavaan_fun <- function(d){
  m <- growth(
    mod2
    , data = d
    , ordered = "le_value"
    , estimator = 'WLSMV'
    , missing = "pairwise"
    , parallel = "multicore"
  )
  return(m)
}

gsoep_nested3 <- gsoep_nested3 %>%
  mutate(m = map(data, lavaan_fun))
# saveRDS(gsoep_nested3, "gsoep_nested3.RDS")

Extract Results

Remember how I said that the reason we created the terms tibble was because it would make extract_fun() more flexible / portable? Let’s see that in action. We’ll create a new one that lists only the key additional terms in this model and use that to extract those estimates.

Code
gsoep_nested3 <- readRDS("gsoep_nested3.RDS")
terms <- tribble(
  ~path,     ~new,
  "le_value~i", "Intercept",
  "le_value~s", "Slope",
  "le_value~age", "Age"
)

gsoep_nested3 <- gsoep_nested3 %>%
  mutate(summary = map(m, extract_fun))

Formatting Results

Now we’re ready to format the results using the same steps as before. Note that I’m combining all them here because we’ve already stepped through them slowly: 1. Remove other list columns and unnest() 2. Format the results using format_fun() 3. Remove unnecessary columns and pivot_wider() 4. Change the order of the columns 5. Factor the terms to help us arrange the rows 6. Sort the rows by Life Event and then term

Code
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")
gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -pvalue) %>%
  pivot_wider(
    names_from = "trait"
    , names_glue = "{trait}_{.value}"
    , values_from = c("est", "CI")
  ) %>%
  select(le, term, all_of(ord)) %>%
  # select(starts_with("E"), starts_with("A"), starts_with("C"), starts_with("N"), starts_with("O"))
  mutate(term = factor(term, terms$new)) %>%
  arrange(le, term)
gsoep_tab2

Kabling the Table

Now we’re ready to kable the table! This looks identical to before, but I’m going to show you one more trick using kableExtra::group_rows. Rather than having to count the rows and adding a bunch of manual calls, I always just use the data frame to count for me to avoid errors. Then, I can create the table and use a for loop to tack on all the grouped rows using that reference data frame. Much more flexible and less error prone!

Code
hdr <- c(1, rep(2,5))
names(hdr) <- c(" ", "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism", "Openness")
rs <- gsoep_tab2 %>% group_by(le) %>% tally() %>% 
    mutate(end = cumsum(n), start = lag(end) + 1, start = ifelse(is.na(start), 1, start))
gsoep_tab2_kable <- gsoep_tab2 %>%
  select(-le) %>%
  kable(.
        , escape = F
        , align = c("r", rep("c", 10))
        , col.names = c("Term", rep(c("<em>est.</em>", "CI"), times = 5))
        , caption = "<strong>Table 3</strong><br><em>Big Five Personality Trait Trajectory Associations with Life Events from Latent Growth Models</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(hdr) %>%
  add_footnote(label = "Bold values indicate terms p < .05. ", notation = "none")

for(i in 1:nrow(rs)){
    gsoep_tab2_kable <- gsoep_tab2_kable %>% kableExtra::group_rows(rs$le[i], rs$start[i], rs$end[i])
}
gsoep_tab2_kable
Table 3
Big Five Personality Trait Trajectory Associations with Life Events from Latent Growth Models
Extraversion
Agreeableness
Conscientiousness
Neuroticism
Openness
Term est. CI est. CI est. CI est. CI est. CI
ChldBrth
Intercept 0.12 [0.05,0.18]
Intercept 0.005 [-0.04,0.05]
Intercept 0.09 [-0.02,0.20]
Intercept -0.09 [-0.14,-0.04]
Intercept -0.04 [-0.09,0.02]
Slope 0.21 [-0.31,0.72]
Slope -0.05 [-0.44,0.35]
Slope -1.19 [-3.39,1.00]
Slope 0.006 [-0.41,0.42]
Slope 0.07 [-0.38,0.53]
Age -0.03 [-0.04,-0.03]
Age -0.03 [-0.04,-0.03]
Age -0.03 [-0.04,-0.03]
Age -0.03 [-0.04,-0.03]
Age -0.03 [-0.04,-0.03]
ChldMvOut
Intercept 0.15 [0.09,0.21]
Intercept 0.02 [-0.02,0.06]
Intercept 0.05 [-0.06,0.16]
Intercept 0.009 [-0.04,0.05]
Intercept 0.01 [-0.03,0.06]
Slope -0.87 [-1.37,-0.37]
Slope -0.51 [-0.87,-0.15]
Slope -1.62 [-4.11,0.86]
Slope -0.65 [-1.07,-0.23]
Slope 0.09 [-0.26,0.44]
Age 0.005 [0.003,0.008]
Age 0.005 [0.003,0.008]
Age 0.005 [0.003,0.008]
Age 0.005 [0.003,0.008]
Age 0.005 [0.003,0.008]
DadDied
Intercept 0.005 [-0.06,0.07]
Intercept -0.02 [-0.07,0.03]
Intercept -0.07 [-0.16,0.03]
Intercept -0.04 [-0.09,0.02]
Intercept 0.04 [-0.02,0.10]
Slope -0.43 [-0.90,0.04]
Slope -0.57 [-1.00,-0.15]
Slope -0.92 [-2.73,0.88]
Slope -0.66 [-1.13,-0.20]
Slope 0.01 [-0.40,0.43]
Age -0.004 [-0.007,-0.001]
Age -0.004 [-0.007,-0.001]
Age -0.004 [-0.007,-0.001]
Age -0.004 [-0.007,-0.001]
Age -0.004 [-0.007,-0.001]
Divorce
Intercept 0.12 [0.03,0.21]
Intercept 0.05 [-0.01,0.12]
Intercept -0.07 [-0.20,0.05]
Intercept 0.03 [-0.03,0.10]
Intercept 0.10 [0.02,0.19]
Slope 0.35 [-0.35,1.04]
Slope -0.35 [-0.84,0.15]
Slope 1.01 [-1.14,3.17]
Slope -0.15 [-0.66,0.36]
Slope -0.67 [-1.38,0.04]
Age -0.007 [-0.01,-0.003]
Age -0.007 [-0.01,-0.003]
Age -0.007 [-0.01,-0.003]
Age -0.007 [-0.01,-0.003]
Age -0.007 [-0.01,-0.003]
Married
Intercept 0.10 [0.04,0.17]
Intercept 0.05 [0.000,0.09]
Intercept 0.03 [-0.08,0.14]
Intercept 0.04 [-0.01,0.08]
Intercept -0.002 [-0.06,0.06]
Slope 0.21 [-0.30,0.72]
Slope -0.06 [-0.46,0.34]
Slope -1.26 [-3.47,0.95]
Slope -0.003 [-0.40,0.40]
Slope 0.05 [-0.39,0.49]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
MomDied
Intercept 0.08 [0.01,0.15]
Intercept -0.02 [-0.07,0.02]
Intercept 0.05 [-0.05,0.16]
Intercept 0.05 [-0.005,0.09]
Intercept 0.03 [-0.02,0.09]
Slope -0.82 [-1.37,-0.26]
Slope -0.35 [-0.73,0.03]
Slope -1.07 [-3.04,0.89]
Slope -0.02 [-0.38,0.34]
Slope -0.06 [-0.47,0.35]
Age 0.007 [0.005,0.010]
Age 0.007 [0.005,0.010]
Age 0.007 [0.005,0.010]
Age 0.007 [0.005,0.010]
Age 0.007 [0.005,0.010]
MoveIn
Intercept 0.06 [-0.01,0.13]
Intercept 0.07 [0.02,0.12]
Intercept 0.02 [-0.10,0.13]
Intercept 0.05 [-0.003,0.09]
Intercept 0.08 [0.02,0.15]
Slope 0.33 [-0.22,0.88]
Slope -0.61 [-1.12,-0.09]
Slope -1.29 [-3.65,1.07]
Slope -0.22 [-0.67,0.22]
Slope -0.28 [-0.79,0.23]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
Age -0.03 [-0.03,-0.03]
NewPart
Intercept -0.13 [-0.22,-0.04]
Intercept 0.17 [0.10,0.24]
Intercept -0.09 [-0.22,0.03]
Intercept 0.13 [0.06,0.20]
Intercept -0.03 [-0.11,0.05]
Slope 0.71 [0.24,1.18]
Slope 0.29 [-0.22,0.79]
Slope 1.21 [-0.23,2.65]
Slope 0.10 [-0.47,0.67]
Slope -0.14 [-0.71,0.44]
Age -0.04 [-0.04,-0.03]
Age -0.04 [-0.04,-0.03]
Age -0.04 [-0.04,-0.03]
Age -0.04 [-0.04,-0.03]
Age -0.04 [-0.04,-0.03]
PartDied
Intercept 0.02 [-0.07,0.10]
Intercept 0.04 [-0.02,0.10]
Intercept 0.10 [-0.11,0.30]
Intercept -0.09 [-0.15,-0.02]
Intercept 0.10 [0.02,0.18]
Slope -0.29 [-0.98,0.40]
Slope 0.48 [-0.04,0.99]
Slope 3.32 [-1.64,8.27]
Slope -0.11 [-0.61,0.38]
Slope -0.23 [-0.79,0.33]
Age 0.03 [0.03,0.03]
Age 0.03 [0.03,0.03]
Age 0.03 [0.03,0.03]
Age 0.03 [0.03,0.03]
Age 0.03 [0.03,0.03]
SepPart
Intercept 0.04 [-0.02,0.11]
Intercept 0.09 [0.04,0.14]
Intercept -0.11 [-0.24,0.03]
Intercept 0.07 [0.02,0.12]
Intercept 0.09 [0.02,0.16]
Slope 0.64 [0.07,1.21]
Slope -0.14 [-0.52,0.24]
Slope 1.92 [-1.05,4.90]
Slope 0.010 [-0.38,0.40]
Slope -0.77 [-1.38,-0.15]
Age -0.02 [-0.02,-0.02]
Age -0.02 [-0.02,-0.02]
Age -0.02 [-0.02,-0.02]
Age -0.02 [-0.02,-0.02]
Age -0.02 [-0.02,-0.02]
Bold values indicate terms p < .05.

Kabling lots of tables

The code below is a pretty chunk of code I use to generate tables for each model I run with all the model parameter estimates.

Code
all_term_tab_fun <- function(m, trait){
  long_trait <- mapvalues(trait, p_names$old, p_names$new, warn_missing = F)
  cap <- sprintf("<strong>Table SX</strong><br><em>Second Order Latent Growth Models of %s</em>", long_trait)
  note <- "Bold values indicate estimates were significant at p < .05. est. = unstandardized estimate. "
  tab <- parameterEstimates(m, standardized = T) %>%
    data.frame() %>%
    format_fun() %>%
    select(lhs:est, CI, pvalue) %>%
    kable(.
          , format = "html"
          , escape = F
          , align = c("r", "c", "l", "l", rep("c", 3))
          , col.names = c("LHS", "op", "RHS", "label", "est.", "CI", "<em>p</em>")
          , caption = cap
          ) %>%
    kable_classic(full_width = F, html_font = "Times") %>%
    footnote(note)
  # save_kable(tab, file = sprintf("results/tables/all-terms/%s.html", trait))
  return(tab)
}

gsoep_nested2 <- gsoep_nested2 %>%
  mutate(all_term_tab = map2(m, trait, all_term_tab_fun))

Plots

The same functional, iterative approach also applies to creating plots! I teach a whole course on data visualization, so I’m not going to spend a ton of time on this. But I’ll show you an example for basic R models and for lavaan.

Plots Example 1: Base R GLM

Set up the data frame

First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().

Code
outcomes <- tribble(
  ~old, ~new,
  "chldbrth", "Child Birth"
  , "divorced", "Divorced"
  , "married", "Married"
  , "mvInPrtner", "Move in with Partner"
)

gsoep_nested4 <- gsoep %>%
  mutate(age = age - 45) %>%
  group_by(Trait, Outcome) %>%
  nest() %>%
  ungroup()
gsoep_nested4

Run Models

Now, we’ll run the models predicting outcomes from personality x age interactions (and their lower order terms)

Code
mod2_fun <- function(d){
  d$o_value <- factor(d$o_value)
  glm(o_value ~ p_value*age, data = d, family = binomial(link = "logit"))
}

gsoep_nested4 <- gsoep_nested4 %>%
  mutate(m = map(data, mod2_fun))

Generating Model Predictions

The next step is to get predictions from the model. It is also good practice to always get standard errors and/or confidence intervals of the estimates. Thankfully, with lm() and glm(), this is easy. Lavaan will be covered briefly and more complex model forms are covered in my data visualization class.

Code
glm_pred_fun <- function(m){
  rng_p <- range(m$model$p_value, na.rm = T)
  frame <- crossing(
    age = c(-15, 0, 15)
    , p_value = seq(rng_p[1], rng_p[2], length.out = 30)
  )
  
  pred <- predict(m, newdata = frame, se.fit = T)[1:2]
  frame <- frame %>% 
    mutate(fit = pred$fit
           , se = pred$se.fit
           , lower = fit - 1.96*se
           , upper = fit + 1.96*se
           ) %>%
    mutate_at(vars(fit, lower, upper), exp)
}

Now we run the predictions

Code
gsoep_nested4 <- gsoep_nested4 %>%
  mutate(pred = map(m, glm_pred_fun))
gsoep_nested4

Plot the Predictions

One of the greatest strengths of list-columns and using purrr is what I think of as the data accordion. With list-columns, I can unnest() whatever level the models were estimated at and then reaggregate using whatever combinations / groups make the most sense.

So here, for example, we ran models for all Big Five trait (5) x life outcome (10) combinations. But having 50 separate plots would be a little silly, so we’d much rather create one for each outcome. But I’m not going to write that code 10 times because that’d be a waste. So instead if I reaggregate the data, I can then use a function + purrr to generate each of the plots.

Code
gsoep_plot4 <- gsoep_nested4 %>%
  select(-data, -m) %>%
  unnest(pred) %>%
  group_by(Outcome) %>%
  nest() %>%
  ungroup()
gsoep_plot4

When we write code for plots in ggplot, we have a lot of things that we end up writing over and over, especially for theme elements. To get around this, I use this little function to modify all of my theme elements in a single line.

Code
my_theme <- function(){
  theme_classic() + 
  theme(
    legend.position = "bottom"
    , legend.title = element_text(face = "bold", size = rel(1))
    , legend.text = element_text(face = "italic", size = rel(1))
    , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
    , axis.title = element_text(face = "bold", size = rel(1.2))
    , panel.grid.major = element_line(color = "grey90", linewidth = .2)
    , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
    , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
    , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
    , strip.background = element_rect(fill = "black")
    )
}

Like I mentioned, once we reaggregate the data, we can then write a function to generate the plot for each outcome separately. Let’s start by building the basic code for the plot. We need it to: 1. Have x = personality, y = predicted values, separate lines for moderator levels, and separate panels for different traits 2. A line indicating personality-outcome associations (again separately across age groups) and a ribbon with the confidence interval around the prediction

Code
plot_fun <- function(d){
  d %>%
    mutate(age_fac = factor(age, c(-15, 0, 15), c("30", "45", "60"))) %>%
    ggplot(aes(x = p_value, y = fit, color = age_fac)) +
    geom_ribbon(
      aes(ymin = lower, ymax = upper, fill = age_fac)
      , alpha = .4
    ) + 
    geom_line() + 
    facet_wrap(~Trait, scales = "free_x") + 
    my_theme()
}

With our reaggregated data, we can easily just run the plots and view the results. This is getting there, but let’s make some additional modifications to get them publication ready.

Code
gsoep_plot4 <- gsoep_plot4 %>%
  mutate(p = map(data, plot_fun))
gsoep_plot4
Code
gsoep_plot4$p[[1]]

Let’s add: 1. full personality trait names 2. the outcome in the title 3. better axis and other labels

Code
plot_fun <- function(d, outcome){
  out <- mapvalues(outcome, outcomes$old, outcomes$new, warn_missing = F)
  d %>%
    mutate(age_fac = factor(age, c(-15, 0, 15), c("30", "45", "60"))
           , Trait = factor(Trait, p_names$old, p_names$new)) %>%
    ggplot(aes(x = p_value, y = fit, color = age_fac)) +
    geom_ribbon(
      aes(ymin = lower, ymax = upper, fill = age_fac)
      , alpha = .4
    ) + 
    geom_line() + 
    labs(
      x = "Personality Trait Level"
      , y = "Predicted Value (CI)"
      , color = "Age"
      , fill = "Age"
      , title = sprintf("%s: Personality x Age Interaction", out)
    ) + 
    facet_wrap(~Trait, scales = "free_x") + 
    my_theme() + 
    theme(
      legend.position = c(.8, .25)
      
    )
  # ggsave(filename = sprintf("results/figures/%s.png", outcome), width = 6, height = 6)
  # ggsave(filename = sprintf("results/figures/%s.pdf", outcome), width = 6, height = 6)
}

Now, we can run these and get the results.

Code
gsoep_plot4 <- gsoep_plot4 %>%
  mutate(p = map2(data, Outcome, plot_fun))
gsoep_plot4
Code
gsoep_plot4$p[[1]]

Lavaan Trajectories

Lavaan is trickier. Because of the way residual variance, etc. is calculted in SEM, it’s much less straightforward to estimate standard errors and confidence intervals of predictions (not to mention that lavaan just doesn’t make predictions easy). So instead, I just use some basic matrix algebra and bootstrapping to get these. If you aren’t comfortable with basic matrix algebra, you can just write a little function that uses terms individually.

Code
lavaan_pred_fun <- function(m){
  coef <- coef(gsoep_nested3$m[[1]])[c("i~1", "s~1")]
  b <- bootstrapLavaan(m, parallel = "multicore", R = 100)
  frame <- tibble(intercept = 1, wave = seq(-1,1, length.out = 50))
  pred_boot <- function(x,y) bind_cols(frame, pred = as.vector(as.matrix(frame) %*% c(x,y)))
  frame <- frame %>%
    mutate(pred = as.vector(as.matrix(frame) %*% coef)) %>%
    left_join(
      tibble(sample = 1:100, pred = map2(b[,"i~1"], b[,"s~1"], pred_boot)) %>%
        unnest(pred) %>%
        group_by(wave) %>%
        summarize(lower = quantile(pred, probs = .025)
                  , upper = quantile(pred, probs = .975))
    )
}

gsoep_nested2 <- gsoep_nested2 %>%
  mutate(pred = map(m, lavaan_pred_fun))
# saveRDS(gsoep_nested2, file = "gsoep_nested2.RDS")

Once we’ve estimated the predictions, we can graph them just like we did for the previous example. (Again, a full discussion of plotting is beyond the scope of this course and is covered for a whole quarter in my data visualization class instead.)

Code
gsoep_nested2 %>%
  select(-data, -m, -summary) %>%
  unnest(pred) %>%
  ggplot(aes(x = wave + 1, y = pred)) + 
    geom_ribbon(
      aes(ymin = lower, ymax = upper, fill = trait)
      , alpha = .4
    ) + 
    geom_line() + 
    scale_y_continuous(limits = c(1,7), breaks = seq(1,7,1)) + 
    scale_x_continuous(limits = c(-.25,2.25), breaks = 0:2, labels = c(2005, 2009, 2013)) + 
    labs(
        x = "Personality Trait Level"
        , y = "Predicted Value (CI)"
        , title = sprintf("Personality Trajectories")
      ) + 
    facet_wrap(~trait, scales = "free_x") + 
    my_theme() + 
    theme(legend.position = "none")