In psychology, we must work within the confines of APA style. Although these guidelines have been updated, the style guide remains quite similar to earlier guidelines with respect to tables.
But psychology research is heterogeneous, and modern tables often require combining multiple models in creative ways.
Small tweaks to data or model arguments can spell disaster for creating a table. It’s easy to make mistakes in copying values or matching different models to their respective rows and columns.
Thankfully, the increasing popularity of R has been coupled with more methods for creating a reproducible workflow that includes tables.
Tables in R
In this workshop, we will directly cover 3 different use cases, while a few others will be included in supplementary materials.
Personally, I favor non-automated tools, so we will cover the following packages:
- kable + kableExtra (html and LaTeX)
- papaja
Using these packages will build on earlier tutorials using tidyr, dplyr, workflow, and purrr and round out our discussion of data presentation using ggplot2.
Although they don't cover all models, the broom and broom.mixed family of packages provides easy-to-work-with estimates for nearly all types of models, including the model terms that are ideal for most APA tables: estimates, standard errors, and confidence intervals.
lavaan models are slightly more complicated, but it's actually relatively easy to work with them and extract their terms, assuming that you understand the models you are running.
Data
The data we’re going to use are from the teaching sample from the German Socioeconomic Panel Study. These data have been pre-cleaned (see earlier workshop on workflow and creating guidelines for tips).
The data we’ll use fall into three categories:
1. Personality trait composites: Negative Affect, Positive Affect, Self-Esteem, CESD Depression, and Optimism. These were cleaned, reverse coded, and composited prior to being included in this final data set.
2. Outcomes: Moving in with a partner, marriage, divorce, and child birth. These were cleaned, coded as 1 (occurred) or 0 (did not occur) according to whether an outcome occurred for each individual after each possible measured personality year. Moreover, people who experienced these outcomes prior to the target personality year are excluded.
3. Covariates: Age, gender (0 = male, 1 = female), education (0 = high school or below, 1 = college, 2 = higher than college), gross wages, self-rated health, smoking (0 = never smoked, 1 = ever smoked), exercise, BMI, religion, parental education, and parental occupational prestige (ISEI). Each of these was composited for all available data up to the measured personality years.
```r
(gsoep <- read_csv("week8-data.csv"))
```
Basic: One DV/Outcome, Multiple Model Terms
We’ll start with a basic case, predicting who has a child from personality, both with and without control variables.
Because outcome variables are binary, we'll use logistic regression.
The basic form of the model is: $\log\Big(\frac{p_i}{1-p_i}\Big) = b_0 + b_1X_1 + b_2X_2 + \dots + b_pX_p$
In other words, we’re predicting the log odds of having a child from a linear combination of predictor variables.
Set up the data frame
First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().
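```r
gsoep_nested1 <- gsoep %>%
  filter(Outcome == "chldbrth") %>%
  group_by(Trait, Outcome) %>%
  nest() %>%
  ungroup()
```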
Next, let's pause and see what we have. We now have a data frame with 3 columns (Outcome, Trait, and data) and 5 rows, one per trait. The data column is of class list, meaning it's a "list column" that contains a tibble in each cell. This means that we can use purrr functions to run operations on each of these data frames individually, without having to copy and paste the same operation multiple times for each model we want to run.
```r
gsoep_nested1
```
Run Models
To run the models, I like to write short functions that are easier to read than including a local function within a call to purrr::map(). Here, we’re just going to write a simple function to predict child birth from personality.
```r
mod1_fun <- function(d){
  d$o_value <- factor(d$o_value)
  glm(o_value ~ p_value, data = d, family = binomial(link = "logit"))
}

gsoep_nested1 <- gsoep_nested1 %>%
  mutate(m = map(data, mod1_fun))
```
Now, when we look at the nested frame, we see an additional column, which is also a list, but this column contains <glm> objects rather than tibbles.
```r
gsoep_nested1
```
Get Key Terms
Now that we have the models, we want to get our key terms. I'm a big fan of using the function tidy from the broom package to do this. Bonus because it plays nicely with purrr. Double bonus because it will give us confidence intervals, which I generally prefer over p-values and standard errors because I find them more informative.
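```r
gsoep_nested1 <- gsoep_nested1 %>%
  mutate(tidy = map(m, ~tidy(., conf.int = T)))
gsoep_nested1
```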
Note that what I've used here is a local function, meaning that I've used the notation ~function(., arguments). The tilde tells R we want a local function, and the . tells R to use the mapped m column as the function input.
Now we have a fifth column, which is a list column called tidy that contains a tibble, just like the data column.
Creating a Table
Now we are finally ready to create a table! I’m going to use kable + kableExtra to do this in steps.
First, we’ll unnest the tidy column from our data frame. Before doing so, we will drop the data and m columns because they’ve done their work for now.
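```r
tidy1 <- gsoep_nested1 %>%
  select(-data, -m) %>%
  unnest(tidy)
tidy1
```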
As you can see, we now have lots of information about our model terms, which are already nicely indexed by Outcome and Trait combinations.
But before we’re ready to create a table, we have to make a few considerations:
- What is our target term? In this case, "p_value", which is the change in log odds associated with a 1-unit increase/decrease in p_value.
- How will we denote significance? In this case, we'll use confidence intervals whose signs match. We'll then bold these terms for our table.
- What is the desired final structure for the table? I'd like columns for Trait, estimate (b), and confidence intervals (CI) formatted to two decimal places and bolded if significant. I'd also like a span header denoting that the outcome measure is child birth.
What is our target term?
In this case, "p_value", which is the change in log odds associated with a 1-unit increase/decrease in p_value.
```r
tidy1 <- tidy1 %>%
  filter(term == "p_value")
tidy1
```
How will we denote significance?
In this case, we’ll use confidence intervals whose signs match. We’ll then bold these terms for our table.
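```r
tidy1 <- tidy1 %>%
  mutate(sig = ifelse(sign(conf.low) == sign(conf.high), "sig", "ns"))
tidy1
```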
What is the desired final structure for the table?
I’d like columns for Trait, estimate (b), and confidence intervals (CI) formatted to two decimal places and bolded if significant. I’d also like a span header denoting that the outcome measure is child birth.
Before we do this, though, we need to convert our log odds to odds ratios, using the exp() function.
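```r
tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, conf.low, conf.high), exp)
tidy1
```

Now, we can format them.

```r
tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, conf.low, conf.high), ~sprintf("%.2f", .))
tidy1
```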
sprintf() is my favorite base R formatting function. “%.2f” means I’m asking it to take a floating point number and include 2 digits after the “.” and 0 before. We can now see that the estimate, conf.low, and conf.high columns are of class <chr> instead of <dbl>.
But now we need to create our confidence intervals.
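```r
tidy1 <- tidy1 %>%
  mutate(CI = sprintf("[%s, %s]", conf.low, conf.high))
tidy1
```

And bold the significant confidence intervals and estimates.

```r
tidy1 <- tidy1 %>%
  mutate_at(vars(estimate, CI), ~ifelse(sig == "sig", sprintf("<strong>%s</strong>", .), .))
tidy1
```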
This reads as "for both the estimate and the CI columns, if the sig column is equal to 'sig', format it as bold using HTML; otherwise, leave it alone." And indeed, we can see that the final result bolds 4 of the 5 rows.
Thankfully, these can be achieved without considerable reshaping of the data, which is why we've started here, so we're almost done. We just need to get rid of some unnecessary columns.
```r
tidy1 <- tidy1 %>%
  select(Trait, OR = estimate, CI)
```
Because we just have one target term and one outcome, we don't need those columns, so we're just keeping Trait, OR (which I renamed within the select call), and CI.
Kabling a Table
Now let's kable. You've likely used the kable() function from knitr before. It's a very useful and simple function on most occasions.
```r
kable(tidy1)
```
| Trait  | OR                       | CI                               |
|-------:|:------------------------:|:--------------------------------:|
| OP     | `<strong>1.68</strong>`  | `<strong>[1.50, 1.88]</strong>`  |
| DEP    | `<strong>1.15</strong>`  | `<strong>[1.07, 1.24]</strong>`  |
| NegAff | `<strong>1.11</strong>`  | `<strong>[1.01, 1.22]</strong>`  |
| PA     | `<strong>1.97</strong>`  | `<strong>[1.77, 2.20]</strong>`  |
| SE     | 1.09                     | [0.97, 1.22]                     |
It will automatically generate the html code needed to create a table. But if we look closely at the result, it gives us some gobbledygook where we inputted html, so we need a way around that: the escape = F argument. I'm also going to throw in kable_classic(full_width = F) from the kableExtra package to help out here. It's not doing much, but it will make the formatted table print in your Viewer.
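```r
kable(tidy1, escape = F) %>%
  kable_classic(full_width = F, html_font = "Times")
```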
Much better. But this still doesn’t look like an APA table, so let’s keep going.
APA tables usually write out long names for our predictors, so let's change those first. I'm going to create a reference tibble and use mapvalues() from the plyr package for this.
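```r
p_names <- tibble(
  old = c("NegAff", "PA", "SE", "OP", "DEP"),
  new = c("Negative Affect", "Positive Affect", "Self-Esteem", "Optimism", "Depression")
)

tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(., escape = F) %>%
  kable_classic(full_width = F, html_font = "Times")
```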
The alignment of the columns isn't quite right. Let's fix that. We'll change the Trait column to right justified and OR and CI to centered.
```r
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        ) %>%
  kable_classic(full_width = F, html_font = "Times")
```
| Trait           | OR       | CI               |
|----------------:|:--------:|:----------------:|
| Negative Affect | **1.11** | **[1.01, 1.22]** |
| Positive Affect | **1.97** | **[1.77, 2.20]** |
| Self-Esteem     | 1.09     | [0.97, 1.22]     |
| Optimism        | **1.68** | **[1.50, 1.88]** |
| Depression      | **1.15** | **[1.07, 1.24]** |
But we're still missing our span header. There's a great function in the kableExtra package for this: add_header_above(). This function takes a named vector as an argument, where the elements of the vector refer to the number of columns the named element should span.
```r
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(c(" " = 1, "Birth of a Child" = 2))
```
*Birth of a Child*

| Trait           | OR       | CI               |
|----------------:|:--------:|:----------------:|
| Negative Affect | **1.11** | **[1.01, 1.22]** |
| Positive Affect | **1.97** | **[1.77, 2.20]** |
| Self-Esteem     | 1.09     | [0.97, 1.22]     |
| Optimism        | **1.68** | **[1.50, 1.88]** |
| Depression      | **1.15** | **[1.07, 1.24]** |
Note that what the " " = 1 does is skip the Trait column. This is very useful because it lets us not have a span header over every column.
APA style requires we note how we denote significance and have a title, so let’s add a title and a note.
```r
tidy1 %>%
  mutate(Trait = mapvalues(Trait, from = p_names$old, to = p_names$new),
         Trait = factor(Trait, levels = p_names$new)) %>%
  arrange(Trait) %>%
  kable(.
        , escape = F
        , align = c("r", "c", "c")
        , caption = "<strong>Table 1</strong><br><em>Estimated Personality-Outcome Associations</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(c(" " = 1, "Birth of a Child" = 2)) %>%
  add_footnote(label = "Bold values indicate terms whose confidence intervals did not overlap with 0", notation = "none")
```
**Table 1**
*Estimated Personality-Outcome Associations*

*Birth of a Child*

| Trait           | OR       | CI               |
|----------------:|:--------:|:----------------:|
| Negative Affect | **1.11** | **[1.01, 1.22]** |
| Positive Affect | **1.97** | **[1.77, 2.20]** |
| Self-Esteem     | 1.09     | [0.97, 1.22]     |
| Optimism        | **1.68** | **[1.50, 1.88]** |
| Depression      | **1.15** | **[1.07, 1.24]** |

Note: Bold values indicate terms whose confidence intervals did not overlap with 0.
We did it!
A Quick Note: HTML v. LaTeX
When creating tables, I prefer using HTML when I need the resulting tables to be in HTML and LaTeX when I can place the tables in a PDF. The syntax using kable and kableExtra is the same with the following exceptions:
1. The format argument in kable() would need to be set as format = "latex".
2. The chunk option for a table to render would need to be set as {r, results = 'asis'}.
3. Bolding would need to be done with \textbf{}, rather than the HTML <strong></strong> tag.
4. When using collapse_rows(), which we'll get to later, you'd want to set the latex_hline argument to latex_hline = "none".
Extracting terms from lavaan
Most models in R can easily have terms extracted via broom or broom.mixed, but one major exception is SEM models in lavaan. Let's step through an example using lavaan to show you some of my tricks for working with it.
lavaan Example 1
Data
For this example, we'll use a different data set from the SOEP that has the item-level personality data. The code below reads it in, recodes it, and reverse scores it.
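```r
rev_code <- c("Big5__A_coarse", "Big5__C_lazy", "Big5__E_reserved", "Big5__N_dealStress")

(gsoep2 <- read_csv("week8-data-2.csv") %>%
   mutate_at(vars(matches("Big5")), function(x) {x[x < 0] <- NA; x}) %>%
   mutate_at(vars(matches("LifeEvent")), ~mapvalues(., seq(-7, 1), c(rep(NA, 5), 0, NA, NA, 1), warn_missing = F)) %>%
   mutate_at(vars(all_of(rev_code)),
             ~as.numeric(reverse.code(., keys = -1, mini = 1, maxi = 7))))
```

Now we'll change it to long format for easier cleaning and reshaping.

```r
gsoep2_long <- gsoep2 %>%
  pivot_longer(cols = c(starts_with("Big5"), starts_with("Life"))
               , names_to = c("category", "item")
               , names_sep = "__"
               , values_to = "value"
               , values_drop_na = T
               )
```

Next, we reshape the Big Five data for lavaan. We are going to run a second-order latent growth model of the BFI-S, which has three items per Big Five trait. The item numbers are arbitrary but consistent across traits (3), as are the years (2005, 2009, 2013). So if we separate the trait information from the item and year, then we can run identical models across traits.

```r
gsoep2_lavaan <- gsoep2_long %>%
  filter(category == "Big5") %>%
  separate(item, c("trait", "item"), sep = "_") %>%
  group_by(Procedural__SID, year, trait) %>%
  mutate(item = mapvalues(item, unique(item), 1:n())) %>%
  ungroup() %>%
  pivot_wider(names_from = c("item", "year")
              , names_prefix = "I"
              , values_from = "value"
              )
```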
Because of how we set up the data, we only have to write this out once. This is extra helpful for SEM because the model syntax can get really long, and it would be time consuming to write it out separately for each of the Big Five.
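```r
mod <- '
  W1 =~ NA*I1_2005 + lambda1*I1_2005 + lambda2*I2_2005 + lambda3*I3_2005
  W2 =~ NA*I1_2009 + lambda1*I1_2009 + lambda2*I2_2009 + lambda3*I3_2009
  W3 =~ NA*I1_2013 + lambda1*I1_2013 + lambda2*I2_2013 + lambda3*I3_2013

  i =~ 1*W1 + 1*W2 + 1*W3
  s =~ -1*W1 + 0*W2 + 1*W3

  ## intercepts
  I1_2005 ~ t1*1
  I1_2009 ~ t2*1
  I1_2013 ~ t3*1
  I2_2005 ~ t1*1
  I2_2009 ~ t2*1
  I2_2013 ~ t3*1
  I3_2005 ~ t1*1
  I3_2009 ~ t2*1
  I3_2013 ~ t3*1

  ## correlated residuals across time
  I1_2005 ~~ I1_2009 + I1_2013
  I1_2009 ~~ I1_2013
  I2_2005 ~~ I2_2009 + I2_2013
  I2_2009 ~~ I2_2013
  I3_2005 ~~ I3_2009 + I3_2013
  I3_2009 ~~ I3_2013

  ## latent variable intercepts
  W1 ~ 0*1
  W2 ~ 0*1
  W3 ~ 0*1

  # model constraints for effect coding
  ## loadings must average to 1
  lambda1 == 3 - lambda2 - lambda3
  ## means must average to 0
  t1 == 0 - t2 - t3
'
```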
Now, we'll write a little function that will run the model syntax across cross-sections of our data for each Big Five trait.
```r
lavaan_fun <- function(d){
  m <- growth(
    mod
    , data = d
    , missing = "fiml"
  )
  return(m)
}
```
Now, we can create those cross-sections using tidyr::nest() and then run the model using purrr::map(). Having the trait information as rows in our data frame and using list columns lets us separate out the trait information from the item/year information we need to run the models.
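```r
gsoep_nested2 <- gsoep2_lavaan %>%
  group_by(category, trait) %>%
  nest() %>%
  ungroup() %>%
  mutate(m = map(data, lavaan_fun))
gsoep_nested2
```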
- Summaries of SEM models are going to spit out a MUCH longer summary of results than the typical models we run and extract information from using broom/broom.mixed.
- When I do SEM, I make a table that includes all of these parameters for each model and put it in the supplement.
- But for the paper itself, we only want a subset of key model terms across all traits.
So what I do is write a little function. I try to write it so that it can be used across multiple qualitatively different models if I'm running different ones (e.g., see the second example later). To do that, I have (1) the function and (2) a tibble called terms that contains the paths I want to extract.
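```r
terms <- tribble(
  ~path,  ~new,
  "i~1",  "Intercept",
  "s~1",  "Slope",
  "i~~i", "Intercept Variance",
  "s~~s", "Slope Variance",
  "i~~s", "Intercept-Slope Covariance"
)

extract_fun <- function(m){
  parameterEstimates(m) %>%
    data.frame() %>%
    unite(path, lhs, op, rhs, sep = "") %>%
    filter(path %in% terms$path) %>%
    left_join(terms) %>%
    select(term = new, est, se, ci.lower, ci.upper, pvalue)
}
```

Now we can run that function to extract the results we care about.

```r
gsoep_nested2 <- gsoep_nested2 %>%
  mutate(summary = map(m, extract_fun))
gsoep_nested2
```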
The way that we need to format the data is pretty standard, so below are three functions I use when I need to format results from lavaan for putting into a table:
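```r
round_fun <- function(x) if(!is.na(x)) if(abs(x) > .01) sprintf("%.2f", x) else sprintf("%.3f", x) else ""

pround_fun <- function(x) if(!is.na(x)) if(abs(x) > .01) sprintf("%.2f", x) else if(x >= .001) sprintf("%.3f", x) else "< .001" else ""

format_fun <- function(d){
  d %>%
    mutate(sig = ifelse(pvalue < .05, "sig", "ns")) %>%
    rowwise() %>%
    mutate_at(vars(est, ci.lower, ci.upper), round_fun) %>%
    mutate_at(vars(pvalue), pround_fun) %>%
    ungroup() %>%
    mutate(CI = sprintf("[%s,%s]", ci.lower, ci.upper)) %>%
    mutate_at(vars(est, CI, pvalue), ~ifelse(sig == "sig" & !is.na(sig), sprintf("<strong>%s</strong>", .), .))
}
```

Now let's run that, remove the columns we no longer need, and reshape with pivot_wider().

```r
gsoep_tab <- gsoep_nested2 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -category, -pvalue) %>%
  pivot_wider(names_from = "trait"
              , names_glue = "{trait}_{.value}"
              , values_from = c("est", "CI")
              )
gsoep_tab
```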
Unfortunately, pivot_wider() doesn't give us much control over the order of the columns, so we need to move them around so the estimates and CIs for each trait are next to each other. There are many ways to do this, but I'm showing you two below:
```r
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")

gsoep_tab <- gsoep_tab %>%
  select(term, all_of(ord)) %>%
  # select(starts_with("E"), starts_with("A"), starts_with("C"), starts_with("N"), starts_with("O"))
  mutate(term = factor(term, terms$new)) %>%
  arrange(term)
gsoep_tab
```
Kabling the Table
Now, let's go ahead and create the table using kable. Like before, we're going to use add_header_above(), but I'm going to show you a trick that I use to make specifying it a little easier: add_header_above() takes a named vector as input, where the values have to sum to the number of columns in the data frame (11).
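```r
hdr <- c(1, rep(2, 5))
names(hdr) <- c(" ", "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism", "Openness")

gsoep_tab %>%
  kable(.
        , escape = F
        , align = c("r", rep("c", 10))
        , col.names = c("Term", rep(c("<em>est.</em>", "CI"), times = 5))
        , caption = "<strong>Table 2</strong><br><em>Big Five Personality Trait Trajectories from Latent Growth Models</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(hdr) %>%
  add_footnote(label = "Bold values indicate terms p < .05", notation = "none")
```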
**Table 2**
*Big Five Personality Trait Trajectories from Latent Growth Models*

| Term | E est. | E CI | A est. | A CI | C est. | C CI | N est. | N CI | O est. | O CI |
|--:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Intercept | 5.84 | [5.83,5.85] | 4.83 | [4.81,4.84] | 5.41 | [5.39,5.42] | 4.51 | [4.49,4.53] | 3.84 | [3.82,3.86] |
| Slope | 0.40 | [0.23,0.58] | 0.10 | [0.04,0.16] | 0.49 | [0.34,0.65] | -0.40 | [-0.57,-0.24] | -0.95 | [-1.05,-0.86] |
| Intercept Variance | 0.38 | [0.36,0.40] | 0.69 | [0.66,0.71] | 0.35 | [0.33,0.37] | 0.66 | [0.63,0.69] | 0.69 | [0.66,0.72] |
| Slope Variance | 0.06 | [0.04,0.08] | 0.07 | [0.05,0.10] | 0.03 | [0.009,0.05] | 0.07 | [0.04,0.10] | 0.07 | [0.04,0.10] |
| Intercept-Slope Covariance | -0.004 | [-0.02,0.006] | 0.004 | [-0.010,0.02] | 0.009 | [-0.003,0.02] | -0.007 | [-0.02,0.009] | 0.02 | [0.006,0.04] |

Note: E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness. Bold values indicate terms p < .05.
lavaan Example 2
Second-Order Latent Growth Model: Slopes as Predictors
Now, let’s do a second example that is more complex and uses the intercepts and slopes we estimated to predict life outcomes. To do so, we first need to clean and prep the life event data and merge it back with our Big Five item-level data.
Note that in this case, we’re using long data two ways: for the Big Five traits and for the life events. In other words, we’re getting every combination of the two and are using inner_join() to make that happen!
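```r
gsoep2_lavaan <- gsoep2_long %>%
  filter(category == "LifeEvent") %>%
  group_by(Procedural__SID, Demographic__DOB, item) %>%
  summarize(le_value = max(value)) %>%
  ungroup() %>%
  mutate(age = 2005 - Demographic__DOB - 45) %>%
  rename(le = item) %>%
  inner_join(gsoep2_lavaan)
```

Now let's create the nested data frame.

```r
gsoep_nested3 <- gsoep2_lavaan %>%
  drop_na(trait, le) %>%
  group_by(trait, le) %>%
  nest() %>%
  ungroup()
gsoep_nested3
```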
We’re really just building on the second-order latent growth models we already ran. We already ran those and know that we have specified the models correctly / have enough variance to use the slopes and intercept to predict other things. So now we just need to add the new regression paths to that syntax to be able to run the models.
As with the model we specified before, by having the life event data long, we are able to do this just one time and have that work for all the life events. So using this trick, we can run 50 unique models with one set of model syntax!
Note that we have to modify the function that calls growth() because we have a categorical outcome that needs a different estimator.
```r
mod2 <- 'le_value ~ i + age
le_value ~ s'
mod2 <- paste(mod, mod2, collapse = "\n")

lavaan_fun <- function(d){
  m <- growth(
    mod2
    , data = d
    , ordered = "le_value"
    , estimator = 'WLSMV'
    , missing = "pairwise"
    , parallel = "multicore"
  )
  return(m)
}

gsoep_nested3 <- gsoep_nested3 %>%
  mutate(m = map(data, lavaan_fun))
# saveRDS(gsoep_nested3, "gsoep_nested3.RDS")
```
Extract Results
Remember how I said that the reason we created the terms tibble was because it would make extract_fun() more flexible / portable? Let’s see that in action. We’ll create a new one that lists only the key additional terms in this model and use that to extract those estimates.
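```r
gsoep_nested3 <- readRDS("gsoep_nested3.RDS")

terms <- tribble(
  ~path,          ~new,
  "le_value~i",   "Intercept",
  "le_value~s",   "Slope",
  "le_value~age", "Age"
)

gsoep_nested3 <- gsoep_nested3 %>%
  mutate(summary = map(m, extract_fun))
```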
Now we're ready to format the results using the same steps as before. Note that I'm combining all of them here because we've already stepped through them slowly:

1. Remove other list columns and unnest()
2. Format the results using format_fun()
3. Remove unnecessary columns and pivot_wider()
4. Change the order of the columns
5. Factor the terms to help us arrange the rows
6. Sort the rows by Life Event and then term
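```r
ord <- paste(rep(c("E", "A", "C", "N", "O"), each = 2), rep(c("est", "CI"), times = 5), sep = "_")

gsoep_tab2 <- gsoep_nested3 %>%
  select(-data, -m) %>%
  unnest(summary) %>%
  format_fun() %>%
  select(-ci.lower, -ci.upper, -sig, -pvalue) %>%
  pivot_wider(names_from = "trait"
              , names_glue = "{trait}_{.value}"
              , values_from = c("est", "CI")
              ) %>%
  select(le, term, all_of(ord)) %>%
  mutate(term = factor(term, terms$new)) %>%
  arrange(le, term)
gsoep_tab2
```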
Now we’re ready to kable the table! This looks identical to before, but I’m going to show you one more trick using kableExtra::group_rows. Rather than having to count the rows and adding a bunch of manual calls, I always just use the data frame to count for me to avoid errors. Then, I can create the table and use a for loop to tack on all the grouped rows using that reference data frame. Much more flexible and less error prone!
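```r
hdr <- c(1, rep(2, 5))
names(hdr) <- c(" ", "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism", "Openness")

# use the data to count grouped rows rather than counting by hand
rs <- gsoep_tab2 %>%
  group_by(le) %>%
  tally() %>%
  mutate(end = cumsum(n), start = lag(end) + 1, start = ifelse(is.na(start), 1, start))

gsoep_tab2_kable <- gsoep_tab2 %>%
  select(-le) %>%
  kable(.
        , escape = F
        , align = c("r", rep("c", 10))
        , col.names = c("Term", rep(c("<em>est.</em>", "CI"), times = 5))
        , caption = "<strong>Table 3</strong><br><em>Big Five Personality Trait Trajectory Associations with Life Events from Latent Growth Models</em>"
        ) %>%
  kable_classic(full_width = F, html_font = "Times") %>%
  add_header_above(hdr) %>%
  add_footnote(label = "Bold values indicate terms p < .05. ", notation = "none")

for(i in 1:nrow(rs)){
  gsoep_tab2_kable <- gsoep_tab2_kable %>%
    kableExtra::group_rows(rs$le[i], rs$start[i], rs$end[i])
}
gsoep_tab2_kable
```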
**Table 3**
*Big Five Personality Trait Trajectory Associations with Life Events from Latent Growth Models*

| Life Event | Term | E est. | E CI | A est. | A CI | C est. | C CI | N est. | N CI | O est. | O CI |
|--:|--:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| ChldBrth | Intercept | 0.12 | [0.05,0.18] | 0.005 | [-0.04,0.05] | 0.09 | [-0.02,0.20] | -0.09 | [-0.14,-0.04] | -0.04 | [-0.09,0.02] |
| | Slope | 0.21 | [-0.31,0.72] | -0.05 | [-0.44,0.35] | -1.19 | [-3.39,1.00] | 0.006 | [-0.41,0.42] | 0.07 | [-0.38,0.53] |
| | Age | -0.03 | [-0.04,-0.03] | -0.03 | [-0.04,-0.03] | -0.03 | [-0.04,-0.03] | -0.03 | [-0.04,-0.03] | -0.03 | [-0.04,-0.03] |
| ChldMvOut | Intercept | 0.15 | [0.09,0.21] | 0.02 | [-0.02,0.06] | 0.05 | [-0.06,0.16] | 0.009 | [-0.04,0.05] | 0.01 | [-0.03,0.06] |
| | Slope | -0.87 | [-1.37,-0.37] | -0.51 | [-0.87,-0.15] | -1.62 | [-4.11,0.86] | -0.65 | [-1.07,-0.23] | 0.09 | [-0.26,0.44] |
| | Age | 0.005 | [0.003,0.008] | 0.005 | [0.003,0.008] | 0.005 | [0.003,0.008] | 0.005 | [0.003,0.008] | 0.005 | [0.003,0.008] |
| DadDied | Intercept | 0.005 | [-0.06,0.07] | -0.02 | [-0.07,0.03] | -0.07 | [-0.16,0.03] | -0.04 | [-0.09,0.02] | 0.04 | [-0.02,0.10] |
| | Slope | -0.43 | [-0.90,0.04] | -0.57 | [-1.00,-0.15] | -0.92 | [-2.73,0.88] | -0.66 | [-1.13,-0.20] | 0.01 | [-0.40,0.43] |
| | Age | -0.004 | [-0.007,-0.001] | -0.004 | [-0.007,-0.001] | -0.004 | [-0.007,-0.001] | -0.004 | [-0.007,-0.001] | -0.004 | [-0.007,-0.001] |
| Divorce | Intercept | 0.12 | [0.03,0.21] | 0.05 | [-0.01,0.12] | -0.07 | [-0.20,0.05] | 0.03 | [-0.03,0.10] | 0.10 | [0.02,0.19] |
| | Slope | 0.35 | [-0.35,1.04] | -0.35 | [-0.84,0.15] | 1.01 | [-1.14,3.17] | -0.15 | [-0.66,0.36] | -0.67 | [-1.38,0.04] |
| | Age | -0.007 | [-0.01,-0.003] | -0.007 | [-0.01,-0.003] | -0.007 | [-0.01,-0.003] | -0.007 | [-0.01,-0.003] | -0.007 | [-0.01,-0.003] |
| Married | Intercept | 0.10 | [0.04,0.17] | 0.05 | [0.000,0.09] | 0.03 | [-0.08,0.14] | 0.04 | [-0.01,0.08] | -0.002 | [-0.06,0.06] |
| | Slope | 0.21 | [-0.30,0.72] | -0.06 | [-0.46,0.34] | -1.26 | [-3.47,0.95] | -0.003 | [-0.40,0.40] | 0.05 | [-0.39,0.49] |
| | Age | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] |
| MomDied | Intercept | 0.08 | [0.01,0.15] | -0.02 | [-0.07,0.02] | 0.05 | [-0.05,0.16] | 0.05 | [-0.005,0.09] | 0.03 | [-0.02,0.09] |
| | Slope | -0.82 | [-1.37,-0.26] | -0.35 | [-0.73,0.03] | -1.07 | [-3.04,0.89] | -0.02 | [-0.38,0.34] | -0.06 | [-0.47,0.35] |
| | Age | 0.007 | [0.005,0.010] | 0.007 | [0.005,0.010] | 0.007 | [0.005,0.010] | 0.007 | [0.005,0.010] | 0.007 | [0.005,0.010] |
| MoveIn | Intercept | 0.06 | [-0.01,0.13] | 0.07 | [0.02,0.12] | 0.02 | [-0.10,0.13] | 0.05 | [-0.003,0.09] | 0.08 | [0.02,0.15] |
| | Slope | 0.33 | [-0.22,0.88] | -0.61 | [-1.12,-0.09] | -1.29 | [-3.65,1.07] | -0.22 | [-0.67,0.22] | -0.28 | [-0.79,0.23] |
| | Age | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] | -0.03 | [-0.03,-0.03] |
| NewPart | Intercept | -0.13 | [-0.22,-0.04] | 0.17 | [0.10,0.24] | -0.09 | [-0.22,0.03] | 0.13 | [0.06,0.20] | -0.03 | [-0.11,0.05] |
| | Slope | 0.71 | [0.24,1.18] | 0.29 | [-0.22,0.79] | 1.21 | [-0.23,2.65] | 0.10 | [-0.47,0.67] | -0.14 | [-0.71,0.44] |
| | Age | -0.04 | [-0.04,-0.03] | -0.04 | [-0.04,-0.03] | -0.04 | [-0.04,-0.03] | -0.04 | [-0.04,-0.03] | -0.04 | [-0.04,-0.03] |
| PartDied | Intercept | 0.02 | [-0.07,0.10] | 0.04 | [-0.02,0.10] | 0.10 | [-0.11,0.30] | -0.09 | [-0.15,-0.02] | 0.10 | [0.02,0.18] |
| | Slope | -0.29 | [-0.98,0.40] | 0.48 | [-0.04,0.99] | 3.32 | [-1.64,8.27] | -0.11 | [-0.61,0.38] | -0.23 | [-0.79,0.33] |
| | Age | 0.03 | [0.03,0.03] | 0.03 | [0.03,0.03] | 0.03 | [0.03,0.03] | 0.03 | [0.03,0.03] | 0.03 | [0.03,0.03] |
| SepPart | Intercept | 0.04 | [-0.02,0.11] | 0.09 | [0.04,0.14] | -0.11 | [-0.24,0.03] | 0.07 | [0.02,0.12] | 0.09 | [0.02,0.16] |
| | Slope | 0.64 | [0.07,1.21] | -0.14 | [-0.52,0.24] | 1.92 | [-1.05,4.90] | 0.010 | [-0.38,0.40] | -0.77 | [-1.38,-0.15] |
| | Age | -0.02 | [-0.02,-0.02] | -0.02 | [-0.02,-0.02] | -0.02 | [-0.02,-0.02] | -0.02 | [-0.02,-0.02] | -0.02 | [-0.02,-0.02] |

Note: E = Extraversion, A = Agreeableness, C = Conscientiousness, N = Neuroticism, O = Openness. Bold values indicate terms p < .05.
Kabling lots of tables
The code below is a pretty chunk of code I use to generate tables for each model I run with all the model parameter estimates.
```r
all_term_tab_fun <- function(m, trait){
  long_trait <- mapvalues(trait, p_names$old, p_names$new, warn_missing = F)
  cap <- sprintf("<strong>Table SX</strong><br><em>Second Order Latent Growth Models of %s</em>", long_trait)
  note <- "Bold values indicate estimates were significant at p < .05. est. = unstandardized estimate."
  tab <- parameterEstimates(m, standardized = T) %>%
    data.frame() %>%
    format_fun() %>%
    select(lhs:est, CI, pvalue) %>%
    kable(.
          , format = "html"
          , escape = F
          , align = c("r", "c", "l", "l", rep("c", 3))
          , col.names = c("LHS", "op", "RHS", "label", "est.", "CI", "<em>p</em>")
          , caption = cap
          ) %>%
    kable_classic(full_width = F, html_font = "Times") %>%
    footnote(note)
  # save_kable(tab, file = sprintf("results/tables/all-terms/%s.html", trait))
  return(tab)
}

gsoep_nested2 <- gsoep_nested2 %>%
  mutate(all_term_tab = map2(m, trait, all_term_tab_fun))
```
Plots
The same functional, iterative approach also applies to creating plots! I teach a whole course on data visualization, so I’m not going to spend a ton of time on this. But I’ll show you an example for basic R models and for lavaan.
Plots Example 1: Base R GLM
Set up the data frame
First, we’ll use some of what we learned in the purrr workshop to set ourselves up to be able to create these tables easily, using group_by() and nest() to create nested data frames for our target personality + outcome combinations. To do this, we’ll also use what you learned about filter() and mutate().
```r
outcomes <- tribble(
  ~old,         ~new,
  "chldbrth",   "Child Birth",
  "divorced",   "Divorced",
  "married",    "Married",
  "mvInPrtner", "Move in with Partner"
)

gsoep_nested4 <- gsoep %>%
  mutate(age = age - 45) %>%
  group_by(Trait, Outcome) %>%
  nest() %>%
  ungroup()
gsoep_nested4
```
Run Models
Now, we'll run the models predicting outcomes from personality x age interactions (and their lower-order terms).
```r
mod2_fun <- function(d){
  d$o_value <- factor(d$o_value)
  glm(o_value ~ p_value*age, data = d, family = binomial(link = "logit"))
}

gsoep_nested4 <- gsoep_nested4 %>%
  mutate(m = map(data, mod2_fun))
```
Generating Model Predictions
The next step is to get predictions from the model. It is also good practice to always get standard errors and/or confidence intervals of the estimates. Thankfully, with lm() and glm(), this is easy. Lavaan will be covered briefly and more complex model forms are covered in my data visualization class.
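```r
glm_pred_fun <- function(m){
  rng_p <- range(m$model$p_value, na.rm = T)
  # prediction frame: 3 ages crossed with 30 personality values
  frame <- crossing(age = c(-15, 0, 15)
                    , p_value = seq(rng_p[1], rng_p[2], length.out = 30)
                    )
  pred <- predict(m, newdata = frame, se.fit = T)[1:2]
  frame %>%
    mutate(fit = pred$fit
           , se = pred$se.fit
           , lower = fit - 1.96*se
           , upper = fit + 1.96*se
           ) %>%
    mutate_at(vars(fit, lower, upper), exp)
}
```

Now we run the predictions.

```r
gsoep_nested4 <- gsoep_nested4 %>%
  mutate(pred = map(m, glm_pred_fun))
gsoep_nested4
```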
One of the greatest strengths of list-columns and using purrr is what I think of as the data accordion. With list-columns, I can unnest() whatever level the models were estimated at and then reaggregate using whatever combinations / groups make the most sense.
So here, for example, we ran models for all Big Five trait (5) x life outcome (10) combinations. But having 50 separate plots would be a little silly, so we'd much rather create one for each outcome. But I'm not going to write that code 10 times because that'd be a waste. So instead, if I reaggregate the data, I can then use a function + purrr to generate each of the plots.
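```r
gsoep_plot4 <- gsoep_nested4 %>%
  select(-data, -m) %>%
  unnest(pred) %>%
  group_by(Outcome) %>%
  nest() %>%
  ungroup()
gsoep_plot4
```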
When we write code for plots in ggplot, we have a lot of things that we end up writing over and over, especially for theme elements. To get around this, I use this little function to modify all of my theme elements in a single line.
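```r
my_theme <- function(){
  theme_classic() +
    theme(legend.position = "bottom"
          , legend.title = element_text(face = "bold", size = rel(1))
          , legend.text = element_text(face = "italic", size = rel(1))
          , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
          , axis.title = element_text(face = "bold", size = rel(1.2))
          , panel.grid.major = element_line(color = "grey90", linewidth = .2)
          , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
          , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
          , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
          , strip.background = element_rect(fill = "black")
          )
}
```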
Like I mentioned, once we reaggregate the data, we can then write a function to generate the plot for each outcome separately. Let's start by building the basic code for the plot. We need it to:

1. Have x = personality, y = predicted values, separate lines for moderator levels, and separate panels for different traits
2. Have a line indicating personality-outcome associations (again, separately across age groups) and a ribbon with the confidence interval around the prediction
```r
plot_fun <- function(d){
  d %>%
    mutate(age_fac = factor(age, c(-15, 0, 15), c("30", "45", "60"))) %>%
    ggplot(aes(x = p_value, y = fit, color = age_fac)) +
    geom_ribbon(aes(ymin = lower, ymax = upper, fill = age_fac)
                , alpha = .4
                ) +
    geom_line() +
    facet_wrap(~Trait, scales = "free_x") +
    my_theme()
}
```
With our reaggregated data, we can easily just run the plots and view the results.
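```r
gsoep_plot4 <- gsoep_plot4 %>%
  mutate(p = map(data, plot_fun))
gsoep_plot4$p[[1]]
```

This is getting there, but let's make some additional modifications to get the plots publication ready: full personality trait names, the outcome in the title, and better axis and other labels.

```r
plot_fun <- function(d, outcome){
  out <- mapvalues(outcome, outcomes$old, outcomes$new, warn_missing = F)
  d %>%
    mutate(age_fac = factor(age, c(-15, 0, 15), c("30", "45", "60"))
           , Trait = factor(Trait, p_names$old, p_names$new)) %>%
    ggplot(aes(x = p_value, y = fit, color = age_fac)) +
    geom_ribbon(aes(ymin = lower, ymax = upper, fill = age_fac)
                , alpha = .4
                ) +
    geom_line() +
    labs(x = "Personality Trait Level"
         , y = "Predicted Value (CI)"
         , color = "Age"
         , fill = "Age"
         , title = sprintf("%s: Personality x Age Interaction", out)
         ) +
    facet_wrap(~Trait, scales = "free_x") +
    my_theme() +
    theme(legend.position = c(.8, .25))
}

gsoep_plot4 <- gsoep_plot4 %>%
  mutate(p = map2(data, Outcome, plot_fun))
gsoep_plot4$p[[1]]
```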
Lavaan is trickier. Because of the way residual variance, etc. is calculated in SEM, it's much less straightforward to estimate standard errors and confidence intervals of predictions (not to mention that lavaan just doesn't make predictions easy). So instead, I just use some basic matrix algebra and bootstrapping to get these. If you aren't comfortable with basic matrix algebra, you can just write a little function that uses the terms individually.
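The function below pulls the estimated intercept and slope means from the mapped model m, bootstraps the model, and computes percentile confidence bands for the predicted trajectories.

```r
lavaan_pred_fun <- function(m){
  # intercept and slope means from the fitted model
  coef <- coef(m)[c("i~1", "s~1")]
  b <- bootstrapLavaan(m, parallel = "multicore", R = 100)
  frame <- tibble(intercept = 1, wave = seq(-1, 1, length.out = 50))
  # predicted trajectory for one bootstrapped intercept/slope pair
  pred_boot <- function(x, y) bind_cols(frame, pred = as.vector(as.matrix(frame) %*% c(x, y)))
  frame %>%
    mutate(pred = as.vector(as.matrix(frame) %*% coef)) %>%
    left_join(tibble(sample = 1:100, pred = map2(b[,"i~1"], b[,"s~1"], pred_boot)) %>%
                unnest(pred) %>%
                group_by(wave) %>%
                summarize(lower = quantile(pred, probs = .025)
                          , upper = quantile(pred, probs = .975))
              )
}

gsoep_nested2 <- gsoep_nested2 %>%
  mutate(pred = map(m, lavaan_pred_fun))
```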
Once we’ve estimated the predictions, we can graph them just like we did for the previous example. (Again, a full discussion of plotting is beyond the scope of this course and is covered for a whole quarter in my data visualization class instead.)
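```r
gsoep_nested2 %>%
  select(-data, -m, -summary) %>%
  unnest(pred) %>%
  ggplot(aes(x = wave + 1, y = pred)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = trait)
              , alpha = .4
              ) +
  geom_line() +
  scale_y_continuous(limits = c(1, 7), breaks = seq(1, 7, 1)) +
  scale_x_continuous(limits = c(-.25, 2.25), breaks = 0:2, labels = c(2005, 2009, 2013)) +
  labs(x = "Personality Trait Level"
       , y = "Predicted Value (CI)"
       , title = "Personality Trajectories"
       ) +
  facet_wrap(~trait, scales = "free_x") +
  my_theme() +
  theme(legend.position = "none")
```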