Week 5: Visualizing Time Series and Trends

Emorie D Beck

Quick Side Note: Custom Themes

  • We have a lot to cover today, so I’m going to skip over some of the usual “how to start with the basics and make it aesthetically pleasing”
  • Instead, we’ll create a custom these that captures some of our usual additions
  • This will save us both time and text!
  • I highly recommend doing this in your own work
my_theme <- function(){
  theme_bw() + 
  theme(
    legend.position = "bottom"
    , legend.title = element_text(face = "bold", size = rel(1))
    , legend.text = element_text(face = "italic", size = rel(1))
    , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
    , axis.title = element_text(face = "bold", size = rel(1.2))
    , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
    , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
    , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
    , strip.background = element_rect(fill = "black")
    )
}

Time Series

What is a time series?

“a series of values of a quantity obtained at successive times, often with equal intervals between them”

What is a time series?

# A tibble: 100 × 2
       t value
   <int> <dbl>
 1     1     5
 2     2     5
 3     3     3
 4     4     4
 5     5     4
 6     6     4
 7     7     4
 8     8     3
 9     9     3
10    10     2
11    11     3
12    12     4
13    13     4
14    14     3
15    15     3
# ℹ 85 more rows

Who should care about time series?

People who:

  • study longitudinal change (e.g., development)

  • study variability (e.g., experience sampling, passive sensing)

  • run experiments with multiple trials

  • study cohort or age differences

  • simulations (e.g., trace plots in bayesian models)

  • Time is everywhere, and ignoring it can be problematic

What will we cover with time series:

  • Univariate time series
  • Multivariate time series
  • Connected scatter plots
  • Smoothing
  • Detrended time series

This isn’t the first time we’ve seen time series, but today we’ll focus on telling stories with time series

Univariate and Multivariate Time Series

Why visualize a time series if you don’t care about the trend?

  • This is another way to describe your data that can make sure that you see if something went wrong

Univariate and Multivariate Time Series

  • How you visualize the trends you are trying to uncover in a time series will depend on the research question you are asking
    • e.g., very basic time series visualizations are great for descriptives
    • But to include it in a presentation / papers, we usually want to add more affordances that clarify nothing went wrong
    • Affordances include, text, shading, and more, in aligment with Gestalt principles and how we process different aspects of visualizations

But First, Our Data

  • These are some Experience Sampling Method data I collected during my time in graduate school Beck & Jackson (2022)
  • In that paper I built personalized machine learning models of behaviors and experiences from sets of:
    • psychological
    • situational
    • and time variables
  • We also saw these in Week 2
load(url("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/01-data/ipcs_data.RData"))
ipcs_data %>% 
  print(n = 6)
# A tibble: 4,222 × 70
  SID   Full_Date    afraid angry attentive content excited goaldir guilty happy
  <chr> <chr>         <dbl> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>
1 02    2018-10-22 …      1     2         4       4       2       5      2     3
2 02    2018-10-22 …      1     1         4       3       2       5      1     3
3 02    2018-10-23 …      2     1         2       3       1       2      2     3
4 02    2018-10-23 …      2     2         4       3       2       4      1     3
5 02    2018-10-23 …      2     1         4       4       3       4      1     3
6 02    2018-10-24 …      2     1         4       4       2       4      1     3
# ℹ 4,216 more rows
# ℹ 60 more variables: proud <dbl>, purposeful <dbl>,
#   agreeableness_Compassion <dbl>, agreeableness_Respectfulness <dbl>,
#   agreeableness_Trust <dbl>, conscientiousness_Organization <dbl>,
#   conscientiousness_Productiveness <dbl>,
#   conscientiousness_Responsibility <dbl>, extraversion_Assertiveness <dbl>,
#   extraversion_Energy.Level <dbl>, extraversion_Sociability <dbl>, …

But First, Our Data

Let’s simplify a bit and say we care about 4 different states for two people:

ipcs_data %>%
  filter(SID == c("216")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)
# A tibble: 108 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 216   2019-12-09 12:08     1       3       4       4      2
 2 216   2019-12-09 16:06     2       3       4       4      1
 3 216   2019-12-09 20:14     3       3       3       3      1
 4 216   2019-12-10 12:02     4       3       4       3      2
 5 216   2019-12-10 16:08     5       2       4       3      2
 6 216   2019-12-10 20:05     6       3       3       4      2
 7 216   2019-12-10 8:01      7       2       4       3      2
 8 216   2019-12-11 12:29     8       3       3       3      2
 9 216   2019-12-11 16:05     9       2       3       3      2
10 216   2019-12-11 20:10    10       3       4       3      1
# ℹ 98 more rows
ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)
# A tibble: 48 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 02    2018-10-22 13:23     1       2       5       4      2
 2 02    2018-10-22 17:23     2       2       5       3      1
 3 02    2018-10-23 10:00     3       1       2       3      2
 4 02    2018-10-23 13:24     4       2       4       3      1
 5 02    2018-10-23 17:53     5       3       4       4      1
 6 02    2018-10-24 10:00     6       2       4       4      1
 7 02    2018-10-24 13:23     7       2       4       3      1
 8 02    2018-10-24 17:29     8       4       4       4      1
 9 02    2018-10-24 21:23     9       3       3       3      2
10 02    2018-10-25 13:29    10       3       3       4      2
# ℹ 38 more rows

Univariate Time Series

It’s hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes:

p <- ipcs_data %>%
  filter(SID == c("02")) %>%
  select(
    SID, Full_Date, beep, excited
    , goaldir, content, guilty
    ) %>%
  ggplot(aes(x = beep, y = excited)) + 
    my_theme(); p

Univariate Time Series

It’s hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes:

p + 
  geom_point()

Univariate Time Series

It’s easier to make much sense of this because we can just follow the line:

p + 
    geom_line() + 
    geom_point()

Univariate Time Series

But often in time series, we won’t want / need to plot the points:

Take a moment and clean up this figure.

p + 
  geom_line()

Univariate Time Series

One way to highlight increases is using trend lines:

p + 
    geom_line(color = "grey", size = .9) + 
    geom_smooth(method = "lm", formula = y ~ poly(x,2)) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
      , caption = "y ~ x2"
    ) 

Univariate Time Series

Another way to highlight changes is to use area, but our scale doesn’t start at zero, so this plot is misleading.

p + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,5), breaks = seq(1,5,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
    ) + 
    my_theme()

Univariate Time Series

Let’s build this up:

p <- ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content-1)) + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    my_theme(); p

Univariate Time Series

Add scales and labels:

p <- p + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
    ); p 

Univariate Time Series

Add some text to highlight what we want to show:

p + 
    geom_vline(aes(xintercept = 28), linetype = "dashed", size = 1) + 
    annotate("text", label = "Week 1", x = 15, y = 0.1, hjust = .5) + 
    annotate("text", label = "Week 2", x = 40, y = 0.1, hjust = .5) 

Multivariate Time Series

  • We can also apply the same principles to either:
    • the same variable across participants
    • different variables within the same participant

Multivariate Time Series: Mini Multiples Plots

Multivariate Time Series: Mini Multiples Plots

subs <- ipcs_data %>% group_by(SID) %>% tally() %>% arrange(desc(n)) %>% slice_head(n = 20) %>% pull(SID)
ipcs_data %>% 
  filter(SID %in% subs) %>%
  select(SID, Full_Date, beep, content) %>%
  ggplot(aes(x = beep, y = content)) + 
  geom_line() + 
  geom_point(size = .5) + 
  facet_wrap(~SID, nrow = 4) + 
  my_theme()

Multivariate Time Series: Mini Multiples Plots

nested_minis <- ipcs_data %>%
  group_by(SID) %>%
  filter(n() > 30) %>%
  ungroup() %>%
  select(SID) %>%
  distinct() %>%
  mutate(id2 = 1:n())
nested_minis <- nested_minis %>% 
  inner_join(ipcs_data) %>% 
  mutate(gr = ifelse(id2 <= 20, 1, ifelse(id2 <= 40, 2, ifelse(id2 <= 60, 3, 4)))) %>%
  group_by(gr) %>%
  nest() %>%
  ungroup()
mini_mult_plot <- function(d){
  d %>% 
    ggplot(aes(x = beep, y = content)) + 
      geom_line() + 
      geom_point(size = .5) + 
      facet_wrap(~SID, nrow = 4) + 
      my_theme()
}

nested_minis <- nested_minis %>%
  mutate(p = map(data, mini_mult_plot)); nested_minis
# A tibble: 4 × 3
     gr data                  p     
  <dbl> <list>                <list>
1     1 <tibble [1,121 × 72]> <gg>  
2     2 <tibble [1,321 × 72]> <gg>  
3     3 <tibble [1,170 × 72]> <gg>  
4     4 <tibble [610 × 72]>   <gg>  

Multivariate Time Series: Same variable; different participants

One way to build a multivariate time series figure is using the color aesthetic:

p2 <- ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID)) + 
    my_theme(); p2

Multivariate Time Series: Same variable; different participants

One way to build a multivariate time series figure is using the color aesthetic, improved with our own color scale:

p3 <- p2 + 
    scale_color_manual(values = c("darkblue", "orange3")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Momentary Contentedness (1-5)"
      , color = "Participant ID"
    ) + 
    my_theme() +
    theme(legend.position = "bottom"); p3

Multivariate Time Series: Same variable; different participants

Remember, legends add cognitive load, so let’s add labels:

p3 + 
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "orange3", x = 18, y = 1.75, hjust = 0) 

Multivariate Time Series: Same variable; different participants

Let’s (1) add the story with a title, (2) change the colors to match the story, (3) add trend lines:

p2 + 
    geom_smooth(aes(color = SID), method = "lm",se = F) +
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "grey60", x = 18, y = 1.75, hjust = 0) + 
    scale_color_manual(values = c("darkblue", "grey60")) + 
    labs(
      title = "Both Participants' Contentedness Increased"
      , subtitle = "But Participant 5 remained more content on average"
    )

Multivariate Time Series: Different variables; same participant

Basic Syntax:

p4 <- ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    geom_line(aes(color = item)) + 
    my_theme(); p4

Multivariate Time Series: Different variables; same participant

Let’s improve the colors and add some labels

p4 <- p4 + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) + 
    theme(legend.position = "none"); p4

Multivariate Time Series: Different variables; same participant

Let’s (1) add a story with the title/subtitle and (2) highlight a core part of the time series

p4 + 
    annotate(
      "rect", xmin = 13, xmax = 22, ymin = 1.8
      , ymax = 3.2, fill = "orchid", alpha = .3
      ) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) 

Multivariate Time Series: Different variables; same participant

Alternatively, we can facet instead of splitting items by color:

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value-1)) + 
    geom_area(aes(fill = item), alpha = .4) + 
    geom_line(color = "orchid4", size = .9) + 
    geom_point(size = .9) + 
    scale_fill_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,50), breaks = seq(0,50,5)) +
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      , strip.background = element_rect(fill = "orchid4")
      , strip.text = element_text(face = "bold", size = rel(1.2), color = "white")
      )

Multivariate Time Series: Spaghetti Plots

  • We’ve talked several times about trends of either a single individual or aggregate trends that collapse across multiple individuals.
  • But what if we want to combine the two?
  • Let’s load in a new data set that will allow us to examine growth trajectories across longer periods of time
  • We’ve got a variety of variables assessed longitudinally for multiple individuals
  • Age is time varying, so we have both year and age as time variables to index on
load(url("https://github.com/emoriebeck/psc290-data-viz-2022/blob/main/05-week5-time-series/01-data/gsoep.RData?raw=true"))
gsoep
# A tibble: 480,798 × 11
    year   SID SRhealth satHealth marital chldbrth     gender yearBrth mortality
   <dbl> <dbl>    <dbl>     <dbl>   <dbl>    <dbl> <hvn_lbll>    <dbl>     <dbl>
 1  1984   901       NA        NA      NA       NA          2     1951         0
 2  1984  1001       NA        NA      NA       NA          2     1913         0
 3  1984  1101       NA        NA      NA       NA          2     1906         0
 4  1984  1201       NA        NA      NA       NA          1     1911         0
 5  1984  1202       NA        NA      NA       NA          2     1913         0
 6  1984  1301       NA        NA      NA       NA          2     1943         0
 7  1984  1302       NA        NA      NA       NA          1     1965         0
 8  1984  1901       NA        NA      NA       NA          2     1948         0
 9  1984  2001       NA        NA      NA       NA          2     1949         0
10  1984  2002       NA        NA      NA       NA          1     1952         0
# ℹ 480,788 more rows
# ℹ 2 more variables: job <dbl>, age <dbl>

Multivariate Time Series: Spaghetti Plots

gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  filter(SID %in% sample(unique(.$SID), 1000)) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) 
# A tibble: 13,336 × 4
     SID   age item      value
   <dbl> <dbl> <chr>     <dbl>
 1  2501    68 SRhealth      1
 2  2501    68 satHealth     2
 3 25801    60 SRhealth      3
 4 25801    60 satHealth     6
 5 27403    27 SRhealth      3
 6 27403    27 satHealth     5
 7 29402    29 SRhealth      5
 8 29402    29 satHealth    10
 9 35701    43 SRhealth      4
10 35701    43 satHealth     9
# ℹ 13,326 more rows

Multivariate Time Series: Spaghetti Plots

p6 <- gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  filter(SID %in% sample(unique(.$SID), 1000)) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value, color = item)) + 
  my_theme()

Multivariate Time Series: Spaghetti Plots

p6 + 
    geom_point(alpha = .2, size = .5, color = "grey") + 
    geom_smooth(
      aes(group = interaction(SID, item))
      , method = "lm", se = F
      , alpha = .3, size = .5
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")) + 
    facet_wrap(~item) + 
    theme(legend.position = "none")

Multivariate Time Series: Spaghetti Plots

  • But this is a longitudinal change, so we can’t just do that. We have to estimate the growth models first

  • The model we’ll use is:

  • \(Y_{ij} = \beta_{0j} + \beta_{1j}*age_{ij} + \epsilon_{ij}\)

    • \(\beta_{0j} = \gamma_{00} + u_{0j}\)
    • \(\beta_{1j} = \gamma_{10} + u_{1j}\)

\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]

\[ \begin{align*} \Biggr[ \begin{array}{c} u_{0j}\\ u_{1j}\end{array} \Biggr] \sim \mathcal{N} \left( \Biggr[ \begin{array}{c} 0\\ 0\end{array} \Biggr], \begin{array}{cc} \tau_{00}^2 & \tau_{10} \\ \tau_{10} & \tau_{11}^2 \end{array} \right) \end{align*} \]

  • \(\beta_{0j}\) is the intercept for person \(j\) at time \(i\)
  • \(\beta_{1j}\) is the linear change for person \(j\)
library(lme4)
lmer_fun <- function(d) lmer(value ~ 1 + age + (1 + age | SID), data = d)

m_nested <- gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(age = age/10) %>%
  group_by(item) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    m = map(data, lmer_fun)
    , tidy = map(m, broom.mixed::tidy)); m_nested
# A tibble: 2 × 4
  item      data                   m         tidy            
  <chr>     <list>                 <list>    <list>          
1 SRhealth  <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]>
2 satHealth <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]>

Multivariate Time Series: Spaghetti Plots

  • But this is a longitudinal change, so we can’t just do that. We have to estimate the growth models first

  • The model we’ll use is:

  • \(Y_{ij} = \beta_{0j} + \beta_{1j}*age_{ij} + \epsilon_{ij}\)

    • \(\beta_{0j} = \gamma_{00} + u_{0j}\)
    • \(\beta_{1j} = \gamma_{10} + u_{1j}\)

\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]

\[ \begin{align*} \Biggr[ \begin{array}{c} u_{0j}\\ u_{1j}\end{array} \Biggr] \sim \mathcal{N} \left( \Biggr[ \begin{array}{c} 0\\ 0\end{array} \Biggr], \begin{array}{cc} \tau_{00}^2 & \tau_{10} \\ \tau_{10} & \tau_{11}^2 \end{array} \right) \end{align*} \]

  • \(\beta_{0j}\) is the intercept for person \(j\) at time \(i\)
  • \(\beta_{1j}\) is the linear change for person \(j\)
m_nested %>% 
  select(-data, -m) %>%
  unnest(tidy) %>%
  print(n = 12)
# A tibble: 12 × 7
   item      effect   group    term                 estimate std.error statistic
   <chr>     <chr>    <chr>    <chr>                   <dbl>     <dbl>     <dbl>
 1 SRhealth  fixed    <NA>     (Intercept)             4.57    0.0111      410. 
 2 SRhealth  fixed    <NA>     age                    -0.245   0.00242    -101. 
 3 SRhealth  ran_pars SID      sd__(Intercept)         1.05   NA            NA  
 4 SRhealth  ran_pars SID      cor__(Intercept).age   -0.822  NA            NA  
 5 SRhealth  ran_pars SID      sd__age                 0.214  NA            NA  
 6 SRhealth  ran_pars Residual sd__Observation         0.616  NA            NA  
 7 satHealth fixed    <NA>     (Intercept)             9.04    0.0261      346. 
 8 satHealth fixed    <NA>     age                    -0.488   0.00575     -84.7
 9 satHealth ran_pars SID      sd__(Intercept)         2.42   NA            NA  
10 satHealth ran_pars SID      cor__(Intercept).age   -0.818  NA            NA  
11 satHealth ran_pars SID      sd__age                 0.510  NA            NA  
12 satHealth ran_pars Residual sd__Observation         1.47   NA            NA  

Multivariate Time Series: Spaghetti Plots

fx_pred_fun <- function(m){
  tibble(age = seq(2,9.9, .1)) %>%
    mutate(
      pred = predict(m, newdata = ., re.form = NA)
    )
}

rx_pred_fun <- function(m, d){
  d %>%
    select(SID, age) %>%
    distinct() %>%
    mutate(pred = predict(m, newdata = .))
}

m_nested <- m_nested %>%
  mutate(fx_pred = map(m, fx_pred_fun)
         , rx_pred = map2(m, data, rx_pred_fun))
# A tibble: 2 × 6
  item      data                   m         tidy             fx_pred  rx_pred 
  <chr>     <list>                 <list>    <list>           <list>   <list>  
1 SRhealth  <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]> <tibble> <tibble>
2 satHealth <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]> <tibble> <tibble>

Multivariate Time Series: Spaghetti Plots

set.seed(6)
sids <- sample(unique(m_nested$data[[1]]$SID), 1000)
m_nested %>%
  select(item, rx_pred) %>%
  unnest(rx_pred) %>%
  filter(SID %in% sids) 
# A tibble: 13,812 × 4
   item       SID   age  pred
   <chr>    <dbl> <dbl> <dbl>
 1 SRhealth 12301   5.1  2.28
 2 SRhealth 23904   2.4  2.99
 3 SRhealth 31701   4.3  3.75
 4 SRhealth 34402   4    2.49
 5 SRhealth 34701   4.7  2.21
 6 SRhealth 52802   3.6  4.08
 7 SRhealth 66202   3.3  3.38
 8 SRhealth 74002   6.9  2.64
 9 SRhealth 75201   6    2.24
10 SRhealth 75901   7    2.29
# ℹ 13,802 more rows

Multivariate Time Series: Spaghetti Plots

set.seed(6)
sids <- sample(unique(m_nested$data[[1]]$SID), 1000)
p7 <- m_nested %>%
  select(item, rx_pred) %>%
  unnest(rx_pred) %>%
  filter(SID %in% sids) %>%
  ggplot(aes(x = age*10, y = pred, color = item)) + 
    geom_line(
      aes(group = factor(SID))
      , alpha = .25
      , size = .75) + 
    facet_wrap(~item) + 
    my_theme() + 
    theme(legend.position = "none"); p7

Multivariate Time Series: Spaghetti Plots

p7 <- p7 + 
    geom_line(
      data = m_nested %>% 
        select(item, fx_pred) %>%
        unnest(fx_pred)
      , size = 1
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")); p7

Multivariate Time Series: Spaghetti Plots

p7 + 
    labs(x = "Age"
         , y = "Model Predicted Values (0-10)"
         , title = "Self-Rated Health and Health Satisfaction Decreased\nAcross the Lifespan, on Average")

A Quick Note on Smoothing

Often, we want to smooth our trajectories, like if we wanted to look at the trajectory of self-rated health and satisfaction with health across age groups

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup()
# A tibble: 597 × 5
     age marital gender satHealth SRhealth
   <dbl>   <dbl>  <dbl>     <dbl>    <dbl>
 1    20       1      1      7        4.33
 2    20       1      2      8.06     3.87
 3    20       2      2      4.5      2   
 4    20       4      1      8.01     4.04
 5    20       4      2      7.70     3.89
 6    21       1      1      8.58     4.42
 7    21       1      2      7.82     3.82
 8    21       2      2      8.67     4.67
 9    21       4      1      7.83     3.97
10    21       4      2      7.71     3.90
# ℹ 587 more rows

Smoothing

  • But we have to choose how and when to smooth carefully
  • E.g., when do we choose a linear model v. a loess line?
  • E.g., Do we do an overall trajectory or stratify by group?
gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  )
# A tibble: 1,194 × 5
     age marital gender item      value
   <dbl>   <dbl>  <dbl> <chr>     <dbl>
 1    20       1      1 SRhealth   4.33
 2    20       1      1 satHealth  7   
 3    20       1      2 SRhealth   3.87
 4    20       1      2 satHealth  8.06
 5    20       2      2 SRhealth   2   
 6    20       2      2 satHealth  4.5 
 7    20       4      1 SRhealth   4.04
 8    20       4      1 satHealth  8.01
 9    20       4      2 SRhealth   3.89
10    20       4      2 satHealth  7.70
# ℹ 1,184 more rows

Smoothing

p8 <- gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) +
    facet_wrap(~item, nrow = 2) + 
    my_theme(); p8

Smoothing

p8  + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "loess") + 
    theme(legend.position = "bottom")

Smoothing

We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency:

p9 <- gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    my_theme(); p9

Smoothing

We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency:

p9 + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    theme(legend.position = "bottom")

Smoothing

p9 + 
    geom_smooth(
      aes(color = interaction(gender, marital))
      , method = "loess"
      , se = F
      ) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    labs(x = "Age", y = "Smoothed Value", color = NULL) + 
    guides(color = guide_legend(nrow = 4)) + 
    theme(legend.position = "bottom")

Slopegraphs

  • We didn’t have a chance to review slopegraphs when discussing visualizing associations
  • But they also apply to visualizing change, especially when we have two wave measures of change (e.g., pre-post designs)
  • When considered over time, these figures also give us a sense of rank-order consistency

Slopegraphs

First, let’s wrangle our data:

pomp <- function(x) (x - min(x, na.rm = T))/(max(x, na.rm = T) - min(x, na.rm = T))*10
gsoep_sg <- gsoep %>%
  select(SID, age, satHealth, SRhealth, marital, gender) %>%
  filter(age %in% c(60,70)) %>%
  drop_na() %>% 
  group_by(SID) %>%
  filter(n() == 2) %>%
  group_by(age) %>%
  mutate(SRhealth = pomp(SRhealth)) %>%
  group_by(age, SID, marital, gender) %>%
  mutate(health = mean(SRhealth, satHealth)) %>%
  group_by(age, marital, gender) %>%
  summarize(health = mean(health)) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ); gsoep_sg
# A tibble: 16 × 4
     age marital       gender health
   <dbl> <fct>         <fct>   <dbl>
 1    60 Married       Male     5.59
 2    60 Married       Female   5.67
 3    60 Separated     Male     5.08
 4    60 Separated     Female   5.62
 5    60 Widowed       Male     5.56
 6    60 Widowed       Female   5.46
 7    60 Never Married Male     4.75
 8    60 Never Married Female   6   
 9    70 Married       Male     5.45
10    70 Married       Female   5.20
11    70 Separated     Male     5   
12    70 Separated     Female   4.44
13    70 Widowed       Male     4.81
14    70 Widowed       Female   4.81
15    70 Never Married Male     5.83
16    70 Never Married Female   5   

Slopegraphs

gsoep_sg %>%
  ggplot(aes(x = age, y = health, group = interaction(marital, gender))) + 
    geom_line(aes(color = interaction(marital, gender))) + 
    geom_point(size = 2) + 
    geom_text(
      data = . %>% filter(age == 60)
      , aes(label = paste(marital, gender, sep = ", "))
      , hjust = 1
      , nudge_x = -1
      ) + 
    scale_x_continuous(
      limits = c(52, 72), breaks = c(60,70)
      , labels = c("Age 60", "Age 70"), position = "top"
      ) + 
    scale_y_continuous(position = "right") + 
    scale_color_viridis_d() + 
    labs(x = NULL
         , y = "Health"
         , title = "Only widowed men perceived worsening health\nfrom 60-70") + 
    my_theme() + 
    theme(
      legend.position = "none"
      , axis.line = element_blank()
      , panel.border = element_rect(fill = NA, linewidth = 1.5)
      )