Week 5 (Workbook) - Time Series

Author

Emorie D Beck

Workspace

Quick Side Note: Custom Themes

We have a lot to cover today, so I’m going to skip over some of the usual “how to start with the basics and make it aesthetically pleasing”
Instead, we’ll create a custom these that captures some of our usual additions
This will save us both time and text!
I highly recommend doing this in your own work

Code

my_theme <- function(){
  theme_classic() + 
  theme(
    legend.position = "bottom"
    , legend.title = element_text(face = "bold", size = rel(1))
    , legend.text = element_text(face = "italic", size = rel(1))
    , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
    , axis.title = element_text(face = "bold", size = rel(1.2))
    , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
    , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
    , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
    , strip.background = element_rect(fill = "black")
    )
}

::::

Time Series

What is a time series?

“a series of values of a quantity obtained at successive times, often with equal intervals between them”

Code

set.seed(7)
tibble(
  t = 1:100
       , value = ceiling(arima.sim(n = 100, list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)),
          sd = sqrt(.5)) + 3)
  ) %>% 
  print(n = 15)

# A tibble: 100 × 2
       t value
   <int> <dbl>
 1     1     5
 2     2     5
 3     3     3
 4     4     4
 5     5     4
 6     6     4
 7     7     4
 8     8     3
 9     9     3
10    10     2
11    11     3
12    12     4
13    13     4
14    14     3
15    15     3
# ℹ 85 more rows

Who should care about time series?

People who:

study longitudinal change (e.g., development)
study variability (e.g., experience sampling, passive sensing)
run experiments with multiple trials
study cohort or age differences
simulations (e.g., trace plots in bayesian models)
Time is everywhere, and ignoring it can be problematic

What will we cover with time series:

Univariate time series
Multivariate time series
Connected scatter plots
Smoothing
Detrended time series

Note that this isn’t the first time we’ve seen time series, but today we’ll focus on telling stories with time series

Quick clarification: Time Series v Trends

Trends can mean many things, from autocorrelation to mean-level change
However, in psychology, we often think of trends as patterns of normative longitudinal change
This is fine usage, but trends don’t have be linear / curvilinear increases / decreases
- seasonal trends
- cohort effects
- etc. are all trends, too

Univariate and Multivariate Time Series

Why visualize a time series if you don’t care about the trend?

This is another way to describe your data that can make sure that you see if something went wrong

How you visualize the trends you are trying to uncover in a time series will depend on the research question you are asking
- e.g., very basic time series visualizations are great for descriptives
- But to include it in a presentation / papers, we usually want to add more affordances that clarify nothing went wrong
- Affordances include, text, shading, and more, in aligment with Gestalt principles and how we process different aspects of visualizations

But First, Our Data

These are some Experience Sampling Method data I collected during my time in graduate school Beck & Jackson (2022)
In that paper I built personalized machine learning models of behaviors and experiences from sets of:
- psychological
- situational
- and time variables
We also saw these in Week 2

Code

load(url("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/01-data/ipcs_data.RData"))
ipcs_data %>% 
  print(n = 6)

# A tibble: 4,222 × 70
  SID   Full_Date    afraid angry attentive content excited goaldir guilty happy
  <chr> <chr>         <dbl> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl>
1 02    2018-10-22 …      1     2         4       4       2       5      2     3
2 02    2018-10-22 …      1     1         4       3       2       5      1     3
3 02    2018-10-23 …      2     1         2       3       1       2      2     3
4 02    2018-10-23 …      2     2         4       3       2       4      1     3
5 02    2018-10-23 …      2     1         4       4       3       4      1     3
6 02    2018-10-24 …      2     1         4       4       2       4      1     3
# ℹ 4,216 more rows
# ℹ 60 more variables: proud <dbl>, purposeful <dbl>,
#   agreeableness_Compassion <dbl>, agreeableness_Respectfulness <dbl>,
#   agreeableness_Trust <dbl>, conscientiousness_Organization <dbl>,
#   conscientiousness_Productiveness <dbl>,
#   conscientiousness_Responsibility <dbl>, extraversion_Assertiveness <dbl>,
#   extraversion_Energy.Level <dbl>, extraversion_Sociability <dbl>, …

Let’s simplify a bit and say we care about 4 different states for two people:

Code

ipcs_data %>%
  filter(SID == c("216")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)

# A tibble: 108 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 216   2019-12-09 12:08     1       3       4       4      2
 2 216   2019-12-09 16:06     2       3       4       4      1
 3 216   2019-12-09 20:14     3       3       3       3      1
 4 216   2019-12-10 12:02     4       3       4       3      2
 5 216   2019-12-10 16:08     5       2       4       3      2
 6 216   2019-12-10 20:05     6       3       3       4      2
 7 216   2019-12-10 8:01      7       2       4       3      2
 8 216   2019-12-11 12:29     8       3       3       3      2
 9 216   2019-12-11 16:05     9       2       3       3      2
10 216   2019-12-11 20:10    10       3       4       3      1
# ℹ 98 more rows

Univariate Time Series

It’s hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes:

Code

p <- ipcs_data %>%
  filter(SID == c("02")) %>%
  select(
    SID, Full_Date, beep, excited
    , goaldir, content, guilty
    ) %>%
  ggplot(aes(x = beep, y = excited)) + 
    my_theme(); p

It’s hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes:

Code

p + 
  geom_point()

It’s easier to make much sense of this because we can just follow the line:

Code

p + 
    geom_line() + 
    geom_point()

But often in time series, we won’t want / need to plot the points:

Code

p + 
  geom_line()

One way to highlight increases is using trend lines:

Code

p + 
    geom_line(color = "grey", size = .9) + 
    geom_smooth(method = "lm", formula = y ~ poly(x,2)) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
      , caption = "y ~ x2"
    )

Another way to highlight changes is to use area, but our scale doesn’t start at zero, so this plot is misleading.

Code

p + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,5), breaks = seq(1,5,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
    ) + 
    my_theme()

Let’s build this up:

Code

p <- ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content-1)) + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    my_theme(); p

Add scales and labels:

Code

p <- p + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
    ); p

Add some text to highlight what we want to show:

Code

p + 
    geom_vline(aes(xintercept = 28), linetype = "dashed", size = 1) + 
    annotate("text", label = "Week 1", x = 15, y = 0.1, hjust = .5) + 
    annotate("text", label = "Week 2", x = 40, y = 0.1, hjust = .5)

Multivariate Time Series

We can also apply the same principles to either:
- the same variable across participants
- different variables within the same participant

Mini Multiples Plots

Code

subs <- ipcs_data %>% group_by(SID) %>% tally() %>% arrange(desc(n)) %>% slice_head(n = 20) %>% pull(SID)
ipcs_data %>% 
  filter(SID %in% subs) %>%
  select(SID, Full_Date, beep, content) %>%
  ggplot(aes(x = beep, y = content)) + 
  geom_line() + 
  geom_point(size = .5) + 
  facet_wrap(~SID, nrow = 4) + 
  my_theme()

Code

nested_minis <- ipcs_data %>%
  group_by(SID) %>%
  filter(n() > 30) %>%
  ungroup() %>%
  select(SID) %>%
  distinct() %>%
  mutate(id2 = 1:n())
nested_minis <- nested_minis %>% 
  inner_join(ipcs_data) %>% 
  mutate(gr = ifelse(id2 <= 20, 1, ifelse(id2 <= 40, 2, ifelse(id2 <= 60, 3, 4)))) %>%
  group_by(gr) %>%
  nest() %>%
  ungroup()

Code

mini_mult_plot <- function(d){
  d %>% 
    ggplot(aes(x = beep, y = content)) + 
      geom_line() + 
      geom_point(size = .5) + 
      facet_wrap(~SID, nrow = 4) + 
      my_theme()
}

nested_minis <- nested_minis %>%
  mutate(p = map(data, mini_mult_plot)); nested_minis

Same variable; different participants

One way to build a multivariate time series figure is using the color aesthetic:

Code

p2 <- ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID)) + 
    my_theme(); p2

One way to build a multivariate time series figure is using the color aesthetic, improved with our own color scale:

Code

p3 <- p2 + 
    scale_color_manual(values = c("darkblue", "orange3")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Momentary Contentedness (1-5)"
      , color = "Participant ID"
    ) + 
    my_theme() +
    theme(legend.position = "bottom"); p3

Remember, legends add cognitive load, so let’s add labels:

Code

p3 + 
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "orange3", x = 18, y = 1.75, hjust = 0)

Let’s (1) add the story with a title, (2) change the colors to match the story, (3) add trend lines:

Code

p2 + 
    geom_smooth(aes(color = SID), method = "lm",se = F) +
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "grey60", x = 18, y = 1.75, hjust = 0) + 
    scale_color_manual(values = c("darkblue", "grey60")) + 
    labs(
      title = "Both Participants' Contentedness Increased"
      , subtitle = "But Participant 5 remained more content on average"
    )

Different variables; Same Participant

Basic Syntax:

Code

p4 <- ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    geom_line(aes(color = item)) + 
    my_theme(); p4

Let’s improve the colors and add some labels

Code

p4 <- p4 + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) + 
    theme(legend.position = "none"); p4

Let’s (1) add a story with the title/subtitle and (2) highlight a core part of the time series

Code

p4 + 
    annotate(
      "rect", xmin = 13, xmax = 22, ymin = 1.8
      , ymax = 3.2, fill = "orchid", alpha = .3
      ) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    )

Alternatively, we can facet instead of splitting items by color:

Code

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value-1)) + 
    geom_area(aes(fill = item), alpha = .4) + 
    geom_line(color = "orchid4", size = .9) + 
    geom_point(size = .9) + 
    scale_fill_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,50), breaks = seq(0,50,5)) +
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      , strip.background = element_rect(fill = "orchid4")
      , strip.text = element_text(face = "bold", size = rel(1.2), color = "white")
      )

Spaghetti Plots

We’ve talked several times about trends of either a single individual or aggregate trends that collapse across multiple individuals.
But what if we want to combine the two?
Let’s load in a new data set that will allow us to examine growth trajectories across longer periods of time
We’ve got a variety of variables assessed longitudinally for multiple individuals
Age is time varying, so we have both year and age as time variables to index on

Code

load(url("https://github.com/emoriebeck/psc290-data-viz-2022/blob/main/05-week5-time-series/01-data/gsoep.RData?raw=true"))
gsoep

Code

gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  filter(SID %in% sample(unique(.$SID), 1000)) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  )

Code

p6 <- gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  filter(SID %in% sample(unique(.$SID), 1000)) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value, color = item)) + 
  my_theme()
p6

Code

p6 + 
    geom_point(alpha = .2, size = .5, color = "grey") + 
    geom_smooth(
      aes(group = interaction(SID, item))
      , method = "lm", se = F
      , alpha = .3, size = .5
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")) + 
    facet_wrap(~item) + 
    theme(legend.position = "none")

Model Predictions

But this is a longitudinal change, so we can’t just do that. We have to estimate the growth models first
The model we’ll use is:
$Y_{ij} = \beta_{0j} + \beta_{1j}*age_{ij} + \epsilon_{ij}$
- $\beta_{0j} = \gamma_{00} + u_{0j}$
- $\beta_{1j} = \gamma_{10} + u_{1j}$

\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]

\[ \begin{align*} \Biggr[ \begin{array}{c} u_{0j}\\ u_{1j}\end{array} \Biggr] \sim \mathcal{N} \left( \Biggr[ \begin{array}{c} 0\\ 0\end{array} \Biggr], \begin{array}{cc} \tau_{00}^2 & \tau_{10} \\ \tau_{10} & \tau_{11}^2 \end{array} \right) \end{align*} \]

$\beta_{0j}$ is the intercept for person $j$ at time $i$
$\beta_{1j}$ is the linear change for person $j$

Code

library(lme4)
lmer_fun <- function(d) lmer(value ~ 1 + age + (1 + age | SID), data = d)

m_nested <- gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(age = age/10) %>%
  group_by(item) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    m = map(data, lmer_fun)
    , tidy = map(m, broom.mixed::tidy)); m_nested

Code

m_nested %>% 
  select(-data, -m) %>%
  unnest(tidy) %>%
  print(n = 12)

# A tibble: 12 × 7
   item      effect   group    term                 estimate std.error statistic
   <chr>     <chr>    <chr>    <chr>                   <dbl>     <dbl>     <dbl>
 1 SRhealth  fixed    <NA>     (Intercept)             4.57    0.0111      410. 
 2 SRhealth  fixed    <NA>     age                    -0.245   0.00242    -101. 
 3 SRhealth  ran_pars SID      sd__(Intercept)         1.05   NA            NA  
 4 SRhealth  ran_pars SID      cor__(Intercept).age   -0.822  NA            NA  
 5 SRhealth  ran_pars SID      sd__age                 0.214  NA            NA  
 6 SRhealth  ran_pars Residual sd__Observation         0.616  NA            NA  
 7 satHealth fixed    <NA>     (Intercept)             9.04    0.0261      346. 
 8 satHealth fixed    <NA>     age                    -0.488   0.00575     -84.7
 9 satHealth ran_pars SID      sd__(Intercept)         2.42   NA            NA  
10 satHealth ran_pars SID      cor__(Intercept).age   -0.818  NA            NA  
11 satHealth ran_pars SID      sd__age                 0.510  NA            NA  
12 satHealth ran_pars Residual sd__Observation         1.47   NA            NA

Code

fx_pred_fun <- function(m){
  tibble(age = seq(2,9.9, .1)) %>%
    mutate(
      pred = predict(m, newdata = ., re.form = NA)
    )
}

rx_pred_fun <- function(m, d){
  d %>%
    select(SID, age) %>%
    distinct() %>%
    mutate(pred = predict(m, newdata = .))
}

m_nested <- m_nested %>%
  mutate(fx_pred = map(m, fx_pred_fun)
         , rx_pred = map2(m, data, rx_pred_fun))

Code

set.seed(6)
sids <- sample(unique(m_nested$data[[1]]$SID), 1000)
m_nested %>%
  select(item, rx_pred) %>%
  unnest(rx_pred) %>%
  filter(SID %in% sids)

Code

set.seed(6)
sids <- sample(unique(m_nested$data[[1]]$SID), 1000)
p7 <- m_nested %>%
  select(item, rx_pred) %>%
  unnest(rx_pred) %>%
  filter(SID %in% sids) %>%
  ggplot(aes(x = age*10, y = pred, color = item)) + 
    geom_line(
      aes(group = factor(SID))
      , alpha = .25
      , size = .75) + 
    facet_wrap(~item) + 
    my_theme() + 
    theme(legend.position = "none"); p7

Code

p7 <- p7 + 
    geom_line(
      data = m_nested %>% 
        select(item, fx_pred) %>%
        unnest(fx_pred)
      , size = 1
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")); p7

Code

p7 + 
    labs(x = "Age"
         , y = "Model Predicted Values (0-10)"
         , title = "Self-Rated Health and Health Satisfaction Decreased\nAcross the Lifespan, on Average")

A Quick Note on Smoothing

Often, we want to smooth our trajectories, like if we wanted to look at the trajectory of self-rated health and satisfaction with health across age groups

Code

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup()

But we have to choose how and when to smooth carefully
E.g., when do we choose a linear model v. a loess line?
E.g., Do we do an overall trajectory or stratify by group?

Code

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  )

Code

p8 <- gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) +
    facet_wrap(~item, nrow = 2) + 
    my_theme(); p8

Code

p8  + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "loess") + 
    theme(legend.position = "bottom")

We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency:

Code

p9 <- gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  mutate(gender = haven::zap_labels(gender)) %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    my_theme(); p9

We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency:

Code

p9 + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    theme(legend.position = "bottom")

Code

p9 + 
    geom_smooth(
      aes(color = interaction(gender, marital))
      , method = "loess"
      , se = F
      ) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    labs(x = "Age", y = "Smoothed Value", color = NULL) + 
    guides(color = guide_legend(nrow = 4)) + 
    theme(legend.position = "bottom")

Slopegraphs

We didn’t have a chance to review slopegraphs when discussing visualizing associations
But they also apply to visualizing change, especially when we have two wave measures of change (e.g., pre-post designs)
When considered over time, these figures also give us a sense of rank-order consistency

First, let’s wrangle our data:

Code

pomp <- function(x) (x - min(x, na.rm = T))/(max(x, na.rm = T) - min(x, na.rm = T))*10
gsoep_sg <- gsoep %>%
  select(SID, age, satHealth, SRhealth, marital, gender) %>%
  filter(age %in% c(60,70)) %>%
  drop_na() %>% 
  group_by(SID) %>%
  filter(n() == 2) %>%
  group_by(age) %>%
  mutate(SRhealth = pomp(SRhealth)) %>%
  group_by(age, SID, marital, gender) %>%
  mutate(health = mean(SRhealth, satHealth)) %>%
  group_by(age, marital, gender) %>%
  summarize(health = mean(health)) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ); gsoep_sg

Code

gsoep_sg %>%
  ggplot(aes(x = age, y = health, group = interaction(marital, gender))) + 
    geom_line(aes(color = interaction(marital, gender))) + 
    geom_point(size = 2) + 
    geom_text(
      data = . %>% filter(age == 60)
      , aes(label = paste(marital, gender, sep = ", "))
      , hjust = 1
      , nudge_x = -1
      ) + 
    scale_x_continuous(
      limits = c(52, 72), breaks = c(60,70)
      , labels = c("Age 60", "Age 70"), position = "top"
      ) + 
    scale_y_continuous(position = "right") + 
    scale_color_viridis_d() + 
    labs(x = NULL
         , y = "Health"
         , title = "Only widowed men perceived worsening health\nfrom 60-70") + 
    my_theme() + 
    theme(
      legend.position = "none"
      , axis.line = element_blank()
      , panel.border = element_rect(fill = NA, linewidth = 1.5)
      )

--- title: "Week 5 (Workbook) - Time Series" author: "Emorie D Beck" format: html: code-tools: true code-copy: true code-line-numbers: true code-link: true theme: united highlight-style: tango df-print: paged code-fold: show toc: true toc-float: true self-contained: true # height: 900 footer: "PSC 290 - Data Visualization" logo: "https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/01-week1-intro/02-code/02-images/ucdavis_logo_blue.png" editor_options: chunk_output_type: console --- # Workspace ```{r, echo = F} knitr::opts_chunk$set(echo = TRUE, warning = F, message = F, error = F, out.width = "90%") options(knitr.kable.NA = '') ``` ```{r, echo = F} library(RColorBrewer) library(knitr) library(kableExtra) library(plyr) library(broom) library(tidyverse) ``` ## Quick Side Note: Custom Themes {.smaller} - We have a lot to cover today, so I'm going to skip over some of the usual "how to start with the basics and make it aesthetically pleasing" - Instead, we'll create a custom these that captures some of our usual additions\ - This will save us both time and text!\ - I highly recommend doing this in your own work ```{r} my_theme <- function(){ theme_classic() + theme( legend.position = "bottom" , legend.title = element_text(face = "bold", size = rel(1)) , legend.text = element_text(face = "italic", size = rel(1)) , axis.text = element_text(face = "bold", size = rel(1.1), color = "black") , axis.title = element_text(face = "bold", size = rel(1.2)) , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5) , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5) , strip.text = element_text(face = "bold", size = rel(1.1), color = "white") , strip.background = element_rect(fill = "black") ) } ``` :::: # Time Series ## What is a time series? > "a series of values of a quantity obtained at successive times, often with equal intervals between them" ```{r, fig.align='center', out.width="70%", echo = F} knitr::include_graphics("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/02-images/bpm-trend.png") ``` ```{r} set.seed(7) tibble( t = 1:100 , value = ceiling(arima.sim(n = 100, list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)), sd = sqrt(.5)) + 3) ) %>% print(n = 15) ``` ```{r, echo=F, fig.show='show', fig.align='right', fig.dim= c(8,4), out.width="120%"} set.seed(7) tibble( t = 1:100 , value = ceiling(arima.sim(n = 100, list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)), sd = sqrt(.5)) + 3) ) %>% ggplot(aes(x = t, y = value)) + geom_line(color = "seagreen4", size = .75) + scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1)) + labs(x = "Time Point (1-100)" , y = "Value" , title = "Sample Time Series Plot") + my_theme() ``` ## Who should care about time series? People who: - study longitudinal change (e.g., development) - study variability (e.g., experience sampling, passive sensing) - run experiments with multiple trials - study cohort or age differences - simulations (e.g., trace plots in bayesian models) - Time is everywhere, and ignoring it can be problematic ## What will we cover with time series: - Univariate time series\ - Multivariate time series\ - Connected scatter plots - Smoothing - Detrended time series Note that this isn't the first time we've seen time series, but today we'll focus on *telling stories with time series* ## Quick clarification: Time Series v Trends - Trends can mean many things, from autocorrelation to mean-level change - However, in psychology, we often think of trends as patterns of normative longitudinal change - This is fine usage, but trends don't have be linear / curvilinear increases / decreases - seasonal trends - cohort effects - etc. are all trends, too # Univariate and Multivariate Time Series Why visualize a time series if you don't care about the trend? - This is another way to **describe** your data that can make sure that you see if something went wrong ```{r, echo = F} knitr::include_graphics("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/02-images/eiko-tweet.png") ``` - How you visualize the trends you are trying to uncover in a time series will depend on the research question you are asking - e.g., very basic time series visualizations are great for descriptives - But to include it in a presentation / papers, we usually want to add more affordances that clarify nothing went wrong - Affordances include, text, shading, and more, in aligment with Gestalt principles and how we process different aspects of visualizations ## But First, Our Data - These are some Experience Sampling Method data I collected during my time in graduate school [Beck & Jackson (2022)](https://psyarxiv.com/syhw5) - In that paper I built personalized machine learning models of behaviors and experiences from sets of: - psychological - situational - and time variables - We also saw these in Week 2 ```{r, echo = T} load(url("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/01-data/ipcs_data.RData")) ipcs_data %>% print(n = 6) ``` ```{r, echo=F} ipcs_data <- ipcs_data %>% group_by(SID) %>% mutate(beep = 1:n()) %>% ungroup() ``` Let's simplify a bit and say we care about 4 different states for two people: ```{r} ipcs_data %>% filter(SID == c("216")) %>% select(SID, Full_Date, beep, excited , goaldir, content, guilty) %>% print(n = 10) ``` ## Univariate Time Series It's hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes: ```{r} p <- ipcs_data %>% filter(SID == c("02")) %>% select( SID, Full_Date, beep, excited , goaldir, content, guilty ) %>% ggplot(aes(x = beep, y = excited)) + my_theme(); p ``` It's hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes: ```{r} #| code-line-numbers: "|1-4|5-7" p + geom_point() ``` It's easier to make much sense of this because we can just follow the line: ```{r} p + geom_line() + geom_point() ``` But often in time series, we won't want / need to plot the points: ```{r} p + geom_line() ``` One way to highlight increases is using trend lines: ```{r} p + geom_line(color = "grey", size = .9) + geom_smooth(method = "lm", formula = y ~ poly(x,2)) + scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + labs( x = "Beep (1-50)" , y = "Self-Reported Momentary Contentedness (1-5)" , title = "Contentedness increased in the second week" , subtitle = "Participant 2" , caption = "y ~ x2" ) ``` Another way to highlight changes is to use area, but our scale doesn't start at zero, so this plot is misleading. ```{r} p + geom_area(fill = "purple4", alpha = .2) + geom_line(color = "purple4", size = .9) + scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + scale_y_continuous(limits = c(0,5), breaks = seq(1,5,1), labels = 1:5) + labs( x = "Beep (1-50)" , y = "Self-Reported Momentary Contentedness (1-5)" , title = "Contentedness increased in the second week" , subtitle = "Participant 2" ) + my_theme() ``` Let's build this up: ```{r} p <- ipcs_data %>% filter(SID == c("02")) %>% select(SID, Full_Date, beep, excited , goaldir, content, guilty) %>% ggplot(aes(x = beep, y = content-1)) + geom_area(fill = "purple4", alpha = .2) + geom_line(color = "purple4", size = .9) + my_theme(); p ``` Add scales and labels: ```{r} p <- p + scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + labs( x = "Beep (1-50)" , y = "Self-Reported Momentary Contentedness (1-5)" , title = "Contentedness increased in the second week" , subtitle = "Participant 2" ); p ``` Add some text to highlight what we want to show: ```{r} p + geom_vline(aes(xintercept = 28), linetype = "dashed", size = 1) + annotate("text", label = "Week 1", x = 15, y = 0.1, hjust = .5) + annotate("text", label = "Week 2", x = 40, y = 0.1, hjust = .5) ``` ## Multivariate Time Series - We can also apply the same principles to either: - the same variable across participants - different variables within the same participant ### Mini Multiples Plots ```{r, echo = F, fig.dim=c(12,8)} subs <- ipcs_data %>% group_by(SID) %>% tally() %>% arrange(desc(n)) %>% slice_head(n = 20) %>% pull(SID) ipcs_data %>% filter(SID %in% subs) %>% select(SID, Full_Date, beep, excited , goaldir, content, guilty) %>% ggplot(aes(x = beep, y = content)) + geom_line() + geom_point(size = .5) + labs( x = "Beep" , y = "Momentary Contentedness (1-5)" , title = "Mini Multiples Plot of Contentedness Time Series") + facet_wrap(~SID, nrow = 4) + my_theme() ``` ```{r, eval = T, fig.dim=c(12,8)} subs <- ipcs_data %>% group_by(SID) %>% tally() %>% arrange(desc(n)) %>% slice_head(n = 20) %>% pull(SID) ipcs_data %>% filter(SID %in% subs) %>% select(SID, Full_Date, beep, content) %>% ggplot(aes(x = beep, y = content)) + geom_line() + geom_point(size = .5) + facet_wrap(~SID, nrow = 4) + my_theme() ``` ```{r} nested_minis <- ipcs_data %>% group_by(SID) %>% filter(n() > 30) %>% ungroup() %>% select(SID) %>% distinct() %>% mutate(id2 = 1:n()) nested_minis <- nested_minis %>% inner_join(ipcs_data) %>% mutate(gr = ifelse(id2 <= 20, 1, ifelse(id2 <= 40, 2, ifelse(id2 <= 60, 3, 4)))) %>% group_by(gr) %>% nest() %>% ungroup() ``` ```{r} mini_mult_plot <- function(d){ d %>% ggplot(aes(x = beep, y = content)) + geom_line() + geom_point(size = .5) + facet_wrap(~SID, nrow = 4) + my_theme() } nested_minis <- nested_minis %>% mutate(p = map(data, mini_mult_plot)); nested_minis ``` ### Same variable; different participants One way to build a multivariate time series figure is using the `color` aesthetic: ```{r} p2 <- ipcs_data %>% filter(SID %in% c("05", "02")) %>% select(SID, Full_Date, beep, excited , goaldir, content, guilty) %>% ggplot(aes(x = beep, y = content)) + geom_line(aes(color = SID)) + my_theme(); p2 ``` One way to build a multivariate time series figure is using the `color` aesthetic, *improved with our own color scale*: ```{r} p3 <- p2 + scale_color_manual(values = c("darkblue", "orange3")) + scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) + labs( x = "Beep (1-50)" , y = "Momentary Contentedness (1-5)" , color = "Participant ID" ) + my_theme() + theme(legend.position = "bottom"); p3 ``` Remember, legends add cognitive load, so let's add `labels`: ```{r} p3 + annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + annotate("label", label = "Participant 5", color = "white", fill = "orange3", x = 18, y = 1.75, hjust = 0) ``` Let's (1) add the story with a title, (2) change the colors to match the story, (3) add trend lines: ```{r} p2 + geom_smooth(aes(color = SID), method = "lm",se = F) + annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + annotate("label", label = "Participant 5", color = "white", fill = "grey60", x = 18, y = 1.75, hjust = 0) + scale_color_manual(values = c("darkblue", "grey60")) + labs( title = "Both Participants' Contentedness Increased" , subtitle = "But Participant 5 remained more content on average" ) ``` ### Different variables; Same Participant Basic Syntax: ```{r} p4 <- ipcs_data %>% filter(SID == "02") %>% select(SID, Full_Date, beep, goaldir, guilty) %>% pivot_longer( cols = c(goaldir, guilty) , names_to = "item" , values_to = "value" ) %>% ggplot(aes(x = beep, y = value)) + geom_line(aes(color = item)) + my_theme(); p4 ``` Let's improve the colors and add some labels ```{r} p4 <- p4 + geom_point(size = .9) + annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + scale_color_manual(values = c("orchid4", "orchid2")) + scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) + theme(legend.position = "none"); p4 ``` Let's (1) add a story with the title/subtitle and (2) highlight a core part of the time series ```{r} p4 + annotate( "rect", xmin = 13, xmax = 22, ymin = 1.8 , ymax = 3.2, fill = "orchid", alpha = .3 ) + annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + labs( x = "Beep (1-50)" , y = "Self-Rated Momentary Value (1-5)" , title = "When goal-directedness was high, guilt was low" , subtitle = "Guilt was rarely equal to or higher than goal-directedness" ) ``` Alternatively, we can `facet` instead of splitting items by `color`: ```{r} ipcs_data %>% filter(SID == "02") %>% select(SID, Full_Date, beep, goaldir, guilty) %>% pivot_longer( cols = c(goaldir, guilty) , names_to = "item" , values_to = "value" ) %>% ggplot(aes(x = beep, y = value-1)) + geom_area(aes(fill = item), alpha = .4) + geom_line(color = "orchid4", size = .9) + geom_point(size = .9) + scale_fill_manual(values = c("orchid4", "orchid2")) + scale_x_continuous(limits = c(1,50), breaks = seq(0,50,5)) + scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + labs( x = "Beep (1-50)" , y = "Self-Rated Momentary Value (1-5)" , title = "When goal-directedness was high, guilt was low" , subtitle = "Guilt was rarely equal to or higher than goal-directedness" ) + facet_wrap(~item, nrow = 2) + theme_classic() + theme( legend.position = "none" , legend.text = element_text(face = "bold", size = rel(1.1)) , legend.title = element_text(face = "bold", size = rel(1.1)) , axis.text = element_text(face = "bold", color ="black", size = rel(1.1)) , axis.title = element_text(face = "bold", size = rel(1.1)) , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2)) , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1)) , strip.background = element_rect(fill = "orchid4") , strip.text = element_text(face = "bold", size = rel(1.2), color = "white") ) ```                                                                                                                                                                                                                                                                                                                                                                                    ### Spaghetti Plots - We've talked several times about trends of either a single individual or aggregate trends that collapse across multiple individuals. - But what if we want to combine the two? - Let's load in a new data set that will allow us to examine growth trajectories across longer periods of time - We've got a variety of variables assessed longitudinally for multiple individuals - Age is time varying, so we have both year and age as time variables to index on ```{r} load(url("https://github.com/emoriebeck/psc290-data-viz-2022/blob/main/05-week5-time-series/01-data/gsoep.RData?raw=true")) gsoep ``` ```{r} gsoep %>% select(SID, age, SRhealth, satHealth) %>% filter(age < 100 & age > 19) %>% drop_na() %>% filter(SID %in% sample(unique(.$SID), 1000)) %>% pivot_longer( cols = c(SRhealth, satHealth) , names_to = "item" , values_to = "value" ) ``` ```{r} p6 <- gsoep %>% select(SID, age, SRhealth, satHealth) %>% filter(age < 100 & age > 19) %>% drop_na() %>% filter(SID %in% sample(unique(.$SID), 1000)) %>% pivot_longer( cols = c(SRhealth, satHealth) , names_to = "item" , values_to = "value" ) %>% ggplot(aes(x = age, y = value, color = item)) + my_theme() p6 ``` ```{r} p6 + geom_point(alpha = .2, size = .5, color = "grey") + geom_smooth( aes(group = interaction(SID, item)) , method = "lm", se = F , alpha = .3, size = .5 ) + scale_color_manual(values = c("darkblue", "darkred")) + facet_wrap(~item) + theme(legend.position = "none") ``` ### Model Predictions - But this is a longitudinal change, so we can't just do that. We have to estimate the growth models first - The model we'll use is: - $Y_{ij} = \beta_{0j} + \beta_{1j}*age_{ij} + \epsilon_{ij}$ - $\beta_{0j} = \gamma_{00} + u_{0j}$ - $\beta_{1j} = \gamma_{10} + u_{1j}$ $$\epsilon \sim \mathcal{N}(0, \sigma^2)$$ $$ \begin{align*} \Biggr[ \begin{array}{c} u_{0j}\\ u_{1j}\end{array} \Biggr] \sim \mathcal{N} \left( \Biggr[ \begin{array}{c} 0\\ 0\end{array} \Biggr], \begin{array}{cc} \tau_{00}^2 & \tau_{10} \\ \tau_{10} & \tau_{11}^2 \end{array} \right) \end{align*} $$ - $\beta_{0j}$ is the intercept for person $j$ at time $i$ - $\beta_{1j}$ is the linear change for person $j$ ```{r} library(lme4) lmer_fun <- function(d) lmer(value ~ 1 + age + (1 + age | SID), data = d) m_nested <- gsoep %>% select(SID, age, SRhealth, satHealth) %>% filter(age < 100 & age > 19) %>% drop_na() %>% pivot_longer( cols = c(SRhealth, satHealth) , names_to = "item" , values_to = "value" ) %>% mutate(age = age/10) %>% group_by(item) %>% nest() %>% ungroup() %>% mutate( m = map(data, lmer_fun) , tidy = map(m, broom.mixed::tidy)); m_nested ``` ```{r} m_nested %>% select(-data, -m) %>% unnest(tidy) %>% print(n = 12) ``` ```{r} fx_pred_fun <- function(m){ tibble(age = seq(2,9.9, .1)) %>% mutate( pred = predict(m, newdata = ., re.form = NA) ) } rx_pred_fun <- function(m, d){ d %>% select(SID, age) %>% distinct() %>% mutate(pred = predict(m, newdata = .)) } m_nested <- m_nested %>% mutate(fx_pred = map(m, fx_pred_fun) , rx_pred = map2(m, data, rx_pred_fun)) ``` ```{r} set.seed(6) sids <- sample(unique(m_nested$data[[1]]$SID), 1000) m_nested %>% select(item, rx_pred) %>% unnest(rx_pred) %>% filter(SID %in% sids) ``` ```{r} set.seed(6) sids <- sample(unique(m_nested$data[[1]]$SID), 1000) p7 <- m_nested %>% select(item, rx_pred) %>% unnest(rx_pred) %>% filter(SID %in% sids) %>% ggplot(aes(x = age*10, y = pred, color = item)) + geom_line( aes(group = factor(SID)) , alpha = .25 , size = .75) + facet_wrap(~item) + my_theme() + theme(legend.position = "none"); p7 ``` ```{r} p7 <- p7 + geom_line( data = m_nested %>% select(item, fx_pred) %>% unnest(fx_pred) , size = 1 ) + scale_color_manual(values = c("darkblue", "darkred")); p7 ``` ```{r} p7 + labs(x = "Age" , y = "Model Predicted Values (0-10)" , title = "Self-Rated Health and Health Satisfaction Decreased\nAcross the Lifespan, on Average") ``` ## A Quick Note on Smoothing Often, we want to smooth our trajectories, like if we wanted to look at the trajectory of self-rated health and satisfaction with health across age groups ```{r} gsoep %>% select(SID, age, SRhealth, satHealth, marital, gender) %>% mutate(gender = haven::zap_labels(gender)) %>% drop_na() %>% filter(age >= 20 & age <= 100) %>% group_by(age, marital, gender) %>% summarize_at(vars(satHealth, SRhealth), mean) %>% ungroup() ``` - But we have to choose how and when to smooth carefully - E.g., when do we choose a linear model v. a loess line? - E.g., Do we do an overall trajectory or stratify by group? ```{r, eval = F} gsoep %>% select(SID, age, SRhealth, satHealth, marital, gender) %>% drop_na() %>% mutate(gender = haven::zap_labels(gender)) %>% filter(age >= 20 & age <= 100) %>% group_by(age, marital, gender) %>% summarize_at(vars(satHealth, SRhealth), mean) %>% ungroup() %>% pivot_longer( cols = c(SRhealth, satHealth) , names_to = "item" , values_to = "value" ) ``` ```{r} p8 <- gsoep %>% select(SID, age, SRhealth, satHealth, marital, gender) %>% mutate(gender = haven::zap_labels(gender)) %>% drop_na() %>% filter(age >= 20 & age <= 100) %>% group_by(age, marital, gender) %>% summarize_at(vars(satHealth, SRhealth), mean) %>% pivot_longer( cols = c(SRhealth, satHealth) , names_to = "item" , values_to = "value" ) %>% ggplot(aes(x = age, y = value)) + facet_wrap(~item, nrow = 2) + my_theme(); p8 ``` ```{r} p8 + geom_line(aes(color = interaction(gender, marital))) + geom_smooth(method = "loess") + theme(legend.position = "bottom") ``` We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency: ```{r} p9 <- gsoep %>% select(SID, age, SRhealth, satHealth, marital, gender) %>% drop_na() %>% mutate(gender = haven::zap_labels(gender)) %>% filter(age >= 20 & age <= 100) %>% group_by(age, marital, gender) %>% summarize_at(vars(satHealth, SRhealth), mean) %>% ungroup() %>% mutate(marital = factor( marital , 1:4 , c("Married", "Separated", "Widowed", "Never Married") ), gender = factor(gender, c(1,2), c("Male", "Female")) ) %>% pivot_longer( cols = c(SRhealth, satHealth) , names_to = "item" , values_to = "value" ) %>% ggplot(aes(x = age, y = value)) + my_theme(); p9 ``` We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency: ```{r, eval = F} p9 + geom_line(aes(color = interaction(gender, marital))) + geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + scale_color_grey(start = .01, end = .99) + facet_wrap(~item, nrow = 2) + theme(legend.position = "bottom") ``` ```{r, fig.dim=c(6,6)} p9 + geom_smooth( aes(color = interaction(gender, marital)) , method = "loess" , se = F ) + geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + scale_color_grey(start = .01, end = .99) + facet_wrap(~item, nrow = 2) + labs(x = "Age", y = "Smoothed Value", color = NULL) + guides(color = guide_legend(nrow = 4)) + theme(legend.position = "bottom") ``` ## Slopegraphs - We didn't have a chance to review slopegraphs when discussing visualizing associations - But they also apply to visualizing change, especially when we have two wave measures of change (e.g., pre-post designs) - When considered over time, these figures also give us a sense of rank-order consistency First, let's wrangle our data: ```{r} pomp <- function(x) (x - min(x, na.rm = T))/(max(x, na.rm = T) - min(x, na.rm = T))*10 gsoep_sg <- gsoep %>% select(SID, age, satHealth, SRhealth, marital, gender) %>% filter(age %in% c(60,70)) %>% drop_na() %>% group_by(SID) %>% filter(n() == 2) %>% group_by(age) %>% mutate(SRhealth = pomp(SRhealth)) %>% group_by(age, SID, marital, gender) %>% mutate(health = mean(SRhealth, satHealth)) %>% group_by(age, marital, gender) %>% summarize(health = mean(health)) %>% ungroup() %>% mutate(marital = factor( marital , 1:4 , c("Married", "Separated", "Widowed", "Never Married") ), gender = factor(gender, c(1,2), c("Male", "Female")) ); gsoep_sg ``` ```{r} gsoep_sg %>% ggplot(aes(x = age, y = health, group = interaction(marital, gender))) + geom_line(aes(color = interaction(marital, gender))) + geom_point(size = 2) + geom_text( data = . %>% filter(age == 60) , aes(label = paste(marital, gender, sep = ", ")) , hjust = 1 , nudge_x = -1 ) + scale_x_continuous( limits = c(52, 72), breaks = c(60,70) , labels = c("Age 60", "Age 70"), position = "top" ) + scale_y_continuous(position = "right") + scale_color_viridis_d() + labs(x = NULL , y = "Health" , title = "Only widowed men perceived worsening health\nfrom 60-70") + my_theme() + theme( legend.position = "none" , axis.line = element_blank() , panel.border = element_rect(fill = NA, linewidth = 1.5) ) ```