Week 5: Visualizing Time Series and Trends

Emorie D Beck

Time Series

What is a time series?

“a series of values of a quantity obtained at successive times, often with equal intervals between them”

What is a time series?

# A tibble: 100 × 2
       t value
   <int> <dbl>
 1     1     5
 2     2     5
 3     3     3
 4     4     4
 5     5     4
 6     6     4
 7     7     4
 8     8     3
 9     9     3
10    10     2
11    11     3
12    12     4
13    13     4
14    14     3
15    15     3
# … with 85 more rows

Who should care about time series?

People who:

  • study longitudinal change (e.g., development)

  • study variability (e.g., experience sampling, passive sensing)

  • run experiments with multiple trials

  • study cohort or age differences

  • simulations (e.g., trace plots in bayesian models)

  • Time is everywhere, and ignoring it can be problematic

What will we cover with time series:

  • Univariate time series
  • Multivariate time series
  • Connected scatter plots
  • Smoothing
  • Detrended time series

This isn’t the first time we’ve seen time series, but today we’ll focus on telling stories with time series

Univariate and Multivariate Time Series

Why visualize a time series if you don’t care about the trend?

  • This is another way to describe your data that can make sure that you see if something went wrong

Univariate and Multivariate Time Series

  • How you visualize the trends you are trying to uncover in a time series will depend on the research question you are asking
    • e.g., very basic time series visualizations are great for descriptives
    • But to include it in a presentation / papers, we usually want to add more affordances that clarify nothing went wrong
    • Affordances include, text, shading, and more, in aligment with Gestalt principles and how we process different aspects of visualizations

But First, Our Data

  • These are some Experience Sampling Method data I collected during my time in graduate school Beck & Jackson (2022)
  • In that paper I built personalized machine learning models of behaviors and experiences from sets of:
    • psychological
    • situational
    • and time variables
  • We also saw these in Week 2
load(url("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/01-data/ipcs_data.RData"))
ipcs_data %>% 
  print(n = 6)
# A tibble: 4,222 × 70
  SID   Full_D…¹ afraid angry atten…² content excited goaldir guilty happy proud
  <chr> <chr>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl>
1 02    2018-10…      1     2       4       4       2       5      2     3     4
2 02    2018-10…      1     1       4       3       2       5      1     3     3
3 02    2018-10…      2     1       2       3       1       2      2     3     2
4 02    2018-10…      2     2       4       3       2       4      1     3     3
5 02    2018-10…      2     1       4       4       3       4      1     3     3
6 02    2018-10…      2     1       4       4       2       4      1     3     3
# … with 4,216 more rows, 59 more variables: purposeful <dbl>,
#   agreeableness_Compassion <dbl>, agreeableness_Respectfulness <dbl>,
#   agreeableness_Trust <dbl>, conscientiousness_Organization <dbl>,
#   conscientiousness_Productiveness <dbl>,
#   conscientiousness_Responsibility <dbl>, extraversion_Assertiveness <dbl>,
#   extraversion_Energy.Level <dbl>, extraversion_Sociability <dbl>,
#   neuroticism_Anxiety <dbl>, neuroticism_Depression <dbl>, …

But First, Our Data

Let’s simplify a bit and say we care about 4 different states for two people:

ipcs_data %>%
  filter(SID == c("216")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)
# A tibble: 108 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 216   2019-12-09 12:08     1       3       4       4      2
 2 216   2019-12-09 16:06     2       3       4       4      1
 3 216   2019-12-09 20:14     3       3       3       3      1
 4 216   2019-12-10 12:02     4       3       4       3      2
 5 216   2019-12-10 16:08     5       2       4       3      2
 6 216   2019-12-10 20:05     6       3       3       4      2
 7 216   2019-12-10 8:01      7       2       4       3      2
 8 216   2019-12-11 12:29     8       3       3       3      2
 9 216   2019-12-11 16:05     9       2       3       3      2
10 216   2019-12-11 20:10    10       3       4       3      1
# … with 98 more rows
ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)
# A tibble: 48 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 02    2018-10-22 13:23     1       2       5       4      2
 2 02    2018-10-22 17:23     2       2       5       3      1
 3 02    2018-10-23 10:00     3       1       2       3      2
 4 02    2018-10-23 13:24     4       2       4       3      1
 5 02    2018-10-23 17:53     5       3       4       4      1
 6 02    2018-10-24 10:00     6       2       4       4      1
 7 02    2018-10-24 13:23     7       2       4       3      1
 8 02    2018-10-24 17:29     8       4       4       4      1
 9 02    2018-10-24 21:23     9       3       3       3      2
10 02    2018-10-25 13:29    10       3       3       4      2
# … with 38 more rows

Univariate Time Series

It’s hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = excited)) + 
    geom_point() + 
    theme_classic()

Univariate Time Series

It’s easier to make much sense of this because we can just follow the line:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = excited)) + 
    geom_line() + 
    geom_point() + 
    theme_classic()

Univariate Time Series

But often in time series, we won’t want / need to plot the points:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = excited)) + 
    geom_line() + 
    theme_classic()

Take a moment and clean up this figure.

Univariate Time Series

Take a moment and clean up this figure.

One way to highlight increases is using trend lines:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(color = "grey", size = .9) + 
    geom_smooth(method = "lm", formula = y ~ poly(x,2)) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Ccontentedness increased in the second week"
      , subtitle = "Participant 2"
      , caption = "y ~ x2"
    ) + 
    theme_classic() + 
    theme(
      axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
    )

Univariate Time Series

Another way to highlight changes is to use area:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content-1)) + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Ccontentedness increased in the second week"
      , subtitle = "Participant 2"
    ) + 
    theme_classic() + 
    theme(
      axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
    )

Univariate Time Series

Another way to highlight changes is to use area, but let’s also add some text to highlight what we want to show:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content-1)) + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    geom_vline(aes(xintercept = 28), linetype = "dashed", size = 1) + 
    annotate("text", label = "Week 1", x = 15, y = 0.1, hjust = .5) + 
    annotate("text", label = "Week 2", x = 40, y = 0.1, hjust = .5) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Ccontentedness increased in the second week"
      , subtitle = "Participant 2"
    ) + 
    theme_classic() + 
    theme(
      axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
    )

Multivariate Time Series

  • We can also apply the same principles to either:
    • the same variable across participants
    • different variables within the same participant

Multivariate Time Series

One way to build a multivariate time series figure is using the color aesthetic:

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID)) + 
    theme_classic()

The same variable across participants

Oof, this looks a little rough

Multivariate Time Series

One way to build a multivariate time series figure is using the color aesthetic, improved with our own color scale:

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID)) + 
    scale_color_manual(values = c("darkblue", "orange3")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , color = "Participant ID"
    ) + 
    theme_classic() +
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      )

The same variable across participants

Multivariate Time Series

One way to build a multivariate time series figure is using the color aesthetic. Bur remember, legends add cognitive load, so let’s add labels:

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID), size = .8) + 
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "orange3", x = 18, y = 1.75, hjust = 0) + 
    scale_color_manual(values = c("darkblue", "orange3")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , color = "Participant ID"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      )

The same variable across participants

But what’s the story here?

Multivariate Time Series

Let’s (1) add the story with a title, (2) change the colors to match the story, (3) add trend lines:

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID), size = .8) +
    geom_smooth(aes(color = SID), method = "lm",se = F) +
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "grey60", x = 18, y = 1.75, hjust = 0) + 
    scale_color_manual(values = c("darkblue", "grey60")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , color = "Participant ID"
      , title = "Both Participants' Contentedness Increased"
      , subtitle = "But Participant 5 remained more content on average"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      )

The same variable across participants

Multivariate Time Series

Basic Syntax:

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    geom_line(aes(color = item)) + 
    theme_classic()

Different variables within the same participant

Multivariate Time Series

Let’s improve the colors and add some labels (I’m feeling lavendar haze) ::::{.columns} :::{.column}

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    geom_line(aes(color = item), size = .9) + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) + 
    theme_classic() + 
    theme(legend.position = "none")

::: :::{.column}

::: ::::

Different variables within the same participant

Multivariate Time Series

Let’s (1) add a story with the title/subtitle and (2) highlight a core part of the time series

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    annotate("rect", xmin = 13, xmax = 22, ymin = 1.8, ymax = 3.2, fill = "orchid", alpha = .3) + 
    geom_line(aes(color = item), size = .9) + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    theme_classic() + 
    theme(legend.position = "none")

Different variables within the same participant

Multivariate Time Series

Let’s clean up our theme elements:

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    annotate("rect", xmin = 13, xmax = 22, ymin = 1.8, ymax = 3.2, fill = "orchid", alpha = .3) + 
    geom_line(aes(color = item), size = .9) + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      )

Different variables within the same participant

Multivariate Time Series

Alternatively, we can facet instead of splitting items by color:

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value-1)) + 
    geom_area(aes(fill = item), alpha = .4) + 
    geom_line(color = "orchid4", size = .9) + 
    geom_point(size = .9) + 
    scale_fill_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,50), breaks = seq(0,50,5)) +
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      , strip.background = element_rect(fill = "orchid4")
      , strip.text = element_text(face = "bold", size = rel(1.2), color = "white")
      )

Different variables within the same participant

Connected Scatter Plots

  • Connected scatter plots have popped up in our visualizations because they require a lot of visual literacy and can be confusing if not executed incredibly carefully

Connected Scatter Plots

First, let’s wrangle our data. We’ll create a positive and negative emotion composite in order to get around the ordinal nature of the data.

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, afraid:purposeful) %>%
  pivot_longer(
    cols = afraid:purposeful
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(valence = ifelse(item %in% c("afraid", "angry", "guilty"), "Negative", "Positive")) %>%
  group_by(SID, Full_Date, beep, valence) %>%
  summarize(value = mean(value, na.rm = T)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = "valence"
    , values_from = "value"
  ) %>%
  arrange(beep) 
# A tibble: 48 × 5
   SID   Full_Date         beep Negative Positive
   <chr> <chr>            <int>    <dbl>    <dbl>
 1 02    2018-10-22 13:23     1     1.67     3.71
 2 02    2018-10-22 17:23     2     1        3.29
 3 02    2018-10-23 10:00     3     1.67     2.14
 4 02    2018-10-23 13:24     4     1.67     3   
 5 02    2018-10-23 17:53     5     1.33     3.57
 6 02    2018-10-24 10:00     6     1.33     3.43
 7 02    2018-10-24 13:23     7     1.33     3   
 8 02    2018-10-24 17:29     8     1.33     4   
 9 02    2018-10-24 21:23     9     2        2.71
10 02    2018-10-25 13:29    10     1.67     3.43
# … with 38 more rows

Connected Scatter Plots

Let’s look at them negative and positive emotion composites using geom_path()

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, afraid:purposeful) %>%
  pivot_longer(
    cols = afraid:purposeful
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(valence = ifelse(item %in% c("afraid", "angry", "guilty"), "Negative", "Positive")) %>%
  group_by(SID, Full_Date, beep, valence) %>%
  summarize(value = mean(value, na.rm = T)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = "valence"
    , values_from = "value"
  ) %>%
  arrange(beep) %>%
  ggplot(aes(x = Negative, y = Positive)) + 
    geom_path(aes(color = beep)) + 
    geom_point() + 
    scale_color_viridis_c() + 
    theme_classic() + 
    theme(legend.position = "bottom")

This isn’t convincing me. Maybe let’s try a different geom

Connected Scatter Plots

Let’s look at them negative and positive emotion composites using geom_segment()

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, afraid:purposeful) %>%
  pivot_longer(
    cols = afraid:purposeful
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(valence = ifelse(item %in% c("afraid", "angry", "guilty"), "Negative", "Positive")) %>%
  group_by(SID, Full_Date, beep, valence) %>%
  summarize(value = mean(value, na.rm = T)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = "valence"
    , values_from = "value"
  ) %>%
  ggplot(aes(x = Negative, y = Positive, label = beep)) + 
    geom_segment(aes(
                    xend=c(tail(Negative, n=-1), NA)
                    , yend=c(tail(Positive, n=-1), NA)
                    , color = beep
                  )
                 , arrow = arrow(length = unit(0.4, "cm"))
      ) + 
    geom_point() + 
  scale_color_viridis_c() + 
  theme_classic()

This isn’t convincing me either, but what if we have a stronger correlation and fewer points?

Connected Scatter Plots

  • Remember our week 3 data, where we had self-rated health and other variables over years?
load(url("https://github.com/emoriebeck/psc290-data-viz-2022/blob/main/05-week5-time-series/01-data/gsoep.RData?raw=true"))
gsoep
# A tibble: 480,798 × 11
    year   SID SRhealth satHealth marital chldbrth gender  yearB…¹ morta…²   job
   <dbl> <dbl>    <dbl>     <dbl>   <dbl>    <dbl> <dbl+l>   <dbl>   <dbl> <dbl>
 1  1984   901       NA        NA      NA       NA 2 [[2]…    1951       0    NA
 2  1984  1001       NA        NA      NA       NA 2 [[2]…    1913       0    NA
 3  1984  1101       NA        NA      NA       NA 2 [[2]…    1906       0    NA
 4  1984  1201       NA        NA      NA       NA 1 [[1]…    1911       0    NA
 5  1984  1202       NA        NA      NA       NA 2 [[2]…    1913       0    NA
 6  1984  1301       NA        NA      NA       NA 2 [[2]…    1943       0    NA
 7  1984  1302       NA        NA      NA       NA 1 [[1]…    1965       0    NA
 8  1984  1901       NA        NA      NA       NA 2 [[2]…    1948       0    NA
 9  1984  2001       NA        NA      NA       NA 2 [[2]…    1949       0    NA
10  1984  2002       NA        NA      NA       NA 1 [[1]…    1952       0    NA
# … with 480,788 more rows, 1 more variable: age <dbl>, and abbreviated
#   variable names ¹​yearBrth, ²​mortality

Connected Scatter Plots

  • Remember our week 3 data, where we had self-rated health and other variables over years?
  • Let’s clean these data:
gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>%
  mutate(age_gr = as.numeric(mapvalues(age, seq(20,99,1), rep(seq(20,99,2), each = 2)))) %>%
  drop_na() %>%
  group_by(age, age_gr) %>%
  summarize_at(vars(SRhealth, satHealth), mean) %>%
  group_by(age_gr) %>%
  summarize_at(vars(SRhealth, satHealth), mean) %>%
  ungroup() 
# A tibble: 40 × 3
   age_gr SRhealth satHealth
    <dbl>    <dbl>     <dbl>
 1     20     3.96      7.81
 2     22     3.92      7.72
 3     24     3.91      7.66
 4     26     3.86      7.55
 5     28     3.83      7.48
 6     30     3.78      7.44
 7     32     3.74      7.34
 8     34     3.70      7.27
 9     36     3.67      7.20
10     38     3.60      7.08
# … with 30 more rows

Connected Scatter Plots

  • Remember our week 3 data, where we had self-rated health and other variables over years?
gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>%
  mutate(age_gr = as.numeric(mapvalues(age, seq(20,99,1), rep(seq(20,99,2), each = 2)))) %>%
  drop_na() %>%
  group_by(age, age_gr) %>%
  summarize_at(vars(SRhealth, satHealth), mean) %>%
  group_by(age_gr) %>%
  summarize_at(vars(SRhealth, satHealth), mean) %>%
  ungroup() %>%
  ggplot(aes(x = SRhealth, y = satHealth, label = age_gr)) + 
    geom_segment(aes(
                    xend=c(tail(SRhealth, n=-1), NA)
                    , yend=c(tail(satHealth, n=-1), NA)
                    , color = age_gr
                  )
                 , arrow = arrow(length = unit(0.4, "cm"))
      ) + 
    geom_point() + 
    scale_y_continuous(limits = c(0,10), breaks = seq(0,10,2)) + 
    scale_color_viridis_c() + 
    theme_classic()

Meh, the relationship between these is too linear, so this isn’t the right graph for it

Multivariate Time Series: Spaghetti Plots

Rather than a connected scatterplot showing general patterns of change, we may want to look at person-level trajectories coupled with an overall trajectory.

gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  filter(SID %in% sample(unique(.$SID), 1000)) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) 
# A tibble: 13,336 × 4
     SID   age item      value
   <dbl> <dbl> <chr>     <dbl>
 1  2501    68 SRhealth      1
 2  2501    68 satHealth     2
 3 25801    60 SRhealth      3
 4 25801    60 satHealth     6
 5 27403    27 SRhealth      3
 6 27403    27 satHealth     5
 7 29402    29 SRhealth      5
 8 29402    29 satHealth    10
 9 35701    43 SRhealth      4
10 35701    43 satHealth     9
# … with 13,326 more rows

Multivariate Time Series: Spaghetti Plots

Rather than a connected scatterplot showing general patterns of change, we may want to look at person-level trajectories coupled with an overall trajectory.

gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  filter(SID %in% sample(unique(.$SID), 1000)) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value, color = item)) + 
    geom_point(alpha = .2, size = .5, color = "grey") + 
    geom_smooth(
      aes(group = interaction(SID, item))
      , method = "lm", se = F
      , alpha = .3, size = .5
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")) + 
    facet_wrap(~item) + 
    theme_classic() + 
    theme(legend.position = "none")

Multivariate Time Series: Spaghetti Plots

  • But this is a longitudinal change, so we can’t just do that. We have to estimate the growth models first

  • The model we’ll use is:

  • \(Y_{ij} = \beta_{0j} + \beta_{1j}*age_{ij} + \epsilon_{ij}\)

    • \(\beta_{0j} = \gamma_{00} + u_{0j}\)
    • \(\beta_{1j} = \gamma_{10} + u_{1j}\)

\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]

\[ \begin{align*} \Biggr[ \begin{array}{c} u_{0j}\\ u_{1j}\end{array} \Biggr] \sim \mathcal{N} \left( \Biggr[ \begin{array}{c} 0\\ 0\end{array} \Biggr], \begin{array}{cc} \tau_{00}^2 & \tau_{10} \\ \tau_{10} & \tau_{11}^2 \end{array} \right) \end{align*} \]

  • \(\beta_{0j}\) is the intercept for person \(j\) at time \(i\)
  • \(\beta_{1j}\) is the linear change for person \(j\)
library(lme4)
lmer_fun <- function(d) lmer(value ~ 1 + age + (1 + age | SID), data = d)

m_nested <- gsoep %>%
  select(SID, age, SRhealth, satHealth) %>%
  filter(age < 100 & age > 19) %>% 
  drop_na() %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(age = age/10) %>%
  group_by(item) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    m = map(data, lmer_fun)
    , tidy = map(m, broom.mixed::tidy)); m_nested
# A tibble: 2 × 4
  item      data                   m         tidy            
  <chr>     <list>                 <list>    <list>          
1 SRhealth  <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]>
2 satHealth <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]>

Multivariate Time Series: Spaghetti Plots

  • But this is a longitudinal change, so we can’t just do that. We have to estimate the growth models first

  • The model we’ll use is:

  • \(Y_{ij} = \beta_{0j} + \beta_{1j}*age_{ij} + \epsilon_{ij}\)

    • \(\beta_{0j} = \gamma_{00} + u_{0j}\)
    • \(\beta_{1j} = \gamma_{10} + u_{1j}\)

\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]

\[ \begin{align*} \Biggr[ \begin{array}{c} u_{0j}\\ u_{1j}\end{array} \Biggr] \sim \mathcal{N} \left( \Biggr[ \begin{array}{c} 0\\ 0\end{array} \Biggr], \begin{array}{cc} \tau_{00}^2 & \tau_{10} \\ \tau_{10} & \tau_{11}^2 \end{array} \right) \end{align*} \]

  • \(\beta_{0j}\) is the intercept for person \(j\) at time \(i\)
  • \(\beta_{1j}\) is the linear change for person \(j\)
m_nested %>% 
  select(-data, -m) %>%
  unnest(tidy) %>%
  print(n = 12)
# A tibble: 12 × 7
   item      effect   group    term                 estimate std.error statistic
   <chr>     <chr>    <chr>    <chr>                   <dbl>     <dbl>     <dbl>
 1 SRhealth  fixed    <NA>     (Intercept)             4.57    0.0111      410. 
 2 SRhealth  fixed    <NA>     age                    -0.245   0.00242    -101. 
 3 SRhealth  ran_pars SID      sd__(Intercept)         1.05   NA            NA  
 4 SRhealth  ran_pars SID      cor__(Intercept).age   -0.822  NA            NA  
 5 SRhealth  ran_pars SID      sd__age                 0.214  NA            NA  
 6 SRhealth  ran_pars Residual sd__Observation         0.616  NA            NA  
 7 satHealth fixed    <NA>     (Intercept)             9.04    0.0261      346. 
 8 satHealth fixed    <NA>     age                    -0.488   0.00575     -84.7
 9 satHealth ran_pars SID      sd__(Intercept)         2.42   NA            NA  
10 satHealth ran_pars SID      cor__(Intercept).age   -0.818  NA            NA  
11 satHealth ran_pars SID      sd__age                 0.510  NA            NA  
12 satHealth ran_pars Residual sd__Observation         1.47   NA            NA  

Multivariate Time Series: Spaghetti Plots

fx_pred_fun <- function(m){
  tibble(age = seq(2,9.9, .1)) %>%
    mutate(
      pred = predict(m, newdata = ., re.form = NA)
    )
}

rx_pred_fun <- function(m, d){
  d %>%
    select(SID, age) %>%
    distinct() %>%
    mutate(pred = predict(m, newdata = .))
}

m_nested <- m_nested %>%
  mutate(fx_pred = map(m, fx_pred_fun)
         , rx_pred = map2(m, data, rx_pred_fun))
# A tibble: 2 × 6
  item      data                   m         tidy             fx_pred  rx_pred 
  <chr>     <list>                 <list>    <list>           <list>   <list>  
1 SRhealth  <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]> <tibble> <tibble>
2 satHealth <tibble [227,516 × 3]> <lmerMod> <tibble [6 × 6]> <tibble> <tibble>

Multivariate Time Series: Spaghetti Plots

sids <- sample(unique(m_nested$data[[1]]$SID), 1000)
m_nested %>%
  select(item, rx_pred) %>%
  unnest(rx_pred) %>%
  filter(SID %in% sids) %>%
  ggplot(aes(x = age*10, y = pred, color = item)) + 
    geom_line(
      aes(group = factor(SID))
      , alpha = .25
      , size = .75) + 
    geom_line(
      data = m_nested %>% 
        select(item, fx_pred) %>%
        unnest(fx_pred)
      , size = 1
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")) + 
    facet_wrap(~item) + 
    theme_classic() + 
    theme(legend.position = "none")

Multivariate Time Series: Spaghetti Plots

sids <- sample(unique(m_nested$data[[1]]$SID), 1000)
m_nested %>%
  select(item, rx_pred) %>%
  unnest(rx_pred) %>%
  filter(SID %in% sids) %>%
  ggplot(aes(x = age*10, y = pred, color = item)) + 
    geom_line(
      aes(group = factor(SID))
      , alpha = .25
      , size = .75) + 
    geom_line(
      data = m_nested %>% 
        select(item, fx_pred) %>%
        unnest(fx_pred)
      , size = 1
      ) + 
    scale_color_manual(values = c("darkblue", "darkred")) + 
    labs(x = "Age"
         , y = "Model Predicted Values (0-10)"
         , title = "Self-Rated Health and Health Satisfaction Decreased\nAcross the Lifespan, on Average") + 
    facet_wrap(~item) + 
    theme_classic() + 
    theme(legend.position = "none"
          , axis.text = element_text(face = "bold", size = rel(1.1))
          , axis.title = element_text(face = "bold", size = rel(1.2))
          , plot.title = element_text(face = "bold", size = rel(1.1), hjust=.5)
          , strip.background = element_rect(fill = "darkblue")
          , strip.text = element_text(face = "bold", size = rel(1.1), color = "white"))

Quick Note on Smoothing

Often, we want to smooth our trajectories, like if we wanted to look at the trajectory of self-rated health and satisfaction with health across age groups

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean)
# A tibble: 597 × 5
# Groups:   age, marital [310]
     age marital gender         satHealth SRhealth
   <dbl>   <dbl> <dbl+lbl>          <dbl>    <dbl>
 1    20       1 1 [[1] Male]        7        4.33
 2    20       1 2 [[2] Female]      8.06     3.87
 3    20       2 2 [[2] Female]      4.5      2   
 4    20       4 1 [[1] Male]        8.01     4.04
 5    20       4 2 [[2] Female]      7.70     3.89
 6    21       1 1 [[1] Male]        8.58     4.42
 7    21       1 2 [[2] Female]      7.82     3.82
 8    21       2 2 [[2] Female]      8.67     4.67
 9    21       4 1 [[1] Male]        7.83     3.97
10    21       4 2 [[2] Female]      7.71     3.90
# … with 587 more rows

Smoothing

  • But we have to choose how and when to smooth carefully
  • E.g., when do we choose a linear model v. a loess line?
  • E.g., Do we do an overall trajectory or stratify by group?
gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "loess") + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(legend.position = "bottom")

Smoothing

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "lm") + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(legend.position = "bottom")

Smoothing

We may want to highlight overall trajectories v. stratefied ones differently to highlight their consistency:

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    geom_line(aes(color = interaction(gender, marital))) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(legend.position = "bottom")

Smoothing

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    geom_smooth(
      aes(color = interaction(gender, marital))
      , method = "loess"
      , se = F
      ) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    labs(x = "Age", y = "Smoothed Value", color = NULL) + 
    guides(color = guide_legend(nrow = 4)) + 
    theme_classic() + 
    theme(legend.position = "bottom")

Smoothing

gsoep %>%
  select(SID, age, SRhealth, satHealth, marital, gender) %>%
  drop_na() %>%
  filter(age >= 20 & age <= 100) %>%
  group_by(age, marital, gender) %>%
  summarize_at(vars(satHealth, SRhealth), mean) %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ) %>%
  pivot_longer(
    cols = c(SRhealth, satHealth)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = age, y = value)) + 
    geom_smooth(
      aes(color = interaction(gender, marital))
      , method = "loess"
      , se = F
      , span = .25
      , size = .5
      ) + 
    geom_smooth(method = "lm", color = "orchid3", fill = "orchid3") + 
    scale_color_grey(start = .01, end = .99) + 
    facet_wrap(~item, nrow = 2) + 
    labs(x = "Age", y = "Smoothed Value", color = NULL) + 
    guides(color = guide_legend(nrow = 4)) + 
    theme_classic() + 
    theme(legend.position = "bottom"
          , axis.text = element_text(face = "bold", size = rel(1.1))
          , axis.title = element_text(face = "bold", size = rel(1.2))
          , strip.background = element_rect(fill = "orchid4")
          , strip.text = element_text(face = "bold", size = rel(1.2), color = "white"))

Slopegraphs

  • We didn’t have a chance to review slopegraphs when discussing visualizing associations
  • But they also apply to visualizing change, especially when we have two wave measures of change (e.g., pre-post designs)
  • When considered over time, these figures also give us a sense of rank-order consistency

Slopegraphs

First, let’s wrangle our data:

pomp <- function(x) (x - min(x, na.rm = T))/(max(x, na.rm = T) - min(x, na.rm = T))*10
gsoep_sg <- gsoep %>%
  select(SID, age, satHealth, SRhealth, marital, gender) %>%
  filter(age %in% c(60,70)) %>%
  drop_na() %>% 
  group_by(SID) %>%
  filter(n() == 2) %>%
  group_by(age) %>%
  mutate(SRhealth = pomp(SRhealth)) %>%
  group_by(age, SID, marital, gender) %>%
  mutate(health = mean(SRhealth, satHealth)) %>%
  group_by(age, marital, gender) %>%
  summarize(health = mean(health)) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), gender = factor(gender, c(1,2), c("Male", "Female"))
    ); gsoep_sg
# A tibble: 16 × 4
     age marital       gender health
   <dbl> <fct>         <fct>   <dbl>
 1    60 Married       Male     5.59
 2    60 Married       Female   5.67
 3    60 Separated     Male     5.08
 4    60 Separated     Female   5.62
 5    60 Widowed       Male     5.56
 6    60 Widowed       Female   5.46
 7    60 Never Married Male     4.75
 8    60 Never Married Female   6   
 9    70 Married       Male     5.45
10    70 Married       Female   5.20
11    70 Separated     Male     5   
12    70 Separated     Female   4.44
13    70 Widowed       Male     4.81
14    70 Widowed       Female   4.81
15    70 Never Married Male     5.83
16    70 Never Married Female   5   

Slopegraphs

gsoep_sg %>%
  ggplot(aes(x = age, y = health, group = interaction(marital, gender))) + 
    geom_line(aes(color = interaction(marital, gender))) + 
    geom_point(size = 2) + 
    geom_text(data = . %>% filter(age == 60)
              , aes(label = paste(marital, gender, sep = ", "))
              , hjust = 1
              , nudge_x = -1
              ) + 
    scale_x_continuous(
      limits = c(52, 72), breaks = c(60,70)
      , labels = c("Age 60", "Age 70"), position = "top"
      ) + 
    scale_y_continuous(position = "right") + 
    scale_color_viridis_d() + 
    labs(x = NULL
         , y = "Health"
         , title = "Only widowed men perceived worsening health\nfrom 60-70") + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , axis.line = element_blank()
      , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
      , panel.border = element_rect(fill = NA, size = 1.5)
      )