Problem Set #4
Overview:
In this problem set, you will use the ggplot2 package (part of the tidyverse) to practice (1) piecing together and (2) polishing visualizations. As with other problem sets, you’re encouraged to use your own data, so the instructions here will be somewhat vague.
In Part 1, you’ll build two visualizations, using annotations and hacks to improve their appearance. In Part 2, you’ll piece these together using patchwork or cowplot.
We’re going to use somewhat non-traditional data this week, so you may not have your own. If you don’t, we’ll use data from the janeaustenr package. Specifically, we’re going to examine the sentiments from Emma.
Part 1: Text Analysis and Polishing Visualizations
Background & Setup
This problem set is also a chance for me to show you a little more about text analysis. In class, we did sentiment analysis, but here we’ll look at n-grams, which let us examine relationships between words (e.g., how often certain word sequences appear). Specifically, we’re going to look at bigrams (i.e., n-grams where n = 2).
To do so, we’ll use the same unnest_tokens() function that we used in class. But this time, our token will be ngrams instead of words.
First, we’ll separate the text into bigrams.
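As a minimal sketch of this step, assuming the text is pulled from janeaustenr::austen_books() (the object names emma and emma_bigrams are placeholders, not names from the course materials):

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

# Keep only the lines of Emma; austen_books() returns columns `text` and `book`
emma <- austen_books() %>%
  filter(book == "Emma")

# Tokenize into bigrams: each row holds one pair of consecutive words
emma_bigrams <- emma %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))  # drop lines too short to form a bigram
```

Setting n = 3 instead would give trigrams; the rest of the pipeline stays the same.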
Now we need to drop any bigram in which either word is a stop word. To do so, we’ll separate() the bigrams, filter() out rows with stop words, and then unite() the bigrams back together.
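The separate, filter, unite pattern can be sketched on a few toy bigrams. This is only an illustration: the bigrams tibble is made-up data, and the stop word list is assumed to be tidytext’s stop_words data frame.

```r
library(dplyr)
library(tidyr)
library(tidytext)  # provides the stop_words data frame

# Toy bigrams standing in for the tokenized text (hypothetical data)
bigrams <- tibble(
  bigram = c("miss woodhouse", "of the", "frank churchill", "she was")
)

bigrams_clean <- bigrams %>%
  # Split each bigram into its two words
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  # Keep a row only if neither word is a stop word
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # Glue the two words back into a single bigram column
  unite(bigram, word1, word2, sep = " ")
```

Here “of the” and “she was” are dropped because they contain stop words, while the two name bigrams are kept.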
Question 1: Changes in bigram frequency
- Plot the frequency of the top 30 bigrams in Emma. Put the counts on the x-axis and the bigrams on the y-axis
- Make sure to:
  - Label your x- and y-axes
  - Add a title
  - Use your custom theme
  - Choose a fill of your choice
- Save the plot as p1.1
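For reference, a horizontal bar chart of counts can be sketched as follows. The counts below are invented for illustration, theme_minimal() stands in for your custom theme, and p_example is a placeholder name.

```r
library(dplyr)
library(ggplot2)

# Hypothetical bigram counts; real ones would come from count() on your data
bigram_counts <- tibble(
  bigram = c("miss woodhouse", "frank churchill", "miss fairfax", "miss bates"),
  n      = c(40, 30, 20, 10)
)

p_example <- bigram_counts %>%
  slice_max(n, n = 30) %>%                 # keep the 30 most frequent bigrams
  mutate(bigram = reorder(bigram, n)) %>%  # order the bars by count
  ggplot(aes(x = n, y = bigram)) +
  geom_col(fill = "steelblue") +
  labs(x = "Count", y = "Bigram", title = "Most frequent bigrams") +
  theme_minimal()                          # stand-in for a custom theme
```

Reordering the bigram factor by count is what sorts the bars; without it they appear in alphabetical order.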
Question 2: Sentiment
Now let’s create a second plot that has the sentiment across chapters.
There are some notable events throughout the novel:
- Frank Churchill finally appears in person in chapter 23
- Emma is jealous of Miss Fairfax’s proficiency in chapter 26
- Mr. Churchill’s secret engagement to Miss Fairfax is revealed in chapter 46
- Emma and Knightley fight over her treatment of Miss Bates in chapter 43
- Emma and Knightley reconcile and get engaged in chapter 49
Joining with `by = join_by(word)`
Joining with `by = join_by(word)`
Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 9467 of `x` matches multiple rows in `y`.
ℹ Row 4099 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
- Plot a time series of the frequency of the sentiments across chapters
- Shade the area under the plot using geom_area(). (Make sure to set the alpha value low)
- Use geom_smooth() to plot the trajectories across time
- Make sure to:
  - Label your x- and y-axes
  - Add a title
  - Use your custom theme
  - Split the plots across five facets using facet_wrap()
- Save the plot as p1.2
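The layering described above (a lightly shaded area, a smoothed trajectory, and facets) might look roughly like this. The data are simulated rather than taken from Emma, only two sentiment categories are shown, and the names sentiment_by_chapter and p_sent are placeholders.

```r
library(ggplot2)

set.seed(1)
# Simulated per-chapter sentiment counts (not real results from the novel)
sentiment_by_chapter <- data.frame(
  chapter   = rep(1:55, times = 2),
  sentiment = rep(c("negative", "positive"), each = 55),
  n         = rpois(110, lambda = 50)
)

p_sent <- ggplot(sentiment_by_chapter,
                 aes(x = chapter, y = n, fill = sentiment)) +
  geom_area(alpha = 0.2) +                         # low alpha keeps the shading light
  geom_smooth(aes(color = sentiment),
              method = "loess", formula = y ~ x,
              se = FALSE) +                        # smoothed trajectory per sentiment
  facet_wrap(~ sentiment, ncol = 1) +              # one panel per sentiment
  labs(x = "Chapter", y = "Number of words",
       title = "Sentiment across chapters") +
  theme_minimal()                                  # stand-in for a custom theme
```

With more sentiment categories in the data, facet_wrap() would produce one panel for each.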
Now, let’s use annotate() to highlight Emma’s fight with Miss Bates and Mr. Knightley:
- Use annotate() to add a rectangle between chapters 41 and 48, when a series of negative events picks up.
- Use annotate() to add the text “Emma fights with Miss Bates and Knightley.” (Hint: use \n to include line breaks as needed.)
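A rough sketch of both annotations on a placeholder plot. The base plot, the y position of the label, and the colors are assumptions chosen for illustration only.

```r
library(ggplot2)

# Placeholder plot with a chapter axis (hypothetical flat data)
base_plot <- ggplot(data.frame(chapter = 1:55, n = 50),
                    aes(x = chapter, y = n)) +
  geom_line()

p_annotated <- base_plot +
  # Shaded band spanning chapters 41 to 48; -Inf/Inf fill the full panel height
  annotate("rect", xmin = 41, xmax = 48, ymin = -Inf, ymax = Inf,
           alpha = 0.2, fill = "grey40") +
  # Label centered on the band; "\n" inserts a line break
  annotate("text", x = 44.5, y = 52, size = 3,
           label = "Emma fights with\nMiss Bates and Knightley")
```

Note that annotate() draws the same annotation in every panel of a faceted plot, so the rectangle and text will repeat across facets.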
Part 2: Piecing Plots Together
Next, you’ll practice piecing together the two plots you built in Part 1 into a single figure.
Question 1: Piecing Plots Together
Now that we have our plots, let’s put them together.
- Piece the plots together using your package of choice. Make sure to remove the titles!
- Use plot_annotation() to:
  - Add a shared title and subtitle
  - Label the panels as A and B
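If you go the patchwork route, the composition could be sketched like this. The two plots use the built-in mtcars data as stand-ins for p1.1 and p1.2, and the title and subtitle strings are placeholders.

```r
library(ggplot2)
library(patchwork)

# Stand-in plots (hypothetical); substitute your own p1.1 and p1.2
p_a <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + labs(title = "Plot one")
p_b <- ggplot(mtcars, aes(hp, mpg)) + geom_point() + labs(title = "Plot two")

combined <- (p_a + labs(title = NULL)) +   # drop the individual titles
  (p_b + labs(title = NULL)) +
  plot_annotation(
    title      = "Shared title",
    subtitle   = "Shared subtitle",
    tag_levels = "A"                       # tags the panels A, B, ...
  )
```

Using p_a / p_b instead of p_a + p_b would stack the panels vertically rather than placing them side by side.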
If you’re using provided data, use the skeleton code below to get the observations for 20 people with at least 15 observations.
Render to html and submit problem set
- Render to HTML by clicking the “Render” button near the top of your RStudio window (icon with blue arrow)
- Go to Canvas → Assignments → Problem Set 4
- Submit both the .qmd and .html files
- Use the naming convention “lastname_firstname_ps#” for your .qmd and .html files (e.g., beck_emorie_ps4.qmd and beck_emorie_ps4.html)