Early Career:
My work uses (1) messy timeseries from EMA / ESM and (2) messy secondary data from longitudinal panel studies
I had some basic CS training that made this much, much easier
I spent graduate school writing R tutorials and teaching R workshops
This class lets me bring those together, coupling them with what I’ve learned about project management
After successful completion of this course, you will be able to:
1. Build your own research workflow that can be ported to future projects.
2. Learn new programming skills that will help you efficiently, accurately, and deliberately clean and manage your data.
3. Create a bank of code and tools that can be used for a variety of types of research.
Assignment Weights | Percent |
---|---|
Class Participation | 20% |
Problem Sets | 40% |
Final Project Proposal | 10%* |
Class Presentation | 10%* |
Final Project | 20%* |
Total | 100% |
Participate in a TidyTuesday data challenge (https://www.tidytuesday.com).
2 pt extra credit for each one you participate in (max 6 pt total).
You can post on Twitter or just create a document with the code and output
Submit on Canvas
92.5% - 100% = A; 89.5% - 92.4% = A-
87.5% - 89.4% = B+; 82.5% - 87.4% = B; 79.5% - 82.4% = B-
77.5% - 79.4% = C+; 72.5% - 77.4% = C; 69.5% - 72.4% = C-
67.5% - 69.4% = D+; 62.5% - 67.4% = D; 59.5% - 62.4% = D-
0% - 59.4% = F
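As a rough illustration, the weights and cutoffs above can be combined in R to turn component scores into a letter grade. This is a sketch under assumptions, not the official grading procedure: the scores are invented, extra credit is ignored, each component is assumed to be a percentage multiplied by its weight, and `letter_grade()` is a hypothetical helper.

```r
# Hypothetical component scores (percent) for one student; invented for illustration
scores <- c(participation = 95, problem_sets = 88, proposal = 90,
            presentation = 85, final_project = 92)

# Weights from the assignment table, as proportions (assumed to multiply percentages)
weights <- c(participation = 0.20, problem_sets = 0.40, proposal = 0.10,
             presentation = 0.10, final_project = 0.20)

# Weighted total, matching each score to its weight by name
total <- sum(scores * weights[names(scores)])

# Hypothetical helper: map a percentage to a letter using the lower cutoff of each band
letter_grade <- function(pct) {
  cutoffs <- c(59.5, 62.5, 67.5, 69.5, 72.5, 77.5, 79.5, 82.5, 87.5, 89.5, 92.5)
  grades  <- c("F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A")
  # findInterval() counts how many cutoffs are <= pct; adding 1 indexes into grades
  grades[findInterval(pct, cutoffs) + 1]
}

total                # about 90.1
letter_grade(total)  # "A-"
```

Using the lower cutoff of each band means a score that falls between the listed ranges (e.g., 92.45) is counted in the lower band here; how such gaps are actually handled is not stated in the scale.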
dplyr
tidyr
purrr
R
Example: New Data Collection
1. Conceptualization
2. Funding acquisition
3. Preregistration
4. Project Building
5. Data Collection
6. Data Cleaning
7. Data Analysis
8. Writing (and rewriting)
9. Submission
10. Revision (and possibly crying)
11. ACCEPTANCE
Example: Secondary Data
1. Conceptualization
2. Data search
3. Project Building
4. Data documentation
5. Preregistration
6. Data Cleaning
7. Data Analysis
8. Writing (and rewriting)
9. Submission
10. Revision (and possibly crying)
11. ACCEPTANCE
Experimental Data
1. Gather all data files
2. Quality checks for each file
3. Load all files
4. Merge all files
5. Check all descriptives
6. Scoring, coding, and data transformation
7. Recheck all descriptives
8. Correlations and visualization
9. Restructure data for analyses
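The merge, descriptive, scoring, and correlation steps above might look something like this in base R. This is a minimal sketch: the two small data frames stand in for files that would normally be read in (e.g., with read.csv()), the variable names are hypothetical, and item2 is assumed to be reverse-keyed on a 1-5 scale.

```r
# Hypothetical toy data standing in for two data files from one study
survey <- data.frame(
  id    = c(1, 2, 3, 4),
  item1 = c(4, 5, 2, 3),
  item2 = c(2, 1, 4, 3),  # assumed reverse-keyed, 1-5 scale
  item3 = c(5, 4, 2, 3)
)
demo <- data.frame(id = c(1, 2, 3, 4), age = c(19, 22, 20, 21))

# Merge all files on the participant ID
dat <- merge(survey, demo, by = "id")

# Check all descriptives before transforming anything
summary(dat)

# Scoring: reverse-score item2 (6 - x on a 1-5 scale), then average into a composite
dat$item2_r   <- 6 - dat$item2
dat$composite <- rowMeans(dat[, c("item1", "item2_r", "item3")])

# Recheck descriptives after scoring
summary(dat$composite)

# Correlations among the scored variables
round(cor(dat[, c("composite", "age")]), 2)
```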
Secondary Data
1. Gather all data files
2. Load each file
3. Extract variables used
4. Rename variables, possibly deal with time variables
5. Merge all files
6. Check all descriptives
7. Scoring, coding, and data transformation
8. Recheck all descriptives
9. Correlations and visualization
10. Restructure data for analyses
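For secondary data, much of the work is getting wave-specific variable names into a common form before merging and reshaping. A minimal base R sketch, assuming two hypothetical wave files keyed by a person ID (pid) and one hypothetical variable (sat) measured at both waves:

```r
# Hypothetical panel files: one per wave, each with wave-specific variable names
w1 <- data.frame(pid = c(101, 102, 103), w1_sat = c(3, 4, 2), w1_other = c(9, 9, 9))
w2 <- data.frame(pid = c(101, 102, 103), w2_sat = c(4, 4, 3))

# Extract only the variables used
w1 <- w1[, c("pid", "w1_sat")]

# Rename to a shared stem plus a wave suffix so time is explicit in the name
names(w1) <- c("pid", "sat.1")
names(w2) <- c("pid", "sat.2")

# Merge waves on the person ID (wide format: one row per person)
wide <- merge(w1, w2, by = "pid")

# Restructure to long format (one row per person per wave), then sort
long <- reshape(wide, direction = "long",
                varying = list(c("sat.1", "sat.2")), v.names = "sat",
                timevar = "wave", times = c(1, 2), idvar = "pid")
long <- long[order(long$pid, long$wave), ]
long
```

Keeping the wave number in the variable name is what lets the reshaping step recover a time variable; tidyr's pivot_longer() offers the same operation.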
In this class, we will focus on building tools for:
(lm(), glm(), lmer(), nlme(), lavaan, brms)
The max() function takes as input a collection of numbers (e.g., 3, 5, 6) and returns as output the number with the maximum value
The lm() function takes as inputs a dataset and a statistical model you specify within the function, and returns as output the results of the regression model
Base R
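Both descriptions can be checked directly in R. The model below is purely illustrative and uses mtcars, an example dataset that ships with R; it is not from the course materials.

```r
# max(): input is a collection of numbers, output is the largest one
max(c(3, 5, 6))  # 6

# lm(): inputs are a model formula and a dataset; output is a fitted model object
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)        # estimated intercept and slope
```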
as.character() function
print() function
setwd() function
R packages
tidyverse package for manipulating and visualizing data
igraph package for network analyses
leaflet package for mapping
rvest package for webscraping
rtweet package for streaming and downloading data from Twitter
(tidyverse, tidymodels)
tidyverse
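A small sketch of the difference between base R functions and package functions. getwd() is used in place of setwd() so the example does not change your working directory, and dplyr is only an example package name; the install and load lines are commented out so the snippet runs without a network connection.

```r
# Base R functions are available without loading anything
as.character(42)  # "42": converts a number to a character string
print("hello")    # prints the value to the console
getwd()           # shows the current working directory (setwd() changes it)

# Package functions must be installed once and loaded in each session, e.g.:
# install.packages("tidyverse")
# library(tidyverse)

# requireNamespace() reports whether a package is installed, without attaching it
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)
has_dplyr
```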
Three ways to execute commands in R
Assignment refers to creating an “object” and assigning values to it
<- is the assignment operator
= also works as an assignment operator
object_name <- object_values
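A minimal illustration of assignment, with arbitrary example values:

```r
# Create an object named a and assign it a value with <-
a <- 5
# = also assigns at the top level, though <- is the conventional choice in R
b = "five"
a  # prints 5
b  # prints "five"
```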
R is an “object-oriented” programming language (like Python, JavaScript). So, what is an “object”?
a and b are the names of objects I assigned values to
Ben Skinner says “Objects are like boxes in which we can put things: data, functions, and even other objects.”
Many commercial statistical software packages (e.g., SPSS, Stata) operate on datasets, which consist of rows of observations and columns of variables
Usually, these packages can open only one dataset at a time
By contrast, in R everything is an object and there is no limit to the number of objects R can hold (except memory)
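To make the contrast concrete, the sketch below keeps two small made-up datasets in memory at the same time, along with a function and a list that holds other objects. All names and values are hypothetical.

```r
# Two datasets held in memory at once, each as its own object
students <- data.frame(id = 1:3, gpa = c(3.2, 3.8, 3.5))
courses  <- data.frame(code = c("A", "B"), units = c(4, 2))

# Objects can also hold functions and even other objects
my_fun  <- mean                              # a function stored in an object
my_list <- list(students, courses, my_fun)   # an object holding other objects

ls()                  # names of the objects in the current environment
my_fun(students$gpa)  # 3.5
```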
The fundamental data structure in R is the “vector”
A vector is a collection of values
The individual values within a vector are called “elements”
Values in a vector can be numeric, character (e.g., “Apple”), or some other type
We can use the combine function c() to create a numeric vector that contains three elements
c() “combines values into a vector or list”
Vector where the elements are characters
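A short sketch of creating vectors with c() and pulling out individual elements by position with square brackets; the values are arbitrary examples.

```r
# Numeric vector with three elements
nums <- c(3, 5, 6)
# Character vector: each element is enclosed in quotes
fruit <- c("Apple", "Banana", "Cherry")

length(nums)  # 3 elements
fruit[2]      # square brackets extract an element by position: "Banana"
class(nums)   # "numeric"
class(fruit)  # "character"
```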
Either in the R console or within the R Markdown file, do the following:
1. Create a vector called v1 with three elements, where all the elements are numbers. Then print the values.
2. Create a vector called v2 with four elements, where all the elements are characters (i.e., enclosed in single '' or double "" quotes). Then print the values.
3. Create a vector called v3 with five elements, where some elements are numeric and some elements are characters. Then print the values.
# create a vector called v1 with three elements
# all the elements are numbers
v1 <- c(1, 2, 3)
v1 # print value
[1] 1 2 3
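One possible set of answers for v2 and v3 (the particular values are arbitrary). Note what happens to v3: because an atomic vector can hold only one type, c() converts the numbers to character strings.

```r
# Possible answer for v2: four character elements
v2 <- c("a", "b", "c", "d")
v2

# Possible answer for v3: a mix of numbers and characters
v3 <- c(1, 2, "three", 4, "five")
v3         # every element now prints in quotes
class(v3)  # "character": the numbers were coerced to character
```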
One difference between atomic vectors and lists: homogeneous vs. heterogeneous elements
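The difference shows up as soon as you mix types. A minimal sketch with arbitrary values:

```r
# An atomic vector is homogeneous: mixing types coerces everything to one type
atomic <- c(1, TRUE, "x")
class(atomic)       # "character"

# A list is heterogeneous: each element keeps its own type
lst <- list(1, TRUE, "x")
sapply(lst, class)  # "numeric" "logical" "character"
```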
Functions are pre-written bits of code that accomplish some task.
Functions generally follow three sequential steps:
For example, the sum() function calculates the sum of the elements in a vector
Components of a function
Function name (e.g., sum(), length(), seq())
Function arguments (e.g., sum(c(1,2,3)), seq(10,15))
Usually, function arguments have names
For example, the seq() function includes the arguments from, to, and by
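A sketch of calling functions with positional versus named arguments, using the examples above:

```r
# A function is called by its name, followed by parentheses holding its arguments
sum(c(1, 2, 3))     # 6
length(c(1, 2, 3))  # 3

# Arguments can be matched by position...
seq(10, 15)              # 10 11 12 13 14 15
# ...or by name, in which case their order does not matter
seq(to = 15, from = 10)  # same result
```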
Many function arguments have “default values”, set by whoever wrote the function
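Defaults for seq(): from = 1 and to = 1; if by is omitted, seq() counts in steps of 1 (or -1 when counting down), so seq(10, 15) returns 10 through 15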
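A few calls that rely on seq()'s defaults. args() prints a function's arguments with their defaults; for seq() the defaults are defined in the method seq.default(), so that is what is passed to args() here.

```r
# Arguments left out fall back to their default values
seq()               # from = 1 and to = 1, so this returns 1
seq(to = 5)         # from defaults to 1: 1 2 3 4 5
seq(1, 10, by = 2)  # overriding the step: 1 3 5 7 9
args(seq.default)   # shows the arguments and their defaults
```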
Contents of help files
sum() returns a vector of length 1 whose value is the sum of the input vector
html_document or pdf_document (this document was created with revealjs)
Do this with a partner
Approach for creating a Quarto document.
Let’s take a few minutes and have you peruse the Quarto site to build familiarity (I still access it all the time when I forget how to do specific things)
I especially want you to take some time to peruse documents on YAML headers:
Reminders:
Problem set 1 due next Monday at 12:01 AM (grace period until 9 AM)
Make sure to check out the readings
Next time:
Bring your data, ideally loaded into R (or at least a piece of it)
Part 1: Reproducibility and Using Workflows to Reflect Your Values
Part 2: Data Manipulation: dplyr
PSC 290 - Data Cleaning and Management FQ23