Chapter 2 Data Cleaning
The data preparation procedure was nearly identical across samples and methods, differing only in whether data were pooled across samples (Methods 1A-2B) or kept separate (Methods 3 and 4). First, before preregistration, data preparation began by cataloging the available data in each sample in a comprehensive codebook, with the goal of compiling all of the information necessary for harmonizing across samples. For each sample, we compiled information on (1) the original variable name in the raw data, (2) the question text, (3) the scale (e.g., numeric, Likert-like scale points, labels for nominal variables), and (4) the available waves or years for each of the target variables. We also included (5) a category for each variable (e.g., personality, outcome), (6) a higher-order construct name (e.g., Agreeableness, crystallized cognitive ability) that matched across samples, (7) a lower-order variable name that indexed unique items (e.g., friendly, WAIS Information), (8) a note on whether an item was reverse-scored along with its (9) minimum and (10) maximum, (11) a string of R code for recoding scales (e.g., recoding missing values as NA, changing levels of nominal variables to harmonize across samples), and two rules for how composites were to be created: either (12) within a wave (e.g., compositing a personality or cognitive scale) or (13) longitudinally across waves (e.g., for covariates indexing participants’ highest level of education prior to baseline).
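To make the codebook structure concrete, a single entry might look like the following (a minimal sketch with hypothetical values; the field names mirror the thirteen pieces of information above):

```r
# One illustrative codebook entry (hypothetical values)
codebook_row <- data.frame(
  orig_itemname = "z1e13x1r",                          # (1) name in the raw data
  description   = "How would you rate your health?",   # (2) question text
  scale         = "1 = very good ... 5 = very poor",   # (3) scale
  year          = 1990,                                # (4) wave/year
  category      = "covariates",                        # (5) variable category
  name          = "SRhealth",                          # (6) higher-order construct name
  itemname      = "SRhealth",                          # (7) lower-order item name
  reverse_code  = "yes",                               # (8) reverse-scored?
  mini          = 1,                                   # (9) scale minimum
  maxi          = 5,                                   # (10) scale maximum
  recode        = "ifelse(x < 0, NA, x)",              # (11) R recoding string
  comp_rule     = "average",                           # (12) within-wave composite rule
  long_rule     = "skip",                              # (13) longitudinal composite rule
  stringsAsFactors = FALSE
)
```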
Second, once the codebook was created and all the data were received, the codebook and data files were sequentially loaded into R, and data were renamed, reverse-scored, and rescaled as documented in the codebook. Then, for each of the categories of data (covariates, personality, and cognitive ability), data were composited as documented in Table 2. Finally, these three sources of data were combined into a single data set within each sample separately. The full data cleaning procedure can be seen in Chapter 1 (Data Cleaning) in the online materials.
Third, data from each sample were combined into a single data set with harmonized labels and scales for Methods 1 (A and B) and 2 (A and B). For methods that include sample-level moderators (Methods 1A, 2B, and 3), this also involves combining the personality, cognitive, and covariate data with sample-level data. In Methods 1A and 2B, this happens in the initial step, while in Method 3, sample-level data are added after estimating the models for each sample separately (see online materials).
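The pooling step for Methods 1A-2B amounts to stacking the cleaned per-sample data sets with a sample identifier. A minimal base-R sketch (the data frames and sample names here are hypothetical placeholders for the cleaned per-sample data):

```r
# Stack cleaned per-sample data sets, tagging each row with its sample of origin
samples <- list(
  "BASE-I" = data.frame(SID = 1:2, p_value = c(5.1, 6.2)),
  "EAS"    = data.frame(SID = 1:2, p_value = c(4.8, 7.0))
)
pooled <- do.call(rbind, Map(function(d, s) transform(d, sample = s),
                             samples, names(samples)))
```

The `sample` column then carries the grouping needed to merge in sample-level moderators for Methods 1A and 2B.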
Fourth, we harmonized the continuous personality and cognitive ability measures using POMP (percent of maximum possible) scoring (range 0-10). Continuous variables with easily harmonized scales (i.e., age and years of education) were centered, with age centered at 60 and education centered at 12 years. Nominal variables were harmonized during the earlier data cleaning steps such that 0 = “male” and 1 = “female” for gender.
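POMP rescaling can be sketched as follows; the 0-10 range matches the text, and `mini`/`maxi` default here to the observed range (an assumption: the project's `pomp()` may instead use the theoretical scale limits):

```r
# POMP (percent of maximum possible) scoring on a 0-10 range
pomp <- function(x, mini = min(x, na.rm = TRUE), maxi = max(x, na.rm = TRUE)) {
  10 * (x - mini) / (maxi - mini)
}

pomp(c(1, 3, 5))  # scale endpoints map to 0 and 10; midpoint maps to 5
```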
2.1 Berlin Aging Study (BASE-I)
The Berlin Aging Study (I) is a longitudinal interdisciplinary study of aging that began in 1990. The data are available, by application, from https://www.base-berlin.mpg.de/en/system/files/media/pdf/a97295f241d06bf496b5eeae4251a34e/base-data-transfer_form_online.pdf.
Participants included 516 individuals over age 70, stratified by age and gender, recruited from a sample drawn from the Berlin City Registry. To date, most participants are deceased; attrition has been driven largely by mortality and severe health complications. Since 1990, there have been seven follow-up measurement occasions (eight total).
Sample sizes vary over time, from 516 (T1) to 23 (T8). To ensure adequate sample sizes, the present study will use data up to the fourth wave, which included 164 participants with full data. This provides 99% power to detect a correlation effect size of ~.32 with a two-tailed alpha of .05.
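The power figure can be checked with a normal approximation based on the Fisher z-transformed correlation (a quick sketch, not the original power analysis):

```r
# Approximate power for testing r = .32 with n = 164, two-tailed alpha = .05
n <- 164
r <- 0.32
alpha <- 0.05
z_effect <- atanh(r) * sqrt(n - 3)  # Fisher z scaled by its precision
power <- pnorm(qnorm(1 - alpha / 2) - z_effect, lower.tail = FALSE)
round(power, 2)  # ~0.99
```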
2.1.1 Load Data
(basei_codebook <- (codebook %>% filter(study == "BASE-I"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_lower(orig_itemname)))
## # A tibble: 138 × 17
## study dataset category name itemname wave year stem orig_itemname description scale reverse_code recode mini maxi
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 base-i social covaria… educ… eduYears 1 1990 <NA> s1kon28 Years of E… "int… <NA> ifels… NA NA
## 2 base-i social covaria… educ… eduYears 1 1990 <NA> z1e10 General ed… "1 =… <NA> ifels… NA NA
## 3 base-i demograp… covaria… gend… sex 1 1990 esex z1esex Sex of par… <NA> <NA> ifels… NA NA
## 4 base-i demograp… covaria… gend… sex 2 1993 esex z2esex Sex of par… <NA> <NA> ifels… NA NA
## 5 base-i demograp… covaria… gend… sex 3 1995 esex z3esex Sex of par… <NA> <NA> ifels… NA NA
## 6 base-i demograp… covaria… gend… sex 4 1997 esex z4esex Sex of par… <NA> <NA> ifels… NA NA
## 7 base-i demograp… covaria… gend… sex 5 2000 esex z5esex Sex of par… <NA> <NA> ifels… NA NA
## 8 base-i demograp… covaria… gend… sex 6 2004 esex z6esex Sex of par… <NA> <NA> ifels… NA NA
## 9 base-i demograp… covaria… gend… sex 7 2005 esex z7esex Sex of par… <NA> <NA> ifels… NA NA
## 10 base-i demograp… covaria… SRhe… SRhealth 1 1990 E13X… z1e13x1r How would … "1 =… yes ifels… 1 5
## # ℹ 128 more rows
## # ℹ 2 more variables: comp_rule <chr>, long_rule <chr>
2.1.2 Recoding & Reverse Scoring
rename_fun <- function(cb, var){
old.names <- unique((basei_codebook %>% filter(name == var))$orig_itemname)
df <- basei %>%
select(SID = id, one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -SID, na.rm=T)
if(length(old.names) > 1){
df %>% left_join(cb %>% select(itemname, wave, orig_itemname, reverse_code:long_rule))
} else {
df %>% left_join(cb %>% select(itemname, orig_itemname, reverse_code:long_rule) %>% distinct()) %>% mutate(wave = "0")
}
}
# join data with recoding info
basei_recode <- basei_codebook %>%
select(category, name, itemname, wave, orig_itemname, reverse_code:long_rule) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, name, rename_fun))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
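`recode_fun` above evaluates the codebook's `recode` string with the item's values bound to `x`. A minimal demonstration with a hypothetical rule (the actual rules live in the codebook's `recode` column):

```r
# Recode rules are stored as strings of R code that reference `x`
rule <- "ifelse(x == -9, NA, x)"   # hypothetical rule: -9 codes missingness
y <- data.frame(value = c(-9, 2, 3))
x <- y$value
y$value <- eval(parse(text = rule))  # -9 becomes NA; other values pass through
```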
basei_recode <- basei_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(wave = as.numeric(wave)) %>%
group_by(recode, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, wave), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
basei_recode <- basei_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
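`reverse.code()` here is `psych::reverse.code()`; for a single item, its effect reduces to reflecting the value around the scale endpoints, i.e. `mini + maxi - x`. A base-R sketch of that arithmetic:

```r
# Reverse-score x on a scale running from mini to maxi
reverse_item <- function(x, mini, maxi) mini + maxi - x

reverse_item(2, mini = 1, maxi = 5)  # 2 on a 1-5 scale becomes 4
```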
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.1.3 Covariates
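`fun_call()` dispatches on the composite rule; note that `Mode()` is not base R. A minimal version, assuming ties resolve to the first most frequent value (the project's own `Mode()` may differ):

```r
# Most frequent non-missing value; ties broken by first occurrence
Mode <- function(x) {
  ux <- unique(x[!is.na(x)])
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c(1, 2, 2, 3))  # returns 2
```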
# composite WITHIN years
basei_cov <- basei_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, wave, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(wave <= 1)
))
comp_fun <- function(d, rule){
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
basei_cov <- basei_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
group_by(long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.1.4 Personality Variables
basei_pers <- basei_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
filter(wave == 1 & !is.na(value)) %>%
group_by(SID, wave, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alpha's
basei_alpha <- basei_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, wave, SID, value) %>%
group_by(name, wave) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID), check.keys = T), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
basei_pers <- basei_pers %>%
group_by(name, comp_rule, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.1.5 Outcome Variables
basei_cog_waves <- basei_recode %>%
filter(category == "outcome") %>%
unnest(data) %>%
group_by(SID, name) %>%
summarize(o_year = max(wave)) %>%
ungroup()
# composite within years
basei_out <- basei_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
right_join(basei_cog_waves) %>%
filter(wave == o_year & wave > 1) %>%
group_by(name, wave, SID) %>%
summarize(value = sum(value, na.rm = T)) %>%
ungroup()
2.1.6 Combine Data
basei_combined <- basei_pers %>%
rename(Trait = name, p_value = value, p_year = wave) %>%
full_join(
basei_out %>%
rename(Outcome = name, o_value = value, o_year = wave)
) %>% full_join(basei_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.2 Einstein Aging Study (EAS)
The Einstein Aging Study is an ongoing longitudinal study of older adults in the United States that began in 1980. The data are available on a project-by-project basis by submitting a concept proposal at https://www.einstein.yu.edu/departments/neurology/clinical-research-program/eas/data-sharing.aspx.
Since 1993, the EAS has systematically recruited a representative aging sample in the Bronx, New York. As of 2017, 2,600 participants were enrolled in the study. As of 2010, approximately 200 of the enrolled participants had autopsy data. More information on the study can be found at http://www.einstein.yu.edu/departments/neurology/clinical-research-program/EAS/.
Sample sizes vary over time, with ranges across waves not publicly available. However, we expect approximately 2,000 participants to have basic personality and cognitive ability data. This yields 99% power to detect a zero-order correlation effect size of .10, two-tailed at alpha = .05.
2.2.1 Load Data
## # A tibble: 16 × 15
## study dataset category name itemname year orig_itemname description scale reverse_code recode mini maxi comp_rule
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <lgl> <lgl> <chr>
## 1 EAS <NA> match education educati… base… Educyrs Education <NA> NA <NA> NA NA skip
## 2 EAS <NA> match gender gender annu… Gender Gender <NA> NA ifels… NA NA skip
## 3 EAS <NA> match SRhealth SRhealth annu… <NA> Self-rated… <NA> NA code … NA NA skip
## 4 EAS <NA> match age age long… AgeAtWave Raw age inte… NA ifels… NA NA skip
## 5 EAS <NA> outcome crystalli… BosNami… annu… BostonFree Word Fluen… inte… NA ifels… NA NA sum
## 6 EAS <NA> outcome crystalli… waisInfo annu… Infraw WAIS III: … inte… NA ifels… NA NA sum
## 7 EAS dataset pers A ipipApos 1 Agreeablenes… IPIP NEO A 5 to… NA ifels… NA NA average
## 8 EAS dataset pers C ipipCpos 1 Conscientiou… IPIP NEO C 5 to… NA ifels… NA NA average
## 9 EAS dataset pers E ipipEpos 1 Extraversion… IPIP NEO E 5 to… NA ifels… NA NA average
## 10 EAS dataset pers N ipipNpos 1 NeuroticismP… IPIP NEO N 5 to… NA ifels… NA NA average
## 11 EAS dataset pers O ipipOpos 1 OpenToExperi… IPIP NEO O 5 to… NA ifels… NA NA average
## 12 EAS dataset pers A ipipAneg 1 Agreeablenes… IPIP NEO A 5 to… NA ifels… NA NA average
## 13 EAS dataset pers C ipipCneg 1 Conscientiou… IPIP NEO C 5 to… NA ifels… NA NA average
## 14 EAS dataset pers E ipipEneg 1 Extraversion… IPIP NEO E 5 to… NA ifels… NA NA average
## 15 EAS dataset pers N ipipNneg 1 NeuroticismN… IPIP NEO N 5 to… NA ifels… NA NA average
## 16 EAS dataset pers O ipipOneg 1 OpenToExperi… IPIP NEO O 5 to… NA ifels… NA NA average
## # ℹ 1 more variable: long_rule <chr>
# old.names1 <- unique((eas_codebook %>% filter(dataset == "jrp"))$orig_itemname)
# old.names2 <- unique((eas_codebook %>% filter(dataset == "dataset"))$orig_itemname)
old.names <- unique(eas_codebook$orig_itemname)
eas <- sprintf("%s/eas/Behavior_with_Master_Data_2021_09_24.xlsx", data_path) %>%
read_excel(.) %>%
select(SID = Id, wave = Wave, year = BehaviorDate, one_of(old.names)) %>%
mutate(year = lubridate::year(year)
, Gender = as.numeric(mapvalues(Gender, c("M", "F"), c(0,1))))
# full_join(
# sprintf("%s/eas/dataset.xlsx", data_path) %>%
# read_excel(.) %>%
# select(SID = id, one_of(old.names2))
# )
eas_long <- eas %>%
pivot_longer(values_to = "value"
, names_to = "orig_itemname"
, cols = c(-SID, -wave, -year)
, values_drop_na = T)
2.2.2 Recoding & Reverse Scoring
# join data with recoding info
eas_recode <- eas_codebook %>%
filter(category %in% c("pers", "outcome", "covariates") & !is.na(orig_itemname)) %>%
select(category, name, itemname, orig_itemname, reverse_code:long_rule) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map(data, ~(.) %>% left_join(eas_long)))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
eas_recode <- eas_recode %>%
mutate(data = map(data, ~(.) %>%
group_by(recode, year) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, year), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
eas_recode <- eas_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.2.3 Covariates
# composite WITHIN years
eas_cov <- eas_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
left_join(eas_waves) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, year, p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(year <= p_year)
# summarize(value = fun_call(value, comp_rule)) %>%
# ungroup() %>%
# mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
))
comp_fun <- function(d, rule, p_year){
print(paste(rule, p_year))
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
eas_cov <- eas_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
filter(year <= p_year) %>%
group_by(p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule, p_year), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.2.4 Personality Variables
eas_pers <- eas_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
left_join(eas_waves) %>%
group_by(SID, p_year, year, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alpha's
eas_alpha <- eas_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, year, SID, value) %>%
group_by(name, year) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID)), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
eas_pers <- eas_pers %>%
group_by(name, comp_rule, year) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.2.5 Outcome Variables
# composite within years
eas_out <- eas_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
group_by(SID, name) %>%
filter(year == max(year)) %>%
filter(!is.na(value)) %>%
group_by(name, itemname, year) %>%
mutate(value = pomp(value)) %>%
group_by(name, year, SID) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
2.2.6 Combine Data
eas_combined <- eas_pers %>%
rename(Trait = name, p_value = value, p_year = year) %>%
full_join(eas_out %>% rename(Outcome = name, o_value = value, o_year = year)) %>%
full_join(eas_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.3 German Socioeconomic Panel (GSOEP)
The German Socioeconomic Panel Study (GSOEP; Socio-Economic Panel, 2017) is an ongoing longitudinal study of Germans collected by the German Institute of Economic Research (DIW Berlin). The data are freely available at https://www.diw.de/soep by application.
Data have been collected annually since 1984 (the latest data release includes data up to 2017). Participants have been recruited from more than 11,000 households, which are nationally representative of private German households. 20,000 individuals are sampled each year, on average. It is critical to note that the GSOEP samples households, not individuals, and the households consist of individuals living in both the “old” and “new” federal states (the former West and East Germany), foreigners, and recent immigrants to Germany.
Sample size varies by year, ranging from approximately 10,000 (1989) to 31,000 (2013). This provides 99% power to detect a zero-order correlation effect size of ~.06, two-tailed at alpha = .05.
2.3.1 Load Data
gsoep_read_fun <- function(Year, WL){
old.names <- (gsoep_codebook %>% filter(year == Year | category == "proc"))$orig_itemname
p <- sprintf("%s/gsoep/%sp.sav", data_path, WL) %>% haven::read_sav(.) %>%
full_join(sprintf("%s/gsoep/%skind.sav", data_path, WL) %>% haven::read_sav(.)) %>%
full_join(sprintf("%s/gsoep/%spequiv.sav", data_path, WL) %>% haven::read_sav(.)) %>%
full_join(sprintf("%s/gsoep/%spgen.sav", data_path, WL) %>% haven::read_sav(.)) %>%
full_join(sprintf("%s/gsoep/%spkal.sav", data_path, WL) %>% haven::read_sav(.)) %>%
select(one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -persnr, -hhnr, na.rm = T)
sprintf("%s/gsoep/%shbrutto.sav", data_path, WL) %>% haven::read_sav(.) %>%
full_join(sprintf("%s/gsoep/%sh.sav", data_path, WL) %>% haven::read_sav(.)) %>%
select(one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -hhnr, na.rm = T) %>%
full_join(p %>% select(persnr, hhnr) %>% distinct()) %>%
full_join(p)
}
gsoep_codebook <- (codebook %>% filter(study == "GSOEP"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_lower(orig_itemname))
gsoep_codebook
## # A tibble: 250 × 17
## study dataset category name itemname wave waveletter year orig_itemname description scale reverse_code recode mini
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 gsoep apequiv covariates age age 1 a 1984 d1110184 Age of Indiv… <NA> <NA> ifels… NA
## 2 gsoep bpequiv covariates age age 2 b 1985 d1110185 Age of Indiv… <NA> <NA> ifels… NA
## 3 gsoep cpequiv covariates age age 3 c 1986 d1110186 Age of Indiv… <NA> <NA> ifels… NA
## 4 gsoep dpequiv covariates age age 4 d 1987 d1110187 Age of Indiv… <NA> <NA> ifels… NA
## 5 gsoep epequiv covariates age age 5 e 1988 d1110188 Age of Indiv… <NA> <NA> ifels… NA
## 6 gsoep fpequiv covariates age age 6 f 1989 d1110189 Age of Indiv… <NA> <NA> ifels… NA
## 7 gsoep gpequiv covariates age age 7 g 1990 d1110190 Age of Indiv… <NA> <NA> ifels… NA
## 8 gsoep hpequiv covariates age age 8 h 1991 d1110191 Age of Indiv… <NA> <NA> ifels… NA
## 9 gsoep ipequiv covariates age age 9 i 1992 d1110192 Age of Indiv… <NA> <NA> ifels… NA
## 10 gsoep jpequiv covariates age age 10 j 1993 d1110193 Age of Indiv… <NA> <NA> ifels… NA
## # ℹ 240 more rows
## # ℹ 3 more variables: maxi <dbl>, comp_rule <chr>, long_rule <chr>
gsoep <- gsoep_codebook %>%
select(wave, waveletter, year) %>%
filter(complete.cases(.)) %>%
distinct() %>%
arrange(year) %>%
filter(year != "2018") %>%
mutate(data = map2(year, waveletter, gsoep_read_fun))
old.names <- unique(gsoep_codebook$orig_itemname)
gsoep_cog <- sprintf("%s/gsoep/cognit.sav", data_path) %>% haven::read_sav(.) %>%
select(persnr, hhnr, year = syear, one_of(old.names)) %>%
filter(year == 2012) %>%
haven::zap_labels(.) %>%
select(-hhnr) %>%
gather(key = orig_itemname, value = value, -persnr, -year, na.rm = T)
gsoep_cog_subs <- unique(gsoep_cog$persnr)
gsoep_long <- gsoep %>% unnest(data) %>%
select(-hhnr) %>%
filter(persnr %in% gsoep_cog_subs) %>%
full_join(gsoep_cog) %>%
select(-wave, -waveletter) %>%
rename(SID = persnr)
save(gsoep, file = sprintf("%s/data/clean/gsoep_raw.RData", local_path))
rm(gsoep)
2.3.2 Recoding & Reverse Scoring
gsoep_waves <- p_waves %>% filter(Study == "GSOEP") %>% select(Used) %>% distinct()
# join data with recoding info
gsoep_recode <- gsoep_codebook %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
select(category, name, itemname, wave, year, orig_itemname, reverse_code:long_rule) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map(data, ~(.) %>% left_join(gsoep_long)))
# recode
recode_fun <- function(rule, y, year, p_year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
gsoep_recode <- gsoep_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(p_year = 2005) %>%
group_by(recode, year, p_year) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, year, p_year), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
gsoep_recode <- gsoep_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.3.3 Covariates
# composite WITHIN years
gsoep_cov <- gsoep_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, year, p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(year <= p_year)
# summarize(value = fun_call(value, comp_rule)) %>%
# ungroup() %>%
# mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
))
comp_fun <- function(d, rule, p_year){
print(paste(rule, p_year))
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
gsoep_cov <- gsoep_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
filter(year <= p_year) %>%
group_by(p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule, p_year), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.3.4 Personality Variables
gsoep_pers <- gsoep_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
filter(year == "2005" & !is.na(value)) %>%
group_by(SID, p_year, year, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alpha's
gsoep_alpha <- gsoep_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, year, SID, value) %>%
group_by(name, year) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID)), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
gsoep_pers <- gsoep_pers %>%
group_by(name, comp_rule, year) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.3.6 Combine Data
gsoep_combined <- gsoep_pers %>%
rename(Trait = name, p_value = value, p_year = year) %>%
full_join(gsoep_out %>% rename(Outcome = name, o_value = value, o_year = year)) %>%
full_join(gsoep_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.4 Household, Income, and Labour Dynamics in Australia (HILDA)
The Household Income and Labour Dynamics in Australia (HILDA; Wilkins, Laß, Butterworth, & Vera-Toscano, 2019) study is an ongoing longitudinal study of Australian households. These data are available through application from https://melbourneinstitute.unimelb.edu.au/hilda/for-data-users.
The study has recruited more than 17,000 individuals. Data have been collected annually since 2001. The latest data release includes 17 waves of data from 2001 to 2017. More documentation can be found in the HILDA data dictionary at https://www.online.fbe.unimelb.edu.au/HILDAodd/srchSubjectAreas.aspx.
Sample sizes vary by year, ranging from 12,408 (2004) to 17,693 (2016). This provides 99% power to detect a zero-order correlation effect size of ~.03, two-tailed at alpha = .05.
2.4.1 Load Data
hilda_read_fun <- function(Year, WL){
old.names <- (hilda_codebook %>% filter(year == Year | year == 0))$orig_itemname
sprintf("%s/hilda/Combined %s180c.sav", data_path, WL) %>% haven::read_sav(.) %>%
select(one_of(old.names))
}
## # A tibble: 217 × 17
## study dataset category name itemname wave_letter ...7 year orig_itemname description scale reverse_code recode mini
## <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 hilda NA covariates educa… educati… A 01 2001 aedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 2 hilda NA covariates educa… educati… B 02 2002 bedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 3 hilda NA covariates educa… educati… C 03 2003 cedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 4 hilda NA covariates educa… educati… D 04 2004 dedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 5 hilda NA covariates educa… educati… E 05 2005 eedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 6 hilda NA covariates educa… educati… F 06 2006 fedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 7 hilda NA covariates educa… educati… G 07 2007 gedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 8 hilda NA covariates educa… educati… H 08 2008 hedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 9 hilda NA covariates educa… educati… I 09 2009 iedhigh1 Highest ed… 1=po… <NA> ifels… NA
## 10 hilda NA covariates educa… educati… J 10 2010 jedhigh1 Highest ed… 1=po… <NA> ifels… NA
## # ℹ 207 more rows
## # ℹ 3 more variables: maxi <dbl>, comp_rule <chr>, long_rule <chr>
2.4.2 Recoding & Reverse Scoring
hilda_waves <- p_waves %>% filter(Study == "HILDA") %>% select(Used) %>% distinct()
rename_fun <- function(df, Year){
df <- df %>%
haven::zap_labels(.) %>%
gather(key = orig_itemname, value = value, -xwaveid, na.rm=T) %>%
mutate(value=as.numeric(value))
hilda_codebook %>%
select(category, name, itemname, year, orig_itemname, reverse_code:long_rule) %>%
filter(year == Year & category %in% c("pers", "outcome", "covariates")) %>%
full_join(df)
}
# join data with recoding info
hilda_recode <- hilda %>%
mutate(data = map2(data, year, rename_fun)) %>%
select(-year, -wave_letter) %>%
unnest(data) %>%
distinct() %>%
rename(SID = xwaveid) %>%
group_by(category, name) %>%
nest() %>%
ungroup()
# recode
recode_fun <- function(rule, y, year, p_year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
hilda_recode <- hilda_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(p_year = 2005) %>%
group_by(recode, year, p_year) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, year, p_year), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
hilda_recode <- hilda_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.4.3 Covariates
# composite WITHIN years
hilda_cov <- hilda_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, year, p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(year <= p_year)
# summarize(value = fun_call(value, comp_rule)) %>%
# ungroup() %>%
# mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
))
comp_fun <- function(d, rule, p_year){
print(paste(rule, p_year))
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
hilda_cov <- hilda_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
filter(year <= p_year) %>%
group_by(p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule, p_year), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.4.4 Personality Variables
hilda_pers <- hilda_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
filter(year == "2005" & !is.na(value)) %>%
group_by(SID, p_year, year, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alpha's
hilda_alpha <- hilda_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, year, SID, value) %>%
group_by(name, year) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID)), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
hilda_pers <- hilda_pers %>%
group_by(name, comp_rule, year) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.4.6 Combine Data
hilda_combined <- hilda_pers %>%
rename(Trait = name, p_value = value, p_year = year) %>%
full_join(hilda_out %>% rename(Outcome = name, o_value = value, o_year = year)) %>%
full_join(hilda_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.5 Health and Retirement Study (HRS)
The Health and Retirement Study (HRS; Juster & Suzman, 1995) is an ongoing longitudinal study of households in the United States. These data are available at https://hrs.isr.umich.edu by creating a free account.
The study has recruited more than 35,000 individuals from the financial households of people born between 1931 and 1941 in the US. Data have been collected biennially since 1992. The latest data release includes data up to 2016. On average, 10,000 individuals are sampled each wave. More information on the HRS can be found at https://hrs.isr.umich.edu/documentation/survey-design, but, in short, the HRS is a nationally representative sample of adults over 50 in the US. It is critical to note that the HRS samples households of the original cohort and follows individuals and their spouses or partners until their death.
Sample size varies by year, ranging from approximately 7,500 (2014) to 15,500 (1992; see https://hrs.isr.umich.edu/sites/default/files/biblio/ResponseRates_2017.pdf). This provides 99% power to detect a zero-order correlation effect size of ~.04, two-tailed at alpha .05.
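The power figures reported throughout can be reproduced with a standard correlation power analysis. A minimal sketch using the Fisher z approximation (the function name is illustrative; `pwr::pwr.r.test` gives near-identical results):

```r
# Approximate N needed to detect a correlation r with the given power,
# two-tailed, via the Fisher z approximation: n = ((z_a + z_b)/atanh(r))^2 + 3
n_for_r <- function(r, power = .99, alpha = .05){
  za <- qnorm(1 - alpha / 2)
  zb <- qnorm(power)
  ceiling(((za + zb) / atanh(r))^2 + 3)
}
n_for_r(.04)  # ~11,500, within the HRS sample-size range
```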
2.5.1 Load Data
hrs_read_fun <- function(year) {
read_da <- function(da, dct, Year){
print(paste(da, dct, year, sep = " "))
data.file <- sprintf("%s/hrs/%s/%s", data_path, Year, da)
# Set path to the dictionary file "*.DCT"
dict.file <- sprintf("%s/hrs/%s/%s", data_path, Year, dct)
# Read the dictionary file
df.dict <- read.table(dict.file, skip = 1, fill = TRUE, stringsAsFactors = FALSE)
# Set column names for dictionary dataframe
colnames(df.dict) <- c("col.num","col.type","col.name","col.width","col.lbl")
# Remove last row which only contains a closing }
row <- which(df.dict$col.name == "HHID")
df.dict <- df.dict[-nrow(df.dict),]
if(row == 2){df.dict <- df.dict[-1,]}
# Extract numeric value from column width field
df.dict$col.width <- as.integer(sapply(df.dict$col.width, gsub, pattern = "[^0-9\\.]", replacement = ""))
# Convert column types to format to be used with read_fwf function
df.dict$col.type <- sapply(df.dict$col.type, function(x) ifelse(x %in% c("int","byte","long"), "i", ifelse(x == "float", "n", ifelse(x == "double", "d", "c"))))
# Read the data file into a dataframe
df <- read_fwf(file = data.file, fwf_widths(widths = df.dict$col.width, col_names = df.dict$col.name), col_types = paste(df.dict$col.type, collapse = ""))
# Add column labels to headers
attributes(df)$variable.labels <- df.dict$col.lbl
old.names <- (hrs_codebook %>% filter(year == Year))$orig_itemname
if(any(c("PN", "HHID") %in% colnames(df)) & any(old.names %in% colnames(df))){
# if(any(c("PN", "HHID") %in% colnames(df))){
df <- df %>%
mutate(hhidpn = 1000*as.numeric(HHID) + as.numeric(PN)) %>%
select(one_of(c("PN", "HHID")), one_of(old.names)) %>%
distinct()
# gather(key = item, value = value, -hhidpn)
} else {df <- NA}
return(df)
}
# Set path to the data file "*.DA"
files <- list.files(sprintf("%s/hrs/%s", data_path, year))
df2 <- tibble(
da = files[grepl(".da", files) | grepl(".DA", files)],
dct = files[grepl(".dct", files) | grepl(".DCT", files)]
) %>%
mutate(data = map2(da, dct, possibly(~read_da(.x, .y, year), NA_real_))) %>%
filter(!is.na(data)) %>%
select(-da, -dct)
if(nrow(df2) != 0){df2$data %>% reduce(full_join) %>% distinct()} else {NA}
}
hrs_codebook <- (codebook %>% filter(study == "HRS"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_upper(orig_itemname)) %>%
mutate_at(vars(orig_itemname, name, itemname), ~str_remove_all(., "[[:space:]]"))
hrs_codebook
## # A tibble: 183 × 17
## study dataset category name itemname wave waveletter year orig_itemname description scale reverse_code recode mini
## <chr> <chr> <chr> <chr> <chr> <dbl> <lgl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 hrs Rand covariates educat… eduYears 1 NA 1992 RAEDYRS R Years of… ".M=… <NA> ifels… NA
## 2 hrs Rand covariates gender demgend… 1 NA 1992 RAGENDER R Gender ".M=… <NA> ifels… NA
## 3 hrs Rand covariates SRheal… rHlthSR 1 NA 1992 R1SHLT W1 Self-re… "1.E… yes ifels… 1
## 4 hrs Rand covariates SRheal… rHlthSR 2 NA 1994 R2SHLT W2 Self-re… "1.E… yes ifels… 1
## 5 hrs Rand covariates SRheal… rHlthSR 3 NA 1996 R3SHLT W3 Self-re… "1.E… yes ifels… 1
## 6 hrs Rand covariates SRheal… rHlthSR 4 NA 1998 R4SHLT W4 Self-re… "1.E… yes ifels… 1
## 7 hrs Rand covariates SRheal… rHlthSR 5 NA 2000 R5SHLT W5 Self-re… "1.E… yes ifels… 1
## 8 hrs Rand covariates SRheal… rHlthSR 6 NA 2002 R6SHLT W6 Self-re… "1.E… yes ifels… 1
## 9 hrs Rand covariates SRheal… rHlthSR 7 NA 2004 R7SHLT W7 Self-re… "1.E… yes ifels… 1
## 10 hrs Rand covariates SRheal… rHlthSR 8 NA 2006 R8SHLT W8 Self-re… "1.E… yes ifels… 1
## # ℹ 173 more rows
## # ℹ 3 more variables: maxi <dbl>, comp_rule <chr>, long_rule <chr>
old.names <- unique(hrs_codebook$orig_itemname)
hrs.paq <- tibble(
year = sprintf("%s/hrs", data_path) %>% list.files(., pattern = "^[0-9]")
, data = map(year, hrs_read_fun)
, names = map(data, colnames)
) %>%
filter(!is.na(data))
old.names <- unique((hrs_codebook %>% filter(dataset == "Rand"))$orig_itemname)
hrs.rand <- sprintf("%s/hrs/randhrs1992_2016v1.sav", data_path) %>%
haven::read_sav(.) %>%
haven::zap_labels(.) %>%
select(SID = HHIDPN, one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -SID, na.rm = T)
hrs_long <- hrs.paq %>%
mutate(data = map(data, ~(.) %>%
gather(key = orig_itemname, value = value, -HHID, -PN))) %>%
select(-names, -year) %>%
unnest(data) %>%
mutate(SID = 1000*as.numeric(HHID) + as.numeric(PN)) %>%
select(-PN, -HHID)
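The composite person identifier built above multiplies the household ID by 1,000 and adds the person number, matching the RAND HHIDPN convention, so the same construction in `hrs_read_fun` and in this reshape yields matching IDs:

```r
# hhidpn = 1000 * household id + person number (RAND HHIDPN convention)
HHID <- c("010001", "010001")
PN   <- c("010", "020")
1000 * as.numeric(HHID) + as.numeric(PN)  # 10001010 10001020
```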
hrs.subs <- unique(hrs_long$SID)[unique(hrs_long$SID) %in% unique(hrs.rand$SID)]
hrs_long <- hrs_long %>%
bind_rows(hrs.rand %>% select(orig_itemname, value, SID)) %>%
filter(SID %in% hrs.subs)
save(hrs.rand, hrs.paq, file = sprintf("%s/data/clean/hrs_raw.RData", local_path))
rm(list = c("hrs.paq", "hrs.rand"))
2.5.2 Recoding & Reverse Scoring
hrs_waves <- p_waves %>% filter(Study == "HRS") %>% select(Used) %>% distinct()
# join data with recoding info
hrs_recode <- hrs_codebook %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
select(category, name, itemname, wave, year, orig_itemname, reverse_code:long_rule) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map(data, ~(.) %>% left_join(hrs_long)))
# recode
recode_fun <- function(rule, y, year, p_year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
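`recode_fun()` evaluates the codebook's `recode` string as an R expression over the item values, which are bound to `x`. A minimal illustration (the rule shown is hypothetical, not one of the actual codebook rules):

```r
# Evaluate a codebook recoding rule (a string of R code over x) on a data frame
recode_fun <- function(rule, y, year, p_year){
  x <- y$value
  if(!is.na(rule)){y$value <- eval(parse(text = rule))}
  return(y)
}
rule <- "ifelse(x %in% c(8, 9), NA, x)"  # hypothetical rule: treat 8/9 as missing
y <- data.frame(value = c(1, 3, 8, 9))
recode_fun(rule, y)$value  # 1 3 NA NA
```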
hrs_recode <- hrs_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(year = mapvalues(year, seq(2006, 2016, 2), rep(c(2006, 2010, 2014), each = 2)),
p_year = 2006) %>%
group_by(recode, year, p_year) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, year, p_year), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
hrs_recode <- hrs_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.5.3 Covariates
# compositing within years
year_comp_fun <- function(df, rule){
df %>%
# group by person and item (collapse across age)
group_by(SID, name, year, p_year, long_rule) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# composite WITHIN years
hrs_cov <- hrs_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
unnest(data) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, year_comp_fun)) %>%
unnest(data)
comp_fun <- function(d, rule, p_year){
d %>%
filter(year <= p_year) %>%
group_by(SID, name) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
hrs_cov <- hrs_cov %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
group_by(p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule, p_year), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.5.4 Personality Variables
hrs_pers <- hrs_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
filter(year == "2006" & !is.na(value)) %>%
group_by(SID, p_year, year, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alpha's
hrs_alpha <- hrs_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, year, SID, value) %>%
group_by(name, year) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID)), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
hrs_pers <- hrs_pers %>%
group_by(name, comp_rule, year) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.5.6 Combine Data
hrs_combined <- hrs_pers %>%
rename(Trait = name, p_value = value, p_year = year) %>%
full_join(hrs_out %>% rename(Outcome = name, o_value = value, o_year = year)) %>%
full_join(hrs_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.6 Longitudinal Aging Study Amsterdam (LASA)
The Longitudinal Aging Study Amsterdam (LASA) is an ongoing longitudinal study that began in 1992. These data are available through application at https://www.lasa-vu.nl/data/availability_data/availability_data.htm.
Participants who were between 55 and 84 years at the start of the study were recruited from the NESTOR study on Living Arrangements and Social Networks of older adults, which was randomly selected from municipality registers in 1992, with an oversampling of the oldest old and men. Data are collected approximately every three years (1992, 1995, 1998, 2001, 2005, 2008, 2011, 2015, 2018). Additional cohorts are introduced every 10 years, with additional participants (Cohorts 2 and 3) recruited in 2002 and 2012. Additional information and documentation are available at https://www.lasa-vu.nl/index.htm.
Sample sizes vary by year, ranging from 1522 (Wave H; 763 Cohort 1, 759 Cohort 2) to 3107 (Wave B, Cohort 1). This provides 99% power to detect a zero-order correlation effect size of ~.10, two-tailed at alpha .05.
2.6.1 Load Data
lasa_read_fun <- function(x){
d <- sprintf("%s/lasa/%s", data_path, x) %>%
read_sav(.) %>%
select(one_of(lasa_sids),
one_of(old.names),
one_of(str_to_lower(old.names)),
one_of(str_to_upper(old.names))) %>%
as_tibble() %>%
haven::zap_labels(.)
colnames(d) <- str_to_lower(colnames(d))
return(d)
}
lasa_codebook <- (codebook %>% filter(study == "LASA"))$codebook[[1]] %>%
mutate(orig_itemname = orig_itemname)
lasa_codebook
## # A tibble: 105 × 17
## study dataset category name itemname wave wave_letter year orig_itemname description scale reverse_code recode mini
## <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 LASA LASAz004 covariates educ… eduYears NA b 1992 aedu education … "no … <NA> ifels… NA
## 2 LASA LASAz004 covariates gend… sex NA b 1992 sex sex respon… "mal… <NA> ifels… NA
## 3 LASA LASAb036 covariates SRhe… SRhealth NA b 1992 bsubhea1 Self-perce… "na,… yes ifels… 1
## 4 LASA LASAc036 covariates SRhe… SRhealth NA c 1995 csubhea1 Self-perce… "na,… yes ifels… 1
## 5 LASA LASAd036 covariates SRhe… SRhealth NA d 1998 dsubhea1 Self-perce… "na,… yes ifels… 1
## 6 LASA LASAe036 covariates SRhe… SRhealth NA e 2001 esubhea1 Self-perce… "na,… yes ifels… 1
## 7 LASA LASAf036 covariates SRhe… SRhealth NA f 2005 fsubhea1 Self-perce… "na,… yes ifels… 1
## 8 LASA LASAg036 covariates SRhe… SRhealth NA g 2008 gsubhea1 Self-perce… "na,… yes ifels… 1
## 9 LASA LASAh036 covariates SRhe… SRhealth NA h 2011 hsubhea1 Self-perce… "na,… yes ifels… 1
## 10 LASA LASAi036 covariates SRhe… SRhealth NA i 2015 isubhea1 Self-perce… "na,… yes ifels… 1
## # ℹ 95 more rows
## # ℹ 3 more variables: maxi <dbl>, comp_rule <chr>, long_rule <chr>
lasa_sids <- c("respnr", "RESPNR", "RespNr", "Respnr")
old.names <- unique(lasa_codebook$orig_itemname)
datasets <- sprintf("%s/lasa", data_path) %>% list.files(., pattern = "SAV")
lasa <- tibble(datasets = datasets) %>%
mutate(data = map(datasets, lasa_read_fun),
ncol = map_dbl(data, ncol)) %>%
filter(ncol > 1)
lasa <- reduce(lasa$data, full_join) %>%
haven::zap_labels(.)
lasa_long <- lasa %>%
rename(SID = respnr) %>%
pivot_longer(-SID
, names_to = "orig_itemname"
, values_to = "value"
, values_drop_na = T)
save(lasa, file = sprintf("%s/data/clean/lasa_raw.RData", local_path))
2.6.2 Recoding & Reverse Scoring
# join data with recoding info
lasa_recode <- lasa_codebook %>%
select(category, name, itemname, year, orig_itemname, reverse_code:long_rule) %>%
mutate(orig_itemname = str_to_lower(orig_itemname)) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map(data, ~(.) %>% left_join(lasa_long)))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
lasa_recode <- lasa_recode %>%
mutate(data = map(data, ~(.) %>%
group_by(recode, year) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, year), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
lasa_recode <- lasa_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
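`fun_call()` simply dispatches an aggregation on the rule name from the codebook. A quick sanity check of the arithmetic branches (this reduced copy omits the `mode` rule, whose `Mode` helper is defined elsewhere in the project's setup code):

```r
# Reduced copy of fun_call() covering the arithmetic compositing rules only
fun_call <- function(x, rule){
  switch(rule,
         average = mean(x, na.rm = TRUE),
         sum = sum(x, na.rm = TRUE),
         max = max(x, na.rm = TRUE),
         min = min(x, na.rm = TRUE))
}
x <- c(2, 4, NA, 6)
fun_call(x, "average")  # 4
fun_call(x, "sum")      # 12
```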
lasa_waves <- lasa_recode %>%
filter(category == "pers") %>%
unnest(data) %>%
group_by(SID) %>%
summarize(p_year = min(year)) %>%
ungroup()
2.6.3 Covariates
# composite WITHIN years
lasa_cov <- lasa_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
left_join(lasa_waves) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, year, p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(year <= p_year)
))
comp_fun <- function(d, rule, p_year){
print(paste(rule, p_year))
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
lasa_cov <- lasa_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
filter(year <= p_year) %>%
group_by(p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule, p_year), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.6.4 Personality Variables
lasa_pers <- lasa_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
left_join(lasa_waves) %>%
filter(year == p_year) %>%
filter(!is.na(value)) %>%
group_by(SID, year, p_year, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alpha's
lasa_alpha <- lasa_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, year, SID, value) %>%
group_by(name) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID)), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
lasa_pers <- lasa_pers %>%
group_by(name, comp_rule, year) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.6.5 Outcome Variables
lasa_cog_waves <- lasa_recode %>%
filter(category == "outcome") %>%
unnest(data) %>%
group_by(name, SID) %>%
summarize(o_year = max(year)) %>%
ungroup()
# composite within years
lasa_out <- lasa_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
right_join(lasa_cog_waves) %>%
filter(year == o_year) %>%
group_by(name, year, SID) %>%
summarize(value = sum(value, na.rm = T)) %>%
ungroup()
2.6.6 Combine Data
lasa_combined <- lasa_pers %>%
rename(Trait = name, p_value = value, p_year = year) %>%
full_join(
lasa_out %>%
rename(Outcome = name, o_value = value, o_year = year)
) %>% full_join(lasa_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.7 MAP
The RUSH Memory and Aging Project (MAP) is an ongoing longitudinal study that began in 1997 (a2012overview?). These data are available through application from https://www.radc.rush.edu/requests.htm.
Participants who were 65 and older were recruited from retirement communities and subsidized senior housing facilities throughout Chicagoland and northeastern Illinois beginning in 1997. Data are collected annually, and all participants are organ donors. Additional participants are recruited each year. Additional information and documentation on the data can be found at https://www.radc.rush.edu/docs/var/variables.htm.
Sample sizes vary by year, ranging from 52 (1997) to 2205 participants. This provides 99% power to detect a zero-order correlation effect size of ~.09, two-tailed at alpha .05.
2.7.1 Load Data
(map_codebook <- (codebook %>% filter(study == "RADC-MAP"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_lower(orig_itemname)))
## # A tibble: 7 × 15
## study dataset category name itemname year orig_itemname description scale reverse_code recode mini maxi comp_rule
## <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr>
## 1 RADC-MAP NA outcome cryst… bosNami… long… cts_bname Boston nam… <NA> no ifels… NA NA sum
## 2 RADC-MAP NA covariates age ageBase… 0 age_bl The age at… <NA> no ifels… NA NA skip
## 3 RADC-MAP NA covariates educa… eduYears 0 educ The years … <NA> no ifels… NA NA skip
## 4 RADC-MAP NA covariates gender sex 0 msex Self-repor… "1 =… no ifels… NA NA skip
## 5 RADC-MAP NA pers C C 0 conscientiou… The variab… "Res… no ifels… NA NA average
## 6 RADC-MAP NA pers E E 0 extraversion… The variab… "Res… no ifels… NA NA average
## 7 RADC-MAP NA pers N N 0 neuroticism_… The variab… "Res… no ifels… NA NA average
## # ℹ 1 more variable: long_rule <chr>
2.7.2 Recoding & Reverse Scoring
rename_fun <- function(cb, var){
old.names <- unique((map_codebook %>% filter(name == var))$orig_itemname)
df <- map %>%
select(SID = projid, wave = fu_year, one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -SID, -wave, na.rm = T) %>%
left_join(cb %>% select(itemname, orig_itemname, reverse_code:long_rule))
}
# join data with recoding info
map_recode <- map_codebook %>%
select(category, name, itemname, year, orig_itemname, reverse_code:long_rule) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, name, rename_fun))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
map_recode <- map_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(wave = as.numeric(wave)) %>%
group_by(recode, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, wave), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
map_recode <- map_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.7.3 Covariates
# composite WITHIN years
map_cov <- map_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, wave, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(wave == 0)
))
comp_fun <- function(d, rule){
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
map_cov <- map_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
group_by(long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.7.5 Outcome Variables
map_cog_waves <- map_recode %>%
filter(category == "outcome") %>%
unnest(data) %>%
group_by(SID, name) %>%
summarize(o_year = max(wave)) %>%
ungroup()
# composite within years
map_out <- map_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
right_join(map_cog_waves) %>%
filter(wave == o_year & wave > 0) %>%
filter(!is.na(value)) %>%
group_by(name, itemname, wave) %>%
mutate(value = pomp(value)) %>%
group_by(name, wave, SID) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
2.7.6 Combine Data
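The outcome compositing above first rescales each item with `pomp()`, i.e., percent-of-maximum-possible scoring. A minimal sketch of such a helper, assuming the observed range serves as the anchor (the project's actual `pomp()` may instead use theoretical scale minima and maxima):

```r
# POMP: linearly rescale scores to 0-100 percent of maximum possible,
# anchored here at the observed minimum and maximum (an assumption)
pomp <- function(x){
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / diff(rng) * 100
}
pomp(c(0, 5, 10, 20))  # 0 25 50 100
```

Putting items on a common 0-100 metric before averaging keeps items with different raw ranges from dominating the composite.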
map_combined <- map_pers %>%
rename(Trait = name, p_value = value, p_year = wave) %>%
full_join(
map_out %>%
rename(Outcome = name, o_value = value, o_year = wave)
) %>% full_join(map_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.8 MARS
The RUSH Minority Aging Research Study (MARS) is an ongoing longitudinal study that began in 2004. These data are available through application from https://www.radc.rush.edu/requests.htm.
Participants were Black individuals 65 and older who were recruited from community locations in the Chicago Metropolitan area and suburbs beginning in 2004. Data are collected annually. Additional participants are recruited each year. Additional information and documentation on the data can be found at https://www.radc.rush.edu/docs/var/variables.htm.
Sample sizes vary by year, ranging from 80 (2004) to 790 (2020). This provides 99% power to detect a zero-order correlation effect size of ~.17, two-tailed at alpha .05.
(mars_codebook <- (codebook %>% filter(study == "MARS"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_lower(orig_itemname)))
## # A tibble: 5 × 15
## study dataset category name itemname year orig_itemname description scale reverse_code recode mini maxi comp_rule
## <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr>
## 1 MARS NA outcome crystall… bosNami… long… cts_bname Boston nam… <NA> no ifels… NA NA sum
## 2 MARS NA covariates age ageBase… base… age_bl The age at… <NA> no ifels… NA NA skip
## 3 MARS NA covariates education eduYears base… educ The years … <NA> no ifels… NA NA skip
## 4 MARS NA covariates gender sex base… msex Self-repor… "1 =… no ifels… NA NA skip
## 5 MARS NA pers N N base… neuroticism_… The variab… "Res… no ifels… NA NA average
## # ℹ 1 more variable: long_rule <chr>
mars <- sprintf("%s/rush-radc/dataset_1034_long_03-26-2021.xlsx", data_path) %>% read_excel() %>%
full_join(sprintf("%s/rush-radc/dataset_1034_basic_03-26-2021.xlsx", data_path) %>% read_excel()) %>%
filter(study == "MARS")
2.8.1 Recoding & Reverse Scoring
rename_fun <- function(cb, var){
old.names <- unique((mars_codebook %>% filter(name == var))$orig_itemname)
df <- mars %>%
select(SID = projid, wave = fu_year, one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -SID, -wave, na.rm = T) %>%
left_join(cb %>% select(itemname, orig_itemname, reverse_code:long_rule))
}
# join data with recoding info
mars_recode <- mars_codebook %>%
select(category, name, itemname, year, orig_itemname, reverse_code:long_rule) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, name, rename_fun))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
mars_recode <- mars_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(wave = as.numeric(wave)) %>%
group_by(recode, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, wave), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
mars_recode <- mars_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.8.2 Covariates
# composite WITHIN years
mars_cov <- mars_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, wave, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(wave == 0)
))
comp_fun <- function(d, rule){
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
mars_cov <- mars_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
group_by(long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.8.4 Outcome Variables
mars_cog_waves <- mars_recode %>%
filter(category == "outcome") %>%
unnest(data) %>%
group_by(SID, name) %>%
summarize(o_year = max(wave)) %>%
ungroup()
# composite within years
mars_out <- mars_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
right_join(mars_cog_waves) %>%
filter(wave == o_year & wave > 0) %>%
filter(!is.na(value)) %>%
group_by(name, itemname, wave) %>%
mutate(value = pomp(value)) %>%
group_by(name, wave, SID) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
2.8.5 Combine Data
mars_combined <- mars_pers %>%
rename(Trait = name, p_value = value, p_year = wave) %>%
full_join(
mars_out %>%
rename(Outcome = name, o_value = value, o_year = wave)
) %>% full_join(mars_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.9 ROS
The RUSH Religious Orders Study (ROS) is an ongoing longitudinal study that began in 1994 (a2012overview?). These data are available through application from https://www.radc.rush.edu/requests.htm.
Older (65 and above) Catholic nuns, priests, and brothers with no prior dementia diagnosis and who agreed to annual evaluations and eventual organ donation were recruited from more than 40 groups across the United States. Additional participants are recruited each year. Additional information and documentation on the data can be found at https://www.radc.rush.edu/docs/var/variables.htm.
Sample sizes vary by year, ranging from 353 participants (1994) to 1487 participants, including 797 deceased participants with autopsy data (2019, 2020). This provides 99% power to detect a zero-order correlation effect size of ~.11, two-tailed at alpha .05.
2.9.1 Load Data
(ros_codebook <- (codebook %>% filter(study == "ROS"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_lower(orig_itemname)))
## # A tibble: 9 × 15
## study dataset category name itemname year orig_itemname description scale reverse_code recode mini maxi comp_rule
## <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr>
## 1 ROS NA outcome crystall… bosNami… long… cts_bname Boston nam… <NA> no ifels… NA NA sum
## 2 ROS NA covariates age ageBase… base… age_bl The age at… <NA> no ifels… NA NA skip
## 3 ROS NA covariates education eduYears base… educ The years … <NA> no ifels… NA NA skip
## 4 ROS NA covariates gender sex base… msex Self-repor… "1 =… no ifels… NA NA skip
## 5 ROS NA pers A A base… agreeableness The variab… "Res… no ifels… NA NA average
## 6 ROS NA pers C C base… conscientiou… The variab… "Res… no ifels… NA NA average
## 7 ROS NA pers E E base… extraversion… The variab… "Res… no ifels… NA NA average
## 8 ROS NA pers N N base… neuroticism_… The variab… "Res… no ifels… NA NA average
## 9 ROS NA pers O O base… openness The variab… "Res… no ifels… NA NA average
## # ℹ 1 more variable: long_rule <chr>
2.9.2 Recoding & Reverse Scoring
rename_fun <- function(cb, var){
old.names <- unique((ros_codebook %>% filter(name == var))$orig_itemname)
df <- ros %>%
select(SID = projid, wave = fu_year, one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -SID, -wave, na.rm = T) %>%
left_join(cb %>% select(itemname, orig_itemname, reverse_code:long_rule))
}
# join data with recoding info
ros_recode <- ros_codebook %>%
select(category, name, itemname, year, orig_itemname, reverse_code:long_rule) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, name, rename_fun))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
ros_recode <- ros_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(wave = as.numeric(wave)) %>%
group_by(recode, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, wave), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
ros_recode <- ros_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.9.3 Covariates
# composite WITHIN years
ros_cov <- ros_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, wave, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(wave == 0)
))
comp_fun <- function(d, rule){
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
ros_cov <- ros_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
group_by(long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.9.5 Outcome Variables
ros_cog_waves <- ros_recode %>%
filter(category == "outcome") %>%
unnest(data) %>%
group_by(SID, name) %>%
summarize(o_year = max(wave)) %>%
ungroup()
# composite within years
ros_out <- ros_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
right_join(ros_cog_waves) %>%
filter(wave == o_year & wave > 0) %>%
filter(!is.na(value)) %>%
group_by(name, itemname, wave) %>%
mutate(value = pomp(value)) %>%
group_by(name, wave, SID) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
2.9.6 Combine Data
ros_combined <- ros_pers %>%
rename(Trait = name, p_value = value, p_year = wave) %>%
full_join(
ros_out %>%
rename(Outcome = name, o_value = value, o_year = wave)
) %>% full_join(ros_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.10 Origins of the Variances of the Oldest-Old: Octogenarian Twins (OCTO-TWIN)
The Origins of the Variances of the Oldest-Old: Octogenarian Twins (OCTO-TWIN) study was a longitudinal study of twin pairs in Sweden born before 1913 that began in 1991. Data are available, by application, from the study leaders at the Karolinska Institute.
Twin pairs from the Swedish Twin Registry who were born before 1913 were invited to join the study in 1991. Additional follow-ups of all surviving twins were collected in 1993, 1995, 1997, and 1999. The initial sample included 351 twin pairs (149 monozygotic and 202 same-sex dizygotic pairs). More information on the study can be found at https://webcache.googleusercontent.com/search?q=cache:HcheF2l9zc0J:https://psy.gu.se/digitalAssets/1469/1469717_octo-twin-brief-presentation.pdf+&cd=1&hl=en&ct=clnk&gl=us&client=safari.
Sample sizes vary by wave, from 222 (wave 5; 43 pairs) to 702 (wave 1; 351 pairs), which yields 99% power to detect a correlation effect size of ~.19, two-tailed at alpha .05.
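The power figures reported here and in the SATSA and SLS descriptions below can be approximated in base R via the Fisher z transformation. The sketch below is an approximation under standard assumptions, not the exact procedure used to produce the reported figures; the `pwr` package's `pwr.r.test()` offers a similar calculation.

```r
# Smallest correlation detectable at a given n, power, and two-tailed
# alpha, using the Fisher z approximation (SE of z is ~ 1/sqrt(n - 3)).
min_detectable_r <- function(n, power = 0.99, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2) + qnorm(power)
  tanh(z_crit / sqrt(n - 3))
}

min_detectable_r(702)   # OCTO-TWIN wave-1 N
min_detectable_r(2018)  # SATSA baseline N; roughly .10
```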
2.10.1 Load Data
(octotwin_codebook <- (codebook %>% filter(study == "OCTO-TWIN"))$codebook[[1]] %>%
mutate(orig_itemname = str_to_lower(orig_itemname)))
## # A tibble: 99 × 16
## study dataset category name itemname wave year orig_itemname description scale reverse_code recode mini maxi
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 octo-twin OCTO_Wave1 covariates educ… eduYears 1 1991 v130 years scho… "int… <NA> ifels… NA NA
## 2 octo-twin OCTO_Wave1 covariates gend… sex 1 1991 sex Respondent… <NA> <NA> ifels… NA NA
## 3 octo-twin OCTO_Wave1 covariates gend… sex2 1 1991 medrec_sex Sex <NA> <NA> ifels… NA NA
## 4 octo-twin OCTO_Wave2 covariates gend… sex2 2 1993 medrec_sex Sex <NA> <NA> ifels… NA NA
## 5 octo-twin OCTO_Wave3 covariates gend… sex2 3 1995 medrec_sex Sex <NA> <NA> ifels… NA NA
## 6 octo-twin OCTO_Wave4 covariates gend… sex2 4 1997 medrec_sex Sex <NA> <NA> ifels… NA NA
## 7 octo-twin OCTO_Wave5 covariates gend… sex2 5 2000 medrec_sex Sex <NA> <NA> ifels… NA NA
## 8 octo-twin OCTO_Wave1 covariates SRhe… overall… 1 1991 v201 overall he… "Nam… yes ifels… 1 3
## 9 octo-twin OCTO_Wave2 covariates SRhe… overall… 2 1993 b201 overall he… "Nam… yes ifels… 1 3
## 10 octo-twin OCTO_Wave3 covariates SRhe… overall… 3 1995 c201 overall he… "Nam… yes ifels… 1 3
## # ℹ 89 more rows
## # ℹ 2 more variables: comp_rule <chr>, long_rule <chr>
2.10.2 Recoding & Reverse Scoring
rename_fun <- function(cb, var){
old.names <- unique((octotwin_codebook %>% filter(name == var))$orig_itemname)
df <- octotwin %>%
mutate(SID = paste0(v1, v2)) %>%
select(SID, one_of(old.names)) %>%
gather(key = orig_itemname, value = value, -SID, na.rm=T)
if(length(old.names) > 1){
df %>% left_join(cb %>% select(itemname, wave, orig_itemname, reverse_code:long_rule))
} else {
df %>% left_join(cb %>% select(itemname, orig_itemname, reverse_code:long_rule) %>% distinct()) %>% mutate(wave = "0")
}
}
# join data with recoding info
octotwin_recode <- octotwin_codebook %>%
select(category, name, itemname, wave, orig_itemname, reverse_code:long_rule) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, name, possibly(rename_fun, NA_real_)))
# recode
recode_fun <- function(rule, y, year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
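`recode_fun` evaluates the codebook's `recode` string with the raw values bound to `x`, so the recoding logic lives in the codebook rather than in the script. A minimal standalone illustration; the rule string here is a made-up example, not taken from the OCTO-TWIN codebook:

```r
# A codebook-style recode rule: map negative missing-data codes to NA.
# This particular rule string is hypothetical.
rule <- "ifelse(x %in% c(-9, -8), NA, x)"

x <- c(1, 2, -9, 3, -8)
eval(parse(text = rule))  # 1 2 NA 3 NA
```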
octotwin_recode <- octotwin_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(wave = as.numeric(wave)) %>%
group_by(recode, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, wave), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
octotwin_recode <- octotwin_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
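`reverse.code()` here is presumably `psych::reverse.code()`; for a single item, the underlying arithmetic reduces to subtracting each response from the sum of the scale minimum and maximum. A base-R sketch of that identity:

```r
# Reverse-score a vector given the scale endpoints:
# a response of mini maps to maxi and vice versa.
reverse_score <- function(x, mini, maxi) maxi + mini - x

reverse_score(c(1, 2, 5), mini = 1, maxi = 5)  # 5 4 1
```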
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.10.3 Covariates
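`fun_call()` above dispatches on the codebook's compositing rule; the `mode` rule relies on a `Mode()` helper that is not part of base R. A minimal sketch of such a helper (the original implementation may differ, e.g., in how ties are broken):

```r
# Most frequent value(s) in a vector, ignoring NAs. Ties return
# multiple values, which is why fun_call() takes Mode(x)[1].
Mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  ux[counts == max(counts)]
}

Mode(c(1, 2, 2, 3))[1]  # 2
```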
# composite WITHIN years
octotwin_cov <- octotwin_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, wave, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(wave <= 1)
))
comp_fun <- function(d, rule){
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
octotwin_cov <- octotwin_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
group_by(long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.10.4 Personality Variables
octotwin_pers <- octotwin_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
filter(wave == 1 & !is.na(value)) %>%
group_by(SID, wave, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alphas
octotwin_alpha <- octotwin_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, wave, SID, value) %>%
group_by(name, wave) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID), check.keys = T), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
octotwin_pers <- octotwin_pers %>%
group_by(name, comp_rule, wave) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.10.5 Outcome Variables
octotwin_cog_waves <- octotwin_recode %>%
filter(category == "outcome") %>%
unnest(data) %>%
group_by(SID, name) %>%
summarize(o_year = max(wave)) %>%
ungroup()
# composite within years
octotwin_out <- octotwin_recode %>%
filter(category == "outcome") %>%
select(-category) %>%
unnest(data) %>%
right_join(octotwin_cog_waves) %>%
filter(wave == o_year & wave > 1) %>%
group_by(name, wave, SID) %>%
summarize(value = sum(value, na.rm = T)) %>%
ungroup()
2.10.6 Combine Data
octotwin_combined <- octotwin_pers %>%
rename(Trait = name, p_value = value, p_year = wave) %>%
full_join(
octotwin_out %>%
rename(Outcome = name, o_value = value, o_year = wave)
) %>% full_join(octotwin_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.11 Swedish Adoption Twin Study of Aging (SATSA)
The Swedish Adoption Twin Study of Aging (SATSA) is a longitudinal study of twin pairs from the Swedish Twin Registry that began in 1984. Data are available through the ICPSR database at https://www.icpsr.umich.edu/web/ICPSR/studies/3843.
All twin pairs on the Swedish Twin Registry who were separated at an early age were invited to be part of the study in 1984. A control sample of twins reared together was also included. Additional waves of data were collected from all participants in 1987, 1990, 1993, 2004, 2007, 2010, 2012, and 2014. More information, including codebooks, scales, and variable search functions, can be found at https://www.maelstrom-research.org/mica/individual-study/satsa/#.
Sample sizes vary by wave, ranging from 2018 participants at baseline (1984) to 379 participants (IPT7). Given that the target measures were collected at baseline, this provides 99% power to detect a zero-order correlation effect size of ~.10, two-tailed at alpha .05.
2.11.1 Load Data
loadRData <- function(fileName){
#loads an RData file, and returns it
load(fileName)
get(ls()[ls() != "fileName"])
}
satsa_read_fun <- function(x){
# prob_vars <- c("FHEART", "FPARKIN", "FSTROKE")
y <- sprintf("%s/satsa/%s", data_path, x) %>% loadRData(.) %>%
select(SID = TWINNR, one_of(old.names)) %>%
as_tibble()
# if(any(prob_vars %in% colnames(y))){
# y <- y %>% mutate_at(vars(one_of(prob_vars)), ~as.numeric(as.character(.)))
# }
return(y)
}
satsa_codebook <- (codebook %>% filter(study == "SATSA"))$codebook[[1]] %>%
mutate_at(vars(orig_itemname), str_to_upper)
satsa_codebook
## # A tibble: 302 × 18
## study dataset category name itemname wave wave_letter year item_stem orig_itemname description scale reverse_code
## <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 satsa SATSA_Q1 covariates educa… educati… NA <NA> 1984 EDUC EDUC Education "1\t… <NA>
## 2 satsa SATSA_Q1 covariates gender sex NA <NA> 1984 <NA> SEX SEX "1 =… <NA>
## 3 satsa SATSA_IPT1 covariates gender sex NA <NA> 1985 <NA> SEX SEX "1 =… <NA>
## 4 satsa SATSA_Q2 covariates gender sex NA <NA> 1987 <NA> SEX SEX "1 =… <NA>
## 5 satsa SATSA_IPT2 covariates gender sex NA <NA> 1989 <NA> SEX SEX "1 =… <NA>
## 6 satsa SATSA_Q3 covariates gender sex NA <NA> 1990 <NA> SEX SEX "1 =… <NA>
## 7 satsa SATSA_IPT3 covariates gender sex NA <NA> 1992 <NA> SEX SEX "1 =… <NA>
## 8 satsa SATSA_Q4 covariates gender sex NA <NA> 1993 <NA> SEX SEX "1 =… <NA>
## 9 satsa SATSA_IPT4 covariates gender sex NA <NA> 1995 <NA> SEX SEX "1 =… <NA>
## 10 satsa SATSA_IPT5 covariates gender sex NA <NA> 1999 <NA> SEX SEX "1 =… <NA>
## # ℹ 292 more rows
## # ℹ 5 more variables: recode <chr>, mini <dbl>, maxi <dbl>, comp_rule <chr>, long_rule <chr>
old.names <- unique(satsa_codebook$orig_itemname) %>% str_to_upper
datasets <- sprintf("%s/satsa", data_path) %>% list.files(., pattern = ".rda")
satsa <- tibble(datasets = datasets) %>%
mutate(data = map(datasets, satsa_read_fun),
ncol = map_dbl(data, ncol)) %>%
filter(ncol != 0)
satsa <- reduce(satsa$data, full_join) %>% haven::zap_labels(.)
satsa <- satsa %>%
mutate_if(is.factor, ~as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", .)))
satsa_long <- satsa %>%
gather(key = orig_itemname, value = value, -SID, na.rm = T)
save(satsa, file = sprintf("%s/data/clean/satsa_raw.RData", local_path))
2.11.2 Recoding & Reverse Scoring
satsa_waves <- p_waves %>% filter(Study == "SATSA") %>% select(Used) %>% distinct()
# join data with recoding info
satsa_recode <- satsa_codebook %>%
select(category, name, itemname, year, orig_itemname, reverse_code:long_rule) %>%
filter(category %in% c("pers", "outcome", "covariates")) %>%
group_by(category, name) %>%
nest() %>%
ungroup() %>%
mutate(data = map(data, ~(.) %>% left_join(satsa_long)))
# recode
recode_fun <- function(rule, y, year, p_year){
x <- y$value
if(!is.na(rule)){y$value <- eval(parse(text = rule))}
return(y)
}
satsa_recode <- satsa_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(p_year = 1984) %>%
group_by(recode, year, p_year) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(recode, data, year, p_year), recode_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(value < 0 | is.nan(value) | is.infinite(value), NA, value))))
# reverse code
satsa_recode <- satsa_recode %>%
mutate(data = map(data, ~(.) %>%
mutate(value = ifelse(reverse_code == "no" | is.na(reverse_code), value,
reverse.code(-1, value, mini = mini, maxi = maxi)))))
fun_call <- function(x, rule){
switch(rule,
average = mean(x, na.rm = T),
mode = Mode(x)[1],
sum = sum(x, na.rm = T),
skip = unique(x)[1],
select = unique(x)[1],
max = max(x, na.rm = T),
min = min(x, na.rm = T))
}
2.11.3 Covariates
# composite WITHIN years
satsa_cov <- satsa_recode %>%
filter(category == "covariates") %>%
select(-category) %>%
mutate(data = map(data, ~(.) %>%
filter(!is.na(value)) %>%
mutate(comp_rule = ifelse(is.na(comp_rule), "skip", comp_rule)) %>%
group_by(comp_rule, SID, year, p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, ~fun_call((.x)$value, .y))) %>%
unnest(data) %>%
filter(year <= p_year)
))
comp_fun <- function(d, rule, p_year){
print(paste(rule, p_year))
d %>%
group_by(SID, name) %>%
summarize(value = fun_call(data, rule)) %>%
ungroup() %>%
distinct()
}
# composite ACROSS years
satsa_cov <- satsa_cov %>%
unnest(data) %>%
mutate(long_rule = ifelse(is.na(long_rule), "skip", long_rule)) %>%
filter(year <= p_year) %>%
group_by(p_year, long_rule) %>%
nest() %>%
ungroup() %>%
mutate(data = pmap(list(data, long_rule, p_year), comp_fun)) %>%
unnest(data) %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value)) %>%
select(-long_rule) %>%
spread(name, value) %>%
filter(!is.na(SID))
2.11.4 Personality Variables
satsa_pers <- satsa_recode %>%
filter(category == "pers") %>%
select(-category) %>%
unnest(data) %>%
filter(year == 1984 & !is.na(value)) %>%
group_by(SID, p_year, year, name, itemname, comp_rule) %>%
summarize(value = mean(value, na.rm = T)) %>%
ungroup()
# alphas
satsa_alpha <- satsa_pers %>%
filter(!is.na(value)) %>%
select(name, itemname, year, SID, value) %>%
group_by(name, year) %>%
nest() %>%
mutate(data = map(data, ~(.) %>% distinct() %>% pivot_wider(names_from = itemname, values_from = value, values_fn = list(mean))),
alpha = map(data, possibly(~psych::alpha((.) %>% select(-SID)), NA_real_)))
comp_fun <- function(df, rule){
df %>%
group_by(SID) %>%
summarize(value = fun_call(value, rule)) %>%
ungroup() %>%
mutate(value = ifelse(is.infinite(value) | is.nan(value), NA, value))
}
# create composites
satsa_pers <- satsa_pers %>%
group_by(name, comp_rule, year) %>%
nest() %>%
ungroup() %>%
mutate(data = map2(data, comp_rule, comp_fun)) %>%
unnest(data) %>%
select(-comp_rule)
2.11.6 Combine Data
satsa_combined <- satsa_pers %>%
rename(Trait = name, p_value = value, p_year = year) %>%
full_join(satsa_out %>% rename(Outcome = name, o_value = value, o_year = year)) %>%
full_join(satsa_cov) %>%
filter(!is.na(p_value) & !is.na(o_value))
2.12 Seattle Longitudinal Study (SLS)
The Seattle Longitudinal Study (SLS) is an ongoing longitudinal study of adult Seattle residents that began in 1956. Data are available by request from https://sls.psychiatry.uw.edu/researchers/public-access-data-sets/.
The first cohort of participants was recruited from members of the Group Health Cooperative of Puget Sound, an HMO in Seattle. Seven years later, the first cohort was recontacted and a new cohort of similarly aged participants was recruited. This procedure has been repeated every seven years over the following half century and is ongoing. More information on the measures collected can be found at https://sls.psychiatry.uw.edu/researchers/measures/.
Sample sizes vary by year, from 302 (1956) to 421 (2005). This provides 99% power to detect correlation effect sizes of ~.23, two-tailed at alpha .05.
2.13 Descriptives of All Studies
loadRData <- function(fileName, type){
#loads an RData file, and returns it
path <- sprintf("%s/data/clean/%s_cleaned.RData", local_path, fileName)
load(path)
get(ls()[grepl(type, ls())])
}
ipd_data <- tibble(
study = studies[!studies %in% c("CNLSY", "SLS")]
, data = map(str_to_lower(study), ~loadRData(., "combined"))
) %>% mutate(
data = map(data, ~(.) %>%
ungroup() %>%
mutate(SID = as.character(SID)))
, study = mapvalues(study, studies, studies_long)
) %>%
unnest(data) %>%
mutate(age = ifelse(is.na(age), p_year - yearBrth, age))
ipd_data <- ipd_data %>%
group_by(study, Trait, SID) %>%
filter(p_year == min(p_year)) %>%
ungroup() %>%
select(-p_year, -o_year, -Outcome) %>%
rename(crystallized = o_value) %>%
distinct() %>%
group_by(study) %>%
nest() %>%
ungroup() %>%
mutate(wide_data = map(data, ~(.) %>% pivot_wider(names_from = "Trait", values_from = "p_value")))
cln <- c("Study", "E", "A", "C", "N", "O", "Crystallized / Knowledge", "Age (Years)", "Education (Years)", "% Women", "Valid N (Range)")
desc_tab <- ipd_data %>%
select(-p_year, -o_year, -yearBrth, -SRhealth, -gender) %>%
group_by(study, Trait, Outcome) %>%
mutate_at(vars(p_value, o_value),
~((. - min(., na.rm = T))/(max(., na.rm = T) - min(., na.rm = T))*10)) %>%
pivot_wider(names_from = Outcome, values_from = o_value, values_fn = mean) %>%
pivot_wider(names_from = Trait, values_from = p_value, values_fn = mean) %>%
pivot_longer(cols = c(education:C), values_to = "value", names_to = "item", values_drop_na = T) %>%
group_by(study, item) %>%
summarize(est = sprintf("%.2f (%.2f)", mean(value, na.rm = T), sd(value, na.rm = T))) %>%
ungroup() %>%
pivot_wider(names_from = item, values_from = est) %>%
full_join(
ipd_data %>%
select(study, Trait, p_value, Outcome, SID, o_value) %>%
distinct() %>%
group_by(study, Trait) %>%
tally() %>%
group_by(study) %>%
summarize(n = sprintf("%i - %i", min(n), max(n)))
) %>%
full_join(
ipd_data %>%
select(study, SID, gender) %>%
distinct() %>%
group_by(study) %>%
summarize(gender = mean(gender, na.rm = T)*100) %>%
ungroup()
) %>%
select(study, E, A, C, N, O, crystallized, age, education, gender, n) %>%
kable(., "html"
, digits = 2
, col.names = cln
, caption = "<strong>Table 3</strong><br><em>Descriptive Statistics of All Harmonized Measures Across Samples"
, align = c("l", rep("c",9))) %>%
kable_classic(full_width = F, html_font = "Times New Roman") %>%
add_header_above(c(" " = 1, "Personality Characteristics" = 5, " " = 5)) %>%
add_footnote(notation = "none", label = "<em>Note.</em> E = Extraversion; A = Agreeableness; C = Conscientiousness; N = Neuroticism; O = Openness; Age, education, and gender were assessed at the first baseline personality assessment. Valid N (Range) indicates the range of valid observations with complete personality trait and outcome data across different trait measures.", escape = F)
desc_tab
save_kable(desc_tab, file = sprintf("%s/results/tables/tab-3-desc.html", local_path))
r_fun <- function(d, study){
cols <- c("E", "A", "C", "N", "O", "crystallized", "age", "gender", "education")
cols_long <- mapvalues(cols, c(traits$short_name, outcomes$short_name, covars$short_name)
, c(traits$long_name, outcomes$long_name, covars$long_name))
colns <- paste(1:length(cols), ". ", cols_long, sep = "")
r <- d %>%
select(all_of(cols)) %>%
cor(., use = "pairwise")
r <- apply(r, c(1,2), function(x) ifelse(is.na(x), "", ifelse(x == 1, "--", sprintf("%.2f", x))))
r[upper.tri(r)] <- ""
diag(r) <- "--"
rownames(r) <- colns; colnames(r) <- seq(1,length(cols))
tab <- r %>%
as.data.frame() %>%
rownames_to_column(" ") %>%
kable(.
, "html"
, escape = F
, align = c("l", rep("c", length(cols)))
, caption = sprintf("<strong>Table SX</strong><br><em>Zero-Order Correlations Between All Study Variables in %s</em>", study)
) %>%
kable_classic(full_width = F, html_font = "Times New Roman")
save_kable(tab, file = sprintf("%s/results/tables/zero-order-cors/%s.html", local_path, study))
return(tab)
}
loadRData <- function(fileName, type){
#loads an RData file, and returns it
path <- sprintf("%s/data/clean/%s_cleaned.RData", local_path, fileName)
print(path)
load(path)
get(ls()[grepl(type, ls())])
}
ipd3_meta_data <- tibble(
study = studies[!studies %in% c("CNLSY", "SLS")]
, data = map(str_to_lower(study), ~loadRData(., "combined"))
) %>% mutate(
data = map(data, ~(.) %>%
ungroup() %>%
mutate(SID = as.character(SID)))
, study = mapvalues(study, studies, studies_long)
) %>%
unnest(data) %>%
mutate(age = ifelse(is.na(age), p_year - yearBrth, age))
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/base-i_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/eas_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/gsoep_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/hilda_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/hrs_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/lasa_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/map_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/mars_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/octo-twin_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/ros_cleaned.RData"
## [1] "~/Documents/projects/data synthesis/crystallized/data/clean/satsa_cleaned.RData"
## # A tibble: 119,597 × 13
## study Trait p_value p_year SID Outcome o_value o_year education gender SRhealth yearBrth age
## <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 BASE E 3.83 1990 10004 crystallized 2.17 2000 17 0 4 1918 72
## 2 BASE E 3.83 1990 10010 crystallized 3.08 1995 11 0 2 1911 79
## 3 BASE E 2.5 1990 10033 crystallized 0.909 1997 8 1 5 1910 80
## 4 BASE E 3 1990 10034 crystallized 6.54 1995 10 0 5 1914 76
## 5 BASE E 2.5 1990 10111 crystallized 6.54 1995 13 1 4 1901 89
## 6 BASE E 3.17 1990 10115 crystallized 10 1995 11 0 2 1918 72
## 7 BASE E 3 1990 10116 crystallized 3.85 1995 10 0 4 1917 73
## 8 BASE E 3 1990 10145 crystallized 6.36 1997 10 1 5 1897 93
## 9 BASE E 3.33 1990 10175 crystallized 2.31 1995 8 1 3 1911 79
## 10 BASE E 3.33 1990 10188 crystallized 5.77 2004 10 1 2 1903 87
## # ℹ 119,587 more rows
ipd_cors <- ipd3_meta_data %>%
select(-p_year, -o_year, -yearBrth, -SRhealth) %>%
group_by(study, Trait, Outcome) %>%
mutate_at(vars(p_value, o_value),
~((. - min(., na.rm = T))/(max(., na.rm = T) - min(., na.rm = T))*10)) %>%
pivot_wider(names_from = Outcome, values_from = o_value, values_fn = mean) %>%
pivot_wider(names_from = Trait, values_from = p_value, values_fn = mean) %>%
group_by(study) %>%
nest() %>%
ungroup() %>%
mutate(rtab = map2(data, study, r_fun))