What is the best way to treat labelled variables imported with haven?

I have about 15 SPSS election studies files saved as .sav files. My group and I will be recoding about 10 variables for each study to run some logistic regressions.

I have used haven() to import all the files, so it looks like all the variables are of the haven_labelled() class.

I have always been a little confused about how to handle this class of variables, however I have observed a lot of improved performance as the haven() and labelled() packages have been updated, so I'm inclined to keep using it as opposed to using, e.g. rio or foreign.

But I want to get a sense of what best practices should be before we start this effort so we don't look back with regret.

Each study file has about 200 variables, with a mix of factors and numeric variables. But to start, I'm wondering how I should go about recoding the sex variable so that I end up with a variable male where 1 is male and 0 is not.

One thing I want to ask about is the car::Recode() method of recoding variables as opposed to the dplyr::recode variable way. I personally find the dplyr::recode() syntax very clunky and the help documentation poor. I am also not sure about the best way to set missing values.

To be specific, I think I have three specific questions.

Question 1: is there a compelling reason to use dplyr::recode as opposed to car::Recode? My own answer is that car::Recode() looks sufficient and easy to use.

Question 2: Should I make a point of converting variables to factors or numeric or will I be OK, leaving variables as haven_labelled with updated value labels? I am concerned about this quote from the haven documentation about the labelled_class: ''This class provides few methods, as I expect you’ll coerce to a standard R class (e.g. a factor()) soon after importing''

However, maybe the haven_labelled class has been improved and is sufficiently different from the labelled class that it is no longer necessary to force conversion to other standard R classes.

Question 3: is there any advantage to setting missing values with the labelled (e.g. na_range(), na_values()) rather than with the car::Recode() method ?

My inclination is that there clear disadvantages to using the labelled methods and I should stick with the car::Recode() method.

Thank you .

#FAKE DATA
library(labelled)
var1<-labelled(rep(c(1,5), 100), c(male = 1, female = 5))
var2<-labelled(sample(c(1,3,5,7,8,9), size=200, replace=T), c('strongly agree'=1, 'agree'=3, 'disagree'=5, 'strongly disagree'=7, 'DK'=8, 'refused'=9))
#give variable labels
var_label(var1)<-'Respondent\'s sex'
var_label(var2)<-'free trade is a good thing'
df<-data.frame(var1=var1, var2=var2)
str(df)
#This works really well; and I really like this. 
look_for(df, 'sex')
look_for(df, 'free trade')
#the Car way
df$male<-car::Recode(df$var1, "5=0")
#Check results
df$male 
#value labels are still there, so would have to be removed or updated
as_factor(df$male)
#Remove value labels
val_labels(df$male)<-NULL
#Check 
class(df$male) #left with a numeric variable
#The other car way, keeping and modifying value labels
df$male2<-car::Recode(df$var1, "5=0")
df$male2
val_label(df$male2, 0)<-c('female')
val_label(df$male2, 5)<-NULL
val_labels(df$male2)
#Check class
class(df$male2)
#Can run numeric functions on it
mean(df$male2)
#easily convert to factor
as_factor(df$male2)

#How to handle missing values
#The CAR way
#use car to set missing values to NA
df$free_trade<-Recode(df$var2, "8=NA; 9=NA")
#Check class
class(df$free_trade)
#can still run numeric functions on haven_labelled
mean(df$free_trade, na.rm=T)
#table
table(df$free_trade)
#did the na recode work?
table(is.na(df$free_trade))
#check value labels
val_labels(df$free_trade)   

#How to handle missing values
#The CAR way
#use car to set missing values to NA
df$free_trade<-Recode(df$var2, "8=NA; 9=NA")
#Check class
class(df$free_trade)
#can still run numeric functions on haven_labelled
mean(df$free_trade, na.rm=T)
#table
table(df$free_trade)
#did the na recode work?
table(is.na(df$free_trade))
#check value labels
val_labels(df$free_trade)      

#set missing values the labelled way
table(df$var2)
na_values(df$var2)<-c(8,9)
#check
df$var2
#but a table function of does not pick up 8 and 9 as m isisng
table(df$var2)
#this seems to not work very well
table(to_factor(df$var2))
to_factor(df$var2)

What is the best way to treat labelled variables imported with haven?

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112