I have a national survey with a three-stage stratified sampling design. I have Stata code that sets up the weighting, but I am struggling to reproduce it with the R `survey` package.
The sampling design is as follows. The sampling universe is divided into 49 strata, and within each stratum sampling is done in three stages:

1. PPS (probability proportional to size) selection of precincts.
2. Systematic random selection of households, essentially by the random route technique.
3. Systematic random selection of a respondent within each household, using a modification of the Kish method.
Here is the Stata code, which is assumed to work correctly:

```stata
svyset precinct [pweight=indwt], strata(strt) fpc(npsu) singleunit(certainty) || qnum, fpc(nhh) || _n, fpc(nhhm)
```
Here, `precinct` holds the precinct numbers and `strt` the stratum codes. The population sizes used for the FPC are:

- `npsu`: number of PSUs (i.e. precincts) per stratum;
- `nhh`: number of households in each PSU;
- `nhhm`: number of eligible members in each household.

`qnum` is the unique questionnaire number, which is the same for a selected household and its respondent.
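I am trying to reproduce it with the following R code:

```r
library(survey)

# Treat single-PSU strata as certainty units (like Stata's singleunit(certainty))
options("survey.lonely.psu" = "certainty")

svy_data <- svydesign(ids = ~precinct + qnum,  # stage 1: precincts, stage 2: households
                      strata = ~strt,
                      weights = ~indwt,
                      fpc = ~npsu + nhh,        # population sizes for stages 1 and 2
                      data = data)
```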
I can't use `fpc = ~npsu + nhh + nhhm`, because then I get an error:

```
Error in popsize < sampsize : non-conformable arrays
```
The confidence intervals I get in R with `confint(svymean(...))` do not match the Stata confidence intervals produced with the `tabout` ado package. They are close, but slightly shifted in R.
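One possible source of the shift, offered as an assumption about what `tabout` reports rather than something verified: Stata's `svy` commands build intervals from a t distribution with design degrees of freedom (PSUs minus strata), and in recent versions `svy: tabulate` and `svy: proportion` compute proportion intervals on the logit scale by default. In `survey`, `confint()` on a `svymean()` result uses normal quantiles (`df = Inf`) unless `df` is supplied, while `svyciprop(..., method = "logit")` gives a logit-scale interval. The sketch below uses the `apiclus2` example data bundled with `survey` as a stand-in for the real data; object names are illustrative.

```r
library(survey)
options(survey.lonely.psu = "certainty")

# Stand-in data: two-stage cluster sample shipped with the survey package
data(api)
dclus2 <- svydesign(ids = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

m <- svymean(~api00, dclus2)

# Default interval: normal (z) quantiles
ci_z <- confint(m)

# t quantiles with design degrees of freedom (PSUs minus strata)
ci_t <- confint(m, df = degf(dclus2))

# Proportion with a logit-scale interval (t-based, df = degf(design) by default)
p_yes <- svyciprop(~I(sch.wide == "Yes"), dclus2, method = "logit")
ci_logit <- attr(p_yes, "ci")
```

For the design in the question, `degf()` would be the number of sampled precincts minus the 49 strata. If these variants bring the R numbers in line with `tabout`, the remaining gap is less likely to come from the missing third stage.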
My guess is that I need to replicate what Stata's `_n` term does and get a three-stage design instead of a two-stage one. How can I do that?
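A minimal sketch of a three-stage version, assuming Stata's `_n` (the observation number) can be mirrored by an explicit row-id column. The error above appears consistent with `svydesign()` expecting one `fpc` variable per stage listed in `ids`: with two id variables and three fpc variables, the internal population-size and sample-size matrices end up with different numbers of columns. The data frame `toy`, its simulated values, and the column `obs_id` are hypothetical stand-ins, not the real survey.

```r
library(survey)

# Single sampled units are treated as certainty units, as singleunit(certainty) does in Stata
options(survey.lonely.psu = "certainty")

# Hypothetical toy data: 2 strata x 3 precincts x 4 households, 1 respondent each
set.seed(42)
toy <- expand.grid(hh = 1:4, pr = 1:3, strt = 1:2)
toy$precinct <- toy$strt * 100 + toy$pr                 # precinct ids unique across strata
toy$qnum     <- seq_len(nrow(toy))                      # one questionnaire per household
toy$npsu     <- 50                                      # precincts in the stratum
toy$nhh      <- 400                                     # households in the precinct
toy$nhhm     <- sample(1:4, nrow(toy), replace = TRUE)  # eligible members in the household
toy$indwt    <- runif(nrow(toy), 50, 150)
toy$y        <- rbinom(nrow(toy), 1, 0.4)

# Counterpart of Stata's `_n`: a distinct id for every respondent (third stage)
toy$obs_id <- seq_len(nrow(toy))

# Two stages in `ids` but three fpc variables: reproduces the dimension mismatch
err <- tryCatch(
  svydesign(ids = ~precinct + qnum, strata = ~strt, weights = ~indwt,
            fpc = ~npsu + nhh + nhhm, data = toy),
  error = function(e) conditionMessage(e)
)

# Three stages in `ids`, one fpc variable per stage
des3 <- svydesign(ids = ~precinct + qnum + obs_id,
                  strata = ~strt,
                  weights = ~indwt,
                  fpc = ~npsu + nhh + nhhm,
                  data = toy)

est3 <- svymean(~y, des3)
```

With a single respondent per household and `survey.lonely.psu = "certainty"`, the third stage should add little or nothing to the variance, so this change on its own may not close the gap with Stata.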
Or is there anything else I could change in my R code to match Stata?