I'm evaluating the test-retest reliability of a questionnaire. I have more than 200 pairs of variables (nominal, ordinal, or numeric), each measured at test and retest, for which I compute a concordance statistic and its bootstrapped confidence interval.
I managed to write a function that obtains the three values (estimate, lower bound, and upper bound) for each pair in a single step, but I'm struggling to write one that lets me group the results by additional variables.
Because I'll need to plot the results, I would ultimately like the output as a data frame.
For simplicity, I made a toy example of 8 variables in a data frame called dfSO:
dput(dfSO)
structure(list(qordi1T = structure(c(5L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 4L, 5L, 5L, 4L, 2L, 5L, 5L, 3L), .Label = c("Je n'ai pas de médecin de médecin généraliste attitré",
"Moins d'un an", "1 à 2 ans", "2 à 5 ans", "Plus de 5 ans"), class = c("ordered",
"factor")), qordi2T = structure(c(3L, 3L, 2L, 4L, 4L, 4L, 1L,
2L, 1L, 3L, 4L, 5L, 3L, 3L, 5L, 2L), .Label = c("Mauvaise", "Passable",
"Bonne", "Très bonne", "Excellente"), class = c("ordered", "factor"
)), qnum1T = c(70L, 90L, 90L, 100L, 100L, 70L, 80L, 100L, 50L,
40L, 100L, 100L, 75L, 75L, 95L, 40L), qnum2T = c(100L, 85L, 100L,
100L, 100L, 100L, 100L, 85L, 100L, 70L, 100L, 100L, 95L, 75L,
100L, 80L), qnomi1T = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L), .Label = c("Oui", "Non"), class = "factor"),
qnomi2T = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 1L, 2L), .Label = c("Oui", "Non"), class = "factor"),
proT = structure(c(7L, 7L, 5L, 7L, 5L, 7L, 4L, 7L, 1L, 6L,
7L, 1L, 1L, 5L, 4L, 5L), .Label = c("Agriculteur exploitant",
"Artisan, commerçant, chef d'entreprise", "Cadre ou profession intellectuelle supérieure :- Profession libérale et assimilée- Cadre de la fonction publique- Profession intellectuelle et artistique- Cadre d'entreprise et ingénieur",
"Employé :- de la fonction publique- administratif d'entreprise- de commerce- Personnel de services directs aux particuliers",
"Ouvrier :- qualifié de type industriel, artisanal, de la manutention, du magasinage et du transport, chauffeurs- non qualifié de type industriel et artisanal- agricole",
"Profession intermédiaire :- de l'enseignement, de la santé, de la fonction publique et assimilés- administrative et commerciales des entreprises- Technicien- Contremaître, agent de maîtrise",
"N/A"), class = "factor"), sexT = structure(c(1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Femme",
"Homme"), class = "factor"), qordi1RT = structure(c(5L, 3L,
4L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 5L, 4L, 2L, 5L, 5L, 2L), .Label = c("Je n'ai pas de médecin de médecin généraliste attitré",
"Moins d'un an", "1 à 2 ans", "2 à 5 ans", "Plus de 5 ans"
), class = c("ordered", "factor")), qordi2RT = structure(c(3L,
4L, 1L, 4L, 4L, 5L, 2L, 2L, 2L, 3L, 4L, 5L, 3L, 3L, 5L, 2L
), .Label = c("Mauvaise", "Passable", "Bonne", "Très bonne",
"Excellente"), class = c("ordered", "factor")), qnum1RT = c(67L,
87L, 88L, 100L, 94L, 79L, 100L, 100L, 50L, 55L, 100L, 99L,
88L, 70L, 89L, 18L), qnum2RT = c(98L, 89L, 99L, 100L, 81L,
100L, 100L, 100L, 100L, 77L, 63L, 99L, 98L, 71L, 100L, 100L
), qnomi1RT = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Oui", "Non"), class = "factor"),
qnomi2RT = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Oui", "Non"), class = "factor"),
proRT = structure(c(7L, 7L, 5L, 4L, 7L, 7L, 4L, 7L, 1L, 6L,
4L, 1L, 1L, 5L, 4L, 7L), .Label = c("Agriculteur exploitant",
"Artisan, commerçant, chef d'entreprise", "Cadre ou profession intellectuelle supérieure :- Profession libérale et assimilée- Cadre de la fonction publique- Profession intellectuelle et artistique- Cadre d'entreprise et ingénieur",
"Employé :- de la fonction publique- administratif d'entreprise- de commerce- Personnel de services directs aux particuliers",
"Ouvrier :- qualifié de type industriel, artisanal, de la manutention, du magasinage et du transport, chauffeurs- non qualifié de type industriel et artisanal- agricole",
"Profession intermédiaire :- de l'enseignement, de la santé, de la fonction publique et assimilés- administrative et commerciales des entreprises- Technicien- Contremaître, agent de maîtrise",
"N/A"), class = "factor"), sexRT = structure(c(1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Femme",
"Homme"), class = "factor")), row.names = c(NA, -16L), class = "data.frame")
There are 2 ordinal, 4 nominal, and 2 numeric variables, each measured at test (suffix T) and retest (suffix RT):
> str(dfSO)
'data.frame': 16 obs. of 16 variables:
$ qordi1T : Ord.factor w/ 5 levels "Je n'ai pas de médecin de médecin généraliste attitré"<..: 5 4 4 5 5 5 5 5 4 5 ...
$ qordi2T : Ord.factor w/ 5 levels "Mauvaise"<"Passable"<..: 3 3 2 4 4 4 1 2 1 3 ...
$ qnum1T : int 70 90 90 100 100 70 80 100 50 40 ...
$ qnum2T : int 100 85 100 100 100 100 100 85 100 70 ...
$ qnomi1T : Factor w/ 2 levels "Oui","Non": 2 2 2 1 1 1 2 2 2 1 ...
$ qnomi2T : Factor w/ 2 levels "Oui","Non": 1 2 2 1 2 1 2 2 2 2 ...
$ proT : Factor w/ 7 levels "Agriculteur exploitant",..: 7 7 5 7 5 7 4 7 1 6 ...
$ sexT : Factor w/ 2 levels "Femme","Homme": 1 1 1 2 2 1 1 1 1 2 ...
$ qordi1RT: Ord.factor w/ 5 levels "Je n'ai pas de médecin de médecin généraliste attitré"<..: 5 3 4 5 5 5 5 5 4 4 ...
$ qordi2RT: Ord.factor w/ 5 levels "Mauvaise"<"Passable"<..: 3 4 1 4 4 5 2 2 2 3 ...
$ qnum1RT : int 67 87 88 100 94 79 100 100 50 55 ...
$ qnum2RT : int 98 89 99 100 81 100 100 100 100 77 ...
$ qnomi1RT: Factor w/ 2 levels "Oui","Non": 2 2 2 2 1 1 2 2 2 1 ...
$ qnomi2RT: Factor w/ 2 levels "Oui","Non": 1 2 2 1 2 1 2 2 1 2 ...
$ proRT : Factor w/ 7 levels "Agriculteur exploitant",..: 7 7 5 4 7 7 4 7 1 6 ...
$ sexRT : Factor w/ 2 levels "Femme","Homme": 1 1 1 2 2 1 1 1 1 2 ...
To compute the concordance statistic for qualitative variables, the rel::gac function takes as input a matrix with n subjects and two observations (an n × 2 matrix).
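To make the expected shape concrete, here is a minimal base-R sketch with made-up data (`test`, `retest`, `pair_df`, and `one_col` are hypothetical names, not part of my real data). It only shows how two vectors end up as two columns, and that `as.matrix()` called with two vectors converts the first one only.

```r
# Made-up test and retest ratings for 5 subjects (hypothetical data).
lev    <- c("Mauvaise", "Passable", "Bonne")
test   <- factor(c("Bonne", "Passable", "Bonne", "Mauvaise", "Bonne"),
                 levels = lev, ordered = TRUE)
retest <- factor(c("Bonne", "Bonne", "Bonne", "Mauvaise", "Passable"),
                 levels = lev, ordered = TRUE)

# A two-column data frame keeps both vectors and their ordered levels.
pair_df <- data.frame(test = test, retest = retest)
dim(pair_df)   # 5 2

# as.matrix() ignores the second vector: the result has a single column.
one_col <- as.matrix(test, retest)
dim(one_col)   # 5 1
```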
I wrote the following code to extract the Gwet AC2 estimate (quadratic weights) and its bootstrapped CI for the ordered variables:
library("rel")
library("boot")
ordiTF <- sapply(dfSO[, 1:8], is.ordered)
ordi <- which(ordiTF == TRUE)
# Statistic passed to boot(): x holds the resampled row indices
g <- function(data, x)
  gac(data[x, c(1, 2)], weight = "quadratic", conf.level = 1-(0.05/8))[[5]]
offset <- 8
B <- 500
item <- c()
est <- c()
lci <- c()
uci <- c()
for (i in ordi) {
  # Column i is the test item; column i + offset is its retest counterpart
  pair <- dfSO[, c(i, i + offset)]
  item <- c(item, i)
  est <- c(est, gac(data = pair, weight = "quadratic", conf.level = 1-(0.05/8))[[5]])
  b <- boot(pair, g, B)
  ci <- boot.ci(b, type = "bca")$bca  # compute the BCa interval once
  lci <- c(lci, ci[4])
  uci <- c(uci, ci[5])
}
data.frame(item = item, est = est, lci = lci, uci = uci)
What I would like now is to compute this statistic for each pair by a third variable, say 'sex'. I can start by grouping by this variable, but then I can no longer specify the columns correctly.
For example, if I want to compute AC2 between ordi1 at test and retest by sex with dplyr, I no longer know how to specify these variables. I tried different options, among which:
dfSO %>%
group_by(sexT) %>%
select(sexT, qordi1T, qordi1RT) %$%
gac(as.matrix(qordi1T, qordi1RT), weight = "quadratic", conf.level = 1-(0.05/8))[[5]]
None of them did the job.
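One direction that might work is to split the data frame by the grouping variable, run the same computation on each piece, and bind the results into one data frame. The sketch below is base R only and rests on several assumptions: `toy` is a made-up data frame, `agree()` is a hypothetical stand-in (proportion of exact agreement) so that the example runs without `rel`, and `by_group()` is just an illustrative helper name. In real use, the body of `agree()` would presumably be replaced by the `gac()` call used earlier, with the bootstrap run inside the same per-group function.

```r
# Made-up data: one test/retest pair plus a grouping variable.
toy <- data.frame(
  sexT = factor(c("Femme", "Femme", "Homme", "Homme", "Femme", "Homme")),
  q1T  = c(1L, 2L, 3L, 3L, 2L, 1L),
  q1RT = c(1L, 2L, 2L, 3L, 2L, 1L)
)

# Hypothetical stand-in for the concordance statistic: proportion of
# exact agreement between the two columns of an n x 2 data frame.
agree <- function(pair) mean(pair[[1]] == pair[[2]])

# Apply `stat` to the columns `test` and `retest` within each level of `group`.
by_group <- function(data, test, retest, group, stat) {
  # One two-column data frame per group level; empty levels are dropped
  pieces <- split(data[, c(test, retest)], data[[group]], drop = TRUE)
  data.frame(
    item  = test,
    group = names(pieces),
    n     = vapply(pieces, nrow, integer(1)),
    est   = vapply(pieces, stat, numeric(1)),
    row.names = NULL
  )
}

res <- by_group(toy, "q1T", "q1RT", "sexT", agree)
print(res)
```

Calling `by_group()` once per item and combining the pieces with `do.call(rbind, ...)` should give a long data frame (one row per item and group), which is a convenient shape for plotting.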
As a non-native English speaker, my first limitation in solving this problem is my difficulty googling it. I'm aware that this question might already have been asked, but if so, I wasn't able to find it.
Thanks for your time and your help.