Channel: Active questions tagged r - Stack Overflow

How to compute a summary by group with dplyr using a function that takes a matrix as its argument


I'm evaluating the test-retest reliability of a questionnaire. I have more than 200 pairs of variables (nominal, ordinal, or numeric), each measured at test and retest, for which I compute a concordance statistic and its bootstrapped confidence interval.

I managed to write a function that obtains the three quantities (estimate, lower bound, and upper bound) for each pair in a single step, but I'm struggling to write one that groups the results by additional variables.

Because I'll need to plot the results, I would eventually like the output as a data frame.

For simplicity, I made a toy example with 8 variables in a data frame called dfSO:

dput(dfSO)  
structure(list(qordi1T = structure(c(5L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 4L, 5L, 5L, 4L, 2L, 5L, 5L, 3L), .Label = c("Je n'ai pas de médecin de médecin généraliste attitré", 
"Moins d'un an", "1 à 2 ans", "2 à 5 ans", "Plus de 5 ans"), class = c("ordered", 
"factor")), qordi2T = structure(c(3L, 3L, 2L, 4L, 4L, 4L, 1L, 
2L, 1L, 3L, 4L, 5L, 3L, 3L, 5L, 2L), .Label = c("Mauvaise", "Passable", 
"Bonne", "Très bonne", "Excellente"), class = c("ordered", "factor"
)), qnum1T = c(70L, 90L, 90L, 100L, 100L, 70L, 80L, 100L, 50L, 
40L, 100L, 100L, 75L, 75L, 95L, 40L), qnum2T = c(100L, 85L, 100L, 
100L, 100L, 100L, 100L, 85L, 100L, 70L, 100L, 100L, 95L, 75L, 
100L, 80L), qnomi1T = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L), .Label = c("Oui", "Non"), class = "factor"), 
    qnomi2T = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 
    2L, 1L, 1L, 1L, 2L, 1L, 2L), .Label = c("Oui", "Non"), class = "factor"), 
    proT = structure(c(7L, 7L, 5L, 7L, 5L, 7L, 4L, 7L, 1L, 6L, 
    7L, 1L, 1L, 5L, 4L, 5L), .Label = c("Agriculteur exploitant", 
    "Artisan, commerçant, chef d'entreprise", "Cadre ou profession intellectuelle supérieure :- Profession libérale et assimilée- Cadre de la fonction publique- Profession intellectuelle et artistique- Cadre d'entreprise et ingénieur", 
    "Employé :- de la fonction publique- administratif d'entreprise- de commerce- Personnel de services directs aux particuliers", 
    "Ouvrier :- qualifié de type industriel, artisanal, de la manutention, du magasinage et du transport, chauffeurs- non qualifié de type industriel et artisanal- agricole", 
    "Profession intermédiaire :- de l'enseignement, de la santé, de la fonction publique et assimilés- administrative et commerciales des entreprises- Technicien- Contremaître, agent de maîtrise", 
    "N/A"), class = "factor"), sexT = structure(c(1L, 1L, 1L, 
    2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Femme", 
    "Homme"), class = "factor"), qordi1RT = structure(c(5L, 3L, 
    4L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 5L, 4L, 2L, 5L, 5L, 2L), .Label = c("Je n'ai pas de médecin de médecin généraliste attitré", 
    "Moins d'un an", "1 à 2 ans", "2 à 5 ans", "Plus de 5 ans"
    ), class = c("ordered", "factor")), qordi2RT = structure(c(3L, 
    4L, 1L, 4L, 4L, 5L, 2L, 2L, 2L, 3L, 4L, 5L, 3L, 3L, 5L, 2L
    ), .Label = c("Mauvaise", "Passable", "Bonne", "Très bonne", 
    "Excellente"), class = c("ordered", "factor")), qnum1RT = c(67L, 
    87L, 88L, 100L, 94L, 79L, 100L, 100L, 50L, 55L, 100L, 99L, 
    88L, 70L, 89L, 18L), qnum2RT = c(98L, 89L, 99L, 100L, 81L, 
    100L, 100L, 100L, 100L, 77L, 63L, 99L, 98L, 71L, 100L, 100L
    ), qnomi1RT = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 
    2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Oui", "Non"), class = "factor"), 
    qnomi2RT = structure(c(1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 
    2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Oui", "Non"), class = "factor"), 
    proRT = structure(c(7L, 7L, 5L, 4L, 7L, 7L, 4L, 7L, 1L, 6L, 
    4L, 1L, 1L, 5L, 4L, 7L), .Label = c("Agriculteur exploitant", 
    "Artisan, commerçant, chef d'entreprise", "Cadre ou profession intellectuelle supérieure :- Profession libérale et assimilée- Cadre de la fonction publique- Profession intellectuelle et artistique- Cadre d'entreprise et ingénieur", 
    "Employé :- de la fonction publique- administratif d'entreprise- de commerce- Personnel de services directs aux particuliers", 
    "Ouvrier :- qualifié de type industriel, artisanal, de la manutention, du magasinage et du transport, chauffeurs- non qualifié de type industriel et artisanal- agricole", 
    "Profession intermédiaire :- de l'enseignement, de la santé, de la fonction publique et assimilés- administrative et commerciales des entreprises- Technicien- Contremaître, agent de maîtrise", 
    "N/A"), class = "factor"), sexRT = structure(c(1L, 1L, 1L, 
    2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Femme", 
    "Homme"), class = "factor")), row.names = c(NA, -16L), class = "data.frame")

There are 2 ordinal, 4 nominal, and 2 numeric variables, each measured at test (T) and retest (RT).
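Since each pair needs a statistic suited to its measurement level, the three kinds of variables can be detected from the column classes, extending the `sapply(dfSO[, 1:8], is.ordered)` test used below. A minimal sketch on a small stand-in data frame; `measurement_level()` and `toyT` are hypothetical, and with the real data one would pass `dfSO[, 1:8]`.

```r
# Hypothetical helper: measurement level of one column.
# An ordered factor is also a factor, so is.ordered() must be checked first.
measurement_level <- function(v) {
  if (is.ordered(v)) {
    "ordinal"
  } else if (is.factor(v)) {
    "nominal"
  } else if (is.numeric(v)) {
    "numeric"
  } else {
    "other"
  }
}

# Small stand-in for the test-time columns of dfSO (real data: dfSO[, 1:8])
toyT <- data.frame(
  qordi1T = factor(c("Bonne", "Passable"), levels = c("Passable", "Bonne"), ordered = TRUE),
  qnum1T  = c(70L, 90L),
  qnomi1T = factor(c("Oui", "Non"), levels = c("Oui", "Non")),
  sexT    = factor(c("Femme", "Homme"))
)

level_of <- vapply(toyT, measurement_level, character(1))
level_of                           # one measurement level per column
split(names(level_of), level_of)   # column names grouped by level
```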

> str(dfSO)
'data.frame':   16 obs. of  16 variables:
 $ qordi1T : Ord.factor w/ 5 levels "Je n'ai pas de médecin de médecin généraliste attitré"<..: 5 4 4 5 5 5 5 5 4 5 ...
 $ qordi2T : Ord.factor w/ 5 levels "Mauvaise"<"Passable"<..: 3 3 2 4 4 4 1 2 1 3 ...
 $ qnum1T  : int  70 90 90 100 100 70 80 100 50 40 ...
 $ qnum2T  : int  100 85 100 100 100 100 100 85 100 70 ...
 $ qnomi1T : Factor w/ 2 levels "Oui","Non": 2 2 2 1 1 1 2 2 2 1 ...
 $ qnomi2T : Factor w/ 2 levels "Oui","Non": 1 2 2 1 2 1 2 2 2 2 ...
 $ proT    : Factor w/ 7 levels "Agriculteur exploitant",..: 7 7 5 7 5 7 4 7 1 6 ...
 $ sexT    : Factor w/ 2 levels "Femme","Homme": 1 1 1 2 2 1 1 1 1 2 ...
 $ qordi1RT: Ord.factor w/ 5 levels "Je n'ai pas de médecin de médecin généraliste attitré"<..: 5 3 4 5 5 5 5 5 4 4 ...
 $ qordi2RT: Ord.factor w/ 5 levels "Mauvaise"<"Passable"<..: 3 4 1 4 4 5 2 2 2 3 ...
 $ qnum1RT : int  67 87 88 100 94 79 100 100 50 55 ...
 $ qnum2RT : int  98 89 99 100 81 100 100 100 100 77 ...
 $ qnomi1RT: Factor w/ 2 levels "Oui","Non": 2 2 2 2 1 1 2 2 2 1 ...
 $ qnomi2RT: Factor w/ 2 levels "Oui","Non": 1 2 2 1 2 1 2 2 1 2 ...
 $ proRT   : Factor w/ 7 levels "Agriculteur exploitant",..: 7 7 5 4 7 7 4 7 1 6 ...
 $ sexRT   : Factor w/ 2 levels "Femme","Homme": 1 1 1 2 2 1 1 1 1 2 ...

To compute the concordance statistic for qualitative variables, `rel::gac` takes as input an n×2 matrix (n subjects, two observations each).
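gac() only needs the test and retest columns of one item side by side. The loop below locates them with a fixed offset of 8, which depends on the column order. A minimal sketch, assuming the T/RT suffix convention of dfSO, pairs them by name instead; `pair_input()` and the `toy` data frame are hypothetical and not part of rel.

```r
# Hypothetical helper: build the n x 2 input for one item from its test ("T")
# and retest ("RT") columns, pairing them by name rather than by position.
pair_input <- function(df, stem, suffixes = c("T", "RT")) {
  cols <- paste0(stem, suffixes)
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  df[, cols, drop = FALSE]   # n x 2 data frame, one row per subject
}

# Small stand-in for dfSO using the same naming scheme
toy <- data.frame(
  qordi1T  = factor(c("a", "b", "b", "c"), levels = c("a", "b", "c"), ordered = TRUE),
  qordi1RT = factor(c("a", "b", "c", "c"), levels = c("a", "b", "c"), ordered = TRUE),
  sexT     = factor(c("Femme", "Homme", "Femme", "Homme"))
)

m <- pair_input(toy, "qordi1")
dim(m)             # subjects x 2 (test, retest)
dim(as.matrix(m))  # converting the whole data frame keeps both columns
```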

I wrote the following code to extract the Gwet AC2 estimate (quadratic weighting) and its bootstrapped CI for the ordinal variables:

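library("rel")
library("boot")

# Columns 1-8 hold the test measurements; each retest column sits 8 columns later
ordiTF <- sapply(dfSO[, 1:8], is.ordered)
ordi <- which(ordiTF)   # indices of the ordinal items

# Bootstrap statistic: boot() passes the data and a vector of resampled row
# indices; [[5]] extracts the AC2 estimate from the gac() result
g <- function(data, x)
  gac(data[x, c(1, 2)], weight = "quadratic", conf.level = 1-(0.05/8))[[5]]

offset <- 8   # distance between a test column and its retest column
B <- 500      # number of bootstrap replicates
item <- c()
est <- c()
lci <- c()
uci <- c()
for (i in ordi) {
  item <- c(item, i)
  est <- c(est, gac(data = dfSO[ , c(i, i + offset)], weight = "quadratic", conf.level = 1-(0.05/8))[[5]])
  b <- boot(dfSO[ ,c(i, i + offset)], g, B)
  # compute the BCa interval once per item; boot.ci() uses its default
  # conf = 0.95 here, the 1-(0.05/8) level only affects gac()'s own interval
  ci <- boot.ci(b, type = "bca")
  lci <- c(lci, ci$bca[4])   # lower limit
  uci <- c(uci, ci$bca[5])   # upper limit
}
data.frame(item = item, est = est, lci = lci, uci = uci)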

What I would like now is to compute this statistic for each pair by a third variable, say 'sex'. I start by grouping by this variable, but then I can no longer specify the columns correctly.
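The grouped computation asked for here can be sketched without dplyr by splitting the data on the grouping column and looping over items. This swaps in base R `split()` for `group_by()`; `concordance_by_group()` and `agree_stat()` are hypothetical names, and `agree_stat()` is a stand-in (plain percent agreement, not AC2) so the sketch runs without rel. It assumes the T/RT column naming of dfSO.

```r
library(boot)

# Stand-in statistic (assumption, NOT Gwet's AC2): share of subjects whose
# test and retest answers agree. It has the same (data, indices) signature
# as g(), so g() can be passed in its place for the real analysis.
agree_stat <- function(data, indices) {
  d <- data[indices, , drop = FALSE]
  mean(as.character(d[[1]]) == as.character(d[[2]]))
}

# Hypothetical helper: one row per (group level, item) with the estimate and
# a bootstrap CI. `stems` are item names without the T/RT suffix (e.g.
# "qordi1"); `group` is the name of the grouping column (e.g. "sexT").
concordance_by_group <- function(df, stems, group, statistic,
                                 B = 500, type = "bca") {
  # Name of the boot.ci() component holding each interval type
  comp <- c(norm = "normal", basic = "basic", perc = "percent", bca = "bca")[[type]]
  groups <- split(df, df[[group]], drop = TRUE)   # one data frame per level
  rows <- list()
  for (lev in names(groups)) {
    for (stem in stems) {
      pair <- groups[[lev]][, paste0(stem, c("T", "RT")), drop = FALSE]
      b <- boot(pair, statistic, R = B)
      # boot.ci() returns NULL (or may fail) when the replicates carry no
      # variation, which is likely in small groups; report NA in that case
      ci <- tryCatch(boot.ci(b, type = type), error = function(e) NULL)
      # the last two entries of the interval matrix are the lower/upper limits
      lims <- if (is.null(ci)) c(NA_real_, NA_real_) else tail(as.vector(ci[[comp]]), 2)
      rows[[length(rows) + 1]] <- data.frame(item = stem, group = lev,
                                             est = b$t0, lci = lims[1], uci = lims[2])
    }
  }
  out <- do.call(rbind, rows)
  names(out)[names(out) == "group"] <- group   # e.g. a "sexT" column
  rownames(out) <- NULL
  out
}
```

With rel loaded, passing the question's g() instead, e.g. `concordance_by_group(dfSO, c("qordi1", "qordi2"), "sexT", g)`, should give a table shaped like the desired AC2_ordi_bySexT, and `lapply(c("sexT", "proT"), function(v) concordance_by_group(dfSO, c("qordi1", "qordi2"), v, g))` repeats it for several grouping variables. With only a handful of subjects per group, BCa intervals may come with warnings or NA.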

For example, if I want to compute AC2 between qordi1 at test and retest by sex with dplyr, I don't know how to specify these variables. I tried different options, among which:

dfSO %>% 
  group_by(sexT) %>% 
  select(sexT, qordi1T, qordi1RT) %$%
  gac(as.matrix(qordi1T, qordi1RT), weight = "quadratic")[[5]]

None of them worked.
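Part of the problem is base R rather than dplyr: `as.matrix()` converts a single object, so in `as.matrix(qordi1T, qordi1RT)` the second column is passed to `...` and ignored, leaving an n×1 matrix. This base R sketch with two stand-in factors shows the shapes involved.

```r
# Two ordered factors standing in for qordi1T and qordi1RT
x <- factor(c("Bonne", "Passable", "Bonne"), levels = c("Passable", "Bonne"), ordered = TRUE)
y <- factor(c("Bonne", "Bonne", "Passable"), levels = c("Passable", "Bonne"), ordered = TRUE)

# as.matrix() converts only its first argument; `y` is swallowed by `...`,
# so the result has a single column
dim(as.matrix(x, y))

# Two ways to put both observations side by side (n x 2):
dim(data.frame(x, y))   # keeps the factors, like dfSO[, c(i, i + offset)]
dim(cbind(x, y))        # integer matrix of the factor codes
```

Untested here, but since the loop above already passes gac() a two-column data frame, a grouped version might look like `dfSO %>% group_by(sexT) %>% summarise(AC2 = gac(data.frame(qordi1T, qordi1RT), weight = "quadratic")$est)`, assuming dplyr is loaded and that `est` is the estimate element, as `extract2('est')` suggests later in the question.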

As a non-native English speaker, my first obstacle with this problem is that I find it hard to google, and I'm aware this question may already have been asked; if so, I wasn't able to find it.

Thanks for your time and for your help

Following @42's remarks, I made several edits:

  1. Error output

Here is the error output I obtained:

    > dfSO %>% 
    +     group_by(sexT) %>% 
    +     select(sexT, qordi1T, qordi1RT) %$%
    +     gac(as.matrix(qordi1T, qordi1RT), weight = "quadratic")[[5]]

    Error in gac(as.matrix(qordi1T, qordi1RT), weight = "quadratic") : 
      The data frame needs to be formatted as a n*2 matrix!
  2. I made a wrong copy-paste in my initial code: I copied a desperate attempt to use %$% instead of %>%. It should have been
    dfSO %>% 
        group_by(sexT) %>% 
        select(sexT, qordi1T, qordi1RT) %>%
        gac(as.matrix(qordi1T, qordi1RT), weight = "quadratic")

which gives the same error:

    Error in gac(., as.matrix(qordi1T, qordi1RT), weight = "quadratic") : 
      The data frame needs to be formatted as a n*2 matrix!

  3. Note that I get a (different) error even without grouping:
    dfSO %>%
        select(qordi1T, qordi1RT) %>%
        gac(as.matrix('qordi1T', 'qordi1RT'), weight = "quadratic")[[5]]

    Error in .subset2(x, ..2, exact = exact) : subscript out of bounds
  4. But actually, the problem was the way I was extracting the estimate. Indeed, removing [[5]] gave no error:
    dfSO %>%
        select(qordi1T, qordi1RT) %>%
        gac(as.matrix('qordi1T', 'qordi1RT'), weight = "quadratic")

    Call:
    gac(data = ., kat = as.matrix("qordi1T", "qordi1RT"), weight = "quadratic")

          Estimate   StdErr  LowerCB UpperCB
    Const 0.860780 0.097708 0.652520   1.069

    Confidence level = 95%
    Sample size = 16
  5. And finally, using magrittr::extract2() returns the coefficient:
    dfSO %>% 
      select(qordi1T, qordi1RT) %>%
      gac(as.matrix('qordi1T', 'qordi1RT'), weight = "quadratic") %>% 
      extract2('est')

        Const 
    0.8607799

  6. Now, back to my grouping issue: I would like to obtain a data frame like this:
    > AC2_ordi_bySexT
      item  SexT       AC2    AC2_lci   AC2_uci
    1    1 Homme 0.6551724  0.4182536 0.9832149
    2    2 Homme 1.0000000 -0.4163015 0.9023788
    3    1 Femme 0.8734663  0.5308729 0.9780148
    4    2 Femme 0.4605954 -0.3208255 0.9073359
  7. Ultimately, I would like to adapt my g function so that it returns a data frame like this for each grouping variable (sex, age, ...).

  8. Regarding the boot issue raised by @42, I'm not sure I understand what's wrong with my syntax...
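The `[[5]]` failure in a pipe comes from how R parses the right-hand side: `gac(...)[[5]]` is a call to `[[` whose first argument is the gac() call, and magrittr inserts the piped data frame as the first argument of that outermost call. The base R sketch below only inspects the parsed expression with `quote()`; it does not run gac().

```r
# How R parses the right-hand side used in the question
rhs <- quote(gac(as.matrix("qordi1T", "qordi1RT"), weight = "quadratic")[[5]])

# The outermost call is the symbol `[[`, not gac(): gac(...) is only its
# first argument and 5 is its second
rhs[[1]]
as.list(rhs)[-1]

# magrittr therefore evaluates something like
#   `[[`(<data frame>, gac(...), 5)
# i.e. it asks the two-column data frame for column 5, which matches the
# .subset2(x, ..2, exact = exact) "subscript out of bounds" error reported above
```

This is why `extract2('est')` works: it is a separate pipe step. Wrapping the step in braces, `{ gac(., weight = "quadratic")[[5]] }`, is another way to stop magrittr from inserting the dot as the first argument. Note also that the Call: line of the working output shows `as.matrix("qordi1T", "qordi1RT")` was bound to gac()'s `kat` argument, so it is not what selects the columns; select() already does that.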

