Channel: Active questions tagged r - Stack Overflow

How to rewrite this sapply use case with dplyr?


While working with predict.knn3 I ran into an interesting data-wrangling use case. I didn't know I could call predict() with the argument type="class" to get the predicted levels directly, which was exactly what I needed. So I worked out a somewhat involved solution that selects, from each row of predict()'s result, the level with the maximum probability. The difficulty was that the names function does not work in "vectorized" form on a matrix, only on vectors.

Here is the use case, before and after I found out about the type="class" argument:

rm(list = ls())
library(caret)
library(tidyverse)
library(dslabs)

data("tissue_gene_expression")
x <- tissue_gene_expression$x
y <- tissue_gene_expression$y

set.seed(1)
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
test_x <- x[test_index,]
test_y <- y[test_index]
train_x <- x[-test_index,]
train_y <- y[-test_index]

# fit the model, predict without type="class" and use sapply to build the y_hat levels
fit <- knn3(train_x, train_y, k = 1)
pred <- predict(fit, test_x)
y_hat <- sapply(1:nrow(pred), function(i) as.factor(names(pred[i,which.max(pred[i,])])))

# compare it to the solution using predict with type="class"
identical(y_hat, as.factor(predict(fit, test_x, type="class")))
[1] TRUE

To illustrate the issue: applied to a single row, which is a vector of named numeric elements, the names function produces the desired result, whereas applied to several rows of the matrix at once it returns NULL:

names(pred[1, which.max(pred[1,])])
[1] "cerebellum"
names(pred[1:2, which.max(pred[1:2,])])
NULL

Supposing I were unaware of the convenient type="class" argument of predict.knn3: is there a simpler way, using tidyverse and dplyr, to replace this sapply call? Or any other simpler way to implement this use case?

y_hat <- sapply(1:nrow(pred), function(i) as.factor(names(pred[i, which.max(pred[i,])])))
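
For comparison, here is a minimal base R sketch of the same row-wise lookup without sapply. It assumes pred is a numeric matrix whose column names are the class labels. The small matrix below is made up for illustration and only stands in for the result of predict(fit, test_x).

```r
# Toy stand-in for predict(fit, test_x): one row per sample, one column per class.
# The probabilities are invented for illustration only.
pred <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.2, 0.3, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("cerebellum", "colon", "kidney"))
)

# max.col() returns, for each row, the column index of the largest value.
# ties.method = "first" mirrors which.max(), which also picks the first maximum.
idx <- max.col(pred, ties.method = "first")

# Map the column indices back to class names, keeping every class as a level.
y_hat <- factor(colnames(pred)[idx], levels = colnames(pred))
print(y_hat)
```

Because the levels are set explicitly here, the resulting factor may differ in its level set or order from the one built by the sapply call, so an identical() comparison is not guaranteed to carry over unchanged.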

I'm after something like the following but it doesn't work:

as_tibble(predict(fit, test_x)) %>% mutate(y_hat=names(which.max(.[row_number(),])))
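
A likely reason the attempt above fails is that mutate() evaluates the expression once for the whole tibble rather than once per row, so which.max never sees a single row. One possible dplyr sketch that stays close to that attempt is to let max.col() do the row-wise work. It relies on the magrittr pipe, where `.` refers to the tibble being piped in. The toy matrix is again made up and stands in for predict(fit, test_x).

```r
library(dplyr)  # provides mutate(), as_tibble() and the %>% pipe

# Toy stand-in for predict(fit, test_x); the probabilities are invented.
pred <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.2, 0.3, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("cerebellum", "colon", "kidney"))
)

# `.` is the piped-in tibble: names(.) gives the class columns, and
# max.col(.) gives the index of the largest probability in each row.
result <- as_tibble(pred) %>%
  mutate(y_hat = names(.)[max.col(., ties.method = "first")])

print(result$y_hat)
```

Here y_hat is a character column; wrapping it in factor() would be needed before comparing it with the factor returned by predict(fit, test_x, type="class").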
