I need some help with an xlsx dataset that I have to analyze for a university exam. Below are links to the dataset and to the R script that I wrote for the exam. The problem is in the 6th column (drinks). Specifically:
- I can't get summary info such as the mean of the 345 values in the column "drinks" (the problem column)
- when I try to build the summary table, the column "drinks" gives NA for every statistic (1st quartile, median, 3rd quartile, mean, min, max)
- I can't split my dataset into 2 groups, one with drinks under 5 and the other with drinks over 5
- I can't plot drinks with abline to compare it with the graphs of "CMV"
- I can't calculate the t-test
I know that there are 9 rows with a 0: I tried changing them to 0.0001, but nothing changed in my R script. I also tried mean(yourdata, na.rm = TRUE), which didn't work either. I checked the format of that column too, but it is no different from the other columns (like "CMV" etc.). I really can't understand why.
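One check that might narrow this down is to look at the type R assigned to the column, since NA summaries are what a text or factor column would produce. The sketch below uses a made-up stand-in for my.data, and it assumes (this is not confirmed) that drinks holds numbers stored as text with decimal commas. The name drinks_num is hypothetical.

```r
# Hypothetical stand-in for my.data; the real data comes from the xlsx file.
my.data <- data.frame(
  alkphos = c(92, 64, 54, 78),
  drinks  = c("0,5", "4,5", "6", "0"),  # assumed: numbers stored as text
  stringsAsFactors = FALSE
)

str(my.data$drinks)     # chr or Factor here instead of num would explain the NAs
sapply(my.data, class)  # class of every column at a glance

# Convert to numeric: as.character() first in case the column is a factor,
# then replace the decimal comma with a point before parsing.
drinks_num <- as.numeric(gsub(",", ".", as.character(my.data$drinks), fixed = TRUE))

sum(is.na(drinks_num))          # number of entries that still failed to parse
mean(drinks_num, na.rm = TRUE)  # the mean should now be a number
```

If sum(is.na(drinks_num)) is above zero on the real data, the rows that failed to parse can be listed with my.data$drinks[is.na(drinks_num)].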
I know it's probably an easy and stupid problem/error, but I can't find the solution right now. Can someone help me? Thanks, everybody, and sorry for the trouble. Below I have also included some lines of the R script that give me the error.
- Dataset XLSX: https://mega.nz/#!lFZDjATb
- Dataset uploaded as TXT: https://mega.nz/#!gQw0wCxC
- R-script: https://mega.nz/#!ZRgyWAKS
mean(my.data$drinks, na.rm = TRUE, trim = 0.0)
# Summary table: one row per column of my.data, six statistics per row
tabella <- array(data = NA, dim = c(ncol(my.data), 6))
for (i in 1:ncol(my.data)) {
  tabella[i, ] <- c(quantile(my.data[, i], c(0.25, 0.5, 0.75)),
                    mean(my.data[, i]), min(my.data[, i]), max(my.data[, i]))
}
colnames(tabella) <- c("1st quartile", "median", "3rd quartile", "mean", "min", "max")
rownames(tabella) <- colnames(my.data)
library(xlsx)
library(rJava)
write.xlsx(tabella, file = "tabella.xlsx")
library(dplyr)
selector1 <- filter(my.data, selector=="1")
selector1drinkslow <- filter(selector1, drinks < 4.5)  # numeric threshold; the quoted "4,5" was compared as text
hist(my.data$drinks, breaks = 100, col = blues9 , xlab= "drinks", ylab = "Frequency", main = "Drinks distribution")
abline(v=mean(my.data$drinks),lwd=5, col=rgb(1,0,0,0.8))
t.test(my.data$alkphos,my.data$drinks)
var.test(my.data$alkphos,my.data$drinks)
wilcox.test(my.data$alkphos,my.data$drinks)
cor(my.data)
t.test(my.data$sgpt,my.data$drinks)
t.test(my.data$sgot,my.data$drinks)
t.test(my.data$gammagt,my.data$drinks)
wilcox.test(my.data$alkphos)
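For the two-group split and the comparison between groups, a minimal sketch in base R could look like the following. It assumes drinks is already numeric, uses invented values in place of the real dataset, and treats the cutoff of 5 and the names cutoff, low and high as illustrative choices. Rows equal to the cutoff go to the upper group here, which is also an assumption.

```r
# Invented stand-in for my.data with a numeric drinks column.
my.data <- data.frame(
  alkphos = c(60, 65, 70, 62, 75, 68, 80, 85, 90, 78, 95, 88),
  drinks  = c(0, 0.5, 1, 2, 3, 4, 4.5, 5, 6, 8, 10, 12)
)

cutoff <- 5  # assumed threshold between the two groups

# Split the rows by drinks; which() drops rows where drinks is NA.
low  <- my.data[which(my.data$drinks <  cutoff), ]
high <- my.data[which(my.data$drinks >= cutoff), ]

# Histogram of drinks with a vertical line at the mean.
hist(my.data$drinks, breaks = 10, xlab = "drinks", main = "Drinks distribution")
abline(v = mean(my.data$drinks, na.rm = TRUE), lwd = 3, col = "red")

# Compare alkphos between the low and high drinks groups (Welch t-test by default).
res <- t.test(low$alkphos, high$alkphos)
res
```

This compares one measurement across the two drinks groups, which is a different question from t.test(my.data$alkphos, my.data$drinks), where the two columns themselves are compared.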