"Error: C stack usage 7971488 is too close to the limit"

February 13, 2020, 10:57 am

Currently, I am reading through this excel file and trying to determine the proportion of the columns that have data in at least 20% of their rows. (In other words, step from column to column and see if the proportion of NA's is less than 80%. Then determine the proportion of the columns that fulfill this condition.) However, I keep receiving this error: "Error: C stack usage 7971488 is too close to the limit"

My code is

planets.csv = suppressMessages(read_csv("planets.csv", skip = 73))
na.percent = sapply(planets.csv, function(x) return(sum(is.na(x) == TRUE)/length(x)))
sum(na.percent < 0.80) / length(na.percent)

What does this error mean and how do I fix it?
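For reference, the calculation described above can be sketched in base R on a small made-up data frame. The data frame `toy` and its values are hypothetical stand-ins for planets.csv, and the sketch only illustrates the proportion calculation; it does not address the C stack error itself.

```r
# Toy data standing in for planets.csv (hypothetical values).
toy <- data.frame(
  a = c(1, NA, 3, 4, 5),        # 20% NA
  b = c(NA, NA, NA, 4, 5),      # 60% NA
  c = c(NA, NA, NA, NA, NA),    # 100% NA
  d = c(1, 2, 3, 4, 5)          # 0% NA
)

# Proportion of NA values in each column.
na_percent <- colMeans(is.na(toy))

# Share of columns whose NA proportion is below 80%.
share_ok <- mean(na_percent < 0.80)
```

Here `colMeans(is.na(...))` replaces the `sapply()` call, and `mean()` of the logical vector replaces the `sum(...) / length(...)` step.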

↧

How to complete missing data in R [duplicate]

February 13, 2020, 10:58 am

I'm having some trouble trying to fill in missing rows in a simple df

x <- data.frame( "Name" = c("John","Dora"), "Age" = c(21,15))

I always need a 2x2 dataframe, and sometimes John or Dora is missing. When one of them is missing, I need the output to add that name under Name with an Age of 0.

Here is what I'm trying

x[1, ] %>% 
       tidyr::complete(tidyr::nesting('John' , 'Dora'), fill = list('Age' = 0))

And it gives me this error

Error: `by` can't contain join column `"John"`, `"Dora"` which is missing from RHS
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In seq.default(along = x) :
partial argument match of 'along' to 'along.with'
2: In seq.default(along = x) :
partial argument match of 'along' to 'along.with'
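One possible way to get the desired 2x2 result, sketched in base R rather than with tidyr::complete: join the available rows onto a template that lists every required name, then replace the missing ages with 0. The names `required`, `partial` and `filled` are made up for this illustration.

```r
x <- data.frame(Name = c("John", "Dora"), Age = c(21, 15))

# Template holding every name that must appear in the result.
required <- data.frame(Name = c("John", "Dora"))

# Simulate the case where only John is present.
partial <- x[1, ]

# Left join: keep all required names, with NA where the age is missing.
filled <- merge(required, partial, by = "Name", all.x = TRUE)

# Replace the missing ages with 0.
filled$Age[is.na(filled$Age)] <- 0
```

The result always has one row per required name, whichever of the two rows was missing from the input.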

↧

Fill new columns based on key words of an existing column

February 13, 2020, 10:59 am

I have the dataframe below and I would like to create new columns based on key words in each row of the INFO column.

ID<-c(1,2,3,4)
INFO<-c("You used works apps for 4 minutes today.","You checked your phone 10 times today.",
        "Your commute time to work today was 4 minutes.","You (or at least your phone) were at your work place for 15 minutes today"
        )
DATASET<-data.frame(ID,INFO)

More specifically, I have to check each row of INFO for the words apps, phone, commute, and or. One new column is then created for each word; it contains either NA or the number that appears in that row, so my new dataset will look like:

DATASET2
  ID                                                                      INFO apps phone commute or
1  1                                  You used works apps for 4 minutes today.    4    NA      NA    NA
2  2                                    You checked your phone 10 times today.   NA    10      NA    NA
3  3                            Your commute time to work today was 4 minutes.   NA    NA       4    NA
4  4 You (or at least your phone) were at your work place for 15 minutes today   NA    NA      NA    15

↧

Separate strings returned by html_text in rvest

February 13, 2020, 10:59 am

I'm trying to extract amenities for a hotel using rvest.

library(rvest)
hotel_url="https://www.tripadvisor.com/Hotel_Review-g187791-d13494726-Reviews-Palazzo_Caruso-Rome_Lazio.html"
hotel<-read_html(hotel_url)  # parse the page; `hotel` was used below without being defined
amenities<-hotel%>%
    html_node(".hotels-hr-about-amenities-AmenityGroup__amenitiesList--3MdFn")%>%
    html_text()

The resulting text does not separate one amenity from the other:

[1] "Paid private parking nearbyFree High Speed Internet (WiFi)Coffee shopBicycle toursWalking toursCar hireFax / photocopyingBaggage storageFree internetWifiPublic wifiInternetBreakfast availableBreakfast in the roomConciergeExecutive lounge accessNon-smoking hotelSun terrace24-hour front deskPrivate check-in / check-outLaundry service"

Is there any way to add separators (such as ";") between amenities?

↧

Golem deploy in Docker: %>% not found

February 13, 2020, 11:03 am

I am migrating my Shiny app to a dockerized golem app. I have a problem using the pipe in this line:

plotly::plot_ly(tabPieTension, labels = ~cat, values = ~valeur, type = 'pie', sort = FALSE) %>%
     plotly::layout(title = "Delta tension (Baisse de tension décharge)")

My app runs well locally in RStudio. I build the .tar.gz without error. I build the Docker image without problems (and dplyr is installed correctly), but when I run the image I get

error : could not find function "%>%"

It seems that dplyr is not recognized. I tried changing the call to dplyr::%>%, but then the build fails.

Does anyone have an idea what is causing this error? Many thanks!

↧

Plot different variables on different graphs on top of each other

February 13, 2020, 11:04 am

I have 3 different variables (A, B, C) to be plotted on 3 graphs stacked on top of each other (as they have different axes). My output has a lot of space between the graphs. I would like to reduce that space, have a single x-axis at the bottom, and use wider y-limits on all graphs.

I have also read maybe facet_wrap is a better way of plotting multiple graphs? Could you please give me advice on what is best to do? Thanks

My data:

Location = c(1,2,3,4,5,6,7)
A = c(1.16, 0.96, 0.67,0.78, 0.55, 0.3,0.26)
B = c(6.51, 4.98, 2.85, 3.19, 3.60, 10.82, 8.60)
C = c(75.45, 86.66, 103.36, NA, 107.53, NA, 128.49)

df = data.frame(Location, A, B, C)

My code:

par(mfrow=c(3,1))

plot(A, type = "l", col = "red", ylab = "A", main="Title", xlab = NULL)
plot(B, type = "l", col = "Blue", ylab = "B", xlab = NULL)
plot(C, type = "p", pch= 19, col = "Blue", ylab = "C", xlab = Location)
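One way the stacked layout described above might be sketched in base graphics: shrink the inner margins with `mar`, reserve space for a shared axis label and title with `oma`, and suppress the x-axis on all but the bottom panel. The helper name `draw_stacked` and the margin values are assumptions chosen for illustration, not a definitive layout.

```r
Location <- c(1, 2, 3, 4, 5, 6, 7)
A <- c(1.16, 0.96, 0.67, 0.78, 0.55, 0.3, 0.26)
B <- c(6.51, 4.98, 2.85, 3.19, 3.60, 10.82, 8.60)
C <- c(75.45, 86.66, 103.36, NA, 107.53, NA, 128.49)
df <- data.frame(Location, A, B, C)

draw_stacked <- function(df) {
  # Small inner margins remove the gaps between panels; the outer margin
  # leaves room for the shared x-axis label and the title.
  op <- par(mfrow = c(3, 1), mar = c(0.5, 4, 0.5, 1), oma = c(4, 0, 3, 0))
  on.exit(par(op))
  # Top and middle panels: no x-axis.
  plot(df$Location, df$A, type = "l", col = "red", ylab = "A", xlab = "", xaxt = "n")
  plot(df$Location, df$B, type = "l", col = "blue", ylab = "B", xlab = "", xaxt = "n")
  # Bottom panel keeps the only x-axis.
  plot(df$Location, df$C, type = "p", pch = 19, col = "blue", ylab = "C", xlab = "")
  mtext("Location", side = 1, outer = TRUE, line = 2.5)
  mtext("Title", side = 3, outer = TRUE, line = 1)
  invisible(NULL)
}
```

Calling `draw_stacked(df)` draws the three panels on the current device. Wider y-limits could be added per panel with a `ylim` argument.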

↧

Colours not displaying on scatterplot points

February 13, 2020, 11:05 am

I am trying to conduct an RDA analysis, and to do so I had to create a plot with my SNPs (the points on the scatterplot). I assigned colours to my points, but once I plotted them, they were not filled and instead were white circles outlined in gray. Attached are my code, an image of what I got, and an image of what I want it to look like. Also attached is an image of a smaller dataset to make it easier to see what I'm doing! Thank you in advance!

#gen1 Dataset
 POP   L0001 L0002 L0003
   <chr> <dbl> <dbl> <dbl>
 1 AK        0     1     0
 2 NU       -1    -1    -1
 3 GR        1     0     0
 4 LB        0     1     0
 5 NF        1     0     0
 6 ST        0     0     0
 7 NS       -1     2     0
 8 NB        1     2     0
 9 ME        0     1     0
10 IC        0     0     0
11 FI        0     0     0

#env1 dataset
POP   CHLa.max CHLa.min CHLa.avg
   <chr>    <dbl>    <dbl>    <dbl>
 1 AK       2.07    0.0623    0.780
 2 NU       0.943   0.0697    0.245
 3 GR       2.03    0.0494    0.453
 4 LB       1.55    0.263     0.678
 5 NF       1.63    0.190     0.698
 6 ST       2.40    1.17      1.74 
 7 NS       1.14    0.0708    0.447
 8 NB       1.79    0.231     0.900
 9 ME       1.69    0.131     0.711
10 IC       2.28    0.147     0.892
11 FI       0.554   0.0569    0.207

#Specify columns
gen1<-gen1[2:4]

#Specify predictors
pred1<-subset(env1[,1])

#Conduct RDA
BLGU.rda <- rda(gen1 ~ ., data=pred1, scale=T)
BLGU.rda

#Define Populations
levels(env1[["POP"]]) <- c("AK", "NU", "GR", "LB", "NF", "ST", "NS", "NB", "ME", "IC", "FI")

#Give Populations Callback Name
eco1 <- env1[["POP"]]

#Assign Colours
bg <- c("#fa8a6b", "#5d7142", "#010c22", "#61cd9e", "#7110b6", "#15c4df", "#892f74", "#0615f3", "#b6faea", "#e402b1", "#ad4833")

#Plot RDA
plot(BLGU.rda, type="n", scaling=2)

#Plot populations
points(BLGU.rda, display="sites", pch=21, cex=1.3, col="gray32", scaling=2, bg=bg[eco])
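A small base R sketch of how the colour lookup behaves, assuming POP is a character column (as the `<chr>` in the printed data suggests). Whether this is what happens in the session above is not confirmed; the vectors `pop`, `by_character` and `by_factor` are made up for the illustration and use only three of the populations.

```r
pop <- c("AK", "NU", "GR")                  # character vector, like a <chr> column
bg  <- c("#fa8a6b", "#5d7142", "#010c22")   # unnamed vector of colours

# Indexing an unnamed vector by character looks names up and finds none.
by_character <- bg[pop]                     # every element is NA

# Converting to a factor first makes the index the integer level codes.
pop_factor <- factor(pop, levels = c("AK", "NU", "GR"))
by_factor  <- bg[pop_factor]                # one colour per population
```

If the fill vector passed to `points()` is all NA, the symbols would typically be drawn without a fill, which matches the unfilled circles described.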

[Images: env1 data; gen1 data; scatter plot of RDA (SNPs vs. environmental predictors); scatter plot of what I want my data to look like]

↧

Why can R plot out-of-the-box but Python (matplotlib) needs tkinter?

February 13, 2020, 11:06 am

I'm on a *nix server right now with limited install privileges and I'm appreciating a distinct advantage of R over Python: you need fewer dependencies to plot in R. I have X-forwarding set up and can plot away in R, but I can't with Python due to lack of a backend.

For Python, I've had to install Tkinter a few times - not a big deal - but now I'm in a situation where it's not obvious how to and I'm appreciating the ease of plotting with R. Wasn't Python supposed to be the "batteries included" language?

So how does R do it? Does every install of R come with Tkinter? Or is it using something else to create its plots?

↧

Is there a way to produce predict.gam(..., type="terms") values that are NOT centered

February 13, 2020, 11:08 am

Original question:

Calling predict.gam(..., type="terms") returns values that are centered on the average. Is there a way to obtain the raw predicted term values (i.e. ones that have not been centered on the average)?

Edited: Here is a reproducible example of my attempt to get the (non-centered) fitted values of a given variable using lpmatrix. The values are similar to those using visreg but with an offset. This is strictly for the case where the link is identity and there are no tensor products.

library(mgcv)    # provides gam()
library(visreg)  # provides visreg()

# read in data
air <- data.frame(airquality)
air <- air[complete.cases(air), ]

# set up model
model <- gam(Temp ~ s(Ozone) + s(Solar.R) + s(Wind), data = air, method = "ML")

# get predicted values
predicted <- as.data.frame(predict(model, na.action = na.exclude))
colnames(predicted) <- "predicted"

# using the lpmatrix, set values of s(Ozone), s(Solar.R), and s(Wind) to 0
lpmat <- predict(model, type = "lpmatrix")
lpmat_Ozone <- lpmat; lpmat_Ozone[, grep("Ozone", colnames(lpmat))] <- 0
lpmat_Solar.R <- lpmat; lpmat_Solar.R[, grep("Solar.R", colnames(lpmat))] <- 0
lpmat_Wind <- lpmat; lpmat_Wind[, grep("Wind", colnames(lpmat))] <- 0

# obtain response predictions with s(each variable) set to 0 (respectively)
predicted$Ozone <- unname(lpmat_Ozone %*% coef(model))[, 1]
predicted$Solar.R <- unname(lpmat_Solar.R %*% coef(model))[, 1]
predicted$Wind <- unname(lpmat_Wind %*% coef(model))[, 1]

# obtain term predictions
answerdf <- as.data.frame(predicted$predicted - predicted$Ozone)
colnames(answerdf) <- "Ozone"
answerdf$Solar.R <- (predicted$predicted - predicted$Solar.R)
answerdf$Wind <- (predicted$predicted - predicted$Wind)

# visualize using visreg method and the alternative method above
visregdat <- visreg(model, "Ozone", plot = FALSE)$fit
plot(visregFit ~ Ozone, data = visregdat, type = "l", lwd = 5, ylim = c(-30, 90), ylab = "fitted values")
points(answerdf$Ozone ~ air$Ozone, col = "violet", pch = 20)
legend(100, 60, legend = c("Visreg", "Alt. method"),
       col = c("black", "violet"), pch = 20, cex = 0.8)

This gives a plot showing the same curves but with different intercepts. Why would this be?

↧

Appropriate to fit lognormal model to data with heavy tail?

February 13, 2020, 11:15 am

I am attempting to standardize recreational fishery CPUE data. I am using a delta approach, with a binomial model fit to the presence/absence data and a lognormal model fit to the positive observations. When I test the log(positives) data for normality, I get a plot that looks almost normal, but with a heavy left tail. I am not quite sure why. My question is, how heavy-tailed is too heavy-tailed for a normal distribution to be appropriate? Are there alternative distributions you would consider, and how would I test which one is most appropriate? I have over 80,000 observations, so a Shapiro-Wilk test is useless here. I have included an image of my density plot and Q-Q plot. Thank you!

↧

Group Rows by Multiple Columns using R then Statistical Analysis

February 13, 2020, 11:16 am

Each row in the dataset has an id number, and each row lists a single product with the net price. Some deals have multiple items sold, hence the multiple rows with the same id number. How do I create groups of the rows so that they are grouped by product combination?

For example, IDs 1 and 4 should be seen as the same since they have the exact same products in the deals.

The end goal is to be able to create matching groups, then do a statistical analysis on these groups.

Goal output:

I tried: data_wide <- spread(dt, Product, Price)

Output: 'var' must evaluate to a single number or a column name, not a character vector
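A possible base R sketch of the grouping idea: build one label per ID from its sorted products, so that deals with the same product combination share a label. It uses a simplified copy of the data from the question in which the column `ID Number` is renamed `ID`; that renaming and the names `combo`, `deal_total` and `mean_by_combo` are assumptions for this illustration.

```r
dt <- data.frame(
  ID      = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
  Product = c("A", "B", "D", "A", "A", "B", "A", "B", "D"),
  Price   = c(8, 7, 11, 4, 5, 9, 10, 3, 5)
)

# One label per ID: its sorted, de-duplicated products pasted together.
combo <- sapply(split(dt$Product, dt$ID),
                function(p) paste(sort(unique(p)), collapse = "+"))

# Attach the label to every row so rows can be grouped by combination.
dt$Combo <- unname(combo[as.character(dt$ID)])

# Example analysis: total price per deal, then the mean total per combination.
deal_total    <- sapply(split(dt$Price, dt$ID), sum)
mean_by_combo <- tapply(deal_total, combo[names(deal_total)], mean)
```

With this data, IDs 1 and 4 both receive the label "A+B+D", so they fall into the same group for any later per-group statistics.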

structure(list(`ID Number` = c(1, 1, 1, 2, 3, 3, 4, 4, 4), Product = c("A", 
"B", "D", "A", "A", "B", "A", "B", "D"), Price = c(8, 7, 11, 
4, 5, 9, 10, 3, 5)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

↧

Find categorical indicator vector based on continuous thresholds

February 13, 2020, 11:25 am

I have a set of t thresholds that separate my data vector y into t-1 categories.

y <- runif(100)     # data vector
t <- c(0, 0.5, 1)   # threshold vector

In this example, category 1 corresponds to data points that satisfy 0 < y < 0.5 and category 2 corresponds to data points that satisfy 0.5 < y < 1. To find the corresponding vector of categories, a naive looping approach would be

nc <- length(t) - 1                       # number of categories
categories <- numeric(length=length(y))   # vector of categories

for (cc in 1:nc) {    # loop over categories

  lower <- t[cc]      # lower bound for category cc
  upper <- t[cc + 1]  # upper bound for category cc

  cc.log <- (lower < y) & (y < upper) # logical vector where y satisfies thresholds
  categories[cc.log] <- cc            # assign active category where thresholds are satisfied

}

Is there an easier and scalable solution that takes as inputs the data vector y as well as the threshold vector t and returns the vector of categories categories?
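A vectorised sketch of the same assignment using base R's `findInterval()`, assuming the thresholds are sorted in increasing order. One difference from the loop: `findInterval()` treats each interval as closed on the left, so a value exactly equal to an interior threshold is assigned to the upper category instead of being left at 0. With continuous data such as `runif()` draws this does not matter in practice.

```r
set.seed(1)
y <- runif(100)     # data vector
t <- c(0, 0.5, 1)   # threshold vector (must be sorted)

# For each y, the index of the threshold interval it falls into.
categories <- findInterval(y, t)
```

`cut(y, breaks = t, labels = FALSE)` is a similar alternative that closes the intervals on the right instead.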

↧

Meta analysis of prevalence rates with random effects: is there any problem with my code?

February 14, 2020, 10:11 am

I'm doing a meta analysis on prevalence data. In each study, participants can belong to one of three mutually exclusive groups. I would like to figure out the prevalence of each group (i.e., the percent of individuals belonging to each) across all of the studies, accounting for sample size.

I am using the "meta" package to do this. I would love if someone could take a look and see if I am doing this correctly. I have attached some sample data and code.

The thing that is giving me pause is that the three prevalence estimates with random effects don't add up to 100. Is that normal?

require(data.table)
require(meta)

data <- data.table(Study = c("Smith", "Bond", "Francis", "Smith", "Bond", "Francis", "Smith", "Bond", "Francis"), Group = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), size = c(150, 40, 30, 150, 40, 30, 150, 40, 30), members = c(140, 30, 20, 5, 5, 5, 5, 5, 5))
data$Study <- as.factor(data$Study)
data$Group <- as.factor(data$Group)

analysis <- metaprop(data = data, event = members, n = size, studlab = Study, byvar = data$Group)

Some information on the data: Study is the name of each of the three studies. Group is which of the three groups each row refers to. size is the sample size in a given study. members is the number of people in the sample that belong to a given group.

↧

Correct use of fun.data with stat_summary in ggplot2?

February 14, 2020, 10:15 am

From ?stat_summary.

fun.data : Complete summary function. Should take data frame as input and return data frame as output

I'm having trouble understanding this. It doesn't seem like my summary function so.summary is being passed a data frame at all!

Code:

set.seed(0)
so.example <- data.frame(
  sampleID=rep(1:15)
  , sales=runif(15, 0, 1)*1000
  , revenue=runif(15, 0, 1)*10000
)

so.summary <- function(z) {
  print(z)
  data.frame(sales=median(z$sales), revenue=median(z$revenue))
}

ggplot(
  so.example
  , aes(x=sales, y=revenue)
  ) + geom_point() + stat_summary(fun.data=so.summary, geom='point', color='red')

Output:

[1] 2672.207
Error in z$sales : $ operator is invalid for atomic vectors

↧

Rounding behavior of updateSliderInput in R shiny

February 14, 2020, 10:17 am

I am trying to use the following R Shiny code so that the first slider updates the second slider. However, when the updateSliderInput function is called, it seems to override the round = T in the original sliderInput. I know that, since I am dividing by 9 in the updateSliderInput function, the step size will not be an integer for some values of the first slider, but is there a way to show a rounded value in the recalculated slider so that I don't get 16 digits of precision?

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      p("The first slider controls the second"),
      sliderInput(inputId = "value", label = "The independent slider",
                  min = 1000, max = 1500, value = 1000, step = 100, round = T
      ),
      sliderInput(inputId = "value2", label = "The dependent slider",
                      min = 5, max = 500, value = 50, round = T
      )
    ),
    mainPanel()
  )
)

server <- function(input, output, session) {
  observe({
    val <- input$value

    updateSliderInput(session, "value2", value = (val * 0.3),
                      min = (val * 0.005), 
                      max = (val * 0.5), 
                      step = floor((val * 0.5) - floor(val * 0.005))/9)

  })     
}

shinyApp(ui, server)

Right now, I see this, no matter what I try:

↧

Pattern matching using regex for messy file names

February 14, 2020, 10:30 am

I do not have very much experience with REs, but need to parse 100s of file names to generate a 'metadata' data set. I have been able to generate text files that include the file paths and the file name. It is simple for me to parse out the complete file name, but I need to be able to parse out the "sample ID" from the file name.

The issue is that the syntax of the "sample IDs" is all over the place (see the attached csv for example data; the goal is to go from the 'sample' column to the 'ID' column). I have tried a series of strsplit() commands, but this is very cumbersome and is not functional in nature. I have also tried writing a function with a number of IF statements based on syntax structure. I feel like this is still not a good solution because it depends on me manually identifying the different syntaxes beforehand, and I could easily miss something since I have to do this by eye.

It seems to me that this is a regex problem, but I could use some resources to help me get started. I would like to be able to do this in either R or Python if possible. Thank you for any resources, or packages/modules that may be useful.

dput(head(brain_ref, 25))
structure(list(file = c("/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/BXH12_1_brain_total_RNA_cDNA_GTCCGC.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/BXH12_2_brain_total_RNA_cDNA_CAGATC.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB13_1_brain_total_RNA_cDNA_ATGTCA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB13_2_brain_total_RNA_cDNA_GTGAAA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB17_1_brain_total_RNA_cDNA_CCGTCC.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB17_2_brain_total_RNA_cDNA_ATGTCA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB2_1_brain_total_RNA_cDNA_GTCCGC.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB2_2_brain_total_RNA_cDNA_CTTGTA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB25_1_brain_total_RNA_cDNA_AGTTCC.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB25_2_brain_total_RNA_cDNA_AGTCAA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB27_1_brain_total_RNA_cDNA_CGATGT.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB27_2_brain_total_RNA_cDNA_AGTTCC.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB7_1_brain_total_RNA_cDNA_ACAGTG.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/HXB7_2_brain_total_RNA_cDNA_AGTCAA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/SHR_1_brain_total_RNA_cDNA_GCCAAT.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch1/ensembl_v96/SHR_2_brain_total_RNA_cDNA_TGACCA.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/ACI-SegHsd-2-brain-total-RNA_S17.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/BXH2-3-brain-total-RNA_S4.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/BXH5-3-brain-total-RNA_S3.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/BXH8-3-brain-total-RNA_S5.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/Cop-CrCrl-2-brain-total-RNA_S10.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/Dark-Agouti-1-brain-total-RNA_S16.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/Dark-Agouti-2-brain-total-RNA_S13.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/F344-NCI-1-brain-total-RNA_S18.genes.results", 
"/data/rn6/quantitation/brainTotalRNA/RI/batch10/ensembl_v96/F344-NCI-2-brain-total-RNA_S15.genes.results"
), sample = c("BXH12_1_brain_total_RNA_cDNA_GTCCGC", "BXH12_2_brain_total_RNA_cDNA_CAGATC", 
"HXB13_1_brain_total_RNA_cDNA_ATGTCA", "HXB13_2_brain_total_RNA_cDNA_GTGAAA", 
"HXB17_1_brain_total_RNA_cDNA_CCGTCC", "HXB17_2_brain_total_RNA_cDNA_ATGTCA", 
"HXB2_1_brain_total_RNA_cDNA_GTCCGC", "HXB2_2_brain_total_RNA_cDNA_CTTGTA", 
"HXB25_1_brain_total_RNA_cDNA_AGTTCC", "HXB25_2_brain_total_RNA_cDNA_AGTCAA", 
"HXB27_1_brain_total_RNA_cDNA_CGATGT", "HXB27_2_brain_total_RNA_cDNA_AGTTCC", 
"HXB7_1_brain_total_RNA_cDNA_ACAGTG", "HXB7_2_brain_total_RNA_cDNA_AGTCAA", 
"SHR_1_brain_total_RNA_cDNA_GCCAAT", "SHR_2_brain_total_RNA_cDNA_TGACCA", 
"ACI-SegHsd-2-brain-total-RNA_S17", "BXH2-3-brain-total-RNA_S4", 
"BXH5-3-brain-total-RNA_S3", "BXH8-3-brain-total-RNA_S5", "Cop-CrCrl-2-brain-total-RNA_S10", 
"Dark-Agouti-1-brain-total-RNA_S16", "Dark-Agouti-2-brain-total-RNA_S13", 
"F344-NCI-1-brain-total-RNA_S18", "F344-NCI-2-brain-total-RNA_S15"
), batch = c("batch1", "batch1", "batch1", "batch1", "batch1", 
"batch1", "batch1", "batch1", "batch1", "batch1", "batch1", "batch1", 
"batch1", "batch1", "batch1", "batch1", "batch10", "batch10", 
"batch10", "batch10", "batch10", "batch10", "batch10", "batch10", 
"batch10"), ID = c("BXH12_1", "BXH12_2", "HXB13_1", "HXB13_2", 
"HXB17_1", "HXB17_2", "HXB2_1", "HXB2_2", "HXB25_1", "HXB25_2", 
"HXB27_1", "HXB27_2", "HXB7_1", "HXB7_2", "SHR_1", "SHR_2", "ACI-SegHsd_2", 
"BXH2_3", "BXH5_3", "BXH8_3", "Cop-CrCrl_2", "Dark-Agouti_1", 
"Dark-Agouti_2", "F344-NCI_1", "F344-NCI_2")), row.names = c(NA, 
25L), class = "data.frame")
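As a starting point, here is one regex sketch in base R that reproduces the 'ID' column for the rows shown above. It assumes every sample name has the shape strain, separator, replicate number, separator, "brain", where a separator is "-" or "_". That assumption is only inferred from the data shown and may not hold for other files. The vector `samples` is a subset of the 'sample' column.

```r
samples <- c(
  "BXH12_1_brain_total_RNA_cDNA_GTCCGC",
  "HXB2_2_brain_total_RNA_cDNA_CTTGTA",
  "SHR_1_brain_total_RNA_cDNA_GCCAAT",
  "ACI-SegHsd-2-brain-total-RNA_S17",
  "Dark-Agouti-1-brain-total-RNA_S16",
  "F344-NCI-2-brain-total-RNA_S15"
)

# Group 1: strain name (lazy, so it stops at the first separator that is
# followed by digits and then "brain"). Group 2: the replicate number.
pattern <- "^(.+?)[-_]([0-9]+)[-_]brain.*$"

# Rebuild the ID as strain + "_" + replicate.
id <- sub(pattern, "\\1_\\2", samples, perl = TRUE)
```

Names that do not match the pattern are returned unchanged by `sub()`, so comparing `id` with `samples` is a quick way to spot file names with a syntax the pattern misses.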

↧

Customize width of bar plot in likert plot

February 14, 2020, 10:30 am

I'm using the likert package by jbryer and want to visualise the data with stacked bar plots. The size/width of these bar plots depends on how many bars are in the graph, i.e. with only one bar the bar is pretty wide, while they get thinner the more bars are plotted.

I'd like to set a custom size/width for the bars so that they stay the same no matter how many bars are plotted in the graph, i.e. so that the bar size is the same in the plots of l29_5 and l29_2.

Likert bar plot with two bars

Likert bar plot with five bars

library(ggplot2)
library(dplyr)     # for %>% and select()
library(likert)
data(pisaitems)

items29_5 <- pisaitems[,substr(names(pisaitems), 1,5) == 'ST25Q']
colnames(items29_5) <- c("Magazines", "Comic books", "Fiction", 
                    "Non-fiction books", "Newspapers")

items29_2 <-  items29_5 %>% 
  select("Magazines", "Comic books")


l29_5 <- likert(items29_5)
l29_2 <- likert(items29_2)

plot(l29_5)
plot(l29_2)

↧

Unnest nested tidydrc models

February 14, 2020, 10:31 am

Problem

I've been using tidydrc, a tidy wrapper for the drc package, to build growth curves. It produces a tidy version of the normal output (best for ggplot). However, due to the inherent nesting of the models, I can't run simple drc functions, since the models are nested inside a dataframe. I've attached code below that shows the same workflow with drc and with tidydrc.

Ideal Result (works with drc)

library(tidydrc) # To load the Puromycin data
library(drc)

model_1 <- drm(rate ~ conc, state, data = Puromycin, fct = MM.3())
summary(model_1)
mselect(model_1, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))

> summary(model_1)

Model fitted: Shifted Michaelis-Menten (3 parms)

Parameter estimates:

              Estimate Std. Error t-value   p-value    
c:treated    31.678738  11.446785  2.7675 0.0131759 *  
c:untreated  37.856744  11.289382  3.3533 0.0037700 ** 
d:treated   221.652424   7.384952 30.0141 4.033e-16 ***
d:untreated 172.512640   8.898151 19.3875 4.973e-13 ***
e:treated     0.104603   0.023678  4.4177 0.0003766 ***
e:untreated   0.108644   0.036529  2.9742 0.0085099 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 8.39933 (17 degrees of freedom)
> mselect(model_1, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))
         logLik       IC Lack of fit  Res var
MM.3  -78.10685 170.2137   0.9779485 70.54874
LL.3  -78.52648 171.0530   0.9491058 73.17059
W1.3  -79.22592 172.4518   0.8763679 77.75903
W2.4  -77.87330 173.7466   0.9315559 78.34783
W1.4  -78.16193 174.3239   0.8862192 80.33907
LL.5  -77.53835 177.0767   0.7936113 87.80627
baro5 -78.00206 178.0041   0.6357592 91.41919
Warning message:

Not Working Example with tidydrc

library(tidyverse) # tidydrc utilizes tidyverse functions

model_2 <- tidydrc_model(data = Puromycin, conc, rate, state, model = MM.3())
summary(model_2)

Error: summary.vctrs_list_of() not implemented.

Now, I can manually tease apart the list of models in the dataframe model_2 but can't seem to figure out the correct apply statements (it's a mess) to get this working.

Progress Thus Far

These both produce the same error, so at least I've subsetted a level down but now I'm stuck and pretty sure this is not the ideal solution.

mselect(model_2$drmod, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))

model_2_sub <- model_2$drmod # Manually subset the drmod column

apply(model_2_sub, 2, mselect(list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5())))

Error in UseMethod("logLik") : no applicable method for 'logLik' applied to an object of class "list"

I've even tried the tidyverse function unnest() to no avail

model_2_unnest <- model_2 %>% unnest_longer(drmod, indices_include = FALSE)
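To illustrate the general pattern of applying a per-model function to a list column, here is a base R sketch that uses `lm()` fits as a stand-in for the nested drc models. The objects `groups`, `nested`, `summaries` and `r_squared` are made up for this illustration, and whether `mselect()` behaves the same way on each element of `model_2$drmod` has not been verified here.

```r
# Stand-in for the nested result: one lm fit per state, stored in a list column.
groups <- split(Puromycin, Puromycin$state)
nested <- data.frame(state = names(groups))
nested$fit <- lapply(groups, function(d) lm(rate ~ conc, data = d))

# Apply a per-model function to each element of the list column,
# rather than to the column as a whole.
summaries <- lapply(nested$fit, summary)

# Pull one number out of each summary.
r_squared <- sapply(summaries, function(s) s$r.squared)
```

The key point is that `lapply()` iterates over the elements of the list, whereas `apply()` expects a matrix-like object with dimensions.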

↧

Persistent VisibleDeprecationWarning: zmq.eventloop.minitornado is deprecated in pyzmq 14.0 and will be removed

February 14, 2020, 10:41 am

When starting a Jupyter notebook on MacOS Catalina this persistent error message occurs:

VisibleDeprecationWarning: zmq.eventloop.minitornado is deprecated in pyzmq 14.0 and will be removed.

A previously asked question (58443893) was never resolved. Various discussions (here and here) recommend:

conda install jupyter
conda update pyzmq

and the message itself recommends:

Install tornado itself to use zmq with the tornado IOLoop.

All of these have been done, along with updating conda via:

conda update -n base -c defaults conda

Any ideas on how to get rid of this annoying message?

↧

NA vs. computationally singular error in R multiple regression (glm)

February 14, 2020, 10:48 am

Sometimes when I use glm to construct a multiple regression model, it returns a model with coefficients for all except one (or some small subset) of variables which are listed as NA. Presumably, the NAs are due to these variables being covariates of some other variable or linear combinations thereof.

On other occasions, I simply get an

Error in solve.default(hessian, gradient, tol = <some number << 1 >), 
system is computationally singular: 
reciprocal condition number = < some number <<1 >

Presumably this singularity error results from the same thing that gives me NA values for other models (collinearity).

My question is why R is able to generate a model by "isolating" the problem variables in some cases (as NAs) but not others. In those cases where I just get the singularity error, is there a function I can use that provides a systematic way of identifying which subset of variables are causing the singularity?
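One systematic check that could be tried is a pivoted QR decomposition of the model matrix: `qr()` moves columns it finds numerically dependent on earlier ones to the end of the pivot order, so those columns can be read off from the rank. The sketch below uses made-up data in which `x3` is an exact linear combination of `x1` and `x2`; all variable names are hypothetical.

```r
set.seed(42)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2           # exact linear combination of x1 and x2
d$y  <- rnorm(20)

# Design matrix for the intended model.
X <- model.matrix(y ~ x1 + x2 + x3, data = d)

# Pivoted QR: columns beyond the numerical rank were moved to the end.
qrX <- qr(X)
dependent <- colnames(X)[qrX$pivot[-seq_len(qrX$rank)]]

# For comparison, lm() reports NA for the same column.
na_coefs <- names(which(is.na(coef(lm(y ~ x1 + x2 + x3, data = d)))))
```

Which column of a dependent set gets flagged depends on the column order, so the result identifies one column to drop, not the only possible choice.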

↧