How can I fix coefplot() error "need finite 'ylim' values"

July 5, 2024, 3:03 am

I'm not 100% sure this is a coding issue, let me know.

I'm trying to plot the coefficients of this DiD model using fake data, but I'm not savvy enough to know whether the problem lies with coefplot() or whether my model, feols(...), is misspecified.

Any ideas?

EDIT: I did notice just now that the y values are NaN in the feols(...) call, maybe that is the issue? The estimates seem good, so I'm still not sure what this implies.

library(fixest)
library(dplyr)

# Create fake data
fake_data <- tibble(
  state = c(rep("c",10), rep("t",10)),
  time = c(1:10,1:10),
  treat = c(rep(0,15), 1,1,1,1,1),
  y = c(seq(9,36,3), seq(11,23,3), 28,31,34,37,40)
  ) |>
  mutate(state = factor(state, levels = c("t","c"))) |>
  mutate(y = c(seq(9,36,3), seq(11,23,3), 28,31+2,34+4,37+6,40+8))

Data looks like:

# A tibble: 20 × 4
   state  time treat     y
   <fct> <int> <dbl> <dbl>
 1 c         1     0     9
 2 c         2     0    12
 3 c         3     0    15
 4 c         4     0    18
 5 c         5     0    21
 6 c         6     0    24
 7 c         7     0    27
 8 c         8     0    30
 9 c         9     0    33
10 c        10     0    36
11 t         1     0    11
12 t         2     0    14
13 t         3     0    17
14 t         4     0    20
15 t         5     0    23
16 t         6     1    28
17 t         7     1    33
18 t         8     1    38
19 t         9     1    43
20 t        10     1    48

Model seems to work:

feols(y ~ i(time, state, ref = 5) |
        state + time,
      data = data_mult_t_dyn) |>
  etable(vcov = "iid")

                      feols(y ~ i(t..
Dependent Var.:                     y

time = 1 x state = t  -1.95e-14 (NaN)
time = 2 x state = t  -1.33e-14 (NaN)
time = 3 x state = t   -1.6e-14 (NaN)
time = 4 x state = t   -1.6e-14 (NaN)
time = 6 x state = t      2.000 (NaN)
time = 7 x state = t      4.000 (NaN)
time = 8 x state = t      6.000 (NaN)
time = 9 x state = t      8.000 (NaN)
time = 10 x state = t     10.00 (NaN)
Fixed-Effects:        ---------------
state                             Yes
time                              Yes
_____________________ _______________
S.E. type             NA (not-avail.)
Observations                       20
R2                                  1
Within R2                           1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I can't plot the coefficients:

test <- feols(y ~ i(time, state, ref = 5) |
                state + time,
              data = data_mult_t_dyn)

coefplot(summary(test))
coefplot(test)

Output:

Error in plot.window(...) : need finite 'ylim' values
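
A possible reading of this error, offered as an assumption and not a confirmed diagnosis: the table above reports NaN standard errors (20 observations, R2 of 1), so any confidence bounds computed from them are NaN as well, and base graphics cannot build an axis from non-finite limits. The sketch below reproduces the same message with base R only; `est` and `se` are illustrative vectors, not output extracted from fixest.

```r
# Illustrative values mimicking the etable output:
# finite estimates, NaN standard errors.
est <- c(2, 4, 6, 8, 10)
se  <- rep(NaN, length(est))

# Confidence bounds inherit the NaN from the standard errors.
ci_low  <- est - 1.96 * se
ci_high <- est + 1.96 * se

pdf(NULL)  # draw to a null device so no plot window or file is needed
msg <- tryCatch(
  {
    plot(seq_along(est), est, ylim = range(ci_low, ci_high))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
dev.off()

print(msg)
```

If this is what happens inside coefplot(), checking `all(is.finite(se(test)))` before plotting would be a quick way to confirm it.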

↧

Can we retrieve entries from a tibble using index matrix?

July 5, 2024, 3:07 am

I have a matrix where the row and column indices are stored that I want to retrieve from a dataset. With a data.frame this works fine:

set.seed(1)
df <- data.frame(a= letters[1:10], b= LETTERS[1:10])
sm <- matrix(c(sample(1:10, 3, replace= TRUE), sample(1:2, 3, replace= TRUE)), ncol= 2)
df[sm]
[1] "i" "D" "g"

But using the same indices matrix to get entries from a tibble fails:

df_tibble <- tibble::as_tibble(df)
df_tibble[sm]

This returns the error:

Error in df_tibble[sm]:
! Subscript sm is a matrix, it must be of type logical.
Run rlang::last_trace() to see where the error occurred.

How can I use sm to get the entries df[sm] from df_tibble? All I have come up with so far is as.data.frame(df_tibble)[sm]. Does this mean we cannot subset a tibble with an index matrix and have to convert it to a data.frame first?
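
One workaround that avoids converting the whole tibble, sketched under the assumption that the selected values can all be returned as character: look up each (row, column) pair with `[[`, which tibbles and data frames both support. The example uses a plain data.frame so it runs without extra packages; `pick_cells` is a hypothetical helper name.

```r
set.seed(1)
df <- data.frame(a = letters[1:10], b = LETTERS[1:10])
sm <- matrix(c(sample(1:10, 3, replace = TRUE),
               sample(1:2, 3, replace = TRUE)), ncol = 2)

# Hypothetical helper: one value per row of the index matrix.
# idx[k, 1] is the row, idx[k, 2] the column; `[[` also works on tibbles.
pick_cells <- function(data, idx) {
  vapply(seq_len(nrow(idx)),
         function(k) as.character(data[[idx[k, 2]]][idx[k, 1]]),
         character(1))
}

cells <- pick_cells(df, sm)
print(cells)
```

Calling `pick_cells(df_tibble, sm)` should give the same values, since only column extraction and vector indexing are involved.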

↧

Merge rows of data frame that satisfy condition (belong to a group) by column (id)

July 5, 2024, 3:12 am

I've got a 300x5 data frame that looks like this (3 groups, 100 rows per group, ids go from 01 to 100 per group, same id means same feat2 and feat3 but always different num_to_sum):

factor/group	id	num_to_sum	feat2	feat3
group1	01	4	...	...
group1	02	9	...	...
group2	01	3	...	...
group2	02	1	...	...
group3	01	4	...	...
group3	02	8	...	...

Now, I'd like to merge all rows by id but only if they belong to a certain group. For example: I want to merge group1 and group2 by id and apply the sum operator to num_to_sum, so the final data frame (output) shall look like this:

factor/group	id	summed_num	feat2	feat3
group12	01	(4+3=)7	...	...
group12	02	(9+1=)10	...	...
group3	01	4	...	...
group3	02	8	...	...

I don't mind about the other features (feat2 and feat3); they are actually the same for each id regardless of the group, so I could drop them, I just don't want them duplicated in the final df as that would mean duplicated data and that's messy.

It'd be great if there is a solution that can be applied to not just two groups, but to n groups that shall be merged.

I've looked at stats::aggregate, but I can't figure out how to aggregate with a condition.
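
One way to get a "conditional" aggregate, sketched with base R: relabel the groups that should be merged first, then run an ordinary grouped sum. The toy data mirrors the table above without feat2 and feat3; the names `to_merge` and `new_label` are assumptions for illustration.

```r
# Toy version of the table above (feat2/feat3 omitted).
df <- data.frame(
  group      = rep(c("group1", "group2", "group3"), each = 2),
  id         = rep(c("01", "02"), times = 3),
  num_to_sum = c(4, 9, 3, 1, 4, 8)
)

# Any number of groups can be listed here; they are all relabelled first.
to_merge  <- c("group1", "group2")
new_label <- "group12"
df$group[df$group %in% to_merge] <- new_label

# After relabelling, an ordinary grouped sum performs the merge.
out <- aggregate(num_to_sum ~ group + id, data = df, FUN = sum)
out <- out[order(out$group, out$id), ]
names(out)[names(out) == "num_to_sum"] <- "summed_num"
rownames(out) <- NULL
print(out)
```

Because feat2 and feat3 are constant per id, they could be added to the right-hand side of the formula to carry them through without duplication.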

↧

Collapse RMarkdown code but display chunk label/description?

July 5, 2024, 3:18 am

I'm using RMarkdown's code-folding to hide code in an HTML document with this in the YAML header:

---
output:
  html_document:
    code_folding: hide
---

But in the rendered document it's impossible to know what each chunk is doing without unfolding it, or putting a text description before/after it.

Is there a way to change the appearance of a folded chunk so it displays the chunk label (or a short description) of each chunk next to the "Show" button, preferably in a way that distinguishes it from regular markdown text? Ideally I'm hoping to have a chunk like this:

#| fold-label: "Import and cleanup data"
#| warning: FALSE
library(tidyverse)
mydata <- read.csv("path/mydata.csv", header = TRUE)
mydata <- mydata %>%
  mutate(cleaned = TRUE)

Look like this when folded (but keeping the Show button):

# Import and cleanup data

↧

How to customize Hasse diagram in R?

July 5, 2024, 4:01 am

I am working on a partial order Hasse plot for a research paper and would like to customize the output of the hasse() function from the hasseDiagram package, but I am not sure how. I have taken a look under the hood, but it is unclear where I might be able to set parameters to control:

Label Colors
Arrow Sizes and Shapes

Any ideas?

Example:

library("hasseDiagram")
test_data <- generateRandomData(20, 3, 0.5)
hasse(test_data)

↧

Generate paragraphs inside a loop in Quarto

July 5, 2024, 4:17 am

I want to generate a Word document (docx) with R / Quarto in which, inside an item, I have sub-items created by looping. I'm using R version 4.2.2, RStudio 2022.12.0 Build 353, Quarto 1.2.280 and Ubuntu 20.04.

For example, after an introductory text giving an overview of a topic, I want to produce sub-items with the details of each item.

Without a loop the code would be:

---
title: "Data by County"
format:
  docx:
    number-sections: true
---

```{r}
#| echo=FALSE,
#| include=FALSE
dat.county <- data.frame(
  county = LETTERS[1:5],
  pop_num = round(runif(5,100,500)),
  gdp = runif(5,1000,5000))
```

# Identifying county characteristics

A total of `r nrow(dat.county)` counties, with a total population of `r sum(dat.county$pop_num)` thousand people were characterized as follows:

## County `r dat.county[1,1]`

County `r dat.county[1,1]` has a population of `r dat.county[1,2]` thousand people with a real gross domestic product of `r dat.county[1,3]`.

## County `r dat.county[2,1]`

County `r dat.county[2,1]` has a population of `r dat.county[2,2]` thousand people with a real gross domestic product of `r dat.county[2,3]`.

and so on.

I tried to insert a loop like the one below, but it didn't work: "##" is not recognized as a header. I also had problems with line breaks and paragraphs. Finally, the code using cat is not as elegant as the text above.

```{r}
#| echo=FALSE
for (i in 1:nrow(dat.county)) {
  cat("## County",dat.county[i,1],"\n")
  cat("County ",dat.county[i,1]," has a population of ",dat.county[i,2]," thousand people with a real gross domestic product of ",dat.county[i,3],"\n")
}
```

My question is, how can I generate something like

## County `r dat.county[i,1]`

County `r dat.county[i,1]` has a population of `r dat.county[i,2]` thousand people with a real gross domestic product of `r dat.county[i,3]`

inside a loop?
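
A sketch of one common approach, assuming the chunk is given the knitr option `results: asis` (Quarto also accepts `output: asis`) so that printed text is treated as markdown, and that every header and paragraph is followed by a blank line. `county_section` is a hypothetical helper, and the numbers are fixed here instead of random so the result is reproducible.

```r
# Meant to sit in a chunk with the options:
#   #| echo: false
#   #| results: asis
dat.county <- data.frame(
  county  = LETTERS[1:3],
  pop_num = c(120, 340, 455),
  gdp     = c(1500.5, 2750.25, 4100)
)

# Hypothetical helper: markdown for one county. The "\n\n" after the header
# and after the paragraph makes each one a separate markdown block.
county_section <- function(row) {
  sprintf(
    "## County %s\n\nCounty %s has a population of %s thousand people with a real gross domestic product of %s.\n\n",
    row$county, row$county, row$pop_num, row$gdp
  )
}

sections <- vapply(seq_len(nrow(dat.county)),
                   function(i) county_section(dat.county[i, ]),
                   character(1))
cat(sections, sep = "")
```

Building the text with sprintf() keeps the sentence readable as one template, which addresses the concern that a chain of cat() arguments is less elegant than the inline version.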

↧

Understanding lmer-output in time-lagged segment analysis

July 5, 2024, 4:19 am

When running time-lagged segment analyses in R, the outcome variable is significantly predicted by the predictor variable at t-1 but not by the predictor variable at t, which does not seem logical to me. The predictor variable at t-1 is only added to the model so that it is controlled for.

In more detail:

I am currently analysing data from an ecological momentary assessment (EMA) study, which delivers a huge number of data entries for each subject. In the study we had two conditions (variable StudyPhase). I'm interested in the temporal dynamics of changes in all my dependent variables and am therefore running so-called time-lagged segment analyses in R using the following code:

Outcome_t+1 ~ Outcome_t + Predictor_t * StudyPhase + Predictor_t-1 + (1 | VP_ID)

This way I control for the preceding data entry of the outcome and of the predictor variable, while still investigating possible time-lagged influences of the predictor on the outcome variable.

However, in every model Predictor_t-1 is highly significant, whereas Predictor_t often doesn't reach significance. That confuses me, as it does not seem logical that the predictor significantly predicts the outcome over two assessment points but mostly not over one. Or is this significance due to the fact that I have both Predictor_t and Predictor_t-1 as variables in my model? In that case I could probably "ignore" the Predictor_t-1 variable in the interpretation.

Thanks in advance for your help! :)
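
Since names such as `Outcome_t+1` or `Predictor_t-1` would be parsed by R as arithmetic inside a formula, the lagged and lead variables presumably exist as separate columns. A minimal sketch of building them per subject in base R follows; the column names `Outcome_next` and `Predictor_prev` and the toy data are assumptions for illustration, and rows are assumed to be sorted by time within each `VP_ID`.

```r
ema <- data.frame(
  VP_ID     = c(1, 1, 1, 2, 2),
  Outcome   = c(5, 6, 7, 2, 3),
  Predictor = c(1, 2, 3, 10, 20)
)

# Shift a vector within one subject; the edges become NA.
lag1  <- function(v) c(NA, head(v, -1))
lead1 <- function(v) c(tail(v, -1), NA)

ema$Predictor_prev <- ave(ema$Predictor, ema$VP_ID, FUN = lag1)   # Predictor at t-1
ema$Outcome_next   <- ave(ema$Outcome,   ema$VP_ID, FUN = lead1)  # Outcome at t+1
print(ema)

# With lme4 installed and a StudyPhase column present, the model from the
# question could then be written with syntactically valid names:
# lmer(Outcome_next ~ Outcome + Predictor * StudyPhase + Predictor_prev + (1 | VP_ID),
#      data = ema)
```

Shifting within `VP_ID` keeps one subject's last entry from being used as the lag of the next subject's first entry.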

↧

Percentage Change over multiple columns in R

July 5, 2024, 4:21 am

I have an R dataframe where I'm trying to calculate % change across a number of columns, yet I can't seem to work out the correct syntax for it.

Basically I'm trying to calculate % change from a base date for column index 2:5 (column 1 is the Date).

What I've thought would work would be something like:

df <- df %>% mutate(across(2:5, ~ -first(2:5)/2:5))*100

Or something like that, but apparently that's not correct. Does anyone have any advice?

Many thanks
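
A sketch of one way to express this, assuming "base date" means the first row and that percent change is (value / base - 1) * 100. Inside `across()` the lambda receives each column as `.x`, so the base would be `first(.x)`, not `first(2:5)`. The toy data frame is invented for illustration, and the runnable part uses base R so it does not depend on dplyr.

```r
df <- data.frame(
  Date = as.Date("2024-01-01") + 0:2,
  a = c(100, 110, 121),
  b = c(50, 25, 75),
  c = c(10, 10, 20),
  d = c(200, 100, 400)
)

# Percent change of each value relative to the first row of its column.
pct_from_base <- function(x) (x / x[1] - 1) * 100

out <- df
out[2:5] <- lapply(df[2:5], pct_from_base)
print(out)

# dplyr equivalent (assuming dplyr is loaded):
# df %>% mutate(across(2:5, ~ (.x / first(.x) - 1) * 100))
```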

↧

ggsurvplot - risk table fontsize

July 5, 2024, 4:29 am

How to adjust the fontsize of the risk.table?
I want the fontsize of the numbers at risk to be 20.
Not the title of the risk.table.

library(survminer)
library(survival)
data(lung)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
p <- ggsurvplot(
  fit,
  size = 1,
  legend.labs = c("A", "B"),
  ylim = c(0, 1),
  linetype = "strata",
  break.time.by = 365,
  palette = c("#E7B800", "#2E9FDF"),
  risk.table = TRUE,
  risk.table.title = "No. at risk",
  risk.table.height = 0.2,
  tables.theme = theme_cleantable() +
    theme(plot.title = element_text(size = 40))
)
p$plot <- p$plot +
  theme(
    text = element_text(size = 20),
    axis.text.y = element_text(size = 20),
    axis.text.x = element_text(size = 20),
    axis.text = element_text(size = 20),
    legend.text = element_text(size = 20)
    )

I am aware of the risk.table.fontsize = argument but when I specify 20 it looks like this:

Using risk.table.fontsize = 7 looks somewhat the same. But this is an odd solution via eyeballing.
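
A possible explanation for why 7 looks like 20, stated as an assumption about survminer internals and not as a documented fact: if `risk.table.fontsize` is passed on to a ggplot2 text layer, it is measured in millimetres, whereas `element_text(size = )` is measured in points. ggplot2 exports the conversion factor as `.pt` (72.27 / 25.4); the sketch recomputes it in base R.

```r
# Points per millimetre, the same value ggplot2 exposes as `.pt`.
pt_per_mm <- 72.27 / 25.4

# Text-layer size (in mm) that corresponds to a 20 pt theme font.
target_pt   <- 20
fontsize_mm <- target_pt / pt_per_mm
print(round(fontsize_mm, 2))  # about 7.03
```

Under that assumption, `risk.table.fontsize = 20 / ggplot2::.pt` would replace the eyeballed value of 7.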

↧

Customize names of columns created by dcast.data.table

July 5, 2024, 4:43 am

I am new to reshape2 and data.table and trying to learn the syntax.

I have a data.table that I want to cast from multiple rows per grouping variable(s) to one row per grouping variable(s). For simplicity, let's make it a table of customers, some of whom share addresses.

library(data.table)

# Input table:
cust <- data.table(name=c("Betty","Joe","Frank","Wendy","Sally"),
                   address=c(rep("123 Sunny Rd",2),
                             rep("456 Cloudy Ln",2),"789 Windy Dr"))

I want the output to have the following format:

# Desired output looks like this:
(out <- data.table(address=c("123 Sunny Rd","456 Cloudy Ln","789 Windy Dr"),
                   cust_1=c("Betty","Frank","Sally"),
                   cust_2=c("Joe","Wendy",NA)) )
#          address cust_1 cust_2
# 1:  123 Sunny Rd  Betty    Joe
# 2: 456 Cloudy Ln  Frank  Wendy
# 3:  789 Windy Dr  Sally     NA

I would like columns for cust_1...cust_n where n is the max customers per address. I don't really care about the order--whether Joe is cust_1 and Betty is cust_2 or vice versa.

↧

How can I align a logo in the navbar header of an R shiny app created using bslib?

July 5, 2024, 5:01 am

I am trying to add my company's logo to an R shiny app made using {bslib}. I tried different ways of adding this image to the 'title' argument of 'page_navbar'. While the image gets added, it looks wonky and changes the position of the other items in the header ribbon. An example image and the logo are attached.

Here is a demo code that illustrates the problem:

library(shiny)
library(bslib)

ui <- page_navbar(
  title = div("My app",
              img(src = "WCTMainLogoWhite_edited.png", height = "57.5px", width = "auto",
                  style = "position: absolute;
                           top: 1px;
                           right: 2%;")),
  theme = bs_theme(version = 5, bootswatch = "zephyr")|> ##setting the primary color of "zephyr" bootswatch theme manually
    bslib::bs_add_rules(
      rules = "
                    .navbar.navbar-default {
                        background-color: $primary !important;
                    }"
    ),
  nav_panel(title = "Trends",
            layout_columns(
              card(
                full_screen = TRUE,
                card_header("Card 1")
                )),
              layout_columns(
                card(
                  full_screen = TRUE,
                  card_header("Card 2")),
                card(
                  full_screen = TRUE,
                  card_header("Card 3")),
                col_widths = c(12, 12)
               )
            ),
  nav_panel(title = "Instructions on use", p("Content to be added"))
)

server <- function(input, output, session) {}

shinyApp(ui, server)

Is there a better way to add the image, that will align with the other items in the header?

↧

sourceCpp error: G__~1.EXE: error: unrecognized command line option '-std=gnu++17'

July 5, 2024, 5:09 am

I switched from R-3.6.3 to R-4.3.1 and R-4.4.1.

I tried to sourceCpp() some *.cpp which worked fine for R-3.6.3, but not for R-4.3.1 and R-4.4.1. The error is copied below.

Rcpp::evalCpp("2 + 2") is also not working for the newer R versions and gives the same error.

G__~1.EXE: error: unrecognized command line option '-std=gnu++17'
using C++ compiler: 'G__~1.EXE (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3'
g++ -std=gnu++17
-I"C:/Users/xxxx/AppData/Local/Programs/R/R-44~1.1/include" -DNDEBUG
-I"C:/Users/xxx/AppData/Local/Programs/R/R-4.4.1/library/Rcpp/include"
-I"C:/Users/xxx/AppData/Local/Temp/13/RtmpeWdnmC/sourceCpp-x86_64-w64-mingw32-1.0.12"
-I"c:/rtools44/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c filead48216f63d.cpp -o filead48216f63d.o
g++.exe: error: unrecognized command line option '-std=gnu++17'
make: *** [C:/Users/xxx/AppData/Local/Programs/R/R-44~1.1/etc/x64/Makeconf:296: filead48216f63d.o] Error 1
Error in sourceCpp(code = code, env = env, rebuild = rebuild, cacheDir = cacheDir, : Error 1 occurred building shared library

My R version is 4.4.1:

other attached packages:
[1] Rcpp_1.0.12

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5 zip_2.3.1 cli_3.6.3 rlang_1.1.4 rematch2_2.1.2 stringi_1.8.4 forcats_1.0.0 generics_0.1.3 glue_1.7.0
[10] colorspace_2.1-0 scales_1.3.0 fansi_1.0.6 RcppProgress_0.4.2 munsell_0.5.1 tibble_3.2.1 openxlsx_4.2.5.2 lifecycle_1.0.4 compiler_4.4.1
[19] dplyr_1.1.4 pkgconfig_2.0.3 RcppEigen_0.3.4.0.0 rstudioapi_0.16.0 paletteer_1.6.0 R6_2.5.1 tidyselect_1.2.1 utf8_1.2.4 pillar_1.9.0
[28] parallel_4.4.1 magrittr_2.0.3 tools_4.4.1 RcppArmadillo_0.12.8.4.0

Does anyone have a clue?

Thanks
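
The log above reports compiler version 4.9.3 even though one include path points at c:/rtools44, which suggests (as a guess, not a confirmed diagnosis) that an older toolchain comes first on the PATH. A few base R calls can show which tools R resolves; nothing here changes any setting, and the environment variable names printed at the end are assumptions about what may be relevant on this machine.

```r
# Which executables does R find first on the PATH?
# An empty string means "not found".
tools_found <- Sys.which(c("g++", "make"))
print(tools_found)

# PATH entries in search order; an old Rtools directory listed before the
# Rtools44 one would be consistent with the 4.9.3 compiler in the log.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
print(path_entries)

# Toolchain-related variables, if set (assumed names; empty when unset).
print(Sys.getenv(c("BINPREF", "RTOOLS44_HOME")))
```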

↧

LU decomposition on a sparse rectangular matrix in R

July 5, 2024, 5:20 am

MATLAB is able to perform LU decomposition on a sparse rectangular matrix using [L, U, P, Q] = lu(A), but there is no R package to do so yet. Trying to use Matrix::lu() on a sparse matrix in R returns an error complaining that the matrix must be square.

Is there any analogue in R of MATLAB's LU decomposition on a rectangular matrix that is not full rank? To clarify, this is not about memory issues: if I embed it in an identity matrix (see this question), then Matrix::lu() complains about the matrix being singular, while MATLAB's lu() happily proceeds.

The problem I'm trying to solve is to implement a particular econometric estimator in R that needs the Moore-Penrose pseudoinverse. pracma's Moore-Penrose pseudoinverse fails because the matrix is too large. Getting a Moore-Penrose pseudoinverse of a large sparse matrix is apparently also unsolved, see here. My matrix has more than one 1 in any given row and column.

↧

Delete a single value in dataframe based on particular date and time

July 5, 2024, 5:22 am

I have a dataframe and I want to delete a single value based on the date and time. Specifically, I want to delete the value for Variable2 at 2020-06-15 14:00:00, either replacing the value 780.45 with NA or leaving it blank. I can find answers where the whole row is deleted based on the datetime, but I am struggling to find one where only a single value is deleted.

df2 = structure(list(DateTime = structure(c(1592226000, 1592226900, 1592227800, 1592228700, 1592229600, 1592230500), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Variable1 = c(NA, 0.385999999999999, 0.193, 0.290000000000001, 0.385, 0.576000000000001), Variable2 = c(NA, 1005.87, 999.05, 1005.32, 780.45, 1100.44)), row.names = c(NA, 6L), class = "data.frame")

Created on 2024-07-05 with reprex v2.1.0
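
A minimal sketch of one way to blank out a single cell, assuming the timestamps are in UTC as in the dput above: build the target time in the same time zone, then assign NA through a logical index on that one column. The Variable1 values are abbreviated here for readability.

```r
df2 <- data.frame(
  DateTime  = as.POSIXct(c(1592226000, 1592226900, 1592227800,
                           1592228700, 1592229600, 1592230500),
                         origin = "1970-01-01", tz = "UTC"),
  Variable1 = c(NA, 0.386, 0.193, 0.290, 0.385, 0.576),
  Variable2 = c(NA, 1005.87, 999.05, 1005.32, 780.45, 1100.44)
)

# Build the target in the same time zone as the column, then overwrite one cell.
target <- as.POSIXct("2020-06-15 14:00:00", tz = "UTC")
df2$Variable2[df2$DateTime == target] <- NA
print(df2)
```

The row itself stays in place; only Variable2 at that timestamp becomes NA.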

↧

How do I create a binary dataset in R with several constraints on the proportion of 1's in R

July 5, 2024, 5:25 am

I'm looking at getting a synthetic dataset of eligibility for two treatments, A and B, with constraints. The constraints are 60% 1's for A, 80% 1's for B, 25% 1's for both, and 0% 0's for both. I've not really made any progress as I've only got the following for the columns individually, nothing yet for the joint constraints.

Any help would be really appreciated

# Define the size of the dataset
n <- 1000  # Number of observations

# Generate treatment A
treatment_A <- rep(0, n)
treatment_A[1:(0.6*n)] <- 1

# Generate treatment B
treatment_B <- rep(0, n)
treatment_B[1:(0.8*n)] <- 1

Example of desired output with n=10

dat <- data.frame(A=c(1,1,1,1,1,1,0,0,0,0),
                  B=c(0,0,0,1,1,1,1,1,1,1))

Which has 60% A, 70% B, 30% A and B, and 0% not A and B. Not exactly right, but close.
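
A quick arithmetic check suggests the four targets cannot all hold at once: if nobody has 0 for both, then share(A) + share(B) - share(both) must equal 1, so 60% and 80% force an overlap of 40%, not 25% (the n=10 example satisfies the same identity: 0.6 + 0.7 - 0.3 = 1). The sketch below keeps the 60%, 80% and 0% targets and lets the overlap follow; which constraint to relax is an assumption made here, and `make_eligibility` is a hypothetical helper.

```r
make_eligibility <- function(n, p_a = 0.6, p_b = 0.8) {
  n_a    <- round(p_a * n)
  n_b    <- round(p_b * n)
  n_both <- n_a + n_b - n          # overlap forced by "nobody is 0 on both"
  stopifnot(n_both >= 0)
  counts <- c(n_both, n_a - n_both, n_b - n_both)  # both, A only, B only
  dat <- data.frame(
    A = rep(c(1, 1, 0), times = counts),
    B = rep(c(1, 0, 1), times = counts)
  )
  dat[sample(n), , drop = FALSE]   # shuffle the rows
}

set.seed(42)
dat <- make_eligibility(1000)
print(colMeans(dat))                   # shares of 1's in A and B
print(mean(dat$A == 1 & dat$B == 1))   # share eligible for both
print(mean(dat$A == 0 & dat$B == 0))   # share eligible for neither
```

If the 25% overlap matters more than the 0% target, the same construction works with a fourth block of rows that are 0 on both, at the cost of changing the other shares.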

↧

Creating geom_violin plot with pre-created density values

July 5, 2024, 5:32 am

I have created some weighted kernel density estimates across different factor levels, which I don't think can be incorporated within geom_violin's own density estimates. I was wondering if there's a way geom_violin or other ggplot2 functions could use pre-computed density values to create violin plots instead of the built-in density calculations? Any help would be much appreciated. Some example code follows, where I would want to create a violin plot based on the variation in y values across the spread of x values...

library(dplyr)  # needed for %>%, filter() and mutate()

###Create data
Surge_vs_Plummet_Stats_df <- data.frame(
  Type = rep(c("Surge", "Plummet"), each = 50),
  site_no = sample(1:10, 100, replace = TRUE),
  Mean = c(rnorm(50, mean = 5, sd = 2), rnorm(50, mean = 3, sd = 1)))

###Calculate weights
station_counts <- table(Surge_vs_Plummet_Stats_df$site_no)
Surge_vs_Plummet_Stats_df$Weights <- 1 / station_counts[Surge_vs_Plummet_Stats_df$site_no]
Surge_vs_Plummet_Stats_df$Weights <- Surge_vs_Plummet_Stats_df$Weights / sum(Surge_vs_Plummet_Stats_df$Weights) ##Normalize (sum to 1)

###Identify bandwidth
bw <- bw.nrd(Surge_vs_Plummet_Stats_df$Mean) ##Not 100% sure its doing much

###Now separate the dfs run KDEs
Surge_vs_Plummet_Stats_Surge <- Surge_vs_Plummet_Stats_df %>%
  filter(Type == "Surge") %>%
  mutate(Weights = Weights / sum(Weights))
Surge_kde <- density(Surge_vs_Plummet_Stats_Surge$Mean, weights = Surge_vs_Plummet_Stats_Surge$Weights, bw = bw,
                     from=min(Surge_vs_Plummet_Stats_df$Mean), to=max(Surge_vs_Plummet_Stats_df$Mean)) ##Delib the full df and not just surges#

Surge_vs_Plummet_Stats_Plummet <- Surge_vs_Plummet_Stats_df %>%
  filter(Type == "Plummet") %>%
  mutate(Weights = Weights / sum(Weights))
Plummet_kde <- density(Surge_vs_Plummet_Stats_Plummet$Mean, weights = Surge_vs_Plummet_Stats_Plummet$Weights, bw = bw,
                       from=min(Surge_vs_Plummet_Stats_df$Mean), to=max(Surge_vs_Plummet_Stats_df$Mean)) ##Delib the full df and not just surges##

Mean_Kernel_df <- data.frame(x = c(Surge_kde$x,Plummet_kde$x),
                             y = c(Surge_kde$y,Plummet_kde$y),
                             Type = c(rep("Surge",times=length(Surge_kde$x)),
                                      rep("Plummet",times=length(Plummet_kde$x))))

↧

Adding a geom_sf to a long-lat plot with gratia package in R

July 5, 2024, 5:40 am

Using the draw function of gratia with a model that contains a smooth s(longitude, latitude) produces a long-lat plot with effect contours. That's very nice!

I want to add a country shape to the plot

library(giscoR)
vatican <- gisco_get_countries(resolution = "10", country = "VAT") %>%
  mutate(res = "10M")

Plotting the shape with ggplot works

ggplot() +
  geom_sf(data=vatican)

but with gratia not so much.

draw(model,
     select="s(longitude,latitude)") +
  geom_sf(data=vatican)

I get the error message

Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Error in `geom_sf()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 5th layer.
Caused by error in `.data[["longitude"]]`:
! Column `longitude` not found in `.data`.

I'd appreciate any help on how to solve this!

↧

ggsurvplot - adjust thickness of strata indicators

July 5, 2024, 5:50 am

Is it possible to adjust the thickness of the strata indicators?
The indicators in the legend are not as thick as in the risk.table.

library(survminer)
library(survival)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
p <- ggsurvplot(
  fit,
  size = 1,
  legend.labs = c("A", "B"),
  linetype = "solid",
  break.time.by = 365,
  palette = c("#E7B800", "#2E9FDF"),
  risk.table = TRUE,
  censor = FALSE,
  risk.table.title = "No. at risk",
  tables.y.text = FALSE,
  legend.title = "",
  tables.theme = theme_cleantable() +
    theme(plot.title = element_text(size = 20))
)
p

The red arrow indicates the discrepancies in line thickness.

I tried to adjust but I am not aware how to find out which argument is related to the indicators. If there is one at all.

  tables.theme = theme_cleantable() +
    theme(plot.title = element_text(size = 20),
          strata.line.x = element_line(size = 5.5))

↧

Using the

July 5, 2024, 6:01 am

I am using mclapply to run several regressions. Each child process from mclapply takes a dataset from the global environment, subsets that data, and then performs the regression on that subset of data.

I am using glmer(), part of the lme4 package, and I am also using a function called allFit(), which lets me try several different optimizers to achieve convergence of the model.

My data is subset within the function and then, in the same function, I run the regression analysis with allFit().

The problem is that allFit() doesn't work well with mclapply. For some reason, even if I explicitly pass my data to glmer(), allFit() and glmer() can't locate the subsetted data. The only way I can get allFit() to work is to use the <<- operator to assign my data to the global environment. However, my worry is that allFit() could then pass the wrong set of data to the model, since <<- assigns the data to the global environment.

I know that the environment for each child process of mclapply is cloned into a new environment, but I'm not sure whether something assigned via the <<- operator stays in the child environment or is sent back to the parent environment. Does anyone know about this?

Thank you.

Here's an example of how my code looks:

run_regressions <- function(species, phenotype, data){

  # subset data
  data_to_model <-
        data %>%
        select(all_of(species), all_of(phenotype), Timepoint_categorical, dna_conc, NEXT_ID, BATCH_NUMBER, clean_reads_FQ_1) %>%
        filter(!if_any(everything(), is.na))

  # assign data to global environment
  data_to_model <<- data_to_model

  # Run Model 1
  m1 <- list()
  diff_optims <- lme4::allFit(glmer(as.formula(paste(species, "~", phenotype, "+ Timepoint_categorical + dna_conc + clean_reads_FQ_1 + BATCH_NUMBER + (1|NEXT_ID)")), # try all optimizers
                              data = data_to_model,
                              family = binomial,
                              control=glmerControl(optCtrl=list(maxfun=10000))),
                              verbose=TRUE)
  diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")] # is(), a function to see if an object inherits from a specific class, in this case "merMod".
  lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
  convergence_results <- lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
  working_indices <- sapply(convergence_results, is.null)
  if(sum(working_indices) == 0){
    print("No algorithms from allFit converged. 
You may still be able to use the results, but proceed with extreme caution.")
    m1$model <- NULL
    m1$success <- FALSE
  } else {
    m1$model <- diff_optims[working_indices][[1]] # take the first fit
    m1$success <- TRUE
  }

  # Run Model 2
  m2 <- list()
  diff_optims <- lme4::allFit(glmer(as.formula(paste(species, "~ + Timepoint_categorical + dna_conc + clean_reads_FQ_1 + BATCH_NUMBER + (1|NEXT_ID)")),
                              data = data_to_model,
                              family = binomial,
                              control=glmerControl(optCtrl=list(maxfun=10000))),
                              verbose=TRUE)
  diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")] # is(), a function to see if an object inherits from a specific class, in this case "merMod".
  convergence_results <- lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
  working_indices <- sapply(convergence_results, is.null)
  if(sum(working_indices) == 0){
    print("No algorithms from allFit converged. 
You may still be able to use the results, but proceed with extreme caution.")
    m2$model <- NULL
    m2$success <- FALSE
  } else {
    m2$model <- diff_optims[working_indices][[1]] # take the first fit
    m2$success <- TRUE
  }

  model_results = list(m1=m1, m2=m2, model_comparison=model_comparison, model_comparison_tidied=model_comparison_tidied)
}

# create combinations of species and phenotypes to feed to mclapply
combinations <- expand.grid(species = species_of_interest, phenotype = phenotypes_to_test)

# Initialize a list to store results
results_list <- list()

# execute in parallel with mclapply
results_list <-
  mclapply(seq_len(nrow(combinations)), function(..i) {
    species <- combinations[..i,"species"]
    phenotype <- combinations[..i,"phenotype"]
    tryCatch({
      run_log_mod(species = species, phenotype = phenotype, data = data)
    }, error = function(e) {
      message(paste("Error in iteration", ..i, ":", e$message))
      return(NULL)  # or any other default value you want to return in case of an error
    })
  }, mc.cores = 6)
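
On the `<<-` question: `mclapply()` forks the R process on Unix-alikes, and each child works on its own copy of the workspace, so a global assignment made in a child is not written back to the parent. Below is a small sketch to check this. It assumes forking is actually used: with `mc.cores = 1` (the only option on Windows) the function runs in the parent session and `<<-` would modify it.

```r
library(parallel)

shared <- "parent value"

# Fork only where supported; Windows falls back to a single core.
cores <- if (.Platform$OS.type == "unix") 2L else 1L

res <- mclapply(1:2, function(i) {
  shared <<- paste("child", i)  # global assignment inside the worker
  shared                        # each worker sees its own value
}, mc.cores = cores)

print(unlist(res))
print(shared)  # still "parent value" when the workers were forked
```

Forked children also do not see each other's assignments, so in that setting one worker should not pick up another worker's `data_to_model`.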

↧

Why is my data from SQLite importing incorrectly into R?

July 5, 2024, 6:07 am

The data I am having trouble importing from sqlite are in four columns:

PeakAwy_L
PeakAwy_R
PeakTwd_L
PeakTwd_R

I have the following R code to pull those columns into R:

data_raw <- query_tbl(table = "isok") %>%
  select(PeakAwy_L,PeakAwy_R,PeakTwd_L,PeakTwd_R)

However when I view the first few rows in R, the data are skewed 1 column down from the column they are supposed to be in. For example, the data that should appear in PeakAwy_L is in PeakAwy_R. This trend continues ultimately excluding the first column of data I want entirely and assigning the data from the next column, (that is not even mentioned in the R code), in sqlite to PeakTwd_R.

Does anyone know why this is happening?

I have ensured that the columns are named the same between R and in the database structure for columns in sqlite.

I have also ensured that the data types for all four columns are the same, NUMERIC, in sqlite.

↧