Channel: Active questions tagged r - Stack Overflow

How to replicate a vector in R


I need to replicate a vector in such a way that the random numbers change with each repetition; currently I only replicate the same numbers.

Example:

> rep(c(sample(c(1:100),5, replace = T),sample(NA ,5, replace = T)), 2)
 [1] 33 91 48 18 29 NA NA NA NA NA 33 91 48 18 29 NA NA NA NA NA

I would like 
[1] 33 91 48 18 29 NA NA NA NA NA 23 45 27 67 55 NA NA NA NA NA
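A minimal sketch of one possible approach: wrapping the draw in replicate() re-runs sample() for each repetition instead of copying a single fixed draw (rep(NA, 5) stands in for the sample(NA, ...) call; set.seed() is only there to make the sketch reproducible).

# Re-run the random draw for each repetition instead of repeating one draw
set.seed(1)  # reproducibility only; drop this for fresh numbers each run
unlist(replicate(2, c(sample(1:100, 5, replace = TRUE), rep(NA, 5)),
                 simplify = FALSE))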

Updating values of three dot ellipsis in R


I have a function foo() that I'd like to be able to call in two different "modes": once or within a while loop.

I thought using a somewhat generic wrapper (runtime_gateway()) and passing arguments via ... to foo() would make sense here so I could leverage the same "runtime gateway logic" for arbitrary functions with different sets of arguments.

If I run foo() within a while loop, I'd like to update some of its arguments while keeping the default or passed values of the other arguments.

How would I do that?

I'm aware of rlang::dots_list(...) and friends and had a quick glance at https://github.com/r-lib/ellipsis. It seems that any of those would only let me pluck values from or inspect the ellipsis content, but I don't see how I could "update it in transit".

Reprex

foo <- function(
  id = "id_a",
  at = Sys.time()
) {
  message(stringr::str_glue("{id}: {at}"))
  Sys.sleep(1)
}

runtime_gateway <- function(
  fun = foo,
  run_mode = c("once", "while"),
  ... # Args to be passed to `fun`
) {
  run_mode <- match.arg(run_mode)

  if (run_mode == "once") {
    fun(...)
  } else if (run_mode == "while") {
    counter <- 0

    while(counter < 3) {
      # Goal: keep ellipsis value for `id` but *update* value for `at`
      dots <- rlang::dots_list(...)
      at <- if ("at" %in% names(dots)) {
        message("`at` was passed via ellipsis:")
        message(dots$at)
        dots$at
      } else {
        Sys.time()
      }

      fun(at = at + 60, ...)
      counter <- counter + 1
    }
  }
}

runtime_gateway()
#> id_a: 2020-02-21 14:09:16.779
runtime_gateway(at = lubridate::ymd_hms("2020-02-21 10:30:00"))
#> id_a: 2020-02-21 10:30:00

runtime_gateway(run_mode = "while")
#> id_a: 2020-02-21 14:10:18.897
#> id_a: 2020-02-21 14:10:19.900
#> id_a: 2020-02-21 14:10:20.902
runtime_gateway(run_mode = "while", id = "id_b")
#> id_b: 2020-02-21 14:10:21.905
#> id_b: 2020-02-21 14:10:22.906
#> id_b: 2020-02-21 14:10:23.908
runtime_gateway(run_mode = "while", at = lubridate::ymd_hms("2020-02-21 10:30:00"))
#> `at` was passed via ellipsis:
#> 2020-02-21 10:30:00
#> Error in fun(at = at + 60, ...): formal argument "at" matched by multiple actual arguments

Created on 2020-02-21 by the reprex package (v0.3.0)
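A hedged sketch of one way around the "matched by multiple actual arguments" error: collect the dots once, overwrite at in the list, and splice the result with do.call() so the argument is supplied exactly once (update_and_call() is a hypothetical helper, not part of the original reprex).

# Hypothetical helper: override `at` inside the captured dots, then
# call `fun` via do.call() so `at` appears exactly once.
update_and_call <- function(fun, ...) {
  dots <- rlang::dots_list(...)
  if (is.null(dots$at)) dots$at <- Sys.time()  # fall back to the default
  dots$at <- dots$at + 60                      # the per-iteration update
  do.call(fun, dots)
}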

Billboarder.R, grouped linechart with more than one variable to plot


I am trying to plot one of my tables. It is plotted over an x-axis with the weeks of the year (1-53) and a y-axis with a percentage (0-100). In this plot I want two lines, one for the variable "Task" and one for the variable "Area". However, since the x-axis only covers one year, I also want a separate line for every year.

My data looks as follows:

head(dt.Ratio()[Week %in% c(52, 53, 1, 2, 3)])
       year Week  Area  Task
    1: 2019   52 63.68 28.39
    2: 2019   53  3.23  0.00
    3: 2020    1 58.58 25.43
    4: 2020    2 61.54 31.75
    5: 2020    3 52.33 27.10

And the plot is done like this:

billboarder() %>%
        bb_linechart(dt.Ratio(), show_point = TRUE, type = "area") %>%
        bb_x_axis(label = list(text = "Week", position = "outer-right"),
                  tick = list(culling = list(max = 1))) %>%
        bb_y_axis(label = list(text = "Ratio of hours clocked as task", position = "outer-right")) %>%
        bb_y_grid(show = TRUE) %>%
        bb_colors_manual(opacity = 0.25)

I tried a lot to work with the mapping variable in bb_linechart but I cannot find the right mapping. I can make it work for either Area or Task, or without grouping by year, but I have not found a solution that includes all four lines (years 2019 and 2020 × variables Task and Area).
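A hedged sketch, assuming billboarder's bb_aes() mapping interface: melt Area/Task into long form and build one combined series id per metric-year pair, so each pair becomes its own line.

library(data.table)
library(billboarder)

# Long format: one row per (year, Week, metric); one series per metric-year
long <- melt(dt.Ratio(), id.vars = c("year", "Week"),
             measure.vars = c("Area", "Task"),
             variable.name = "metric", value.name = "pct")
long[, series := paste(metric, year)]

billboarder() %>%
  bb_linechart(data = long[order(Week)],
               mapping = bb_aes(x = Week, y = pct, group = series),
               show_point = TRUE)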

Creating new column based on repeated consecutive row entries


Imagine a snippet of the following data frame:

       ID        ActivityName     Time         Type    Shape 
1       1             Request    0.000       Type_1      767           
2       1             Request  600.000       Type_1      767           
3       1               Start  600.000       Type_1     1376           
4       1               Start  600.000       Type_1     1376  
5       1               Delay 2962.295       Type_1     1022         
6       1 Schedule Activities  600.000       Type_1       15           
7       1 Schedule Activities 2062.295       Type_1       15  

What I'm trying to do is to create two new columns based on the repeating entries in ActivityName.

Specifically, I want to combine two subsequent rows for the same activity into one row with a start and complete timestamp (from Time, in seconds).

Given that not all entries in ActivityName have a matching second entry (at most two consecutive entries are identical), I would also like to delete such unmatched rows, like Delay in this case.

It should then look something like:

       ID        ActivityName  StartTime   EndTime      Type    Shape 
1       1             Request  0.000       600.000      Type_1  767           
2       1               Start  600.000     600.000      Type_1  1376
3       1 Schedule Activities  600.000     2062.295     Type_1  15

All categories in ActivityName occur many times in that column. I don't want to compare their associated Time values across the whole column, only within pairs of two consecutive identical occurrences.

Any ideas for how to go about this will be highly appreciated.
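A hedged sketch using run-length ids (assuming the data frame is called df and is already ordered): data.table::rleid() groups consecutive identical activity names, runs of length two are collapsed into start/end times, and singleton runs such as Delay are dropped by the filter.

library(dplyr)

df %>%
  group_by(ID, run = data.table::rleid(ActivityName)) %>%
  filter(n() == 2) %>%                 # drop unmatched single rows (e.g. Delay)
  summarise(ActivityName = first(ActivityName),
            StartTime = min(Time), EndTime = max(Time),
            Type = first(Type), Shape = first(Shape),
            .groups = "drop") %>%
  select(-run)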

Assigning values to a data.table when using dot-dot


I want to subtract a vector from multiple columns of my data.table, selected by name. I use an approach with dot-dot prefixing, and I cannot wrap my head around why the last assignment does not work, because both expressions work just fine when evaluated alone. I have attached a reproducible example that should make the issue clear.

dt <- data.table("a_x" = rnorm(10),
           "b_x" = rnorm(10),
           "a_y" = rnorm(10),
           "b_y" = rnorm(10),
           "d" = rnorm(10)
           )

XIND <- names(dt) %like% "_x"

MAT <- matrix(dt[,d], nrow = dim(dt[,..XIND])[1], ncol = dim(dt[,..XIND])[2])
dt[,..XIND] <- dt[,..XIND] - MAT
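A hedged sketch of the idiomatic data.table route: ..XIND subsetting creates a copy, so assigning back into dt[, ..XIND] is not supported. Updating by reference with := and the target column names in parentheses sidesteps the problem, and the d column can be subtracted column-wise without building MAT at all.

cols <- names(dt)[XIND]  # the "_x" column names
dt[, (cols) := lapply(.SD, function(col) col - d), .SDcols = cols]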

Is there a better way in R to split a file with multiple sections


I am reading a CSV file into R that contains multiple sections; the sections have different headers and numbers of rows and columns. An example is reproduced below (z1). I need to separate the sections into different data frames so I can process them based on the section type.


The number of sections can change and so far I have only figured out how to hard code them and use grep to split the different sections into different data frames.

The section headers are all in the same format ==XY==, where X is a letter and Y is a number.

Is there a better way to split the data frame into a different data frame for each section no matter how many sections there are?

z1 <- structure(list(
  V1 = c("==C5===", "H1", "1", "3", "8", "==E5===", "H1", "10", "2",
         "==G6===", "H1", "5", "==H4===", "H1", "1", "==H6===", "H1", "10"),
  V2 = c("", "H2", "9", "8", "1", "", "H2", "4", "2", "", "", "", "",
         "H2", "8", "", "", ""),
  V3 = c("", "H3", "2", "5", "6", "", "", "", "", "",
         "", "", "", "", "", "", "", "")
), class = "data.frame", row.names = c(NA, -18L))
DF1 <- z1[grep("==C5", z1$V1):grep("==E5", z1$V1),]
DF2 <- z1[grep("==E5", z1$V1):grep("==G6", z1$V1),]
DF3 <- z1[grep("==G6", z1$V1):grep("==H4", z1$V1),]
DF4 <- z1[grep("==H4", z1$V1):grep("==H6", z1$V1),]
DF5 <- z1[grep("==H6", z1$V1):nrow(z1),]
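A hedged sketch that avoids the hard-coding: flag the header rows with a regular expression, turn the flags into a running group id with cumsum(), and split() into as many data frames as there are sections.

# One group id per section: increments at every ==XY== header row
grp <- cumsum(grepl("^==[A-Z][0-9]==", z1$V1))
sections <- split(z1, grp)

# Name each piece after its header, then drop the header row itself
names(sections) <- vapply(sections, function(d) d$V1[1], character(1))
sections <- lapply(sections, function(d) d[-1, ])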

How can I adjust the graph generated in r to see my data in a better way


I am new to R, and I am making an RDA plot of some data obtained with the vegan and adespatial packages; the chart it generates is the following.

[Figure: the generated RDA plot]

The graph is correct, but aesthetically it is poor: interpreting it is difficult because the names cannot even be read. What I want is to somehow modify the scale so that the points are more spread out and the labels can be appreciated.

I would like my graphic to look something like this, where you can see at least the initials of the names

[Figure: example of the desired chart]

My code with which I generate the graph is the following

windows(width = 12, height = 10)
par(mar=c(3,3,1,1), mgp=c(2,1,0))
par(cex=0.8, mai=c(0.1,0.1,0.2,0.1))
plot(str.rda, xlab="RDA1 (32.16 %)", ylab="RDA2 (14.46 %)", 
     display=c("cn", "lc", "sp"), type="n", xlim=c(-0.8,0.8), correlation=TRUE)
sites.sc <- scores(str.rda, choices=1:2, scaling=2, display="lc")
points(sites.sc, pch=1, cex=0.5)
text(sites.sc,row.names(sites.sc), cex = 0.6, pos = 4, col = "chocolate1")
va.sc <- scores(str.rda, choices=1:2, scaling=2, display="sp")
sp.names<- c("Americabaetis", "Baetodes", "Camelobaetidius", "Cloeodes", "Nanomis","Varipes","Zelusia","Caenis", "Trichorythodes",  "Lumahyphes","Farrodes","Thraulodes", "Anacroneuria", "Protoptila","Helicopsyche", "Leptonema", "Smicridea", "Alisotrichia", "Celaenotrichia", "Cerasmatrichia", "Hydroptila", "Metrichia", "Neotrichia", "Orthotrichia", "Oxyethira", "Rhyacopsyche", "Chimarra")
text(va.sc, sp.names, cex=0.8, font=3, pos=3, offset=0.1)
env.sc <- scores(str.rda, choices=1:2, scaling=2, display="bp")
arrows(0,0, env.sc[1:3,1], env.sc[1:3,2], lty=1, lwd=1, col="Blue", length=0.1)
env.names <- c("DQO", "DBO", "Turbidez")
text(env.sc[1:3,], env.names, cex=0.9, font=2, pos=2, offset = 0.1)

I was reading and found several alternatives, such as modifying xlim = c(lowest, highest) and ylim = c(lowest, highest); however, the graph keeps coming out the same way, so I don't know how to improve it aesthetically. I appreciate your attention and help.
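A hedged sketch of one approach: derive the limits from the actual score ranges so the cloud fills the plotting region, then let vegan's ordipointlabel() search for non-overlapping label positions (assuming str.rda is the fitted RDA from the question).

library(vegan)

sc <- scores(str.rda, choices = 1:2, scaling = 2, display = "lc")
plot(str.rda, type = "n",
     xlim = range(sc[, 1]) * 1.1,   # pad the limits from the real score range
     ylim = range(sc[, 2]) * 1.1)
ordipointlabel(str.rda, display = "sites", scaling = 2, add = TRUE)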

Parallelize custom function


I've created a loop around a custom function that works fine. The function takes a matrix as input, calculates the correlations between the matrix columns, and then saves the output in the specified folder.

mtx <- ...        # a matrix file
out.folder <- ... # output folder
methods <- c("pearson", "spearman", "kendall")
for (method in methods) {
  drugCorrelations(method, mtx, out.folder)
}

But I've not been able to parallelize it; this is my best try:

cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)

foreach(i=1:3) %dopar% {
  drugCorrelations(methods[i],mtx,out.folder)
}
parallel::stopCluster(cl)

Any suggestion on how to parallelize this? I would prefer to keep the function as it is and be able to execute it three times with the different methods.

Thanks
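A hedged sketch of what often goes wrong here: %dopar% lives in the foreach package, so foreach must be attached, and each worker process needs to see drugCorrelations and its inputs. Something like the following, iterating over the methods directly:

library(foreach)
library(doParallel)

cl <- parallel::makeCluster(3)
registerDoParallel(cl)

# .export is only needed if drugCorrelations isn't picked up automatically
foreach(m = methods, .export = "drugCorrelations") %dopar% {
  drugCorrelations(m, mtx, out.folder)
}

parallel::stopCluster(cl)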


R match ignore case and special characters


I have searched and found similar answers but not exactly what I need.

I want to identify matches in 2 strings, ignoring case and spaces and special characters.

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

All below should give the same output (1 2 3)

match(list1, list1)
match(list1, list2)
match(list1, list3)

I have tried str_detect(list1, regex(list2, ignore_case = TRUE)), but that doesn't give the same type of output (and I don't know how to incorporate the special characters/spaces in there).
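A minimal sketch: normalise both sides first (lower-case, strip everything that is not alphanumeric), then use match() as usual.

norm <- function(x) tolower(gsub("[^[:alnum:]]", "", x))

match(norm(list1), norm(list1))  # 1 2 3
match(norm(list1), norm(list2))  # 1 2 3
match(norm(list1), norm(list3))  # 1 2 3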

Evaluating the resulted simulated data


I am simulating data using the rejection method, where the density function of X is given by f(x) = C·e^x for all x in [0, 1]. I defined g(x) = 1 and C to be the maximum of f(x), which I took to be 1/(e − 1).

I used the following code to simulate data:

rejection <- function(f, C, g, rg, n) {
  naccepts <- 0
  result.sample <- rep(NA, n)

  while (naccepts < n) {
    y <- rg(1)
    u <- runif(1)

    if ( u <= f(y) / (C*g(y)) ) {
      naccepts <- naccepts + 1
      result.sample[naccepts] = y
    }
  }

  result.sample
}

f <- function(x) ifelse(x>=0 & x<=1, (exp(x)), 0)
g <- function(x) 1
rg <- runif
C <-  1/(exp(1) -1)

result <- rejection(f, C, g,rg, 1000)

Then I use a histogram to compare the simulated data with the curve of the original pdf:

hist(result,freq = FALSE)
curve(f, 0, 1, add=TRUE)

But the resulting plot is kind of weird (plot attached), so I am looking for any help clarifying what is wrong in my work.
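A hedged guess at the fix, for comparison: the envelope constant must satisfy f(x) ≤ C·g(x), so with the normalised density f(x) = e^x/(e − 1) the constant should be its maximum, C = e/(e − 1), not 1/(e − 1). A minimal sketch, reusing the rejection() function above:

f <- function(x) ifelse(x >= 0 & x <= 1, exp(x) / (exp(1) - 1), 0)  # normalised pdf
C <- exp(1) / (exp(1) - 1)                                          # max of f on [0, 1]

result <- rejection(f, C, g, rg, 1000)
hist(result, freq = FALSE)
curve(f, 0, 1, add = TRUE)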

How to concatenate multiple columns with separators but ignore some of columns based on condition in R?


Hi there, I would like to concatenate columns containing strings, blanks, or NAs with ";". So let's take the example below:


Actor1<- c("Driver","NA","","")
Actor2<- c("President","Zombie","","")
Actor3<- c("CEO","Devil","","")
Actor4<-c("Priest","","Killer","Mayor")

df_ex <-data.frame(Actor1, Actor2, Actor3, Actor4)

I tried this:

df_ex %>%
  mutate(combined= paste0(Actor1,";",Actor2,";",Actor3,";",Actor4)) 

but obviously the result is wrong. E.g. for

df_ex[3,]

the outcome in the combined column is ;;;Killer, whereas I would expect it to be: Killer.

Note: there are NAs and blanks ("") as well, which I'd like to ignore.

Thanks in advance, cheers!
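A minimal base-R sketch: treat "", NA, and the literal string "NA" as missing, then paste row-wise, keeping only the real entries.

drop_empty <- function(x) x[!is.na(x) & x != "" & x != "NA"]

df_ex$combined <- apply(df_ex[paste0("Actor", 1:4)], 1,
                        function(row) paste(drop_empty(row), collapse = ";"))
df_ex$combined[3]  # "Killer"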

efficient way of selecting rows with a minimum time spacing between dates while grouping


I want to select rows of data with dates such that the dates have a minimum time difference of 3 months. Here is an example:

    patient numsermed       date
 1:       1   numser1 2020-01-08
 2:       2   numser2 2015-01-02
 3:       2   numser2 2019-12-12
 4:       2   numser2 2020-01-05
 5:       2   numser2 2020-01-08
 6:       2   numser2 2020-01-20
 7:       2   numser2 2020-03-15
 8:       2   numser2 2020-03-18
 9:       2   numser3 2020-03-13
10:       2   numser3 2020-03-18
11:       3   numser3 2020-01-22
12:       4   numser4 2018-01-02

I want, by patient and numsermed, to keep only dates that are at least 3 months apart. I cannot simply use successive differences. The expected result is:

   patient numsermed       date
1:       1   numser1 2020-01-08
2:       2   numser2 2015-01-02
3:       2   numser2 2019-12-12
4:       2   numser2 2020-03-15
5:       2   numser3 2020-03-13
6:       3   numser3 2020-01-22
7:       4   numser4 2018-01-02

Here, for numser2 and patient 2, after 2019-12-12 the next date at least 3 months later is 2020-03-15, which I keep. I thus remove 2020-01-05, 2020-01-08 and 2020-01-20.

I then remove 2020-03-18, which is 3 days after 2020-03-15. Here is my solution with data.table:

library(data.table)
library(lubridate)

setkeyv(test, c("numsermed", "patient", "date"))
test[, indx := .I - .I[1] + 1, by = .(numsermed, patient)]
test[, N := .N, by = .(numsermed, patient)]  # group size, used in the loop below
Nmax <- max(test$N)
test[, supp := 0]

for (i in 1:Nmax) {
  test[N > 1, supp := ifelse(i < indx & date < date[i] + 90, 1, 0),
       by = .(numsermed, patient)]
  test <- test[supp != 1]
  test[, indx := .I - .I[1] + 1, by = .(numsermed, patient)]
  test[, N := .N, by = .(numsermed, patient)]
}

The idea is, for each row, to test the condition and then perform the subset. It seems to work, but on a million-row table it is rather slow (a few hours). I am sure there is an efficient way with a non-equi join or rolling join in data.table, but I did not manage to write it. Could someone come up with a more efficient solution? dplyr solutions are of course welcome too.

The data:

library(data.table)
library(lubridate)

test <- setDT(list(patient = c(1:3, 2),
                   numsermed = c(paste0("numser", 1:3), "numser2"),
                   date = as_date(c("2020-01-08", "2020-01-20",
                                    "2020-01-22", "2019-12-12"))))
test <- rbind(test, data.table(patient = 4, numsermed = "numser4", date = as_date("2018-01-02")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser2", date = as_date("2015-01-02")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser2", date = as_date("2020-03-15")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser2", date = as_date("2020-01-05")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser2", date = as_date("2020-01-08")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser3", date = as_date("2020-03-13")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser3", date = as_date("2020-03-18")))
test <- rbind(test, data.table(patient = 2, numsermed = "numser2", date = as_date("2020-03-18")))
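A hedged sketch of a single-pass alternative: within each group, walk the sorted dates once and keep a date only when it is at least 90 days after the last kept one. Reduce(..., accumulate = TRUE) expresses that running state without an explicit loop.

keep_spaced <- function(d) {
  # running value = last kept date; a date is kept when it advances the state
  kept <- Reduce(function(last, cur) if (cur >= last + 90) cur else last,
                 d, accumulate = TRUE)
  !duplicated(kept)
}

test[order(date), .SD[keep_spaced(date)], by = .(patient, numsermed)]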

Colour stacked bar-chart with unique colour for each bar in ggplot


I would like to make a stacked bar chart in ggplot, where each of the bars (and stacked segments) has a unique colour, using a provided hex value.

For example, take this dataframe.

Pct <- c(0.8026200, 0.1973800, 0.8316421, 0.1683579)
Site <- c("A","A","B", "B")
hex <- c("#53412F", "#B4A383", "#4E3B29", "#B6A37E")
bin <- rep(c(1,2), 2)

df <- as.data.frame(cbind(Site,Pct,hex,bin))

I would like to use the hex colours specified to colour the corresponding bars.

I have tried variations along these lines:

ggplot()+
  geom_bar(aes(y=Pct, x=as.character(Site), fill=bin), data=df, stat="identity")+
  theme_bw() +
  scale_fill_manual("Subject", values=df$hex)

but this just produces a green and a red colour in each bar?

Any help would be greatly appreciated. Sorry if it is a simple solution - I have not got much experience with stacked barcharts.

Thank you in advance!
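A hedged sketch of one route: as.data.frame(cbind(...)) coerces every column to character, which is one source of trouble here; building the frame directly and mapping fill to the hex column with scale_fill_identity() makes ggplot use the stored hex codes literally.

library(ggplot2)

df <- data.frame(Site, Pct, hex, bin, stringsAsFactors = FALSE)

ggplot(df, aes(x = Site, y = Pct, fill = hex)) +
  geom_col() +               # geom_col() stacks by default
  scale_fill_identity() +    # use the hex strings as the actual colours
  theme_bw()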

Closest other Value in the same Vector


I have a vector

set.seed(2)
x <- sample.int(20, 5)

[1]  4 14 11  3 16

Now, for every element I want to find

the element with the minimum distance (min(abs(x[i]-x[-i])) for element i), which here would be

[1]  3 16 14  4 14

the (first) index of the element with the minimum distance, which here would be

[1] 4 5 2 1 2

The point is that the element itself is not considered, only all the other elements, which is why R - Fastest way to find nearest value in vector is not the answer.

If the actual answer is out there, sorry - I didn't find it.
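A minimal sketch with a full distance matrix (fine for short vectors): set the diagonal to Inf so an element can never match itself, then take column-wise minima.

d <- abs(outer(x, x, "-"))  # pairwise absolute distances
diag(d) <- Inf              # exclude the element itself

idx <- apply(d, 2, which.min)  # first index of the nearest other element
idx
#> [1] 4 5 2 1 2
x[idx]
#> [1]  3 16 14  4 14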

How can I produce an untidy table from tidy data in R/Tidyverse? [duplicate]


In R, I have some class attendance data in a tidy data set. Here's a MWE:

library(lubridate)

students <- c("Alice", "Bob", "Alice", "Bob", "Alice", "Bob")
presences <- c("Present", "Present", "Present", "Absent", "Absent", "Present")
dates <- mdy(c("2/17/2020", "2/17/2020", "2/18/2020", "2/18/2020", "2/19/2020", "2/19/2020"))

df <- data.frame(Student=students,
                 Presence=presences, 
                 Date=dates, 
                 stringsAsFactors=FALSE) 

which produces

df

  Student Presence       Date
1   Alice  Present 2020-02-17
2     Bob  Present 2020-02-17
3   Alice  Present 2020-02-18
4     Bob   Absent 2020-02-18
5   Alice   Absent 2020-02-19
6     Bob  Present 2020-02-19

For a report, I want to produce a spreadsheet-style table where the rows are by student, the columns are by date, and the cell values are presence status. I've typed up the expected output explicitly below.

        02/17/20    02/18/20    02/19/20
Alice   Present     Present     Absent
Bob     Present     Absent      Present

How do I achieve this using R? I think my difficulty is that all the documentation I can find is for tidying data, and my goal here is essentially to untidy it.
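A minimal sketch: this kind of "untidying" is just a pivot to wide format, e.g. with tidyr.

library(tidyr)

pivot_wider(df, names_from = Date, values_from = Presence)
# one row per Student, one column per date, Presence in the cells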


R: Is there a way to find partial string matches with the same first element of both the string columns in two different dfs?


I have two string columns in two different dataframes, df1$name and df2$name. df1 has more than 10000 rows, while df2 has around 200+ rows. For example:

df1 <- data.frame(name = c("Peter P", "Jim Gordon",  "Bruce Wayne", "Tony Stark","Mony Blake" ))

df2<- data.frame(name = c( "Jeter P", "Bruce Wayne", "Mony Blake" ))

NOTE: the dfs are much larger than these.

I used the merge function first; it matched the common rows, but not "Jeter P". Then I used the partial matching function amatch from the stringdist library with method = "lv". It matched "Peter P" to "Jeter P", two different individuals. I know that amatch accounts for changes in positions, letters, etc., but I want the function to search df1 while keeping the first letter of the string fixed during matching.

Basically, when I partially match "Jeter P" from df2$name, it should only consider rows from df1$name where the string starts with "J" as potential matches. Is that possible?

Thanks in advance.
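A hedged sketch: restrict the candidate pool to names sharing the same first letter, then run amatch within that pool (maxDist = 3 is an arbitrary illustrative threshold).

library(stringdist)

first_letter <- function(x) toupper(substr(x, 1, 1))

matched <- vapply(as.character(df2$name), function(nm) {
  pool <- as.character(df1$name)[first_letter(df1$name) == first_letter(nm)]
  if (length(pool) == 0) return(NA_character_)
  pool[amatch(nm, pool, method = "lv", maxDist = 3)]
}, character(1))

matched  # "Jeter P" finds no match; "Bruce Wayne" and "Mony Blake" do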

Sum rows by selected columns specified by a character string in R


What I have: a large file of 1451 rows and 4663 columns with values 0, 1 and NA. The row names represent sites and the column names the date of the survey. I would like to sum rows over specific date intervals, that is, to sum specific columns referring to the column names, which represent dates. In this case I have 666 different date intervals over which to sum. The intervals are stored in a character vector.

A short reproducible example would be:

df <- read.table(text = "   2005-09-23 2005-09-24 2005-09-25 2005-09-26 2005-09-27 2005-09-28 2005-09-29 2005-09-30 2005-10-07 2005-10-08
1  0  0  1  NA 1  0  1  NA 1  0
2  1  1  1  1  1  1  1  1  1  1
3  NA NA NA NA NA NA NA NA NA NA", header = TRUE)

And a character string with the column names to sum, that is the date intervals, with start and end dates separated by "_":

intervals <- c("2005-09-23_2005-09-26", "2005-09-27_2005-10-30", "2005-10-07_2005-10-08")

What I want: a data frame with column names that reflect the date intervals, the sites in the rows, and, in the cells, the total of the summed values. It is important to keep NA for the sums in which all values were NA.

result <- read.table(text = "   2005-09-23_2005-09-26 2005-09-27_2005-10-30 2005-10-07_2005-10-08
1  1  2  1
2  4  4  2
3  NA NA NA", header = TRUE)

My problem: I don't know how to specify a set of column intervals using the column names. I have found useful information related to my problem, but it all requires manually specifying the columns over which to sum, e.g. df_sum <- rowSums(df[, c(1:3)]), which in my case would mean doing so for 666 date intervals. I think the answer is somewhere along the lines of the following posts, using the rowSums command, but I can't figure out how to adapt those problems to mine: How to get rowSums for selected columns in R; Sum rows in data.frame or matrix; Sum rows by interval Dataframe; and R - How to sum objects in a column between an interval defined by conditions on another column.

I am not too R savvy; could someone be so kind as to provide me with code to do this? Thank you!
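A hedged sketch, assuming the intervals are inclusive on both ends and that read.table has prefixed the column names with "X" (so they parse with format "X%Y.%m.%d"): convert the column names to dates once, then sum the matching columns per interval, restoring NA where an entire interval was missing.

col_dates <- as.Date(names(df), format = "X%Y.%m.%d")

sums <- sapply(intervals, function(iv) {
  bounds <- as.Date(strsplit(iv, "_")[[1]])
  block  <- df[, col_dates >= bounds[1] & col_dates <= bounds[2], drop = FALSE]
  s <- rowSums(block, na.rm = TRUE)
  s[rowSums(!is.na(block)) == 0] <- NA  # all-NA intervals stay NA
  s
})
result <- as.data.frame(sums)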

setting the lower limit of a derivative in R desolve


I am creating a mathematical model using R's deSolve package. I have three derivatives: dX (the proportion of infected hosts), dY (the proportion of infected vectors), and dm (the ratio of vectors to hosts in the population). The point of my model is to show the effects of a certain insecticide treatment on the population (with the effects represented as the parameter z). To incorporate this time-dependent covariate into the model, approxfun was used. The model is working correctly; however, I would like to set a lower limit for m (assuming that not all the vectors in the population would be affected). Without setting the lower limit, my code and graph look like this:

# Initial vectors: days post treatment (x), % killed (z)
x <- c(4, 30, 60, 90, 120, 210, 360)
z <- c(1.0, 0.99, 0.99, 0.79, 0.7, 0.02, 0) 

plot(z ~ x)

#=============================================
#  fit data with logistic curve
#       -extract fit values using equation y = Asym / (1 + exp((xmid - input) / scal))
#=============================================
fit2 <- nls(z ~ SSlogis(x, Asym, xmid, scal), data = data.frame(x, z))
summary(fit2)

lines(seq(0, 400, length.out = 400),
      predict(fit2, newdata = data.frame(x = seq(0.5, 400, length.out = 400))))

Asym<-summary(fit2)$parameters[1,1]
xmid<-summary(fit2)$parameters[2,1]
scal<-summary(fit2)$parameters[3,1]

times <- seq(0, 1000, by = 1)
signal <- data.frame(times = times, import = rep(0, length(times)))
signal$import=  Asym / (1 + exp((xmid - times) / scal))

#Force time dependent covariate into the model
input <- approxfun(signal, rule = 2)


RMTx2 <- function(times, stateTx2, parametersTx2)   
{
  with(
    as.list(c(stateTx2, parametersTx2)), 
    {
      z <- input(times)
      dX <- ((m*a*b*Y)+(p*k*(a*m*z*Y)))*(1-X)-r*X 
      dY <- a*c*X*(exp(-g*n)-Y)-((g*(1-m/K)*Y)+(m*a*z*Y)) 
      dm <- ((R*(1-m/K)*m )+(-m*a*z))
      return(list(c(dX, dY, dm)))
    }
  )
}


initTx2 <- c(X = 0.01, Y= 0, m=40) 
parametersTx2 <- c(a=1/14, b=0.00068, n=45, g= 0.005, c=0.28, k= 0.10, r= 1/(3*365), p=0, K=40, R= 0.09)
outTx2 <- as.data.frame(ode(y = initTx2, times = times, func = RMTx2, parms = parametersTx2))
RESULTS2<-data.frame(outTx2$X,outTx2$Y)
RESULTS2m <-data.frame(outTx2$m, outTx2$Y*outTx2$Y)
timesTx2 <- seq(0, 10000, by = 1)

And here is the plot. What I would like to do is limit the drop in m over time so that it does not fall below a certain value during treatment.

I am trying to set a floor on m so that, for example, the value cannot drop below 15. I have tried a few approaches, including:

RMTx2 <- function(times, stateTx2, parametersTx2)   
{
  with(
    as.list(c(stateTx2, parametersTx2)), 
    {
      z <- input(times)
      dX <- ((m*a*b*Y)+(p*k*(a*m*z*Y)))*(1-X)-r*X 
      dY <- a*c*X*(exp(-g*n)-Y)-((g*(1-m/K)*Y)+(m*a*z*Y))
      dm <- if (isTRUE (((R*(1-m/K)*m )+(-m*a*z)) > MM)) ((R*(1-m/K)*m )+(-m*a*z)) else 15 
      return(list(c(dX, dY, dm)))
    }
  )
}

initTx2 <- c(X = 0.01, Y= 0, m=40) 
parametersTx2 <- c(a=1/14, b=0.00068, n=45, g= 0.005, c=0.28, k= 0.10, r= 1/(3*365), p=0, K=40, R= 0.09, MM= 15)
outTx2 <- as.data.frame(ode(y = initTx2, times = times, func = RMTx2, parms = parametersTx2))
RESULTS2<-data.frame(outTx2$X,outTx2$Y)
RESULTS2m <-data.frame(outTx2$m, outTx2$Y*outTx2$Y)

Unfortunately, for some reason this causes the population to just increase indefinitely.

Is there something fundamental about this approach that won't work? Or is this more a coding error? Thanks!
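A hedged sketch of one common fix (not from the original post): the attempt above compares the derivative to MM and substitutes the value 15 for a rate, which mixes up units. Clamping the state instead, so the rate is forced to zero once m has reached the floor MM and would otherwise decrease, keeps dm a rate of change (MM is the floor parameter from the second attempt; input is the approxfun from above).

RMTx2_clamped <- function(times, stateTx2, parametersTx2) {
  with(as.list(c(stateTx2, parametersTx2)), {
    z <- input(times)
    dX <- ((m*a*b*Y) + (p*k*(a*m*z*Y))) * (1 - X) - r*X
    dY <- a*c*X*(exp(-g*n) - Y) - ((g*(1 - m/K)*Y) + (m*a*z*Y))
    dm_raw <- (R*(1 - m/K)*m) + (-m*a*z)
    # At the floor MM, block further decline but still allow growth
    dm <- if (m <= MM && dm_raw < 0) 0 else dm_raw
    list(c(dX, dY, dm))
  })
}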

Can't filter markers in RShiny based on user input


I'm having a problem with my RShiny app not filtering the markers on the Leaflet map. What am I doing wrong? I'm having trouble identifying why this is happening. I've tried the same thing in other ways, for example using a switch based on input$df to swap datasets in the leaflet proxy.

Here's my entire code

UI:

ui <- fluidPage(
    theme = shinytheme("superhero"),
    tags$head(
        includeCSS("MarkerCluster.Default.css", "MarkerCluster.css"),
        includeScript("leaflet.markercluster-src.js")
    ),
    titlePanel("Map App"),
    sidebarLayout(
        position = "right",
        sidebarPanel(
            h3("Options"),
            selectInput(
                "df",
                h5("Display facilities"),
                choices = list(
                    "All" = 3,
                    "Empty" = 2,
                    "Non-empty" = 1
                ),
                selected = 3
            )
        ),
        mainPanel(
            h3("Map demo with MarkerClusters"),
            tabsetPanel(
                type = "tabs",
                tabPanel(
                    "Map",
                    leafletOutput("map1", width = "100%", height = "764px")
                ),
                tabPanel("Data", h4("Showing first 100 rows"), tableOutput("data"))
            )
        )
    )
)

Server:

server <- function(input, output) {

    output$map1 <- renderLeaflet({
        leaflet() %>%
            addTiles(attribution = "Map Demo") %>%
            setView(-98.5795, 39.828175, zoom = 3)
    })

    output$data <- renderTable({
        ds_comp2[1:100, ]
    })

    observe({
        filter <- reactive({
            switch(input$df,
                   "1" = ds_comp2[ds_comp2$empty == F, ],
                   "2" = ds_comp2[ds_comp2$empty == T, ],
                   "3" = ds_comp2[, ])
        })

        proxy <- leafletProxy("map1") %>%
            clearMarkerClusters() %>%
            clearMarkers() %>%
            addMarkers(
                clusterOptions = markerClusterOptions(),
                data = filter(),
                popup = paste(
                    "<b>ZIP code:</b>", ds_comp2$zip, "<br>",
                    "<b>Type:</b>", ds_comp2$type, "<br>",
                    "<b>Group:</b>", ds_comp2$group, "<br>",
                    "<b>Empty?:</b>", !(ds_comp2$empty), "<br>"
                )
            )
    })
}

shinyApp(ui = ui, server = server)
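A hedged sketch of one likely culprit (not verifiable without the data): the popup strings are built from the full ds_comp2 while the markers come from the filtered subset, so the two vectors disagree in length and can fall out of sync. Building both from the same filtered frame, without the nested reactive, keeps them aligned:

observe({
  d <- switch(input$df,
              "1" = ds_comp2[ds_comp2$empty == FALSE, ],
              "2" = ds_comp2[ds_comp2$empty == TRUE, ],
              "3" = ds_comp2)
  leafletProxy("map1") %>%
    clearMarkerClusters() %>%
    clearMarkers() %>%
    addMarkers(data = d,
               clusterOptions = markerClusterOptions(),
               popup = paste("<b>ZIP code:</b>", d$zip, "<br>",
                             "<b>Type:</b>", d$type, "<br>",
                             "<b>Group:</b>", d$group, "<br>",
                             "<b>Empty?:</b>", !d$empty, "<br>"))
})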

How to add columns to a dataframe through lapply


There are a few issues I have encountered and I need help:

So I have a TSV file like this:

intermediate.tsv
experiment control   par1 par2 par3
1          a_control  1    11   21 
1          b_control  5    12   21
2          a_control  2    11   50 
2          b_control  3    13   31
3          a_control  4    11   35
3          b_control  2    11   35

I read the TSV file like this:

tsvfiles<- read.csv2('/path_location/intermediate.tsv', header=T, sep = '\t', stringsAsFactors = F)

Then I convert the values to numeric:

for(i in 3:ncol(tsvfiles)) {
  tsvfiles[,i] <- as.numeric(tsvfiles[, i])
}

I realize that when I call tsvfiles$control, I only get 'a' or 'b', which is not a big deal, but if someone can help me correct that it would be great.

Since I just want the par* column headers, I set up a cols variable:

cols <- names(tsvfiles)[!names(tsvfiles) %in% c('experiment', 'control')]

Then, I want my first column to be like this:

final_data<- c('Description', 'a_control Mean', 'a_control sd',  'b_control Mean', 'b_control sd', 'plot')

And create a function

generate_table<-function(data, col){

  a_mean = mean(data[[col]][tsvfiles$control == "a"])
  b_mean = mean(data[[col]][tsvfiles$control == "b"])
  a_sd = sd(data[[col]][tsvfiles$control == "a"])
  b_sd = sd(data[[col]][tsvfiles$control == "b"])

  p1 <- ggplot(data, aes(x=control, y= !!sym(col), color = control)) + 
    geom_violin() + geom_boxplot(width = 0.1)  +
    geom_jitter(shape = 16, colour = "black", alpha = 0.5, width = 0.2) +
    scale_x_discrete(limits = rev(levels(as.factor(data$control)))) +
    coord_flip()
  column <- c(col, a_mean, a_sd, b_mean, b_sd)
  return(column)
}

When I do lapply:

lapply(cols, generate_table, data=tsvfiles)

I get this:

"par1""1.15285714285714""0.0543270519302177""1.2055""0.0730879066964102""par2""11.15285714285714""1.0543270519302177""12.2055""3.0730879066964102""par3""31.15285714285714""5.0543270519302177""21.2055""2.0730879066964102"

Now here is the real question: how do I take the output from lapply and add it to final_data like this:

Description     par1                par2               par3
a_mean     1.15285714285714   11.15285714285714   31.15285714285714
a_sd       0.0543270519302177 1.0543270519302177  5.0543270519302177
b_mean     1.2055             12.2055             21.2055
b_sd       0.0730879066964102 3.0730879066964102  2.0730879066964102

This is before even adding a plot row. I tried:

final_data = cbind(final_data , lapply(cols, generate_table, data=tsvfiles))

but I can't get final_data to have all the columns from the function.

Thank you
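A hedged sketch (leaving out the plot element, which can't live in a plain data frame anyway): lapply returns a list of character vectors, so bind them into columns with do.call(cbind, ...) and use the first element of each vector as its column name.

res <- lapply(cols, generate_table, data = tsvfiles)

out <- do.call(cbind, lapply(res, function(v) v[-1]))  # drop the name element
colnames(out) <- vapply(res, `[`, character(1), 1)     # "par1", "par2", "par3"

final_data <- data.frame(Description = c("a_mean", "a_sd", "b_mean", "b_sd"),
                         out, stringsAsFactors = FALSE)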
