Channel: Active questions tagged r - Stack Overflow
Viewing all 209426 articles
Browse latest View live

Assign the value of a variable based on another variable which is related to column names of a dataframe

I have a data frame with the following variables:

df <- data.frame(ID = seq(1:5),
                 Price.A = c(10,12,14,16,18), 
                 Price.B = c(6,7,9,8,5), 
                 Price.C = c(27,26,25,24,23), 
                 Choice = c("A", "A", "B", "B", "C"))

I want to create a variable called Expenditure, which picks the value from Price.A, Price.B or Price.C depending on the value of the variable Choice.

I tried to create it with the following code:

df$Expenditure <- with(df, get(paste("Price.", Choice, sep ="")))

However, that returns the value of Price.A for all observations.

In my real application, instead of A, B and C, I have hundreds of names, so an ifelse command is not feasible.

Does anyone know how to do that?
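One possible approach, as a sketch: get() looks up a single object name, which would explain why only Price.A comes back. Matrix indexing with one (row, column) pair per observation picks a different column for each row instead. This assumes every value of Choice has a matching Price.<name> column; the names prices and col_idx are illustrative only.

```r
df <- data.frame(ID = 1:5,
                 Price.A = c(10, 12, 14, 16, 18),
                 Price.B = c(6, 7, 9, 8, 5),
                 Price.C = c(27, 26, 25, 24, 23),
                 Choice = c("A", "A", "B", "B", "C"))

# Numeric matrix holding only the price columns
prices <- as.matrix(df[grep("^Price\\.", names(df))])

# For each row, the position of the column named by Choice
col_idx <- match(paste0("Price.", df$Choice), colnames(prices))

# A two-column (row, column) index matrix selects one cell per row
df$Expenditure <- prices[cbind(seq_len(nrow(df)), col_idx)]
```

Because the lookup is by column name, this should scale to hundreds of Price.* columns without any ifelse chain.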


dplyr group_by returns blank

I have a data frame with the following dimensions:

18549282 obs. of  3 variables:

$ road: chr  "MULTILINESTRING((30.5592664 -30.5971316,30.5597665 -30.5964615))" ...
$ n1       : int  0 0 0 0 0 0 0 0 0 0 ...
$ n2       : int  0 0 0 0 0 0 0 0 0 0 ...

There are no blank records in the road column; every record contains a non-empty string.

When I use dplyr's group_by along with summarize to get the sum of n1 and the sum of n2 by road, I get the sums, but I also see a blank in the road column, e.g.

tt %>%
group_by(road) %>%
summarize(sn1 = sum(n1),
sn2 = sum(n2))

I get output in which one road value is blank (shown as a screenshot in the original post).

Again I'm 100% sure that there are no blanks in the road column.

But when I create a data frame with, let's say, 1000 records as follows

small_dataset <- head(tt, 1000)

I don't see any blank records in the results.

It seems that dplyr struggles with the large amount of data.

Any ideas on how I can handle this issue?

Network timeouts when running lengthy shiny apps

I have a shiny app which selects a subset of observations from a large dataframe, renders an R Markdown report for each observation of that subset, zips all the reports at the end, and downloads the zip file.

When the subset is small (e.g. fewer than 10 reports), everything works fine, but a network timeout occurs once rendering all the reports in the background takes more than a certain amount of time (e.g. in some cases more than 100 reports need to be rendered).

I have tried editing the config file to set app_init_timeout = 3600 and app_idle_timeout = 3600, but this does not seem to affect the problem.

Any ideas?

textplot function: How to center the text within a table?

I can't get my head around centering the text correctly within tables using the textplot function.

I have written the following code so far:

jpeg("Equity_Regions_TAA_Scorecard.jpg", width = 7.0, height = 3.5, units="in", res=200, pointsize=5)

  result <- cbind(as.numeric(last(na.locf(signal.combined))),
                 round(as.numeric(last(na.locf(eps.pct))), digits=2),
                 round(as.numeric(last(na.locf(err.pct))), digits=2),
                 round(as.numeric(last(na.locf(pmi.pct))), digits=2),
                 round(as.numeric(last(na.locf(exp.pct))), digits=2),
                 round(as.numeric(last(na.locf(surprise.pct))), digits=2),
                 round(as.numeric(last(na.locf(momentum.pct))), digits=2))

  colnames(result) <- c("TAA Score", "EPS", "ERR", "PMI", "Expectations", "Surprises", "Momentum")

  row.names(result) <- as.character(names(signal.combined))

  for (j in 2:ncol(result)) {
    for (i in 1:nrow(result)) {

      result[i, j] <- if (result[i, j] >= 80) {
        "++"
      } else if (result[i, j] < 80 & result[i, j] >= 60) {
        "+"
      } else if (result[i, j] < 60 & result[i, j] >= 40) {
        "o"
      } else if (result[i, j] < 40 & result[i, j] >= 20) {
        "-"
      } else {
        "--"
      }
    }
  }

  cols <- matrix(NA, nrow=nrow(result), ncol=ncol(result))

  cols[,1] <- col1

  for (j in 2:ncol(result)) {
    for (i in 1:nrow(result)) {

      cols[i, j] <-   if(result[i, j] =="++") {
        "darkgreen"
      } else if (result[i, j] =="+") {
        "forestgreen"
      } else if (result[i, j] =="o") {
        "dimgray"
      } else if (result[i, j] =="-") {
        "firebrick1"
      } else {
        "firebrick"
      }

    }
  }

  textplot(as.data.frame(result), col.data=cols, rmar = 1.0, cmar = 1.0,  max.cex=1.5, cex.main=1.5,
           halign = "center", valign = "center", col.rownames = col1, col.colnames = col1,
           wrap.rownames=10, wrap.colnames=10, mar = c(0,0,3,0)+0.1)

  title(main="Equity Regions TAA Scorecard")

dev.off()

I am using the following R version:

platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          4.3                         
year           2017                        
month          11                          
day            30                          
svn rev        73796                       
language       R                           
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree            

This is my output so far; the text within the table is not properly aligned or centered.

I think the most important piece of code is probably this one:

textplot(as.data.frame(result), col.data=cols, rmar = 1.0, cmar = 1.0,  max.cex=1.5, cex.main=1.5,
           halign = "center", valign = "center", col.rownames = col1, col.colnames = col1,
           wrap.rownames=10, wrap.colnames=10, mar = c(0,0,3,0)+0.1)

Am I doing something wrong here? The text clearly isn't centered despite using arguments like halign = "center" and valign = "center".

Any help would be highly appreciated!

How to group factor levels in R

I have a factor column with football position abbreviations, around 17 unique values with 220 observations. I want to have only three factor levels which encompass the 17 unique values.

levels(nfldraft$Pos) <- list(Linemen = c("C","OG","OT","TE","DT","DE"), Small_Backs =  c("CB","WR","FS"), Big_Backs = c("FB","ILB","OLB","P","QB","RB","SS","WR"))

That is what I tried. Printing nfldraft$Pos to the console shows 3 factor levels, but all the values are either Linemen or Small_Backs, and all the others are NA. Where am I going wrong? Thank you.
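A diagnostic sketch, not a confirmed diagnosis: when levels<- is given a list, any existing level that does not appear exactly in the list ends up as NA, so a spelling difference or stray whitespace in the data would produce this symptom. Listing "WR" under two groups also makes the mapping ambiguous. The toy vector pos below stands in for nfldraft$Pos, which is not shown in the question, and the mapping keeps "WR" only under Small_Backs as an arbitrary choice.

```r
# Toy stand-in for nfldraft$Pos: "K" is not in the mapping,
# and "WR " carries a trailing space
pos <- factor(c("C", "CB", "QB", "K", "WR "))

mapping <- list(
  Linemen     = c("C", "OG", "OT", "TE", "DT", "DE"),
  Small_Backs = c("CB", "WR", "FS"),
  Big_Backs   = c("FB", "ILB", "OLB", "P", "QB", "RB", "SS")
)

# Levels present in the data but absent from the mapping will become NA
unmatched <- setdiff(levels(pos), unlist(mapping))

levels(pos) <- mapping
```

Running the setdiff() line on the real column before reassigning the levels shows which abbreviations need to be added to the list or cleaned first, for example with trimws().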

Missing category to be shown at bottom

I want the 'Missing' category to be shown at the bottom of the result after the group-by. Column x can contain any character value. See the example below.

library(dplyr)
df <- data.frame(x = c('Ap','LA','MN', 'Missing','ZA'),
                 y = c('PA','NA','DN', 'Missing','ZD'),
                 z = 1:5,
                 stringsAsFactors = F)

df %>% group_by(x) %>% summarise(x1 = sum(z))
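One way this could be done, as a sketch: sort on a logical flag first. arrange() places FALSE before TRUE, so rows where x == "Missing" go last, and the remaining groups are then ordered by x.

```r
library(dplyr)

df <- data.frame(x = c("Ap", "LA", "MN", "Missing", "ZA"),
                 y = c("PA", "NA", "DN", "Missing", "ZD"),
                 z = 1:5,
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(x) %>%
  summarise(x1 = sum(z)) %>%
  # FALSE sorts before TRUE, so "Missing" ends up last
  arrange(x == "Missing", x)
```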

"Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases" when doing 2-way repeated measures anova test

I've been trying to run a 2-Way repeated measures test on a dataset, with year & vaccine type being the independent variables and coverage being a dependent variable. I ran it with the code:

sat = anova_test(
  data=SA, dv = coverage, wid = country, 
  within=c(vaccine, year)
)

but then I got the error

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I've run all(is.na()) on all variables, and every one returned FALSE, so there shouldn't be any NA cases. coverage is definitely numeric, while country, vaccine and year are definitely factors.

Any help would be greatly appreciated!!

EDIT: I used the anova_test function from the package rstatix. Here's a summary of the data:

> summary(SA)  
 unicef_region          iso3                  country    vaccine         year     
 Length:1360        Length:1360        Afghanistan:170   BCG :272   1985   :  40  
 Class :character   Class :character   Bangladesh :170   DTP1:272   1986   :  40  
 Mode  :character   Mode  :character   Bhutan     :170   DTP3:272   1987   :  40  
                                       India      :170   MCV1:272   1988   :  40  
                                       Maldives   :170   POL3:272   1989   :  40  
                                       Nepal      :170              1990   :  40  
                                       (Other)    :340              (Other):1120  
    coverage        decade              logitv       
 Min.   : 1.00   Length:1360        Min.   :-4.5951  
 1st Qu.:63.00   Class :character   1st Qu.: 0.5322  
 Median :83.00   Mode  :character   Median : 1.5856  
 Mean   :75.97                      Mean   : 1.7595  
 3rd Qu.:96.00                      3rd Qu.: 3.1781  
 Max.   :99.00                      Max.   : 4.5951  

> head(SA)
# A tibble: 6 x 8
  unicef_region iso3  country     vaccine year  coverage decade logitv
  <chr>         <chr> <fct>       <fct>   <fct>    <dbl> <chr>   <dbl>
1 South Asia    AFG   Afghanistan BCG     1985        17 80s    -1.59 
2 South Asia    BGD   Bangladesh  BCG     1985         2 80s    -3.89 
3 South Asia    BTN   Bhutan      BCG     1985        54 80s     0.160
4 South Asia    IND   India       BCG     1985         8 80s    -2.44 
5 South Asia    MDV   Maldives    BCG     1985        45 80s    -0.201
6 South Asia    NPL   Nepal       BCG     1985        67 80s     0.708

Filtering a dataset depending on a variable that I select in my UI selectInput box in R/Rshiny

I have a dataset (pauldata) that I want to filter for all cases where the variable varx equals 'Y'.

The following code works well to create the filtered object pauldata2:

pauldata2 <- pauldata[pauldata$varx %in% "Y",]

However, I now want to make this reactive, depending on which variable is chosen from my UI selection box (sel1).

I figured the approach below would work (it is similar to what has worked for other reactive boxes in my application, although none of those filter a dataset), but it did not:

pauldata2 <- reactive({pauldata[pauldata[[input$sel1]] %in% "Y",]})

Instead, when I select the variable varx in the selection box, this gives an error:

"Error: object of type 'closure'  is not subsettable"

I'd be interested to know why this does not work and whether there is a solution.
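A likely cause, though the code that uses pauldata2 is not shown: this error usually means a function is being subset as if it were data. reactive() returns a function-like object, so downstream code would need pauldata2() with parentheses rather than pauldata2. The plain-R analogy below reproduces the same message without shiny; get_data and the toy pauldata are illustrative stand-ins.

```r
pauldata <- data.frame(varx = c("Y", "N", "Y"), value = 1:3)

# Stand-in for a reactive: a function that returns the filtered data
get_data <- function() pauldata[pauldata$varx %in% "Y", ]

# Subsetting the function itself fails with the 'closure' error
msg <- tryCatch(get_data[1, ], error = function(e) conditionMessage(e))

# Calling it first, then subsetting the result, works
first_row <- get_data()[1, ]
```

In the app this would correspond to writing pauldata2()[...] or passing pauldata2() to the render function.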


R - Parse multiple types of prices [duplicate]

I have a dataset with text in one column. Each value contains price information, written in one of several formats:

Text_1 <- "vfzvzag 500 000 euros"
Text_2 <- "agfbv 500.000,00 e"
Text_3 <- "ezfcze 500,000.00 e"
Text_4 <- "500 000.00 e"

Sometimes the separator between the units and the cents is a comma, sometimes a period. Sometimes there are no cents at all.

Can you help me create a regex function to extract the price from each value in this column?

Thanks

Best regards
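A minimal sketch for the four sample strings, in base R. It assumes that every value contains exactly one price, that a decimal part (if present) has one or two digits after the last comma or period, and that any other separator is a thousands separator. parse_price is a hypothetical helper name.

```r
parse_price <- function(x) {
  # First run of digits, possibly containing spaces, periods or commas
  num <- regmatches(x, regexpr("[0-9]([0-9 .,]*[0-9])?", x))

  # A trailing separator followed by 1-2 digits is read as the decimal part
  has_cents <- grepl("[.,][0-9]{1,2}$", num)
  cents <- ifelse(has_cents, sub(".*[.,]", "", num), "0")

  # Everything before the decimal part, stripped of separators
  whole <- gsub("[^0-9]", "", sub("[.,][0-9]{1,2}$", "", num))

  as.numeric(paste0(whole, ".", cents))
}

texts <- c("vfzvzag 500 000 euros",
           "agfbv 500.000,00 e",
           "ezfcze 500,000.00 e",
           "500 000.00 e")
prices <- parse_price(texts)
```

Values without any digits are dropped by regmatches(), so the result can be shorter than the input if that assumption does not hold.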

End of the (next) month from a random date

What is the fastest way to get the end of the month for a given date? I have a really large table and I want my code to be fast. My current code looks as follows:

library(lubridate)

end_of_month <- function(date){
 day(date) <- days_in_month(date)
 date
}

I have another question: is there a fast way to get the last day of the next month from an arbitrary date?

"2019-05-15" %>% as.Date() %m+% months(1) %>% end_of_month  # 2019-06-30

Can I do this in one step or do I need an extra function to handle this?
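One way to do it in a single step, as a sketch: go to the first day of the current month, add two months, and step back one day. Starting from the first of the month avoids the invalid dates that adding a month to, say, January 31 can produce. The function name end_of_next_month is illustrative, and no benchmark was run, so the speed on a large table is untested.

```r
library(lubridate)

end_of_next_month <- function(date) {
  # First day of the current month, plus two months, minus one day
  floor_date(as.Date(date), "month") + months(2) - days(1)
}

end_of_next_month("2019-05-15")
```

The function is vectorised, so it can be applied to a whole date column at once.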

How to extract one specific group in dplyr

Given a grouped tbl, can I extract one or a few groups? Such a function would be useful when prototyping code, e.g.:

mtcars %>%
  group_by(cyl) %>%
  select_first_n_groups(2) %>%
  do({'complicated expression'})

Surely, one can do an explicit filter before grouping, but that can be cumbersome.
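A possible sketch, assuming dplyr 1.0.0 or later, where cur_group_id() is available: it returns the index of the current group, so filtering on it keeps only the first n groups. The select_first_n_groups() in the question is hypothetical; the filter() call below plays its role.

```r
library(dplyr)

first_two <- mtcars %>%
  group_by(cyl) %>%
  # Keep only the first two groups (cyl 4 and 6 here)
  filter(cur_group_id() <= 2)
```

The result stays grouped, so further grouped operations can follow directly.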

Add Markers by country name leaflet R

I am using RStudio, and I want to add a marker based on country name.

The variable that contains the number of occurrences per country is cnt_country, shown in the table below:

Morocco                            57381
France                             35729
Tunisia                            85563
Saudi Arabia                       10816
Turkey                             6725

However, when I use leaflet(cnt_country)%>% addTiles()%>% addMarkers() I get an error: cannot infer lat long information.

Is there a way for leaflet to add markers based on country name?

Create dataframe in R from messy excel spreadsheets, with each dataframe starting with specific cell content

I have a huge number of excel spreadsheets with the same type of content, but the structure has been changed for some. For example, 'Part 2' may be found on the same tab as 'Part 1' in one spreadsheet, while another may have 'Part 2' on a separate spreadsheet.

Is there a way to have any package (readxl, xlsx etc.) find the tab containing a specific cell - "Part 2" - and then import from that cell?

Thanks very much!

Identify seasonality using 1 year period

I have a data set which represents the number of customers subscribing monthly from Dec 2018. I would like to check whether my data has seasonality; however, I am getting an error in R.

    Month   Subs
1  201812   37551.17
2  201901   42144.39
3  201902   37466.42
4  201903   37568.11
5  201904   35271.50
6  201905   49453.71
7  201906   60640.16
8  201907   59835.07
9  201908   71657.11
10 201909   71911.35
11 201910   57962.19
12 201911   55538.46
13 201912   59423.75
14 202001   62707.65
15 202002   50034.92

In R I ran the code below, which produced the following error:

> myts <- ts(data = Data$DBRollOver, start =c(2018,12),frequency=12)
> fit <- stl(myts,s.window = "period")
Error in stl(myts, s.window = "period") : 
  series is not periodic or has less than two periods

Any suggestions?
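The error is consistent with the length of the series: stl() needs more than two full periods, which for monthly data means more than 24 observations, and the table holds 15. The sketch below checks this using the Subs values from the table; the vector name subs is illustrative.

```r
subs <- c(37551.17, 42144.39, 37466.42, 37568.11, 35271.50,
          49453.71, 60640.16, 59835.07, 71657.11, 71911.35,
          57962.19, 55538.46, 59423.75, 62707.65, 50034.92)

myts <- ts(subs, start = c(2018, 12), frequency = 12)

# Number of complete seasonal cycles covered by the series
n_periods <- length(myts) / frequency(myts)

# stl() requires more than two full periods of data
enough_for_stl <- length(myts) > 2 * frequency(myts)
```

With only 1.25 years of data there is no second December-to-November cycle to compare against, so a seasonal decomposition of this kind is not possible until more months are collected.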

How to access data.frame columns by $ followed by a function?

Say I have a df.

df<-data.frame(matrix(,ncol=3,nrow=2));
colnames(df)<-(c("day1","day2","day3"))

Of course I can now access the first column with df$day1. However, instead of writing day1 directly, I would sometimes like to access a column within a loop, e.g.,

in a loop where I have val=3

Can I access df$day3 with something like df$paste("day",val,sep="")?

Of course, what I wrote won't work.
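A short sketch of the usual alternative: $ only accepts a literal name, whereas [[ ]] accepts a character string, so the column name can be built with paste0() first. The values assigned to day3 are made up for illustration.

```r
df <- data.frame(matrix(NA, ncol = 3, nrow = 2))
colnames(df) <- c("day1", "day2", "day3")
df$day3 <- c(7, 8)  # illustrative values

val <- 3
col_name <- paste0("day", val)

# Read the column whose name is computed at run time
values <- df[[col_name]]

# The same form works for assignment
df[[col_name]] <- values * 2
```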


How to tell R Markdown that a current chunk returns a Latex Code

I'm new to R Markdown and have run into a problem. I want to build a "Table 1" with the mean and SD of some variables, and for that purpose I'm using compareGroups and createTable. There is also a function called export2latex, which converts the table into LaTeX code, but unfortunately I don't know how to tell R Markdown that the output of the current chunk is already LaTeX and should be rendered as such.

I've tried results = 'tex' in the chunk options, but without success. Right now it just displays the LaTeX code in my PDF, like this image. Here is the image of my code.

Does anybody know how to solve this problem?

UPDATE: As mentioned, results="asis" helped, so thank you. But a second problem occurs after doing that. It seems you need to use such generated LaTeX code as an inline expression, because knitr can't handle the LaTeX math environment. So I did it like this. Now LaTeX is able to compile the file, but two dollar signs appear in my PDF.

Can someone tell me how to get rid of it?

Read serialized data created by Matlab in R

I have serialized data created by Matlab and stored in an SQL database. How can I deserialize the data when I read it in R?

The Matlab script to serialize the data looks like :

                % Serialize paramValue and convert to hex array
                paramStrTmp = dec2hex(serialize(paramValue), 2);
                % Reshape hex array to single hex string
                paramStr    = reshape(paramStrTmp', 1, 2*length(paramStrTmp)); 

The paramStr data is then saved in the SQL database. Reading it in R returns:

str(ProjectList)
'data.frame':   179347 obs. of  1 variable:
 $ value: chr  "0302010000002B000000533A5C4F465C4445552D30303034385C5F446F635C30362D44657369676E4261736973425C30332D4C4253""03020100000014000000533A5C4F465C4445552D30303034385C5F4C4341""03020100000041000000533A5C4F465C4445552D30303034385C484F535F7030312D6930315F42342D496E697454656E646572506F735F4"| __truncated__ "030201000000090000004445552D3030303438" ...
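A partial sketch based only on the sample values shown, not on any documentation of the Matlab serialize function: each string looks like hex-encoded bytes in which the first 10 bytes form a header (with what appears to be the text length at byte 7) and the rest is ASCII text. Under that assumption, text values could be recovered as below. hex_to_raw and decode_char_value are hypothetical helpers, and other Matlab types (numeric arrays, structs, cells) would need their own handling.

```r
# Convert a hex string such as "4445" into a raw vector of bytes
hex_to_raw <- function(hex) {
  starts <- seq(1, nchar(hex), by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

# Drop the assumed 10-byte header and read the remainder as text
decode_char_value <- function(hex, header_bytes = 10L) {
  bytes <- hex_to_raw(hex)
  rawToChar(bytes[-seq_len(header_bytes)])
}

decoded <- decode_char_value("030201000000090000004445552D3030303438")
```

The function handles one string at a time; for the whole column it could be applied with vapply(ProjectList$value, decode_char_value, character(1)), provided every value follows the same layout.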

Create an ID column with randomly generated values in R

I am using the mtcars dataset and I want to get a randomly generated number for every observation. I have written a small loop:

mtcars$ID <- NULL
for (i in mtcars){
  mtcars$ID <- runif(1, min=0, max=100)
}

However, this assigns the same number to all cars.

I tried

mtcars$ID <- NULL
for (i in mtcars){
  mtcars$ID[i] <- runif(1, min=0, max=100)
}

which results in an error. I would like to get two types of results (two functions):

  1. to assign a random number to each observation
  2. to assign a unique random number to each observation
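A sketch of both variants without a loop. for (i in mtcars) iterates over the columns rather than the rows, and assigning a single runif() value to the column recycles it for every car. Drawing nrow() values at once avoids both issues. The copy named cars, the seed, and the 1 to 1000 range for the unique IDs are arbitrary choices for illustration.

```r
set.seed(42)  # only to make the sketch reproducible

cars <- mtcars

# 1. One random number per observation (duplicates are possible in principle)
cars$ID <- runif(nrow(cars), min = 0, max = 100)

# 2. Unique random integers: sample.int() draws without replacement by default
cars$UniqueID <- sample.int(1000, nrow(cars))
```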

Heatmap is not rendered in shiny unless I resize the window

I am new to shiny and am trying to make a shiny app. I need to plot a heatmap using pheatmap based on numeric input from the user. The data is too big to post, but here is a reproducible sample:

a <- structure(list(gene_name = c("NAT2", "ACADS", "ACAT1", "ADA", "ADRB2", "ADRB3"),
                    tree = c(1L, 2L, 3L, 3L, 4L, 4L)),
               row.names = c(NA, 6L), class = "data.frame")

b <- structure(list(Phaeodactylum_tricornutum = c(0, 1, 1, 1, 0, 0),
                    Coccomyxa_subellipsoidea = c(0, 1, 1, 1, 0, 0),
                    Acanthamoeba_castellanii = c(1, 1, 1, 1, 0, 0),
                    Fonticula_alba = c(0, 1, 1, 1, 0, 0),
                    Rhizophagus_irregularis = c(0, 1, 1, 1, 0, 0),
                    Sphaeroforma_arctica = c(0, 1, 1, 1, 0, 0),
                    Capsaspora_owczarzaki = c(0, 1, 1, 1, 0, 0),
                    Cryptosporidium_parvum = c(0, 0, 0, 0, 0, 0),
                    Enterobacter_cloacae = c(1, 1, 1, 1, 0, 0),
                    gene_name = c("NAT2", "ACADS", "ACAT1", "ADA", "ADRB2", "ADRB3"),
                    human_np = c("NP_000006.2", "NP_000008.1", "NP_000010.1",
                                 "NP_000013.2", "NP_000015.1", "NP_000016.1")),
               row.names = c(NA, 6L), class = "data.frame")

Here is the shiny code:

library(shiny)
library(DT)
library(pheatmap)
library(dplyr)

# Define UI ----
ui <- fluidPage(

  titlePanel("title panel"),

  sidebarLayout(
    sidebarPanel("sidebar panel"
                 ),
    mainPanel("main panel", 
              numericInput(inputId = "cl", 
                           h3("Cluster number"), 
                           value = 1, min = 1, max = 2),
              plotOutput("cls_num"),

    )
  )
)

# Define server logic
server <- function(input, output) {

  clinput<-reactive({
    b[which(a$tree == input$cl),1:9]
  })

  output$cls_num<-renderPlot({

    clinput()%>%
      pheatmap(cluster_rows = FALSE, cluster_cols = FALSE)
  })
}

# Run the app ----
shinyApp(ui = ui, server = server)

The problem is that when I run the app, the plot only appears after I resize the window, and after I change the input I need to resize the window again to see the heatmap. This happens in both RStudio and the browser.

Thanks in advance.

pivot_longer with groups of columns

I've got a dataset that looks like this:

df_start <- tribble(
    ~name,    ~age, ~x1_q1, ~x1_q2, ~x1_q3, ~x2_q1, ~x2_q2, ~x2_q3, ~number,
    "John",     28,     1,     1,     9,     4,     5,     9,         6,
    "Paul",     27,     2,     1,     4,     1,     3,     3,         4,
    "Ringo",    31,     3,     1,     2,     2,     5,     8,         9); df_start

I need to pivot_longer() while handling the groupings within my columns:

  • There are 2 x-values (1 and 2)
  • There are 3 questions (q1, q2, q3) for each x-value

Essentially, what I'd like to do is to apply pivot_longer() to the x-values but leave my 3 questions (q1, q2, q3) wide.

What I'd like to end up with is this:

df_end <- tribble(
    ~name, ~age, ~xval, ~q1, ~q2, ~q3, ~number,
    "John", 28,    1,    1,   1,    9,    6,
    "John", 28,    2,    4,   5,    9,    6,
    "Paul", 27,    1,    2,   1,    4,    4,  
    "Paul", 27,    2,    1,   3,    3,    4, 
    "Ringo", 31,   1,    3,   1,    2,    9, 
    "Ringo", 31,   2,    2,   5,    8,    9); df_end

I have made many unsuccessful attempts with regex and pivot_longer but keep striking out.

Anyone know how to tackle this?

THANKS!
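One possible approach, sketched under the assumption of tidyr 1.1.0 or later and dplyr 1.0.0 or later: the special ".value" entry in names_to tells pivot_longer() that part of each column name (here q1, q2, q3) should stay as separate columns, while the other part (the x-value) becomes the new xval column.

```r
library(tidyr)
library(dplyr)
library(tibble)

df_start <- tribble(
  ~name,   ~age, ~x1_q1, ~x1_q2, ~x1_q3, ~x2_q1, ~x2_q2, ~x2_q3, ~number,
  "John",    28,      1,      1,      9,      4,      5,      9,       6,
  "Paul",    27,      2,      1,      4,      1,      3,      3,       4,
  "Ringo",   31,      3,      1,      2,      2,      5,      8,       9)

df_long <- df_start %>%
  pivot_longer(
    cols = matches("^x\\d+_q\\d+$"),
    # First capture group -> xval; second -> names of the output columns
    names_to = c("xval", ".value"),
    names_pattern = "^x(\\d+)_(q\\d+)$",
    names_transform = list(xval = as.integer)
  ) %>%
  # Move number to the end to match the desired layout
  relocate(number, .after = last_col())
```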
