Filter out rows between two values

April 3, 2020, 3:37 am

I have a problem regarding filtering out some rows.

Sample dataset:

df <- data.frame(id = c("1", "1", "1", "2", "2", "2"),
                 description = c("Start", "Something", "Final", "Start", "Some Other Thing", "Final"),
                 timestamp = c("2017-07-26 23:41:16", "2017-07-27 20:23:16", "2017-07-29 07:06:53",
                               "2017-07-24 04:53:02", "2017-07-25 10:27:02", "2017-07-26 16:51:43"))

Now I want to delete all rows that fall (in time) between description = "Start" and description = "Final".

Any help would be appreciated. Thanks in advance!
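A minimal base-R sketch, assuming "between" means strictly between a "Start" row and the next "Final" row of the same id (the Start and Final rows themselves are kept) and that every id has its Start before its Final. It sorts by time within each id and keeps running counts of the two markers; a row is inside a window when more Starts than Finals have been seen so far:

```r
df <- data.frame(
  id = c("1", "1", "1", "2", "2", "2"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final"),
  timestamp = c("2017-07-26 23:41:16", "2017-07-27 20:23:16",
                "2017-07-29 07:06:53", "2017-07-24 04:53:02",
                "2017-07-25 10:27:02", "2017-07-26 16:51:43"),
  stringsAsFactors = FALSE
)

# Parse the timestamps and sort within each id so "between" means between in time
df$timestamp <- as.POSIXct(df$timestamp, tz = "UTC")
df <- df[order(df$id, df$timestamp), ]

# Running counts of "Start" and "Final" markers seen so far within each id
n_start <- ave(as.integer(df$description == "Start"), df$id, FUN = cumsum)
n_final <- ave(as.integer(df$description == "Final"), df$id, FUN = cumsum)

# Inside a window: more Starts than Finals seen, and not the Start row itself
inside <- n_start > n_final & df$description != "Start"
result <- df[!inside, ]
```

If the Start and Final rows should be dropped as well, the condition can be changed to `n_start > n_final | df$description == "Final"`.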

Excluding NA in R

April 3, 2020, 3:42 am

I'm new to RStudio, and I tried to exclude all the data for participants who answered fewer than 17 questions. I tried the two variations below.

data1 <- data[data$frequency.participant >= 17, ]
data1 <- data[!(data$frequency.participant < 17), ]

My problem is that both run, but they set the rows with fewer than 17 answers to NA. Rather than showing NA, I want those rows to be deleted. What am I doing wrong?

Here is an example of what my dataset looked like before running the code. There are some NAs, but also answers below 17.

Here is an example of after running the code. Now everything below 17 has been replaced with NA.
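A likely cause (an assumption, since the screenshots are not included): when frequency.participant is itself NA, the comparison returns NA, and indexing a data frame with an NA row index returns a row filled with NA rather than dropping it. The sketch below uses a small made-up data frame to show this and two base-R ways to drop those rows: which() keeps only positions where the condition is TRUE, and subset() treats NA as FALSE.

```r
# Hypothetical example data; the NA mimics a participant with a missing count
data <- data.frame(
  participant = 1:5,
  frequency.participant = c(20, 12, NA, 17, 30)
)

# Plain logical indexing: the NA in the condition produces an all-NA row
with_na_row <- data[data$frequency.participant >= 17, ]

# which() returns only the row numbers where the condition is TRUE
data1 <- data[which(data$frequency.participant >= 17), ]

# subset() treats an NA condition as FALSE
data2 <- subset(data, frequency.participant >= 17)
```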

Summarize hourly data to daily data in a list in R

April 3, 2020, 3:43 am

I'm trying to summarize hourly measurement data to daily data for every element in a list.

List looks like this:

 $ SE104:List of 3
  ..$ d20:List of 11
  .. ..$ 2009:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2010:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2010-01-01" "2010-01-01" "2010-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2011:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2011-01-01" "2011-01-01" "2011-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2012:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2012-01-01" "2012-01-01" "2012-01-01" ...
  .. .. ..$ SWC : num [1:8784] 43.1 43 42.8 42.7 42.7 ...
  .. ..$ 2013:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2013-01-01" "2013-01-01" "2013-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 37.2 37.2 37.3 37.2 37.1 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 37.3 37.3 37.3 37.3 37.3 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 36 36 36 36 35.9 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] 32.9 32.9 32.9 32.9 32.9 ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 35 35.1 35.2 35.2 35.2 ...
  .. ..$ 2019:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2019-01-01" "2019-01-01" "2019-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN ...
  ..$ d50:List of 11
  .. ..$ 2009:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2010:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2010-01-01" "2010-01-01" "2010-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2011:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2011-01-01" "2011-01-01" "2011-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2012:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2012-01-01" "2012-01-01" "2012-01-01" ...
  .. .. ..$ SWC : num [1:8784] 39.8 39.6 39.4 39.3 39.3 ...
  .. ..$ 2013:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2013-01-01" "2013-01-01" "2013-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 32.7 32.7 32.8 32.7 32.7 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 33.2 33.2 33.2 33.2 33.2 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 31.9 31.9 31.9 31.9 31.9 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] 27.9 27.9 27.9 27.9 27.9 ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 29.2 29.2 29.2 29.2 29.2 ...
  .. ..$ 2019:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2019-01-01" "2019-01-01" "2019-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN ...
  ..$ d5 :List of 11
  .. ..$ 2009:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2010:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2010-01-01" "2010-01-01" "2010-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2011:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2011-01-01" "2011-01-01" "2011-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2012:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2012-01-01" "2012-01-01" "2012-01-01" ...
  .. .. ..$ SWC : num [1:8784] 58.4 58.4 58.3 58.2 58.2 ...
  .. ..$ 2013:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2013-01-01" "2013-01-01" "2013-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2014:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2014-01-01" "2014-01-01" "2014-01-01" ...
  .. .. ..$ SWC : num [1:8760] 47.4 47.4 47.4 47.4 47.4 ...
  .. ..$ 2015:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2015-01-01" "2015-01-01" "2015-01-01" ...
  .. .. ..$ SWC : num [1:8760] 49 49.1 49 49.1 49.1 ...
  .. ..$ 2016:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format: "2016-01-01" "2016-01-01" "2016-01-01" ...
  .. .. ..$ SWC : num [1:8784] 43.7 43.7 43.7 43.8 43.7 ...
  .. ..$ 2017:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2017-01-01" "2017-01-01" "2017-01-01" ...
  .. .. ..$ SWC : num [1:8760] 39.1 39.2 39.1 39.2 39.2 ...
  .. ..$ 2018:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2018-01-01" "2018-01-01" "2018-01-01" ...
  .. .. ..$ SWC : num [1:8760] 45.8 46 46 45.9 45.7 ...
  .. ..$ 2019:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2019-01-01" "2019-01-01" "2019-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN ...
 $ SE105:List of 3
  ..$ d20:List of 11
  .. ..$ 2009:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2009-01-01" "2009-01-01" "2009-01-01" ...
  .. .. ..$ SWC : num [1:8760] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  .. ..$ 2010:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2010-01-01" "2010-01-01" "2010-01-01" ...
  .. .. ..$ SWC : num [1:8760] 45.4 45.2 45 44.9 44.8 ...
  .. ..$ 2011:'data.frame': 8760 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8760], format: "2011-01-01" "2011-01-01" "2011-01-01" ...
  .. .. ..$ SWC : num [1:8760] 39.6 39.6 39.6 39.6 39.6 ...
  .. ..$ 2012:'data.frame': 8784 obs. of  2 variables:
  .. .. ..$ Date: Date[1:8784], format:

So you can see that my list has multiple levels. The top level is a large list that contains 150 lists. Each of those 150 lists contains 3 lists (d20, d50, d5), and each of these contains 11 data frames, one for each year from 2009 to 2019.

Each dataframe stored in the list looks like this:

structure(list(Date = structure(c(14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14246, 14246, 14246, 14246, 14246, 14246), class = "Date"),
    SWC = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
    NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)), row.names = c(NA, 30L), class = "data.frame")

I want to summarize the data so that I get one value for each day. I was able to do this for a single data frame that I subsetted, but I can't do it for every element in the list. I think I have to use lapply(), but I can't figure out how. This is what the result should look like:

        Date mean_SWC
1 2009-01-01      NaN
2 2009-01-02      NaN

I'd appreciate some help! A user already helped me with lapply() in another question, but it looks like I didn't understand how to use it, since I can't apply it here. So if someone can help and does use lapply(), I'd appreciate an explanation of how it works.
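One way to do this, sketched under the assumption that every yearly data frame has the Date and SWC columns shown above. The code builds a hypothetical two-day miniature of the nested list (swc_list, make_year and daily_mean are made-up names for illustration) and applies a per-data-frame summary with three nested lapply() calls: the outer call walks over the top-level elements (SE104, SE105, ...), the middle one over d20/d50/d5, and the innermost one over the yearly data frames. Each lapply() returns a list with the same names as its input, so the result keeps the original structure.

```r
# Hypothetical miniature of the nested structure: top level -> depth -> year -> data frame
make_year <- function(start, swc) {
  data.frame(Date = rep(as.Date(start) + 0:1, each = 24), SWC = swc)
}
swc_list <- list(
  SE104 = list(
    d20 = list(`2009` = make_year("2009-01-01", rep(NaN, 48)),
               `2012` = make_year("2012-01-01", rep(c(43, 41), each = 24)))
  )
)

# Summarise one data frame: one row per day with the mean of the hourly values
daily_mean <- function(df) {
  m <- tapply(df$SWC, df$Date, mean)
  data.frame(Date = as.Date(names(m)), mean_SWC = as.numeric(m))
}

# Outer lapply: top-level elements; middle: depths; inner: yearly data frames
daily <- lapply(swc_list, function(station) {
  lapply(station, function(depth) lapply(depth, daily_mean))
})
```

An element of the result is then accessed the same way as in the input, e.g. daily$SE104$d20$`2012`. Days where all hourly values are NaN stay NaN, matching the desired output.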

How do I Install R Github packages inside Docker

April 3, 2020, 3:44 am

I am trying to create a Docker image for my R script so that I can schedule the R job on Google Cloud. I am currently testing it with a small R script. The docker build command fails at the step where I install the rga GitHub package. Below are my R script and my Dockerfile:

R script:

library(rga)
library(bigrquery)
bq_token()
rga.open(instance = "ga", where="~/ga.rga")
demoScheduleAPI <- function(){
  search_perf <- ga$getData(XXXX, batch = TRUE, walk = TRUE,
                            start.date = "2020-01-15",
                            end.date = "2020-01-16",
                            metrics = "ga:searchUniques",
                            dimensions="ga:date,ga:hour,ga:searchKeyword, ga:searchCategory ,ga:dimension6,ga:dimension10")
  project <- "bidone-data"
  insert_upload_job(project, "GA_Export_Prod_DataSet", "Test_Table123", search_perf)
}

Dockerfile

FROM rocker/r-ver:3.6.1
RUN mkdir /home/bidone
RUN R -e "install.packages('bigrquery', repos='http://cran.rstudio.com/')"
RUN R -e "install.packages('devtools', repos='http://cloud.r-project.org')"
RUN R -e "devtools::install_github('skardhamar/rga')"
COPY .secrets /home/analysis/.secrets
COPY ga /home/analysis/ga
COPY DockerTest.R /home/analysis/DockerTest.R
CMD R -e "source('/home/analysis/DockerTest.R')"

The build gets past the step that installs devtools; however, when it tries to install the rga package from GitHub, it gives the following error:

> devtools::install_github('skardhamar/rga')
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: :: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
The command '/bin/sh -c R -e "devtools::install_github('skardhamar/rga')"' returned a non-zero code: 1

How can I fix this issue?
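A likely explanation, not confirmed from the full build log: install.packages() reports a failed installation only as a warning, so `R -e "install.packages('devtools', ...)"` can exit with status 0 even when devtools did not actually install (for example because system libraries that some of its dependencies compile against are missing from the minimal rocker/r-ver image). Docker then moves on, and the next step fails with "there is no package called 'devtools'". The sketch below defines hypothetical helpers that make an install step fail loudly, and uses the lighter remotes package (which has no package dependencies of its own) instead of devtools for the GitHub install:

```r
# Stop with an error if any of the given packages cannot be loaded.
# install.packages() only warns on failure, so this turns a silent failure
# into a non-zero exit status for the surrounding `RUN R -e ...` step.
stop_if_missing <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) {
    stop("Packages not installed: ", paste(pkgs[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: install from CRAN, then verify
install_cran_or_fail <- function(pkgs, repos = "https://cloud.r-project.org") {
  install.packages(pkgs, repos = repos)
  stop_if_missing(pkgs)
}

# Hypothetical helper: install from GitHub via remotes, then verify
install_github_or_fail <- function(repo, pkg) {
  install_cran_or_fail("remotes")
  remotes::install_github(repo)
  stop_if_missing(pkg)
}
```

In the Dockerfile, an install script sourcing these helpers could call install_cran_or_fail("bigrquery") and then install_github_or_fail("skardhamar/rga", "rga"); if a step still fails, the build now stops at the step that actually failed, and the printed compiler output should show which system library is missing.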

Unable to create Docker image for rga.open R script

April 3, 2020, 3:46 am

I am working on scheduling R jobs using Docker. I am using the package skardhamar/rga to pull reports from GA. I am not very familiar with Docker, so I am using the package o2r-project/containerit to create the Dockerfile.

My aim is to pull reports from GA (standard) daily, so I want to schedule the script on Google Cloud using Docker. For this I am following this article:

https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform/

However, when I try to create the Dockerfile, I get the following error:

dockerfile <- dockerfile("DockerTest.R", copy = "script_dir", soft = TRUE)
INFO [2020-02-05 10:22:22] Processing R script file 'DockerTest.R' locally.
INFO [2020-02-05 10:22:22] Creating an R session with the following expressions:
source(file = "DockerTest.R", echo = TRUE)
Error: <callr_status_error: callr subprocess failed: object 'redirect.uri' not found>
--><callr_remote_error in .rga.authenticate(client.id = client.id, client.secret = client.secret,  ...: object 'redirect.uri' not found> in process 9800
See `.Last.error.trace` for a stack trace.

The R Script that I am using is as below:

library(rga)
library(bigrquery)
bq_token()
rga.open(instance = "ga", where="~/ga.rga")
demoScheduleAPI <- function(){
  search_perf <- ga$getData(xxxxx, batch = TRUE, walk = TRUE,
                            start.date = "2020-01-15",
                            end.date = "2020-01-16",
                            metrics = "ga:searchUniques",
                            dimensions="ga:date,ga:hour,ga:searchKeyword, ga:searchCategory ,ga:dimension6,ga:dimension10")
  project <- "bidone-data"
  insert_upload_job(project, "GA_Export_Prod_DataSet", "Test_Table", search_perf)
}

I understand that rga would redirect me to obtain an auth token; however, I have explicitly asked it to use a saved token, so I am not sure why I am getting this error. How can I fix this?
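The error message shows that containerit runs the script in a separate R process via callr, and that the failure happens inside rga's authentication routine. One assumption worth checking (not confirmed by the log) is that this subprocess does not find the saved token at the path passed to rga.open(), so rga falls back to its authentication flow, which cannot complete non-interactively. A hypothetical pre-flight check (require_token_file() is a made-up helper) placed before rga.open() would turn that situation into an explicit error showing the resolved path and working directory:

```r
# Hypothetical helper: fail early with a clear message if the saved token
# file is not where the script expects it.
require_token_file <- function(path = "~/ga.rga") {
  full_path <- path.expand(path)
  if (!file.exists(full_path)) {
    stop("Saved GA token not found at ", full_path,
         " (working directory: ", getwd(), ")")
  }
  invisible(full_path)
}

# At the top of DockerTest.R, before authenticating:
# require_token_file("~/ga.rga")
# rga.open(instance = "ga", where = "~/ga.rga")
```

If the check passes and the error persists, the cause lies elsewhere (for example in how the token file was created), which at least narrows down the search.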

Problems with creating rarefaction curves in R with iNEXT-package

April 3, 2020, 3:53 am

I'm trying to compare two small datasets of Elasmobranchs landed by artisanal fisheries in Brazil via rarefaction curves that are based on sampling units, not abundances. Therefore, I created a data frame with presence/absence data in which every column is a sampling unit and every row a species (see here: https://textuploader.com/14dvs). However, I constantly get errors from R, and I guess it is about the format of the data I load. Other researchers who have worked the same way have confirmed to me that they were able to produce results. I would be grateful if someone could look at my data and help me figure out where I'm going wrong!

The code I used is this:

### Rarefaction with iNEXT -> Species X Visits
out <- iNEXT(Rarefaction_Visits, q=0, datatype="incidence_raw", endpoint=NULL, size=NULL, knots=200, se=TRUE, conf=0.95, nboot=200)
### Sample-size-based R/E curves, separating plots by "order"
ggiNEXT(out, type=1, facet.var="order")

Thank you and stay healthy!

P.S.: The linked txt file contains only one dataset for now, but it should be sufficient to illustrate the problem. If not, let me know.
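Since the suspicion is the input format: as far as I understand the iNEXT documentation, datatype = "incidence_raw" expects a species-by-sampling-unit matrix of 0/1 values (or a list of such matrices, one per assemblage). A data frame read from a text file often carries a species-name column or non-numeric values, which would not fit that format. The sketch below uses made-up data and a hypothetical helper, to_incidence_matrix(), to convert such a table into a plain integer matrix with species as row names, stopping early if anything other than 0/1 is present:

```r
# Hypothetical presence/absence table as it might look after reading the file:
# one column of species names, one column per sampling unit (visit)
pa <- data.frame(
  species = c("Sp_A", "Sp_B", "Sp_C"),
  visit1  = c(1, 0, 1),
  visit2  = c(1, 1, 0),
  visit3  = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Convert to an integer species-by-sampling-unit matrix with species as row names
to_incidence_matrix <- function(df, id_col = "species") {
  m <- as.matrix(df[, setdiff(names(df), id_col), drop = FALSE])
  if (!is.numeric(m) || anyNA(m) || !all(m %in% c(0, 1))) {
    stop("incidence_raw data must contain only 0/1 values")
  }
  storage.mode(m) <- "integer"
  rownames(m) <- as.character(df[[id_col]])
  m
}

inc <- to_incidence_matrix(pa)
# With the iNEXT package installed, the matrix could be passed as a named list,
# which also makes it easy to add the second dataset later:
# out <- iNEXT::iNEXT(list(Dataset1 = inc), q = 0, datatype = "incidence_raw")
```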

Clustering similar strings in a big dataset

April 3, 2020, 3:55 am

My data is similar to the following:

comp_name                                          perm_id
GM Global Technologies Operations LLC              16002
GM Global Technologies Operations, Inc.            NA
International Business Machines Corporation (IBM)  87001
International Business Machines Corp (IBM)         NA

In short, I may have several similar comp_name strings, one or more of which have a missing perm_id. I want to fill these NAs using the perm_id of a similar string where it is present. Note that the data has more than 400k rows.
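A simpler technique than true fuzzy clustering, which may already cover cases like the ones shown: normalise each name to a key (lower case, no punctuation, no parenthesised parts, no common legal-form suffixes) and fill missing perm_ids from the first row with the same key. The suffix list in normalize_name() is an assumption, not an exhaustive set of legal forms, and the function name is made up for illustration. Because it relies on vectorised gsub() and match(), it should scale to several hundred thousand rows; names that differ by typos would still need an approximate string-distance method.

```r
companies <- data.frame(
  comp_name = c("GM Global Technologies Operations LLC",
                "GM Global Technologies Operations, Inc.",
                "International Business Machines Corporation (IBM)",
                "International Business Machines Corp (IBM)"),
  perm_id = c(16002, NA, 87001, NA),
  stringsAsFactors = FALSE
)

# Hypothetical normalisation rules (assumed suffix list)
normalize_name <- function(x) {
  x <- tolower(x)
  x <- gsub("\\([^)]*\\)", " ", x)      # drop parenthesised parts, e.g. "(ibm)"
  x <- gsub("[[:punct:]]", " ", x)      # punctuation -> space
  x <- gsub("\\b(llc|inc|corp|corporation|ltd|co|company)\\b", " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)             # collapse repeated spaces
  trimws(x)
}

key   <- normalize_name(companies$comp_name)
known <- !is.na(companies$perm_id)
# match() finds, for every key, the first row with the same key and a known perm_id
filled <- companies$perm_id[known][match(key, key[known])]
companies$perm_id_filled <- ifelse(known, companies$perm_id, filled)
```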

R Shiny Display Two tables in one tab

April 3, 2020, 3:56 am

What I am trying to do is display two tables on one page. At the top I want to display the head of the uploaded data table; at the bottom I want the output of a small function I wrote, basic_eda, which gives a summary of the uploaded data. When I run the app, the head is displayed, but the summary is not; instead I get the error "default method not implemented for type 'symbol'".

I tried different ways of setting up renderTable, but none gave me the result I want.

library(shiny)
library(shinydashboard)
library(Hmisc)
library(funModeling)
library(tidyverse)
library(dplyr)

basic_eda <- function(data){
  glimpse(data)
  df_status(data)
  freq(data)
  profiling_num(data)
}

ui <- dashboardPage(
  dashboardHeader(title = "Testing"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Data Description", tabName = "Data Description", icon = icon("database")),
      menuItem("Exproratory Data Analysis", tabName = "Exproratory Data Analysis", icon = icon("dashboard"))
    )
  ),
  dashboardBody(
    sidebarLayout(
      sidebarPanel(
        width = 3,
        # Input: Select a file ----
        fileInput("file1", "Choose CSV File",
                  multiple = FALSE,
                  accept = c("text/csv","text/comma-separated-values,text/plain",".csv",options(shiny.maxRequestSize=100*1024^2)),width = 200),
        # Horizontal line
        tags$hr(),
        checkboxInput("header", "Header", TRUE),
        radioButtons("sep", "Separator",
                     choices = c(Comma = ",",
                                 Semicolon = ";",
                                 Tab = "\t"),
                     selected = ","),
        radioButtons("quote", "Quote",
                     choices = c(None = "","Double Quote" = '"',"Single Quote" = "'"),
                     selected = '"'),
        # Input: Select number of rows to display ----
        radioButtons("disp", "Display",
                     choices = c(Head = "head",
                                 All = "all"),
                     selected = "head"),
        radioButtons("Sum", "Summary",
                     choices = c(Summary = "summary"),
                     selected = "summary"),
      ),
      mainPanel(
        # Output: Data file ----
        tableOutput("contents"),
        tableOutput("testing")
      )
    )
  )
)

server <- function(input, output, session) {
  output$contents <- renderTable({
    req(input$file1)
    tryCatch(
      {
        df <- read.csv(input$file1$datapath,
                       header = input$header,
                       sep = input$sep,
                       quote = input$quote)
      },
      error = function(e) {
        stop(safeError(e))
      }
    )
    if(input$disp == "head") {
      return(head(df,10))
    }
    else {
      return(df)
    }
  })
  tags$hr()
  output$testing <-  renderTable({
    df<- df
    if(input$Sum == "summary") {
      return(basic_eda(df))
    }
    else {
      return(df)
    }
  })
}

shinyApp(ui, server)
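A likely cause: `df` is created only inside the first renderTable(), so inside output$testing the name `df` does not refer to the uploaded data, and R finds some other object of that name instead (stats::df, the F-distribution density function, is a plausible candidate). Also, basic_eda() returns only the value of its last expression, profiling_num(data); the results of glimpse(), df_status() and freq() are not returned. A common Shiny pattern is to read the file once in a reactive() and call it from both outputs. The sketch below does that with a plain fluidPage and a made-up column_overview() summary in place of the funModeling functions; make_app() is likewise a hypothetical wrapper, and shiny must be installed to run it.

```r
# Made-up summary: one row per column, in a form renderTable() can display
column_overview <- function(df) {
  data.frame(
    column    = names(df),
    class     = unname(vapply(df, function(x) class(x)[1], character(1))),
    n_missing = unname(vapply(df, function(x) sum(is.na(x)), integer(1))),
    stringsAsFactors = FALSE
  )
}

make_app <- function() {
  ui <- shiny::fluidPage(
    shiny::fileInput("file1", "Choose CSV File", accept = ".csv"),
    shiny::tableOutput("contents"),
    shiny::tableOutput("testing")
  )
  server <- function(input, output, session) {
    # Read the upload once; both outputs call uploaded() instead of a bare `df`
    uploaded <- shiny::reactive({
      shiny::req(input$file1)
      read.csv(input$file1$datapath)
    })
    output$contents <- shiny::renderTable(head(uploaded(), 10))
    output$testing  <- shiny::renderTable(column_overview(uploaded()))
  }
  shiny::shinyApp(ui, server)
}

# make_app()  # run the app interactively
```

The same reactive can be extended with the header/sep/quote inputs from the original app; any summary function used in the second table needs to return a data frame (or matrix) rather than print to the console.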

geom_count with stat="identity" doesn't work

April 3, 2020, 3:58 am

Is there a way to use the "identity" stat in geom_count?

You can do it with geom_bar():

data.frame(color = c("red", "green"),
           num = c(100, 50)) %>%
  ggplot(aes(color, num)) +
  geom_bar(stat = "identity")

And this returns

But when I try something similar with geom_count():

data.frame(color = c("red", "green", "red", "green"),
           cut = c("good", "terrible", "terrible", "good"),
           values = c(10, 200, 4, 130)) %>%
  ggplot(aes(color, cut)) +
  geom_count(mapping = aes(x = cut, y = color),
             stat = "identity")

I get:
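For context: geom_count() is built around stat_sum, which counts the observations at each x/y position and maps that count to point size; with precomputed values there is nothing left for it to count. A sketch of a swapped-in alternative: since geom_point() already uses stat = "identity", the precomputed values can be mapped to size directly.

```r
library(ggplot2)

counts <- data.frame(
  color  = c("red", "green", "red", "green"),
  cut    = c("good", "terrible", "terrible", "good"),
  values = c(10, 200, 4, 130)
)

# geom_point() uses stat = "identity", so `values` is used as-is for point size
p <- ggplot(counts, aes(x = cut, y = color, size = values)) +
  geom_point()
```

The result looks like a geom_count() plot, with a size legend labelled "values" instead of "n".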

How to create a for loop that pulls data from html_nodes and populates table

April 3, 2020, 3:59 am

I have a series of publication identifiers from the RePEc database. I need to get the reference list from the database, which I can do like this:

library(rvest)
identifier <- "RePEc:imf:imfwpa:01/191"
url_base <- "http://citec.repec.org/api/amf/"
url <- paste0(url_base, identifier)
get_data <- read_html(url)
references <- html_nodes(get_data,'references') %>% html_nodes("text")

I get an output that looks like this:

print(references)
{xml_nodeset (6)}
[1] <text ref="RePEc:rio:texdis:400"></text>
[2] <text ref="RePEc:fip:fednrp:9608"></text>
[3] <text ref="RePEc:nbr:nberwo:1172"></text>
[4] <text ref="RePEc:bla:ecnote:v:28:y:1999:i:3:p:335-355"></text>
[5] <text ref="RePEc:imf:imfwpa:00/69"></text>
[6] <text ref="RePEc:eee:jbfina:v:24:y:2000:i:1-2:p:203-227"></text>

I only want the individual identifiers. In other words, I just want this:

[1] "RePEc:rio:texdis:400"
[2] "RePEc:fip:fednrp:9608"
[3] "RePEc:nbr:nberwo:1172"
[4] "RePEc:bla:ecnote:v:28:y:1999:i:3:p:335-355"
[5] "RePEc:imf:imfwpa:00/69"
[6] "RePEc:eee:jbfina:v:24:y:2000:i:1-2:p:203-227"

I tried using html_text(references), but it just gave me a series of empty strings.

Once I have this data, I want to create a dataframe with each of these values next to the original identifier. In other words, I need something like this:

identifier <- c("RePEc:imf:imfwpa:01/191", "RePEc:imf:imfwpa:01/191", "RePEc:imf:imfwpa:01/191",
                "RePEc:imf:imfwpa:01/191", "RePEc:imf:imfwpa:01/191", "RePEc:imf:imfwpa:01/191")
references <- c("RePEc:rio:texdis:400", "RePEc:fip:fednrp:9608", "RePEc:nbr:nberwo:1172",
                "RePEc:bla:ecnote:v:28:y:1999:i:3:p:335-355", "RePEc:imf:imfwpa:00/69",
                "RePEc:eee:jbfina:v:24:y:2000:i:1-2:p:203-227")
df <- data.frame(identifier, references)

I need to do this for about 180,000 different documents. I think I can write a for loop myself once I know how to do it for one document, but if anyone has a smarter way to do this, I would be very grateful for your advice. Thank you in advance for your help!
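html_text() returns empty strings here because the <text> elements have no text content; the identifier is stored in their ref attribute, which rvest's html_attr() extracts. A sketch, assuming every API response has the structure shown above; extract_refs() and get_references() are hypothetical helper names, and a failed request is turned into a zero-row result instead of stopping the whole run:

```r
library(rvest)

# Return the `ref` attribute of every <text> node inside <references>
extract_refs <- function(doc) {
  ref_nodes <- html_nodes(html_nodes(doc, "references"), "text")
  html_attr(ref_nodes, "ref")
}

# One small data frame per identifier; zero rows if the request fails
get_references <- function(identifier,
                           url_base = "http://citec.repec.org/api/amf/") {
  doc <- tryCatch(read_html(paste0(url_base, identifier)),
                  error = function(e) NULL)
  refs <- if (is.null(doc)) character(0) else extract_refs(doc)
  data.frame(identifier = rep(identifier, length(refs)),
             references = refs,
             stringsAsFactors = FALSE)
}

# identifiers <- c("RePEc:imf:imfwpa:01/191", ...)
# all_refs <- do.call(rbind, lapply(identifiers, get_references))
```

Collecting a list of small data frames and binding them once with do.call(rbind, ...) is usually much faster than growing a data frame inside a for loop, which matters at 180,000 documents. A short Sys.sleep() between requests may also be kind to the API.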

C50 package installation error on Mac: C compiler cannot create executables

April 3, 2020, 4:02 am

I am trying to install the package C50 for R without success. I use R via Anaconda, but the package is not available in the Anaconda environment, so I have tried to install it directly from RStudio.

RStudio version: 1.1.456
R version: 3.6.1 (2019-07-05)
Anaconda version: 1.9.7
macOS: Catalina 10.15.3

I have used both the standard install.packages("C50") and a direct installation (using devtools) from the GitHub repository. Searching around, it seems that this issue is related to the SDK headers (I don't know what those are) and to how the directory layout of macOS has changed over time.

One online search led me here (but again I don't know how to move forward):

https://github.com/conda-forge/compilers-feedstock/issues/11

Below is the message I get when trying to install the package:

* installing *source* package ‘Cubist’ ...
** package ‘Cubist’ successfully unpacked and MD5 sums checked
** using staged installation
checking for gcc... x86_64-apple-darwin13.4.0-clang
checking whether the C compiler works... no
configure: error: in `/private/var/folders/sh/hq44lqs10677_vvkvxq01yvh0000gn/T/Rtmpk04dKp/R.INSTALL5020659d63fc/Cubist':
configure: error: C compiler cannot create executables
See `config.log' for more details
ERROR: configuration failed for package ‘Cubist’
* removing ‘/opt/anaconda3/lib/R/library/Cubist’
* restoring previous ‘/opt/anaconda3/lib/R/library/Cubist’
Warning in install.packages :
  installation of package ‘Cubist’ had non-zero exit status
* installing *source* package ‘C50’ ...
** package ‘C50’ successfully unpacked and MD5 sums checked
** using staged installation
checking for gcc... x86_64-apple-darwin13.4.0-clang
checking whether the C compiler works... no
configure: error: in `/private/var/folders/sh/hq44lqs10677_vvkvxq01yvh0000gn/T/Rtmp2NjIsn/R.INSTALL50b47eac3131/C50':
configure: error: C compiler cannot create executables
See `config.log' for more details.
ERROR: configuration failed for package ‘C50’
* removing ‘/opt/anaconda3/lib/R/library/C50’
Warning in install.packages :
  installation of package ‘C50’ had non-zero exit status

The downloaded source packages are in
  ‘/private/var/folders/sh/hq44lqs10677_vvkvxq01yvh0000gn/T/RtmpdUmDxS/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

Another hint is to look in config.log, but I don't know where to find it.

I don't know how to interpret this message and any hint would be great.
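On finding config.log: the log shows configure running inside a temporary R.INSTALL… directory, which is typically cleaned up after the failed install, so the log disappears with it. A sketch of one way to keep it (install_and_keep_log() and find_config_logs() are hypothetical helpers; network access to CRAN is assumed): download and unpack the source package yourself, run R CMD INSTALL on the unpacked directory with the R binary of the current session (here the Anaconda one), and then look for config.log inside that directory. It does not fix the compiler problem, but config.log should show the exact compiler command that failed.

```r
# List every config.log below a directory
find_config_logs <- function(dir) {
  list.files(dir, pattern = "^config\\.log$", recursive = TRUE, full.names = TRUE)
}

# Download, unpack and install a source package from a directory,
# so that files written by configure stay on disk afterwards
install_and_keep_log <- function(pkg, repos = "https://cloud.r-project.org") {
  src_dir <- tempfile(paste0(pkg, "-src-"))
  dir.create(src_dir)
  info <- download.packages(pkg, destdir = src_dir, repos = repos, type = "source")
  untar(info[1, 2], exdir = src_dir)
  # Use the R binary of the running session, not whichever R is first on PATH
  r_bin <- file.path(R.home("bin"), "R")
  system2(r_bin, c("CMD", "INSTALL", shQuote(file.path(src_dir, pkg))))
  find_config_logs(src_dir)
}

# logs <- install_and_keep_log("C50")
# writeLines(readLines(logs[1]))
```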

How do I apply a function to specific columns in a dataframe and replace the original columns?

April 3, 2020, 4:02 am

I have got a large dataframe containing medical data (my.medical.data).
A number of columns contain dates (e.g. hospital admission date), the names of each of these columns end in "_date".
I would like to apply the lubridate::dmy() function to the columns that contain dates and overwrite my original dataframe with the output of this function.
It would be great to have a general solution that can be applied using any function, not just my dmy() example.

Essentially, I want to apply the following to all of my date columns:

my.medical.data$admission_date <- lubridate::dmy(my.medical.data$admission_date)
my.medical.data$operation_date <- lubridate::dmy(my.medical.data$operation_date)
etc.

I've tried this:

date.columns <- select(ICB, ends_with("_date"))
date.names <- names(date.columns)
date.columns <- transmute_at(my.medical.data, date.names, lubridate::dmy)

Now date.columns contains my date columns, in the "Date" format, rather than the original factors. Now I want to replace the date columns in my.medical.data with the new columns in the correct format.

my.medical.data.new <- full_join(x = my.medical.data, y = date.columns)

Now I get:

Error: cannot join a Date object with an object that is not a Date object

I'm a bit of an R novice, but I suspect that there is an easier way to do this (e.g. process the original dataframe directly), or maybe a correct way to join / merge the two dataframes.
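The join error arises because full_join() joins on all columns the two data frames share by default, and the shared date columns are factors on one side and Dates on the other. Overwriting the matching columns in place avoids the join altogether. A general base-R sketch: apply_to_matching() is a hypothetical helper that accepts any function, the data is made up, and as.Date() with an explicit day/month/year format stands in for lubridate::dmy() so the example has no package dependencies.

```r
# Apply `f` to every column whose name matches `pattern`, overwriting those
# columns in place and leaving all other columns untouched
apply_to_matching <- function(df, pattern, f, ...) {
  cols <- grep(pattern, names(df), value = TRUE)
  df[cols] <- lapply(df[cols], f, ...)
  df
}

# Hypothetical example data with factor date columns
my.medical.data <- data.frame(
  patient_id     = 1:2,
  admission_date = factor(c("03/04/2020", "15/03/2020")),
  operation_date = factor(c("05/04/2020", "20/03/2020"))
)

# Stand-in for lubridate::dmy(): parse day/month/year strings
to_dmy <- function(x) as.Date(as.character(x), format = "%d/%m/%Y")

my.medical.data <- apply_to_matching(my.medical.data, "_date$", to_dmy)
```

With lubridate loaded, the last line could equally be apply_to_matching(my.medical.data, "_date$", lubridate::dmy). In dplyr, mutate_at(my.medical.data, vars(ends_with("_date")), lubridate::dmy) does the same thing (or mutate(across(ends_with("_date"), lubridate::dmy)) in dplyr 1.0 and later).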

How to convert datetime column to dd/mm/yyyy format in R

April 3, 2020, 4:03 am

I have a data frame consisting of 4 columns:

ID      Name       response      datetime
a-1     abc        xyz           2020-01-05 00:00:00
a-2     abc        xyz           2020-01-06 00:00:00
a-3     abc        xyz           2020-01-07 00:00:00

I want to convert only datetime column to dd/mm/yyyy format:

Required Result:

ID      Name       response      datetime
a-1     abc        xyz           05/01/2020
a-2     abc        xyz           06/01/2020
a-3     abc        xyz           07/01/2020

I have tried this:

df2<-mutate(df,as.Date(as.POSIXct(df$datetime, format="%d-%m-%Y")))
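Two things get in the way in the attempt above: the format string "%d-%m-%Y" does not match the stored values (they are year-month-day with a time), so parsing yields NA, and a Date object always prints as yyyy-mm-dd, so the dd/mm/yyyy form has to be produced with format(), which returns character. A sketch on data recreated from the example:

```r
df <- data.frame(
  ID       = c("a-1", "a-2", "a-3"),
  Name     = "abc",
  response = "xyz",
  datetime = c("2020-01-05 00:00:00", "2020-01-06 00:00:00", "2020-01-07 00:00:00"),
  stringsAsFactors = FALSE
)

# Parse with the format the data is actually in, then print as day/month/year
parsed <- as.POSIXct(df$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
df$datetime <- format(parsed, "%d/%m/%Y")
```

If the column is still needed for date arithmetic later, it may be better to keep a Date column and apply format() only when displaying or exporting.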

Subset rows in data frame based on lag and lead range relative to indicated row

April 3, 2020, 4:06 am

I have a data frame in which the column i marks particular rows with "1" (see df1). I would like to subset the data frame to the rows where i = 1, as well as the 2 rows "before" (lag1 and lag2) and the 2 rows "after" (lead1 and lead2) each marked row (see the expected result, df2). Two rows are just to illustrate the question; I would also like to be able to use the code to subset, e.g., 4 rows "before" and 4 rows "after" each i = 1.

df1 <- data.frame(i = c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0),
                  values = c(9,74,30,81,14,88,62,9,32,64,3,56,84,61,95,97,65,76,31,33,56,69,77,81,80))
df2 <- data.frame(i = c(0,0,1,0,0,0,0,1,0,0,0,1,0,0),
                  values = c(9,32,64,3,56,95,97,65,76,31,33,56,69,77))

Thank you so much.
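A base-R sketch (subset_around() is a hypothetical helper name): find the flagged rows with which(), expand each into a window of `before` and `after` rows, clip the windows to the valid row range, and drop duplicates where windows overlap (row 19 belongs to two windows here). The window sizes are parameters, so 4 before and 4 after is just subset_around(df1, before = 4, after = 4).

```r
df1 <- data.frame(i = c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0),
                  values = c(9,74,30,81,14,88,62,9,32,64,3,56,84,61,95,97,65,76,31,33,56,69,77,81,80))
df2 <- data.frame(i = c(0,0,1,0,0,0,0,1,0,0,0,1,0,0),
                  values = c(9,32,64,3,56,95,97,65,76,31,33,56,69,77))

subset_around <- function(df, flag = "i", before = 2, after = 2) {
  hits <- which(df[[flag]] == 1)
  # All row numbers within the window around each flagged row
  idx <- unlist(lapply(hits, function(h) seq(h - before, h + after)))
  # Keep valid row numbers only, remove overlaps, preserve original order
  idx <- sort(unique(idx[idx >= 1 & idx <= nrow(df)]))
  df[idx, , drop = FALSE]
}

result <- subset_around(df1)
```

The result matches df2 apart from the row names, which keep the original row numbers of df1.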

Lapack routine dgesv: system is exactly singular: U[6,6] = 0

April 3, 2020, 4:15 am

I am trying to run the code below in order to simulate a set of p-values using a generalised linear mixed model (glmer).

However, I keep getting the error: Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Here is the code I am trying to run:

#which_p_value = "x1"
which_p_value = "groupcategory"
#which_p_value = "x1:groupcategory"
run_anova = FALSE
simulate_mixed_effect = TRUE
mixed_effect_sd = 3.094069
mixed_effect_sd_slope = 3.098661
library(tidyverse)
library(lme4)  # provides glmer()
n_people <- c(2,5,10,15,20)
coef1 <- 1.61
coef2 <- -0.01
#coef3 <- 5
#coef4 <- 0
g1 = 0
g2 = 1
g3 = 2
distances <- c(60,90,135,202.5,303.75,455.625)/100
n_trials <- 35
oneto1000 <- 25
n_track_lengths <- length(distances)
groupcategory = c(rep(g1, n_track_lengths), rep(g2, n_track_lengths),rep(g3,n_track_lengths))
z = c(n_people)
emptydataframeforpowerplots = NULL
coef3s <- c(-5, -4, -3, -2,-1, 0, 1, 2, 3, 4, 5)
coef4s <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
Datarray <- array(dim=c(length(coef3s), length(coef4s),length(n_people)))
coef3_counter =1
for (coef3 in coef3s) {
  coef4_counter =1
  for (coef4 in coef4s) {
    z1_g2 <- coef1 + coef2*distances + coef3*g2 + coef4*g2*distances
    z1_g3 <- coef1 + coef2*distances + coef3*g3 + coef4*g3*distances
    d = NULL
    pr1 = 1/(1+exp(-z1_g2))
    pr2 = 1/(1+exp(-z1_g3))
    counter=1
    for (i in n_people) {
      for (j in 1:oneto1000){
        df <- c()
        for (k in 1:i){
          # random effect from drawing a random intercept with sd = x
          if (simulate_mixed_effect){
            coef1_r = rnorm(1, mean=coef1, sd=mixed_effect_sd)
            coef2_r = rnorm(1, mean=coef1, sd=mixed_effect_sd_slope)
          } else {
            coef1_r = coef1
            coef2_r = coef2
          }
          z_g1 <- coef1_r + coef2*distances + coef3*g1 + coef4*g1*distances
          pr = 1/(1+exp(-z_g1))
          z1_g2 <- coef1_r + coef2*distances + coef3*g2 + coef4*g2*distances
          pr1 = 1/(1+exp(-z1_g2))
          if (run_anova) {
            df <- rbind(df, data.frame(x1 = c(rep(distances, 3)),
                                       y = c(rbinom(n_track_lengths,n_trials,pr), rbinom(n_track_lengths,n_trials,pr1),rbinom(n_track_lengths,n_trials,pr2)),
                                       groupcategory = groupcategory, id = c(rep(k,18))))
          } else { # this is for glmer data organisation
            for (m in 1:n_trials) {
              df <- rbind(df, data.frame(x1 = c(rep(distances, 3)), y = c(rbinom(n_track_lengths,1,pr),rbinom(n_track_lengths,1,pr1),rbinom(n_track_lengths,1,pr2)),groupcategory = groupcategory,id = c(rep(k,18))))
            }
          }
        }
        if (run_anova) {
          #df_aov <- aov(y~x1*groupcategory+Error(id/(x1*groupcategory)),data=df)
          #df_aov_sum <- summary(df_aov)
          #pvalue <- df_aov_sum[[5]][[1]][which_p_value,"Pr(>F)"]
          df_aov <- aov(y~x1*groupcategory+Error(id),data=df)
          df_aov_sum <- summary(df_aov)
          pvalue <- df_aov_sum[[2]][[1]][which_p_value, "Pr(>F)"]
        } else { # glmer
          mod_group_glmer <-  glmer(y ~ x1 + groupcategory + (1+x1|id), data = df, family = "binomial")
          sum <- summary(mod_group_glmer)
          pvalue <- sum$coefficients[which_p_value, "Pr(>|z|)"]
        }
        d = rbind(d,data.frame(pvalue))
      }
      count <- plyr::ldply(d,function(c) sum(c<=0.05))
      Datarray[coef3_counter,coef4_counter,counter] <- count$V1/oneto1000
      counter = counter +1
      d = NULL
    }
    coef4_counter = coef4_counter + 1
  }
  coef3_counter = coef3_counter + 1
}

Below is the traceback from the debugger:

Lapack routine dgesv: system is exactly singular: U[6,6] = 0
8. stopifnot(length(value <- as.numeric(value)) == 1L)
7. nM$newf(fn(nM$xeval()))
6. (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, n), control = list()) { n <- length(par) ...
5. do.call(optfun, arglist)
4. withCallingHandlers(do.call(optfun, arglist), warning = function(w) { curWarnings <<- append(curWarnings, list(w$message)) })
3. optwrap(optimizer, devfun, start, rho$lower, control = control, adj = adj, verbose = verbose, ...)
2. optimizeGlmer(devfun, optimizer = control$optimizer[[2]], restart_edge = control$restart_edge, boundary.tol = control$boundary.tol, control = control$optCtrl, start = start, nAGQ = nAGQ, verbose = verbose, stage = 2, calc.derivs = control$calc.derivs, use.last.params = control$use.last.params)
1. glmer(y ~ x1 + groupcategory + (1 + x1 | id), data = df, family = "binomial")

Would anybody be able to give a helping hand as to how I can proceed from here?
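One way to proceed, sketched under assumptions rather than offered as a confirmed fix: since some simulated datasets make glmer() fail, each fit can be wrapped in tryCatch() so that a failure is recorded as NA instead of stopping the whole power simulation, and the number of failed fits can be reported next to the power estimate. The names safe_pvalue() and fake_fit() are hypothetical; fake_fit() is a stand-in that fails on every third call, so the sketch runs without lme4. As for the singularity itself, a random intercept and slope, (1 + x1 | id), may be hard to estimate when n_people is as small as 2; refitting with (1 | id), or passing control = glmerControl(optimizer = "bobyqa"), may be worth trying, though neither is guaranteed to help.

```r
# Hypothetical helper: evaluate one model fit and return its p-value, or NA
# if the fit throws an error (such as the Lapack "exactly singular" error).
safe_pvalue <- function(fit_fun) {
  tryCatch(fit_fun(), error = function(e) NA_real_)
}

# Hypothetical stand-in for a glmer() fit: fails on every third call and
# otherwise returns a random number in place of a p-value.
calls <- 0
fake_fit <- function() {
  calls <<- calls + 1
  if (calls %% 3 == 0) stop("Lapack routine dgesv: system is exactly singular")
  runif(1)
}

pvalues <- vapply(1:9, function(j) safe_pvalue(fake_fit), numeric(1))
n_failed <- sum(is.na(pvalues))                  # fits that errored
power <- mean(pvalues[!is.na(pvalues)] <= 0.05)  # power among successful fits only

# In the question's glmer branch, the fit would be wrapped the same way
# (requires lme4):
# pvalue <- safe_pvalue(function() {
#   mod <- glmer(y ~ x1 + groupcategory + (1 + x1 | id), data = df, family = "binomial")
#   summary(mod)$coefficients[which_p_value, "Pr(>|z|)"]
# })
```

With this approach, power is estimated over the successful fits only, so a high failure rate for a given n_people is itself worth reporting.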

How to identify subnetworks in adjacency matrix?

April 3, 2020, 4:18 am

I have a network graph "G" based on the following edges:

library(igraph)
edges <- data.frame(from = c(1,1,4,4,4,5,5,6),
                    to   = c(2,3,5,6,7,6,7,7))
G <- graph_from_data_frame(d = edges, directed = FALSE)

This example clearly contains two subnetworks: the first with nodes 1, 2, 3 and the second with nodes 4, 5, 6, 7. I would like to:

Identify which subnetwork node "i" belongs to.
Count the number of nodes in each subnetwork.

Thus, in this example, the function would ideally create an object with as many rows as there are nodes in G and two columns: the first containing the ID of the subnetwork and the second containing the size (gsize) of that subnetwork:

result <- data.frame(ID = c(1,1,1,2,2,2,2), gsize = c(3,3,3,4,4,4,4))

I am new to igraph, so maybe there is already a function that does this.
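In igraph itself, components(G) returns a list whose membership element gives each vertex's component ID and whose csize element gives the size of each component, so csize[membership] yields the per-node size. Note that igraph's gsize() counts edges, not nodes, so it would not give the column asked for here. For reference, the same computation can be sketched in base R without igraph, using breadth-first search over the question's edge list. The function name component_table() is hypothetical, and nodes are listed in sorted order, which may differ from the vertex order that graph_from_data_frame() uses.

```r
# Edge list from the question (treated as undirected)
edges <- data.frame(from = c(1, 1, 4, 4, 4, 5, 5, 6),
                    to   = c(2, 3, 5, 6, 7, 6, 7, 7))

# Hypothetical helper: label connected components by breadth-first search
component_table <- function(edges) {
  nodes <- sort(unique(c(edges$from, edges$to)))
  id <- setNames(rep(NA_integer_, length(nodes)), nodes)  # component ID per node
  current <- 0L
  for (start in nodes) {
    if (!is.na(id[[as.character(start)]])) next  # already labelled
    current <- current + 1L
    id[[as.character(start)]] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      # undirected graph: neighbours can appear in either column
      nbrs <- c(edges$to[edges$from == v], edges$from[edges$to == v])
      for (w in nbrs) {
        if (is.na(id[[as.character(w)]])) {
          id[[as.character(w)]] <- current
          queue <- c(queue, w)
        }
      }
    }
  }
  sizes <- tabulate(id)  # number of nodes in each component
  data.frame(node = nodes, ID = unname(id), gsize = sizes[id])
}

res <- component_table(edges)
```

For the question's edges, res$ID is 1 1 1 2 2 2 2 and res$gsize is 3 3 3 4 4 4 4.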

Display two tables in one tab

April 3, 2020, 4:19 am

I am trying to display two tables on one page. At the top I want to show the head of the uploaded data table; at the bottom, the output of a small function I wrote, basic_eda, which summarizes the uploaded data. When I run the app, the head is displayed, but the summary is not, and I get the error:

default method not implemented for type 'symbol'

I tried different ways of using renderTable but was not able to get the result I want.

library(shiny)
library(shinydashboard)
library(Hmisc)
library(funModeling)
library(tidyverse)
library(dplyr)
basic_eda <- function(data){
  glimpse(data)
  df_status(data)
  freq(data)
  profiling_num(data)
}
ui <- dashboardPage(
  dashboardHeader(title = "Testing"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Data Description", tabName = "Data Description", icon = icon("database")),
      menuItem("Exproratory Data Analysis", tabName = "Exproratory Data Analysis", icon = icon("dashboard"))
    )
  ),
  dashboardBody(
    sidebarLayout(
      sidebarPanel(
        width = 3,
        # Input: Select a file ----
        fileInput("file1", "Choose CSV File",
                  multiple = FALSE,
                  accept = c("text/csv","text/comma-separated-values,text/plain",".csv",options(shiny.maxRequestSize=100*1024^2)),width = 200),
        # Horizontal line
        tags$hr(),
        checkboxInput("header", "Header", TRUE),
        radioButtons("sep", "Separator",
                     choices = c(Comma = ",",
                                 Semicolon = ";",
                                 Tab = "\t"),
                     selected = ","),
        radioButtons("quote", "Quote",
                     choices = c(None = "","Double Quote" = '"',"Single Quote" = "'"),
                     selected = '"'),
        # Input: Select number of rows to display ----
        radioButtons("disp", "Display",
                     choices = c(Head = "head",
                                 All = "all"),
                     selected = "head"),
        radioButtons("Sum", "Summary",
                     choices = c(Summary = "summary"),
                     selected = "summary"),
      ),
      mainPanel(
        # Output: Data file ----
        tableOutput("contents"),
        tableOutput("testing")
      )
    )
  )
)
server <- function(input, output, session) {
  output$contents <- renderTable({
    req(input$file1)
    tryCatch(
      {
        df <- read.csv(input$file1$datapath,
                       header = input$header,
                       sep = input$sep,
                       quote = input$quote)
      },
      error = function(e) {
        stop(safeError(e))
      }
    )
    if(input$disp == "head") {
      return(head(df,10))
    }
    else {
      return(df)
    }
  })
  tags$hr()
  output$testing <- renderTable({
    df<- df
    if(input$Sum == "summary") {
      return(basic_eda(df))
    }
    else {
      return(df)
    }
  })
}
shinyApp(ui, server)
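A likely cause of the error, offered as a hedged diagnosis: the df assigned inside the first renderTable() is local to that render expression, so in the second one the line df <- df does not find the uploaded data. R instead finds stats::df, the F-distribution density function, and passes that function to basic_eda(). The sketch below reproduces this scoping behaviour with plain functions standing in for the two render bodies; contents_body(), testing_body(), load_data() and testing_fixed() are hypothetical names. In the app, the usual remedy is a single reactive(), for example data <- reactive({ req(input$file1); read.csv(input$file1$datapath, header = input$header, sep = input$sep, quote = input$quote) }), read as data() inside both renderTable() calls. Separately, basic_eda() returns only the value of its last call, profiling_num(data); the results of glimpse(), df_status() and freq() are not returned to renderTable().

```r
# Hypothetical stand-ins for the two renderTable({...}) bodies; each render
# expression is evaluated in its own function scope, like these functions.
contents_body <- function() {
  df <- data.frame(x = 1:3)   # local assignment, not visible elsewhere
  head(df, 10)
}
testing_body <- function() {
  df <- df   # no local df yet, so the right-hand side finds stats::df
  df         # a function, not a data frame
}

# Remedy: load the data in one place and call that from both bodies.
# load_data() stands in for a reactive() that calls read.csv().
load_data <- function() data.frame(x = 1:3)
testing_fixed <- function() summary(load_data())
```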

Calculating time duration in decimal days using R, with missing dates in the time series

April 3, 2020, 4:20 am

I have some animal tracking data, and I need to calculate the total time each individual was tracked, expressed in decimal days.

My data is in the following format: animal id, the x and y coordinates of each location fix, and the datetime when the individual was found. The time zone is Eastern Standard Time (GMT-5).

id  x       y       datetime
1   291693  1977345 2019-05-16 10:07:00
2   291693  1977345 2019-05-24 12:48:00
3   291693  1977345 2019-06-01 10:51:00

As you can see, the date and time are in yyyy-mm-dd hh:mm:ss format.

My question is: what is the simplest way in R to calculate the total duration between datetimes, excluding the missing dates and times between recorded location fixes? Apologies if this has already been asked. I could find information on calculating durations, but not specifically for this sort of time series with missing dates. I imagine the answer is fairly straightforward!
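If "total duration" means the elapsed time from each individual's first fix to its last, the missing dates between fixes need no special handling: the sum of the gaps between consecutive fixes equals the last time minus the first. A minimal base R sketch, assuming the datetime column is character in yyyy-mm-dd hh:mm:ss format; the second individual and its coordinates are invented for illustration, and tracked_days() is a hypothetical helper. "Etc/GMT+5" is the tz database name for a fixed UTC-5 offset (the sign is inverted by POSIX convention); since all fixes share one zone, it does not change the differences.

```r
# Example data in the question's layout; individual 2 is invented
fixes <- data.frame(
  id = c(1, 1, 1, 2, 2),
  x = c(291693, 291693, 291693, 291700, 291710),
  y = c(1977345, 1977345, 1977345, 1977350, 1977360),
  datetime = c("2019-05-16 10:07:00", "2019-05-24 12:48:00", "2019-06-01 10:51:00",
               "2019-05-20 08:00:00", "2019-05-21 20:00:00")
)
# Fixed UTC-5 offset (EST without daylight saving)
fixes$datetime <- as.POSIXct(fixes$datetime, format = "%Y-%m-%d %H:%M:%S",
                             tz = "Etc/GMT+5")

# Hypothetical helper: elapsed time from first to last fix, in decimal days
tracked_days <- function(t) as.numeric(difftime(max(t), min(t), units = "days"))

# One value per individual, named by id
duration <- sapply(split(fixes$datetime, fixes$id), tracked_days)
```

With this data, individual 1 is tracked for 16 days and 44 minutes (about 16.03 decimal days) and individual 2 for 1.5 days.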

get connected components using igraph in R

April 3, 2020, 4:22 am

I would like to find all the connected components of a graph where the components have more than one element.

Using clusters() gives the membership of each node in the different clusters, and using cliques() does not give connected components.

This is a follow-up to

multiple intersection of lists in R

My main goal was to find all the groups of lists which have elements in common with each other.
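If the graph is already built in igraph, one option is to take components(g) and keep only the components whose csize is greater than 1. Without igraph, the stated goal of grouping lists that share elements can be sketched in base R by label propagation: every list starts with its own label and repeatedly takes the smallest label among the lists it overlaps with, until nothing changes; lists that end with the same label form one connected component. The function overlap_groups() and the example lists are hypothetical.

```r
# Hypothetical example: a-b share 2, b-d share 3; c and e overlap with nothing
lists <- list(a = c(1, 2), b = c(2, 3), c = c(7, 8), d = c(3, 4), e = 9)

overlap_groups <- function(lists) {
  n <- length(lists)
  ids <- if (is.null(names(lists))) seq_len(n) else names(lists)
  # shares[i, j] is TRUE when lists i and j have at least one element in common
  shares <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j)
    length(intersect(lists[[i]], lists[[j]])) > 0))
  diag(shares) <- TRUE  # every list is connected to itself
  label <- seq_len(n)
  repeat {
    # each list takes the smallest label among itself and its neighbours
    new_label <- apply(shares, 1, function(row) min(label[row]))
    if (identical(new_label, label)) break
    label <- new_label
  }
  groups <- split(ids, label)
  unname(groups[lengths(groups) > 1])  # keep components with more than one list
}

overlap_groups(lists)
```

With the example lists this returns a single group, c("a", "b", "d"); c and e are dropped because they overlap with no other list.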

Thanks in advance!

How to plot graph using plot function inside a function using if loop in R

April 3, 2020, 8:27 am

library(ggplot2)
crime.value = 1
crime.bar <- function() {
  bp <- ggplot(df_category, aes(x=Category, y=Frequency, fill=Category)) + geom_bar(stat="identity") +
    theme(axis.text.x=element_blank())
  bp
}
if (crime.value == 1) {
  crime.bar()
}

The plot function works when I call crime.bar() separately, but when it is called through an if statement it does not work. Please help.
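A likely explanation, hedged: at the top level of an interactive session, the value of an if block is auto-printed, and printing is what draws a ggplot. Inside a function, a loop, or a script run with source() (without echo), nothing is auto-printed, so the plot object is created but never drawn. Calling print() on it explicitly avoids this. In the sketch below, df_category holds invented data, and show_crime_plot() is a hypothetical wrapper standing in for whatever function contains the if block.

```r
library(ggplot2)

# Invented stand-in for the question's df_category
df_category <- data.frame(Category = c("Assault", "Burglary", "Theft"),
                          Frequency = c(7, 12, 30))

crime.bar <- function() {
  ggplot(df_category, aes(x = Category, y = Frequency, fill = Category)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_blank())
}

# Hypothetical wrapper: inside a function, the plot is only drawn when the
# ggplot object is printed explicitly.
show_crime_plot <- function(crime.value) {
  if (crime.value == 1) {
    print(crime.bar())
  }
  invisible(NULL)
}

show_crime_plot(1)
```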
