Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all 209583 articles
Browse latest View live

Why does `substitute` work in multiple lines, but not in a single line?

$
0
0

I was attempting to answer this nice question about creating a non-standard evaluating function for a data.table object, doing a grouped sum. Akrun came up with a lovely answer which I'll simplify here:

akrun <- function(data, var, group){
 var <- substitute(var)
 group <- substitute(group)
 data[, sum(eval(var)), by = group]
}

library(data.table)
mt = as.data.table(mtcars)
akrun(mt, cyl, mpg)
#    group    V1
# 1:     6 138.2
# 2:     4 293.3
# 3:     8 211.4

I was also working on an answer, and had close to the same answer, but with the substitutes inline with the rest. Mine results in an error:

gregor = function(data, var, group) {
  data[, sum(eval(substitute(var))), by = substitute(group)]
} 

gregor(mt, mpg, cyl)
# Error in `[.data.table`(data, , sum(eval(substitute(var))), by = substitute(group)) : 
#  'by' or 'keyby' must evaluate to vector or list of vectors 
#  (where 'list' includes data.table and data.frame which are lists, too) 

At its face, my function is a simple substitution of Akrun's. Why doesn't it work?


Note that both substitutions cause problems, as shown here:

gregor_1 = function(data, var, group) {
  var = substitute(var)
  data[,sum(eval(var)), 
       by = substitute(group)]
} 
gregor_1(mt, mpg, cyl)
# Same error as above


gregor_2 = function(data, var, group) {
  group = substitute(group)
  data[,sum(eval(substitute(var))), 
       by = group]
} 
gregor_2(mt, mpg, cyl)
# Error in eval(substitute(var)) : object 'mpg' not found 

Error when installing mnlogit: "object index is not exported by namespace:mnlogit"

$
0
0

I'm attempting to install the multivariable logistic regression package mnlogit on R Studio 3.5.3

When I execute

install.packages("mnlogit")

The package successfully unpacks, but the installation fails and I get the following error message:

Error : object 'index' is not exported by 'namespace:mlogit'

I attempted to correct the problem by running

update.packages()

as suggested on this post, which dealt with the same error message associated with a different package and function,

R - pbkrtest, “Error : object ‘sigma’ is not exported by 'namespace:stats'”

However, the same error persists, even after reinstalling mlogit, MASS, and other dependencies. Is there anything else I can try to correct the error and complete the installation?

Add line graph based on new data to series of boxplots

$
0
0

I have used the following R script to create two side-side boxplots. One for 1999 and one for 2008:

library(tidyverse)
mpg %>% ggplot(aes(as_factor(year), hwy))+geom_boxplot()

I have a new data set for manufacturer XYZ that has two observations,one for 1999 and one for 2008:

manufacturer <- c("xyz", "xyz")
year <- c(1999, 2008)
hwy <- c(19, 30)
df <- data.frame(manufacturer, year, hwy)

Is there a simple way to adding the two observations from the new data set (df) in my boxplot graph? I have seen a few other similar posts (e.g., ggplot: adding new data to the existing grouped boxplot) but the problems/solutions seem to be more complicated and I could not follow them. Thanks

How to expand ggplot y axis limits to include maximum value

$
0
0

Often in plots the Y axis value label is chopped off below the max value being plotted.

For example:

library(tidyverse)
mtcars %>% ggplot(aes(x=mpg, y = hp))+geom_point()

I know of scale_y_continous - but I can't figure out a smart way to do this. Maybe I'm just overthinking things. I don't wish to mess up the 'smart' breaks that are generated automatically.

I might try to go about this manually...

  mtcars  %>% ggplot(aes(x=mpg, y=hp, color=as.factor(carb)))+geom_point()  + scale_y_continuous(limits = c(0,375))

enter image description here

But this doesn't work like I mentioned above because of the 'smart breaks'. Is there anyway for me to extend the default break interval to 1 more, so that in this case it would be 400? Of course I would want this to be flexible for whatever dataset I am working with.

dplyr sample_n from a single group

$
0
0

I have some data where the summary of the number of observations looks like:

# A tibble: 14 x 3
# Groups:   status [2]
   status  year     n
    <dbl> <dbl> <int>
 1      0  2010  4593
 2      0  2011 10990
 3      0  2012 27711
 4      0  2013 99989
 5      0  2014 95407
 6      0  2015 89010
 7      0  2016 72289
 8      1  2010   584
 9      1  2011   785
10      1  2012   640
11      1  2013   667
12      1  2014   377
13      1  2015   460
14      1  2016   104

Where the class of one group is signficantly higher than the class of another group. How can I randomly sample the class of 0 without doing anything to the class of 1. That is, I would like to keep all class 1 observations and randomly sample the class 0 observations by 4593 (which is the minimum number of observations for that year)

Using group_by(status, year) and then sample_n() doesn't work since the 4593 value is greater than the values in the class 1 group.

Some random sample of my data:

    structure(list(status = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
    year = c(2013, 2014, 2012, 2013, 2016, 2013, 2015, 2014, 
    2013, 2016, 2015, 2016, 2011, 2014, 2016, 2012, 2013, 2012, 
    2014, 2014, 2012, 2012, 2012, 2016, 2016, 2012, 2016, 2015, 
    2013, 2014, 2015, 2013, 2015, 2015, 2014, 2015, 2011, 2014, 
    2013, 2012, 2011, 2016, 2015, 2015, 2015, 2014, 2012, 2013, 
    2015, 2012, 2015, 2016, 2015, 2013, 2014, 2014, 2014, 2013, 
    2013, 2016, 2016, 2013, 2015, 2012, 2014, 2014, 2013, 2015, 
    2014, 2016, 2016, 2014, 2012, 2016, 2013, 2010, 2011, 2014, 
    2016, 2013, 2016, 2014, 2014, 2013, 2013, 2013, 2016, 2016, 
    2012, 2014, 2013, 2015, 2016, 2013, 2013, 2015, 2013, 2014, 
    2013, 2015, 2013, 2013, 2011, 2014, 2016, 2013, 2010, 2012, 
    2014, 2012, 2011, 2011, 2013, 2015, 2014, 2010, 2010, 2013, 
    2010, 2014, 2011, 2011, 2014, 2013, 2014, 2015, 2015, 2013, 
    2014, 2013, 2011, 2013, 2014, 2013, 2011, 2013, 2012, 2015, 
    2012, 2012, 2012, 2010, 2013, 2013, 2011, 2011, 2011, 2012, 
    2016, 2013, 2011, 2011, 2012, 2012, 2014, 2010, 2013, 2014, 
    2011, 2012, 2010, 2012, 2012, 2011, 2015, 2011, 2011, 2013, 
    2015, 2010, 2015, 2011, 2015, 2015, 2012, 2012, 2013, 2012, 
    2014, 2014, 2012, 2012, 2014, 2010, 2011, 2013, 2014, 2012, 
    2013, 2016, 2014, 2012, 2012, 2013, 2010, 2012, 2013, 2014, 
    2014, 2011)), groups = structure(list(status = c(0, 1), .rows = structure(list(
    1:100, 101:200), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr"))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), row.names = c(NA, -200L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

labelling factors and keeping numerical values

$
0
0

I am having some issues with creating factors that I can refer to by both, the numeric value and the "label".

Supposedly the lfactors package does this, however I have been unable to execute it as such. So, this is what I did:

library(lfactors)
cars <- mtcars


str(cars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

If we look at the "carb" column (probably reflecting carbon emissions), its a numerical

so using the lfactors package I transformed it:

cars$carb <- lfactor(c(1:4),
                     levels = c(1:4), 
                     labels = c("low", "medium", "high", "extreme" ))
str(cars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: Factor w/ 4 levels "low","medium",..: 1 2 3 4 1 2 3 4 1 2 ..

I noticed that it changed to a factor, as per the package description, so I did my checks

levels(cars$carb) 
[1] "low""medium""high""extreme" # correct

cars$carb == "medium"
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[23] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE # correct

cars$carb == 2  
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # incorrect

I still cant refer to the factor by levels and values, so I wondered if anyone has used this package before or has any suggestions for an alternative?

An close alternative, but not quite there

Even though it is not perfect, as I can not refer to the factors by value and label, I found an approach that at least allowed me to store both, which I thought might be useful for others in my position:

library(sjlabelled)
library(magrittr)
library(sjmisc)

cars <- mtcars
str(cars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

frq(cars$carb)

x <numeric>
# total N=32  valid N=32  mean=2.81  sd=1.62
 val frq raw.prc valid.prc cum.prc
   1   7   21.88     21.88   21.88
   2  10   31.25     31.25   53.12
   3   3    9.38      9.38   62.50
   4  10   31.25     31.25   93.75
   6   1    3.12      3.12   96.88
   8   1    3.12      3.12  100.00
  NA   0    0.00        NA      NA

So this is what we get in its numeric form, the transformation to a factor retains the expected form:

cars$carb <- as_factor(cars$carb)

str(cars$carb)

Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

frq(cars$carb)

<categorical>
# total N=32  valid N=32  mean=2.81  sd=1.62

 val frq raw.prc valid.prc cum.prc
   1   7   21.88     21.88   21.88
   2  10   31.25     31.25   53.12
   3   3    9.38      9.38   62.50
   4  10   31.25     31.25   93.75
   6   1    3.12      3.12   96.88
   8   1    3.12      3.12  100.00
  NA   0    0.00        NA      NA

Now we have it in categorical form, we can label the values (in this example i'll ignore 6 & 8)

cars$carb<- set_labels(
  cars$carb,
  labels = c(
    `1` = "low",
    `2` = "medium", 
    `3` = "high",
    `4` = "extreme"
    ))

frq(cars$carb)

<categorical>
# total N=32  valid N=32  mean=2.81  sd=1.62

 val   label frq raw.prc valid.prc cum.prc
   1     low   7   21.88     21.88   21.88
   2  medium  10   31.25     31.25   53.12
   3    high   3    9.38      9.38   62.50
   4 extreme  10   31.25     31.25   93.75
   6       6   1    3.12      3.12   96.88
   8       8   1    3.12      3.12  100.00
  NA    <NA>   0    0.00        NA      NA

Now we can see both, the label and the value, however, there is still an issue calling the data based on the labels

cars[cars$carb==1,]
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1

cars[cars$carb=="low",]

 [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
<0 rows> (or 0-length row.names)

Any advice regarding factor labelling and a way to call on factors via both the labels and values would really be appreciated. And in the meantime, I hope my alternative helps.

Creating new column based on repeated consecutive row entries

$
0
0

Imagine a snippet of the follow data frame:

       ID        ActivityName     Time         Type    Shape 
1       1             Request    0.000       Type_1      767           
2       1             Request  600.000       Type_1      767           
3       1               Start  600.000       Type_1     1376           
4       1               Start  600.000       Type_1     1376  
5       1               Delay 2962.295       Type_1     1022         
6       1 Schedule Activities  600.000       Type_1       15           
7       1 Schedule Activities 2062.295       Type_1       15  

What I'm trying to do is to create two new columns based on the repeating entries in ActivityName.

Specifically, I want to combine two subsequent rows for the same activity into one row with a start and complete timestamp (from Time, in seconds).

Given that not all entries in ActivityName have a matching second entry (however, max two consecutive entries are identical), I would also like to delete such rows, as Delay in this case.

It should then look something like:

       ID        ActivityName  StartTime   EndTime      Type    Shape 
1       1             Request  0.000       600.000      Type_1  767           
2       1               Start  600.000     600.000      Type_1  1375
3       1 Schedule Activities  600.000     2062.295     Type_1  15

All categories in ActivityName occur many times in that column. I hope not to compare their associated Time not across the whole column, only those who have two consecutive identical occurrence.

Any ideas for how to go about this will be highly appreciated.

Outlook attachment saving script not working from Windows Command Prompt (works perfectly from RStudio)

$
0
0

Script Objective
Save attachment files (.xlsx) from two emails to a network share.

Issue
R Script below works perfectly well in RStudio but fails when run from Windows command prompt.

Error Message

< checkErrorInfo> 80020009
No support for InterfaceSupportsErrorInfo
checkErrorInfo -2147352567
Error: Exception occurred.
Execution halted

R Code

cat("\n##########################################################################################",
    "\n", format(Sys.time(), "%a %m/%d/%Y %X"), ": Starting Daily Attachment Save Script...",
    "\n##########################################################################################")

cat("\n", format(Sys.time(), "%a %m/%d/%Y %X"), ": Setting variables/constants/functions... ")

  library("RDCOMClient")
  library("lubridate")

  TodaysDate  <- Sys.Date()
  StartTime   <- Sys.time()
  currentUser <- Sys.getenv("USERNAME")
  tmpDir      <- Sys.getenv("TEMP")
  report_run_date_main <- Sys.Date()

  SendMail <- function(outFile,
                       ToEmail,
                       EmailSubject,
                       EmailBody,
                       AttachFile = TRUE,
                       CC = "me@me.com;"){

    OutApp <- COMCreate("Outlook.Application")
    outMail = OutApp$CreateItem(0)

    outMail[["sentonbehalfofname"]] = "me@me.com"
    outMail[["To"]] = ToEmail
    outMail[["CC"]] = CC
    outMail[["subject"]] = EmailSubject
    outMail[["HTMLBody"]] = paste0("<p>**** SYSTEM GENERATED EMAIL ****</p><br><p>", EmailBody, "</p>", sep = "")
    if (AttachFile == TRUE) {
      outMail[["Attachments"]]$Add(outFile)
    }
    outMail$Send()
    outMail <- NULL
    OutApp <- NULL
  }

  outlook_app <- COMCreate("Outlook.Application")
  search <- outlook_app$AdvancedSearch(
    "Inbox", paste0("(urn:schemas:httpmail:subject like 'EMAIL SUBJECT TO SEARCH%')")
  )
  results <- search$Results()

  save_folder <- paste0(Sys.getenv("USERPROFILE"), "\\Documents")

  get_attachment_names <- function(email) {
    number_attachments = email$Attachments()$Count()
    if (number_attachments == 0) {
      return("")
    }
    attachments <- purrr::map(
      seq(number_attachments), 
      function(x) email$Attachments(x)$FileName()
    )
    return(paste(attachments, sep = ", "))
  }

  msg1 <- ""
  msg2 <- ""

cat("done.")

cat("\n", format(Sys.time(), "%a %m/%d/%Y %X"), ": Saving attachemnts from daily emails... ") 
  for (i in 1:results$Count()) {
    # Error on next line, when run using command line
    gmt_date <- results$Item(i)$ReceivedTime()
    date_received <- as.POSIXct(gmt_date * (24 * 60 * 60), origin="1899-12-30", tz="GMT")

    date_received_dt <- format(date_received, "%Y-%m-%d")
    date_received_tm <- format(date_received, "%H:%M:%S %P")
    date_received_hr <- format(date_received, "%H")

    if(date_received >= Sys.Date()-1) {
      if(date_received_dt==Sys.Date()-1){
        if(date_received_hr>=16){
          email <- results$Item(i)
          attachment_file <- get_attachment_names(email)
          email$Attachments(1)$SaveAsFile(paste0(save_folder, attachment_file))
          msg1 <- paste(format(Sys.time(), "%a %m/%d/%Y %X"), 
                        "<br><br><b>Attachment name:</b>", attachment_file ,
                        "<br><b>Email received on:</b>", date_received, 
                        "<br><b>Subject:</b>", email$Subject(),
                        "<br><b>Save folder:</b>", save_folder)
        }
      } else{
        email <- results$Item(i)
        attachment_file <- get_attachment_names(email)
        email$Attachments(1)$SaveAsFile(paste0(save_folder, attachment_file))
        msg2 <- paste(format(Sys.time(), "%a %m/%d/%Y %X"), 
                      "<br><br><b>Attachment name:</b>", attachment_file ,
                      "<br><b>Email received on:</b>", date_received, 
                      "<br><b>Subject:</b>", email$Subject(),
                      "<br><b>Save folder:</b>", save_folder)
      }
    }
  }
cat("done.")


cat("\n", format(Sys.time(), "%a %m/%d/%Y %X"), ": Notifying users of attachment save status... ")
  if((nchar(msg1)==0)){
    SendMail(outFile = "",
             ToEmail = "me@me.com",
             EmailSubject = paste("[Action Required] Daily Attachment Save Error", Sys.Date()),
             EmailBody = paste("Helpful text 1 for troubleshooting"),
             AttachFile = F, CC = "")
  } else {
    SendMail(outFile = "",
             ToEmail = "users@many.com",
             EmailSubject = paste("Daily attachment saved", Sys.Date()),
             EmailBody = paste(msg1),
             AttachFile = F, CC = "")
  }

  if(nchar(msg2)==0){
    SendMail(outFile = "",
             SendMail(outFile = "",
             ToEmail = "me@me.com",
             EmailSubject = paste("[Action Required] Daily Attachment Save Error", Sys.Date()),
             EmailBody = paste("Helpful text 2 for troubleshooting"),
             AttachFile = F, CC = "")
  } else {
    SendMail(outFile = "",
             ToEmail = "users@many.com",
             EmailSubject = paste("Daily attachment saved", Sys.Date()),
             EmailBody = paste(msg2),
             AttachFile = F, CC = "")
  }
cat("done.")

cat("\n", format(Sys.time(), "%a %m/%d/%Y %X"), ": Garbage Collection... ")
  rm(list=ls())
cat("done.")

cat("\n########################################################################################",
    "\n", format(Sys.time(), "%a %m/%d/%Y %X"), ": Daily Attachment Save Script complete.",
    "\n########################################################################################")

R Script Execution Command

C:\Progra~1\R\R-3.4.4\bin\i386\Rscript.exe --no-save --no-restore --verbose C:\Attachments_Save_Script.R >"C:\LOGS\L%date:~-4,4%%date:~-7,2%%date:~-10,2%%time:~0,2%%time:~3,2%%time:~6,2%.log" 2>"C:\LOGS\E%date:~-4,4%%date:~-7,2%%date:~-10,2%%time:~0,2%%time:~3,2%%time:~6,2%.log"

Version Info
Windows 10 Pro 64 bit (i7 vPro with 16 GB RAM)
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Microsoft Outlook 2013 (15.0.5172.1000) MSO (15.0.5172.1000) 32-bit
RStudio 1.0.143

Ask
Has anyone come across this issue with RDCOMClient library at the command line? Is there a fix? Thank you.


include all values in ggplot

$
0
0

I'm making a graph in ggplot, and it cuts off my data

`ggplot(hhcomp, aes(x=utility, y=consumption))+
      xlim(0,16)+ylim(0,16)+          
      labs(x = "leisure(hours)",y="counsumption(units)")+
                  geom_line(aes(x = leisure, y = consumption,expand=TRUE))+
                  geom_line(aes(x = utilityc, y = consumption))+
                  geom_line(aes(x = leisure1, y = consumption1))+
                  geom_line(aes(x = utilityc1, y = consumption))`

How do I include all the data points, so that my lines go to the edge of my graph?

How to create a combination of k elements between n in Rcpp?

$
0
0

Good afternoon ,

We know that in R , we can retrieve all possible combinations of k elements between A = { 1 , 2 , ... , n } in that manner :

Example : A = { 1 , 2, ,3 ,4 ,5 } and K = 3

> C_wo <- combn(1:5, 3)
> C_wo
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5

My question :

Is there any built-in function for Creating those combinations in rcpp ?

Thank You in advance !

Credentials when using "Remotes" in R package Description

$
0
0

I want to include the dependency on an R library which is available at a private gitlab project. To include it I use the the "Remotes:" tag in the DESCRIPTION file. Automatically installing the package fails naturally due to the missing credentials.

Is it possible to automatically ask for credentials while installing the package via the DESCRIPTION file?

The DESCRIPTION file looks like this:

Title: Some Title
Version: 1
Description: What the package does (one paragraph).
Depends: R (>= 3.4.4)
Imports:
    theRemotePackage
Remotes: url::https://path/to/the/gitlab/repo/theRemotePackage
LazyData: true

Can I add a line in this file to make it ask for credentials of theRemotePackage?

Thank you!

ggplot cuts plot in half?

$
0
0

I'm struggling to plot a graph on ggplot, it looks like it's only processed 1/4 of the map, and has left out the remaining 3/4 on the right.

Plot that I've made

Here is the code I've used to plot the image.

 ggplot(eu1) +
   geom_sf(aes(colour = conct), size=1, alpha = 0.4) +
   scale_colour_distiller(palette = "Spectral")

DATA

    conct        coord
1 40.00016  c(509760,179380)
2 40.00018 c(509440,179480) 
3 40.00018 c(512720, 175240)
4 40.00044 c(504620, 175100) 

I have a large dataframe with over 70,000 variables.

Even when I extend the x-axis, the data points are missing from it. Does anyone know where I've gone wrong?

Many thanks.

Looking for an R script that can do the following: if value x of column A contains certain string then write y in a new column B

$
0
0

I'm trying to create a new variable in R based on another variable. Basically what I want to do is: if value x in column A contains a specific string of characters, then (in a new column) write y.

So for example, in my dataset, I have a variable called "condition" which has the following unique values:

[1] "05_CL_dom""16_CF_sub""02_CL_sub""01_CL_dom""19_CF_dom""14_CL_sub""17_CL_dom""10_CL_sub""09_CL_dom""15_CF_dom"
[11] "18_CL_sub""06_CL_sub""20_CF_sub""12_CF_sub""04_CF_sub""08_CF_sub""24_CF_sub""11_CF_dom""21_CL_dom""22_CL_sub"
[21] "23_CF_dom""07_CF_dom""03_CF_dom""13_CL_dom""06_CL_dom""07_CF_sub""16_CF_dom""03_CF_sub""12_CF_dom""11_CF_sub"
[31] "17_CL_sub""14_CL_dom""21_CL_sub""20_CF_dom""22_CL_dom""19_CF_sub""01_CL_sub""08_CF_dom""15_CF_sub""13_CL_sub"
[41] "18_CL_dom""09_CL_sub""05_CL_sub""23_CF_sub""10_CL_dom""24_CF_dom""04_CF_dom""02_CL_dom""18_CF_sub""05_CF_dom"
[51] "02_CF_sub""03_CL_dom""21_CF_dom""01_CF_dom""12_CL_sub""19_CL_dom""20_CL_sub""23_CL_dom""16_CL_sub""22_CF_sub"
[61] "04_CL_sub""11_CL_dom""10_CF_sub""06_CF_sub""14_CF_sub""07_CL_dom""15_CL_dom""08_CL_sub""24_CL_sub""13_CF_dom"
[71] "09_CF_dom""17_CF_dom""19_CL_sub""13_CF_sub""10_CF_dom""16_CL_dom""05_CF_sub""09_CF_sub""06_CF_dom""03_CL_sub"
[81] "08_CL_dom""14_CF_dom""17_CF_sub""21_CF_sub""01_CF_sub""24_CL_dom""23_CL_sub""22_CF_dom""07_CL_sub""18_CF_dom"
[91] "20_CL_dom""04_CL_dom""11_CL_sub""02_CF_dom""12_CL_dom""15_CL_sub"

Based on these values I want to create a new variable called "conditionNEW":
- if a value of condition contains CL_sub -> CLsub
- if a value of condition contains CL_dom -> CLdom
- if a value of condition contains CF_sub -> CFsub
- if a value of condition contains CF_dom -> CFdom

Right now, what I did is:

rawdata$conditionNEW[rawdata$condition == "01_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "02_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "03_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "04_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "05_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "06_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "07_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "08_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "09_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "10_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "11_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "12_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "13_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "14_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "15_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "16_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "17_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "18_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "19_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "20_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "21_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "22_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "23_CF_dom"] <- "CFdom"
rawdata$conditionNEW[rawdata$condition == "24_CF_dom"] <- "CFdom"

rawdata$conditionNEW[rawdata$condition == "01_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "02_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "03_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "04_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "05_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "06_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "07_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "08_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "09_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "10_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "11_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "12_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "13_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "14_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "15_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "16_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "17_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "18_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "19_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "20_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "21_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "22_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "23_CF_sub"] <- "CFsub"
rawdata$conditionNEW[rawdata$condition == "24_CF_sub"] <- "CFsub"

rawdata$conditionNEW[rawdata$condition == "01_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "02_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "03_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "04_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "05_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "06_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "07_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "08_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "09_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "10_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "11_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "12_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "13_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "14_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "15_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "16_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "17_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "18_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "19_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "20_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "21_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "22_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "23_CL_dom"] <- "CLdom"
rawdata$conditionNEW[rawdata$condition == "24_CL_dom"] <- "CLdom"

rawdata$conditionNEW[rawdata$condition == "01_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "02_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "03_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "04_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "05_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "06_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "07_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "08_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "09_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "10_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "11_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "12_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "13_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "14_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "15_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "16_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "17_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "18_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "19_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "20_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "21_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "22_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "23_CL_sub"] <- "CLsub"
rawdata$conditionNEW[rawdata$condition == "24_CL_sub"] <- "CLsub"

It works, but it's a lot of code for something that feels like something that should be rather simple. Is there anyone who knows how to do this with less code?

Insert or replace multiple matches of the same string with a running counter

$
0
0

I have a RIS (text) file that looks roughly likes this:

mylist <- c("TI  - a", "AU  - b", "ER  -", "",
          "TI  - c", "AU  - d", "ER  -", "",
          "TI  - e", "AU  - f", "ER  -")

I would like to insert a running ID tag as follows

mylist_with_ids <- c("TI  - a", "AU  - b", "ID  - 1", "ER  -", "",
                   "TI  - c", "AU  - d", "ID  - 2", "ER  -", "",
                   "TI  - e", "AU  - f", "ID  - 3", "ER  -")

My original approach was to write a stringr::str_replace loop, where I generate the ID list beforehand.

cc_id_replace <- paste0("ID  - ", 1:3, "\nER  -")
for (i in 1:3) {
  mylist_with_ids <- str_replace(mylist, "^ER  -", cc_id_replace[i])
}

Of course, this doesn't work for more than one reason. What might be a better way? (There exist many regex and multiple array questions, but I couldn't figure out an answer so far.)

Why is the model output from a fit with rjags and R2Jags different?

$
0
0

I'm working on fitting a multi-level logistic regression model with group level predictors. I am using JAGS via R. I am getting different behaviors when I fit the model with the runjags versus the R2Jags packages.

I've tried to write a reproducible example that shows the issue. Below, I simulate data from a binomial model, index the data to 8 plots and 2 blocks, and then fit a multi-level logistic regression to recover the success probabilities (b1 and b2) in the code below. Scroll to the bottom to see the summaries of the two fits.

My question is:

  1. Why are the posteriors from these two fits different? I am using the same data, a single model specification, and setting the random number generator before each. Why does the mean of the posteriors differ, and why are the Rhat values so different?
# -------------------------------------------------------------------
# Loading required packages
# -------------------------------------------------------------------
library(rjags) 
library(R2jags)
library(MCMCvis)

Package version information:

jags.version()
[1] ‘4.3.0’

R2jags_0.5-7   MCMCvis_0.13.5 rjags_4-10
# -------------------------------------------------------------------
# Simulate data
# -------------------------------------------------------------------
set.seed(10)

N.plots = 8
N.blocks = 2
trials=400

n = rep(100,trials)
N=length(n)
plotReps=N/N.plots
blockReps=N/N.blocks

# Block 1
b1<-rep(c(.25,.75,.9,.1),each=plotReps)-.05
# Block 2
b2<-rep(c(.25,.75,.9,.1),each=plotReps)+.05

y = rbinom(trials, 100, p = c(b1,b2))

# vectors indexing plots and blocks
plot = rep(1:8,each=plotReps)
block = rep(1:2,each=blockReps)

# pass data to list for JAGS
data = list(
  y = y,
  n = n,
  N = length(n),
  plot = plot,
  block= block,
  N.plots = N.plots,
  N.blocks = N.blocks
)
# -------------------------------------------------------------------
# Code for JAGS model
# -------------------------------------------------------------------

modelString <- "model { 
  ## Priors

  # hyperpriors
  mu.alpha ~ dnorm(0, 0.0001)

  sigma.plot ~ dunif(0,100) 
  tau.plot <- 1 / sigma.plot^2

  sigma.block ~ dunif(0,100) 
  tau.block <- 1 / sigma.block^2

  # priors 
  for(i in 1:N.plots){     
    eps.plot[i]~dnorm(0,tau.plot)
  }

  for(i in 1:N.blocks){
    eps.block[i]~dnorm(0,tau.block)
  }

  # Likelihood
  for(i in 1:N){
    logit(p[i]) <- mu.alpha + eps.plot[plot[i]] + eps.block[block[i]]
    y[i] ~ dbin(p[i], n[i])

  }
}"
# -------------------------------------------------------------------
# Initial values
# -------------------------------------------------------------------
# set inits for rjags
inits = list(list(mu.alpha = 0,sigma.plot=2,sigma.block=2),
             list(mu.alpha = 0,sigma.plot=2,sigma.block=2),
             list(mu.alpha = 0,sigma.plot=2,sigma.block=2)) 

# set inits function for R2jags
initsFun<-function(){list(
  mu.alpha=0,
  sigma.plot=2,
  sigma.block=2
)}
# -------------------------------------------------------------------
# Set JAGS parameters and random seed
# -------------------------------------------------------------------
# scalars that specify the 
# number of iterations in the chain for adaptation
# number of iterations for burn-in
# number of samples in the final chain
n.adapt = 500
n.update = 5000
n.iterations = 1000
n.thin = 1
parsToMonitor = c("mu.alpha","sigma.plot","sigma.block","eps.plot","eps.block")
# -------------------------------------------------------------------
# Call to JAGS via rjags
# -------------------------------------------------------------------
set.seed(2)
# tuning (n.adapt)
jm = jags.model(textConnection(modelString), data = data, inits = inits,
                n.chains = length(inits), n.adapt = n.adapt)

# burn-in (n.update)
update(jm, n.iterations = n.update)

# chain (n.iter)
samples.rjags = coda.samples(jm, variable.names = c(parsToMonitor), n.iter = n.iterations, thin = n.thin)
# -------------------------------------------------------------------
# Call to JAGS via R2jags
# -------------------------------------------------------------------
set.seed(2)
samples.R2jags <-jags(data=data,inits=initsFun,parameters.to.save=parsToMonitor,model.file=textConnection(modelString),
                      n.thin=n.thin,n.chains=length(inits),n.burnin=n.adapt,n.iter=n.iterations,DIC=T)
# -------------------------------------------------------------------
# Summarize posteriors using MCMCvis
# -------------------------------------------------------------------
sum.rjags <- MCMCvis::MCMCsummary(samples.rjags,params=c("mu.alpha","eps.plot","sigma.plot","sigma.block","eps.block"))
sum.rjags

sum.R2jags2 <- MCMCvis::MCMCsummary(samples.R2jags,params=c("mu.alpha","eps.plot","sigma.plot","sigma.block","eps.block"))
sum.R2jags2

Here is the output from an rjags fit:

                     mean         sd         2.5%         50%       97.5% Rhat n.eff
mu.alpha      0.07858079 21.2186737 -48.99286669 -0.04046538 45.16440893 1.11  4063
eps.plot[1]  -1.77570813  0.8605892  -3.45736942 -1.77762035 -0.02258692 1.00  2857
eps.plot[2]  -0.37359614  0.8614370  -2.07913650 -0.37581522  1.36611635 1.00  2846
eps.plot[3]   0.43387001  0.8612820  -1.24273657  0.42332033  2.20253810 1.00  2833
eps.plot[4]   1.31279883  0.8615840  -0.38750596  1.31179143  3.06307745 1.00  2673
eps.plot[5]  -1.34317034  0.8749558  -3.06843578 -1.34747145  0.44451006 1.00  2664
eps.plot[6]  -0.40064738  0.8749104  -2.13233876 -0.41530587  1.37910977 1.00  2677
eps.plot[7]   0.36515253  0.8738092  -1.35364716  0.35784379  2.15597251 1.00  2692
eps.plot[8]   1.71826293  0.8765952  -0.01057452  1.70627507  3.50314147 1.00  2650
sigma.plot    1.67540914  0.6244529   0.88895789  1.53080631  3.27418094 1.01   741
sigma.block  19.54287007 26.1348353   0.14556791  6.68959552 93.21927035 1.22    94
eps.block[1] -0.55924545 21.2126905 -46.34099332 -0.24261169 48.81435107 1.11  4009
eps.block[2]  0.35658731 21.2177540 -44.65998407  0.25801739 49.31921639 1.11  4457

and here is the output from an R2jags fit:

                   mean         sd         2.5%         50%       97.5% Rhat n.eff
mu.alpha     -0.09358847 19.9972601 -45.81215297 -0.03905447 47.32288503 1.04  1785
eps.plot[1]  -1.70448172  0.8954054  -3.41749845 -1.70817566  0.08187877 1.00  1141
eps.plot[2]  -0.30070570  0.8940527  -2.01982416 -0.30458798  1.46954632 1.00  1125
eps.plot[3]   0.50295713  0.8932038  -1.20985348  0.50458106  2.29271214 1.01  1156
eps.plot[4]   1.37862742  0.8950657  -0.34965321  1.37627777  3.19545411 1.01  1142
eps.plot[5]  -1.40421696  0.8496819  -3.10743244 -1.41880218  0.25843323 1.01  1400
eps.plot[6]  -0.45810643  0.8504694  -2.16755579 -0.47087931  1.20827684 1.01  1406
eps.plot[7]   0.30319019  0.8492508  -1.39045509  0.28668886  1.96325582 1.01  1500
eps.plot[8]   1.65474420  0.8500635  -0.03632306  1.63399429  3.29585024 1.01  1395
sigma.plot    1.66375532  0.6681285   0.88231891  1.49564854  3.45544415 1.04   304
sigma.block  20.64694333 23.0418085   0.41071589 11.10308188 85.56459886 1.09    78
eps.block[1] -0.45810120 19.9981027 -46.85060339 -0.33090743 46.27709625 1.04  1795
eps.block[2]  0.58896195 19.9552211 -46.39310677  0.28183123 46.57874408 1.04  1769

Here are trace plots for mu.alpha from the 2 fits. First, from the rjags fit:

Trace plot for mu.alpha from rjags fit

Second, from the R2jags fit:

Trace plot for mu.alpha from R2Jags fit


Is it possible to include NA for one missing value, and can I verify that each individual corresponds with my specified function

$
0
0

I have code here that generates a random spatial distribution of points, returns a distance column between every point and an infected individual and uses a function to calculate the probability of infection in the next time step. There are 60 hosts, one of which is infected. I would like to bind the values of Pi (which calculates infection probability) to my data frame with the original co-ordinates. Obviously one point is removed from the distance matrix, the infected individual. This value I would like to replace with NA in the main data frame as the next step in my code, and also to confirm that the co-ordinates correspond with the output of the function Pi.

So as it stands I am trying to attach a column of 59 rows to the main data frame of 60 rows.

# Create a spatial distribution with infected individuals

xcoord <- sample(1:100,60)
ycoord <- sample(1:100,60)
infectionstatus <- rep(0,60)
Df <- data.frame(xcoord, ycoord, infectionstatus)

a <- sample(1:60, 1)
Df$infectionstatus[a] <- 1

# Calculate distance between infected individuals and susceptibles

library(rdist)

distances <- pdist(Df[,1:2], metric = "euclidean")
position_infected_individual <- which(Df[,3]==1)
distance_from_infected <- distances[-(position_infected_individual), position_infected_individual]

#Assign parameter values and calculate probability of infection

beta<-100
alpha<-0.1
Pi<-vector()
for (p in 1:length(distance_from_infected)){
  Pi[p] = 1-exp(-beta*exp(-alpha*distance_from_infected[p]))
}

Why is a Spatial Data Frame obtained from OSM/Nominatim invalid?

$
0
0

I want to get plottable spatial (boundary/polygon) data from OpenStreetMap using the nominatim R package.

The nominatim::osm_search_spatial request seems to work successfully but the resulting data, despite being found as a "SpatialPolygonsDataFrame", is shown as invalid and I am not able to get it to plot with plot or tmap. It also seems to be missing its CRS.

I am not sure if I am missing a step or two, or if there is indeed something wrong with the data that is retrieved from OSM (seems unlikely, but possible).

Edit: There seems to be a problem with the latitudes and longitudes in the data received so this question has been edited with a new reprex. It looks like it might need to be asked on a GIS forum rather than here in R, in fact.

Deleted previous reprex showing superfluous steps and results (moved to gist here) - this is what Eugene Chong in his first reply was responding to.

Here's my new, more focused, reprex showing the problem (OSM API key hidden). You can see that the polygon @coords leap from the first point at -1,-1 right up to 53,53 for the second point onwards, hence the diagonal line that is plotted.

library(nominatim)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
#> Nominatim Usage Policy: http://wiki.openstreetmap.org/wiki/Nominatim_usage_policy
#> MapQuest Nominatim Terms of Use: http://info.mapquest.com/terms-of-use/
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(tmap)
library(tibble)

# get OSM search results for Ashfield district (UK)
ashfield <- nominatim::osm_search_spatial("Ashfield", limit = 1, key = $my_osm_api_key)
class(ashfield)
#> [1] "list"

# extract SPDF from list
ashfield <- ashfield[[1]]
class(ashfield)
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"

# Convert to an SF object and try again
ashfield_sf <- sf::st_as_sf(ashfield)
class(ashfield_sf)
#> [1] "sf""data.frame"

# set CRS (thanks to Eugene Chong)
st_crs(ashfield_sf) <- 4326
tmap::qtm(ashfield_sf)
#> Warning: The shape ashfield_sf is invalid. See sf::st_is_valid

glimpse(ashfield@data)
#> Observations: 1
#> Variables: 15
#> $ place_id     <chr> "186877616"
#> $ licence      <chr> "Data © OpenStreetMap contributors, ODbL 1.0. https://...
#> $ osm_type     <chr> "relation"
#> $ osm_id       <chr> "154043"
#> $ lat          <dbl> 53.08977
#> $ lon          <dbl> -1.251877
#> $ display_name <chr> "Ashfield, Nottinghamshire, East Midlands, England, Un...
#> $ class        <chr> "boundary"
#> $ type         <chr> "administrative"
#> $ importance   <dbl> 0.2116014
#> $ icon         <chr> "http://ip-10-98-176-55.mq-us-east-1.ec2.aolcloud.net/...
#> $ bbox_left    <fct> 53.0080617
#> $ bbox_top     <fct> 53.1714343
#> $ bbox_right   <fct> -1.3445928
#> $ bbox_bottom  <fct> -1.1642542
head(ashfield@polygons[[1]]@Polygons[[1]]@coords)
#>           [,1]      [,2]
#> [1,] -1.344593 -1.344409
#> [2,] 53.063537 53.063260
#> [3,] 53.064985 53.063764
#> [4,] 53.065520 53.065521
#> [5,] 53.065553 53.065526
#> [6,] 53.065725 53.065656
ashfield_sf$geometry
#> Geometry set for 1 feature 
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -1.344593 ymin: -1.344593 xmax: 53.17143 ymax: 53.17142
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> POLYGON ((-1.344593 -1.344409, 53.06354 53.0632...

Created on 2020-02-20 by the reprex package (v0.3.0)

Next step: I am going to try a direct download from nominatim rather than via the R package, and see what I get.

shinymeta: how to use expandChain in modules?

$
0
0

I'm making an app with modules in which the user can create as many UI as he wants. Each UI contain one table and I would like to give the possibility to the user to see the code for each of this table separately, not in a unique chunk. Therefore, I included the part of the code with expandChain in my module (module_server).

However, expandChain won't detect the reactive stuff I'm calling because the name of this stuff changes since it is created in a module. Take a look at the app below:

library(dplyr)
library(shiny)
library(shinymeta)
library(WDI)

module_ui <- function(id){
  ns <- NS(id)

  tagList(
    fluidRow(
      actionButton(ns("show_table"), "Show table"),
      actionButton(ns("show_code"), "Show code"),
      tableOutput(ns("table"))
    )
  )
}

module_server <- function(input, output, session){
  data <- metaReactive2({
    req(input$show_table)

    isolate(metaExpr({
      mtcars 
    }))
  })

  output$table <- renderTable({
    data()
  })

  observeEvent(input$show_code, {
    showModal(modalDialog(
      renderPrint({
        expandChain(data())
      })
    ))
  })
}

ui <- fluidPage(
  actionButton("launch", "Launch")
)

server <- function(input, output, session) {

  count <- reactiveValues(value = 0)

  observeEvent(input$launch, {
    count$value <- count$value + 1
    insertUI(selector = "#launch",
             where = "afterEnd",
             ui = module_ui(count$value))
    callModule(module_server, count$value)
  })

}

shinyApp(ui, server)

When I try to show the code for the table generated, I have the error:

Warning: Error in : <text>:2:2: unexpected input
1: `1_data` <- mtcars
2: 1_
    ^
  133: <Anonymous>

Since the module renames data() by adding a number, data() is not recognized by expandChain. I tried with:

expandChain(paste0(id, "_data()"))

without success (since expandChain does not support character).

Does anybody know how to do it?

Also asked on RStudio Community

Replace values in a file based on another file in R

$
0
0

What I have: I have two data frames. They both have the same column and row names and the same dimensions (equal number of rows and columns):

records <- read.table(text ="     D0  D1  D2  D3  
1  0       0     1    NA       
2  1       1     1    1       
3  1       0     NA    1       
4  0       NA     0    0", header = TRUE)

   covariate <- read.table(text ="     D0  D1  D2  D3  
1   70       1     6    3       
2  121       4     8    5       
3   86       3     2    4       
4  141       2     5    2", header = TRUE)

What I want: To replace specific values in the 'covariate' data frame with NAs. The values to replace are those who are in the exact same place that correspond to where 'records' have NAs. The result should look like this:

covariate_fixed <- read.table(text ="     D0  D1  D2  D3  
1   70       1     6    NA       
2  121       4     8    5       
3   86       3     NA    4       
4  141       NA     5    2", header = TRUE)

The closest threads I could find to my problem were these: Replace values from another dataframe by IDs and Replace values from another dataframe by IDs However, I don't have matching values in both files except for the column and row names, and the rule to replace is given not by specific ID but for a specific location in the data frame. I found other posts of similar nature as well but none I could find a way to adapt to my specific problem.

Could someone help me produce a code in R to do this? Thanks!

Asselin mode computing in Python [closed]

$
0
0

I was just asking for the mode estimator based on the algorithm described in Asselin de Beauville (1978) in Python. It exists in R package "modeest".

If someone know it in Python, that will be helpfull.

Regards.

Viewing all 209583 articles
Browse latest View live


<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>