
Clustered Stacked Bar Chart in R


Is there a way to make a clustered stacked bar chart in R?

Currently I have 2 stacked bar charts that look like this:
(sorry for the links, I'm not yet able to post images to a question)
Stacked bar chart 1
Stacked bar chart 2

And my data looks something like this:

| state | A1 | B1 | C1 | D1 | E1 | A2 | B2 | C2 | D2 |
|-------|----|----|----|----|----|----|----|----|----|
| AL    |  3 |  2 |  4 | 12 |  2 |  2 |  8 |  1 |  6 |
| AK    |  2 |  4 | 22 |  6 |  4 |  4 | 12 |  2 |  3 |
| AZ    |  2 |  1 |  5 |  5 | 45 |  6 |  2 |  4 | 95 |
| CA    |  3 |  9 | 11 |  3 | 12 |  7 |  1 |  5 | 25 |

Using this data, is there a way to have a clustered stacked bar chart, so that the 2nd values are next to the 1st values for each state?

So far I've used plotly to make these graphs, and my code looks like this:

p1 <- plot_ly(data=df, x = ~state, y = ~E1, type = 'bar', name = 'E1') %>%
  add_trace(y = ~D1, name = 'D1') %>%
  add_trace(y = ~C1, name = 'C1') %>%
  add_trace(y = ~B1, name = 'B1') %>%
  add_trace(y = ~A1, name = 'A1') %>%
  layout(yaxis = list(title = ''), barmode = 'stack')
p1

Edit: I would want the final graph to look something like this: Chart Example, but with one cluster of stacked bars for each state in the U.S.
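For reference, a sketch of one direction I have been exploring (assuming df has the columns shown above): reshape to long format and let ggplot2 stack within each group while placing the two groups side by side per state via facets.

library(tidyr)
library(dplyr)
library(ggplot2)

# reshape to long: one row per state / category / group (1 vs 2)
df_long <- df %>%
  pivot_longer(-state, names_to = "col", values_to = "value") %>%
  mutate(category = substr(col, 1, 1),   # A, B, C, D, E
         group    = substr(col, 2, 2))   # "1" or "2"

# stacked bars within each group, the two groups side by side for each state
ggplot(df_long, aes(x = group, y = value, fill = category)) +
  geom_col(position = "stack") +
  facet_wrap(~ state, strip.position = "bottom") +
  theme(strip.placement = "outside")

This is only a sketch of the shape of plot I am after, not a plotly solution.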


Cannot load the `model1` function of the processR package in a Jupyter Notebook


I am very new to R programming and am trying to follow this tutorial, where the model1 function is used to fit Andrew F. Hayes's moderation model (PROCESS model 1) with three variables. As indicated in the tutorial, I have the packages installed:

  1. install.packages("devtools")
  2. install.packages("processR")
  3. devtools::install_github("markhwhiteii/processr")

I have also followed the steps:

set.seed(1839)
var1 <- rnorm(100)
cond <- rbinom(100, 1, .5)
var2 <- var1 * cond + rnorm(100)
df3 <- data.frame(var1, var2, cond)
head(df3)

accordingly. However, when running:

mod1result <- model1(iv = "var1", dv = "var2", mod = "cond", data = df3)

I get the error message:

Error in model1(iv = "var1", dv = "var2", mod = "cond", data = df3): could not find function "model1" Traceback:

and running:

mod1result <- processr::model1(iv = "var1", dv = "var2", mod = "cond", data = df3)

gives:

Error in loadNamespace(name): there is no package called ‘processr’ Traceback:

The strange thing is that the same code just worked yesterday and now it doesn't. I would appreciate it if you could help me understand what is wrong and how I can resolve it.

P.S.1. I'm not sure what .libPaths() is, but for some reason it returns two paths on my Mac:

  • /usr/local/lib/R/3.6/site-library
  • /usr/local/Cellar/r/3.6.2/lib/R/library

Does it mean that I have two installations of R, and is this the main cause of the above issues?

P.S.2. OK. This seems to be Jupyter's fault as everything is just working fine in the terminal.

P.S.3. What seems to be working in the terminal is:

  • sudo r
  • devtools::install_github("markhwhiteii/processr")
  • library(processr) (note the lower-case r in processr)

P.S.4. I'm not sure if this is Jupyter's fault.
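In case it helps to diagnose, a small sketch for comparing what the Jupyter R kernel sees with what the terminal session sees (run it in both; nothing here is specific beyond the package name from the question):

.libPaths()                                      # which library trees are searched
"processr" %in% rownames(installed.packages())   # is the GitHub package visible here?
find.package("processr", quiet = TRUE)           # where (if anywhere) it is installed

# if the kernel cannot see it, installing from a session whose .libPaths()
# matches the kernel's library paths should make library(processr) work there too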

if/then replace values looping over rows conditional on column value(s) in R


I am trying to do an if/then replacement (recoding) of values row by row (looping over rows), based on the values of one or more columns in those rows. I've looked at a lot of prior examples here and elsewhere (R help) but haven't been able to get very far.

Here is an example data set:

> set.seed(1234)
> let<-c("AB","AA","BB")
> df <- data.frame(rbind(x1=c(12,"DF1",sample(let,6,TRUE)),x2=c(12,"HA.1",sample(let,6,TRUE)),x3=c(21,"DF1",sample(let,6,TRUE)),x4=c(12,"AS.2",sample(let,6,TRUE))
+ ))
> df
   X1   X2 X3 X4 X5 X6 X7 X8
x1 12  DF1 AB AA AA AA BB AA
x2 12 HA.1 AB AB AA AA BB AA
x3 21  DF1 AB BB AB BB AB AB
x4 12 AS.2 AB AB AB AB AB AB

I would like to conditionally recode (replace) the values in columns 3:8 (X3 through X8) based on the values in X1 and X2 using if/then: 'AB' becomes 1 if X1=12 AND X2=DF1, 'AA' becomes 2 if X1=12 AND X2=DF1, 'BB' becomes 3 if X1=12 AND X2=DF1, etc. There will be many other (nested?) if statements to add to complete this specific case, but I am not sure how to approach even the most basic aspect of this script: how to condition the replacement of values in columns 3:8 on the value in column 1 (and also column 2, or more columns) for a given row.

So, looping over each row, I would test if the value in X2 = DF1 and X1=12 (for example), and if so in both cases, change values of AB to 1, AA to 2, and BB to 3...

for(i in 1:nrow(df)){
      if((df$X2[i]=="DF1") & (df$X1[i]=12)) {   
          ifelse(df[i,3:8] == "AB", 1, ifelse(df[i,3:8]=="AA", 2,ifelse(df[i,3:8]=="BB",3,"NA")))}
             else{} 
      }

Now...this appears to do nothing - no changes to df and no warnings. But the ifelse statements work when I specify the row (4):

> ifelse(df[4,3:8] == "AB", 1, ifelse(df[4,3:8]=="AA", 2,ifelse(df[4,3:8]=="BB",3,"NA")))
   X3  X4  X5  X6  X7  X8 
x4 "1" "3" "1" "1" "1" "2"
> df[4,3:8]
   X3 X4 X5 X6 X7 X8
x4 AB BB AB AB AB AA

So it must be something in the initial if/& condition? Do I need to have something in my else clause?

And of course, my real-world use case is more complicated, as each different value in X1 or X2 will require different if/then statements to recode the values in columns 3:8.

Anyway - am I even approaching this correctly? Would a lookup table work better? I would be setting up additional nested if/& statements for each combination of values of X1 and X2. It will be ugly, but if I can get the nested if statements to work, then at least I can get there.

Thanks for any suggestions!
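For what it is worth, a minimal sketch of the loop with the one change that seems to be missing, namely assigning the ifelse() result back into the data frame (this assumes columns X3:X8 can be treated as character, so the conversion step below is included):

# make sure the recoded columns are character so numeric codes can be written in
df[3:8] <- lapply(df[3:8], as.character)

for (i in 1:nrow(df)) {
  if (df$X2[i] == "DF1" & df$X1[i] == 12) {
    # assign the recoded values back; the original loop computed them and discarded them
    df[i, 3:8] <- ifelse(df[i, 3:8] == "AB", 1,
                  ifelse(df[i, 3:8] == "AA", 2,
                  ifelse(df[i, 3:8] == "BB", 3, NA)))
  }
}

A lookup vector per (X1, X2) combination might scale better than nested ifelse calls, but the key point is the assignment back into df.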

R: How to use multiple GPUs with XGBoost?


I'm using xgboost to train my model. I have experienced that training with xgboost on a GPU is much faster. Since I have 2 GPUs available, I'd like to use both. I have three questions about this topic.

How can I use multiple GPUs at the same time?

When running this example, only the first GPU would work. I tried setting gpu_id = 1; that worked well, i.e. the other GPU did the job. Then I read that gpu_id = -1 would provide the solution, but apparently this value doesn't work with CUDA 10. I also tried n_gpus, but that seems to be deprecated.

How can I clear the cache of GPU memory when finished?

This becomes important when I tune the parameters in some kind of loop. A single model fits into memory without problems, but hundreds of them do not.

R crashes when I change the gpu_id for the next model. Is there a way to prevent this?
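On the memory question, a hedged sketch of what I have been trying inside a tuning loop (param_grid and dtrain are placeholders for the real tuning grid and xgb.DMatrix; this only drops the R-side booster handle, and I am not certain it frees all GPU memory on every setup):

for (p in param_grid) {
  bst <- xgboost::xgb.train(params = c(p, list(tree_method = "gpu_hist", gpu_id = 0)),
                            data = dtrain, nrounds = 100)
  # ... evaluate bst here ...
  rm(bst)   # drop the booster handle
  gc()      # let R finalize it, which should also release its GPU memory
}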


My setup:

  • R version 3.4.4
  • Ubuntu 18.04.3
  • xgboost_1.0.0.1
  • CUDA 10.2

How to get fixed node positions in network graphs


In igraph, by using layout.fruchterman.reingold(), we can compute fixed node positions. How can we apply the same fixed positions using ggraph?

I would like to have fixed node positions across several network graphs that use the same samples with different types of data.

# library
library(igraph)
set.seed(1)
# create data:
links <- data.frame(
  source=c("A","A", "A", "A", "A","J", "B", "B", "C", "C", "D","I"),
  target=c("B","B", "C", "D", "J","A","E", "F", "G", "H", "I","I"),
  weight=(sample(1:20, 12, replace=T))

)
links$weight
nodes <- data.frame(
  name=LETTERS[1:10],
  carac=c( rep("young",3),rep("adult",2), rep("old",5))
)

# Turn it into igraph object
network <- graph_from_data_frame(d=links, vertices=nodes, directed=F) 

# Make a palette of 3 colors
library(RColorBrewer)
coul  <- brewer.pal(3, "Set1") 

# Create a vector of color
my_color <- coul[as.numeric(as.factor(V(network)$carac))]

library(ggraph)
library(tidygraph)
#

g1<-tbl_graph(nodes, links, directed = FALSE)
coords <- layout.fruchterman.reingold(g1)
# igraph
plot(g1,layout=coords)

#
ggraph(g1, layout = "fr", weights = weight) +geom_edge_density(aes()) +
  geom_edge_link(aes(), alpha = 0.3)+
  geom_node_point()+geom_node_point(aes(colour =carac ),size=6)+
  geom_node_text(aes(label = name),size=4, repel = F) +
  theme_graph()

Does ggraph have a similar function?

I would deeply appreciate your help.
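In case it clarifies what I am after, a sketch of the direction I have been trying: compute the coordinates once with igraph and hand them to ggraph as a manual layout (I believe ggraph >= 2.0 accepts x and y arguments for layout = "manual"; treat that as an assumption):

set.seed(1)
coords <- layout_with_fr(g1)   # same algorithm as layout.fruchterman.reingold()

# reuse the same coordinates for every plot of the same nodes
ggraph(g1, layout = "manual", x = coords[, 1], y = coords[, 2]) +
  geom_edge_link(alpha = 0.3) +
  geom_node_point(aes(colour = carac), size = 6) +
  geom_node_text(aes(label = name), size = 4) +
  theme_graph()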

In ggplot, using a numeric variable like a factor to create multiple plots, but using the numeric values to control spacing


If I make a data frame like this:

d1 <- data.frame(class=rep(c("A", "B", "C"), each=100),
                value=c(rnorm(100,0,1), rnorm(100,1,1), rnorm(100,2,1)))

I can easily make a violin plot with a separate violin for each class:

ggplot(d1, aes(x=class, y=value)) + geom_violin()


But if I make a data frame and plot like this, with numeric values:

d2 <- data.frame(timepoint=rep(c(1, 2, 3.5), each=100),
                 value=c(rnorm(100,0,1), rnorm(100,1,1), rnorm(100,2,1)))
ggplot(d2, aes(x=timepoint, y=value)) + geom_violin()

I just get a single violin plot like so:


I could do factor(timepoint):

ggplot(d2, aes(x=factor(timepoint), y=value)) + geom_violin()

but then I get a plot with equal spacing. What I want is a plot where the third violin is farther to the right, since it is at time=3.5. That is, where the spacing reflects the actual values of timepoint.


I realize this isn't specific to violin plots; it could be a boxplot or any other kind of plot. Is there a way to do what I want?
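A sketch of what I believe does this: keep timepoint numeric so the x positions reflect the actual values, and use the group aesthetic so each timepoint still gets its own violin.

ggplot(d2, aes(x = timepoint, y = value, group = timepoint)) +
  geom_violin()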

How to calculate a list of angles between two line vectors in a series of specimens in R?


I am working on a geometric morphometric dataset of biological specimens with two-dimensional Cartesian coordinates (i.e., landmarks). Part of my interest with this dataset is to produce a data frame of distances between important landmarks and angles between landmark vectors with homologous coordinates across all specimens for further comparison. Distances between points have not been a problem, but I have been running into problems when I have tried to calculate angles in each of my specimens.

I am trying to find a way to calculate the angle between line vectors representing the same angle on a series of specimens at once (the vectors describe a feature with biological significance that is present in all specimens across my dataset). Attached is a sample x and y vector dataset for two line segments formed by four landmarks I am interested in finding the angle between, with each column representing a distinct specimen. This is the same format I get when I create a vector with dataset[landmark1,,] - dataset[landmark2,,] on the array produced by the morphometrics package used to analyse the data.

vector1<-matrix(c(-0.1733,0.0901,-0.1307,0.0966,-0.1222,0.0849),nrow=2)
vector2<-matrix(c(0.1061591,0.0116495,0.0876752,-0.0137482,0.1170445,0.0213435),nrow=2)

I want to find the angle between vector1 and vector2 for specimen 1 (column 1 of both matrices), 2, 3, and so on. Ideally, I would like to get the output into a format where R reports the angle for each individual specimen, like so:

[,1] [,2] [,3]
131  152  135

And then somehow add it into a data frame so it looks like this:

          Distance Angle
Specimen1 100      131
Specimen2 100      152
Specimen3 100      135

The distances are just filler numbers to show the format I am trying to achieve.

I have tried using the angle() function in the matlib package and the angle.calc() function in the Morpho package. However, these functions will only calculate the angle for the first specimen and do not produce a list of angles for all specimens. I have also checked previous questions on Stack Overflow, but none of the answers given appear to be applicable to a dataset containing multiple specimens where each column of both matrices represents a distinct specimen (and when I apply them they do not work on this dataset). I have a large number of specimens and multiple angles per specimen, so it is not possible to measure every angle individually for each specimen.

Is there any way to write a formula to produce a list containing the respective angle for each of these specimens?
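A sketch of the kind of formula I am looking for, applied column by column (standard dot-product angle in degrees; I have not checked that it matches the convention used by angle.calc(), and Distance below is just a filler column):

# angle (in degrees) between the column vectors of two 2 x n matrices,
# one angle per specimen/column
angle_between <- function(v1, v2) {
  acos(sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))) * 180 / pi
}

angles <- sapply(seq_len(ncol(vector1)),
                 function(i) angle_between(vector1[, i], vector2[, i]))
angles   # one value per specimen

result <- data.frame(Distance = 100, Angle = angles,
                     row.names = paste0("Specimen", seq_along(angles)))
result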

How to keep certain values of a categorical variable in a regression model


This is the model:

model2<-lm(resp1 ~ pred3 + pred4 + pred6 + pred7, data = hwdat)
summary(model2)


From the summary output, I only want to keep the predictors whose p-values are less than 0.01, but pred7 is a categorical variable. How do I modify the regression model to keep just the pred7Grade B and pred7Grade D terms?
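For reference, a sketch of one way this is sometimes done: replace the full factor with explicit indicator terms for just the levels to keep (I am assuming the levels are literally called "Grade B" and "Grade D"):

model3 <- lm(resp1 ~ pred3 + pred4 + pred6 +
               I(pred7 == "Grade B") + I(pred7 == "Grade D"),
             data = hwdat)
summary(model3)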


RSelenium, Chrome, How to set download directory, file download error


Hello :) I'm trying to automate downloading spreadsheets from "https://www.gold.org". The code works well, goes through authorization without problems, and downloads the file. But when I try to change the download directory, it starts to download the file but instantly gives a file download error in the browser. The way I tried to change the download directory is by adding:

eCaps <- list(
  chromeOptions = 
    list(prefs = list("profile.default_content_settings.popups" = 0L,
"download.prompt_for_download" = FALSE,
"directory_upgrade" = TRUE,
"download.default_directory" = "C:/XXX/YYY"
    )
    )
)

and adding extraCapabilities = eCaps to rsDriver():

rD <- rsDriver(browser= "chrome", chromever = "80.0.3987.16", extraCapabilities = eCaps)

Without these two changes the code worked well, downloading to the default download directory. Is there any way to set it up properly to download to another directory? Here is the complete code:

library(RSelenium)
eCaps <- list(
  chromeOptions = 
    list(prefs = list("profile.default_content_settings.popups" = 0L,
"download.prompt_for_download" = FALSE,
"directory_upgrade" = TRUE,
"download.default_directory" = "C:/XXX/YYY"
    )
    )
)
rD <- rsDriver(browser= "chrome", chromever = "80.0.3987.16", extraCapabilities = eCaps)
remDr <- rD$client

appURL <- 'https://www.gold.org/login'
remDr$navigate(appURL)
remDr$findElement("id", "loginEmail")$sendKeysToElement(list("email"))
remDr$findElement("id", "loginPassword")$sendKeysToElement(list("password", key='enter'))

appURL2 <- "https://www.gold.org/goldhub/data/global-gold-backed-etf-holdings-and-flows"
remDr$navigate(appURL2)
remDr$navigate(appURL2)

remDr$findElement("link text", "XLSX")$sendKeysToElement(list(key='enter'))
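One thing I have been meaning to test (an assumption, not a verified fix): Chrome on Windows is reported to be picky about the path separator in download.default_directory, so passing a backslash-style path might behave differently:

eCaps <- list(
  chromeOptions = list(
    prefs = list(
      "profile.default_content_settings.popups" = 0L,
      "download.prompt_for_download" = FALSE,
      "directory_upgrade" = TRUE,
      # normalizePath() turns C:/XXX/YYY into C:\XXX\YYY on Windows
      "download.default_directory" = normalizePath("C:/XXX/YYY",
                                                   winslash = "\\",
                                                   mustWork = FALSE)
    )
  )
)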

R session aborts when I use assignTaxonomy


I have been having this problem for more than a week now and I am running out of time and patience. It occurs both when I run my script on a Mac and when I run it on a PC (more RAM makes no difference to the result; it just aborts faster). When I try to run this line on my dataset, the session aborts.

  set.seed(119)
  tax_PR2 <- assignTaxonomy(seqtab, 
                      "~/Desktop/Documents/Bruts/aeDNA_data_shared/pr2_version_4.11.1_dada2.fasta",
                      multithread=TRUE)

Does anyone have any idea what the problem is? I verified my dataset (seqtab is currently considered by R to be a large matrix of 3,930,724 elements, 20.2 MB), I verified the space I have on my computer, I have all the packages needed to run this line of code, and I tried different sources of the PR2 reference database (PR2 version 4.11.1, 4.12.0, etc.), and it always has the same result.

If you have any ideas I would appreciate them. I hope the information I gave is sufficient.
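For anyone trying to reproduce this, a small sketch of a check that might show whether the abort is simply a memory issue: run assignTaxonomy on only a subset of the sequence variants first (the subset size of 100 is arbitrary):

# seqtab has one column per sequence variant, so this keeps only the first 100
tax_test <- assignTaxonomy(seqtab[, 1:100, drop = FALSE],
                           "~/Desktop/Documents/Bruts/aeDNA_data_shared/pr2_version_4.11.1_dada2.fasta",
                           multithread = TRUE)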

Packages installed:

   library(BiocManager)
   library(Rcpp)
   library(dada2)
   library(ff)
   library(ggplot2)
   library(gridExtra)
   library(phyloseq)
   library(vegan)

Unexpected behaviour of function used in conjunction with lapply/sapply?


This

mult_six <- function(x) {
  y <- x * 6
}

mult_six(7)

returns nothing (as expected), and y is not globally assigned (also as expected, since the assignment takes place in the scope of the function, not in the parent environment - so evaluating y at the top level gives Error: object 'y' not found - completely normal).

But

sapply(c(1,2,3), mult_six)

returns

[1]  6 12 18

(and lapply() returns the list equivalent).

I do not understand why lapply/sapply would behave any differently from calling the function on each element separately.
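A quick experiment suggests the function's last expression (the assignment) still produces a value, just invisibly, which would explain where sapply gets its result:

mult_six(7)          # prints nothing at the console: the value is returned invisibly
res <- mult_six(7)   # ...but it can still be captured
res                  # [1] 42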

Assign NA by group with Values in the group [duplicate]


I have a large data frame in R with 12 months of sales data by city. In some cases the city field is NA, and I need to populate it with the city available in that 12-month block. Here is a reproducible example showing the table I want and the table I have.

df.i.want <- data.frame(date = rep(seq(as.Date('2010-01-01'), as.Date('2010-12-01'), 
                                by = 'month'),2), 
                 city = c(rep('New York',12),rep('New Orleans',12)), 
                 sales = rnorm(24),
                 stringsAsFactors = FALSE)

df.i.want


Now I will create the table I have.

df.i.have <- df.i.want

df.i.have[c(3,5,14,20),'city'] <- NA

df.i.have

How can I get the table I want from what I have? Basically, in the first 12-month block, NA needs to be populated with New York, and in the second 12-month block NA needs to be populated with New Orleans. The reason I cannot do it manually is that the table is large and I don't know in advance which rows will be NA.
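A sketch of the kind of group-wise fill I have been trying to write (this assumes every 12-row block belongs to exactly one city, as in the example):

library(dplyr)
library(tidyr)

df.filled <- df.i.have %>%
  mutate(block = rep(seq_len(n() / 12), each = 12)) %>%  # label each 12-month block
  group_by(block) %>%
  fill(city, .direction = "downup") %>%                  # fill NAs from the non-NA city in the block
  ungroup() %>%
  select(-block)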

Creating a box plot with facets


I am trying to create a box plot that shows the variance of values from 8 cities, displayed using facets based on educational attainment levels. Here is data frame 1:

For example, I want to see how much variance there is between the HS_grad numbers for all the cities.

Update

Here is data frame 2:
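Since the data frames are only shown as images, here is a generic sketch of the shape of plot I am trying to get, using made-up column names (assuming df1 has one row per city, a city column, and one numeric column per education level such as HS_grad):

library(tidyr)
library(ggplot2)

# one row per city and education level
df_long <- pivot_longer(df1, cols = -city,
                        names_to = "education", values_to = "value")

# each facet is one education level; the box shows the spread across the 8 cities
ggplot(df_long, aes(x = "", y = value)) +
  geom_boxplot() +
  facet_wrap(~ education, scales = "free_y") +
  labs(x = NULL)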

Sampling from a normal distribution using a for loop


So I am trying to sample from a normal distribution 1000 times, each time computing the mean of 20 random samples from said normal distribution.


unif_sample_size = 20 # sample size
n_samples = 1000 # number of samples

# set up q data frame to contain the results
uniformSampleMeans <- tibble(sampMean = runif(n_samples, unif_sample_size))


# loop through all samples.  for each one, take a new random sample, 
# compute the mean, and store it in the data frame

for (i in 1:n_samples){
  uniformSampleMeans$sampMean[i] = summarize(uniformSampleMeans = mean(unif_sample_size))
}

I successfully generate a tibble; however, the values are NaN. Additionally, when I get to my for loop, I get an error:

Error in summarise_(.data, .dots = compat_as_lazy_dots(...)) : argument ".data" is missing, with no default

Any insight would be much appreciated!
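For comparison, a minimal working sketch of what I think the exercise is after, drawing from a normal distribution with rnorm() (the runif()/"uniform" naming above appears to be left over from a different exercise; that is my assumption):

library(tibble)

sample_size <- 20    # size of each sample
n_samples   <- 1000  # number of samples

# pre-allocate the data frame, then fill it in the loop
normalSampleMeans <- tibble(sampMean = numeric(n_samples))

for (i in 1:n_samples) {
  normalSampleMeans$sampMean[i] <- mean(rnorm(sample_size))
}

head(normalSampleMeans)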

Convert a column within a large dataset from UTC to PST in R


I have a large dataset, df, whose Date column I would like to convert to PST. It is currently in UTC.

   Date                        ID

   1/7/2020 1:35:08 AM         A

I would like to convert the Date column from UTC to PST, while preserving the other columns.

   Date                        ID

   1/7/2020 5:35:08 PM         A

Here is the dput:

  structure(list(Date = structure(1L, .Label = "1/7/2020 1:35:08 AM", class = "factor"), 
  ID = structure(1L, .Label = "A", class = "factor")), class = "data.frame", row.names = c(NA, 
 -1L))

This is what I have tried:

  library(lubridate)
  newdata <- as.POSIXct(Sys.Date())

However, I am unsure whether I need to add a format argument and what other code needs to be added.
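A sketch of the direction I have been trying: parse the factor column as a UTC date-time, then shift the display time zone (America/Los_Angeles covers PST/PDT):

library(lubridate)

df$Date <- as.POSIXct(as.character(df$Date),
                      format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
df$Date <- with_tz(df$Date, tzone = "America/Los_Angeles")
df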


How to subset based on a variable in R


I'm trying to subset a list in R based on two different variables within a loop.

I want the code to return the value for column A when column B = x and column C = y[i]. This is then repeated for a list of files.

x is a character that is defined by a previous formula and changes based on the input file. y is a list of characters and y[i] is the ith character of y based on the loop.

Here is what I tried:

value = subset(data$columnA, data$columnB== x & data$columnC== y[i])

This formula gives me no value and no error message. However, it returns the correct value when I replace x and y[i] with literal character strings, e.g. "Char".

I'm new to R and programming so apologies if this isn't clear. Thanks for any and all help you can provide!
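In case it helps narrow things down, a small diagnostic sketch (nothing here is specific to my data; it just prints what the comparison is actually seeing):

# what exactly is being compared?
print(x); print(y[i])
str(data$columnB); str(data$columnC)

# how many rows satisfy each condition on its own, and both together?
sum(data$columnB == x, na.rm = TRUE)
sum(data$columnC == y[i], na.rm = TRUE)
sum(data$columnB == x & data$columnC == y[i], na.rm = TRUE)

# trailing spaces are a common culprit with values read from files
sum(trimws(data$columnB) == trimws(x), na.rm = TRUE)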

Overloading as.logical in base R


I want to simplify some little text-generating code. The default arguments to the program are zero-length character strings when the corresponding functionality is not to be used. I'm interested in something simpler than

if (nchar(expr) > 0) ...

So I created the following

as.logical <- function(x, ...) UseMethod("as.logical")
as.logical.default <- function (x, ...)  base::as.logical(x, ...)
as.logical.character <- function (x, ...) nchar(x) > 0

If I try some examples in the command line, it works

> as.logical(letters)
 [1] TRUE TRUE TRUE ...

But interestingly, the coercion of the if() condition (cond) to logical doesn't attempt to apply as.logical. It seems to be handled differently, as below:

> if (1) print('Has')
[1] "Has"
> if ('Has') print(1)
Error in if ("Has") print(1) : argument is not interpretable as logical

How is cond evaluated, and can I fool it into using my function?
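My working hypothesis is that if() coerces its condition internally (at C level) without dispatching on the as.logical() generic, which would explain the behaviour; an explicit call, or a plain base helper, sidesteps the issue:

if (as.logical('Has')) print(1)   # explicit call dispatches to as.logical.character
# [1] 1

# or, if the defaults are empty strings rather than character(0),
# base R's nzchar() does the same test without any overloading
if (nzchar(expr)) {
  # ... use expr ...
}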

Using which and ! functions in R (Please help)

x <- c("a", "b", "c", "d", "e", "f", "g")
y <- c("a", "c", "d", "z")

I am trying to compare y to x and find the index in y that does not match anything in x. In this case z does not match, and I want R to return the index of z.

This is one of the things I tried and it does not work.

index <- which(y != x)
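For reference, a sketch of what seems to be the element-wise membership test I am after; != recycles the shorter vector rather than testing membership, which is presumably why the attempt above fails:

idx <- which(!(y %in% x))   # positions in y whose value never appears in x
idx                         # [1] 4
y[idx]                      # [1] "z"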

incorrect number of subscripts on matrix for-loop


I've got stuck on a loop that fails with an "incorrect number of subscripts on matrix" error, and I cannot find the problem.

Here goes my code:

paths <- 1440
count<-50
sample<-matrix(0,count,paths)
for(i in 1:paths)
{
  for(j in 1:count)
  {
    sample[j, i] <- paramigpm[j, 3] +
      paramigpm[j, 4] * ((1 - exp(-paramigpm[j, 7] * (j/12))) / (paramigpm[j, 7] * (j/12))) +
      paramigpm[j, 5] * (((1 - exp(-paramigpm[j, 7] * (j/12))) / (paramigpm[j, 7] * (j/12))) - exp(-paramigpm[j, 7] * (j/12))) +
      paramigpm[j, 6] * (((1 - exp(-paramigpm[j, 8] * (j/12))) / (paramigpm[j, 8] * (j/12))) - exp(-paramigpm[j, 8] * (j/12)))
    }
}

Can you figure it out?
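A small sketch of checks that might help locate which object the subscripts disagree with (paramigpm is the only object here whose structure is not shown):

class(paramigpm); dim(paramigpm)   # paramigpm[j, 8] needs a 2-D object with >= 8 columns and >= 50 rows
class(sample);    dim(sample)      # should be 50 x 1440 as constructed above
str(paramigpm)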

about the chi-square distance in R


I found two packages with which one can calculate the chi-square distance. However, I got different results.

 A<-matrix(c(0,1,1,0,1,0,1,0,1,0,0,0,1,1,0,1,0,0,0,0,1,0,1,1,1,0,0,1,0,1,0,0,0,1,1,0), 6, 6)

 Dist1<-philentropy::distance(A, method="squared_chi"   )
 Dist2<- analogue::distance(A, A, method = "chi.distance")

Which is correct?
