Channel: Active questions tagged r - Stack Overflow

R basic graphics xlim and ylim

I have this assignment and cannot figure it out. You can draw the picture by following these steps:

  1. Make a vector years1 containing the values 100, 101, ..., 150 and a vector years2 containing the values 151, 152, ..., 200.
  2. Using the function plot(), plot the values of the vector "first". At the same time, define the size of the drawing area with the arguments xlim and ylim so that the values at years 151-200 will also fit in the picture.
  3. Use the arguments main, xlab and ylab to name the picture and the axes.
  4. Define the plotting symbol with the argument pch.
  5. Add a smooth black line using the values in logistic_model. You can use lines().
  6. Add the observed values at years 151-200 with points().
  7. Find the maximum of the vector "first" and mark it in the picture with text(). Another option is to use locator().
  8. The red curve is the function f(x) = ((logistic_model[51]-100)/exp(-150))*exp(-x)+100, in the time interval x=[150, 200]. You can use the function curve() to add it.
  9. Add the legend with legend().
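
The steps above can be sketched roughly as follows. The vectors first, second and logistic_model are not given in the question, so the values below are made-up stand-ins, and the titles and labels are placeholders; only the sequence of base graphics calls is the point.

```r
# All data here are hypothetical stand-ins for the assignment's vectors.
set.seed(1)
years1 <- 100:150
years2 <- 151:200
logistic_model <- 100 + 50 / (1 + exp(-0.2 * (years1 - 125)))
first  <- logistic_model + rnorm(length(years1), sd = 3)
second <- 100 + (logistic_model[51] - 100) * exp(-(years2 - 150)) +
  rnorm(length(years2), sd = 3)

# xlim/ylim are chosen so that both periods fit in the drawing area.
plot(years1, first,
     xlim = c(100, 200), ylim = range(c(first, second)),
     main = "Population", xlab = "Year", ylab = "Size", pch = 16)
lines(years1, logistic_model, col = "black", lwd = 2)
points(years2, second, pch = 1)

# Mark the maximum of 'first'.
i_max <- which.max(first)
text(years1[i_max], first[i_max], labels = "max", pos = 1)

# The red curve from the assignment, drawn only on [150, 200].
curve(((logistic_model[51] - 100) / exp(-150)) * exp(-x) + 100,
      from = 150, to = 200, add = TRUE, col = "red")

legend("bottomright",
       legend = c("first", "second", "logistic model", "f(x)"),
       col = c("black", "black", "black", "red"),
       pch = c(16, 1, NA, NA), lty = c(NA, NA, 1, 1))
```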


What is an alternative to this eval parse code?

I am inserting some code into someone else's custom R package and I don't have flexibility to write it how I would like.

I need to be able to sum up many variables following a similar format that I can re-create with formulas.

I am looking for a more efficient way to write this. Efficiency is important because there is a lot of data to process.

Here is sample code showing what I want to do, but it is slow and clunky. I know eval-parse is not the best way to do this, that's why I'm asking for a better way :-)

v1 <- 1
v2 <- 2
v3 <- 3
v4 <- 4

# this for loop works, but it is clunky and slow
string <- character()
for (i in 1:4) {
  if (i < 4) string <- c(string, paste0("v",i,"+"))
  else string <- c(string, paste0("v",i))
}
eval(parse(text=string))
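
One possible alternative, sketched under the assumption that the variables live in the current environment and follow the v1, v2, ... naming shown above: look the values up by name with mget() and add them with Reduce(), which avoids building and parsing a string. The names var_names, values and total are only illustrative.

```r
v1 <- 1
v2 <- 2
v3 <- 3
v4 <- 4

# Build the variable names, fetch the values by name, then add them up.
var_names <- paste0("v", 1:4)
values <- mget(var_names, envir = environment())  # named list of the values
total <- Reduce(`+`, values)  # element-wise if the v's are vectors
total
```

If each variable is a single number, sum(unlist(values)) would do the same job.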

Data frame consisting of characters is converted to factors after pdata.frame command in R. How to make pooled OLS work?

I am using a panel data set:

  • y is my dependent variable, equal to 0 or 1 --> numeric
  • x1 are my individuals --> numeric
  • x2 are my time indicators --> numeric
  • x3,x4,...,x65 are my independent variables --> character

In the code below I convert all variables to character and then tell R that I am using panel data with the pdata.frame command on the last line. However, the problem now is that pdata.frame converts the variables x1 and x2 (the individual and time indicators) to factors even when stringsAsFactors=FALSE.

#Regressions
df=read_excel("C:/Users/Luuk/Desktop/Master Thesis EME/Data/indep_dep_indlevel.xlsx")
df_dep=data.frame(df[,79])
count=as.data.frame(rep(1:3669, times=1, each=3))
df=cbind(count,df[,3:79])
df_indep=data.frame(df[,c(1:5,8,10:15,17:25,27:44,45,53:77)])
dflm=cbind(df_dep,df_indep)
dflm1 <- data.frame(lapply(dflm, as.character), stringsAsFactors=FALSE)

names(dflm1)[c(2:66)] <- c(paste("x", 1:65, sep=""))
names(dflm1)[1] <- "y"
dflm2=pdata.frame(dflm1,index=c("x1","x2"),stringsAsFactors=FALSE)

Consequently, the following pooled OLS model estimation gives the error:

Error in class(x) <- setdiff(class(x), "pseries") :
  adding class "factor" to an invalid object
In addition: Warning message:
In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored

xnam <- paste("x", 3:65, sep="")
Formula <- formula(paste("y ~ ", paste(xnam, collapse=" + ")))
fit=plm(Formula, data=dflm2,model="pooling")

How can I make my pooled OLS estimation procedure work?
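
One thing that may be worth trying, as a sketch rather than a confirmed fix: the warning mentions a factor response, which suggests y is no longer numeric after the as.character step. Assuming the response and regressors are really numbers stored as text, they can be converted back to numeric before calling pdata.frame. The four-row data frame below is a made-up miniature of dflm.

```r
# Hypothetical miniature of the data: every column stored as text.
dflm <- data.frame(
  y  = c("0", "1", "1", "0"),
  x1 = c("1", "1", "2", "2"),          # individual
  x2 = c("1", "2", "1", "2"),          # time
  x3 = c("0.5", "1.2", "0.7", "1.9"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric instead of character.
dflm1 <- dflm
dflm1[] <- lapply(dflm, function(col) as.numeric(as.character(col)))

# Build the formula from the regressor names (here only x3).
xnam <- paste0("x", 3)
Formula <- reformulate(xnam, response = "y")
str(dflm1)
```

The pdata.frame and plm calls from the question would then be run on dflm1 unchanged.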

Bar-chart with dplyr/ggplot with male/female bars side by side

I'm trying to

Here's the example :

df <- read.table(text = 
  "sex  var value
  m a   1
  m a   1
  m a   0
  m b   0
  m b   0
  f a   1
  f a   0
  f b   0
  f b   1", 
header = TRUE, 
stringsAsFactors = FALSE
  )

What I'd like to create is a bar-chart which has the proportions for each of the var values, for each sex.

So in the above data I would have :

var a : 
        m : 0.67
        f : 0.5
var b : 
        m : 0
        f : 0.5

But expressed as a bar-chart using ggplot
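
A minimal sketch of one way to get there: compute the proportion per sex and var with aggregate() (the mean of a 0/1 column is the proportion of ones), then draw dodged bars. The ggplot2 part only runs if that package is installed; props and p are illustrative names.

```r
df <- read.table(text =
  "sex  var value
  m a   1
  m a   1
  m a   0
  m b   0
  m b   0
  f a   1
  f a   0
  f b   0
  f b   1",
  header = TRUE, stringsAsFactors = FALSE)

# Proportion of ones within each sex/var combination.
props <- aggregate(value ~ sex + var, data = df, FUN = mean)
props

# Side-by-side bars: one group per var, one bar per sex.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(props, aes(x = var, y = value, fill = sex)) +
    geom_col(position = "dodge") +
    labs(y = "Proportion")
  print(p)
}
```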

leave one out cross validation in R returns a very low accuracy results (Looking for feedback and comments)

I am trying to compute the accuracy of a decision tree on the seeds dataset (Link to the seeds dataset) over 20 iterations. However, I am getting very low overall accuracy (30%-35%). This is what I've done so far:

library(rpart)
seed = read.csv("seeds_dataset.txt",header= F, sep="\t")
colnames(seed)<- c("area", "per.", "comp.", "l.kernel", "w.kernel","asy_coeff", "lenkernel","type")

sampleSize <- nrow(seed)
mat = matrix(nrow=sampleSize, ncol=20) 
for (t in 1:20) {
  testSampleIdx <- sample(nrow(seed), size=sampleSize)
  data <- seed[testSampleIdx,]

  for (i in 1:nrow(data)){
    training = data[-i, ]
    test = data[i, ] 
    classification = rpart(type ~ ., data=training, method="class") 
    prediction = predict(classification, newdata=test, type="class")
    cm = table(test$type, prediction)
    accuracy <- sum(diag(cm))/sum(cm)
    mat[i,t] = accuracy 
  }
}
for (i in 1:ncol(mat)){
  print(paste("accuracy for ",i," iteration ", round((mean(mat[, i]))*100,1), "%", sep=""))
}
print(paste("overall accuracy ", round((mean(mat))*100,1), "%", sep=""))

Can anyone provide me with comments and feedback on what is causing this low accuracy? Thank you.
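
One thing worth checking, offered as a guess from the code shown rather than a confirmed diagnosis: with a single test row, table(test$type, prediction) has one row and one column per class, so diag() only ever reads the first cell. The toy values below (truth, prediction) are made up to show the effect without needing the dataset.

```r
# One held-out observation whose true class is 2, predicted correctly.
truth <- 2
prediction <- factor("2", levels = c("1", "2", "3"))

# The confusion "matrix" is 1 x 3, so its diagonal is just the count for class 1.
cm <- table(truth, prediction)
acc_from_diag <- sum(diag(cm)) / sum(cm)

# Comparing the labels directly gives the intended per-observation accuracy.
acc_direct <- as.numeric(as.character(prediction) == truth)

c(diag = acc_from_diag, direct = acc_direct)
```

Under this reading, the diag-based accuracy would mostly count how often class 1 is predicted, which would land near one third on a balanced three-class dataset.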

undefined columns selected in plm() function

I ran into a weird problem with the plm() function. Below is the code:

library(data.table)
library(tidyverse)
library(plm)


#Data Generation
n <- 500
set.seed(75080)

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 50
y   <- -100*z+ 1100 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt1 <- data.table('id'=1:500,'sat'=y,'income'=x,'group'=rep(1,n))

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 80
y   <- -80*z+ 1200 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt2 <- data.table('id'=501:1000,'sat'=y,'income'=x,'group'=rep(2,n))

z   <- rnorm(n)
w   <- rnorm(n)
x   <- 5*z + 30
y   <- -120*z+ 1000 + 50*w
y   <- 10*round(y/10)
y   <- ifelse(y<200,200,y)
y   <- ifelse(y>1600,1600,y)
dt3 <- data.table('id'=1001:1500,'sat'=y,'income'=x,'group'=rep(3,n))

dtable <- merge(dt1    ,dt2, all=TRUE)
dtable <- merge(dtable ,dt3, all=TRUE)


# Model 
dtable_p <- pdata.frame(dtable, index = "group")

mod_1 <- plm(sat ~ income, data = dtable_p,model = "pooling")

Error in `[.data.frame`(x, , which) : undefined columns selected

I checked everything I could think of, but I cannot figure out why it gives me an error. The column names are correct, so why does R say undefined columns? Thank you!

Follow-up: I am adding a test on another data set, the one @StupidWolf used to demonstrate:

data("Produc", package = "plm")
form <- log(gsp) ~ log(pc) 
Produc$group <-  Produc$region
pProduc <- pdata.frame(Produc, index = "group")

Produc$group <- rep(1:48, each = 17)

summary(plm(form, data = pProduc, model = "pooling"))
>Error in `[.data.frame`(x, , which) : undefined columns selected

R: How to get rid of default data and values stored in global environment

Every time I open R on my laptop, the same data and values (data tables and variables I used in the past) keep appearing, and it's annoying to rm() them every time I open a new file to work on something else. My question is: is there a way to reset or set the default data and variables in the global environment? For example, if I'm working on file1, I want to use data tables data1, data2, and data3 and variables v1, v2, and v3. It would be nice if I could store this set of data and variables for file1 specifically and simply call them instead of loading the data and running the code every time I open the file.

Thank you!
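
A minimal sketch of the per-file idea using base R's save() and load(). The objects and the file name file1_objects.RData are hypothetical, and a temporary directory is used only so the example is safe to run.

```r
# Hypothetical objects standing in for data1 and v1 from the question.
data1 <- data.frame(a = 1:3)
v1 <- 42

# Store only the objects that belong to file1 in their own file.
workspace_file <- file.path(tempdir(), "file1_objects.RData")
save(data1, v1, file = workspace_file)

# Later, in a clean session: bring exactly those objects back.
rm(data1, v1)
load(workspace_file)
ls()
```

The objects that reappear at startup usually come from a .RData file saved when quitting R; starting R with --no-restore, or not saving the workspace on exit, should stop that.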

Moving objects of pheatmap in r to correct the scale/make it look good

I've read similar questions, like Moving color key in R heatmap.2 (function of gplots package), and the handbook, but I haven't got it to work. Maybe the command is different for pheatmap. If it is, I can't find it.

This is my code:

# Make a data frame of gene IDs vs Names with type too
GeneName_df <- anno[anno$gene_id %in% rownames(plot_matrix[top20idx,]),]
# Re order (so same as plot_matrix[top20idx]) i.e. by gene_id
GeneName_df <- GeneName_df[order(GeneName_df$gene_id), ]

anno_df <- as.data.frame(colData(dds)[,c("condition","CellType")])
colnames(anno_df) <- c("Condition","CellType")

anno_colours = list(
    CellType = c(TKO = "#00BFFF", WT = "#9A2EFE"),
    Condition = c(treated = "#FE2E2E", untreated = "#F5A9A9")
)

top20matrix <- plot_matrix[top20idx,]
pheatmap(top20matrix, main = "Top 20 Differentially Expressed Genes",
fontsize = 8, color = pal(10), border_color = "Black", cellwidth = 26,
cellheight = 9, labels_row = GeneName_df[,2], annotation_col = anno_df, 
annotation_colors = anno_colours, angle_col = "45", cutree_cols = 2, 
annotation_names_col = FALSE)

Output: pheatmap

  1. I want to shift the colour key and column key along so they aren't squished together. Note that when I have been changing the parameters and size of the pheatmap the top and bottom kept cutting off. Apparently this is a problem that will be fixed in the next R update.
  2. Is there a way to reduce the length of the grouping lines at the top of the pheatmap? They're taking up a lot of room.

Selecting columns in R dataframe based on values of column in other dataframe

I have two dataframes, as you can see below.

#Dataframe 1
colname value
col1    0.45
col2    -0.2
col3    -0.4
col4    0.1

#Dataframe 2
col1 col2 col3 col4
1    5    9    5
45   29   43   9
34   33   56   3
2    67   76   1

What I want to do is first select all rows of dataframe 1 that have a value > 0.3 or a value < -0.3. Second, I want to select all columns from dataframe 2 whose names match those rows. So the columns col1, col3 and col4 of dataframe2 should be selected into a new dataframe like below.

col1  col3 col4
1     9    5
45    43   9
34    56   3
2     76   1

The solution I thought of is to first select the relevant column names, as you can see in the code below.

library(sqldf)
features = sqldf('select colname from dataframe1 where value > 0.3 or value < -0.3')

After this, I want to build a string in a for loop that looks like the one below, and paste it into a sqldf query to select the right columns from dataframe2. However, I don't know how to build this string. Do you know how, or do you have another solution?

  stringValue = "col1, col3, col4"
   sprintf("SELECT %s FROM dataframe2", stringValue)
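
A sketch in base R, with the two data frames rebuilt from the values shown above. Note that with these values the stated condition keeps col1 and col3 only, since col4 has a value of 0.1. The names keep, selected and query are illustrative.

```r
dataframe1 <- data.frame(
  colname = c("col1", "col2", "col3", "col4"),
  value   = c(0.45, -0.2, -0.4, 0.1),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  col1 = c(1, 45, 34, 2),
  col2 = c(5, 29, 33, 67),
  col3 = c(9, 43, 56, 76),
  col4 = c(5, 9, 3, 1)
)

# value > 0.3 or value < -0.3 is the same as abs(value) > 0.3.
keep <- dataframe1$colname[abs(dataframe1$value) > 0.3]

# Select the matching columns directly, without SQL.
selected <- dataframe2[, keep, drop = FALSE]

# If the sqldf route is preferred, the string can be built without a loop.
stringValue <- paste(keep, collapse = ", ")
query <- sprintf("SELECT %s FROM dataframe2", stringValue)
```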

how to further refine expss table format?

I am trying to improve my table design using expss. My current design is shown below using the following code:

library(expss)
# bogus example data
x<-structure(list(visits= structure(c(17, 2, 23, 1, 21), label = "Total # Home Visits", class = c("labelled", "numeric")), months_enrolled = structure(c(21.42474, 51.105, 52.474, 53.75, 60.0392105), label = "Enrollment Duration (months)", class =c("labelled","numeric")), marital2 = structure(c("Married", NA, "Married", "Married", "Married"), label = "Marital Status", class = c("labelled", "character")), Relationship2 = structure(c("Mother", "Mother", "Mother", "Mother", "Mother"), label = "Relationship (recoded)", class = c("labelled", "character"))), row.names = c(NA, 5L), class = "data.frame")

htmlTable(x %>% 
tab_cells(visits,months_enrolled) %>%
tab_rows(marital2, Relationship2,  total()) %>%
tab_stat_fun(Mean = w_mean, "Valid N" = w_n, method = list) %>%
tab_pivot() %>%
set_caption("Table 6: Bogus Visits and Duration by Characteristics") %>% 
htmlTable(.,css.cell = c("width: 220px", # first column width
                          rep("width: 50px", ncol(.) - 1))))

I'd like to improve the table design by placing the mean statistics for Home Visits and Enrollment Duration as columns, thus saving a row for each level of Marital Status (and other vars in tab_rows). How is this achieved? Also, is it possible to shade alternate rows?

expssTable

applying function to multiple dataframes programatically [duplicate]

This question already has an answer here:

How can I apply the same function to multiple data frames in R without having to save the data frames into a new list? I don't want to manually type out the names of the data frames that I am applying the function to. I don't want to have to type:

data_frame.2 = my_function(data_frame.2)

...over and over. I might have 30 data frames I want to do this to. I don't want to have to extract them out of a list.

data_frame.2 = mtcars
data_frame.111 = mtcars
data_frame.12345 = mtcars


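my_function <- function(dataset) {
  # Return a copy of the data frame with upper-case column names
  adjusted_dataset <- dataset
  names(adjusted_dataset) <- toupper(names(dataset))
  return(adjusted_dataset)
}
my_dfs = ls(pattern = "data_frame.*") # I know all my data frames in memory start with "data_frame"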

How can I make use of sapply(my_dfs, my_function) here?
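
One way this could be done, sketched with a loop over the names: get() fetches each data frame by name and assign() writes the result back under the same name. This is an alternative to sapply(), not the only option; env and nm are illustrative names.

```r
data_frame.2 <- mtcars
data_frame.111 <- mtcars
data_frame.12345 <- mtcars

my_function <- function(dataset) {
  adjusted_dataset <- dataset
  names(adjusted_dataset) <- toupper(names(dataset))
  adjusted_dataset
}

# Find every object whose name starts with "data_frame" and overwrite it in place.
env <- environment()
my_dfs <- ls(env, pattern = "^data_frame")
for (nm in my_dfs) {
  assign(nm, my_function(get(nm, envir = env)), envir = env)
}

names(data_frame.2)
```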

Creating one merged variable from multiple separate one

Any help would be greatly appreciated

I have a file exported from a PCR plate software. I have already coded the call for all alleles and have now merged them into one data frame.

I need to create a new variable merging the 3 alleles (G1-1, G1-2, and G2) to get a final genotype.

I then need to count the occurrence of the alleles to generate the other 3 APOL1 risk variables that I need to generate.


Allele logic for final genotype:

+/G2 = (G1-1-1(+) & G1-1-2(+)) & (G1-2-1(+) & G1-2-2(+)) & (occurrence of (G2) at either G2-1 or G2-2)

+/+ = (G1-1-1(+) & G1-1-2(+)) & (G1-2-1(+) & G1-2-2(+)) & (G2-1(+) & G2-2(+))

G2/G2 = (G1-1-1(+) & G1-1-2(+)) & (G1-2-1(+) & G1-2-2(+)) & (G2-1(G2) & G2-2(G2))

G1^GM/+ = (occurrence of (G1^S342G) at either G1-1-1 or G1-1-2) & (occurrence of (G1^I384M) at either G1-2-1 or G1-2-2) & (G2-1(+) & G2-2(+))

G1^G+/+ = (occurrence of (G1^S342G) at either G1-1-1 or G1-1-2) & (G1-2-1(+) & G1-2-2(+)) & (G2-1(+) & G2-2(+))

G1^GM/G1^GM = (occurrence of (G1^S342G) at both G1-1-1 and G1-1-2) & (occurrence of (G1^I384M) at both G1-2-1 and G1-2-2) & (G2-1(+) & G2-2(+))

G1^GM/G2 = (occurrence of (G1^S342G) at either G1-1-1 or G1-1-2) & (occurrence of (G1^I384M) at either G1-2-1 or G1-2-2) & (occurrence of (G2) at either G2-1 or G2-2)

G1^G+/G2 = (occurrence of (G1^S342G) at either G1-1-1 or G1-1-2) & (G1-2-1(+) & G1-2-2(+)) & (occurrence of (G2) at either G2-1 or G2-2)

Original dataframe

Final dataframe needed

Original Dataframe structure

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   28 obs. of  6 variables:
 $ G1-1-1   : chr  "+" "+" "+" "+" ...
 $ G1-1-2   : chr  "+" "+" "+" "+" ...
 $ G1-2-1   : chr  "+" "+" "+" "+" ...
 $ G1-2-2   : chr  "+" "+" "+" "+" ...
 $ G2-1     : chr  "+" "+" "+" "+" ...
 $ G2-2     : chr  "G2" "+" "G2" "G2" ...

The APOL1 Risk variables logic is below:

If (+/+) categorize as 1 in "no APOL1 Risk Alleles"

If (+/G2) or (G1^GM/+) or (G1^G+/+) categorize as 1 in "1 APOL1 Risk Alleles"

If (G1^GM/G1^GM) or (G1^GM/G2) or (G2/G2) categorize as 1 in "2 APOL1 Risk Alleles"
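
A partial sketch of how the logic could be coded by counting allele occurrences per row. It covers only the three rules without a G1 allele, and it assumes "either" means exactly one of the two G2 columns; the three-row data frame and the names n_g1g, n_g1m, n_g2, genotype and risk_alleles are made up for illustration. The remaining rules would follow the same pattern.

```r
# Hypothetical three-row example using the column names from the structure above.
df <- data.frame(
  `G1-1-1` = c("+", "+", "+"),
  `G1-1-2` = c("+", "+", "+"),
  `G1-2-1` = c("+", "+", "+"),
  `G1-2-2` = c("+", "+", "+"),
  `G2-1`   = c("+", "+", "G2"),
  `G2-2`   = c("+", "G2", "G2"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Count how many of the two calls at each site carry the risk allele.
n_g1g <- rowSums(df[, c("G1-1-1", "G1-1-2")] == "G1^S342G")
n_g1m <- rowSums(df[, c("G1-2-1", "G1-2-2")] == "G1^I384M")
n_g2  <- rowSums(df[, c("G2-1", "G2-2")] == "G2")

# Rules without any G1 allele; other genotypes stay NA in this sketch.
no_g1 <- n_g1g == 0 & n_g1m == 0
genotype <- rep(NA_character_, nrow(df))
genotype[no_g1 & n_g2 == 0] <- "+/+"
genotype[no_g1 & n_g2 == 1] <- "+/G2"
genotype[no_g1 & n_g2 == 2] <- "G2/G2"

# Map each genotype to its number of APOL1 risk alleles, per the logic above.
risk_lookup <- c("+/+" = 0, "+/G2" = 1, "G1^GM/+" = 1, "G1^G+/+" = 1,
                 "G1^GM/G1^GM" = 2, "G1^GM/G2" = 2, "G2/G2" = 2)
risk_alleles <- unname(risk_lookup[genotype])
```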

Warning: 'newdata' has x rows but variables found have >x rows with coxph and survfit

I have reviewed the other similar questions and I don't believe I have found the answers.

I am trying to do Cox regression with two co-variates - sex and disease status.

Original data frame looks a bit like this:

Patient ID: 1001, 1002
Age: 56, 60
Sex: Male, Female
Mortality event: 1 0
Follow up years: 6,7

I have called cxmod <- coxph(Surv(Mortality event, time) ~ Disease_status + Sex, data = original data)

I have set up a dummy_df as a grid as per instructions for this package for the co-variates:

Sex Male  Disease_status 0, 
Sex  Female Disease_status 0, 
Sex Male Disease status 1, 
Sex Female Disease status 1

I have renamed the rownames as letters since I understood this is what was needed.

However when I call:

cxsf <- survfit(cxmod, data= orginal_data_frame, newdata = dummy_df, conf.type = "none")

I get the following warning message:

Warning message:
'newdata' had 4 rows but variables found have 500000 rows 

Furthermore, if I call surv_summary(cxsf) to help visualise the plot, the R session terminates with a fatal error.

Can anyone advise on what's going wrong?
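
This warning typically appears when R cannot find the model's variables in newdata and picks up objects of the same name from somewhere else. A sketch of the pattern that usually avoids it: use bare column names in the formula, and give newdata columns with exactly those names. The lung dataset that ships with the survival package stands in for the original data, with sex and ph.ecog as stand-ins for Sex and Disease_status.

```r
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)

  # Surv() takes the follow-up time first, then the event indicator.
  fit <- coxph(Surv(time, status) ~ sex + ph.ecog, data = lung)

  # One row per covariate combination; column names match the formula exactly.
  newdata <- expand.grid(sex = c(1, 2), ph.ecog = c(0, 1))

  # One predicted survival curve per row of newdata.
  curves <- survfit(fit, newdata = newdata)
  print(dim(curves$surv))
}
```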

Getting specific values for each for loop value

I'm using a for loop to perform some estimations with the nardl() function. My function is the following:

# Estimate the short-run buoyancy by sector
Boyanza_Corto_Plazo <- function(i) {

  Sector_elegido <- Series_por_sector[i] # Estimate each series separately
  Sector_elegido <- as.data.frame(Sector_elegido)
  #y <- Sector_elegido[,2]
  #x <- Sector_elegido[,1]
  #modelo_correccion_errores <- aTSA::ecm(y,x)
  ardl <- nardl(Sector_elegido[,2]~Sector_elegido[,1],data = Sector_elegido,maxlags = TRUE)

  # Error correction model results
  #Resultados_ecm <- as.data.frame(modelo_correccion_errores$coefficients)

  # Autoregressive distributed lag model results
  ardl$lres
}
Resultados_Boyanza_CortoPlazo <- lapply(1:length(serie_unica), Boyanza_Corto_Plazo) %>%
  setNames(serie_unica)

Where Series_por_sector is a list formed by time series.

The problem is that once I run this, I get the same result for all fifteen series. When I run the code for each series one by one, I get different values for each result. Is there an error in the loop or in the function?

Data

Series_por_sector <- list(`Administracion Publica` = structure(c(23.1088620514831, 
23.2109063202046, 23.2769873592992, 23.3605352985273, 23.4519346734195, 
23.483205532437, 23.7591278454179, 23.6850360808979, 23.683115488847, 
23.742078603493, 23.7663243593632, 23.764967369018, 23.7364563768631, 
23.7599662090129, 23.7496509758095, 23.7686965597259, 23.7977667134767, 
23.8207387874707, 23.8654579645418, 23.8245489693678, 23.9265214407142, 
23.9145837058818, 23.9142594395867, 23.9950372209453, 23.962962969242, 
24.0353280983239, 24.0451662222755, 24.2065811733021, 24.2090422600631, 
24.164897354507, 24.2107239155159, 24.2013823343176, 24.1945118651099, 
24.2386054601983, 24.1253600441108, 24.1634700277298, 24.24710422059, 
24.2640569230369, 24.2191827178458, 24.2636917589338, 24.322742284412, 
24.3485370356597, 24.389637008929, 24.4114167612282, 24.4230970285861, 
24.4845569534887, 24.4824708646545, 24.5654465306632, 16.2690985732647, 
16.4386018734529, 16.851630812087, 16.6414972640505, 17.1058597508038, 
17.014281852952, 17.1191750738733, 16.8608600925248, 16.9304970555101, 
16.6136770217261, 16.7599831928413, 16.4540571915749, 16.7552091555068, 
16.8100146613935, 16.9753886743241, 17.2559883430648, 17.1842102120934, 
17.3815201756025, 17.3148608548369, 17.5596787855012, 17.5384238342067, 
17.6258161301485, 17.367905481101, 17.2726943078947, 17.2030548088381, 
17.3321154111439, 17.5589703318828, 17.6671996562904, 18.0414455740657, 
17.9911990393175, 17.9793197549134, 18.0986395501199, 17.9982500487969, 
18.2293428092301, 18.416988766537, 18.9520128550431, 18.5498461288987, 
18.5073800917836, 18.3124545933365, 18.7412643661601, 18.5659965557844, 
18.8370796958012, 18.9838794236317, 18.799868193603, 18.9893923810632, 
19.0239372373528, 18.7336946461114, 18.792862397096), .Dim = c(48L, 
2L), .Dimnames = list(NULL, c("lpib", "lrecaudacion")), .Tsp = c(2007, 
2018.75, 4), class = c("mts", "ts", "matrix")), Agropecuaria = structure(c(23.8974172022309, 
23.8426522293242, 23.8786025678373, 23.9234443277368, 23.955785063984, 
23.9488055976204, 24.0194464814519, 24.0355500701664, 24.0205607574373, 
23.9765003819766, 23.9863872713793, 24.0171893839021, 24.1108105209366, 
24.1242478251806, 24.0974361919034, 24.1441462607396, 24.1319142785708, 
24.1571013126376, 24.2096399220679, 24.1363606441852, 24.1912397024969, 
24.2249078202319, 24.1393775469111, 24.1932906271901, 24.2347461668014, 
24.292215799013, 24.2663996098303, 24.2679583510404, 24.3051016718751, 
24.3260743926178, 24.3605254155406, 24.4282569129793, 24.4556975208826, 
24.4387907390123, 24.5062189765647, 24.6019406593183, 24.5677944247836, 
24.5829066028276, 24.6441787260105, 24.5722032035857, 24.5747298230972, 
24.6429122022835, 24.6871883460115, 24.6818910199581, 24.6755650563727, 
24.7191476175315, 24.7758153911166, 24.7035539340166, 17.3520147511469, 
17.5019693803128, 17.7427308725145, 17.5728914152593, 17.6770013267762, 
17.6530027742057, 17.6340294870891, 17.4432563458735, 17.4416872398418, 
17.6295775122599, 17.651375975813, 17.4408807343564, 17.5561240496743, 
17.6490863243478, 17.7624089659032, 18.1616128908896, 17.8126634086638, 
17.5209304636998, 17.5056551305215, 17.6151152309561, 17.629944161796, 
17.8442028876549, 17.7081428718954, 17.5943688159008, 17.6346589843816, 
17.6707656285536, 17.878255754036, 17.9475483858192, 17.9138103344089, 
18.2280489203447, 18.2779985993668, 18.5045059287446, 18.3055466958494, 
18.0813929061191, 18.1066069724926, 18.1035991705582, 18.2351041279044, 
18.1207747946839, 18.140343870442, 18.3064405036347, 18.2525819596875, 
18.2556394115664, 18.1351550728756, 18.1781952679869, 18.684236705932, 
18.7754363617656, 18.6688183305532, 18.9307958795054), .Dim = c(48L, 
2L), .Dimnames = list(NULL, c("lpib", "lrecaudacion")), .Tsp = c(2007, 
2018.75, 4), class = c("mts", "ts", "matrix")), `Alquiler de Viviendas` = structure(c(24.0006751666943, 
24.0340592554054, 24.0680238113269, 24.1189254582254, 24.1429088813474, 
24.1777417773996, 24.2246656047735, 24.251733128356, 24.2502587974969, 
24.2995059086229, 24.3818266633241, 24.4274377756904, 24.4460131221639, 
24.4669868297722, 24.4935610827344, 24.5558404630517, 24.6137504075591, 
24.6672683318767, 24.7094318300949, 24.733862789062, 24.7217146756303, 
24.7250026668479, 24.7143006888566, 24.7402821079578, 24.7546300561241, 
24.767680402686, 24.7843137141875, 24.808853430029, 24.8507316944205, 
24.8588586737822, 24.865040423614, 24.8639260844803, 24.8913005324989, 
24.9071142444601, 24.9198059636058, 24.9225026283004, 24.9570861190926, 
24.9757496061167, 24.9865754608314, 24.9922466239614, 25.0215061302973, 
25.0401591698257, 25.055364085766, 25.0577641891848, 25.0891343758379, 
25.1093228397352, 25.1180411403536, 25.1197533984494, 18.9593078212553, 
19.3008332475991, 19.2310716049415, 19.2088960072736, 19.2964902687309, 
19.3163205064635, 19.3690846090817, 19.2813216073373, 19.338612494098, 
19.3238357321123, 19.3096978069183, 19.3218745761893, 19.3359647313599, 
19.3595512037955, 19.4432337502825, 19.5419292850372, 19.367867853648, 
19.4792554617571, 19.4744789385823, 19.4970904331532, 19.4864988932459, 
19.461090804926, 19.5023798309416, 19.5724897897347, 19.6788361170831, 
19.7895722836087, 19.7660725132758, 19.8233639156982, 19.860259893191, 
19.966495921088, 19.9860812548669, 19.9990319153442, 20.0326109808829, 
20.0452911486641, 20.1108100946792, 20.1520982005704, 20.251063167808, 
20.191144945087, 20.1876101017726, 20.2377436916409, 20.2276964297304, 
20.2915818876374, 20.2896334296863, 20.3486369626646, 20.4005251275962, 
20.3741955381774, 20.4933058962091, 20.5219375235474), .Dim = c(48L, 
2L), .Dimnames = list(NULL, c("lpib", "lrecaudacion")), .Tsp = c(2007, 
2018.75, 4), class = c("mts", "ts", "matrix")), Comercio = structure(c(24.1370434718447, 
24.1474679560082, 24.247774315585, 24.3309113902029, 24.3783871731216, 
24.4445666853879, 24.4808566103939, 24.3739825549957, 24.3422311994027, 
24.3707336209133, 24.4088239471052, 24.4582189848896, 24.5749890813745, 
24.629195568656, 24.6455654186535, 24.6729956287059, 24.7088855755689, 
24.7333803859132, 24.7201636721202, 24.7551669596583, 24.8092172605393, 
24.8061162844105, 24.8168094913596, 24.8323326931589, 24.8272348481957, 
24.8336101639175, 24.8712888927901, 24.8930736664768, 24.9176867747641, 
24.9777325188736, 25.0056963886149, 25.0642368960481, 25.1093674771005, 
25.0869568996516, 25.1678570679605, 25.2154060650998, 25.2128877023623, 
25.2429075653786, 25.281613916562, 25.3005161445634, 25.2961991109808, 
25.3072650677793, 25.2973309550997, 25.3617282270681, 25.3865019926842, 
25.4163177121506, 25.4190764412604, 25.4411634800277, 21.2983035898428, 
21.3570067572794, 21.4452936588011, 21.4237851772619, 21.4273073318266, 
21.4945534576367, 21.4258382239739, 21.3295361689986, 21.3045228064109, 
21.3944020927061, 21.4842073973378, 21.4389384907488, 21.4956072256246, 
21.3930354066646, 21.3278405349757, 21.5128955832531, 21.4507435697624, 
21.5250876204212, 21.4066744817499, 21.4443117036001, 21.6059428597013, 
21.6369336788754, 21.5904970674634, 21.6635309673871, 21.8297886435693, 
21.8465397001402, 21.9068560451084, 21.953016327191, 22.0205414069325, 
21.9367168184625, 21.9770088938013, 22.0085679027535, 22.0860223086015, 
22.060262279998, 22.1844033814521, 22.1960941239081, 22.1002613229137, 
22.1831252090577, 22.269441531446, 22.2390935138727, 22.1294723280912, 
22.2031216834464, 22.2797664344226, 22.3571514107923, 22.3820705584665, 
22.4259895425348, 22.4991828106128, 22.5097930497936), .Dim = c(48L, 
2L), .Dimnames = list(NULL, c("lpib", "lrecaudacion")), .Tsp = c(2007, 
2018.75, 4), class = c("mts", "ts", "matrix")), Comunicaciones = structure(c(22.7159215016701, 
22.7442934923426, 22.8156289557376, 22.8103180688631, 22.9079280521587, 
22.8651731130763, 22.8185625327301, 22.7879980426273, 22.8257259562352, 
22.9543244956688, 22.9024934066312, 22.9249025428433, 22.988677715386, 
23.0101125568586, 22.9546422949912, 22.9850867549534, 22.9651084826172, 
22.9540511508368, 22.9108721846524, 22.9509357016076, 22.8808470019626, 
22.8471037021794, 22.8221045329777, 22.847135573551, 22.8141938440927, 
22.8435180027461, 22.9127566686255, 22.8493302959942, 22.8041794119784, 
22.8202482099194, 22.8161997967681, 22.7727440025612, 22.815257442878, 
22.8298190159059, 22.8664428656122, 22.8862096742561, 22.8553284792427, 
22.878577485562, 22.8968137208948, 22.9241070538481, 22.9144722667933, 
22.8880620951397, 22.9420612614645, 22.9834423350717, 23.0095123223225, 
22.9597572724236, 23.0723326115794, 23.1097434102164, 20.8239716869769, 
20.5226542439175, 20.6026890686682, 20.76310324137, 20.9131971943851, 
20.7205251172263, 20.5774355857312, 20.7136250593865, 20.4564302498454, 
20.7767470846672, 20.9099449005612, 20.951959385823, 20.9848515489911, 
20.9900123676892, 20.9390425878003, 20.8699224312633, 20.9023061288989, 
21.0558234020519, 21.015551221213, 20.9845630460342, 21.0705504536071, 
21.1018779848056, 21.1459912615211, 21.0792073320455, 21.2352066231257, 
21.2852187072191, 21.3713592385595, 21.2529499988486, 21.4376334474516, 
21.4275513591843, 21.4684247354223, 21.5283291824696, 21.4079744851135, 
21.3797841890375, 21.4806534896227, 21.4629795746998, 21.5305509488717, 
21.5253402007029, 21.5288601216958, 21.5186845853155, 21.5536213729911, 
21.4778607621153, 21.5269248692107, 21.6565995758113, 21.5436387178381, 
21.5233578775461, 21.6342874994564, 21.5566078595888), .Dim = c(48L, 
2L), .Dimnames = list(NULL, c("lpib", "lrecaudacion")), .Tsp = c(2007, 
2018.75, 4), class = c("mts", "ts", "matrix")), Construccion = structure(c(24.3104512383546, 
24.3394290212942, 24.3685369226656, 24.4001487592087, 24.4313358764547, 
24.5637013122111, 24.628809834211, 24.4138986476927, 24.5081816554895, 
24.487818690176, 24.5241059114576, 24.4798591263034, 24.6118953722024, 
24.6117736694609, 24.6459997019144, 24.6616526205874, 24.7972745337092, 
24.7026781507544, 24.7538879069839, 24.7702794155517, 24.8630224317717, 
24.7654205720821, 24.7686333334311, 24.7977896117515, 24.8267493068813, 
24.8903323322539, 24.9341888192851, 24.9938126923127, 25.0505717195684, 
25.0409295327285, 25.0695668362172, 25.1536776503307, 25.0496999985082, 
25.1099439688764, 25.1478211910891, 25.2037124928483, 25.1047936889341, 
25.1700961746316, 25.0649422946338, 25.1924468939817, 25.2013639833252, 
25.2708112626769, 25.3429207496999, 25.4601548137581, 25.3718269603702, 
25.5022070811831, 25.5200212858805, 25.6406421545549, 19.1311784876827, 
19.3619706519596, 19.3628787073326, 19.3423206272171, 19.4330346384083, 
19.5156278154174, 19.5729825269357, 19.4363731729595, 19.3711685254248, 
19.3275457471445, 19.3891653130203, 19.3309563793927, 19.480197560506, 
19.5915941730223, 19.620493537738, 19.799001695147, 19.8551635762407, 
19.8323528497659, 19.8221845538554, 19.9675185991679, 19.9382260968915, 
19.6148322148458, 19.7224392314754, 19.6673232367288, 19.7318790786149, 
19.8242729948508, 19.8528377245517, 19.8665021028161, 19.9548882721544, 
19.9685129522984, 20.116318982605, 20.2142748977724, 20.1937649557145, 
20.2549289856348, 20.3206791254058, 20.2985256718267, 20.2899519583351, 
20.4189483025824, 20.2901939953295, 20.3633307990819, 20.4126419877815, 
20.5255910482695, 20.4743999783096, 20.5958991005088, 20.5358549516528, 
20.6569150364701, 20.5858968756658, 20.7941066075041), .Dim = c(48L, 
2L), .Dimnames = list(NULL, c("lpib", "lrecaudacion")), .Tsp = c(2007, 
2018.75, 4), class = c("mts", "ts", "matrix")))

Render computed HTML in RMarkdown

I have a function that produces some html and I want RMarkdown to render the html.

```{r}
outputFromAFunction <- '<span style="background-color: #A6CEE3">A</span>'
outputFromAFunction
```

How do I tell rmarkdown to render outputFromAFunction as the colored letter A instead of simply printing the html as text?

I have already tried the results='asis' code chunk option and it removes the closing span tag and does not render the html.

I need to render an HTML page, not a Shiny app.
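
One thing that may be worth trying, as a sketch: auto-printing a string adds the [1] prefix and escapes the inner quotes, which can mangle the markup even with results='asis'. Writing the string with cat() emits it unchanged. The code below is the body of a chunk that would be declared with results='asis'.

```r
# Body of a chunk declared as {r, results='asis'} in the .Rmd file.
outputFromAFunction <- '<span style="background-color: #A6CEE3">A</span>'

# cat() writes the raw HTML; printing the object would add [1] and escaped quotes.
cat(outputFromAFunction)
```

knitr also provides asis_output(), which marks a single value as raw output without changing the chunk option.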


Creating a plot with multiple variables in R [closed]

I am having a hard time finding anything similar to my work. I would like to create multiple plots with data similar to this image. I want my y axis to show the numbers, which are D15N, and my x axis the dates, such as AP16. Is this data set up right, or do I need to edit my data? What would be the best plot for this? I would like to see if there is a trend across sites and sampling dates.

enter image description here

Plotting training and test error rates of knn cross-validation in R

I have performed the following cross-validation knn (using the caret package) on the iris dataset. I am now trying to plot the training and test error rates for the result. Here is my attempt but I cannot get the error rates. Can anyone help me please?

library(caret)
data(iris)
sample <- sample(2, nrow(iris), replace=TRUE, prob=c(0.80, 0.20))

iris.training <- iris[sample == 1, 1:4]
iris.test <- iris[sample == 2, 1:4]

iris.trainLabels <- iris[sample == 1, 5]
iris.testLabels <- iris[sample == 2, 5]

# Combine training data and combine test data.
iris_train <- cbind(iris.trainLabels, iris.training)
iris_test <- cbind(iris.testLabels, iris.test)

trControl <- trainControl(method = "cv", number = 5)

# K values 1 3 5 7 9
k_values <- seq(from=1, to=10, by=2)

fit <- train(iris.trainLabels ~ ., method = "knn", tuneGrid = expand.grid(k = k_values), trControl = trControl, data = iris_train)

# Plot
bestK <- function(iris_train, iris.trainLabels, 
iris.testLabels) {
  ctr <- c(); cts <- c()
  for (k in length(k_values)) {
  fit <- train(iris.trainLabels ~ ., method = "knn", tuneGrid = expand.grid(k = k_values), trControl = trControl, data = iris_train)

  trTable <- prop.table(table(fit, iris.trainLabels))
  tsTable <- prop.table(table(fit, iris.testLabels))

  erTr <- trTable[1,2] + trTable[2,1]
  erTs <- tsTable[1,2] + tsTable[2,1]

  ctr <- c(ctr,erTr)
  cts <- c(cts,erTs)
} 
 err <- data.frame(k=k_values, trER=ctr, tsER=cts)
 return(err)
} 

err <- bestK(iris_train, iris.trainLabels, iris.testLabels)

plot(err$k,err$trER,type='o',ylim=c(0,.5),xlab="k",ylab="Error rate",col="blue")
lines(err$k,err$tsER,type='o',col="red")
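
A sketch of one way to get both error curves. It swaps in class::knn() for caret::train() so that predictions on the training and test sets can be computed directly for each k; the class package is assumed to be installed, and the 80/20 split by row index is a simplification of the sampling used above.

```r
set.seed(42)
idx <- sample(nrow(iris), size = round(0.8 * nrow(iris)))
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y  <- iris$Species[-idx]

k_values <- seq(from = 1, to = 9, by = 2)
err <- data.frame(k = k_values, trER = NA_real_, tsER = NA_real_)

if (requireNamespace("class", quietly = TRUE)) {
  for (j in seq_along(k_values)) {
    k <- k_values[j]
    # Predict the training set and the test set from the same training data.
    pred_tr <- class::knn(train_x, train_x, cl = train_y, k = k)
    pred_ts <- class::knn(train_x, test_x,  cl = train_y, k = k)
    # Error rate = share of misclassified observations.
    err$trER[j] <- mean(as.character(pred_tr) != as.character(train_y))
    err$tsER[j] <- mean(as.character(pred_ts) != as.character(test_y))
  }
  plot(err$k, err$trER, type = "o", ylim = c(0, 0.5),
       xlab = "k", ylab = "Error rate", col = "blue")
  lines(err$k, err$tsER, type = "o", col = "red")
}
```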

Find closest match, then next closest, between groups until a specified number of matches has been made

I'd like to find the closest match (smallest difference) of a variable between two groups, but if the closest match has already been used, move on to the next closest match, until n matches have been made.

I used the code from this answer (below) to find the closest match of a value between Samples for each pairwise grouping of all groups (i.e. Location by VAR).

However, there are many repeats, and the top match for Sample.x 1, 2, and 3 might all be Sample.y 1.

What I'd like instead is to find the next closest match for Sample.x 2, then 3, etc., until a specified number of distinct (Sample.x-Sample.y) matches have been made. The order of Sample.x is not important; I'm just looking for the top n matches between Sample.x and Sample.y for a given grouping.

I attempted to do this with dplyr::distinct as shown below, but I am unsure how to use the distinct entries for Sample.y to filter the data frame and then filter again by smallest DIFF. Also, this won't necessarily result in unique Sample pairings.

Is there a smart way to accomplish this in R with dplyr? Is there a name for this type of operation?

library(data.table)  # rbindlist()
library(dplyr)
library(tidyr)       # gather()
library(purrr)       # map()

df01 <- data.frame(Location = rep(c("A", "C"), each =10), 
                   Sample = rep(c(1:10), times =2),
                   Var1 =  signif(runif(20, 55, 58), digits=4),
                   Var2 = rep(c(1:10), times =2)) 
df001 <- data.frame(Location = rep(c("B"), each =10), 
                    Sample = rep(c(1:10), times =1),
                    Var1 = c(1.2, 1.3, 1.4, 1.6, 56, 110.1, 111.6, 111.7, 111.8, 120.5),
                    Var2 = c(1.5, 10.1, 10.2, 11.7, 12.5, 13.6, 14.4, 18.1, 20.9, 21.3))
df <- rbind(df01, df001)
dfl <- df %>% gather(VAR, value, 3:4)

df.result <- df %>% 
  # get the unique elements of Location
  distinct(Location) %>% 
  # pull the column as a vector
  pull %>% 
  # it is factor, so convert it to character
  as.character %>% 
  # get the pairwise combinations in a list
  combn(m = 2, simplify = FALSE) %>%
  # loop through the list with map and do the full_join
  # with the long format data dfl
  map(~ full_join(dfl %>% 
                    filter(Location == first(.x)), 
                  dfl %>% 
                    filter(Location == last(.x)), by = "VAR") %>% 
        # create a column of absolute difference
        mutate(DIFF = abs(value.x - value.y)) %>%
        # grouped by VAR, Sample.x
        group_by(VAR, Sample.x) %>%
        # apply the top_n with wt as DIFF
        # here I choose 5, 
        # and then hope that this is enough to get a smaller n of final matches
        top_n(-5, DIFF) %>%
        mutate(GG = paste(Location.x, Location.y, sep="-")))

res1 <- rbindlist(df.result)
res2 <- res1 %>% group_by(GG, VAR) %>% distinct(Sample.y)    
res3 <- res2 %>% group_by(GG, VAR) %>% top_n(-2, DIFF)
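
This kind of operation is commonly called greedy matching without replacement; the globally optimal variant is the assignment problem. A minimal base R sketch of the greedy version follows. The function name greedy_match and the toy vectors x and y are made up for illustration; in the data above it would be applied once per Location pair and VAR.

```r
# Hypothetical helper: repeatedly take the smallest remaining difference,
# then drop every other candidate that reuses either matched element.
greedy_match <- function(x, y, n) {
  # All pairwise absolute differences in long form
  pairs <- expand.grid(i = seq_along(x), j = seq_along(y))
  pairs$diff <- abs(x[pairs$i] - y[pairs$j])
  pairs <- pairs[order(pairs$diff), ]

  out <- pairs[0, ]
  while (nrow(out) < n && nrow(pairs) > 0) {
    best <- pairs[1, ]
    out <- rbind(out, best)
    # Each element of x and of y may be used at most once
    pairs <- pairs[pairs$i != best$i & pairs$j != best$j, ]
  }
  out
}

# Toy data, not taken from the question
x <- c(1, 2, 10)
y <- c(1.1, 9, 2.5)
greedy_match(x, y, n = 2)
# matches x[1] with y[1] (diff 0.1), then x[2] with y[3] (diff 0.5)
```

Because it is greedy, the result is not guaranteed to minimise the total difference over all n pairs; it only guarantees that no sample is used twice.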

Strange arithmetic result [duplicate]

In R 3.4 and Python 3.5:

0.3-0.2-0.1 = -2.775558e-17

while

0.4-0.2-0.1-0.1 = 0

I am very confused. I am aware that the two languages use different storage types. I ran into this issue because I need to determine whether a numeric variable (the result of several arithmetic operations) is positive or negative.

I need a reliable way to check the sign of a numeric variable.
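
Both results come from binary floating-point rounding: 0.1, 0.2 and 0.3 have no exact binary representation, so tiny residues such as -2.775558e-17 can remain. One common workaround is to treat anything within a small tolerance of zero as zero before taking the sign. The sketch below assumes a tolerance of sqrt(.Machine$double.eps), which is a conventional default rather than a universally correct choice; the name sign_tol is made up for illustration.

```r
# Hypothetical helper: sign of x, treating |x| < tol as exactly zero
sign_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  ifelse(abs(x) < tol, 0, sign(x))
}

sign(0.3 - 0.2 - 0.1)            # -1: the rounding residue is negative
sign_tol(0.3 - 0.2 - 0.1)        #  0: residue is below the tolerance
sign_tol(0.4 - 0.2 - 0.1 - 0.1)  #  0
sign_tol(c(-2, 0.5))             # -1  1
```

The right tolerance depends on the magnitude of the values involved, so it may need to be scaled for very large or very small numbers.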

Save ggplot with a function

I would like to create a function to save plots (from ggplot).

Here is a data frame:

### creating data frame
music <- c("Blues", "Hip-hop", "Jazz", "Metal", "Rock")
number <- c(8, 7, 4, 6, 11)
df.music <- data.frame(music, number)
colnames(df.music) <- c("Music", "Amount")

Then I create a plot:

### creating bar graph (this part is OK)
library(ggplot2)
myplot <- ggplot(data=df.music, aes(x=Music, y=Amount)) +
 geom_bar(stat="identity") +
 xlab(colnames(df.music)[1]) +
 ylab(colnames(df.music)[2]) +
 ylim(c(0,11)) +
 ggtitle("Favorite type of music among students")

Now I want to save this plot to .pdf.

This works:

pdf("Myplot.pdf", width=5, height=5)
myplot
dev.off()

However I would like to automate this with a function which takes as an argument the plot I want to save. I don't know exactly how to do it; here's what I have tried:

save <- function(myplot){
  plot<- myplot
  pdf("lol.pdf", width=5, height=5)
  plot
  dev.off()
}
### .pdf file is created but doesn't work
save(myplot) 

So, how can I do it?
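
A likely cause is that a ggplot object is only drawn when it is printed. At the top level R prints it automatically, but inside a function the bare line plot does nothing, so the PDF device is closed with no page in it. A minimal sketch that calls print() explicitly is shown below; the function name save_plot_pdf and its file argument are made up for illustration, and it avoids the name save because that masks base::save().

```r
library(ggplot2)

# Hypothetical helper: write one ggplot object to a PDF file
save_plot_pdf <- function(p, file, width = 5, height = 5) {
  pdf(file, width = width, height = height)
  on.exit(dev.off())  # close the device even if printing fails
  print(p)            # explicit print is required inside a function
  invisible(file)
}

# Example data mirroring the question
df.music <- data.frame(
  Music  = c("Blues", "Hip-hop", "Jazz", "Metal", "Rock"),
  Amount = c(8, 7, 4, 6, 11)
)
myplot <- ggplot(df.music, aes(x = Music, y = Amount)) +
  geom_bar(stat = "identity") +
  ggtitle("Favorite type of music among students")

out <- file.path(tempdir(), "Myplot.pdf")
save_plot_pdf(myplot, out)
```

ggplot2 also ships ggsave(), so ggsave(out, plot = myplot, width = 5, height = 5) should give a similar result without managing the device by hand.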
