Channel: Active questions tagged r - Stack Overflow

Creating a dataframe using 3 other data frames that have no common merge values


I'm trying to accomplish a very particular type of data cleaning process using R.

I am given two dataframe structures and one matrix structure. The matrix uses DF1 as its column headers and DF2 as its row headers, and I want to take all of this data and convert it to a rectangular dataframe with one observation per row (based on the result matrix, matrix_data).

Using the code below, I am able to create one observation per row, but for large data sets (~1M+ unique entries), this can take several minutes to run (~5 min). Right now, I am using a for loop to cycle through DF1, and I'm using do.call(... replicate()) to append rows to DF2. Treatment of the matrix_data is simple - I unwrap the data into a vector and cbind it to the DF1_ext and DF2_ext dataframes. Is there a better way to do this in R?

DF1 <- data.frame(x_1 = c('a','b','c','d','e'), y_1 = c('f','g','h','i','j'), z_1 = c('k','l','m','n','o'))
DF2 <- data.frame(v_2 = 1:3, w_2 = 4:6, x_2 = 7:9, y_2 = 10:12, z_2 = 13:15)
matrix_data <- matrix(data = 1:15, nrow = 3, ncol = 5)


DF1_ext <- NULL
DF1_length <- nrow(DF1) * nrow(DF2)

#Use ceiling function to determine which row to put in NULL dataframe
#i.e. ceiling() rounds up to nearest integer value, setting j = to incremental step in origin dataframe
#See resultant DF

for (k in 1:DF1_length) {
  j = ceiling(k / DF1_length * length(DF1[,2]))
  DF1_ext <- rbind(DF1_ext[], DF1[j,])
}

#replicate DF2 matrix with rbind() based on the number of rows in DF1
DF2_ext <- do.call(rbind, replicate(nrow(DF1), DF2, simplify = FALSE))

#cbind() all values together. 
#matrix_data can be transposed or not. This matters in the actual analysis, but should not matter here. 
DF_result <- cbind(DF1_ext, DF2_ext, as.vector(t(matrix_data)))

View(DF_result)

I am seeking a more "R" way of executing this code, hoping that there are more efficient functions for it. The code, as is, can be copied into R and run with only base functions. To be clear, I am looking for a faster approach: this method executes very slowly and seems like a lot of running around compared to most R idioms.
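
For reference, a vectorized sketch of the same construction (it uses only the DF1, DF2, and matrix_data defined above and builds the same rows in the same order as the loop version; a possible direction rather than a definitive answer):

# Repeat each DF1 row once per DF2 row, tile DF2, and flatten the matrix,
# avoiding rbind() inside a loop.
DF1_ext <- DF1[rep(seq_len(nrow(DF1)), each  = nrow(DF2)), ]
DF2_ext <- DF2[rep(seq_len(nrow(DF2)), times = nrow(DF1)), ]
DF_result <- cbind(DF1_ext, DF2_ext, value = as.vector(t(matrix_data)))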


How to deal with this website in a webscraping format?


I am trying to webscrape this website.

I am applying the same code that I always use to webscrape pages:

url_dv1 <- "https://ec.europa.eu/commission/presscorner/detail/en/qanda_20_171?fbclid=IwAR2GqXLmkKRkWPoy3-QDwH9DzJiexFJ4Sp2ZoWGbfmOR1Yv8POdlLukLRaU"

url_dv1 <- paste(html_text(html_nodes(read_html(url_dv1), "#inline-nav-1 .ecl-paragraph")), collapse = "")
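
Spelled out step by step, the same pattern looks like this (a sketch; it assumes rvest is loaded and keeps the parsed page in its own variable instead of reusing url_dv1):

library(rvest)

page  <- read_html(url_dv1)   # url_dv1 still holds the URL string at this point
nodes <- html_nodes(page, "#inline-nav-1 .ecl-paragraph")
text  <- paste(html_text(nodes), collapse = "")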

For this website, though, the code doesn't seem to work. I get: Error in UseMethod("read_xml") : no applicable method for 'read_xml' applied to an object of class "c('xml_document', 'xml_node')".

Why is it so? How can I fix it?

Thanks a lot!

"sna" or "igraph" : Why do I get different degree values for undirected graph?


I am doing some basic network analysis using networks from the R package "networkdata". To this end, I use the package "igraph" as well as "sna". However, I realised that the results of descriptive network statistics vary depending on the package I use. Most of the variation is minor, but the average degree of my undirected graph halved as soon as I switched from "sna" to "igraph".

library(networkdata)
n_1 <- covert_28

library(igraph)
library(sna)

n_1_adjmat <- as_adjacency_matrix(n_1)
n_1_adjmat2 <- as.matrix(n_1_adjmat)

mean(sna::degree(n_1_adjmat2, cmode = "freeman")) # [1] 23.33333
mean(igraph::degree(n_1, mode = "all")) # [1] 11.66667
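
A quick sanity check I can add for context (a sketch; it only uses igraph functions already loaded above):

sum(n_1_adjmat2)         # total of the adjacency matrix; for a simple, unweighted undirected graph each edge contributes two 1s
2 * igraph::ecount(n_1)  # igraph's edge count doubled, for comparison
igraph::is_directed(n_1) # confirms whether igraph treats the graph as undirected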

This doesn't happen in the case of my directed graph. There, I get the same results regardless of whether I use "sna" or "igraph".

Is there any explanation for this phenomenon? And if so, is there anything I can do in order to prevent this from happening?

Thank you in advance!

randomize observations by groups (blocks)


I have a data frame with I observations, and each observation belongs to one of g categories.

set.seed(9782)
I <- 500
g <- 10
library(dplyr)

anon_id <- function(n = 1, lenght = 12) {
  randomString <- c(1:n)
  for (i in 1:n)
  {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    lenght, replace = TRUE),
                             collapse = "")
  }
  return(randomString)
}

df <- data.frame(id = anon_id(n = I, lenght = 16),
                 group = sample(1:g, I, T))

I want to randomly assign each observation to one of J "urns", given some vector of probabilities p. That is, the probability of being assigned to urn 1 is p[1]. The added complexity is that I want to do this block by block.

If I ignore the blocks, I can do this easily:

J <- 3
p <- c(0.25, 0.5, 0.25)
df1 <- df %>% mutate(urn = sample(x = c(1:J), size = I, replace = T, prob = p))

I thought about this method to do it by "block":

# Block randomization
randomize_block <- function(g) {
  df1 <- df %>% filter(group==g) 
  size <- nrow(df1)
  df1 <- df1 %>% mutate(urn = sample(x = c(1:J), 
                                     size = size, 
                                     replace = T, 
                                     prob = p))
  return(df1)

}

df2 <- lapply(1:g, randomize_block)
df2 <- data.table::rbindlist(df2)

Is there a better way?
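
One alternative I have been considering (a sketch only, using the df, J, and p defined above; I am not sure it is actually faster):

# Let group_by() handle the blocks instead of filtering group by group
# (dplyr is already loaded above)
df2_alt <- df %>%
  group_by(group) %>%
  mutate(urn = sample(x = 1:J, size = n(), replace = TRUE, prob = p)) %>%
  ungroup()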

randomize observations by groups (blocks) without replacement


This is a follow-up question. The answers to the previous question do the random sampling with replacement. How can I change the code so that I assign each observation to one of J "urns" without putting the observation back in the 'lottery'?

This is the code I have right now:

set.seed(9782)
I <- 500
g <- 10
library(dplyr)

anon_id <- function(n = 1, lenght = 12) {
  randomString <- c(1:n)
  for (i in 1:n)
  {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    lenght, replace = TRUE),
                             collapse = "")
  }
  return(randomString)
}

df <- data.frame(id = anon_id(n = I, lenght = 16),
                 group = sample(1:g, I, T))

J <- 3
p <- c(0.25, 0.5, 0.25)

randomize <- function(data, urns=2, block_id = NULL, p=NULL, seed=9782) {
  if(is.null(p)) p <- rep(1/urns, urns) 
  if(is.null(block_id)){
    df1 <- data %>% 
      mutate(Treatment = sample(x = c(1:urns), 
                                size = n(), 
                                replace = T, 
                                prob = p))
    return(df1)
  }else{
    df1 <- data %>% group_by_(block_id) %>% 
      mutate(Treatment = sample(x = c(1:urns), 
                                size = n(), 
                                replace = T, 
                                prob = p))
  }
}    

df1 <- randomize(data = df, urns = J, block_id = "group", p = p, seed = 9782)

If I change replace = T to replace = F I get the following error:

 Error: cannot take a sample larger than the population when 'replace = FALSE'

Clarification of my objective:

Suppose that I have 10 classrooms (or villages, or something like that). To keep it simple, suppose each classroom has 20 students (in reality classroom j will have N_j). Classroom by classroom, I want to assign each student to one of J groups, for example J = 3. p gives the fraction that will be assigned to each group: for example, 25% to group 1, 40% to group 2, and 35% to group 3.
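
To make the objective concrete, a sketch of what I mean for a single classroom (hypothetical numbers: 20 students and p = c(0.25, 0.40, 0.35); rounding issues are ignored in the sketch):

n_students <- 20
p_example  <- c(0.25, 0.40, 0.35)
counts     <- round(n_students * p_example)    # 5, 8, 7 students per urn
urns       <- sample(rep(1:3, times = counts)) # shuffle the labels; nothing goes back into the lottery
table(urns)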

Color a single node with tidygraph and ggraph


I have a tidy_igraph data network that I plot out. However, I want to color a single node in the graph a distinct color from all the others, as it is the central point in the graph.

I did the following to make a tibble with a color column:

attending <- consult_igraph_tidy %>%
    activate(nodes) %>%
    filter(label == "Person_A") %>%
    mutate(color = "purple") %>%
    as_tibble()

Now I wanted to add a single node as a layer to the ggraph like so:

consult_igraph_tidy %>% 
    mutate(deg = degree(consult_igraph_tidy)) %>%
    activate(edges) %>%
    filter(Source_to_Target_Count >= 3) %>%
    activate(nodes) %>%
    filter(!node_is_isolated()) %>%
    mutate(friends = case_when(
        deg < 15 ~ "few"
        , deg < 25 ~ "medium"
        , TRUE ~ "most"
    )) %>%
    mutate(group = node_coreness(mode = "all")) %>%
    ggraph(layout = "fr") +
    geom_edge_link(
        aes(
            alpha = .618
        )
    ) +
    geom_node_point(
        aes(
            size = deg
            , color = factor(friends)
        )
    ) +
    geom_node_point( # the single point I want to add
        data = attending
        , aes(color = "purple")
    )

Reading and projecting ESRI grid adf in R


I am having trouble working with a DEM file in R that is available as an ESRI grid adf file (i.e. a folder with various adf files such as hdr.adf, w001001.adf, etc.).

I can use the raster package to read and plot the file:

dem<-raster("w001001.adf")
plot(dem)

The resulting plot looks fine and I can do things like crop the file. However, I want to reproject the file from:

 +proj=laea +lat_0=-100 +lon_0=6370997 +x_0=45 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 

To:

 +proj=aea +lat_1=49.0628 +lat_2=50.4997 +lon_0=-113.5986 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +towgs84=0,0,0 

based on another raster. The two rasters have different extents and resolutions in addition to different projections, and I usually use spatial_sync_raster from the spatial.tools R package to deal with this type of conversion, but the result is an empty raster. I can't figure out whether this is because I haven't properly read in the ESRI adf data (i.e. it's all in memory) or because the spatial sync is too complicated. I've tried just cropping the dem layer to an approximate extent consistent with the second raster and using the projectRaster command from the raster package (removing the resolution issue and trying a straight-up projection), but again I get an empty raster. So I think the issue is with importing the adf file. Several days of googling have revealed no R-only solutions (I'd rather not have to invoke ArcGIS).

I appreciate any and all help!

Bits of code I have tried:

dem_p<-projectRaster(dem,crs=target_proj)
dem_p<-spatial_sync_raster(dem,climate,method="bilinear")
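
For completeness, a sketch of the other call I understand projectRaster to accept, using the second raster (climate, as above) directly as the template for extent, resolution, and CRS; I have not confirmed whether it behaves any differently here:

library(raster)
dem_p <- projectRaster(from = dem, to = climate, method = "bilinear")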

Using R to find out if gerrymanders impact corruption convictions [closed]


I have a small request for help with some tests of a data set I am working with for my thesis.

I am attempting to see if there is a relationship between the years before, during, and after a census (and thus a change in districts) and the number of convictions for corruption.

I have graphed the data and used Student's t-test and do not see anything (I could be mistaken, of course), but I remember from class that the professor said there were other ways to test data that were not covered in the course I took this time last year. Below is a screenshot of the data: it shows the state and the number of convictions by year, for 35 years.

[Screenshot: graph of convictions by year]

Above are graphs of the data I have already made. The graph for 'Compactness' was based on states that were identified as having unusually non-compact districts.


Merging two data frames fails to populate columns when combined


I am new to R. I have two data frames (below) and I would like to add the information from df2 to df1. The only column in common between the two data frames is "Sample", so I tried to use this column to merge them.

df1

structure(list(Segment = c(3L, 3L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 
7L), Position = c(838L, 891L, 1204L, 732L, 1550L, 688L, 1167L, 
1446L, 950L, 981L), `AA-REF` = structure(c(2L, 5L, 7L, 6L, 1L, 
8L, 8L, 1L, 3L, 4L), .Label = c("", "D", "E", "H", "K", "L", 
"Q", "T"), class = "factor"), `AA-ALT` = structure(c(4L, 2L, 
2L, 3L, NA, 5L, 3L, NA, 1L, 4L), .Label = c("E", "K", "M", "N", 
"T"), class = "factor"), SYN = structure(c(2L, 3L, 2L, 2L, 1L, 
3L, 2L, 1L, 3L, 2L), .Label = c("", "N     ", "Y     "), class = "factor"), 
    Sample = c("AO103", "AO103", "AO103", "AO103", "AO103", "AO103", 
    "AO103", "AO103", "AO103", "AO103")), row.names = c(NA, 10L
), class = "data.frame")
  Segment Position AA-REF AA-ALT    SYN Sample
1         3      838      D      N N       AO103
2         3      891      K      K Y       AO103
3         3     1204      Q      K N       AO103
4         4      732      L      M N       AO103
5         5     1550          <NA>         AO103
6         6      688      T      T Y       AO103
7         6     1167      T      M N       AO103
8         6     1446          <NA>         AO103
9         7      950      E      E Y       AO103
10        7      981      H      N N       AO103
11        8      199      T      N N       AO103
12        1      341      T      K N       AO104
13        1      934      T      A N       AO104
14        1     1327      L      F N       AO104
15        1     1349      D      G N       AO104

df2

structure(list(Sample = c("AO208 ", "AO209 ", "AO210 ", "AO211 ", 
"AO212 ", "AO213 ", "AO100 ", "AO101 ", "AO102 ", "AO103 "), 
    Quail = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), day = c(3, 3, 
    3, 3, 3, 3, 5, 5, 5, 5), Expo = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = " DC ", class = "factor"), 
    Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = " var", class = "factor")), row.names = c(NA, 
10L), class = "data.frame")
 Sample Quail day Expo Group
1  AO208      7   3  DC    var
2  AO209      8   3  DC    var
3  AO210      9   3  DC    var
4  AO211     10   3  DC    var
5  AO212     11   3  DC    var
6  AO213     12   3  DC    var
7  AO100      7   5  DC    var
8  AO101      8   5  DC    var
9  AO102      9   5  DC    var
10 AO103     10   5  DC    var
11 AO104     11   5  DC    var

NOTE: Not all entries in df2$Sample are present in df1$Sample

I would like to get something like the following:

  Segment Position AA-REF AA-ALT    SYN Sample    Quail   day    Expo    Group
1         3      838      D      N N       AO103    10   5  DC    var
2         3      891      K      K Y       AO103    10   5  DC    var
3         3     1204      Q      K N       AO103    10   5  DC    var
4         4      732      L      M N       AO103    10   5  DC    var
5         5     1550          <NA>         AO103    10   5  DC    var
6         6      688      T      T Y       AO103    10   5  DC    var
7         6     1167      T      M N       AO103    10   5  DC    var
8         6     1446          <NA>         AO103    10   5  DC    var
9         7      950      E      E Y       AO103    10   5  DC    var
10        7      981      H      N N       AO103    10   5  DC    var
11        8      199      T      N N       AO103    10   5  DC    var
12        1      341      T      K N       AO104    11   5  DC    var
13        1      934      T      A N       AO104    11   5  DC    var
14        1     1327      L      F N       AO104    11   5  DC    var
15        1     1349      D      G N       AO104    11   5  DC    var

I tried:

x <- merge(df1, df2, by = "Sample", all = TRUE)

Even though this adds the columns, everything from df2 is placed at the end of df1.

I also tried using dplyr's left_join (among others) as:

x <- df1 %>%
  left_join(df2, by = "Sample")

This adds the columns from df2, but they are empty (no information at all).

I have been looking at many merging posts but none of those seem to address my problem.

I also tried match without success.
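
For reference, the check I was planning to run next to see which Sample values actually overlap between the two data frames (a sketch in base R):

intersect(df1$Sample, df2$Sample)        # values present in both
setdiff(unique(df1$Sample), df2$Sample)  # df1 samples with no exact match in df2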

Passing list of variable names to custom function with mutate


I am trying to apply a function over each row and create a new column that considers multiple columns, using the tidyverse. I was initially using rowwise(), but that was very slow. I want the list of columns passed into my custom function to be a variable, but I can't get it to work unless I explicitly list the variable names. For example, this works:

library(dplyr)

low_risk_codes <- c(0,1,10)
vars <- c("V1", "V2")
m <- matrix(1:9, ncol=3)
classify_low_risk_drug <- function(...){
  t <- cbind(...)
  return(apply(t, 1, function(x) ifelse(any(x %in% low_risk_codes), 1, 0)))
}

as.data.frame(m) %>%
  mutate(val4 = classify_low_risk_drug(V1, V2))

But I also want it to evaluate using the column input supplied as vars:

as.data.frame(m) %>% 
  mutate(val4 = classify_low_risk_drug(vars))

However, I can't get it to work, even if I include !!. What am I missing?!

Also, any suggestions for how to do this with map instead are appreciated!
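
For context, the kind of splicing I think I am supposed to use (a sketch with rlang; I have not confirmed this is the idiomatic way):

library(rlang)

as.data.frame(m) %>%
  mutate(val4 = classify_low_risk_drug(!!!syms(vars)))  # turn the names in vars into symbols and splice them in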

Overlay two plots and highlight the highest values with ggplots in R


I want to overlay two vectors:

vect1 <- c(0.00, 0.04, 0.28, 0.01, 0.00, 0.13, 0.00, 0.00, 0.00, 0.00, 0.00, 0.3, 0.2, 0)
vect2 <- c(0.19, 0.14, 0.80, 0.24, 0.14, -0.38, 0.06, 0.11, 0.11, -0.2, 0.06, -0.45, 0.6, 0)

I want to color only the vect2 values (vect1 stays black): positive values in red and negative values in blue. The closer the value is to 1 or -1, the more intense the colour should be. I also want to add a circle at the peak apex of the 5 most intense values, with the size of the circle proportional to the value (e.g. for vect2: 0.80, 0.60, -0.45, -0.38, 0.24).
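
A sketch of how far I have got with the data preparation (it assumes an index position on the x-axis and uses abs() to pick the 5 most intense vect2 values):

library(dplyr)

dat  <- data.frame(idx = seq_along(vect1), vect1 = vect1, vect2 = vect2)
top5 <- dat %>% slice_max(abs(vect2), n = 5)  # rows to mark with circles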

How to push a json object into array using R


I have tried several techniques to push a JSON object into an array and save it in the same format as the example below, but without success.

Does anyone have a solution for doing this in R?

Thank you.

EDIT :

I found the solution.

library(jsonlite)

#Set an empty list
list1 <- vector(mode = 'list', length = 2)

# data example
json_data <- list(object1 = list(birthday = '2000-02-14', Age = '20'), 
              object2 = list(Candidate_Number = '1999283', first_attempt = TRUE), 
              object3 = list(name = 'John E.', result = list(), study_hours = 150, GPA = 3.8, exam_infos = list(cost = 800, location = 'F3C6V9', past_exams = list(list(exam_name = 'Science', score = 'passed'), list(exam_name = 'Geometric', score = 'passed')))), 
              object4 = list(study_manual_used = 'Physics Theory', version_found = list(Digital = '1999-01-01', Paper = '1999-01-01')))

# append the data into every slot of the list (seq_along(), not just the last index)
for (i in seq_along(list1)) {
  list1[[i]] <- json_data
}

# Write the JSON to the user's home directory
write(toJSON(list1, auto_unbox = TRUE, pretty = TRUE), file.path(Sys.getenv()['USERPROFILE'], 'file.json'))

Transpose and create new variables in R


I have a data frame with the following structure:

study_id date
1        01/01/2011
2        01/01/2012
2        01/01/2013
3        01/01/2014
3        01/01/2015
3        01/01/2016

I would like to change the data frame to:

study_id date_1      date_2    date_3
1        01/01/2011  NA         NA
2        01/01/2012  01/01/2013 NA
3        01/01/2014  01/01/2015 01/01/2016

Please note these dates are just examples; the dates do not necessarily follow this pattern (01/01/2011 plus one year, and so on).
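
For reference, a sketch of the reshape I am describing, using tidyr (assuming the data frame is called df with columns study_id and date; I have not checked it against the real data):

library(dplyr)
library(tidyr)

df_wide <- df %>%
  group_by(study_id) %>%
  mutate(visit = row_number()) %>%  # 1st, 2nd, 3rd date within each study_id
  ungroup() %>%
  pivot_wider(names_from = visit, values_from = date, names_prefix = "date_")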

PS: Thanks webprogrammer for editing my question. I deleted before I noticed you fixed it.

How to plot this data in ggplot?


This is my data:

[Screenshot of the data table]

This is data for yeast fermentation for a bio lab. We were told to use graphing software to plot the data. I need to make a graph that shows the difference in CO2 production levels over time per tube (Tube 1 = Volume1+CO2, Tube 2 = Volume2+CO22, etc.). I'm very new to R and am not sure how to proceed.
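
A sketch of the kind of plot I think I need, with hypothetical column names (Time, CO2_tube1, CO2_tube2) and made-up values, since the real numbers are in my spreadsheet:

library(ggplot2)
library(tidyr)

dat <- data.frame(Time      = c(0, 5, 10, 15),  # made-up example values
                  CO2_tube1 = c(0, 1, 3, 6),
                  CO2_tube2 = c(0, 2, 4, 8))

dat_long <- pivot_longer(dat, -Time, names_to = "Tube", values_to = "CO2")

ggplot(dat_long, aes(Time, CO2, colour = Tube)) +
  geom_line() +
  geom_point()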

How to calculate the standard error of the mean for circular data?


I followed the suggestions here to calculate the SD from circular data in the R circular package: How to calculate standard deviation of circular data

However, I need the SE of the mean for a number of different points I have for Aspect (the aspect of the terrain I am working on). Can I just divide the SD by the square root of N, like I would for linear data, or do I need to do something else?

Thank you


How to add abline in ggplot2 with x-axis as year?


I am trying to make a temperature trend plot using ggplot2. I could not plot an abline with a given slope and intercept (they are the result of a Sen's slope estimate for the trend calculation).

df is a subset of my data:

df<- structure(list(Mean_Tmin = c(-6.85888797419416, -6.61640313272608, 
-4.10521557474695, -3.33048575952967, -4.85158854541956, -6.90958783702363, 
-5.4738307882025, -5.50757613208475, -3.34384859553843, -4.79079906097484, 
-4.64407818570133, -4.17247075814977, -4.00324802670715, -2.92271555321199, 
-4.26302362769172, -4.05299360203852, -3.65186880762282, -2.6784292470159, 
-3.55928476504212, -3.26383609036539, -3.75455004498325, -4.01641178860892, 
-4.95576810887431, -3.85043378628423, -3.39114296653151, -4.62589589895125, 
-2.62611368991667, -3.60484580346817, -3.55386283157491, -3.76902382151618, 
-3.05841501472943, -1.77644594829943, -4.01934355211525, -2.35357265559614, 
-3.63012525456073, -4.61818077798782, -2.78669513101182, -1.87421430381448, 
-2.99847245406934, -3.57236610167573, -3.56963065732644, -2.8243000106372, 
-1.99565301030409, -3.3461888162997, -2.58940007000144, -3.19945356820737, 
-5.37830757225912, -2.73135885451205, -1.88970245530541, -2.54034752481066, 
-3.38038340627931, -3.13416288370415, -2.8610910675591, -2.89723228973215, 
-2.3992604730445, -2.68507391337318, -2.92441273949878, -3.33097198455173, 
-4.43334081889723, -4.53411741435393, -2.96651491236555), ID = c(1, 
 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60, 61), av = c(NA, NA, -5.15251619732328, 
-5.16265616988918, -4.93414170098446, -5.21461381245202, -5.21728637965377, 
-5.20512848276483, -4.75202655250037, -4.49175454648982, -4.1908889254143, 
-4.10666231694902, -4.00110723029239, -3.88289031355983, -3.77876992345444, 
 -3.51380616751619, -3.64112000988222, -3.44128250241695, -3.3815937910059, 
-3.45450238720312, -3.9099701595748, -3.96819996382322, -3.99366133905644, 
-4.16793050985004, -3.88987089011159, -3.61968642903037, -3.5603722380885, 
-3.63594840908544, -3.32245223224107, -3.15251868391762, -3.23541823364704, 
-2.99536019845129, -2.9675804850602, -3.27953363771187, -3.48158347425435, 
-3.0525576245942, -3.18153758428884, -3.16998575371184, -2.96027572957956, 
-2.96779670550464, -2.99208444680256, -3.06162771924863, -2.86503451291377, 
-2.79099909508996, -3.30180060741434, -3.44894177625593, -3.15764450405708, 
-3.14783399501892, -3.18401996263331, -2.73519102492232, -2.76113746753173, 
-2.96264343441708, -2.93442602406384, -2.79536412548262, -2.75341409664154, 
-2.84739028004007, -3.15461198587309, -3.58158337413497, -3.63787157393344, 
 NA, NA), yr = structure(c(-7305, -6940, -6575, -6209, -5844, 
-5479, -5114, -4748, -4383, -4018, -3653, -3287, -2922, -2557, 
-2192, -1826, -1461, -1096, -731, -365, 0, 365, 730, 1096, 1461, 
1826, 2191, 2557, 2922, 3287, 3652, 4018, 4383, 4748, 5113, 5479, 
5844, 6209, 6574, 6940, 7305, 7670, 8035, 8401, 8766, 9131, 9496, 
 9862, 10227, 10592, 10957, 11323, 11688, 12053, 12418, 12784, 
13149, 13514, 13879, 14245, 14610), class = "Date"), year = c("1950", 
"1951", "1952", "1953", "1954", "1955", "1956", "1957", "1958", 
"1959", "1960", "1961", "1962", "1963", "1964", "1965", "1966", 
"1967", "1968", "1969", "1970", "1971", "1972", "1973", "1974", 
"1975", "1976", "1977", "1978", "1979", "1980", "1981", "1982", 
"1983", "1984", "1985", "1986", "1987", "1988", "1989", "1990", 
"1991", "1992", "1993", "1994", "1995", "1996", "1997", "1998", 
 "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", 
"2007", "2008", "2009", "2010")), .Names = c("Mean_Tmin", "ID", 
 "av", "yr", "year"), row.names = c(NA, -61L), class = "data.frame").

I used this code to make a trend-analysis plot with the median-based trend and intercept. My code so far is:

library(ggplot2)

tplot <- ggplot(df, aes(yr, Mean_Tmin)) +
  geom_line(aes(y = Mean_Tmin), lty = "dotted", color = "Blue") +
  geom_point(aes(y = Mean_Tmin, colour = "Mean_Tmin"), shape = 8, size = 5) +
  geom_line(aes(y = av, colour = "5 yrs moving average"), size = 0.8) +
  stat_smooth(method = lm) +
  geom_abline(aes(slope = 0.0322, intercept = -4.581, color = "red"))
tplot

The problem is that the abline comes out vertical, and I do not know why it is not respecting the slope or intercept. Can anyone suggest how I can change my code so that the abline is drawn with that slope and intercept (i.e. not vertical)? I am looking for something like this, but with year as the x-axis. Thank you, aseem
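
One thing I have been wondering about, but have not verified: since yr is a Date, the underlying x-axis is numeric days (days since 1970-01-01), so a slope estimated per year would need rescaling before geom_abline() can use it, and the intercept would need to be expressed at the Date origin. A sketch of the conversion I mean:

slope_per_year <- 0.0322
slope_per_day  <- slope_per_year / 365.25  # Date axes are in days, so rescale the yearly slope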

How can I fill this matrix (and then generalize it to a 3D array) with a loop, also using conditions, in R?


I just want to fill this matrix. I know this is a very easy problem, but I'm not good at this at all, and I won't be using these particular numbers in the matrix. I tried it with a for loop, but the problem is that the loop only keeps the last iteration. To repeat: I don't want the numbers from 1 to 9. I have this:

(mat<-matrix(0, nrow = 3, ncol = 3))
for (i in 1:3) {
  for (j in 1:3) {
    if (j==1 & i==1) {
      mat[i,j]=6
    } else if (j==1 & i==2) {
      mat[i,j]=7
    } else if (j==1 & i==3) {
      mat[i,j]=8
    }
  }
}

I want code without the conditions & i==1, & i==2, & i==3; instead, I want to use another variable k from 1 to 3. I tried this, but the loop only shows me the value at 3. Thank you so much.

Edit: I'm going to show an example of what I want to solve. You will see that it's basically the same problem. I have the following data frame:

base2<-c(  20, 15, 17, 23, 19, 21, 16, 22, 18)
base2.1<-c( 6,  5,  3,  4,  1,  7,  2,  9,  8)
base3<-data.frame(base2,base2.1)
names(base3)=c("age","mean")

base3

I want to fill a vector vec where vec[1] = 5 (because, as you can see, that is the mean where age = 15), vec[2] = 2 (the mean where age = 16), and so on.

I have tried this just for the first element:

(vec<-c(rep(0,length(base3$mean))))

for (i in 1:length(base3$mean)) {
  if (base3$age[i]==15) {
    vec[1]=base3$mean[which(base3$age==15)]
      } 
}
vec

Of course, I don't want to hard-code the index 1 in this part of the loop: vec[1] = base3$mean[which(base3$age == 15)]

If I want to fill the entire vector, I have this:

for (i in 1:length(base3$mean)) {
  for (j in sort(base3$age)) {
    vec[i] <- base3$mean[which(base3$age == j)]
  }
}
vec

But the for loop only shows me the last iteration:

# [1] 4 4 4 4 4 4 4 4 4

I want the following result:

[1] 5 2 3 8 1 6 7 9 4
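
For what it's worth, a vectorized sketch of what the loop is trying to do, checked against the small example above: order the mean column by age.

vec <- base3$mean[order(base3$age)]  # means sorted by ascending age
vec
# [1] 5 2 3 8 1 6 7 9 4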

R abort session, fatal error, using multiPhylosignal 'picante'


I'm testing for a phylogenetic signal in plant traits using the multiPhylosignal function from the picante package. I have my data and tree set up according to the manual, but every time I run my code I get a pop-up, "R Session Aborted: R encountered a fatal error. The session was terminated.", and the only option is to start a new session. I have not got it to run without crashing once, and I am not sure what the error is.

Here is a link to download my phylogeny with the species I measured traits for. The phylogeny is a dated .new (Newick) file and does not open in Excel, or in R without a phylogenetics package to read it (e.g. picante).

I have also now uploaded the CSV with the trait data located here

Here is my code:

library(picante)  # also loads ape, for read.tree(); capitalize() below comes from a separate helper package (e.g. Hmisc)

YLRtreetraits <- read.tree(file.choose())  # use this to open the Newick tree
YLRtreetraits$tip.label <- capitalize(YLRtreetraits$tip.label)  
plot(YLRtreetraits); axisPhylo() 

STraits <- read.csv("YLRSpeciesTraitPhylo.csv")
rownames(STraits) <- STraits[,1]
STraits[1:1] <- list(NULL)    #Trait Data


dput(STraits)

STraits <- df2vec(as.matrix.data.frame(STraits), colID=1:13)
STraits <- STraits[1:23,]

#this is where the code always crashes
multikphylo <- multiPhylosignal(STraits, YLRtreetraits)

Here is the trait matrix

> STraits
                         Thickness       SLA        VLA      PtoA  Lobedness      d13C      d15N
Achillea_millefolium     0.2460014 313.37277  3.9266603 27.883472 324.192137 -31.72317  6.866000
Artemisia_californica    0.2457496 147.26442 12.4538144 21.405490 145.468053 -28.76412  7.184118
Avena_barbata            0.1806000 266.22604  2.0010967  4.258448  43.671745 -31.00733  7.724667
Baccharis.glutinosa      0.2921833 188.55328  2.0902859  2.197490   8.126878 -29.00889  4.723333
Baccharis_pilularis      0.4607274  94.83021  0.9670566  3.374101   6.844047 -28.01133  4.700667
Bromus_carinatus         0.1753837 253.13750  1.8565264  4.053892  50.287960 -29.97682  7.592727
Bromus_hordeaceus        0.1366190 242.76840  3.8998693  8.868943  67.526653 -31.44143  7.357143
Carduus_pycnocephalus    0.4474833 188.19508  3.5401087  5.345907  81.591838 -31.29000  8.280000
Erigeron_canadensis      0.1235545 354.77298  0.9662612  2.741911   5.658662 -31.41667  7.683333
Ericameria_ericoides     0.2088789 198.12134 22.6190345 56.108810  64.019041 -31.89667  4.216667
Eschscholzia_californica 0.1208374 390.00074  5.5741182 13.358293 167.892846 -29.77000  9.060333
Festuca_bromoides        0.1177983 203.64997 12.8318043 27.118006 128.823768 -32.42333  7.225000
Festuca_perennis         0.1898500 279.29192  1.9362350  4.169356  70.740632 -32.04333  8.923333
Geranium_dissectum       0.1178849 269.83951  7.3046587 12.376485 115.190032 -31.08545  6.132727
Lupinus_nanus            0.1255962 268.69517  5.2903927 11.972935  56.477617 -32.25429  1.635714
Lupinus_variicolor       0.1610315 170.60479  5.5774146 14.817408  60.068747 -32.22429  1.937143
Medicago_polymorpha      0.1138167 304.04214  9.2125266  3.831219  18.404729 -30.29400  0.454000
Mimulus_aurantiacus      0.1688437 190.93752  2.5959366  2.434252   8.486408 -29.89455  5.967273
Raphanus_sativus         0.2499250 253.25415  2.9799423  2.611690  13.187371 -32.29800 10.564667
Sisyrinchium_bellum      0.1691389 199.43090  3.1102748  5.974129  90.706224 -29.67333  6.218333
Sidalcea_malviflora      0.1917777 171.95936  1.6222312  5.696670  98.516985 -29.32925  9.943000
Sonchus_asper            0.0983500 279.67666  2.1107129  4.459474  28.220847 -31.69133  7.418000
Nasella_pulchra          0.1771103 169.22917  7.0029006 14.378510 198.470881 -30.96810  7.439483
                              Carbon    Nitrogen       CN     PD.0    PD.10    PD.50
Achillea_millefolium     0.000866350 0.000051300 18.04167 7.081424 48.22289 280.7975
Artemisia_californica    0.000756765 0.000037200 21.05556 6.980966 46.40463 286.1870
Avena_barbata            0.001227267 0.000062100 23.38667 8.706421 71.72344 268.6173
Baccharis.glutinosa      0.001388333 0.000069200 23.78889 8.235222 71.48248 266.3621
Baccharis_pilularis      0.001386733 0.000062800 26.03333 8.706421 71.72344 268.6173
Bromus_carinatus         0.001240318 0.000057600 22.02727 8.731711 71.32551 271.3872
Bromus_hordeaceus        0.001143786 0.000054000 25.44286 8.877748 71.80232 266.8042
Carduus_pycnocephalus    0.001116000 0.000060000 18.60000 8.706421 71.72344 268.6173
Erigeron_canadensis      0.001135000 0.000061300 20.93333 7.884788 72.81775 261.0000
Ericameria_ericoides     0.001291667 0.000058000 22.93333 8.706421 71.72344 268.6173
Eschscholzia_californica 0.000770033 0.000056100 14.10000 7.187413 45.37536 293.4173
Festuca_bromoides        0.000965667 0.000045500 25.00000 8.706421 71.72344 268.6173
Festuca_perennis         0.001129400 0.000050000 26.68000 8.706421 71.72344 268.6173
Geranium_dissectum       0.001210091 0.000057100 26.43636 8.706421 71.72344 268.6173
Lupinus_nanus            0.001171857 0.000105714 11.15714 8.785906 71.11917 277.3228
Lupinus_variicolor       0.001153429 0.000061700 20.22857 8.662868 71.17415 269.9471
Medicago_polymorpha      0.001256933 0.000115867 13.21333 8.706421 71.72344 268.6173
Mimulus_aurantiacus      0.001411091 0.000050400 28.50909 8.529721 71.26665 265.9383
Raphanus_sativus         0.001146867 0.000080900 17.61333 8.706421 71.72344 268.6173
Sisyrinchium_bellum      0.000526667 0.000020500 25.25000 5.561880 31.85195 297.3334
Sidalcea_malviflora      0.000692364 0.000037000 19.46250 6.815302 49.24619 280.0572
Sonchus_asper            0.001153933 0.000049900 27.28667 8.706421 71.72344 268.6173
Nasella_pulchra          0.000795431 0.000042000 19.22931 6.911401 47.45061 280.3423

I can paste the data in a different format if needed

Print pretty data.frames/tables to console


Is there a way to print small data.frames to the console in a more readable manner?

For example, would it be possible to output to the console:

library(MASS)   
iris[1:5, ]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

as

iris[1:5, ]

  +--------------+-------------+--------------+-------------+---------+
  | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
  +--------------+-------------+--------------+-------------+---------+
1 |          5.1 |         3.5 |          1.4 |         0.2 |  setosa |
2 |          4.9 |         3.0 |          1.4 |         0.2 |  setosa |
3 |          4.7 |         3.2 |          1.3 |         0.2 |  setosa |
4 |          4.6 |         3.1 |          1.5 |         0.2 |  setosa |
5 |          5.0 |         3.6 |          1.4 |         0.2 |  setosa |
  +--------------+-------------+--------------+-------------+---------+

I realise for large data.frames it would take up an unnecessary amount of time, but if it's an option, I would like to be able to look at small frames in a more structured manner.

In particular, when I have two text fields next to each other, it would be much easier with a pipe between the two fields to separate them, as the spacing between words is the same size as the spacing between columns.
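
The closest I have found so far is knitr's kable(), which at least puts pipe separators between the columns (shown only to frame what I am after, not as a full solution):

library(knitr)
kable(iris[1:5, ], format = "markdown")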

Thanks

The length of trainPred is not correct in prediction function with R


This is part of my code for a Naive Bayes model:

trainPred<- predict(NBclassfier, newdata = train, type = "raw")

but the length of trainPred comes out wrong: it is two times bigger than it should be.

Even when I am using

trainPred<- predict(NBclassfier, newdata = train, type = "class")

I only get 0 for length of trainPred

So when I run the code below, I get an error:

trainTable <- table(train$prog, trainPred)

The code for NBclassfier is: NBclassfier = naiveBayes(prog~., data= train)

The whole code and the error:

library(caret)
library(e1071)

set.seed(25)
trainIndex=createDataPartition(NaiveData$prog, p=0.8)$Resample1
train=NaiveData[trainIndex, ]
test=NaiveData[-trainIndex, ]

Check the balance:

print(table(NaiveData$prog))



 0   1 
496 261 

Check the train table

print(table(train$prog))



 0   1 
388 218 

NBclassfier = naiveBayes(prog~., data= train)
trainPred <- predict(NBclassfier, newdata = train, type = "raw")
trainPred<- trainPred
trainTable <- table(train$prog, trainPred)


Error in table(train$prog, trainPred) :   all arguments must have the same length
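
A quick diagnostic I still plan to run (a sketch): with type = "raw", predict() on a naiveBayes model returns a matrix of class probabilities, and length() of a matrix is rows times columns, which would explain the doubled length.

str(trainPred)                   # with type = "raw" this should be a matrix, one column per class
dim(trainPred)                   # rows x columns; length(trainPred) is their product
nrow(trainPred) == nrow(train)   # compare the row count with the training data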