unixODBC installed but odbcinst.ini and odbc.ini are empty

February 19, 2020, 4:33 pm

I'm trying to set up RStudio Server on CentOS 8 to connect to MS SQL Server using ODBC. I think I've installed unixODBC; the output of the odbcinst -j command is shown below. However, the ini files are empty, and the R odbc package isn't able to connect to the database. I'm hoping someone can provide some hints on how to troubleshoot this. Thank you in advance.

$ odbcinst -j
unixODBC 2.3.7
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/user/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
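A minimal base-R sketch for checking what is actually registered. It lists the [section] headers in an ini file such as the /etc/odbcinst.ini path reported by odbcinst -j. Each section in odbcinst.ini names a driver, and each section in odbc.ini names a DSN. The helper name ini_sections() and the example driver entry and path are hypothetical placeholders.

An empty result is consistent with the symptom described. unixODBC itself is the driver manager. As far as I know, it is the SQL Server driver package that normally adds an entry to odbcinst.ini when it is installed (for example Microsoft's msodbcsql17, which is an assumption about the driver used here). From R, odbc::odbcListDrivers() should report the same registrations.

```r
# Hypothetical helper: return the [section] names found in an ODBC ini file.
# In odbcinst.ini each section is a registered driver; in odbc.ini, a DSN.
ini_sections <- function(path) {
  if (!file.exists(path)) return(character(0))
  lines <- trimws(readLines(path, warn = FALSE))
  headers <- grep("^\\[.+\\]$", lines, value = TRUE)
  sub("^\\[(.+)\\]$", "\\1", headers)
}

# Stand-in for /etc/odbcinst.ini; the driver entry and its path are placeholders
demo_ini <- tempfile(fileext = ".ini")
writeLines(c("[ODBC Driver 17 for SQL Server]",
             "Description=Microsoft ODBC Driver 17 for SQL Server",
             "Driver=/path/to/libmsodbcsql.so"),
           demo_ini)
ini_sections(demo_ini)
#> [1] "ODBC Driver 17 for SQL Server"

# On the server itself:
# ini_sections("/etc/odbcinst.ini")   # character(0) means no driver section
```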

Is there a way to clean this up in R?

February 19, 2020, 4:34 pm

I'm looking for a way to clean this up and make it more streamlined. I'm new to R, so I'm not really sure how. I've got this line of code, and I want it to be more concise. Any tips?

sum(p[which(x == 2)] ++ p[which(x == 4)] ++ p[which(x == 6)] ++ p[which(x == 8)] ++ p[which(x == 10)] ++ p[which(x == 12)]  ++ p[which(x == 14)] ++ p[which(x == 16)] ++ p[which(x == 18)] ++ p[which(x == 20)])
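Here is a minimal sketch. It assumes the intent is to add up every element of p whose matching x is one of the even numbers 2, 4, ..., 20. %in% tests membership against the whole set at once, so a single subset replaces the ten which() calls. The x and p values below are made up for illustration.

```r
# Made-up example vectors
x <- c(1, 2, 3, 4, 20, 21, 2)
p <- c(10, 20, 30, 40, 50, 60, 70)

# Keep p wherever x is one of 2, 4, ..., 20, then add those values up
total <- sum(p[x %in% seq(2, 20, by = 2)])
total
#> [1] 180
```

One point worth checking: in the original expression, + adds the subsets element by element with recycling instead of concatenating them. An empty subset (for example, when no x equals 6) therefore turns the whole sum into 0. With the values above, the original expression returns 0 while the %in% version returns 180. The two are only guaranteed to agree when every subset has the same length.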

How to reference Timezone (TZ) from a separate column?

February 19, 2020, 4:36 pm

I am attempting to assign the correct time zone (TZ) to each observation in my dataset. I have successfully mutated the TZ columns (Start_TimeZone) into new columns (Start_TimeZone_New) that hold the standard TZ names (e.g. "America/Los_Angeles"). The issue I am running into is how to apply this new TZ column to each date/time observation (second code snippet). The ultimate goal is to use the TZ assignment to calculate the duration between the start and end date/times (end date/time not shown).

```
library(dplyr)

# assumed: this result is the comp_report_adj used in the next snippet
comp_report_adj <- comp_report_tz %>% 
  mutate(Start_TimeZone_New = case_when(is.na(Start_TimeZone) ~ "missing",
                                        Start_TimeZone == "-08:00" ~ "America/Los_Angeles",
                                        Start_TimeZone == "-07:00" ~ "America/Phoenix",
                                        Start_TimeZone == "-06:00" ~ "America/Chicago",
                                        Start_TimeZone == "-05:00" ~ "America/New_York",
                                        TRUE ~ "others")) %>% 
  # map the end zone from End_TimeZone
  mutate(End_TimeZone_New = case_when(is.na(End_TimeZone) ~ "missing",
                                      End_TimeZone == "-08:00" ~ "America/Los_Angeles",
                                      End_TimeZone == "-07:00" ~ "America/Phoenix",
                                      End_TimeZone == "-06:00" ~ "America/Chicago",
                                      End_TimeZone == "-05:00" ~ "America/New_York",
                                      TRUE ~ "others"))
```

``` 
comp_report_adj %>% 
  mutate(Start_Time_Final = as.POSIXct(comp_report_tz$Start_Date_Time,
                                       format = "%m/%d/%y %I:%M%p",
                                       tz = comp_report_adj$Start_TimeZone_New))
```
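as.POSIXct() takes a single tz value, so a column of zone names cannot be passed as tz. Here is a minimal base-R sketch, assuming the date strings follow the format used in the question. It parses each row in its own zone and stores the result as a UTC instant. After that, start and end times can be compared directly with difftime().

The data frame and its values are hypothetical. Note that %p expects English AM/PM markers, which depends on the locale. lubridate's force_tzs() is intended for this kind of per-row zone assignment and may be worth checking in your installed version.

```r
# Hypothetical data in the question's layout
df <- data.frame(
  Start_Date_Time    = c("02/19/20 04:33PM", "02/19/20 04:33PM"),
  Start_TimeZone_New = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors   = FALSE
)

# Parse one value in its own zone; return seconds since the epoch
parse_in_zone <- function(s, tz) {
  as.numeric(as.POSIXct(s, format = "%m/%d/%y %I:%M%p", tz = tz))
}

# Apply row by row, since tz must be a single string per call
secs <- mapply(parse_in_zone, df$Start_Date_Time, df$Start_TimeZone_New,
               USE.NAMES = FALSE)
df$Start_Time_UTC <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
```

The same conversion applied to the end columns gives UTC end times. Subtracting the two UTC columns then yields durations that account for each row's zone.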

labelling factors and keeping numerical values

February 19, 2020, 4:38 pm

I am having some issues creating factors that I can refer to by both the numeric value and the "label".

Supposedly the lfactors package does this; however, I have been unable to get it to work that way. This is what I did:

library(lfactors)
cars <- mtcars


str(cars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

If we look at the "carb" column (the number of carburetors), it is numeric.

So, using the lfactors package, I transformed it:

cars$carb <- lfactor(c(1:4),
                     levels = c(1:4), 
                     labels = c("low", "medium", "high", "extreme" ))
str(cars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: Factor w/ 4 levels "low","medium",..: 1 2 3 4 1 2 3 4 1 2 ...

I noticed that it changed to a factor, as the package description says it should, so I ran some checks:

levels(cars$carb) 
[1] "low"     "medium"  "high"    "extreme" # correct

cars$carb == "medium"
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[23] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE # correct

cars$carb == 2  
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # incorrect

I still can't refer to the factor by both labels and values. Has anyone used this package before, or does anyone have suggestions for an alternative?

A close alternative, but not quite there

Even though it is not perfect, since I cannot refer to the factors by both value and label, I found an approach that at least lets me store both, which I thought might be useful for others in my position:

library(sjlabelled)
library(magrittr)
library(sjmisc)

cars <- mtcars
str(cars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

frq(cars$carb)

x <numeric>
# total N=32  valid N=32  mean=2.81  sd=1.62
 val frq raw.prc valid.prc cum.prc
   1   7   21.88     21.88   21.88
   2  10   31.25     31.25   53.12
   3   3    9.38      9.38   62.50
   4  10   31.25     31.25   93.75
   6   1    3.12      3.12   96.88
   8   1    3.12      3.12  100.00
  NA   0    0.00        NA      NA

This is what we get in numeric form. Converting it to a factor keeps the same values:

cars$carb <- as_factor(cars$carb)

str(cars$carb)

Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

frq(cars$carb)

<categorical>
# total N=32  valid N=32  mean=2.81  sd=1.62

 val frq raw.prc valid.prc cum.prc
   1   7   21.88     21.88   21.88
   2  10   31.25     31.25   53.12
   3   3    9.38      9.38   62.50
   4  10   31.25     31.25   93.75
   6   1    3.12      3.12   96.88
   8   1    3.12      3.12  100.00
  NA   0    0.00        NA      NA

Now that it is in categorical form, we can label the values (in this example I'll ignore 6 and 8):

cars$carb<- set_labels(
  cars$carb,
  labels = c(
    `1` = "low",
    `2` = "medium", 
    `3` = "high",
    `4` = "extreme"
    ))

frq(cars$carb)

<categorical>
# total N=32  valid N=32  mean=2.81  sd=1.62

 val   label frq raw.prc valid.prc cum.prc
   1     low   7   21.88     21.88   21.88
   2  medium  10   31.25     31.25   53.12
   3    high   3    9.38      9.38   62.50
   4 extreme  10   31.25     31.25   93.75
   6       6   1    3.12      3.12   96.88
   8       8   1    3.12      3.12  100.00
  NA    <NA>   0    0.00        NA      NA

Now we can see both the label and the value; however, selecting rows by label still does not work:

cars[cars$carb==1,]
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1

cars[cars$carb=="low",]

 [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
<0 rows> (or 0-length row.names)

Any advice on factor labelling, and on a way to refer to factor values by both their labels and their numeric values, would be much appreciated. In the meantime, I hope my alternative helps.
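Here is a base-R sketch of one workaround that uses neither lfactors nor sjlabelled. It keeps carb as the numeric column and stores the labels in a separate factor column, so rows can be selected by either. The column name carb_label is hypothetical. carb values without a label (6 and 8) become NA in the factor.

Separately, lfactor(c(1:4), ...) in the question builds a vector of length 4. The data frame assignment then recycles it across the 32 rows, which is why str() shows 1 2 3 4 1 2 3 4 ...; passing cars$carb was probably intended.

```r
cars <- mtcars

# Keep the numeric column and add a labelled factor next to it
cars$carb_label <- factor(cars$carb,
                          levels = 1:4,
                          labels = c("low", "medium", "high", "extreme"))

# Select by numeric value ...
by_value <- cars[cars$carb == 2, ]
# ... or by label; %in% treats NA (unlabelled 6 and 8) as "no match"
by_label <- cars[cars$carb_label %in% "medium", ]

identical(by_value, by_label)
#> [1] TRUE
```

%in% is used instead of == because == returns NA for the unlabelled rows, and indexing a data frame with NA produces rows of NAs.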

How to sort plotly stacked bar graph in r by y value?

February 19, 2020, 4:38 pm

I've got this graph, and I want the bars to appear in the order Sam - Jhon - Paul, because that goes from highest to lowest cost. Could somebody tell me how to order it by cost? I tried using the code below in the layout section, but it didn't work.

 layout(yaxis = list(title = 'Cost'), 
        xaxis = list(title = 'Parent',
                     categoryorder = "array",
                     categoryarray = ~cost), 
        barmode = 'stack')
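In the attempt above, categoryarray = ~cost appears to pass the cost values rather than the category names in the wanted order. Here is a minimal sketch, assuming a long data frame with one row per Parent and cost component (the data and the part column are hypothetical). It computes each Parent's total, sorts the names by it, and hands that character vector to categoryarray. The plotly part only runs if the package is installed.

```r
# Hypothetical data: two cost components per Parent, stacked in the bar chart
df <- data.frame(
  Parent = c("Paul", "Paul", "Jhon", "Jhon", "Sam", "Sam"),
  part   = c("a", "b", "a", "b", "a", "b"),
  cost   = c(1, 2, 3, 2, 6, 4)
)

# Total cost per Parent, sorted from highest to lowest
totals <- tapply(df$cost, df$Parent, sum)
parent_order <- names(sort(totals, decreasing = TRUE))
parent_order
#> [1] "Sam"  "Jhon" "Paul"

# categoryarray takes the category names in display order
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(df, x = ~Parent, y = ~cost, color = ~part, type = "bar")
  p <- plotly::layout(p,
                      yaxis = list(title = "Cost"),
                      xaxis = list(title = "Parent",
                                   categoryorder = "array",
                                   categoryarray = parent_order),
                      barmode = "stack")
}
```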

Transforming Part Of R Dataframe [duplicate]

February 19, 2020, 4:48 pm

I wish to transform part of a data frame by taking the values of one column and turning them into column names. An example of what I have is shown below:

dates <- c("2020-01-01", "2020-01-02", "2020-01-02")
gender <- c("Male","Male","Female")
Age <- c("Eighteen","Nineteen","Twenty")
Count <- c(1,2,1)

Data <- data.frame(dates,gender,Age,Count)

Desired Example DF:

dates <- c("2020-01-01", "2020-01-02", "2020-01-02")
gender <- c("Male","Male","Female")
Eighteen <- c(1,0,0)
Nineteen <- c(0,2,0)
Twenty <- c(0,0,1)

Data2 <- data.frame(dates,gender,Eighteen,Nineteen,Twenty)

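Here is a base-R sketch of this long-to-wide reshape, using the question's data with Nineteen spelled as in the desired output. reshape() turns each Age value into a column and fills combinations that never occur with NA, which are then set to 0. With tidyr (1.0 or later), pivot_wider(Data, names_from = Age, values_from = Count, values_fill = list(Count = 0)) should give the same shape in one call.

```r
Data <- data.frame(dates  = c("2020-01-01", "2020-01-02", "2020-01-02"),
                   gender = c("Male", "Male", "Female"),
                   Age    = c("Eighteen", "Nineteen", "Twenty"),
                   Count  = c(1, 2, 1),
                   stringsAsFactors = FALSE)

# One row per dates/gender pair; each Age value becomes a "Count.<Age>" column
wide <- reshape(Data, idvar = c("dates", "gender"), timevar = "Age",
                direction = "wide")
# Drop the "Count." prefix that reshape() adds to the new column names
names(wide) <- sub("^Count\\.", "", names(wide))
# Combinations that did not occur come back as NA; treat them as zero counts
wide[is.na(wide)] <- 0
wide
```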

group_by operation in dplyr vs data.table for fast implementation

February 19, 2020, 4:49 pm

dat <- data.frame(yearID = rep(1:10000, each = 12),
                  monthID = rep(1:12, times = 10000),
                  x1 = rnorm(120000),
                  x2 = rnorm(120000),
                  x3 = rnorm(120000),
                  x4 = rnorm(120000),
                  x5 = rnorm(120000),
                  x6 = rnorm(120000),
                  p.start = 6,
                  p.end = 7,
                  m.start = 8,
                  m.end = 9,
                  h.start = 10,
                  h.end = 11)

I need to perform some operations on the data above; they are described below, after my current solution:

library(dplyr)

start_time <- Sys.time()

df1 <- dat %>% 
       tidyr::gather(., index_name, value, x1:x6) %>%
       dplyr::filter(!index_name %in% c('x5','x6')) %>%
       dplyr::group_by(yearID, index_name) %>%
       dplyr::summarise(p.start.val = sum(value[monthID == p.start]),
                        p.val = sum(value[monthID >= p.start & monthID <= p.end]),
                        m.val = sum(value[monthID >= m.start & monthID <= m.end]),
                        h.val = sum(value[monthID >= h.start & monthID <= h.end]),
                        h.end.val = sum(value[monthID == h.end])) %>%
       tidyr::gather(., variable, value, p.start.val:h.end.val) %>%
       dplyr::mutate(new.col.name = paste0(index_name,'_',variable)) %>%
       dplyr::select(-index_name, -variable) %>% 
       tidyr::spread(., new.col.name, value) %>%
       dplyr::mutate(yearRef = 2018)

colnames(df1) <-  sub(".val", "", colnames(df1))    

df2 <- dat %>% 
       tidyr::gather(., index_name, value, x1:x6) %>%
       dplyr::filter(index_name %in% c('x4','x6')) %>%
       dplyr::group_by(yearID, index_name) %>%
       dplyr::summarise(p.end.val = value[monthID == p.end],
                        m.end.val = value[monthID == m.end],
                        h.end.val = value[monthID == h.end]) %>%
       tidyr::gather(., variable, value, p.end.val:h.end.val) %>%
       dplyr::mutate(new.col.name = paste0(index_name,'_',variable)) %>%
       dplyr::select(-index_name, -variable) %>% 
       tidyr::spread(., new.col.name, value) %>%
       dplyr::mutate(yearRef = 2018)

colnames(df2) <-  sub(".val", "", colnames(df2))

final.dat <- Reduce(function(...) merge(..., by = c( "yearID", "yearRef"), all.x=TRUE), list(df1,df2))

 end_time <- Sys.time()

 end_time - start_time

 # Time difference of 2.054761 secs

What I want to do is:

For variables x1 to x4, I want to sum the values over different month ranges, as shown in df1.
For variables x5 and x6, I want to select the values for specific months in each year, as shown in df2.

My code above works, but it takes quite a while as dat grows, e.g. if the number of years becomes 20000 instead of 10000. Could someone help me implement the above solution with data.table, which I hope would make it faster? Thank you.
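Here is a data.table sketch of the same computation. It groups once by yearID and computes every window sum for all selected columns in a single pass, with no gather/spread reshaping. The helpers win_sum() and pick(), the smaller n_years, and the output column order are illustrative choices. df2 follows the question's code (x4 and x6). As with the original merge(), x4_h.end from both parts comes back with .x/.y suffixes. This has not been benchmarked, so timing on the full data is worth checking.

```r
library(data.table)

# Same structure as the question's data, with fewer years for a quick check
set.seed(42)
n_years <- 1000
n <- 12 * n_years
dat <- data.frame(yearID = rep(1:n_years, each = 12),
                  monthID = rep(1:12, times = n_years),
                  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n), x6 = rnorm(n),
                  p.start = 6, p.end = 7, m.start = 8, m.end = 9,
                  h.start = 10, h.end = 11)
dt <- as.data.table(dat)

# Hypothetical helpers: sum over a month window / value in a single month
win_sum <- function(v, m, from, to) sum(v[m >= from & m <= to])
pick    <- function(v, m, at) v[m == at]

# df1 equivalent: window sums of x1..x4, one row per yearID
sum_vars <- paste0("x", 1:4)
df1_dt <- dt[, c(
  setNames(lapply(.SD, win_sum, monthID, p.start, p.start), paste0(sum_vars, "_p.start")),
  setNames(lapply(.SD, win_sum, monthID, p.start, p.end),   paste0(sum_vars, "_p")),
  setNames(lapply(.SD, win_sum, monthID, m.start, m.end),   paste0(sum_vars, "_m")),
  setNames(lapply(.SD, win_sum, monthID, h.start, h.end),   paste0(sum_vars, "_h")),
  setNames(lapply(.SD, win_sum, monthID, h.end, h.end),     paste0(sum_vars, "_h.end"))
), by = yearID, .SDcols = sum_vars]

# df2 equivalent: single-month values of the columns filtered in the question
pick_vars <- c("x4", "x6")
df2_dt <- dt[, c(
  setNames(lapply(.SD, pick, monthID, p.end), paste0(pick_vars, "_p.end")),
  setNames(lapply(.SD, pick, monthID, m.end), paste0(pick_vars, "_m.end")),
  setNames(lapply(.SD, pick, monthID, h.end), paste0(pick_vars, "_h.end"))
), by = yearID, .SDcols = pick_vars]

final_dt <- merge(df1_dt, df2_dt, by = "yearID", all.x = TRUE)
final_dt[, yearRef := 2018]
```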

Shinydashboard box() masked by graphics package in Shiny

February 19, 2020, 4:53 pm

I am trying to create boxes for my Shiny app, but I believe the box() function from shinydashboard is being masked by the box() function from the graphics package.

Here is a very simple reproduction of my problem, with a screenshot of what my box looks like. I've included the packages that I use in my actual Shiny app in case that is important, but they aren't used in this reproduction. I have tried calling shinydashboard::box() explicitly, but that didn't work either.

library(shiny)
library(shinydashboard)
library(flexdashboard)
library(dplyr)

server <- shinyServer(function(input, output){
 })

ui <- shinyUI(
    fluidPage(
        titlePanel("title panel"),
        sidebarLayout(
            sidebarPanel("Sidebar"),
            mainPanel(
                shinydashboard::box(title = "Box Fail.", status = "primary", background = "red")
            )) ))

shinyApp(ui = ui, server = server)

Screenshot: https://i.stack.imgur.com/a17d2.png

Adjust Height and Width of Legend Glyphs Generated by key_glyph ggplot

February 19, 2020, 4:56 pm

I was thrilled to discover that I can change the glyph used in the legend by adding key_glyph = draw_key_rect to my geom layer. I want to make the legend wider and shorter to resemble the legend in this map by Timo Grossenbacher:

I've tried adjusting scale_fill_manual(guide = guide_legend(keyheight = unit(0.01, units = "mm"), keywidth = unit(40, units = "mm"))). This changes the dimensions of the legend, but only seems to work when I make the glyphs bigger; I can't make the keyheight any smaller.

Is there a better method of adjusting the legend glyphs' dimensions?

Simplified code here:

library(ggplot2)

df <- data.frame(x_value = 1:10,
                 y_value = rev(1:10),
                 value = c("a","a","a","a","b","b","b","b","c","c"))

ggplot(data = df) + 
  geom_point(aes(x_value, y_value, fill = value),
             shape = 21,
             size = 9,
             key_glyph = draw_key_rect) +
  theme(legend.justification = c(0,0), # set which corner of the legend legend.position references
        legend.position = c(0.05, 0.04)) +
  scale_fill_manual(values = c("red", "green", "blue"),
                    guide = guide_legend(direction = "horizontal",
                                         keyheight = unit(0.01, units = "mm"),
                                         keywidth = unit(40, units = "mm"),
                                         title.position = 'top',
                                         label.position = "bottom"))

Does R run fine on jupyter notebook?

February 19, 2020, 4:57 pm

I am facing several problems, including (1) autocompletion does not work and (2) reading and writing files is very slow. Is something wrong with my system, or is this a general problem? And is there a fix available?

Filtering for two identical consecutive entries in a column

February 19, 2020, 4:58 pm

Imagine a snippet of the following data frame:

       ID        ActivityName     Time         Type    Shape 
1       1             Request    0.000       Type_1      767           
2       1             Request  600.000       Type_1      767           
3       1               Start  600.000       Type_1     1376           
4       1               Start  600.000       Type_1     1376           
5       1 Schedule Activities  600.000       Type_1       15           
6       1 Schedule Activities 2062.295       Type_1       15

What I'm trying to do is to create two new columns based on the repeating entries in ActivityName.

Specifically, I want to combine two subsequent rows for the same activity into one row with a start and complete timestamp (from Time, in seconds.)

Not all entries in ActivityName have a matching second entry (though at most two consecutive entries are identical), so I would also like to delete such "single-standing" rows.

Any ideas for how to go about this will be highly appreciated.
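Here is a base-R sketch. It assumes rows are already ordered by ID and Time and that at most two consecutive rows repeat an activity, as stated. The approach is:

1. Label runs of identical consecutive ActivityName values within an ID.
2. Keep only runs of length two.
3. Put the first row's Time in a Start column and the second row's Time in a Complete column.

The column names Start and Complete, and the extra unpaired "End" row, are hypothetical.

```r
# The question's snippet plus one hypothetical unpaired row ("End")
df <- data.frame(
  ID = 1,
  ActivityName = c("Request", "Request", "Start", "Start",
                   "Schedule Activities", "Schedule Activities", "End"),
  Time  = c(0, 600, 600, 600, 600, 2062.295, 2100),
  Type  = "Type_1",
  Shape = c(767, 767, 1376, 1376, 15, 15, 5),
  stringsAsFactors = FALSE
)

n <- nrow(df)
# A new run starts whenever ActivityName or ID differs from the previous row
new_run <- c(TRUE, df$ActivityName[-1] != df$ActivityName[-n] |
                   df$ID[-1] != df$ID[-n])
run_id  <- cumsum(new_run)
run_len <- ave(run_id, run_id, FUN = length)

# Drop single-standing rows, then split each pair into first and second row
keep      <- run_len == 2
paired    <- df[keep, ]
is_second <- duplicated(run_id[keep])

result <- paired[!is_second, setdiff(names(df), "Time")]
result$Start    <- paired$Time[!is_second]
result$Complete <- paired$Time[is_second]
result
```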

Efficiently removing shared elements from grouped data

February 19, 2020, 5:07 pm

I have the following data.table:

dt
#    unique_id group_id primary_id  ph1  ph2  ph3
# 1:         1        1       TRUE   07   03 <NA>
# 2:         2        1      FALSE   07   03   84
# 3:         3        2      FALSE   10 <NA> <NA>
# 4:         4        2       TRUE <NA>   10 <NA>
# 5:         5        2      FALSE <NA> <NA>   10
# 6:         6        3      FALSE   22   03 <NA>
# 7:         7        3       TRUE <NA>   13   03

The unique_ids are grouped by shared phone numbers (ph1, ph2, ph3). For example, in the first group "07" and "03" are shared across the rows. In the third group "03" is shared but sits in different columns, as also happens in group 2.

Each group has 1 primary_id.

Within each group, I want to remove the shared phone number(s) from the non-primary rows and keep them on the primary row, so that the rows are no longer linked.

I can achieve this easily with a for loop; however, there are millions of groups and it's extremely slow.

Looking for a quicker method.

Data:

library(data.table)

dt <- data.table(structure(list(unique_id = c(1, 2, 3, 4, 5, 6, 7), group_id = c(1, 
    1, 2, 2, 2, 3, 3), primary_id = c(TRUE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, TRUE), ph1 = c("07", "07", "10", NA, NA, "22", NA), ph2 = c("03", 
    "03", NA, "10", NA, "03", "13"), ph3 = c(NA, "84", NA, NA, "10", 
    NA, "03")), class = "data.frame", row.names = c(NA, -7L))
)

Desired output is:

output <- data.table(structure(list(unique_id = c(1, 2, 3, 4, 5, 6, 7), group_id = c(1, 
1, 2, 2, 2, 3, 3), primary_id = c(TRUE, FALSE, FALSE, TRUE, FALSE, 
FALSE, TRUE), ph1 = c("07", NA, NA, NA, NA, "22", NA), ph2 = c("03", 
NA, NA, "10", NA, NA, "13"), ph3 = c(NA, "84", NA, NA, NA, NA, 
"03")), class = "data.frame", row.names = c(NA, -7L)))

output
#    unique_id group_id primary_id  ph1  ph2  ph3
# 1:         1        1       TRUE   07   03 <NA>
# 2:         2        1      FALSE <NA> <NA>   84
# 3:         3        2      FALSE <NA> <NA> <NA>
# 4:         4        2       TRUE <NA>   10 <NA>
# 5:         5        2      FALSE <NA> <NA> <NA>
# 6:         6        3      FALSE   22 <NA> <NA>
# 7:         7        3       TRUE <NA>   13   03

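Here is a data.table sketch. It assumes that "common" means any number that also appears on the group's primary row, which matches the desired output above. It builds "group|number" keys from the primary rows once, then blanks matching values in the non-primary rows one column at a time. The loop runs over the three phone columns rather than over groups, so it should scale better than a per-group loop. The key format and variable names are illustrative.

```r
library(data.table)

dt <- data.table(
  unique_id  = c(1, 2, 3, 4, 5, 6, 7),
  group_id   = c(1, 1, 2, 2, 2, 3, 3),
  primary_id = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  ph1 = c("07", "07", "10", NA, NA, "22", NA),
  ph2 = c("03", "03", NA, "10", NA, "03", "13"),
  ph3 = c(NA, "84", NA, NA, "10", NA, "03")
)
ph_cols <- c("ph1", "ph2", "ph3")

# "group|number" keys for every number held by a group's primary row
prim <- dt[primary_id == TRUE]
prim_keys <- unlist(lapply(ph_cols, function(col) {
  v <- prim[[col]]
  paste(prim$group_id, v, sep = "|")[!is.na(v)]
}))

# Blank those numbers in the non-primary rows, column by column, in place
for (col in ph_cols) {
  hit <- !dt$primary_id & paste(dt$group_id, dt[[col]], sep = "|") %in% prim_keys
  set(dt, i = which(hit), j = col, value = NA_character_)
}
dt
```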

Ran caret model and it stopped. It mentioned of missing values in resampled performance measure

February 19, 2020, 5:10 pm

Being a newbie, I tried the Titanic problem. I was just about to train models on the dataset (data_prepro_maf_train) when I got stuck:

all_model<-modelLookup()
classification_model<-all_model%>%filter(forClass==TRUE,!duplicated(model))
class_model<-classification_model$model
set.seed(123)
number<-3
repeats<-2
control<-trainControl(method="repeatedcv",number=number,repeats=repeats,classProbs = TRUE,savePredictions = "final",index=createResample(data_prepro_maf_train$Embarked,repeats*number),summaryFunction = multiClassSummary,allowParallel = TRUE)
x<-data_prepro_maf_train[,c(1,3,5,6,7,8)]
y<-data_prepro_maf_train[,12]
levels(y)<-make.names(levels(factor(data_prepro_maf_train[,12])))
y<-make.names(data_prepro_maf_train[,12],unique=TRUE,allow_=TRUE)
#Train the models
model_list1<-caretList(x,y,data=data_prepro_maf_train,trControl = control,metric="Accuracy",methodList = class_model[1])

I made sure to pick only columns without missing values (leaving out columns such as "Cabin"), and I had already removed missing values from the required columns.

Packages used:

library(caret)
library(caretEnsemble)
library(tidyverse)
library(magrittr)
library(doParallel)

Get the longest ranges per seqnames

February 19, 2020, 5:11 pm

I have a GRanges object with different genomic ranges for each seqnames (e.g. chromosomes).
How can I get a GRanges containing only the longest range for each seqname/chromosome?

For example, if gr is a GRanges:

library(GenomicRanges)

# Make a GRanges object
set.seed(123)
gr <- GRanges(seqnames = rep(c("chr1", "chr2", "chr3"), times=2:4),
              ranges = IRanges(start=sample.int(10000, 9), 
                               width = c(3,5,50,20,10,500,100,500,200)))

# Add a column with the width for clarity:
mcols(gr)$width <- width(gr)

gr
#GRanges object with 9 ranges and 1 metadata column:
#      seqnames    ranges strand |     width
#         <Rle> <IRanges>  <Rle> | <integer>
#  [1]     chr1 2463-2465      * |         3
#  [2]     chr1 2511-2515      * |         5
#  [3]     chr2 8718-8767      * |        50
#  [4]     chr2 2986-3005      * |        20
#  [5]     chr2 1842-1851      * |        10
#  [6]     chr3 9334-9833      * |       500
#  [7]     chr3 3371-3470      * |       100
#  [8]     chr3 4761-5260      * |       500
#  [9]     chr3 6746-6945      * |       200
#  -------
#  seqinfo: 3 sequences from an unspecified genome; no seqlengths

Then I want to obtain the following GRanges:

#GRanges object with 3 ranges and 1 metadata column:
#    seqnames      ranges strand |     width
#       <Rle>   <IRanges>  <Rle> | <integer>
#  [1]     chr1 2511-2515      * |         5
#  [2]     chr2 8718-8767      * |        50
#  [3]     chr3 9334-9833      * |       500

For my application I'm OK with getting only the first longest range for chr3 but I would appreciate a solution that can also select all ties if any.
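The selection logic only needs the widths and the seqnames, so here is a base-R sketch on plain vectors. The helper longest_idx() is hypothetical. It compares each width with its seqname's maximum, which keeps all ties, and can optionally keep only the first tied range per seqname.

```r
# Hypothetical helper: indices of the widest range(s) per seqname
longest_idx <- function(widths, seqs, ties = TRUE) {
  seqs <- as.character(seqs)
  # Maximum width within each seqname, aligned with the input order
  max_w <- ave(widths, seqs, FUN = max)
  idx <- which(widths == max_w)
  if (!ties) {
    # Keep only the first of any tied ranges per seqname
    idx <- idx[!duplicated(seqs[idx])]
  }
  idx
}

w <- c(3, 5, 50, 20, 10, 500, 100, 500, 200)
s <- rep(c("chr1", "chr2", "chr3"), times = 2:4)
longest_idx(w, s)
#> [1] 2 3 6 8
longest_idx(w, s, ties = FALSE)
#> [1] 2 3 6
```

Applied to the question's object, this would be gr[longest_idx(width(gr), seqnames(gr))] for all ties, or the same call with ties = FALSE for only the first longest range. The as.character() inside the helper turns the Rle returned by seqnames() into a plain vector.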

Add line graph based on new data to series of boxplots

February 19, 2020, 5:18 pm

I have used the following R script to create two side-by-side boxplots, one for 1999 and one for 2008:

library(tidyverse)  # ggplot2 (mpg, ggplot), forcats (as_factor) and the %>% pipe
mpg %>% ggplot(aes(as_factor(year), hwy)) + geom_boxplot()

I have a new dataset for manufacturer XYZ that has two observations, one for 1999 and one for 2008:

manufacturer <- c("xyz", "xyz")
year <- c(1999, 2008)
hwy <- c(19, 30)
df <- data.frame(manufacturer, year, hwy)

Is there a simple way to add the two observations from the new dataset (df) to my boxplot graph? I have seen a few similar posts (e.g., ggplot: adding new data to the existing grouped boxplot), but the problems and solutions there seem more complicated and I could not follow them. Thanks
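Here is a minimal ggplot2 sketch of one way to do this. The extra layers get their own data argument while reusing the same factor(year) mapping, so they land on the same x positions as the boxes. factor() is used here instead of as_factor() to avoid the forcats dependency. The red colour and the point size are arbitrary choices.

```r
library(ggplot2)

# The new observations from the question
df <- data.frame(manufacturer = c("xyz", "xyz"),
                 year = c(1999, 2008),
                 hwy = c(19, 30))

p <- ggplot(mpg, aes(factor(year), hwy)) +
  geom_boxplot() +
  # Inherits aes(factor(year), hwy), evaluated on df instead of mpg
  geom_point(data = df, colour = "red", size = 3) +
  # A discrete x axis needs an explicit group to join the points with a line
  geom_line(data = df, aes(group = manufacturer), colour = "red")
# print(p) draws the plot
```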

How to colour nodes/geom_point a gradient based on values using ggnet2 in R?

February 19, 2020, 5:19 pm

I am trying to produce a plot using the ggnet2 package in R. I have created a network (net.bg)

library(igraph)
library(GGally)
library(network)
library(sna)
library(intergraph)

direction <- c(2,1,3,1,4,1,5,1,3,2,4,2,5,2,4,3,5,3,5,4)
gr <- matrix(direction, nrow = 2)
x1 <- 10.59
x2 <- 15.74
x3 <- 5
x4 <- 18
x5 <- 7

RImp <- data.frame(x1,x2,x3,x4,x5)
nam <- names(RImp)
RImp <- as.numeric(RImp)
Rint <- c(2.96, 1.34, 1.27, 1.1, 2.22, 1.24, 3.11,
          2.52, 0.96, 1.08)

net.bg <- make_graph(gr, 5)

and I am using ggnet2 and ggplot2 to plot it like so:

library(GGally)
library(RColorBrewer)
library(ggnewscale)
library(ggplot2)


colfunction <- colorRampPalette(c("floralwhite", "firebrick1"))
nodeCol <- colfunction(5)

p   <- ggnet2(net.bg, 
              mode = "circle", 
              size = 0,
              #color = RImp,
              label = nam,
              edge.size = Rint, 
              edge.label = Rint,
              edge.color = "grey") +
  theme(legend.text = element_text(size = 10)) +
  geom_label(aes(label = nam),nudge_y = 0.08) +
  geom_point(aes(fill = RImp), size = RImp, col = nodeCol) +
  scale_fill_continuous(name = "Variable\nImportance",
                        limits=c(0, 20), breaks=seq(0, 20, by= 5),
                        low = "floralwhite" ,high = "firebrick1")
p

I'm using ggplot2, rather than ggnet2, to plot the nodes, as this allows me to add a legend.

The above code produces a plot similar to this:

As can be seen, the nodes are coloured as a gradient; however, they are coloured in clockwise order. I am trying to colour the nodes based on their size (in this case, the values in RImp).

Any suggestions as to how I would achieve this?
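colfunction(5) returns five colours in a fixed light-to-dark order. They are assigned to the nodes in plotting order, which is why the gradient follows the circle. Here is a minimal base-R sketch that instead maps each node's RImp value to a colour, assuming the same 0 to 20 range as the fill scale's limits. The resulting nodeCol could then be passed as col = nodeCol in geom_point() as before.

```r
# Node importance values from the question, in node order x1..x5
RImp <- c(10.59, 15.74, 5, 18, 7)

# colorRamp() returns a function mapping [0, 1] to RGB rows (0-255)
ramp <- colorRamp(c("floralwhite", "firebrick1"))
# Rescale with the fill scale's limits (0 to 20) so colours match the legend
nodeCol <- rgb(ramp(RImp / 20), maxColorValue = 255)
nodeCol
```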

Making adjacency matrix using group information

February 19, 2020, 5:22 pm

I am relatively new to R, and I am having issues creating an adjacency matrix using group characteristics.

I have a data frame that looks like this:

distid villageid  hhid group1 group2 group3 group4 
1        1         111  0        1     0        0
1        1         112  1        1     1        0
1        2         121  1        1     0        1 
1        2         122  1        0     0        1
2        1         211  1        1     0        0
2        1         212  1        1     1        1
2        2         221  0        0     1        0
2        2         222  0        1     1        0

I need to create an adjacency matrix in which households (hhid) that are in the same distid and villageid and share a group are all fully connected.

So my final matrix should look something like this

hhid  111    112  121   122    211   212   221 222
111    0     1     0     0       0    0     0   0
112    1     0     0     0       0    0     0   0  
121    0     0     0     1       0    0     0   0
122    0     0     1     0       0    0     0   0 
211    0     0     0     0       0    1     0   0
212    0     0     0     0       1    0     0   0 
221    0     0     0     0       0    0     0   1
222    0     0     0     0       0    0     1   0
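Here is a base-R sketch. It assumes "fully connected" means two households are linked when they share at least one group and are in the same distid and villageid. Under that reading, the matrix is symmetric, so 121 and 122 (which share group1 and group4) are linked in both directions. The variable names are illustrative.

```r
df <- data.frame(
  distid    = c(1, 1, 1, 1, 2, 2, 2, 2),
  villageid = c(1, 1, 2, 2, 1, 1, 2, 2),
  hhid      = c(111, 112, 121, 122, 211, 212, 221, 222),
  group1    = c(0, 1, 1, 1, 1, 1, 0, 0),
  group2    = c(1, 1, 1, 0, 1, 1, 0, 1),
  group3    = c(0, 1, 0, 0, 0, 1, 1, 1),
  group4    = c(0, 0, 1, 1, 0, 1, 0, 0)
)

G <- as.matrix(df[, paste0("group", 1:4)])
# G %*% t(G) counts the groups each pair of households has in common
shared_group <- (G %*% t(G)) > 0
# Households must also be in the same district and village
loc <- paste(df$distid, df$villageid)
same_loc <- outer(loc, loc, "==")

adj <- (shared_group & same_loc) * 1
diag(adj) <- 0                       # no self-links
dimnames(adj) <- list(df$hhid, df$hhid)
adj
```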

Error in summary.manova(model, test = "Pillai") : residuals have rank 1 < 2

February 19, 2020, 5:23 pm

I have a problem; could you please help me? I want to run a MANCOVA with a two-level group factor. I have two dependent variables, but when I enter the data and variables I get the error below, and I don't know how to fix it.

y1 <- mancova(data = Data,
              deps = vars(postTotalcorrecttrial, postmemoryspan),
              covs = vars(preTotalcorrecttrail, preMemoryspan),
              factors = group,
              multivar = "wilks")

Error in summary.manova(model, test = "Pillai") : residuals have rank 1 < 2

Cannot control legend.position in ggplot2 when using facet_wrap()

February 19, 2020, 5:23 pm

In the following code and graphs, using the mtcars data as an example, I try to put the legend at the bottom. It works fine in the first graph, without theme_bw(). Once I add theme_bw(), the legend moves to the right. What have I done wrong, and how do I fix this? Thanks.

library(tidyverse)
mtcars %>%
  ggplot(aes(x = factor(cyl), y = mpg,
             color = factor(am)
             )
         ) + 
  geom_boxplot() + 
  facet_wrap(vars(mtcars$vs)) +
  theme(legend.position = "bottom", 
        legend.title = element_blank())

mtcars %>%
  ggplot(aes(x = factor(cyl), y = mpg,
             color = factor(am))) + 
  geom_boxplot() + 
  facet_wrap(vars(mtcars$vs)) +
  theme(legend.position = "bottom", 
        legend.title = element_blank()) +
  theme_bw()

Created on 2020-02-20 by the reprex package (v0.3.0)
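theme_bw() is a complete theme, and adding a complete theme replaces the theme settings added before it. That is why legend.position = "bottom" is lost in the second plot. Here is a minimal sketch of the usual fix: add theme_bw() first and the theme() tweaks after it. vars(vs) is used instead of vars(mtcars$vs) so the facet refers to the column in the plot's data.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(am))) +
  geom_boxplot() +
  facet_wrap(vars(vs)) +
  theme_bw() +                         # complete theme first ...
  theme(legend.position = "bottom",    # ... then the settings that should win
        legend.title = element_blank())
# print(p) draws the plot with the legend at the bottom
```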

R Repeat Rows Data Table

February 19, 2020, 5:23 pm

library(data.table)
dataHAVE=data.frame("student"=c(1,2,3),
                    "score" = c(10,11,12),
                "count"=c(4,1,2))


dataWANT=data.frame("student"=c(1,1,1,1,2,3,3),
                    "score"=c(10,10,10,10,11,12,12),
                    "count"=c(4,4,4,4,1,2,2))

setDT(dataHAVE)
dataHAVE[rep(1:.N, count)][, Indx := 1:.N, by = student]

I have data 'dataHAVE' and want to produce 'dataWANT', which repeats each student's row 'count' times. I tried doing this in data.table, as shown above, since that is the kind of solution I am looking for, but I get this error:

Error: unexpected symbol in "setDT(dat)dat"

and I cannot resolve it. Thank you so much.
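For comparison, here is a base-R sketch of the same row repetition that needs no data.table. rep() builds the row index 1, 1, 1, 1, 2, 3, 3 from the count column. Resetting the row names lets the result compare equal to dataWANT.

```r
dataHAVE <- data.frame(student = c(1, 2, 3),
                       score   = c(10, 11, 12),
                       count   = c(4, 1, 2))
dataWANT <- data.frame(student = c(1, 1, 1, 1, 2, 3, 3),
                       score   = c(10, 10, 10, 10, 11, 12, 12),
                       count   = c(4, 4, 4, 4, 1, 2, 2))

# Repeat each row index 'count' times, then subset
res <- dataHAVE[rep(seq_len(nrow(dataHAVE)), dataHAVE$count), ]
rownames(res) <- NULL   # drop the "1.1", "1.2", ... row names
all.equal(res, dataWANT)
#> [1] TRUE
```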
