Pivot by group for unequal data size

October 29, 2019, 6:22 am

I have the following DF:

DF = structure(list(ID = c(21785L, 21785L, 21785L), V1 = c(0.828273303, 
                                                  6.404590021, 0.775568448), V2 = c(2L, 3L, 2L), V3 = c(NA, 1.122899914, 
                                                                                                        0.850113234), V4 = c(NA, 4L, 3L), V5 = c(NA, 0.866757168, 0.868943246
                                                                                                        ), V6 = c(NA, 5L, 4L), V7 = c(NA, 0.563804788, 0.728656328), 
           V8 = c(NA, 6L, 5L), V9 = c(NA, 0.888109208, 0.823803733), 
           V10 = c(NA, 7L, 6L), V11 = c(NA, 0.578834113, 0.863467391
           ), V12 = c(NA, 1L, 7L), V13 = c(NA, NA, 0.939920869)), class = "data.frame", row.names = c(5L, 
                                                                                                      163L, 167L))

Output: 
Row      ID        V1 V2        V3 V4        V5 V6        V7 V8        V9 V10       V11 V12       V13
5   21785 0.8282733  2        NA NA        NA NA        NA NA        NA  NA        NA  NA        NA
163 21785 6.4045900  3 1.1228999  4 0.8667572  5 0.5638048  6 0.8881092   7 0.5788341   1        NA
167 21785 0.7755684  2 0.8501132  3 0.8689432  4 0.7286563  5 0.8238037   6 0.8634674   7 0.9399209

The data can be broken down into 3 parts:

ID per participant
Odd columns (V1, V3, ...) representing standardized heart rate
Even columns (V2, V4, ...) representing day number of the week (1 = Sunday)

I have 100 plus unique participants and 3000 rows of data with unequal data per day, hence the NAs.

I would like to pivot the data into one column per part

so that: col1 = ID, col2 = HR, col3 = Weekday

I have tried several methods based on similar questions such as:

    library(dplyr)
    library(tidyr)

    # melt the data frame to put all the metrics in a single column
    DF2 = reshape2::melt(DF, id.vars = c("ID"))

    # split the data by ID
    DF3 = split(DF2, DF2$ID)

    # allocate empty DF with 3 columns for future appending
    DF_Organized = data.frame()[1,3]

    # make the data into 3 new columns, 1 for ID, HR, weekday
    for (m in 1:length(DF3)){

      DF_tmp = DF3[m] %>%
        data.frame %>% na.omit() %>% # convert to DF, remove NAs
        setNames(., c("ID","colx","Value")) %>% # set names for clarity
        mutate(ind = rep(c(1, 2),length.out = n())) %>% # assign 1 to amplitude and 2 to day values in each row
        group_by(ind) %>% # group by value type
        mutate(id = row_number()) %>% # make new column that determines location of data by previous assignment
        spread(ind, Value) %>% # organize data by new ID
        select(-id) # clean

      # reorganize the NAs to the bottom
      DF_tmp2 = setNames(do.call(function(...) rowr::cbind.fill(..., fill = NA),
                                 lapply(DF_tmp, na.omit)), colnames(DF_tmp)) %>%
        na.omit() %>%
        select(-colx) %>%
        setNames(., c("ID","HR","Weekday")) # set names for clarity
    }

I get close but not accurate:

Actual Output:

> DF_tmp2
      ID HR        Weekday
1  21785 0.8282733 6.4045900
2  21785 0.7755684 2.0000000
3  21785 3.0000000 2.0000000
4  21785 1.1228999 0.8501132

. . . There's misalignment and inaccurate combinations. Any help is appreciated.

Expected Output:

> DF_tmp2
      ID HR        Weekday
1  21785 0.8282733 2.0000000
2  21785 6.4045900 3.0000000
3  21785 1.1228999 4.0000000
4  21785 0.8667572 5.0000000
5  21785 0.5638048 6.0000000
.
.
.
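
One possible way to get from the wide layout to the expected three columns, sketched in base R on a cut-down copy of the first two rows. It assumes the columns after ID strictly alternate HR, weekday, HR, weekday, and so on; a trailing HR column with no weekday partner (like V13 above) is ignored. The names `value_cols`, `hr_cols`, `day_cols` and `long` are made up for this illustration.

```r
# Cut-down copy of the first two rows of DF (only V1..V4 kept for brevity)
DF <- data.frame(ID = c(21785L, 21785L),
                 V1 = c(0.8282733, 6.4045900), V2 = c(2L, 3L),
                 V3 = c(NA, 1.1228999),        V4 = c(NA, 4L))

value_cols <- setdiff(names(DF), "ID")
n_pairs    <- length(value_cols) %/% 2   # ignore an unpaired trailing column
hr_cols    <- value_cols[seq(1, by = 2, length.out = n_pairs)]
day_cols   <- value_cols[seq(2, by = 2, length.out = n_pairs)]

# t() + as.vector() reads the values row by row, so each row's pairs stay together
long <- data.frame(
  ID      = rep(DF$ID, each = n_pairs),
  HR      = as.vector(t(as.matrix(DF[hr_cols]))),
  Weekday = as.vector(t(as.matrix(DF[day_cols])))
)

# drop the padding NAs
long <- long[!is.na(long$HR) & !is.na(long$Weekday), ]
rownames(long) <- NULL
long
```

Because nothing is grouped or re-sorted, rows belonging to different participants keep their original order, and each HR value stays next to the weekday from the column immediately to its right.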

↧

Difference between working in R directly and working with the sparklyr package

October 29, 2019, 6:30 am

I was just reading the article at https://spark.rstudio.com/, but I am not sure what the difference is between working in R directly and working with the sparklyr package (installed via install.packages("sparklyr")).

Could you explain? I am confused.

↧

Text to be written must be a length-one character vector

October 29, 2019, 6:34 am

After installing the new version of shiny (1.4.0) package, I face a strange error message

> Warning: Error in writeImpl: Text to be written must be a length-one
> character vector   [No stack trace available]

I do not really understand what has changed, or how to fix this issue.

Any idea would be highly appreciated!

↧

Optimise R code (SQL read and write) for quicker performance

October 29, 2019, 6:36 am

The nested loop below runs across 850 branches, 2 service points and 25 specific needs. As it stands, the total estimated run time is 70 hours. Please help me optimise it so that it runs quicker.

Code begins here:

    start_time <- Sys.time()

    for (i in levels(branch_code)) { # 850 branches
      for (j in levels(service_point)) { # 2 service points
        for (u in levels(specific_need)) { # 25 specific needs
          test <- paste0("SELECT * from dbo.Data_Base where calendar_date <= '2018-11-30' and branch_code = '", i, "'  and service_point = '", j, "' and specific_need = '", u, "' order by calendar_date asc")
          testtwo <- sqlQuery(con, test)

          if (nrow(testtwo) > 500) {

            df <- testtwo[, c("volume", "calendar_date")]
            colnames(df) <- c("y", "ds")

            m <- prophet(weekly.seasonality = TRUE, yearly.seasonality = TRUE, daily.seasonality = FALSE, holidays = holidays)
            m <- add_seasonality(m, name = 'monthly', period = 30.4, fourier.order = 8, prior.scale = 13)
            m <- fit.prophet(m, df)

            future <- make_future_dataframe(m, periods = 100, freq = "day", include_history = TRUE)
            forecast <- predict(m, future)

            writeto <- data.frame(forecast$ds, i, j, u, forecast$yhat)
            colnames(writeto) <- c("ds", "branch_code", "service_point", "specific_need", "Forecast")
            writeto_two <- left_join(writeto, df, by = ('ds'))

            colnames(writeto_two) <- c("calendar_date", "branch_code", "service_point", "specific_need", "Forecast", "Volume")

            testtwowrite <- data.frame(writeto_two)
            dbWriteTable(connec, "dbo.Actual_Volume_write", testtwowrite, append = TRUE)
          } else {
            print(c(i, j, u))
          }
        }
      }
    }
    end_time <- Sys.time()
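
A sketch of one way to cut the number of database round trips: read the rows once and split them in R, instead of issuing 850 x 2 x 25 separate queries. Whether the queries or the prophet fits dominate the 70 hours has not been measured here, so treat this as an assumption about where the time goes. The toy data frame `all_rows` is hypothetical and stands in for the result of a single SELECT without the branch, service point and need filters; the threshold `min_rows` is 2 instead of 500 to keep the example small.

```r
# Stand-in for one big query result (hypothetical toy data)
all_rows <- data.frame(
  branch_code   = c("B1", "B1", "B1", "B2", "B2"),
  service_point = c("S1", "S1", "S1", "S1", "S1"),
  specific_need = c("N1", "N1", "N1", "N1", "N1"),
  calendar_date = as.Date("2018-01-01") + c(0, 1, 2, 0, 1),
  volume        = c(10, 12, 11, 5, 7),
  stringsAsFactors = FALSE
)

# One data frame per branch / service point / need combination that actually occurs
groups <- split(all_rows,
                list(all_rows$branch_code, all_rows$service_point, all_rows$specific_need),
                drop = TRUE)

min_rows <- 2  # the question uses 500
eligible <- Filter(function(g) nrow(g) > min_rows, groups)

for (key in names(eligible)) {
  g  <- eligible[[key]]
  df <- g[order(g$calendar_date), c("volume", "calendar_date")]
  colnames(df) <- c("y", "ds")
  # ... fit and forecast on df as in the loop body above ...
}
names(eligible)
```

Collecting the forecasts in a list and writing them to the database in one call after the loop would follow the same idea on the write side.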

↧

return ID's of unique combinations

October 29, 2019, 6:36 am

My data table has the following format

ID   Var1   Var2   Var3   ...
1_1  0      0      1      ...
1_2  1      1      0      ...
1_3  0      0      1      ...
...  ...    ...    ...    ...

I want to extract the IDs belonging to each unique combination of the Var columns. Getting the unique combinations is not the problem (plyr::count(), aggregate() etc.); I want to extract the ID values contributing to each of these unique combinations.

The output should look somewhat like this

Var1   Var2   Var3   IDs
0      0      1      1_1, 1_3
1      1      0      1_2

where the IDs column is a vector/list of all the ID's contributing to a unique combination.

I tried an R package and dplyr pipelines, nothing worked so far.

Any suggestions or even R packages how to handle this task?

Thank you!
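
One way this could be done in base R, sketched on the three example rows shown above: `aggregate()` with a function that pastes the IDs of each group together. The object names `dat` and `ids_by_combo` are made up for the illustration.

```r
dat <- data.frame(ID   = c("1_1", "1_2", "1_3"),
                  Var1 = c(0, 1, 0),
                  Var2 = c(0, 1, 0),
                  Var3 = c(1, 0, 1),
                  stringsAsFactors = FALSE)

# Group by every Var column and collapse the IDs of each group into one string
ids_by_combo <- aggregate(ID ~ Var1 + Var2 + Var3, data = dat,
                          FUN = function(x) paste(x, collapse = ", "))
names(ids_by_combo)[names(ids_by_combo) == "ID"] <- "IDs"
ids_by_combo
```

With many Var columns the formula `ID ~ .` avoids typing them all out, provided the data frame contains only the ID and the grouping columns.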

↧

Counting the number of specific integers per column in an R matrix

October 29, 2019, 6:40 am

I have a matrix (10 x 100) where I need to count the number of each integer per column so I have a final matrix that is (3 x 100). Counts for 0, 1, and 2 per column.

I think the apply function will be useful here, the code I provided is a solution I envision.

Any help will be greatly appreciated.

library(dplyr)
set.seed(100)
a <- matrix(sample(0:2, size = 10 * 100, replace = TRUE), nrow = 10, ncol = 100)
out <- apply(a, 2, function(x) count(x))

 Desired output: rows are the counts of each value "0, 1, 2"

   1 2 3 ...  n
 0 1 1 3
 1 6 3 3
 2 3 6 4
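
A minimal sketch of the `apply()` idea using base R only. Converting each column to a factor with fixed levels makes `table()` report a count for 0, 1 and 2 even when one of them does not occur in a column. The name `counts` is made up for the illustration.

```r
set.seed(100)
a <- matrix(sample(0:2, size = 10 * 100, replace = TRUE), nrow = 10, ncol = 100)

# One table per column; fixed levels guarantee three rows in the result
counts <- apply(a, 2, function(x) table(factor(x, levels = 0:2)))
dim(counts)
```

The result is a 3 x 100 matrix whose row names are "0", "1" and "2", and every column sums to the number of rows of `a`.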

↧

How to calculate a particular index in R

October 29, 2019, 6:43 am

[image of the formula]

Hello everyone, I need your help with the calculation of this expression. I have a data frame with income streams (each made of 5 periods "t") from different years. What I need is a command that makes R handle the highlighted part of the formula under the summation symbol: R should multiply by the l0 coefficient when there is a loss, and by the g0 coefficient when there is a gain.

↧

cor and cov2cor different results with use = pairwise.complete.obs

October 29, 2019, 6:45 am

Running:

cor(x, use = "pairwise.complete.obs")

vs running

c <- cov(x, use = "pairwise.complete.obs")
cov2cor(c)

give different results. Anyone know why and which one gives correct results? Both functions call C++ code which I haven't figured out how to parse.

Reproducible data:

x <- data.frame(a1 = rnorm(10), a2 = rnorm(10), a3 = rnorm(10))
x$a1[c(1,3)] <- NA

c <- cov(x, use = "pairwise.complete.obs")
cov2cor(c)
cor(x, use = "pairwise.complete.obs")
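
A sketch of where the difference may come from, assuming the usual meaning of pairwise deletion: `cor()` computes both standard deviations on the rows that are complete for that particular pair, whereas `cov2cor()` divides by the diagonal of the covariance matrix, and each diagonal entry is a variance computed on all non-missing values of that one variable. The names `cv`, `ok`, `manual_cor` and `manual_cov2cor` are made up for the illustration.

```r
set.seed(1)
x <- data.frame(a1 = rnorm(10), a2 = rnorm(10), a3 = rnorm(10))
x$a1[c(1, 3)] <- NA

cv <- cov(x, use = "pairwise.complete.obs")
ok <- complete.cases(x[c("a1", "a2")])   # rows usable for the (a1, a2) pair

# What cor() does for the pair: standard deviations from the same 8 rows
manual_cor <- cv["a1", "a2"] / (sd(x$a1[ok]) * sd(x$a2[ok]))

# What cov2cor() does: variances from the diagonal, a2 using all 10 rows
manual_cov2cor <- cv["a1", "a2"] / sqrt(cv["a1", "a1"] * cv["a2", "a2"])

c(manual_cor, manual_cov2cor)
```

For pairs without missing values, such as (a2, a3), both routes use the same rows throughout and agree.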

↧

Error in if() missing value where TRUE/FALSE needed, but there's actually a boolean argument inside

October 30, 2019, 6:01 am

I get this error in R. I'm working with a database where my NA values are coded as "# VALOR N/D", so I wrote a simple function to count how many of these values I have.

    estavencido <- function(a){
      count = 0
      for(i in 2:367){
        if(a[i] == "# VALOR N/D"){
          count = count + 1
        }
      }
      return(count)
    }

But when I use the function, I get the error, even though I saw that data[i]=="# VALOR N/D" returns a TRUE/FALSE value, so I don't know why this is happening.
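
A likely cause, sketched under the assumption that the vector also contains real NA entries (or has fewer than 367 elements, since indexing past the end also yields NA): comparing NA with a string gives NA rather than TRUE or FALSE, and `if()` cannot branch on NA. The toy vector `a` and the names `comparison` and `count_nd` are hypothetical.

```r
a <- c("id", "# VALOR N/D", NA, "3.5", "# VALOR N/D")

# The comparison itself does not fail, but its result is NA
comparison <- a[3] == "# VALOR N/D"
is.na(comparison)

# Vectorised count that skips the first element and ignores NA comparisons
count_nd <- sum(a[-1] == "# VALOR N/D", na.rm = TRUE)
count_nd
```

Inside a loop, wrapping the condition in `isTRUE()` would have the same effect of treating NA as "no match".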

↧

set missing values for multiple labelled variables

October 30, 2019, 6:10 am

How do I set missing values for multiple labelled vectors in a data frame? I am working with a survey dataset from SPSS. I am dealing with about 20 different variables that share the same missing values, so I would like to find a way to use lapply() to make this work, but I can't.

I actually can do this with base R via as.numeric() and then recode(), but I'm intrigued by the possibilities of haven and the labelled class, so I'd like to find a way to do this all in Hadley's tidyverse.

Roughly, the variables of interest look like this. I am sorry if this is a basic question, but I find the help documentation associated with the haven and labelled packages very unhelpful.

library(haven)
library(labelled)
v1<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v2<-labelled(c(1,2,2,2,5,6), c(agree=1, disagree=2, dk=5, refused=6))
v3<-data.frame(v1=v1, v2=v2)
lapply(v3, val_labels)
lapply(v3, function(x) set_na_values(x, c(5,6)))

↧

Dynamic naming of download filename from DataTable buttons extension in R shiny

October 30, 2019, 6:18 am

I have:


library(shiny)
library(DT)

ui <- fluidPage(
    h2("Explorer"),

    tabPanel(h3("Inspector"),
             p("Overview of data for a particular sample."),
             selectInput(inputId = "sample",
                         label = h3("Select sample"),
                         selectize = TRUE,
                         choices = names(vcf_tibbles)),
             dataTableOutput("sample_inspector")
            )
    )

server <- function(input, output) {
  output$sample_inspector <- DT::renderDataTable(

      sammple_overview(sample_id = input$sample, vcf_tibbles = vcf_tibbles),
      rownames = FALSE,
      extensions = 'Buttons',
      options = list(paging = FALSE,
             dom = 'Bfrtip',
             buttons = list( list(extend = 'csv',   filename =  paste("snp", input$sample, sep = "-")),
                     list(extend = 'excel', filename =  paste("snp", input$sample, sep = "-"))))
      )
}

Everything works fine, in that I select a sample and the table updates accordingly. And if I click CSV or Excel, the corresponding data downloads. However, the file name is always wrong.

It seems that the content of the data table is being updated, but input$sample is not being considered with the buttons.

Is there a way to make the filename argument in the buttons also be reactive?

I tried to make the name be the result of a function call, but was unable to get that to work either.

Thanks!

↧

geom_dumbbell chart with scales for line and dots

October 30, 2019, 6:18 am

I'm trying to make a dumbbell chart with additional information about change (green/red) and significance of the change (vol) and add an additional legend for the dots at the end of the dumbbells.

My code to generate the plot is:

library(ggalt)

# build data set
set.seed(1)

df <- data.frame(country=paste("Region", LETTERS[1:10]))
df$last_year <- runif(nrow(df))
df$this_year <- runif(nrow(df))
df$ydiff <- df$this_year - df$last_year
df$vol <- runif(nrow(df))


# create dumbbell plot
ggplot(df, aes(y=country, group=country)) + 
  geom_dumbbell(aes(x=last_year, xend=this_year, colour = ydiff, size=vol),
                colour_x = "blue",
                colour_xend = "yellow") +
  scale_color_gradient2(low="green", high="red")

Now, I'd like to add a legend about what the yellow and blue dots are. I tried to follow the approach in this answer, but it did not work:

# Use answer with long data
library(reshape2) # provides melt()
df2 <- melt(df[, c("country", "last_year", "this_year")])

# create point alone works
ggplot() + geom_point(data=df2, aes(x=value, y=country, color=variable)) 

# create dumbbell alone works
ggplot() + geom_dumbbell(data=df, mapping=aes(x=last_year, xend=this_year, y=country, colour = ydiff, size=vol)) 

# combining plots does not work
ggplot() + geom_point(data=df2, aes(x=value, y=country, color=variable)) + 
  geom_dumbbell(data=df, mapping=aes(x=last_year, xend=this_year, y=country, colour = ydiff, size=vol)) 
# Error: Continuous value supplied to discrete scale

I don't know how to access the aesthetics in the dumbbell chart and plot a legend automatically either. Can you point me to a solution?

↧

parse dates from multiple columns with NAs and dates hidden in text

October 30, 2019, 6:20 am

I have a data.frame with dates distributed across columns and in a messy format: the year column contains years and NAs, the column date_old contains the format Month DD or DD (or a date duration) or NAs, and the column hidden_date contains text and dates either in the format .... YYYY .... or in the format .... DD Month YYYY .... (with .... representing general text of variable length).

An example data.frame looks like this:

df <- data.frame(year = c("1992", "1993", "1995", NA),
                 date_old = c("February 15", "October 02-24", "15", NA),
                 hidden_date = c(NA, NA, "The hidden date is 15 July 1995", "The hidden date is 2005"))

I want to get the dates in the format YYYY-MM-DD (take the first day of date durations) and fill unknown values with zeroes.

Using parse_date_time didn't help me so far, and the expected output would be:

  year      date_old                     hidden_date        date
1 1992   February 15                            <NA>  1992-02-15
2 1993 October 02-24                            <NA>  1993-10-02
3 1995            15 The hidden date is 15 July 1995  1995-07-15
4 <NA>          <NA>         The hidden date is 2005  2005-00-00

How do I best go about this?

↧

Convert frequency vector to logical matrix

October 30, 2019, 6:26 am

I would like to convert a frequency vector (i.e. the colSums() of a matrix) to one of the possible versions of the original logical matrix in R.

Something like:

    s <- c(1,2,3)
    # Some function of s
    # Example output:
         [,1] [,2] [,3]
    [1,]    0    0    1
    [2,]    1    0    0 
    [3,]    0    1    0
    [4,]    0    0    1
    [5,]    0    0    1
    [6,]    0    1    0

The order of rows is not important. Could someone give me a hint on how to do this?
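
As a hint, one possible construction (a sketch, not the only valid answer): give every unit of frequency its own row, so the matrix has sum(s) rows and each row contains a single 1 in the column it counts towards. The names `col_index` and `m` are made up for the illustration.

```r
s <- c(1, 2, 3)

# Column that each row contributes to: column j is repeated s[j] times
col_index <- rep(seq_along(s), times = s)

m <- matrix(0L, nrow = sum(s), ncol = length(s))
m[cbind(seq_len(sum(s)), col_index)] <- 1L   # matrix indexing by (row, column) pairs
m
colSums(m)
```

Shuffling the rows afterwards, for example with `m[sample(nrow(m)), ]`, gives other matrices with the same column sums.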

↧

Creating a vector of numbers based on letters

October 30, 2019, 6:27 am

So, this is the question:

"Create a function that given one word, return the position of word’s letters on letters vector. For example, if the word is ‘abba’, the function will return 1 2 2 1."

What I have so far is this:

l <- function(word) {
    chr <- c()
    y <- c()
    strsplit(chr,word)
    i<-1
    while(i<length) {
           o<-letters[i]
           x<-chr[i]
           if(o==x) {
                    y[i]<-i
           }
           i+1
    }
    y
}

I have tried running l("hello") and it returns NULL. I'm very lost and would appreciate any help! Thank you!
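
For comparison, a compact sketch of the same task in base R: split the word into single characters and look each one up in `letters` with `match()`. The function name `letter_positions` is made up, and lower-casing the input is an extra assumption not stated in the exercise.

```r
letter_positions <- function(word) {
  chars <- strsplit(tolower(word), "")[[1]]  # "abba" -> "a" "b" "b" "a"
  match(chars, letters)                      # position of each character in letters
}

letter_positions("abba")   # 1 2 2 1
```

Characters that are not in `letters`, such as digits or spaces, come back as NA.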

↧

Suppress ggpairs messages when generating plot

October 30, 2019, 6:27 am

ggpairs prints out a progress bar and estimated remaining time while generating plots, which is nice when used interactively since some of the computations can take a few seconds. But when making documents, like R notebooks, these printed messages end up in the report. ggpairs had a boolean verbose option, but it's deprecated now. Is there an alternative? I can't seem to find one.

To see the messages try:

library(GGally)
ggpairs(mtcars, columns = c("mpg", "cyl", "hp", "disp", "am", "qsec"))

In a document it ends up including:

plot: [1,1] [==-------------------------------------------] 4% est: 0s
plot: [1,2] [====-----------------------------------------] 8% est: 6s
plot: [1,3] [=====----------------------------------------] 12% est: 5s
plot: [1,4] [=======--------------------------------------] 16% est: 5s

etc

↧

beamer rmarkdown presentation unable to knit pdf file instead knit html document that too is not knitted

October 30, 2019, 6:28 am

I have a beamer R Markdown file for preparing a PDF presentation as teaching material. However, when I click Knit to PDF, the output is rendered as an HTML document instead of a beamer_presentation (PDF).

Earlier it was knitting properly, but after some time it started giving this problem. I have checked all the help I could find on various forums, but the problem persists. I have even installed tinyverse, installed/removed MikTeX and restarted the system to try to solve the problem.

Thanks to all in advance. My YAML is:

---
title: "Mathematics for Finance"  
author:   
  - Dr. XXXX  
institute:   
  - YYYY  
date: "June-July 2019"
output:   
  beamer_presentation:  
    incremental: false  
    theme: "AnnArbor"  
    colortheme: "wolverine"  
    fonttheme: "structuresmallcapsserif"  
    toc: true   
    slide_level: 2  
    fig_width: 5  
    fig_height: 4  
    fig_caption: true  
    highlight: tango
    link-citations: yes  
    urlcolor: red  
    linkcolor: red  
    citecolor: blue  
---

No error message is displayed; an HTML document is simply produced, and even that is blank!

"C:/Users/Kulbirs/ANACON~1/envs/rstudio/Scripts/pandoc" +RTS -K512m -RTS beamertest1.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output beamertest1.html --email-obfuscation none --self-contained --standalone --section-divs --template "C:\Users\Kulbirs\Documents\R\win-library\3.6\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:bootstrap" --include-in-header "C:\Users\Kulbirs\AppData\Local\Temp\RtmpgnjFVR\rmarkdown-str56854a3563d.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --metadata pagetitle=beamertest1.utf8.md

Output created: beamertest1.html

↧

Error in R Code if called in a function (multcompLetters, strsplit: non-character argument)

October 30, 2019, 6:29 am

I would like to generate labels from a Tukey test inside a function. Inside the function the code throws an error, whereas the same code called outside a function works absolutely fine.

Other threads say the solution is to convert a variable with "as.character", but when I tried this it did not work.

Sadly I could not figure out the error and it would be great if you could help me. The error and the trace are in the last lines.

Minimal working example:

require(plyr)
require(multcomp)
require(multcompView) 
require(datasets)
data(iris)

iris
Dataset <- iris

####################################################################
#Works:
a=aov(Dataset$Sepal.Length ~ Dataset$Species)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
generate_label_df <- function(HSD, flev){
  Tukey.levels <- HSD[[flev]][,4]
  Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
  plot.labels <- names(Tukey.labels[['Letters']])
  boxplot.df <- ddply(Dataset, flev, function (x) max(fivenum(x$y)) + 0.2)
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],stringsAsFactors = FALSE)
  labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
  return(labels.df)
}
LABELS <- generate_label_df(tHSD, 'Dataset$Species')


####################################################################
#Throws error:
doTukey <- function(y_var, x_var, ret=FALSE) {
  require(ggplot2)
  require(plyr)
  require(multcomp)

  a=aov(y_var ~ x_var)
  tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)
  generate_label_df <- function(HSD, flev){
    Tukey.levels <- HSD[[flev]][,4]
    Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
    plot.labels <- names(Tukey.labels[['Letters']])
    boxplot.df <- ddply(Dataset, flev, function (x) max(fivenum(x$y)) + 0.2)
    plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],stringsAsFactors = FALSE)
    labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)
    return(labels.df)
  }

  LABELS <- generate_label_df(tHSD, 'Dataset$Species')
}

doTukey(Dataset$Sepal.Length, Dataset$Species) # Error:  Error in strsplit(x, sep) : non-character argument, 
#Trace:
#5.strsplit(x, sep) 
#4.vec2mat2(namx) 
#3.multcompLetters(Tukey.levels) 
#2.generate_label_df(tHSD, "Dataset$Species") 
#1.doTukey(Dataset$Sepal.Length, Dataset$Species)

Thanks

↧

Populating NAs and correcting data based on sequence within a sequence

October 30, 2019, 6:32 am

I have a data frame with two issues that I am trying to correct. Here is a toy example.

        require(data.table)
        tempdt <- data.table(ID1=rep(1:6,each=2),ID2=rep(letters[1:2],6),name=c('john','john',NA,'mike','steve',NA,'bob',NA,NA,'henry','joe','frank'))

            ID1 ID2  name
         1:   1   a  john
         2:   1   b  john
         3:   2   a  <NA>
         4:   2   b  mike
         5:   3   a steve
         6:   3   b  <NA>
         7:   4   a   bob
         8:   4   b  <NA>
         9:   5   a  <NA>
        10:   5   b henry
        11:   6   a   joe
        12:   6   b frank

There are 2 sequential grouping variables (ID1 as the primary sequence and ID2 as the secondary sequence within ID1) and a name assignment. Sometimes the name is missing and I need to fill it in based on what is assigned within that ID1. Other times I have 2 (or more) different names for the same ID1, but there should only be one. Whichever name comes first in the order of ID2 within ID1 should be the assigned name for all of that ID1.

Ultimately the name field should read c('john','john','mike','mike','steve','steve','bob','bob','henry','henry','joe','joe')

I could approach this by ordering the data frame(table) based on the two sequential variables and then doing a for loop on ID1 and making the corrections but it seems like there should be a cleaner more efficient way to sequence along ID1 and compare the sequence of ID2 within ID1 and make the corrections avoiding a loop.

Any thoughts? I have it as a data table because I usually work with them but it isn't a necessity.

Will
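
A loop-free sketch in base R, using a plain data.frame since a data.table is not a necessity: sort by ID1 and ID2, then let `ave()` replace every name within an ID1 by the first non-missing name of that group. The name `tempdf` is made up for the illustration.

```r
tempdf <- data.frame(
  ID1  = rep(1:6, each = 2),
  ID2  = rep(letters[1:2], 6),
  name = c('john', 'john', NA, 'mike', 'steve', NA,
           'bob', NA, NA, 'henry', 'joe', 'frank'),
  stringsAsFactors = FALSE
)

# Make sure "first" means first in ID2 order within each ID1
tempdf <- tempdf[order(tempdf$ID1, tempdf$ID2), ]

# First non-NA name per ID1, repeated for every row of that ID1
tempdf$name <- ave(tempdf$name, tempdf$ID1,
                   FUN = function(x) rep(x[!is.na(x)][1], length(x)))
tempdf$name
```

A group whose names are all missing simply stays NA.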

↧

how to properly plot igraph's graphs to make them comparable?

October 30, 2019, 6:36 am

So, as in the title, I need to be able to plot some igraph's graphs and be able to compare them. To do so I believed that passing coordinates and then plotting them was enough. Then I found out that the graphs are rendered according to the coordinates that are provided but also according to the number of nodes (or the number of subnetworks or whatever other parameter which I am not able to understand). In order to get a grasp over the problem here is an example (the part related to euclidean distance is commented since it require a specific package, but I also posted the output):

library(igraph)
#library(TSdist)

smallNet <- graph(edges=c(1,2), n=2, directed=F) 

V(smallNet)$name <- c("mint", "pepper")

# first try
dev.new()

V(smallNet)$x <- c(10, 23)
V(smallNet)$y <- c(29, 36)

plot(smallNet, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", layout=layout_nicely)
#print(paste("distance ", EuclideanDistance(V(smallNet)$x, V(smallNet)$y)))
#[1] "distance  23.0217288664427"


# second try
dev.new()

V(smallNet)$x <- c(1400, 1894)
V(smallNet)$y <- c(3700, 4140)

plot(smallNet, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", layout=layout_nicely)
#print(paste("distance ", EuclideanDistance(V(smallNet)$x, V(smallNet)$y)))
#[1] "distance  3214.73420363177"

# third try
dev.new()

V(smallNet)$x <- c(10000, 26230)
V(smallNet)$y <- c(13800, 32150)

plot(smallNet, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", layout=layout_nicely)
#print(paste("distance ", EuclideanDistance(V(smallNet)$x, V(smallNet)$y)))
#[1] "distance  7034.65706342534"

The point is that the (euclidean) distances are different but, if I look at the plots, nothing apparently changes. On the other hand, something must be different since the node's distance is increasing.
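
Two things may explain this; the sketch below uses base R only and rests on my understanding that igraph's plot rescales the layout to the range [-1, 1] on each axis by default (its `rescale` argument). First, the commented-out `EuclideanDistance(x, y)` calls compare the x vector with the y vector, not node 1 with node 2. Second, after per-axis rescaling any two distinct points end up at the same two corners. The helper `rescale_axis` and the other names are made up and only mimic that assumed behaviour.

```r
# Assumed equivalent of the default rescaling: map each axis onto [-1, 1]
rescale_axis <- function(v) 2 * (v - min(v)) / (max(v) - min(v)) - 1

coords <- list(
  first  = list(x = c(10, 23),       y = c(29, 36)),
  second = list(x = c(1400, 1894),   y = c(3700, 4140)),
  third  = list(x = c(10000, 26230), y = c(13800, 32150))
)

# Distance between the two nodes, computed from their (x, y) positions
node_distance <- sapply(coords, function(p) sqrt(diff(p$x)^2 + diff(p$y)^2))

# Coordinates as they would be drawn after rescaling
rescaled <- lapply(coords, function(p) cbind(x = rescale_axis(p$x), y = rescale_axis(p$y)))

node_distance
rescaled$first
```

All three rescaled layouts are identical, which would explain why the plots look the same even though the node distances differ.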

I noticed that adding few nodes improves the visualisation but still I believe the plots I am getting are not, somehow, respecting the actual distances. Here's another sample code with more nodes:

# first try
evenBigger <- graph(edges=c(1,2, 2,3, 3,1, 4,5), n=5, directed=F) 

V(evenBigger)$name <- c("pear", "mango", "blueberry", "coconut", "fig")

dev.new()

V(evenBigger)$x <- c(0, 25, 50, 70, 60)
V(evenBigger)$y <- c(0, 80, 20, 120, 40)

plot(evenBigger, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", layout=layout_nicely)

# second try
evenBigger <- graph(edges=c(1,2, 2,3, 3,1, 4,5, 6,6), n=6, directed=F) 

V(evenBigger)$name <- c("pear", "mango", "blueberry", "coconut", "fig", "jujube")

dev.new()

V(evenBigger)$x <- c(0, 25, 50, 70, 120, 2000)
V(evenBigger)$y <- c(0, 80, 20, 120, 140, 2000)

plot(evenBigger, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", layout=layout_nicely)

It looks like something changes in these two new examples, since the jujube node is now far away compared to the other nodes. Are the two nets now comparable from a graphical perspective? If not (which I believe to be the case), what should I do in order to make them comparable?

I tried to set xlim and ylim as mentioned here but it looks like it is not working:

# first try
evenBigger <- graph(edges=c(1,2, 2,3, 3,1, 4,5), n=5, directed=F) 

V(evenBigger)$name <- c("pear", "mango", "blueberry", "coconut", "fig")

dev.new()

V(evenBigger)$x <- c(0, 25, 50, 70, 60)
V(evenBigger)$y <- c(0, 80, 20, 120, 40)

plot(evenBigger, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", xlim=c(0, 2500), ylim=c(0, 2500), layout=layout_nicely)

# second try
evenBigger <- graph(edges=c(1,2, 2,3, 3,1, 4,5, 6,6), n=6, directed=F) 

V(evenBigger)$name <- c("pear", "mango", "blueberry", "coconut", "fig", "jujube")

dev.new()

V(evenBigger)$x <- c(0, 25, 50, 70, 120, 2000)
V(evenBigger)$y <- c(0, 80, 20, 120, 140, 2000)

plot(evenBigger, vertex.label.color="midnightblue", vertex.size=40, vertex.color="thistle1", xlim=c(0, 2500), ylim=c(0, 2500), layout=layout_nicely)

Suggestions are very welcome!

↧