How to keep the original format of values after unlist(lapply(mydata, function(x) {x$getElementText()}))

February 13, 2020, 10:08 am

≫ Next: LARS "Lasso" regression not choosing significant explanatory variables

≪ Previous: Problems with function MatchIt::matchit

I'm trying to keep the original values format. The data format is:

<option value="xxxxx                  ">xxxx                   </option>
      <option value="yyyy                 ">yyyy                  </option>
      <option value="zzzzzzz              ">zzzzzzz                 </option>
...

But, I'm getting this after using

unlist(lapply(mydata, function(x) {x$getElementText()}))

head(mydata)
[1] "xxxxx""yyyy""zzzzzzz"

What I need:

 head(mydata)
    [1] "xxxxx                  ""yyyy                 ""zzzzzzz              "

I appreciate any help

↧

LARS "Lasso" regression not choosing significant explanatory variables

February 13, 2020, 10:08 am

≫ Next: RadioButton in R Shiny, how to use it when evaluating different kind of data with different functions

≪ Previous: How to keep the original format of values after unlist(lapply(mydata, function(x) {x$getElementText()}))

I am running lasso regression on a large data set n=1918, p=85 and the coefficients the regression identifies as important - when actually put into a linear model - are very insignificant. And one the other end, lasso deems very significant explanatory "model" variables as having coefficients near 0 and not selecting for them. The dataframe going into LARS is already scaled. Any ideas on why this might occur? Below is an example of what LARS might choose and also a model created by me with actually good explanatory variables using the exact same dataset.

signif.coefs(lasso)
     4        45 
 4.855257 -3.020055

lm(response ~ SP.MTMEAN + YEAR, data=df, na.action=na.pass) ###Terrible Lasso Chosen Model
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.16710    0.07190  -2.324  0.02022 *  
SP.MTMEAN    0.09889    0.02313   4.275 2.01e-05 ***
YEAR         0.14097    0.04580   3.078  0.00211 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9903 on 1915 degrees of freedom
Multiple R-squared:  0.01678,   Adjusted R-squared:  0.01576 
F-statistic: 16.34 on 2 and 1915 DF,  p-value: 9.167e-08

###variables chosen by me with model output from same data frame as above
lm(response~log1p.PTL_RESULT+log1p.NTL_RESULT+log1p.PH_RESULT+log1p.EPI.T+SU.MPPT, data=df, na.action=na.pass) 
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.01200    0.01972   0.608  0.54301    
log1p.PTL_RESULT  0.20672    0.03104   6.660 3.58e-11 ***
log1p.NTL_RESULT  0.21219    0.03335   6.362 2.49e-10 ***
log1p.PH_RESULT   0.15543    0.02543   6.113 1.18e-09 ***
log1p.EPI.T       0.09869    0.02189   4.508 6.93e-06 ***
SU.MPPT          -0.06002    0.02135  -2.811  0.00499 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8596 on 1912 degrees of freedom
Multiple R-squared:  0.2603,    Adjusted R-squared:  0.2583 
F-statistic: 134.5 on 5 and 1912 DF,  p-value: < 2.2e-16

↧

RadioButton in R Shiny, how to use it when evaluating different kind of data with different functions

February 13, 2020, 10:12 am

≫ Next: Error in strwidth("W", cex = tl.cex) : plot.new has not been called yet

≪ Previous: LARS "Lasso" regression not choosing significant explanatory variables

I'm quite new with R Shiny so I apologise for the unuseful functions and typos I used to build my app and their relative functions. My goal is to convert the dataset I'll upload by using two different functions, which vary according to the dataset itself. In particular, my dataset might include categorical or continuous data, so I have to switch between two functions to rearrange the data in the final form of 0 and 1 (binarization). I'd like to switch between the functions (data_handling_typeA_training and data_handling_typeB_training) reported below, according to the type A data and the type B data I'm going to upload in the Shiny app.

Dataset and functions

# Input type A ####

input <- data.frame(
            Sample = as.factor(c("c_01", "c_02", "c_03")),
            Pop = as.factor(c("A", "B", "C")),
            xx = c(11L, 9L, 10L),
            xx.1 = c(12L, 11L, 12L),
            zz = c(14L, 12L, 14L),
            zz.1 = c(14L, 13L, 15L),
            yy = c(17L, 16L, 16L),
            yy.1 = c(21L, 16L, 20L),
            ff = c(11L, 11L, 12L),
            ff.1 = c(12L, 13L, 12L),
            nn = c(11L, 11L, 11L),
            nn.1 = c(11L, 12L, 12L)
)

# # Input type B ####
# input <- data.frame(
#             Sample = as.factor(c("A_01", "A_02", "A_03")),
#             Pop = as.factor(c("X", "Y", "Z")),
#             aaa = as.factor(c("CAG", "CGG", "TGG")),
#             aaa.1 = as.factor(c("TGG", "TAG", "CAG")),
#             bbb = as.factor(c("GG", "GG", "GG")),
#             bbb.1 = as.factor(c("GA", "AG", "AG")),
#             ccc = as.factor(c("CAGG", "CAGG", "CAGG")),
#             ccc.1 = as.factor(c("CAGG", "CAAG", "CAAG")),
#             ddd = as.factor(c("CACC", "AAGC", "CACC")),
#             ddd.1 = as.factor(c("AACC", "AAGC", "AACT")),
#             eee = as.factor(c("ACG", "ATA", "ATG")),
#             eee.1 = as.factor(c("ATG", "ATG", "ATG"))
# )

data_handling_typeA_training <- function(input) {
    Populations<-as.factor(input[,2])
    input<-input[,-2]
    names = c(colnames(input))
    evenvals <- seq(2, ncol(input), by=2) 
    names_ok<-colnames(input[,evenvals])
    data_unl<-as.numeric(unlist(input[,-c(1)]))
    data_names<-as.data.frame(input[,1])
    library(plyr); library(dplyr)
    data_unl=dplyr::recode(data_unl,'1'=1,  '1.1'=2,    '1.2'=3,    '1.3'=4,    '2'=5,  '2.1'=6,    '2.2'=7,    '2.3'=8,    '3'=9,  '3.1'=10,   '3.2'=11,   '3.3'=12,   '4'=13, '4.1'=14,   '4.2'=15,   '4.3'=16,   '5'=17, '5.1'=18,   '5.2'=19,   '5.3'=20,   '6'=21, '6.1'=22,   '6.2'=23,   '6.3'=24,   '7'=25, '7.1'=26,   '7.2'=27,   '7.3'=28,   '8'=29, '8.1'=30,   '8.2'=31,   '8.3'=32,   '9'=33, '9.1'=34,   '9.2'=35,   '9.3'=36,   '10'=37,    '10.1'=38,  '10.2'=39,  '10.3'=40,  '11'=41,    '11.1'=42,  '11.2'=43,  '11.3'=44,  '12'=45,    '12.1'=46,  '12.2'=47,  '12.3'=48,  '13'=49,    '13.1'=50,  '13.2'=51,  '13.3'=52,  '14'=53,    '14.1'=54,  '14.2'=55,  '14.3'=56,  '15'=57,    '15.1'=58,  '15.2'=59,  '15.3'=60,  '16'=61,    '16.1'=62,  '16.2'=63,  '16.3'=64,  '17'=65,    '17.1'=66,  '17.2'=67,  '17.3'=68,  '18'=69,    '18.1'=70,  '18.2'=71,  '18.3'=72,  '19'=73,    '19.1'=74,  '19.2'=75,  '19.3'=76,  '20'=77,    '20.1'=78,  '20.2'=79,  '20.3'=80,  '21'=81,    '21.1'=82,  '21.2'=83,  '21.3'=84,  '22'=85,    '22.1'=86,  '22.2'=87,  '22.3'=88,  '23'=89,    '23.1'=90,  '23.2'=91,  '23.3'=92,  '24'=93,    '24.1'=94,  '24.2'=95,  '24.3'=96,  '25'=97,    '25.1'=98,  '25.2'=99,  '25.3'=100, '26'=101,   '26.1'=102, '26.2'=103, '26.3'=104, '27'=105,   '27.1'=106, '27.2'=107, '27.3'=108, '28'=109,   '28.1'=110, '28.2'=111, '28.3'=112, '29'=113,   '29.1'=114, '29.2'=115, '29.3'=116, '30'=117,   '30.1'=118, '30.2'=119, '30.3'=120, '31'=121,   '31.1'=122, '31.2'=123, '31.3'=124, '32'=125,   '32.1'=126, '32.2'=127, '32.3'=128, '33'=129,   '33.1'=130, '33.2'=131, '33.3'=132, '34'=133,   '34.1'=134, '34.2'=135, '34.3'=136, '35'=137,   '35.1'=138, '35.2'=139, '35.3'=140, '36'=141,   '36.1'=142, '36.2'=143, '36.3'=144, '37'=145,   '37.1'=146, '37.2'=147, '37.3'=148, '38'=149,   '38.1'=150, '38.2'=151, '38.3'=152, '39'=153,   '39.1'=154, '39.2'=155, '39.3'=156, '40'=157,   '40.1'=158, '40.2'=159, '40.3'=160, '41'=161,   '41.1'=162, '41.2'=163, '41.3'=164, '42'=165,   '42.1'=166, '42.2'=167, '42.3'=168, '43'=169,   '43.1'=170, '43.2'=171, '43.3'=172, '44'=173,   '44.1'=174, '44.2'=175, '44.3'=176, '45'=177,   '45.1'=178, '45.2'=179, '45.3'=180, '46'=181,   '46.1'=182, '46.2'=183, '46.3'=184, '47'=185,   '47.1'=186, '47.2'=187, '47.3'=188, '48'=189,   '48.1'=190, '48.2'=191, '48.3'=192, '49'=193,   '49.1'=194, '49.2'=195, '49.3'=196, '50'=197,
    )
    nX<-ncol(input[,-c(1)])
    data_unl<-as.data.frame(matrix(data_unl,ncol = nX))
    data_def<-cbind(data_names,data_unl)
    colnames(data_def)<- names
    input<-data_def
    names = c(colnames(input))
    nX <- ncol(input)-1
    listofdf <- lapply(1:nX, function(x) NULL)
    for (i in 1:nX) {
        listofdf[[i]] <- data.frame(input[,1], input[i],input[i+1])
    }
    listofdf <- listofdf [-c(seq(1, 10000, by=2))]
    listofdf_subs <- lapply(1:length(listofdf), function(x) NULL)
    for (i in 1:length(listofdf)) {
        listofdf_subs[[i]] <- cbind(rep(x = 1:nrow(listofdf[[1]]),
                                        times = 2),c(listofdf[[i]][,2],listofdf[[i]][,3]))
    }
    required_vals <- rep(x = 1,
                         times = nrow(x = listofdf_subs[[1]]))
    required_sz <- c(nrow(x = input), 197)
    listofdf_accum<-lapply(1:length(listofdf), function(x) NULL)
    for (i in 1:length(listofdf)){ 
        listofdf_accum[[i]]<-pracma::accumarray(subs = listofdf_subs[[i]],
                                                val = required_vals,
                                                sz = required_sz)}
    data_def <- list()
    for(i in 1:length(listofdf)){
        data_def[[i]] <- listofdf_accum[[i]]
    }
    mydf <- data.frame(data_def)
    values <- rep_len(x = ((0:3) / 10),
                      length.out = 197) + rep(x = 1:50,
                                              each = 4,
                                              length.out = 197)
    variables_ID <- paste(rep(x = names_ok,
                              each = 197),
                          values,
                          sep = "_")
    colnames(mydf)<-variables_ID
    mydf<-mydf[,-(which(colSums(mydf) == 0))] 
    mydf<-cbind(data_names,mydf)
    mydf[mydf=="2"]<-1
    row.names(mydf) <- mydf[,1]
    mydf<-mydf[,-1]
    df1_sel2 <- cbind(Populations,mydf)
    df1_sel2[, colSums(df1_sel2 != 0) > 0]
    df1_sel2<-df1_sel2[sapply(df1_sel2, function(x) length(unique(na.omit(x)))) > 1]
    df1_sel2 <<- df1_sel2
}
data_handling_typeB_training <- function(input) {
    Populations<-as.factor(input[,2])
    input<-input[,-2]
    names = c(colnames(input))
    evenvals <- seq(2, ncol(input), by=2) 
    names_ok<-colnames(input[,evenvals])
    data_unl<-unlist(input[,-c(1)])
    data_names<-as.data.frame(input[,1])
    library(plyr); library(dplyr)
    data_unl=dplyr::recode(data_unl,AA=1,   AC=2,   AG=3,   AT=4,   CA=5,   CC=6,   CG=7,   CT=8,   GA=9,   GC=10,  GG=11,  GT=12,  TA=13,  TC=14,  TG=15,  TT=16,  AAA=17, AAC=18, AAG=19, AAT=20, ACA=21, ACC=22, ACG=23, ACT=24, AGA=25, AGC=26, AGG=27, AGT=28, ATA=29, ATC=30, ATG=31, ATT=32, CAA=33, CAC=34, CAG=35, CAT=36, CCA=37, CCC=38, CCG=39, CCT=40, CGA=41, CGC=42, CGG=43, CGT=44, CTA=45, CTC=46, CTG=47, CTT=48, GAA=49, GAC=50, GAG=51, GAT=52, GCA=53, GCC=54, GCG=55, GCT=56, GGA=57, GGC=58, GGG=59, GGT=60, GTA=61, GTC=62, GTG=63, GTT=64, TAA=65, TAC=66, TAG=67, TAT=68, TCA=69, TCC=70, TCG=71, TCT=72, TGA=73, TGC=74, TGG=75, TGT=76, TTA=77, TTC=78, TTG=79, TTT=80, AAAA=81,    AAAC=82,    AAAG=83,    AAAT=84,    AACA=85,    AACC=86,    AACG=87,    AACT=88,    AAGA=89,    AAGC=90,    AAGG=91,    AAGT=92,    AATA=93,    AATC=94,    AATG=95,    AATT=96,    ACAA=97,    ACAC=98,    ACAG=99,    ACAT=100,   ACCA=101,   ACCC=102,   ACCG=103,   ACCT=104,   ACGA=105,   ACGC=106,   ACGG=107,   ACGT=108,   ACTA=109,   ACTC=110,   ACTG=111,   ACTT=112,   AGAA=113,   AGAC=114,   AGAG=115,   AGAT=116,   AGCA=117,   AGCC=118,   AGCG=119,   AGCT=120,   AGGA=121,   AGGC=122,   AGGG=123,   AGGT=124,   AGTA=125,   AGTC=126,   AGTG=127,   AGTT=128,   ATAA=129,   ATAC=130,   ATAG=131,   ATAT=132,   ATCA=133,   ATCC=134,   ATCG=135,   ATCT=136,   ATGA=137,   ATGC=138,   ATGG=139,   ATGT=140,   ATTA=141,   ATTC=142,   ATTG=143,   ATTT=144,   CAAA=145,   CAAC=146,   CAAG=147,   CAAT=148,   CACA=149,   CACC=150,   CACG=151,   CACT=152,   CAGA=153,   CAGC=154,   CAGG=155,   CAGT=156,   CATA=157,   CATC=158,   CATG=159,   CATT=160,   CCAA=161,   CCAC=162,   CCAG=163,   CCAT=164,   CCCA=165,   CCCC=166,   CCCG=167,   CCCT=168,   CCGA=169,   CCGC=170,   CCGG=171,   CCGT=172,   CCTA=173,   CCTC=174,   CCTG=175,   CCTT=176,   CGAA=177,   CGAC=178,   CGAG=179,   CGAT=180,   CGCA=181,   CGCC=182,   CGCG=183,   CGCT=184,   CGGA=185,   CGGC=186,   CGGG=187,   CGGT=188,   CGTA=189,   CGTC=190,   CGTG=191,   CGTT=192,   CTAA=193,   CTAC=194,   CTAG=195,   CTAT=196,   CTCA=197,   CTCC=198,   CTCG=199,
                           CTCT=200,    CTGA=201,   CTGC=202,   CTGG=203,   CTGT=204,   CTTA=205,   CTTC=206,   CTTG=207,   CTTT=208,   GAAA=209,   GAAC=210,   GAAG=211,   GAAT=212,   GACA=213,   GACC=214,   GACG=215,   GACT=216,   GAGA=217,   GAGC=218,   GAGG=219,   GAGT=220,   GATA=221,   GATC=222,   GATG=223,   GATT=224,   GCAA=225,   GCAC=226,   GCAG=227,   GCAT=228,   GCCA=229,   GCCC=230,   GCCG=231,   GCCT=232,   GCGA=233,   GCGC=234,   GCGG=235,   GCGT=236,   GCTA=237,   GCTC=238,   GCTG=239,   GCTT=240,   GGAA=241,   GGAC=242,   GGAG=243,   GGAT=244,   GGCA=245,   GGCC=246,   GGCG=247,   GGCT=248,   GGGA=249,   GGGC=250,   GGGG=251,   GGGT=252,   GGTA=253,   GGTC=254,   GGTG=255,   GGTT=256,   GTAA=257,   GTAC=258,   GTAG=259,   GTAT=260,   GTCA=261,   GTCC=262,   GTCG=263,   GTCT=264,   GTGA=265,   GTGC=266,   GTGG=267,   GTGT=268,   GTTA=269,   GTTC=270,   GTTG=271,   GTTT=272,   TAAA=273,   TAAC=274,   TAAG=275,   TAAT=276,   TACA=277,   TACC=278,   TACG=279,   TACT=280,   TAGA=281,   TAGC=282,   TAGG=283,   TAGT=284,   TATA=285,   TATC=286,   TATG=287,   TATT=288,   TCAA=289,   TCAC=290,   TCAG=291,   TCAT=292,   TCCA=293,   TCCC=294,   TCCG=295,   TCCT=296,   TCGA=297,   TCGC=298,   TCGG=299,   TCGT=300,   TCTA=301,   TCTC=302,   TCTG=303,   TCTT=304,   TGAA=305,   TGAC=306,   TGAG=307,   TGAT=308,   TGCA=309,   TGCC=310,   TGCG=311,   TGCT=312,   TGGA=313,   TGGC=314,   TGGG=315,   TGGT=316,   TGTA=317,   TGTC=318,   TGTG=319,   TGTT=320,   TTAA=321,   TTAC=322,   TTAG=323,   TTAT=324,   TTCA=325,   TTCC=326,   TTCG=327,   TTCT=328,   TTGA=329,   TTGC=330,   TTGG=331,   TTGT=332,   TTTA=333,   TTTC=334,   TTTG=335,   TTTT=336
                           )
    nX<-ncol(input[,-c(1)])
    data_unl<-as.data.frame(matrix(data_unl,ncol = nX))
    data_def<-cbind(data_names,data_unl)
    colnames(data_def)<- names
    input<-data_def
    names = c(colnames(input))
    nX <- ncol(input)-1
    listofdf <- lapply(1:nX, function(x) NULL)
    for (i in 1:nX) {
        listofdf[[i]] <- data.frame(input[,1], input[i],input[i+1])
    }
    listofdf <- listofdf [-c(seq(1, 10000, by=2))]

    listofdf_subs <- lapply(1:length(listofdf), function(x) NULL)
    for (i in 1:length(listofdf)) {
        listofdf_subs[[i]] <- cbind(rep(x = 1:nrow(listofdf[[1]]),
                                        times = 2),c(listofdf[[i]][,2],listofdf[[i]][,3]))
    }
    required_vals <- rep(x = 1,
                         times = nrow(x = listofdf_subs[[1]]))
    required_sz <- c(nrow(x = input), 1360)
    listofdf_accum<-lapply(1:length(listofdf), function(x) NULL)
    for (i in 1:length(listofdf)){ 
        listofdf_accum[[i]]<-pracma::accumarray(subs = listofdf_subs[[i]],
                                                val = required_vals,
                                                sz = required_sz)}
    data_def <- list()
    for(i in 1:length(listofdf)){
        data_def[[i]] <- listofdf_accum[[i]]
    }
    mydf <- data.frame(data_def)
    values_2_3 <- c('AA',   'AC',   'AG',   'AT',   'CA',   'CC',   'CG',   'CT',   'GA',   'GC',   'GG',   'GT',   'TA',   'TC',   'TG',   'TT',   'AAA',  'AAC',  'AAG',  'AAT',  'ACA',  'ACC',  'ACG',  'ACT',  'AGA',  'AGC',  'AGG',  'AGT',  'ATA',  'ATC',  'ATG',  'ATT',  'CAA',  'CAC',  'CAG',  'CAT',  'CCA',  'CCC',  'CCG',  'CCT',  'CGA',  'CGC',  'CGG',  'CGT',  'CTA',  'CTC',  'CTG',  'CTT',  'GAA',  'GAC',  'GAG',  'GAT',  'GCA',  'GCC',  'GCG',  'GCT',  'GGA',  'GGC',  'GGG',  'GGT',  'GTA',  'GTC',  'GTG',  'GTT',  'TAA',  'TAC',  'TAG',  'TAT',  'TCA',  'TCC',  'TCG',  'TCT',  'TGA',  'TGC',  'TGG',  'TGT',  'TTA',  'TTC',  'TTG',  'TTT')  
    values_4<-c('AAAA', 'AAAC', 'AAAG', 'AAAT', 'AACA', 'AACC', 'AACG', 'AACT', 'AAGA', 'AAGC', 'AAGG', 'AAGT', 'AATA', 'AATC', 'AATG', 'AATT', 'ACAA', 'ACAC', 'ACAG', 'ACAT', 'ACCA', 'ACCC', 'ACCG', 'ACCT', 'ACGA', 'ACGC', 'ACGG', 'ACGT', 'ACTA', 'ACTC', 'ACTG', 'ACTT', 'AGAA', 'AGAC', 'AGAG', 'AGAT', 'AGCA', 'AGCC', 'AGCG', 'AGCT', 'AGGA', 'AGGC', 'AGGG', 'AGGT', 'AGTA', 'AGTC', 'AGTG', 'AGTT', 'ATAA', 'ATAC', 'ATAG', 'ATAT', 'ATCA', 'ATCC', 'ATCG', 'ATCT', 'ATGA', 'ATGC', 'ATGG', 'ATGT', 'ATTA', 'ATTC', 'ATTG', 'ATTT', 'CAAA', 'CAAC', 'CAAG', 'CAAT', 'CACA', 'CACC', 'CACG', 'CACT', 'CAGA', 'CAGC', 'CAGG', 'CAGT', 'CATA', 'CATC', 'CATG', 'CATT', 'CCAA', 'CCAC', 'CCAG', 'CCAT', 'CCCA', 'CCCC', 'CCCG', 'CCCT', 'CCGA', 'CCGC', 'CCGG', 'CCGT', 'CCTA', 'CCTC', 'CCTG', 'CCTT', 'CGAA', 'CGAC', 'CGAG', 'CGAT', 'CGCA', 'CGCC', 'CGCG', 'CGCT', 'CGGA', 'CGGC', 'CGGG', 'CGGT', 'CGTA', 'CGTC', 'CGTG', 'CGTT', 'CTAA', 'CTAC', 'CTAG', 'CTAT', 'CTCA', 'CTCC', 'CTCG', 'CTCT', 'CTGA', 'CTGC', 'CTGG', 'CTGT', 'CTTA', 'CTTC', 'CTTG', 'CTTT', 'GAAA', 'GAAC', 'GAAG', 'GAAT', 'GACA', 'GACC', 'GACG', 'GACT', 'GAGA', 'GAGC', 'GAGG', 'GAGT', 'GATA', 'GATC', 'GATG', 'GATT', 'GCAA', 'GCAC', 'GCAG', 'GCAT', 'GCCA', 'GCCC', 'GCCG', 'GCCT', 'GCGA', 'GCGC', 'GCGG', 'GCGT', 'GCTA', 'GCTC', 'GCTG', 'GCTT', 'GGAA', 'GGAC', 'GGAG', 'GGAT', 'GGCA', 'GGCC', 'GGCG', 'GGCT', 'GGGA', 'GGGC', 'GGGG', 'GGGT', 'GGTA', 'GGTC', 'GGTG', 'GGTT', 'GTAA', 'GTAC', 'GTAG', 'GTAT', 'GTCA', 'GTCC', 'GTCG', 'GTCT', 'GTGA', 'GTGC', 'GTGG', 'GTGT', 'GTTA', 'GTTC', 'GTTG', 'GTTT', 'TAAA', 'TAAC', 'TAAG', 'TAAT', 'TACA', 'TACC', 'TACG', 'TACT', 'TAGA', 'TAGC', 'TAGG', 'TAGT', 'TATA', 'TATC', 'TATG', 'TATT', 'TCAA', 'TCAC', 'TCAG', 'TCAT', 'TCCA', 'TCCC', 'TCCG', 'TCCT', 'TCGA', 'TCGC', 'TCGG', 'TCGT', 'TCTA', 'TCTC', 'TCTG', 'TCTT', 'TGAA', 'TGAC', 'TGAG', 'TGAT', 'TGCA', 'TGCC', 'TGCG', 'TGCT', 'TGGA', 'TGGC', 'TGGG', 'TGGT', 'TGTA', 'TGTC', 'TGTG', 'TGTT', 'TTAA', 'TTAC', 'TTAG', 'TTAT', 'TTCA', 'TTCC', 'TTCG', 'TTCT', 'TTGA', 'TTGC', 'TTGG', 'TTGT', 'TTTA', 'TTTC', 'TTTG', 'TTTT')
    values<-c(values_2_3,values_4)

    variables_ID <- paste(rep(x = names_ok,
                              each = 336),
                          values,
                          sep = "_")
    colnames(mydf)<-variables_ID
    mydf<-mydf[,-(which(colSums(mydf) == 0))] 
    mydf<-cbind(data_names,mydf)
    mydf[mydf=="2"]<-1
    row.names(mydf) <- mydf[,1]
    mydf<-mydf[,-1]
    df1_sel2 <- cbind(Populations,mydf)
    df1_sel2 <<- df1_sel2
}

Consequently, when I run the app, I'd like to switch between type A and type B conversion of the data by using a radioButton. The app I developed is, as follows:

Shiny ui and server

library(shiny)

    ui <- fluidPage(
            titlePanel("Converting different datasets"),
               sidebarLayout(
                sidebarPanel(
                    radioButtons("funct", "Distribution type:",
                                 c("Type A data" = "typeA",
                                   "Type B data" = "typeB")),
                    actionButton("conversion_button", HTML('<b>Convert Database</b>')) 
                ),
                mainPanel(
                    dataTableOutput("rendered_file2")
                )
            )
        )
        server <- function(input, output) {
            df1_sel2 <- reactive({ 
               funct <- switch(input$funct,
               typeA = data_handling_typeA_training, 
               typeB = data_handling_typeB_training, data_handling_typeA_training) 
               funct(input) })

            conversionButton <- eventReactive(input$conversion_button,{
                df1_sel2()
            })

            output$rendered_file2 <- DT::renderDataTable({
                conversionButton()
            })
        }

        shinyApp(ui = ui, server = server)

However, I'm obtaining the following error:

Warning: Error in [.reactivevalues: unused argument (2)

I'd be very grateful if anybody could help me with this matter!

↧

Error in strwidth("W", cex = tl.cex) : plot.new has not been called yet

February 13, 2020, 10:18 am

≫ Next: How to drop symbols from a list of titles?

≪ Previous: RadioButton in R Shiny, how to use it when evaluating different kind of data with different functions

I'm using corrplot package in R (for mac) and I have a error message

Error in strwidth("W", cex = tl.cex) : plot.new has not been called yet

Could anyone help me?

Some more details:

I am trying to use these commands for visualize Pearson correlation plot with circle + black number

corrplot(x.cor, order = "original", type = "upper", tl.pos = "tp") 

corrplot(x.cor, add = TRUE, type = "lower", method = "number", 
         order = "original", col = "black", diag = FALSE,
         tl.pos = "n", cl.pos = "n")

The first command is well performed but I have problem for second command with below error and white window (quartz 2) without plot:

Error in strwidth("W", cex = tl.cex) : plot.new has not been called yet

↧

How to drop symbols from a list of titles?

February 13, 2020, 10:20 am

≫ Next: add (png) image inline in an email using R / sendmailR

≪ Previous: Error in strwidth("W", cex = tl.cex) : plot.new has not been called yet

I have the following list of titles:

cols = c("to_test.namespace.(1,Inf]","to_test.namespace.(1,20]","from_test.namespace.(1,200]")

I need to drop ( and ].

If I apply this piece of code, it only works for to_test.namespace.(1,Inf], but does not do anything for the rest.

out <-  sub("\\.?\\((\\d+),\\s*([A-Za-z]+)\\]$", "_\\1_\\2", cols)

I need to get the following result:

cols = c("to_test.namespace.1_Inf","to_test.namespace.1_20","from_test.namespace.1_200")

or:

cols = c("to_test.namespace_1_Inf","to_test.namespace_1_20","from_test.namespace_1_200")

Both options are fine.

How can I do so?

↧

add (png) image inline in an email using R / sendmailR

February 13, 2020, 10:23 am

≫ Next: Include list of IRanges as column in a data.frame

≪ Previous: How to drop symbols from a list of titles?

I am attempting to send an HTML email with inline images. The text below is abridged, but works fine.

theString=paste0(
   '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
   Strict//EN""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
   <title>HTML demo</title>
   <style type="text/css">
   </style>
   </head>
   <body>
   <h1>Notes</h1>
   '</body>
   </html>'

msg = mime_part(theString)

## Override content type.
msg[["headers"]][["Content-Type"]] = "text/html"

sendmail(from = "mschindel@sunrun.com",
         to = c("mschindel@sunrun.com"),
         bcc=bccList,
         subject = theSubject,
         msg = msg,
         html = TRUE,
         smtp = list(
            host.name = "aspmx.l.google.com", 
            port = 25, 
            user.name = "***", 
            passwd = "***", 
            ssl = TRUE
         ),
         authenticate = TRUE,
         send = TRUE
         )

I'd like to be able to send a *.png image inline (so, perhaps after the "Notes" header in the HTML).

I have a set of *png files in my home directory, but can't figure out how to get the image attached or inline.

Thank you!

↧

Include list of IRanges as column in a data.frame

February 13, 2020, 10:24 am

≫ Next: AWS does not seem to be connecting with R

≪ Previous: add (png) image inline in an email using R / sendmailR

I have some data structured a bit like this:

x01 <- c("94633X94644Y95423X96130", "124240X124494Y124571X124714", "135654X135660Y136226X136786")

That I end up using later as an IRanges object through some steps that look like:

x02 <- sapply(x01,
              function(x) do.call(rbind,
                                  strsplit(strsplit(x,
                                                    split = "Y",
                                                    fixed = TRUE)[[1]],
                                           split = "X",
                                           fixed = TRUE)),
              simplify = FALSE,
              USE.NAMES = FALSE)

x03 <- sapply(x02,
              function(x) IRanges(start = as.integer(x[, 1L]),
                                  end = as.integer(x[, 2L])),
              simplify = FALSE,
              USE.NAMES = FALSE)

> x03
[[1]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]     94633     94644        12
  [2]     95423     96130       708

[[2]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]    124240    124494       255
  [2]    124571    124714       144

[[3]]
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]    135654    135660         7
  [2]    136226    136786       561

Now I would like to be able to store x03 as a column in a data.frame with some associated information with something simple like:

> x04 <- data.frame("col1" = 1:3,
                    "col2" = x01,
                    "col3" = x03)

This unsurprisingly tells me that I have a differing number of rows, however, I feel like i've seen JSON imports into R mimic the kind of structure I want, where a ragged list inhabits the column of a data.frame. Is this a possible operation?

↧

AWS does not seem to be connecting with R

February 13, 2020, 10:27 am

≫ Next: GGplot2: text printed by element_text is blurry with a halo

≪ Previous: Include list of IRanges as column in a data.frame

Could you please help me to figure out why I am failing to read files from AWS.

Packages installed: 'aws.s3', 'aws.ec2metadata', 'jsonlite', 'batchscr', 'remotes', 'data.table'

Code:

usr <- 'aws00' # the user account for using AWS service
keycache <- read.csv('(directory here)', header = TRUE, stringsAsFactors = FALSE) # the database of our credentials
ec2env(keycache,usr)
Error in Sys.setenv(AWS_ACCESS_KEY_ID = keycache$Access_key_ID[keycache$service ==  : 
  wrong length for argument

thanks a lot!

Best Regards, Sandra

↧

GGplot2: text printed by element_text is blurry with a halo

February 13, 2020, 10:29 am

≫ Next: find missing data from another list (between two dates)

≪ Previous: AWS does not seem to be connecting with R

I have a large dataframe consisting of fires that occurred during different years, I have successfully graphed these in a scatterplot using ggplot 2, however all of my text in the plot has a strange coloured halo around it. Here is an example of the text up close.

Here is an example of what the final product looks like from far away

I have seen this issue before when using geom_text() and have been able to resolve it by adding geom_text(..., check_overlap = TRUE). However, in this case I specify all of my text within a theme using element_text(), and cannot seem to find an answer as to why this strange colouring is occurring.

Here is a sample of my data:

df1 <- data.frame(REP_DATE = c("1988-05-02", "2015-04-18", "1981-06-19", "2009-06-18"),
                      YEAR = c("1988", "2015", "1981", "2009"),
                      CAUSE = c("L", "H", "L", "H"),
                      CALC_HA=c("2350.18324","2350.18324", "1825.316641", "2296.424534"))

As well as my original code to create the plot:

#load your packages
library(ggplot2)
library(RColorBrewer)
library(dplyr)
library(anytime)
library(PCICt)
library(lubridate)
library(forcats)
library(ggExtra)

# first we need to make it a PCICt object
#set Rep_date as a date
data$REP_DATE <- as.Date(data$REP_DATE, "%Y-%m-%d")
data$YDAY <- anydate(yday(data$REP_DATE))
class(data$YDAY)

#subset just the data with H and L
data_H_L <- data[data$CAUSE=="H" | data$CAUSE=="L",]

p<-ggplot(data_H_L, aes(x = YDAY,                                            #set x and Y Axis
                        y = YEAR, colour=CAUSE, size=CALC_HA)) +             #set the colour and size 
  scale_y_reverse()+                        # reverse Y scale to get the years at the top to bottom order
  guides(size=FALSE)+                                                        #remove size legend 
  geom_point(alpha=0.2) +                                                    #set the transparency 
  #scale_color_brewer(type="qual", palette="Dark2")+                         #change the colours here, Dark2 looks good
  scale_colour_manual(values=c("#0195D6", "#E66407"),
                      name  ="Fire Cause",
                      breaks=c("H", "L"),
                      labels=c("Human", "Lightning"))+                       #specify colours using unique colour combo
  scale_x_date(date_breaks = "1 month",date_labels="%b")+  
  #scale_colour_discrete(values=c("#F93800", "#FFB500"),
  removeGrid(x=TRUE, y=FALSE)+                                               # remove x grid
  theme(plot.background = element_rect(fill = "black"),                      #set plot background colour
        panel.background=element_rect(fill="black"))+                        #set panel colour
  scale_size_continuous(range=c(1.75,10))+                                   #set scale for size of circles                                     
  guides(colour = guide_legend(override.aes = list(size=4, alpha=1)))        #set size of circles in legend

pp<-p+ theme(axis.text.x = element_text(face="bold", colour="white", size=10, family="sans"),  #set X axis text
             axis.text.y = element_text(face="bold", colour="white", size=10, family="sans"), #set y axis text
             axis.ticks = element_blank(),
             panel.grid.major.y = element_line(linetype="solid",size=1),#set y axis line width
             plot.title = element_text(color="white", size=14, face="bold", hjust=0.5, family="sans"),#set main title
             plot.subtitle = element_text(color="white", size=12, face="bold", hjust=0.5,family="sans"),#set subtitle 
             axis.title.x = element_blank(),                                       #set x axis title details
             axis.title.y = element_blank(),                                       #set y axis title details
             legend.text = element_text(colour="white", size=10, face="bold",family="sans"), #set legend axis title details
             legend.background = element_rect(fill="black", size=1, linetype="solid", colour= "white"), #set legend background
             legend.title = element_text(colour="white", size=12, face="bold",family="sans"),
             legend.title.align = 0.5,                                      #set legend title
             legend.key=element_blank())                              #set legend key to blank to get rid of background around dot
ggsave("test2.jpg", units="in", width=10, height=7, dpi=300)

Does anyone have any idea how to resolve this?

↧

find missing data from another list (between two dates)

February 13, 2020, 10:29 am

≫ Next: Unused arguments error in R studio

≪ Previous: GGplot2: text printed by element_text is blurry with a halo

I have two tricky situations for my data (all data frames are huge).

1) df looks like:

ISO3     CurrencyCode        OriginalPrice 
USA            USD                   2.5
USA            n/a                   3.6
CAN            CAD                   2.8
...

and df2 is a currency code list looks like

ISO3      CurrencyCode
USA              USD
CAN              CAD
JAP              JPY
...

I tried to find those missing currency codes in df by using df2.

I tried the following code but not working:

setDT(df)[df2, CurrencyCode:= CurrencyCode, on = .(ISO3)]

Another issue is similar but more complex.

df looks like:

ID     PayDate        CurrencyCode   ISO3
1      2016/05/01      EUR            FIN
2      2019/01/14      CAD            CAN
...
10000  2015/07/31      USD            USA
10001  2018/12/07      CAD            CAN

df2 looks like:

StartDate     EndDate      CurrencyCode   Rate      ISO3
2015/01/01    2015/03/05     CAD          0.75      CAN
2017/05/08    2017/12/31     JPY          0.0091    JAP
....

2019/07/01    2019/08/31     JPY          0.0093    JAP

I want to make df looks like:

ID     PayDate        CurrencyCode   Rate    ISO3
1      2016/05/01      EUR           1.06    FIN
2      2019/01/14      CAD           0.85    CAN
...
10000  2015/07/31      USD           1       USA
10001  2018/12/07      CAD           0.75    CAN

and the rate is decided by the PayDate. The PayDate should locate between the StartDate and EndDate in df2. If there's no enough information from df2, then the rate should be defined by the StartDate closet to the PayDate.

This is my code:

setDT(df)[df2, Rate:= Rate, .(date =seq(StartDate, EndDate, by = "day")), by = .(ISO3)]

and again.... I got error.

↧

Unused arguments error in R studio

February 13, 2020, 10:38 am

≫ Next: How to I overcome Dividing by Zero Error in R

≪ Previous: find missing data from another list (between two dates)

I get an error when I try to run this line of code:

nnetPred.model <- nnetPred(X, Y, step_size = 0.4,reg = 0.0002, h=50, niteration = 6000)

The error message is:

Error in nnetPred(X, Y, step_size = 0.4, reg = 2e-04, h = 50, niteration = 6000) : 
unused arguments (step_size = 0.4, reg = 2e-04, h = 50, niteration = 6000)

My code is as below:

nnetPred <- function(X, Y, para = list()){
  W <- para[[1]]
  b <- para[[2]]
  W2 <- para[[3]]
  b2 <- para[[4]]

  N <- nrow(X)
  hidden_layer <- pmax(0, X%*% W + matrix(rep(b,N), nrow = N, byrow = T)) 
  hidden_layer <- matrix(hidden_layer, nrow = N)
  scores <- hidden_layer%*%W2 + matrix(rep(b2,N), nrow = N, byrow = T) 
  predicted_class <- apply(scores, 1, which.max)

  return(predicted_class)  
}

nnetPred.model <- nnetPred(X, Y, step_size = 0.4,reg = 0.0002, h=50, niteration = 6000)

↧

How to I overcome Dividing by Zero Error in R

February 13, 2020, 10:41 am

≫ Next: Erase repeated lines in r with a conditional column

≪ Previous: Unused arguments error in R studio

Dividing by Zero Error encountered

Hi, Stack community! I came across this error working today. I am not sure exactly how to work around it. I read to use a na.rm in my mutate, and I tried it, but it didn't work. I could be completely wrong.


library("DBI")
library("dbplyr")
library("odbc")
library("dplyr")
library("stringr")
library("tidyverse")
library("lubridate")


  select(CustomerID, PostalCodeID, OrderID, ItemID, WrittenSales, WrittenUnits, TransCodeID, SalesType, ProductID, ProductName, GroupID, SubGroupID, CategoryID, TransDate, LocationID, LocationName) %>%

  filter(SalesType == "W",
         LocationID %in% Louisville) %>%

  group_by(CustomerID, PostalCodeID, WrittenSales, TransCodeID, SalesType, ProductID, ProductName, GroupID, SubGroupID, CategoryID, TransDate, LocationID, LocationName) %>%
  summarise(WrittenUnits_purchased = sum(WrittenUnits)) %>%
  ungroup() %>%

  group_by(CustomerID) %>%
  mutate(prop_of_total = WrittenUnits_purchased/sum(WrittenUnits_purchased)) %>%
  ungroup()```

↧

Erase repeated lines in r with a conditional column

February 13, 2020, 10:43 am

≫ Next: Issues after installing tidymodel and dependent packages in R

≪ Previous: How to I overcome Dividing by Zero Error in R

I have this data: https://pastebin.com/x8HrT8qK

"sp";"mes";"ano";"code""56";"CM";7;2016;"CM52""57";"CM";2;2019;"CM52""58";"CM";11;2016;"CM53""59";"CM";9;2019;"CM53""60";"CM";5;2018;"CM53""61";"CM";5;2018;"CM53""374";"EI";8;2019;"EI26""375";"EI";8;2019;"EI26""376";"EI";3;2019;"EI26""377";"EI";7;2019;"EI26""378";"EI";11;2019;"EI26""379";"EI";2;2020;"EI26""380";"EI";10;2019;"EI27""381";"EI";11;2019;"EI27""382";"EI";11;2019;"EI27"

and I would like to exclude lines that have the same "code" only if they have the same "ano"

So that the data would look like this: https://pastebin.com/F7tkUZE1

"sp";"mes";"ano";"code""56";"CM";7;2016;"CM52""57";"CM";2;2019;"CM52""58";"CM";11;2016;"CM53""59";"CM";9;2019;"CM53""60";"CM";5;2018;"CM53""374";"EI";8;2019;"EI26""379";"EI";2;2020;"EI26""380";"EI";10;2019;"EI27"

↧

Issues after installing tidymodel and dependent packages in R

February 13, 2020, 10:43 am

≫ Next: R - type detection / conversion

≪ Previous: Erase repeated lines in r with a conditional column

I was trying to use tidymodels and while installing this package it also installed certain dependent packages I would assume. Post that I have been getting this error whenever I run some dplyr based standalone code or my shiny app which was working fine till then. How can I fix this issue:

library(readr)
library(dplyr)
uscities <- read_csv("https://gist.githubusercontent.com/senthilthyagarajan/f2e7839a08a377c698d9235bb1bcc0bb/raw/3d4e40b422e5535389804d6e2390b674241ca045/uscities.csv")
#View(uscities)

uscities <- uscities %>% select(city,city_ascii,state_id,state_name,population,lat,lng)

Error: `...` is not empty.

We detected these problematic arguments:
* `logical`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.

So I did sessionInfo() and this is what I got :

> rm(list=ls())
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1    packrat_0.5.0 
> library(readr)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> uscities <- read_csv("https://gist.githubusercontent.com/senthilthyagarajan/f2e7839a08a377c698d9235bb1bcc0bb/raw/3d4e40b422e5535389804d6e2390b674241ca045/uscities.csv")
Parsed with column specification:
cols(
  city = col_character(),
  city_ascii = col_character(),
  state_id = col_character(),
  state_name = col_character(),
  county_fips = col_double(),
  county_name = col_character(),
  county_fips_all = col_character(),
  county_name_all = col_character(),
  lat = col_double(),
  lng = col_double(),
  population = col_double(),
  density = col_double(),
  source = col_character(),
  military = col_logical(),
  incorporated = col_logical(),
  timezone = col_character(),
  ranking = col_double(),
  zips = col_character(),
  id = col_double()
)
> #View(uscities)
> 
> uscities <- uscities %>% dplyr::select(city)
Error: `...` is not empty.

We detected these problematic arguments:
* `logical`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?

So I did try the rlang::last_trace() and got this

> rlang::last_trace()
<error/rlib_error_dots_nonempty>
`...` is not empty.

We detected these problematic arguments:
* `logical`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
Backtrace:
     █
  1. └─uscities %>% dplyr::select(city)
  2.   ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3.   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4.     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       └─`_fseq`(`_lhs`)
  6.         └─magrittr::freduce(value, `_function_list`)
  7.           ├─base::withVisible(function_list[[k]](value))
  8.           └─function_list[[k]](value)
  9.             ├─dplyr::select(., city)
 10.             └─dplyr:::select.data.frame(., city)
 11.               └─tidyselect::vars_select(tbl_vars(.data), !!!enquos(...))
 12.                 └─tidyselect:::eval_select_impl(...)
 13.                   ├─tidyselect:::with_subscript_errors(...)
 14.                   │ ├─base::tryCatch(...)
 15.                   │ │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
 16.                   │ │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
 17.                   │ │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
 18.                   │ └─tidyselect:::instrument_base_errors(expr)
 19.                   │   └─base::withCallingHandlers(...)
 20.                   └─tidyselect:::vars_select_eval(...)
 21.                     └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
 22.                       └─tidyselect:::eval_c(expr, data_mask, context_mask)
 23.                         └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 24.                           └─tidyselect:::walk_data_tree(init, data_mask, context_mask)
 25.                             └─tidyselect:::as_indices_sel_impl(...)
 26.                               └─tidyselect:::as_indices_impl(x, vars, strict = strict)
 27.                                 └─vctrs::vec_as_subscript(x, logical = "error")
 28.                                   └─ellipsis::check_dots_empty()
 29.                                     └─ellipsis:::action_dots(...)

↧

R - type detection / conversion

February 13, 2020, 10:44 am

≫ Next: Download attachment from an outlook email using R

≪ Previous: Issues after installing tidymodel and dependent packages in R

I have a big R data.table, and I am trying to reduce it's size. fwrite and fread allow for automatic type detection (and reducing size in the process). However it has some funky behavior (reading "0001" as a number and converting it to 1).

Are there some automated option for automatic type detection / conversion without losses ?

↧

Download attachment from an outlook email using R

February 13, 2020, 10:45 am

≫ Next: automation to merge data frames adding a line to keep note of the origin

≪ Previous: R - type detection / conversion

I receive an email every Sunday with an attachment (a zipped folder). The subject of the email never changes. I want to find the latest email with the specified subject line and download the attachment. I am new R user and so far I have only found a way to print the email body based on the subject (from one of the other questions asked on stackoverflow How to retrieve Outlook inbox emails using R RDCOMClient?). Ideally, I want to find the email with the specified subject received on a specified date and then download the attachment. Could some please point me in the right direction. Any help will be greatly appreciated. Thank you.

↧

automation to merge data frames adding a line to keep note of the origin

February 13, 2020, 10:45 am

≫ Next: R function to compare csv files and list what changed between rows with same column name

≪ Previous: Download attachment from an outlook email using R

I am a newbie with R. I have 6 different data frames (U, V, W, X, Y, Z), coming from different CSV files, each of them has the same columns (Surname, Name, Winter, Spring, Summer), and I would like to create a new data frame containing the 5 rows and a sixth row which indicates one of the letters (U, V, ...) where the original data comes from. I have tried with the following code:

U <- read.csv(file = "U", header = T)
V <- read.csv(file = "V", header = T)
W <- read.csv(file = "W", header = T)
X <- read.csv(file = "X", header = T)
Y <- read.csv(file = "Y", header = T)
Z <- read.csv(file = "Z", header = T)

U['class'] <- rep("U")
V['class'] <- rep("V")
W['class'] <- rep("W")
X['class'] <- rep("X")
Y['class'] <- rep("Y")
Z['class'] <- rep("Z")

students <- rbind(U, V, W, X, Y, Z)

I would really need to use a loop, so that I can in future go from A to Z. I would like to do something like this, which is totally nonsense.

for(class.name in list(U, V, W, X, Y, Z)){
  class.name['class'] <- rep('class')
}

Is there a reasonable way to do it?

Thank you

Edited

To clarify my question, the idea is that I have 6 different stations collecting raw data and giving me 6 different data frames. I want to merge them together, maintaining the information of from which station the raw data comes from.

Possible incomplete solution Following @MrFlick's advice, I have managed to put everything in one list as follows

classes <- c('U', 'V', 'W', 'X', 'W', 'Z')
my.files <- paste(classes,".csv",sep="")
year.eight <- lapply(my.files, read.csv, header = T)
name(year.eight) <- classes

However, the final outcome should be one single data frame with a further column to indicate which class are the students in. Can someone help me with this, please?

↧

R function to compare csv files and list what changed between rows with same column name

February 13, 2020, 10:51 am

≫ Next: Subset list of dataframes containing column in R

≪ Previous: automation to merge data frames adding a line to keep note of the origin

I am attempting to write a function in r that takes two fairly large csv files with the same number of columns and the same column names, and compares the rows. But I am having difficulty comparing the rows, because I'd like for the function to return which column the data is changing in. For example, my dataframe/csv may be:

csv1:

qty    name    description                 price
2     alpha    number of alpha to order    20
3     beta     number of beta to order     30
1     gamma    number of gamma to order    10
2     delta    number of delta to order    5

csv2:

qty    name    description                 price
3     alpha    number of alpha to order    20
3     beta     number of beta pieces       30
1     gamma    number of gamma to order    40
2     delta    number of delta to order    5
1     epsilon  number of epsilon to order  10

and I want the function to populate a table/other dataframe that has what changed from csv1 to csv2:

qty    name    description                 price    change
3     alpha    number of alpha to order    20       qty
3     beta     number of beta pieces       30       description
1     gamma    number of gamma to order    40       price
1     epsilon  number of epsilon to order  10       added

since alpha increased in quantity, beta's description changes, and gammas price changed, I'd like those to be listed, ignoring delta where nothing changed, and showing that epsilon is a newly added row, unique from the first csv. Is this possible in r? Thank you in advance!

My current code compares the two dataframes and outputs 3 csv files. one where all the columns matched between the dataframes, a second with unique rows to the first dataframe, and a third with only unique rows to the second dataframe. So I will be comparing the second two dataframes, since the first has already sorted out where all the columns are identical.

↧

Subset list of dataframes containing column in R

February 13, 2020, 10:53 am

≫ Next: gganimate transition between similar mosaic plots

≪ Previous: R function to compare csv files and list what changed between rows with same column name

I have a list of dataframes and I'd like to get column x from each dataframe as a string.

testing <- list(data.frame(A = "Yes", B = "No"),
                data.frame(B = "No", C = "No"),
                data.frame(A = "Yes"))

I can print which of the dataframes have a colname A in them, but I haven't been able to make the connection to subsetting the original testing

lapply(testing, function(x) "A" %in% colnames(x))

Desired Output

[[1]]
    A  B
1 Yes No

[[2]]
    A
1 Yes

↧

gganimate transition between similar mosaic plots

February 13, 2020, 10:55 am

≫ Next: "Error: C stack usage 7971488 is too close to the limit"

≪ Previous: Subset list of dataframes containing column in R

I would like to animate a transition between two mosaicplots with gganimate. The problem: these two graphics are created by different geom_mosaic commands and the effect cannot be create by different segments of one dataframe.

Cann this be done with gganimate or otherwise?

library(tidyverse)
library(ggmosaic)
library(patchwork)

df1 <- tibble(gender=c("male","male","female","female"),smoker=c("yes","no","yes","no"),
              count=c(1,2,1,1))

library(ggmosaic)
df1 %>% 
  ggplot() +
  geom_mosaic(aes(product(smoker,gender),weight=count,fill=smoker)) +
  labs(x="",y="") -> g1

df1 %>% 
  ggplot() +
  geom_mosaic(aes(product(gender,smoker),weight=count,fill=smoker)
              ,divider = c("hspine","vspine")) +
  labs(x="",y="") -> g2

I want the left chart to smoothly make a transition into the right chart

g1+g2

↧