mapping a keras model over a list of tibbles

February 14, 2020, 12:21 pm

I have the below code which uses the iris data set to train a number of Machine Learning models:

I want to make predictions for the keras model. The below code works and I am able to obtain predictions for all the models (except the keras model):

When I uncomment the else if keras part of the code, I get "errors", or the model produces:

[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"
[1] "skipping\n"

My question is where am I going wrong on the keras predict part? I want to modify this part of the code such that it will give me predicted classes:

  # else if(attr(x, "class")[1] == "keras_training_history"){
  #   # Keras Single Layer Neural Network
  #   tibble(
  #     modelname = attr(x, "class")[1],
  #     prediction = predict_classes(object = x, x = as.matrix(dat))
  #   )
  # }

EDIT 1:

My attempt at the debugging:

dat <- iris %>% 
  filter(Species != "setosa") %>% 
  mutate(Species = +(Species == "virginica"))

mod <- keras_model_sequential() %>% 
  layer_dense(units = 2, activation = 'relu', input_shape = 2) %>% 
  layer_dense(units = 2, activation = 'softmax')
mod
mod %>% compile(
    loss = 'binary_crossentropy',
    optimizer_sgd(lr = 0.01, momentum = 0.9),
    metrics = c('accuracy')
  ) 
mod
fit(mod, 
    x = as.matrix(dat[, 2:3]),
    y = to_categorical(dat$Species, 2),
    epochs = 5,
    batch_size = 5,
    validation_split = 0
  )

predict_classes(mod, as.matrix(dat[, 2:3]))

Gives me:

[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [44] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [87] 0 0 0 0 0 0 0 0 0 0 0 0 0 0

EDIT:

When I run the code in EDIT 1 and then call:

attr(mod, "class")

I get the following output:

[1] "keras.engine.sequential.Sequential"                        
[2] "keras.engine.training.Model"                               
[3] "keras.engine.network.Network"                              
[4] "keras.engine.base_layer.Layer"                             
[5] "tensorflow.python.module.module.Module"                    
[6] "tensorflow.python.training.tracking.tracking.AutoTrackable"
[7] "tensorflow.python.training.tracking.base.Trackable"        
[8] "python.builtin.object"

However when I run the models_list code and then run the following:

attr(models_list[[1]]$models$Model_Keras, "class")

I get:

[1] "keras_training_history"

So I am passing a different kind of object to predict (a training history rather than a model). Therefore I am starting to think that the code which builds the models stores the wrong object.
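The class vectors above suggest one way to guard the prediction branch: check what kind of object is stored before trying to predict with it. The sketch below uses mock objects (empty lists tagged with class names copied from the output above) instead of real keras objects, so it runs without keras. `describe_stored_object` is a hypothetical helper, not part of keras.

```r
# Mock stand-ins: empty lists carrying class names shown in the question.
# Real objects would come from keras_model_sequential() and fit().
mock_model <- structure(
  list(),
  class = c("keras.engine.sequential.Sequential",
            "keras.engine.training.Model",
            "python.builtin.object")
)
mock_history <- structure(list(), class = "keras_training_history")

# Hypothetical helper: report what kind of object was stored.
# inherits() checks the whole class vector, not only its first element.
describe_stored_object <- function(x) {
  if (inherits(x, "keras_training_history")) {
    "training history (the value returned by fit); nothing to predict with"
  } else if (inherits(x, "keras.engine.training.Model")) {
    "keras model; can be passed to a predict function"
  } else {
    "other model type"
  }
}

describe_stored_object(mock_history)
describe_stored_object(mock_model)
```

In keras for R, fit() trains the model in place and returns the training history. If models_list stores that return value, keeping the model object itself (mod in EDIT 1) would give the prediction branch something it can use. This is an inference from the class output, not a confirmed diagnosis.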

↧

Customize width of bar plot in likert plot

February 14, 2020, 12:21 pm

I'm using the likert package by jbryer and want to visualise the data with stacked bar plots. The size/width of these bar plots depends on how many bars are in the graph, i.e. with only one bar the bar is pretty wide, while they get thinner the more bars are plotted.

I'd like to set a custom, fixed bar width so that it stays the same no matter how many bars are plotted in the graph, i.e. the bars should be the same size in the plots of l29_5 and l29_2.

Likert bar plot with two bars

Likert bar plot with five bars

library(ggplot2)
library(dplyr)    # for %>% and select()
library(likert)
data(pisaitems)

items29_5 <- pisaitems[,substr(names(pisaitems), 1,5) == 'ST25Q']
colnames(items29_5) <- c("Magazines", "Comic books", "Fiction", 
                    "Non-fiction books", "Newspapers")

items29_2 <-  items29_5 %>% 
  select("Magazines", "Comic books")


l29_5 <- likert(items29_5)
l29_2 <- likert(items29_2)

plot(l29_5)
plot(l29_2)

↧

convert string to time-format in R language

February 14, 2020, 12:22 pm

I have strings like this:

How do I convert them to a time format?

01:00:00
02:00:00
...
23:00:00

Do I have to add a leading 0 to the strings? I have tried:

Data$Time <- formatC(Data$Time, digits = 6, flag = "0")

But it's not working.
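The input strings are not shown above, so the following is only a sketch under a stated assumption: that the values look like 10000, 20000, ..., 230000 (hours, minutes and seconds run together, with no leading zero). Under that assumption, zero-padding to six digits and then inserting colons gives the 01:00:00 style. The vector `times_raw` is made up for illustration.

```r
# Assumed input format (not confirmed by the question): HMMSS / HHMMSS numbers.
times_raw <- c(10000, 20000, 230000)

# Pad to six digits; "%06d" needs integers, hence as.integer().
padded <- sprintf("%06d", as.integer(times_raw))

# Option 1: insert the colons with a regular expression.
formatted <- sub("^([0-9]{2})([0-9]{2})([0-9]{2})$", "\\1:\\2:\\3", padded)

# Option 2: parse as a time and format it back, which also validates the values.
formatted_via_strptime <- format(strptime(padded, "%H%M%S"), "%H:%M:%S")

formatted
```

If the real data are already strings with colons but a missing leading zero (for example "1:00:00"), the padding step would need to change accordingly.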

↧

Passing list of variable names to custom function with mutate

February 14, 2020, 12:25 pm

I am trying to apply a function to each row and create a new column that depends on multiple columns, using the tidyverse. I was initially using rowwise(), but that was very slow. I want the list of columns passed to my custom function to be a variable, but I can't get it to work unless I explicitly list the variable names. For example, this works:

low_risk_codes <- c(0,1,10)
vars <- c("V1", "V2")
m <- matrix(1:9, ncol=3)
classify_low_risk_drug <- function(...){
  t <- cbind(...)
  return(apply(t, 1, function(x) ifelse(any(x %in% low_risk_codes), 1, 0)))
}

as.data.frame(m) %>%
  mutate(val4 = classify_low_risk_drug(V1, V2))

But it fails if I pass the columns through vars instead:

as.data.frame(m) %>% 
  mutate(val4 = classify_low_risk_drug(vars))

I can't get it to work even if I include !!. What am I missing?

Also any suggestions for how to do this with map instead are also appreciated!
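One way to see the problem: vars is a character vector of names, so the function receives the strings "V1" and "V2" rather than the columns. A minimal base R sketch of passing the columns themselves is shown below, using do.call to spread the selected columns over the function's `...` argument. The function body is a vectorised variant of the one above (rowSums instead of apply), which is my substitution, not the original code.

```r
low_risk_codes <- c(0, 1, 10)
vars <- c("V1", "V2")
df <- as.data.frame(matrix(1:9, ncol = 3))

# Vectorised variant: flag rows where any supplied column holds a low-risk code.
classify_low_risk_drug <- function(...) {
  cols <- cbind(...)
  # %in% drops the matrix shape, so rebuild it before summing across rows.
  hits <- matrix(cols %in% low_risk_codes, nrow = nrow(cols))
  as.integer(rowSums(hits) > 0)
}

# df[vars] selects the columns by name; do.call passes each one as an argument.
df$val4 <- do.call(classify_low_risk_drug, as.list(df[vars]))
df
```

Inside mutate, the usual dplyr/rlang idiom for the same idea is to splice symbols, along the lines of classify_low_risk_drug(!!!rlang::syms(vars)); that variant is not run here.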

↧

Remove all packages that do not come with R

February 14, 2020, 12:29 pm

How can I remove all installed packages except base and recommended?
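A minimal sketch of one approach, assuming that "base and recommended" means the packages whose Priority field in installed.packages() is "base" or "recommended". `packages_to_remove` is a hypothetical helper that only computes the list; the actual removal is left commented out so nothing is deleted by accident.

```r
# Hypothetical helper: names of installed packages that are neither
# base nor recommended, judged by the Priority column.
packages_to_remove <- function(ip = installed.packages()) {
  keep <- ip[, "Priority"] %in% c("base", "recommended")  # NA counts as FALSE
  unique(unname(ip[!keep, "Package"]))
}

# Review the list first, then remove if it looks right:
# to_remove <- packages_to_remove()
# print(to_remove)
# remove.packages(to_remove)
```

Packages installed in several libraries may need remove.packages to be called once per library via its lib argument.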

↧

overlay discrete and continuous layer in ggplot - surprised that layer order matters

February 14, 2020, 12:29 pm

Consider the following example dataset:

library(dplyr)
library(ggplot2)

d = mtcars %>% 
 as_tibble(rownames = "name") %>% 
 mutate(wt.cat = cut(wt, seq(1.5, 5.5, by = 1))) %>%
 group_by(wt.cat) %>%
 summarize(
   Mean = mean(mpg),
   Min = min(mpg),
   Max = max(mpg)
 )

Say I want to plot points for the "mean" value of each category in wt.cat and a ribbon showing the range. This works:

ggplot(d, aes(x = wt.cat)) + 
  geom_point(aes(y= Mean)) +
  geom_ribbon(aes(x = as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue")

But the points are masked by the ribbon. However, if I change the order of the layers so that the points are plotted on top of the ribbon, I get an error:

ggplot(d, aes(x = wt.cat)) + 
  geom_ribbon(aes(x = as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue") +
  geom_point(aes(y= Mean))
## Error: Discrete value supplied to continuous scale

So even though I'm specifying the discrete axis as the "default" aesthetic, it gets overridden by the specification of the first plotted layer. The only way I can find around this is to plot a dummy point layer first:

ggplot(d, aes(x = wt.cat)) + 
  geom_point(aes(y= Mean), shape = NA) +
  geom_ribbon(aes(x = as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue") +
  geom_point(aes(y= Mean))
## Warning message:
## Removed 4 rows containing missing values (geom_point).

Is there a more "proper" or correct way of combining discrete and continuous layers? Is there a solution that doesn't require creating a dummy layer?

↧

NA vs. computationally singular error in R multiple regression (glm)

February 14, 2020, 12:30 pm

Sometimes when I use glm to construct a multiple regression model, it returns a model with coefficients for all variables except one (or some small subset), which are listed as NA. Presumably, the NAs are due to these variables being collinear with some other variable, or being linear combinations of other variables.

On other occasions, I simply get an error:

Error in solve.default(hessian, gradient, tol = <some number << 1 >), 
system is computationally singular: 
reciprocal condition number = < some number <<1 >

Presumably this singularity error results from the same thing that gives me NA values for other models (collinearity).

My question is why R is able to generate a model by "isolating" the problem variables in some cases (as NAs) whereas in other cases the collinear variables just give a singularity error.

In those cases where R just returns a singularity error, is there a straightforward way of identifying which variables are the cause of the error, apart from tedious stepwise addition?
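As far as I know, glm fits through a pivoted QR decomposition, which can set linearly dependent columns aside and report them as NA, while an error from solve.default points to some routine inverting a matrix directly, with no such fallback. A minimal sketch of using the same QR idea on the design matrix to name the dependent columns is below. The data are simulated for illustration, with x3 built as an exact linear combination.

```r
set.seed(1)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- x1 + 2 * x2          # exact linear combination of x1 and x2
y  <- rnorm(20)

# Build the design matrix the model would use, then decompose it.
X <- model.matrix(~ x1 + x2 + x3)
qr_X <- qr(X)

# qr() moves columns it judges dependent to the end of the pivot order;
# everything past the rank is a candidate for removal.
dependent <- colnames(X)[qr_X$pivot[seq_len(ncol(X)) > qr_X$rank]]
dependent

# For comparison: glm reports the same column as NA.
fit <- glm(y ~ x1 + x2 + x3)
names(which(is.na(coef(fit))))
```

This only catches exact (or numerically near-exact) dependence at qr()'s default tolerance; strongly but not perfectly correlated variables would need a different check, such as inspecting the correlation matrix.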

↧

Saving the output of a str_which loop in R

February 14, 2020, 12:31 pm

I work with a sheet of data that lists a variety of scientific publications. Rows are publications, columns are a variety of metrics describing each publication (author name and position, Pubmed IDs, Date etc...) I want to filter for publications for each author and extract parts of them. The caveat is the format: all author names (5-80 per cell) are lumped together in one cell for each row.

I managed to solve this with the use of str_which, saving the coordinates for each author and later extract. This works only for manual use. When I try to automate this process using a loop to draw on a list of authors I fail to save the output.

I am at a bit of a loss on how to store the results without overwriting previous ones.

sampleDat <- 
  data.frame(var1 = c("Doe J, Maxwell M, Kim HE", "Cronauer R, Carst W, Theobald U", "Theobald U, Hey B, Joff S"),
             var2 = c(1:3),
             var3 = c("2016-01", "2016-03", "2017-05"))

List of names that I want the coordinates for:

namesOfInterest <-
  list(c("Doe J", "Theobald U"))

The manual extraction, requiring me to type the exact name and output object:

Doe <- str_which(sampleDat$var1, "Doe J")           
Theobald <- str_which(sampleDat$var1, "Theobald U")

One of many attempts that does not replicate the manual version:

results <- c()

for (i in namesOfInterest) {
  results[i] <- str_which(sampleDat$var1, i)
}

Many thanks in advance
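A minimal sketch of storing one result per author without overwriting: keep the names in a plain character vector and collect the matches in a named list, since each author can match a different number of rows. Base R's grep with fixed = TRUE is used here as a stand-in for str_which so the example runs without stringr; that swap is mine.

```r
sampleDat <- data.frame(
  var1 = c("Doe J, Maxwell M, Kim HE",
           "Cronauer R, Carst W, Theobald U",
           "Theobald U, Hey B, Joff S"),
  var2 = 1:3,
  var3 = c("2016-01", "2016-03", "2017-05")
)

# A character vector, not a list wrapping a vector.
namesOfInterest <- c("Doe J", "Theobald U")

# One list element per author; each holds the matching row numbers.
results <- lapply(namesOfInterest, function(author) {
  grep(author, sampleDat$var1, fixed = TRUE)
})
names(results) <- namesOfInterest

results

# Rows for a single author can then be pulled out by name.
sampleDat[results[["Theobald U"]], ]
```

A list is used because a plain vector slot, as in results[i] <- ..., can hold only one value, while an author may appear in several rows.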

↧

Collapse / concatenate / aggregate multiple columns to a single comma separated string within each group

February 14, 2020, 12:31 pm

This is an extension to post Collapse / concatenate / aggregate a column to a single comma separated string within each group

Goal: aggregate multiple columns according to one grouping variable and separate individual values by separator of choice.

Reproducible example:

data <- data.frame(A = c(rep(111, 3), rep(222, 3)), B = c(rep(c(100), 3), rep(200,3)), C = rep(c(1,2,NA),2), D = c(15:20), E = rep(c(1,NA,NA),2))
data
    A   B  C  D  E
1 111 100  1 15  1
2 111 100  2 16 NA
3 111 100 NA 17 NA
4 222 200  1 18  1
5 222 200  2 19 NA
6 222 200 NA 20 NA

A is the grouping variable but B is still displayed in overall result (B depends on A in my application) and C, D and E are the variables to be collapsed into separated character strings.

Desired Output

    A   B  C    D         E
1 111 100  1,2  15,16,17  1
2 222 200  1,2  18,19,20  1

I don't have a ton of experience with R. I did try to expand upon the solutions posted by G. Grothendieck to the linked post to meet my requirements but can't quite get it right for multiple columns.

What would be a proper implementation to get the desired output?

I focused specifically on group_by and summarise_all and aggregate in my attempts. They are a complete mess so I don't believe it would even be helpful to display.
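A minimal sketch with base R's aggregate, assuming NA values should simply be dropped before collapsing (as the desired output suggests). `collapse_values` is a hypothetical helper name; the separator is a parameter so it can be changed.

```r
data <- data.frame(A = c(rep(111, 3), rep(222, 3)),
                   B = c(rep(100, 3), rep(200, 3)),
                   C = rep(c(1, 2, NA), 2),
                   D = 15:20,
                   E = rep(c(1, NA, NA), 2))

# Hypothetical helper: drop NAs, then join the remaining values.
collapse_values <- function(x, sep = ",") {
  paste(x[!is.na(x)], collapse = sep)
}

# Group by A and B (B depends on A, so this adds no extra groups)
# and collapse every remaining column.
result <- aggregate(data[c("C", "D", "E")],
                    by = data[c("A", "B")],
                    FUN = collapse_values)
result
```

The data-frame method of aggregate is used rather than the formula method because the formula method's default na.action would drop whole rows containing NA before the helper sees them.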

↧

How to change the column names and make a data frame of columns in the dataset

February 14, 2020, 12:34 pm

I am building a function to rename 3 columns and make a new data frame containing only those 3 columns. The data frame is called noaaFilename; the old column names are Date, HrMn, and Slp, and the new names I want are Date, Time, and AtmosPressure.

  names(noaaFilename)[names(noaaFilename) == "Date"] <- "Date"
  names(noaaFilename)[names(noaaFilename) == "HrMn"] <- "Time"
  names(noaaFilename)[names(noaaFilename) == "Slp"] <- "AtmosPressure"

  noaaData <- subset(noaaFilename, select = c(Date, Time, AtmosPressure))
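A minimal sketch of the same steps wrapped in a function, selecting the three columns first and then renaming them in one assignment. `rename_noaa_columns` is a hypothetical name, and the small data frame is made up to show a call.

```r
# Hypothetical helper: keep Date, HrMn and Slp, and rename the last two.
rename_noaa_columns <- function(df) {
  out <- df[, c("Date", "HrMn", "Slp"), drop = FALSE]
  names(out) <- c("Date", "Time", "AtmosPressure")
  out
}

# Made-up example input with one extra column that should be dropped.
noaaFilename <- data.frame(Date = c("20200101", "20200102"),
                           HrMn = c("0000", "0600"),
                           Slp = c(1013.2, 1009.8),
                           Other = c("a", "b"))

noaaData <- rename_noaa_columns(noaaFilename)
noaaData
```

Selecting by the old names before renaming means the function stops with an error if one of the expected columns is missing, instead of silently returning fewer columns.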

↧

ggplot2 package in Jupyter notebook [closed]

February 14, 2020, 12:36 pm

The packages used to work fine, but now I get this message: installation of package ‘ggplot2’ had non-zero exit status.

I tried this: install.packages("ggplot2", repos='http://cran.us.r-project.org'), but it did not work.

↧

LARS "Lasso" regression not choosing significant explanatory variables

February 14, 2020, 12:42 pm

I am running lasso regression on a large data set (n=1918, p=85). The coefficients the regression identifies as important are very insignificant when actually put into a linear model. On the other hand, lasso assigns coefficients near 0 to explanatory variables that are very significant in a linear model, and does not select them. The data frame going into LARS is already scaled. Any ideas on why this might occur? Below is an example of what LARS might choose, and also a model I created with actually good explanatory variables using the exact same dataset.

UPDATE: I'm noticing that lasso is choosing all of my temperature variables and assigning them relatively high coefficients (>1), while all the rest of the variables fall between 0 and 1. Not sure why this is occurring.

signif.coefs <- function(lasso, threshold=1){
  coefs <- coef(lasso)
  signif <- which(abs(coefs[nrow(coefs),]) > threshold)
  return(setNames(coefs[nrow(coefs),signif], signif))
}
signif.coefs(lasso)
     4        45 
 4.855257 -3.020055

lm(response ~ SP.MTMEAN + YEAR, data=df, na.action=na.pass) ###Terrible Lasso Chosen Model
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.16710    0.07190  -2.324  0.02022 *  
SP.MTMEAN    0.09889    0.02313   4.275 2.01e-05 ***
YEAR         0.14097    0.04580   3.078  0.00211 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9903 on 1915 degrees of freedom
Multiple R-squared:  0.01678,   Adjusted R-squared:  0.01576 
F-statistic: 16.34 on 2 and 1915 DF,  p-value: 9.167e-08

###variables chosen by me with model output from same data frame as above
lm(response~log1p.PTL_RESULT+log1p.NTL_RESULT+log1p.PH_RESULT+log1p.EPI.T+SU.MPPT, data=df, na.action=na.pass) 
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.01200    0.01972   0.608  0.54301    
log1p.PTL_RESULT  0.20672    0.03104   6.660 3.58e-11 ***
log1p.NTL_RESULT  0.21219    0.03335   6.362 2.49e-10 ***
log1p.PH_RESULT   0.15543    0.02543   6.113 1.18e-09 ***
log1p.EPI.T       0.09869    0.02189   4.508 6.93e-06 ***
SU.MPPT          -0.06002    0.02135  -2.811  0.00499 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8596 on 1912 degrees of freedom
Multiple R-squared:  0.2603,    Adjusted R-squared:  0.2583 
F-statistic: 134.5 on 5 and 1912 DF,  p-value: < 2.2e-16

↧

XML Files in to R Dataframe (Extraction Problem)

February 14, 2020, 12:53 pm

I have a large set of XML files, each of which needs to be converted into a data frame. I tried many solutions, from simple ones such as xmlToDataFrame to more relevant ones such as the following (Parsing xml file in R) and the most complicated ones described here.

But I still get various errors (such as getting only variable names without values, nothing at all, or a single piece of information out of everything available). I have attached a shortened version of one of the files below for convenience.

I would appreciate it if someone could help.

Thank you

<bulletin:BulletinAP>

    <bulletin:Description>3 Description :<BR/><BR/>3.1 Situation actuelle :<BR/>Ciel très nuageux au nord de la Loire, arrivée d'une ligne pluvieuse sur l'Ile-de-France et l'Eure-et-Loir.<BR/><BR/>3.2 Evolution prévue :<BR/>- Cet après-midi : pluies traversant la région, de courte durée surtout à l'ouest de la région, puis suivies d'une alternance d'averses et d'éclairices.<BR/><BR/>- Demain vendredi : averses éparses.<BR/><BR/>- Samedi : rares averses.<BR/>

    </bulletin:Description>

</bulletin:BulletinAP>

<bulletin:AlerteAP dateDebut="néant" dateFin="néant"/>

<bulletin:DonneesAP  alerteEnCours="false" idRefGeo="701">

    <donnees:DonneeObservee dateDebut="20080611060000" dateFin="20080612055959">

        <donnees:Valeur idParametre="AVGRR" valeur="0"/>

        <donnees:Valeur idParametre="MAXRR" valeur=""/>

    </donnees:DonneeObservee>

    <donnees:DonneePrevue dateDebut="20080612060000" dateFin="20080613055959">

        <donnees:Valeur idParametre="AVGRR" valeur="3/10"/>

        <donnees:Valeur idParametre="MAXRR" valeur=""/>

    </donnees:DonneePrevue>

    <donnees:DonneePrevue dateDebut="20080613060000" dateFin="20080614055959">

        <donnees:Valeur idParametre="AVGRR" valeur="1/5"/>

        <donnees:Valeur idParametre="MAXRR" valeur=""/>

    </donnees:DonneePrevue>

    <donnees:DonneePrevue dateDebut="20080614060000" dateFin="20080615055959">

        <donnees:Valeur idParametre="AVGRR" valeur="Tr/3"/>

    </donnees:DonneePrevue>

</bulletin:DonneesAP><bulletin:DonneesAP  alerteEnCours="false" idRefGeo="702">

    <donnees:DonneeObservee dateDebut="20080611060000" dateFin="20080612055959">

        <donnees:Valeur idParametre="AVGRR" valeur="0"/>

        <donnees:Valeur idParametre="MAXRR" valeur=""/>

    </donnees:DonneeObservee>

    <donnees:DonneePrevue dateDebut="20080612060000" dateFin="20080613055959">

        <donnees:Valeur idParametre="AVGRR" valeur="3/10"/>

        <donnees:Valeur idParametre="MAXRR" valeur=""/>

    </donnees:DonneePrevue>

    <donnees:DonneePrevue dateDebut="20080613060000" dateFin="20080614055959">

        <donnees:Valeur idParametre="AVGRR" valeur="Tr/3"/>

        <donnees:Valeur idParametre="MAXRR" valeur=""/>

    </donnees:DonneePrevue>

    <donnees:DonneePrevue dateDebut="20080614060000" dateFin="20080615055959">

        <donnees:Valeur idParametre="AVGRR" valeur="Tr/3"/>

    </donnees:DonneePrevue>

</bulletin:DonneesAP>

<bulletin:ZoneAP id="701" libelle="Boucles de Seine"/>

<bulletin:ZoneAP id="702" libelle="Oise aval"/>

<bulletin:ParametreAP id="MAXRR" libelle="Maximum de précipitations" unite="mm"/>

<bulletin:ParametreAP id="AVGRR" libelle="Moyenne de précipitations" unite="mm"/>

↧

Loop to read all excel files from sharepoint in R

February 14, 2020, 12:54 pm

I want to build a loop to read all excel files in a sharepoint folder. I am using the following code, but it shows that the path does not exist:

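library(readxl)

# full.names = TRUE returns complete paths; pattern is a regular expression, not a glob
filelist <- list.files(path='//XXXcom.sharepoint.com/sites/Shared Documents/General/Email', pattern="\\.xlsx$", full.names=TRUE)

DF <- lapply(filelist, function(i) {
  read_excel(i, sheet="sheet1")})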

When I apply this to an individual Excel file, the path works; however, when I apply it to the folder, it tells me that the path does not exist. Thank you for any assistance.
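A minimal sketch of separating the two concerns, so that a path problem is reported clearly before any file is read. `list_excel_files` is a hypothetical helper; whether R can see a SharePoint folder as a directory at all depends on how it is exposed on the machine (for example as a synced or mapped folder), which this sketch does not solve.

```r
# Hypothetical helper: fail early if the folder is not visible to R,
# otherwise return the full paths of the .xlsx files in it.
list_excel_files <- function(folder) {
  if (!dir.exists(folder)) {
    stop("Folder not visible to R: ", folder)
  }
  list.files(folder, pattern = "\\.xlsx$", full.names = TRUE,
             ignore.case = TRUE)
}

# Usage with a local or synced folder (path is a placeholder):
# files <- list_excel_files("path/to/synced/folder")
# DF <- lapply(files, readxl::read_excel, sheet = "sheet1")
```

Returning full paths matters because file names alone are resolved against the working directory, not against the folder that was listed.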

↧

How to plot geographic points on a map with different colours and legend?

February 14, 2020, 12:55 pm

So I currently have a map (it's quite ugly, see attached)

Map and points

I found a way to plot them on a top down view, but I would like to attach different colours and have it as a legend (and maybe zoom in), instead of the lettering on top of the points as shown in the first picture. Can anyone help me with this? Here's the ideal picture, but with a legend and different colours:

The ideal picture, but with a legend and different colours

#libraries

  library(readr)
  library(sp)
  library(rgdal)
  library(raster)
  library(GISTools)
  library(sf)

#col_coor points to be plotted
 SUB        POP      LON   LAT
   <chr>      <chr>  <dbl> <dbl>
 1 mandtii    AK    -156.   71.2
 2 ultimus    NU     -82.5  65.9
 3 ultimus    GR     -70.2  76.5
 4 arcticus   LB     -61.7  56.6
 5 arcticus   NF     -53.6  47.3
 6 arcticus   ST     -69.7  47.8
 7 arcticus   NS     -61.5  45.1
 8 arcticus   NB     -66.8  44.6
 9 arcticus   ME     -68.2  44.2
10 islandicus IC     -22.9  65.4
11 grylle     FI      19.3  60.2

# Convert Lat & Lon data into a SPDF            
col_loc <- sp::SpatialPointsDataFrame(col_coor[,3:4], col_coor)

# Assign a coordinate reference system 
crs(col_loc) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

 radius <- 75000
  ColBuff<-raster::buffer(col_loc, width=(radius), filename='1000', doEdge=FALSE)


# Plot maximum flight radius polygons on base raster with continent boundaries
  data("wrld_simpl", package = "maptools")                                                                            
  world_map <- crop(wrld_simpl, extent(-180, 180, 35, 90))                                                                   
  plot(world_map, col="grey") 
  plot(ColBuff, pch=20, col="red",add=TRUE)

# Convert WGS84 to Arctic polar stereographic projection (STERE)  
  proj <- "+proj=stere +lat_0=90 +lat_ts=70 +lon_0=-45 +k=1 +x_0=0 +y_0=0 +a=6378273 +b=6356889.449 +units=m +no_defs"
  wm_stere <- spTransform(world_map, CRSobj = CRS(proj))
  plot(wm_stere, col="grey")
  cb_stere <- spTransform(ColBuff, CRSobj = CRS(proj))
  plot(cb_stere, pch=20, col="red",add=TRUE)
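A minimal sketch of the colour-and-legend part only, using base graphics on the raw longitude and latitude values so it runs without the spatial packages. It assumes the points should be coloured by the SUB column; `plot_colonies` is a hypothetical helper, and the data frame is retyped from the table above.

```r
col_coor <- data.frame(
  SUB = c("mandtii", "ultimus", "ultimus", "arcticus", "arcticus", "arcticus",
          "arcticus", "arcticus", "arcticus", "islandicus", "grylle"),
  POP = c("AK", "NU", "GR", "LB", "NF", "ST", "NS", "NB", "ME", "IC", "FI"),
  LON = c(-156, -82.5, -70.2, -61.7, -53.6, -69.7, -61.5, -66.8, -68.2, -22.9, 19.3),
  LAT = c(71.2, 65.9, 76.5, 56.6, 47.3, 47.8, 45.1, 44.6, 44.2, 65.4, 60.2)
)

# One colour per SUB value, stored as a named vector for lookup.
sub_levels <- unique(col_coor$SUB)
palette_by_sub <- setNames(rainbow(length(sub_levels)), sub_levels)

# One colour per row, in the same order as the rows of col_coor.
point_cols <- unname(palette_by_sub[col_coor$SUB])

# Hypothetical helper: scatter the points and add a legend keyed by SUB.
plot_colonies <- function(dat, cols, pal) {
  plot(dat$LON, dat$LAT, pch = 20, col = cols,
       xlab = "Longitude", ylab = "Latitude")
  legend("topright", legend = names(pal), col = pal, pch = 20, title = "SUB")
}

# plot_colonies(col_coor, point_cols, palette_by_sub)
```

The same point_cols vector could in principle be passed as col in the existing plot calls, and legend() added afterwards, provided the plotted features are still in the same order as the rows of col_coor. That ordering is an assumption that would need checking after the buffer step.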

↧

How do I convert to transaction? [duplicate]

February 14, 2020, 1:00 pm

I am trying to tidy the following dataset (linked below) in R and then run association rules on it.

https://www.kaggle.com/fanatiks/shopping-cart

install.packages("dplyr")
library(dplyr)

df <- read.csv("Groceries (2).csv", header = F, stringsAsFactors = F, na.strings=c("","","NA"))
install.packages("stringr")
library(stringr)
temp1<- (str_extract(df$V1, "[a-z]+"))
temp2<- (str_extract(df$V1, "[^a-z]+"))
df<- cbind(temp1,df)
df[2] <- NULL
df[35] <- NULL
View(df)

summary(df)
str(df)

library(arules)  # provides the "transactions" class
trans <- as(df,"transactions")

I get the following warning when I run the above trans <- as(df,"transactions") code:

Warning message: Column(s) 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 not logical or factor. Applying default discretization (see '? discretizeDF').

↧

R Group by with conditional and sum other columns using data.table

February 14, 2020, 1:07 pm

I want to sum all columns except one specific column based on the condition by groups.

For example:

Col1    Col2   Condition   Name    P1    P2    P3    P4 
1990    1      0           APPLE   10    20    20    30   
1990    1      1           BAN     30    40    50    50   
1990    1      1           CAR     40    40    30    40   
1990    2      0           DOG     100   20    30    40   
1990    2      1           APPLE   10    20    20    30   
1990    2      1           APPLE   50    20    20    30

I want to SUM P2, P3 and P4, and then APPEND P1 from the row where Condition equals 0, by Col1 and Col2.

So the result will be:

Col1    Col2   Condition     P1    P2     P3    P4 
1990    1      0             10    100    100   120     
1990    2      0             100   60     70    100

I know how to add in data.table but have no idea with this.

DT[, lapply(.SD, sum, na.rm=TRUE), by=.(Col1, Col2), .SDcols=c("P2", "P3", "P4")]

It seems that DT[, setdiff(names(DT), c("P2", "P3", "P4")), with = FALSE] is a key but still have no idea.
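The logic can be split into two steps: sum P2 to P4 per group, then attach P1 from the row where Condition is 0. The sketch below does this in base R (aggregate plus merge) rather than data.table, so that it runs without extra packages; it assumes each Col1/Col2 group has exactly one row with Condition equal to 0, as in the example.

```r
DT <- data.frame(
  Col1 = rep(1990, 6),
  Col2 = c(1, 1, 1, 2, 2, 2),
  Condition = c(0, 1, 1, 0, 1, 1),
  Name = c("APPLE", "BAN", "CAR", "DOG", "APPLE", "APPLE"),
  P1 = c(10, 30, 40, 100, 10, 50),
  P2 = c(20, 40, 40, 20, 20, 20),
  P3 = c(20, 50, 30, 30, 20, 20),
  P4 = c(30, 50, 40, 40, 30, 30)
)

# Step 1: group sums of P2, P3 and P4 over all rows of each group.
sums <- aggregate(cbind(P2, P3, P4) ~ Col1 + Col2, data = DT, FUN = sum)

# Step 2: the Condition == 0 row of each group supplies Condition and P1.
base_rows <- DT[DT$Condition == 0, c("Col1", "Col2", "Condition", "P1")]

# Step 3: join the two pieces on the grouping columns.
result <- merge(base_rows, sums, by = c("Col1", "Col2"))
result
```

In data.table the same two pieces can be combined inside a single grouped call, by returning the Condition == 0 value of P1 next to the lapply(.SD, sum) result; that form is not shown or tested here.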

↧

Error when trying to install TinyTeX using tinytex R package "cannot contact mirror.ctan.org, returning a backbone server!"

February 14, 2020, 1:11 pm

I am trying to use RMarkdown to knit my report to a pdf. I am on my work computer (without administrative privileges and behind a firewall), and chose TinyTeX as a LaTeX distribution because I was hoping it would still work without having to involve the IT department (at my place of work it takes ages to get elevated privileges). I had no problems installing the tinytex R package, but I run into errors when using the package to install TinyTeX. The code with errors is below:

>tinytex::install_tinytex()
trying URL 'http://mirror.ctan.org/systems/texlive/tlnet/install-tl.zip'
trying URL 'http://mirror.ctan.org/systems/texlive/tlnet/install-tl.zip'
Content type 'application/zip' length 22541272 bytes (21.5 MB)
downloaded 21.5 MB

trying URL 'https://yihui.org/gh/tinytex/tools/pkgs-custom.txt'
trying URL 'https://yihui.org/gh/tinytex/tools/pkgs-custom.txt'
Content type 'text/plain; charset=utf-8' length 551 bytes
downloaded 551 bytes

trying URL 'https://yihui.org/gh/tinytex/tools/tinytex.profile'
trying URL 'https://yihui.org/gh/tinytex/tools/tinytex.profile'
Content type 'text/plain; charset=utf-8' length 295 bytes
downloaded 295 bytes

Then at this point a dialog box pops up and states: "Next you may see two error dialog boxes about the missing luatex.dll, and an error message like "Use of uninitialized value in bitwise or (|)..." in the end. These messages can be ignored." When I click 'Okay', I don't see any other dialog boxes, but I get the following output:

Starting to install TinyTeX to C:\Users\*****\AppData\Roaming/TinyTeX. It will take a few minutes.
Automated TeX Live installation using profile: ../tinytex.profile
cannot contact mirror.ctan.org, returning a backbone server!
Loading http://www.ctan.org/tex-archive/systems/texlive/tlnet/tlpkg/texlive.tlpdb

C:\Users\*****\AppData\Local\Temp\RtmpgHbZh0\install-tl-20200214\install-tl: TLPDB::from_file could not initialize from: http://www.ctan.org/tex-archive/systems/texlive/tlnet/tlpkg/texlive.tlpdb
C:\Users\*****\AppData\Local\Temp\RtmpgHbZh0\install-tl-20200214\install-tl: Maybe the repository setting should be changed.
C:\Users\*****\AppData\Local\Temp\RtmpgHbZh0\install-tl-20200214\install-tl: More info: https://tug.org/texlive/acquire.html
TinyTeX installed to C:\Users\*****\AppData\Roaming/TinyTeX
Please quit and reopen your R session and IDE (if you are using one, such as RStudio or Emacs) and check if tinytex:::is_tinytex() is TRUE.
Warning messages:
1: In download.file(url, output, ..., method = method) :
  URL 'http://mirror.ctan.org/systems/texlive/tlnet/install-tl.zip': status was 'Couldn't connect to server'
2: In download.file(url, output, ..., method = method) :
  URL 'https://yihui.org/gh/tinytex/tools/pkgs-custom.txt': status was 'Couldn't connect to server'
3: In download.file(url, output, ..., method = method) :
  URL 'https://yihui.org/gh/tinytex/tools/tinytex.profile': status was 'Couldn't connect to server'
4: In file.remove("TinyTeX/install-tl.log") :
  cannot remove file 'TinyTeX/install-tl.log', reason 'No such file or directory'

After this, I try tinytex:::is_tinytex() and I get FALSE.

I am using R version 3.6.2, RStudio version 1.2.1335 and I'm on a Windows 10 x64.

I'm not familiar enough with R, RMarkdown or TinyTeX to understand what is going wrong and how to fix it. Perhaps it is because of no admin rights or the firewall...Any help is appreciated!

↧

Change margin spacing of AwesomeCheckbox from ShinyWidgets

February 14, 2020, 1:12 pm

I need to change the top and bottom spacing of an awesomeCheckbox() input item in a Shiny project, because I have many of them laid vertically and they have too much whitespace in between.

Using inspect element on my browser I can see that I need to change the element of class "checkboxbs checkbox-bs checkbox-bs-primary" from style="margin-top: 10px; margin-bottom: 10px;" to style="margin-top: 0px; margin-bottom: 0px;"

I've tried changing CSS using the following classes but to no avail:

tags$head(tags$style("
  .checkbox {display: inline-block; margin-top: 0px; margin-bottom: 0px; !important}
  .col-sm-4 {display: inline-block; margin-top: 0px; margin-bottom: 0px; !important}
  .checkboxbs {display: inline-block; margin-top: -20px; margin-bottom: 0px; !important}
  .checkbox-bs.input {margin-top: 0px; margin-bottom: 0px; !important}
  .checkbox-bs {margin-top: 0px; margin-bottom: 0px; top:0px; bottom:0px; !important}
  .checkbox-bs-primary {margin-top: 0px; margin-bottom: 0px; top:0; bottom:0px; !important}
  .checkboxbs.checkbox-bs.checkbox-bs-primary {margin-top: 0px; margin-bottom: 0px;}
  .form-group {margin-top: 0px; margin-bottom: 0px;}
  .shiny-input-container {margin-top: 0px; margin-bottom: 0px; !important}
  .form-group.shiny-input-container {margin-top: 0px; margin-bottom: 0px; !important}
  .shiny-bound-input {margin-top: 0px; margin-bottom: 0px; !important}
  ")),

↧

Propensity Score Matching using Match command to estimate ATT in a binary outcome variable

February 14, 2020, 1:12 pm

I want to use the Matching package and the Match command in R to use propensity score matching to estimate the ATT (Average Treatment Effect on the Treated) for a binary outcome variable or a count outcome variable (poisson). It appears that the Match command only allows for a continuous outcome variable. My code for the continuous variable is:

glm1 <- glm(Tr~age + educ + black + etc.) to estimate the propensity scores in a logit or probit model.

m1 <- Match(Y=Y, Tr=Tr, X=glm1$fitted, estimand="ATT", M=1, ties=TRUE, replace=TRUE) to estimate the ATT of the Treatment exposure on the Outcome Variable Y.

How do I estimate this for a binary or count outcome variable in R?

My analysis is further complicated by the fact that I want to estimate the difference in differences, not just the post outcome in the Treated minus the post outcome in the Control group. So, I want to estimate (Outcome (post - pre) in Treated) minus (Outcome (post - pre) in Control). When the outcome variable is continuous, I believe I can just subtract the mean outcomes for Treated and Control and use that as my new outcome variable with the Match procedure (Y <- cbind(YDIFF)). However, if I have a binary outcome variable or count outcome variable, how do I incorporate this to obtain the difference-in-differences estimate in my propensity score-matched sample?

↧