Channel: Active questions tagged r - Stack Overflow

Using mutate_at with prefix column names to compare values

I have a dataframe with old & new values. I need to update new values if something changed. I think I am really close, but I can't find the missing piece using tidyverse. With base R - using a for loop - it works, but I don't want to create new objects or overwrite the existing one.

data <- tribble(~id, ~firstname, ~lastname, ~old_firstname, ~old_lastname,
    1, NA, NA, "Peter", "Busch",
    2, NA, "Trochen-Pflaume", "Hans", "Trocken")


data%>%
mutate_at(vars(firstname, lastname), ~case_when(
is.na(.) & !is.na(str_c("old_",.)) ~ str_c("old_", .)),
!is.na(.) & . != str_c("old_",.) ~ .)

Basically, the only thing to check is whether the new value is empty; if so, the old value should be taken. More complex case_when conditions are planned later. But I fail to manipulate the column name within the mutate_at function.

What I want, but it depends on the case_when:

  tribble(~id, ~firstname, ~lastname, ~old_firstname, ~old_lastname,
            1, "Peter", "Busch", "Peter", "Busch",
            2,"Hans", "Trochen-Pflaume", "Hans", "Trocken")

Thx for the help!
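
One possible approach, sketched here in base R rather than with mutate_at: pair each new column with its old_ counterpart by name and fill in the missing values. The helper name fill_from_old and its prefix argument are illustrative assumptions, and the rule covers only the simple case described above (take the old value when the new one is NA).

```r
# Example data from the question, rebuilt as a plain data frame
data <- data.frame(
  id = c(1, 2),
  firstname = c(NA, NA),
  lastname = c(NA, "Trochen-Pflaume"),
  old_firstname = c("Peter", "Hans"),
  old_lastname = c("Busch", "Trocken"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each column in `cols`, use the value from the
# matching "<prefix><col>" column wherever the new value is NA
fill_from_old <- function(df, cols, prefix = "old_") {
  old_cols <- paste0(prefix, cols)
  df[cols] <- Map(
    function(new, old) ifelse(is.na(new), old, new),
    df[cols], df[old_cols]
  )
  df
}

result <- fill_from_old(data, c("firstname", "lastname"))
print(result)
```

The function returns a modified copy, so the original data stays untouched. More complex rules could replace the ifelse() call inside the helper.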


Add a shiny widget inside shiny dashboard header section

Is there a way to add a selectInput() inside the shinydashboardPlus header? I do not want the dropdown menu functionality that shinydashboardPlus offers. I want something like:

[image]

#app.r
library(shiny)
library(shinyWidgets)
library(shinydashboard)
library(shinydashboardPlus)

shinyApp(
  ui = dashboardPagePlus(
    header = dashboardHeaderPlus(
#selectInput("variable", "Variable:",
 #           c("Cylinders" = "cyl",
  #            "Transmission" = "am",
   #           "Gears" = "gear"))
    ),
    sidebar = dashboardSidebar(),
    body = dashboardBody(
    ),
    rightsidebar = rightSidebar(),
    title = "DashboardPage"
  ),
  server = function(input, output) { }
)

How to import files in R package rehh for haplotype homozygosity analysis

I would like to try the R package rehh to perform haplotype homozygosity analyses on population genetic data.

I have fairly simple data files I would like to input into rehh, but it's not working for me. My genotypes are already coded in the format 0=missing/ 1=ref/ 2=nonref. Simplified datasets similar to mine are coded below. Let's say I wanted to do cross population EHH between the two populations (in this simple example data, I have 2 populations, 5 samples each, with 10 SNPs on 3 chromosomes):

### Generating example data
## Creating example SNP map
V1 <- c("SNP_1", "SNP_2", "SNP_3", "SNP_4", "SNP_5", "SNP_6", "SNP_7", "SNP_8", "SNP_9", "SNP_10")
V2 <- c(1,1,1,1,1,2,2,2,3,3)
V3 <- c(15,28,30,40,47,9,17,22,4,11)
V4 <- c("T", "A", "T", "G", "G", "G", "A", "T", "G", "A")
V5 <- c("C", "T", "C", "A", "A", "A", "T", "A", "A", "C")
example_SNPmap <- data.frame(V1, V2, V3, V4, V5)

## Creating example genotypes
# Population 1
Sample_1a <- c(2,1,1,2,1,1,1,1,2,2)
Sample_2a <- c(1,1,2,1,1,2,2,1,1,2)
Sample_3a <- c(1,2,1,1,2,2,1,1,1,1)
Sample_4a <- c(2,2,1,1,1,2,2,2,1,2)
Sample_5a <- c(2,1,1,2,1,2,1,1,2,2)
example_geno_pop1 <- data.frame(Sample_1a, Sample_2a, Sample_3a, Sample_4a, Sample_5a)

# Population 2
Sample_1b <- c(2,2,1,1,2,2,1,1,2,2)
Sample_2b <- c(1,2,1,1,2,2,1,1,1,2)
Sample_3b <- c(1,1,1,1,2,2,1,1,1,2)
Sample_4b <- c(2,2,1,1,2,2,1,2,1,2)
Sample_5b <- c(1,2,1,1,2,2,1,1,2,2)
example_geno_pop2 <- data.frame(Sample_1b, Sample_2b, Sample_3b, Sample_4b, Sample_5b)

example_SNPmap
example_geno_pop1
example_geno_pop2

But then when I run data2haplohh to import the files I get an error:

hap <- data2haplohh(hap_file = example_geno_pop1,
                    map_file = example_SNPmap,
                    haplotype.in.columns = TRUE,
                    recode.allele = FALSE,
                    chr.name = 1)

Error:

Error in read.table(map_file, row.names = 1, colClasses = "character") : 
  'file' must be a character string or connection

Sorry if I'm missing something simple/ obvious. Any help much appreciated, thanks.
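
The error message suggests that data2haplohh passes map_file straight to read.table, so it seems to expect file paths rather than data frames. A minimal sketch of that idea follows, assuming plain whitespace-separated files without headers are acceptable. The exact layout expected for the haplotype file depends on the rehh version, so check its documentation. The temporary file names and the shortened example data are illustrative, and the data2haplohh call is left commented out because it is untested here.

```r
# Shortened stand-ins for example_SNPmap and example_geno_pop1
map <- data.frame(
  V1 = paste0("SNP_", 1:3),
  V2 = c(1, 1, 1),
  V3 = c(15, 28, 30),
  V4 = c("T", "A", "T"),
  V5 = c("C", "T", "C"),
  stringsAsFactors = FALSE
)
geno <- data.frame(Sample_1a = c(2, 1, 1), Sample_2a = c(1, 1, 2))

# Write both objects to disk (illustrative temporary paths)
map_path <- tempfile(fileext = ".inp")
hap_path <- tempfile(fileext = ".hap")
write.table(map, map_path, quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(geno, hap_path, quote = FALSE, row.names = FALSE, col.names = FALSE)

# The read.table call from the error message works once it gets a path
map_check <- read.table(map_path, row.names = 1, colClasses = "character")
print(map_check)

# Untested assumption: pass the paths instead of the data frames
# hap <- data2haplohh(hap_file = hap_path, map_file = map_path,
#                     haplotype.in.columns = TRUE, recode.allele = FALSE,
#                     chr.name = 1)
```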

Can I use Plotly's Custom Buttons to update `shared_yaxes` in a faceted plot?

I am using plotly and I have a figure with subplots. I'd like to include in it a button to toggle the property shared_yaxes. Is this possible?

Here is a reproducible example (in Python). Consider the official simple subplot example:

from plotly.subplots import make_subplots
import plotly.graph_objects as go
fig = make_subplots(rows=1, cols=2)
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]), row=1, col=1)
fig.add_trace(go.Scatter(x=[20, 30, 40], y=[50, 60, 70]), row=1, col=2)

This produces: [image]

Now, you can simply use the shared_yaxes argument of make_subplots to force the same y-scale on both plots.

from plotly.subplots import make_subplots
import plotly.graph_objects as go
fig = make_subplots(rows=1, cols=2, shared_yaxes=True)
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]), row=1, col=1)
fig.add_trace(go.Scatter(x=[20, 30, 40], y=[50, 60, 70]), row=1, col=2)

And then you get:

[image]

Now I want to include a custom control (ideally a checkbox) to toggle that property (I've done this using R and Shiny, but now I want this plotly-based solution).

I've tried with custom buttons, for example using this code below, but I cannot make it work.

fig.update_layout(updatemenus=[
    go.layout.Updatemenu(type="buttons",
                         direction="left",
                         buttons=list([
                             dict(args=[{
                                 "shared_yaxes": True
                             }],
                                  label="Shared axes",
                                  method="relayout"),
                             dict(args=[{
                                 "shared_yaxes": False
                             }],
                                  label="Independent axes",
                                  method="relayout")
                         ]),
                         xanchor="left",
                         yanchor="top"),
])

[image]

Any ideas how to make it work would be much appreciated.

Plumber API to accept images for API scoring

I am trying to write a plumber function that accepts images, scores them, and returns the scored result. There are two parts I am trying to figure out.

Part 1: writing the plumber function. This is all I came up with.

#* @post /imageAnalysistest
scoreFunction =  function(image){
  # ******* need help figuring out this part *******
  image_read(image) %>%
    image_write(paste0(path,"temp.jpg"))
  # ************************************************


  # preprocess image
  x = image_load(paste0(path,"temp.jpg"),target_size =  c(img_width, img_height)) %>% 
    array_reshape(c(-1,img_width, img_height, channels))


  # do predictions
  probs = model %>% predict(x, steps = as.integer(test_samples/batch_size), verbose = 1)

  return(probs)
}

I tried passing the request as below and it prints a bunch of characters. I guess it's converting the image to character?

#* @post /imageAnalysistest
scoreFunction =  function(req){
print(req$postBody)
}

Part 2: curl the data to send a file to score on the image.

curl \
  -F "image=@/home/550.jpg" \
    http://192.168.1.1:8085/imageAnalysistest

TLDR: I am trying to write a plumber function that accepts images to score with an image recognition model. I am OK with sending an image as raw, so I can convert it to jpg for scoring.

change the python code to R code (especially, numpy and seaborn ) [closed]

I would like to convert the Python code below to R code.

==============================================

import numpy as np
import tensorflow as tf  # the code below uses the TensorFlow 1.x API (tf.Session, tf.train)

xd = np.float32(np.random.rand(2,100))
yd=  np.dot([ 0.1, 0.2], xd) + 0.3

w = tf.Variable(tf.random_uniform([1,2],-1.0,1.0))
b = tf.Variable(2.5)
y = tf.matmul(w,xd) + b

loss =  tf.reduce_mean(tf.square(y-yd))
train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for k in range(0,201): 
    sess.run(train)
    if k % 20 == 0 :
        print (k, sess.run(w), sess.run(b))

==============================================

DNN for iris data

import seaborn as sns

import pandas as pd

import numpy as np 
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")
iris.info()
SP = iris['species'].unique()
SP
iris.head()
sns.pairplot(iris, hue="species", palette="husl") 
sns.set()
sns.palplot(sns.color_palette())

==============================================

plt.style.available

plt.style.use('ggplot')

sns.set(style="ticks", color_codes=True)

==============================================

Renaming different columns in a list of tibbles using purrr

I'm trying to rename several columns in a list of tibbles based on regex.

Let's take a look at the following example:

library(tidyverse)

df1 <- tibble(
  id_site = 1,
  country = rep(paste0("country", 1:2), each = 3, len = 5),
  species = rep(paste0("sp.", c(1, 3)), each = 3, len = 5),
  min = c(100, 900, 2200, 400, 1300)
)
df2 <- tibble(
  id_ref = 2,
  country = "country3",
  species = rep(paste0("sp.", 2:6), each = 1, len = 4),
  min_alt = c(2700, 400, 600, 1800)
)

I would like to rename id_site to id_ref in df1 and min_alt to min in df2.

I manage to rename one column at a time using the following code:

list(df1, df2) %>% 
  set_names("df1", "df2") %>%
  map(
    ~ .x %>% 
      rename_at(
        colnames(.) %>% str_which("alt$"),
        ~ .x %>% str_replace(".+", "min")
      ) %>% 
      rename_at(
        colnames(.) %>% str_which("site$"),
        ~ .x %>% str_replace(".+", "id_ref")
      )
  )

but I find it quite repetitive…

I'd like to know if it's possible to do this in one go within a single rename_x function.
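
One way this could be done in a single step, assuming dplyr 1.0.0 or later: give rename() a named lookup vector wrapped in any_of(), which skips old names that a given tibble does not have. The lookup vector and the one-row tibbles below are illustrative stand-ins for the data above.

```r
library(dplyr)

# One-row stand-ins for df1 and df2
df1 <- tibble(id_site = 1, country = "country1", species = "sp.1", min = 100)
df2 <- tibble(id_ref = 2, country = "country3", species = "sp.2", min_alt = 2700)

# new_name = "old_name"; any_of() ignores old names missing from a tibble
lookup <- c(id_ref = "id_site", min = "min_alt")

renamed <- lapply(
  list(df1 = df1, df2 = df2),
  function(df) rename(df, any_of(lookup))
)
print(lapply(renamed, names))
```

purrr::map() can be used in place of lapply() with the same function. This sketch matches exact column names rather than a regex.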

In regex, what's the difference between a lookbehind and \K? [duplicate]

I just learned about \K in regex, and was wondering if somebody could explain how it's different from a lookbehind.

For example, the following two gsubs are identical, as far as I can tell:

tst<-"This is the day. That is the day"
gsub("This is the \\Kday","week",tst, perl=TRUE)
gsub("(?<=This is the )day","week",tst,perl=TRUE)

Maybe I'm not thinking complex enough, but can anyone give an example where one makes sense but the other doesn't and vice versa?

One thing I could think of is that a lookbehind has to be of a fixed length (right? I get an error when I try to use .* in a lookbehind), but then why use a lookbehind instead of \K?
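
A small sketch of the fixed-length point raised above: with a variable-length prefix, the \K form still works, while the same prefix inside a lookbehind is rejected when the pattern is compiled. The variable names are illustrative.

```r
tst <- "This is the day. That is the day"

# \K after a variable-length prefix: the greedy .* runs to the last "day",
# and only the text after \K is replaced
with_k <- gsub("This .*\\Kday", "week", tst, perl = TRUE)
print(with_k)
# expected: "This is the day. That is the week"

# The same prefix in a lookbehind fails to compile, so gsub() errors
lookbehind_ok <- tryCatch(
  suppressWarnings({
    gsub("(?<=This .*)day", "week", tst, perl = TRUE)
    TRUE
  }),
  error = function(e) FALSE
)
print(lookbehind_ok)
```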


TOC summary in the header of each slide

There is a list of available templates for beamer_presentation in R Markdown.

Most of them include a kind of "navigation bar" at the top (or left/right) of each slide, like this:

[image: Example of slide header]

While I do understand how to create a TOC in R Markdown (by providing toc: true in the header), I cannot figure out how to add this navigation to each slide.

I also understand how to create a floating TOC for the HTML format in R Markdown (via toc_float: true, as described here) but still cannot figure out how to do it in the beamer format. Any hint will be appreciated.

How to incorporate the sufix of an output$sufix name into an input$sufix_rows_selected function in R shiny?

I am trying to take the suffix of an output$sufix name in R Shiny and incorporate it into the input$sufix_rows_selected function. The drilldown table comes up empty. Does anyone have an idea what I am doing wrong?

Function that I am trying to build:

f.drilldata <- function(base.summary, base.drilldown,  sufix.output, group_var){ 

group = enquo(group_var)
base.summary = base.summary %>% mutate(var = !!group)
base.drilldown = base.drilldown %>% mutate(var = !!group)

#input = expr(!!glue("input${sufix.output}_rows_selected"))
input = paste0(sufix.output,'_rows_selected')

validate(need(length(input[[input]]) > 0, ''))
selected_rows <- base.summary[as.integer(input[[input]]), ]$var

base.drilldown[base.drilldown$var %in% selected_rows, ]
}

Error example:

library("dplyr")
library("shiny")
library("DT")

tbl.summary <- group_by(iris, Species) %>% summarise(Count = n())
tbl.drilldown <- iris

ui <- fluidPage(
DTOutput("output.summary.name")
, DTOutput("output.drilldown.name"))

server <- function(input, output){

# display the data that is available to be drilled down
output$output.summary.name <- renderDT(tbl.summary)

# subset the records to the row that was clicked through f.drilldata function
drilldata <- reactive({ f.drilldata(tbl.summary, tbl.drilldown, 'output.summary.name', Species)  })

# display the subsetted data
output$output.drilldown.name <- renderDT(drilldata())}

shinyApp(ui, server)

Example that works, but outside the f.drilldata function:

library("dplyr")
library("shiny")
library("DT")

tbl.summary <- group_by(iris, Species) %>% summarise(Count = n())
tbl.drilldown <- iris

ui <- fluidPage(
DTOutput("output.summary.name")
, DTOutput("output.drilldown.name"))


server <- function(input, output){

output$output.summary.name <- renderDT(tbl.summary)

drilldata <- reactive({ validate( need(length(input$output.summary.name_rows_selected) > 0, "Select rows to drill down!")) 
selected_species <- 
tbl.summary[as.integer(input$output.summary.name_rows_selected), ]$Species
tbl.drilldown[tbl.drilldown$Species %in% selected_species, ]  })

output$output.drilldown.name <- renderDT(drilldata())}

shinyApp(ui, server)
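
One thing that stands out in f.drilldata: input is never passed to the function, and the local variable named input is overwritten with a string, so input[[input]] indexes that string rather than the Shiny input object. A minimal sketch of an alternative follows, assuming the helper receives input as an argument and the grouping column as a string. The name drill_rows and its arguments are hypothetical, and a plain list stands in for the Shiny input so the sketch runs without a session.

```r
# Hypothetical helper: look up "<output_id>_rows_selected" in `input`
# and return the matching drilldown rows
drill_rows <- function(input, base_summary, base_drilldown, output_id, group_col) {
  selected <- input[[paste0(output_id, "_rows_selected")]]
  if (length(selected) == 0) {
    # In a real app, validate(need(...)) could be used here instead
    return(base_drilldown[0, , drop = FALSE])
  }
  keys <- base_summary[[group_col]][as.integer(selected)]
  base_drilldown[base_drilldown[[group_col]] %in% keys, , drop = FALSE]
}

# Summary table with one row per species (columns Species and Freq)
tbl_summary <- as.data.frame(table(Species = iris$Species))

# A plain list standing in for the Shiny `input` object
fake_input <- list(output.summary.name_rows_selected = 2)

drill <- drill_rows(fake_input, tbl_summary, iris, "output.summary.name", "Species")
print(table(drill$Species))
```

Inside the server function the call would sit in a reactive and receive the real input object, for example reactive(drill_rows(input, tbl.summary, tbl.drilldown, "output.summary.name", "Species")).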

Force ggsave to vectorize point geoms in .wmf-files

Plots produced with R are not usable for publication if they cannot be exported properly. I work on a Windows machine and use MS Word 2016 for all writing purposes. So, I wish to export my plots as .wmf files (.emf would also do, I suppose).

I produce all graphs with ggplot2, so ggsave (device = "wmf") seems a good choice, I suppose. However, I have a major problem with the resulting files: point geoms seem to be printed as raster instead of vector format. Here is an example for producing a simple scatterplot:

library (ggplot2)    

plot_data <- data.frame (a = runif (1:20), 
                         b = seq (1:20))

x11 (width =  3, height = 3)

ggplot (data = plot_data, mapping = aes (x = a, y = b)) +
    geom_point () +
    labs (x = "my x-label", y = "my y-label") +
    theme (panel.background = element_blank(),
           panel.border = element_rect (fill = NA, size = 0.7),
           axis.ticks = element_line (color = "black", lineend = "round"),
           axis.ticks.length = unit (2, "mm"),
           axis.text = element_text (color = "black"),
           plot.margin = unit(rep (0, 4), "cm")
           )

I save the plot with the following code:

ggsave(filename = "my_file.wmf", device = "wmf")

When I open the plot in MS Word or Libre Office, I see that the points are not rendered in good quality at all. In Libre Office Draw, a point looks like this (zoomed in quite a lot):

[image]

In MS Word, the plot looks like this:

[image]

with these "points":

[image]

The labels and axes, however, are ok. MS Word:

[image]

Libre Office Draw:

[image]

I suppose that the labels, tick annotations and axes (and even circles around the points) are stored in vector format, whereas the point geoms seem to be stored as rasters. The resulting plots are not useable, I fear. So, I want to find an option to force ggsave () to vectorize point geoms instead of printing raster. I hope very much someone can help - I urgently need a simple way to export plots from R for publication in order to convince my lab to rely more on R.

Web Scraping in R | Unable to extract information under a certain node using rvest

I'm trying to extract a bit of information under the node /html/head/script[16] from a website (here) but am unable to do so.

nykaa <- "https://www.nykaa.com/biotique-bio-kelp-protein-shampoo-for-falling-hair-intensive-hair-growth-treatment-conf/p/357142?categoryId=1292&productId=357142&ptype=product&skuId=39934"

obj <- read_html(nykaa)

extracted_json <- obj %>% 
  html_nodes(xpath = "/html/head/script[16]") %>% 
  html_text(trim = TRUE)

Currently, my output for the above code is null. But I would like to extract the data under the above mentioned node in an organized manner.

How can I impose the ntree parameter on the train() function of the caret package?

I am using the following function to do cross-validation with the random forest algorithm on my dataset. However, ntree raises an error saying that it is not used in the function. Even though I have seen this usage recommended in a comment on one of the threads about this issue, it did not work for me. Here is my code:

cv_rf_class1 <- train(y_train_u ~ ., x_train_u , 
                      method ="cforest", 
                      trControl = trainControl(method = "cv", 
                                               number = 10, 
                                               verboseIter = TRUE),  
                                               ntree = 100))

If I cannot change the ntree parameter, it uses 500 trees as default in the function and it raises another error for me (subscript out of bounds), so I cannot make it work for my problem. How can I fix this issue in order to make my function work?

R - how to repeat sequence 1 to 522

I have two files; the second one contains 522 project IDs that I want to extract from the first file.

The following code works for extracting them, but I don't want to manually repeat data$projectID==datid[x, ] | 522 times:

data1 = data[(data$projectID==datid[1, ] |data$projectID==datid[2, ]|  ...| data$projectID==datid[522, ]), ]

How can I repeat the instruction 522 times, incrementing the index by one each time?

Thanks!!

data = read.csv("~/Desktop/PACA SNA/PACA3.csv", header = TRUE, sep =";")

datid= read.csv("~/Desktop/PACA SNA/PACAID.csv", header = FALSE, sep =";")

data1 = data[(data$projectID==datid[1, ] | ...| data$projectID==datid[522, ]), ]
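
A chain of == comparisons joined by | can usually be written with %in%. A minimal sketch with toy data standing in for the two CSV files (the values are made up for illustration):

```r
# Toy stand-ins for the two files read with read.csv()
data <- data.frame(
  projectID = c(10, 11, 12, 13, 14),
  value = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)
datid <- data.frame(V1 = c(11, 14))

# Keep the rows whose projectID appears anywhere in the first column of datid
data1 <- data[data$projectID %in% datid[[1]], ]
print(data1)
```

With the real files, the same line should work unchanged as long as the IDs sit in the first column of datid.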

How to iteratively run a query on each column in a SQL table with R?

I have a table with multiple columns (colA, colB, colC) and I want to run a query against each of them and store the result so I can use them for comparison purposes later, for example this query to find the ratio of NULL and not NULL values in a column:

SELECT COUNT(*) - COUNT(column), COUNT(column) FROM table;

I have too many columns to do this manually, so I'm looking for a way to cycle through each column and store the result. Using a WHILE loop in T-SQL doesn't seem suited to this problem, and trying to use a for loop in R doesn't work at all:

tableDataColumnName <- names(tableDataDataframe)

for (i in tableDataColumnName){ nullColumnNumber <- dbGetQuery(con, "SELECT COUNT (*) - COUNT(i), COUNT(i) FROM dbo.table;") }

Is there a way to execute a query multiple times, once for each column in a table, without doing so manually?
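
In the loop above, the i inside the quoted SQL string is a literal character, not the loop variable, so every iteration sends the same query. A minimal sketch of building one query per column with sprintf() follows. The helper build_null_query and the result column aliases are illustrative assumptions, and the dbGetQuery call is commented out because it needs a live connection con.

```r
# Hypothetical helper: build the NULL / not-NULL count query for one column
build_null_query <- function(column, table = "dbo.table") {
  sprintf(
    "SELECT COUNT(*) - COUNT(%s) AS n_null, COUNT(%s) AS n_not_null FROM %s;",
    column, column, table
  )
}

columns <- c("colA", "colB", "colC")

# One query string per column, named by column
queries <- vapply(columns, build_null_query, character(1))
print(queries)

# With an open DBI connection `con`, the results could be collected like this:
# results <- lapply(queries, function(q) DBI::dbGetQuery(con, q))
```

Column names containing spaces or reserved words would need quoting before being pasted into the query.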


create multiple columns with each column as a sequence of numbers in R

Problem: I wanted to add three columns in my data frame with each column being a sequence of numbers. But I want each column to vary with the other column. So here's an example data frame:

data <- read.table(text="
group1  group2  rate
A     D     0.01     
A     D     0.001
A     D     0.0001  
B     D     0.01    
B     D     0.001      
B     D     0.0001
D     A     0.01     
D     A     0.001
D     A     0.0001  
D     B     0.01    
D     B     0.001      
D     B     0.0001",
                   header=TRUE)

So first I extended my data frame to accommodate the combinations of numbers that I want for the 3 columns. I used 125 because I have 5 numbers for each sequence.

dataext <- data[rep(seq_len(nrow(data)), 125), ]

Then, I created my new column using the sequence of number that I want:

dataext$var1 <- rep_len (seq(0,1, 0.25), length.out=125)
dataext$var2 <- rep_len (seq(0,1, 0.25), length.out=125)
dataext$var3 <- rep_len (seq(0,1, 0.25), length.out=125)

An example of my desired output is:

group1  group2  rate    var1    var2    var3
    A     D     0.01     0      0       0           
    A     D     0.001    0      0       0               
    A     D     0.0001   0      0       0
    A     D     0.01     0.25   0       0           
    A     D     0.001    0.25   0       0               
    A     D     0.0001   0.25   0       0
    A     D     0.01     0.25   0.25    0           
    A     D     0.001    0.25   0.25    0               
    A     D     0.0001   0.25   0.25    0
    A     D     0.01     0.25   0.25    0.25            
    A     D     0.001    0.25   0.25    0.25                
    A     D     0.0001   0.25   0.25    0.25

I hope this is clear enough. Any leads on how to do it right are greatly appreciated. Thanks!
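
If the goal is every combination of the three sequences for every original row (an assumption, since the desired output above shows only a few rows), one option is to build the grid with expand.grid() and cross it with the data using merge(). A minimal sketch with a three-row stand-in for the data:

```r
# Three-row stand-in for the example data frame
data <- data.frame(
  group1 = c("A", "A", "B"),
  group2 = c("D", "D", "D"),
  rate = c(0.01, 0.001, 0.01),
  stringsAsFactors = FALSE
)

# All 5 x 5 x 5 = 125 combinations of the three sequences
steps <- seq(0, 1, 0.25)
grid <- expand.grid(var1 = steps, var2 = steps, var3 = steps)

# by = NULL makes merge() return the Cartesian product of the two tables
dataext <- merge(data, grid, by = NULL)
print(nrow(dataext))
print(head(dataext))
```

The row order of the result may differ from the desired output and can be changed afterwards with order().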

Output of linear regression model [closed]

I fitted a linear regression model and use summary() of the model to display estimates and p-values. I want a global p-value for a categorical variable with more than two levels. I know drop1() on the model can display a global p-value for such a variable, but without estimates and standard errors (it displays AIC and other quantities instead). I am looking for a way to display a global p-value for such a categorical variable and also have the estimates. If I use this code

summary(mod)

Here is the output

Call:
lm(formula = retplasma ~ age + vitamine + alcool + bmi + cholesterol + retdiet + sexe1 + tabac1, data = datasego)

Residuals:
    Min      1Q  Median      3Q     Max
-460.96 -125.16  -31.74  107.88  995.35

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 543.390976  98.361237   5.524 7.94e-08 ***
age           2.689612   0.891518   3.017   0.0028 **
vitamine    -11.883758  14.558773  -0.816   0.4151
alcool        7.022657   2.828017   2.483   0.0136 *
bmi           1.118128   2.031971   0.550   0.5826
cholesterol  -0.187614   0.111741  -1.679   0.0943 .
retdiet      -0.005009   0.022561  -0.222   0.8245
sexe1Femme  -76.913471  41.042005  -1.874   0.0620 .
tabac1Non    45.757734  25.265455   1.811   0.0713 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 198.4 on 263 degrees of freedom
  (43 observations deleted due to missingness)
Multiple R-squared:  0.1253, Adjusted R-squared:  0.09872
F-statistic:  4.71 on 8 and 263 DF,  p-value: 2.067e-05
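
One possible way to get both pieces is to keep the coefficient table from summary() and take the per-term F-test from drop1(mod, test = "F"), which adds a p-value column for each whole term. A minimal sketch on the built-in iris data, since datasego is not available here (the model below is only an illustration):

```r
# Illustrative model with a three-level factor (Species)
mod <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Estimates, standard errors and per-coefficient p-values
estimates <- summary(mod)$coefficients
print(estimates)

# One F-test per term: a single "global" p-value for the whole factor
global <- drop1(mod, test = "F")
print(global)

# Global p-value for the factor as a number
species_p <- global["Species", "Pr(>F)"]
print(species_p)
```

The two tables can then be reported side by side. car::Anova() is a commonly used alternative for the per-term tests.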

How to establish case/control status based on first and last observations in R

I have a longitudinal dataset in R with mood scores over a period that starts on Day 1 for all IDs. The end day varies by ID (between 26 and 35 days). Based on the mood scores, I would like to classify every participant (ID) as a case if the average mood score on Days -3 to -1 is 30% higher than the average mood score on Days +3 to +5. If not, the person is a control.

The data looks like this:

ID     Day     Mood_Score
1      1       5
1      2       3
1      3       3
1      4       4
1      5       5
...
1      26      14
1      27      10
1      28      18
2      1       3
2      2       3
2      3       5
2      4       4
2      5       3
...
2      29      9
2      30      8
2      31      7

Based on the above, ID#1 should be a case, and ID#2 should be a control.

Better solution to check elements of one character vector with another character vector using the tidyverse?

Hello!
My goal is to compare two character vectors: the main one being synonyms and the other mixNames. The string elements in mixNames do not match exactly what is in synonyms, so some string comparison is required. My objective is to extract the elements in synonyms that contain something that looks like what is in mixNames. I tried to do this using only the tidyverse but failed. I found a solution that works using base R. I know there is a better way, but I can't figure it out.

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.6.1
#> Warning: package 'tidyr' was built under R version 3.6.1
#> Warning: package 'dplyr' was built under R version 3.6.1

#Acetaminophen

synonyms <- c("Pediatrix","Percocet-5","Percocet-Demi","Perdolan Mono","Perfalgan", 
              "Phenaphen","Phenaphen W/Codeine","Phenipirin","Phogoglandin","Pinex", 
              "Piramin","Pirinasol","Plicet","Polmofen","Predimol","Predualito",
              "Prodol","Prontina","Puernol","Pulmofen", "Pyregesic-C")

mixNames <- c("Liquiprin","Midol Maximum Strength","Midol PM Night Time Formula",
              "Midol Regular Strength" ,"Midol Teen Formula","Naldegesic",
              "Ornex Severe Cold Formula","Percocet","Percogesic with Codeine",
              "Propacet" )

failed attempt:

#####STUFF THAT DIDNT WORK!!!!

# cross2(
#   .x = synonyms, .y = mixNames  #lists - each list has 2 lists - each of those is an atomic vector of 1
# ) %>% 
#   map_dfc(lift(str_detect)) #lift - modifies function to take a list of arguments - works for nested lists 

#this returns a df just like the apply 

# mix_syn_lgl_df <- map_dfc(
#   mixNames,
#   ~ map_lgl(synonyms, str_detect, pattern = .x)
# )

# colnames(mix_syn_lgl_df) <- mixNames
# 
# mix_syn_lgl_df$synonyms <- synonyms

This actually worked:


#remove mixture names from synonyms

mix_syn_lgl_mat <- sapply(mixNames, function(x){
  str_detect(string = synonyms, pattern = x)
}) #returns a matrix 21x10 of logicals while preserving colnames

rownames(mix_syn_lgl_mat) <- synonyms #add synoyms as rownames
#create a new object with a new col of sum of TRUES in row
mix_syn_lgl_mat2 <- cbind(mix_syn_lgl_mat, rowSums(mix_syn_lgl_mat)) 
#take the numerical matrix mix_syn_lgl_mat2 and return the row names where the last col (rowsums) > 0
badNames <- row.names(mix_syn_lgl_mat2[mix_syn_lgl_mat2[, ncol(mix_syn_lgl_mat2)] > 0, ])
#filter out those names from the synonyms vector
pureSyn <- synonyms[!(synonyms %in% badNames)]

Created on 2019-10-29 by the reprex package (v0.3.0)
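
A shorter variant of the working approach, sketched in base R: test each mixture name against all synonyms as a fixed substring, then combine the resulting logical vectors with Reduce(). The shortened vectors and the name is_mixture are illustrative.

```r
# Shortened versions of the two vectors from the question
synonyms <- c("Pediatrix", "Percocet-5", "Percocet-Demi",
              "Perdolan Mono", "Phenaphen W/Codeine")
mixNames <- c("Liquiprin", "Percocet", "Percogesic with Codeine", "Propacet")

# One logical vector per mixture name (fixed = TRUE: plain substring match),
# then element-wise OR across all of them
is_mixture <- Reduce(`|`, lapply(mixNames, grepl, x = synonyms, fixed = TRUE))

# Keep only the synonyms that contain no mixture name
pureSyn <- synonyms[!is_mixture]
print(pureSyn)
```

With stringr, str_detect(synonyms, fixed(name)) could replace grepl() inside the lapply() call.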

How can I make R maintain utf8 encodings?

I have utf8 encoded text that R seems to be representing as ascii. Here is the simplest case in the R console. Is there a way to force R to encode the characters in utf8?

Running in R console
