Channel: Active questions tagged r - Stack Overflow

Efficient version for searching for two strings in two columns in R


I have a (large) data frame which has a structure relatively similar to this:

id1 id2 symbol1 symbol2 scoreA scoreB scoreC
4790 1120 ABC LLL 1 0 1
2300 4790 NNN ABC 0 0 1
1120 4790 LLL ABC 0 1 1
1120 3120 LLL CCC 0 0 0

I am trying to filter the data frame to get every row in which symbol1 and symbol2 match two different strings. This is done repeatedly and dynamically, so the strings I search for are held in variables.

So in the above example, if I were looking for every instance where the two symbols are ABC and LLL, I would output a result like:

id1 id2 symbol1 symbol2 scoreA scoreB scoreC
4790 1120 ABC LLL 1 0 1
1120 4790 LLL ABC 0 1 1

In other words, I want every row where one of the two columns equals one of the values AND the other column equals the other value, regardless of which column holds which.

My solution is to do something like the following:

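# Rows where symbol1 matches each of the two target strings
c1_step1 <- scores_file[scores_file$symbol1 == in_gene, ]
c2_step1 <- scores_file[scores_file$symbol1 == end_gene, ]

# Of those, keep the rows where symbol2 matches the other target string
c1_step2 <- c1_step1[c1_step1$symbol2 == end_gene, ]
c2_step2 <- c2_step1[c2_step1$symbol2 == in_gene, ]

# Combine both orderings into one result
out_file <- rbind(c1_step2, c2_step2)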

However, this feels bulky and inelegant. Is there a nicer (and more readable) way of doing this? Maybe something using dplyr that I'm not aware of?
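
For reference, one possible way to express the "either order" condition as a single logical test in base R is sketched below. The sample data frame is rebuilt from the table above with assumed column types, and `filter_pair` is a hypothetical helper name, not something from an existing package.

```r
# Sample data mirroring the table in the question (column types are assumed)
scores_file <- data.frame(
  id1     = c(4790, 2300, 1120, 1120),
  id2     = c(1120, 4790, 4790, 3120),
  symbol1 = c("ABC", "NNN", "LLL", "LLL"),
  symbol2 = c("LLL", "ABC", "ABC", "CCC"),
  scoreA  = c(1, 0, 0, 0),
  scoreB  = c(0, 0, 1, 0),
  scoreC  = c(1, 1, 1, 0),
  stringsAsFactors = FALSE
)

in_gene  <- "ABC"
end_gene <- "LLL"

# Hypothetical helper: keep rows whose two symbols are the pair (a, b),
# in either column order.
filter_pair <- function(df, a, b) {
  keep <- (df$symbol1 == a & df$symbol2 == b) |
          (df$symbol1 == b & df$symbol2 == a)
  # which() drops NA comparisons instead of returning NA-filled rows
  df[which(keep), , drop = FALSE]
}

out_file <- filter_pair(scores_file, in_gene, end_gene)
print(out_file)
```

Unlike the rbind version, this keeps the matching rows in their original order rather than grouping them by which column matched first. The same logical condition should also work inside dplyr::filter() if dplyr is preferred.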

