Channel: Active questions tagged r - Stack Overflow

Iterating through multiple substrings in a .txt file in base R

I've been tasked with calculating the GC content of a FASTA file using base R (no packages). My problem is that I don't know how to programmatically iterate through the sequences while storing each sequence name along with its number of Cs and Gs.

Example FASTA file I can read in (as a .txt file):

>T7_promoter
ATTAGACGAG
>T3_promoter
TTTGCGCGAAATTTTTTTTT

*The > characters are not quote markers; each > marks the start of a distinct sequence.

My output should be something conceptually similar to:

T7_promoter: 0.4 (ratio of GC from # of Gs and Cs)
T3_promoter: 0.25

Any and all help is much appreciated. I am currently reading the file with readLines(). I tried calling unlist(strsplit()) on each element that strsplit() produces, hoping to store each sequence as an element of a list that I could then iterate over to do the calculations, but my attempts have not been successful.
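
For reference, here is a minimal base R sketch of one possible approach. It is not the only way to do this. It assumes the file starts with a > header line, that each record may span one or more sequence lines, and that GC content means (G + C) / sequence length. The function name gc_content and the inline example vector are hypothetical; in practice the lines would come from readLines() on your own file.

```r
# Lines as readLines() would return them; in practice use something like
# lines <- readLines("sequences.txt")   # hypothetical file name
lines <- c(">T7_promoter", "ATTAGACGAG",
           ">T3_promoter", "TTTGCGCGAAATTTTTTTTT")

# gc_content is a hypothetical helper name, not an existing base R function.
gc_content <- function(lines) {
  lines <- lines[nzchar(lines)]               # drop empty lines
  is_header <- startsWith(lines, ">")         # header lines begin with ">"
  record_id <- cumsum(is_header)              # 1, 1, 2, 2, ...: record each line belongs to
  seq_names <- sub("^>", "", lines[is_header])

  # Concatenate the sequence lines of each record into one string.
  joined <- tapply(lines[!is_header], record_id[!is_header],
                   paste, collapse = "")
  ids  <- as.integer(names(joined))           # record index of each joined sequence
  seqs <- toupper(as.vector(joined))

  # Count G and C by deleting every other character and measuring what is left.
  gc <- nchar(gsub("[^GC]", "", seqs))
  setNames(gc / nchar(seqs), seq_names[ids])
}

res <- gc_content(lines)
writeLines(paste0(names(res), ": ", res))
```

On the two example sequences this gives 4/10 = 0.4 for T7_promoter and 5/20 = 0.25 for T3_promoter, matching the expected values above. The cumsum() over the header flags is what ties each sequence line to its name, so no explicit loop or strsplit() call is needed.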

