I've looked through SO and have not found any advice that accurately explains what I am looking for.
I have a giant table. The first few columns have information about different expressed transcripts and the SNP that influences each one. The remaining columns (around a thousand) are either an individual's tissue sample (with a column header such as `GTEX.11DXX.1426.SM.5GIDU`) or the individual's ID (`GTEX.11DXX`). The sample columns hold the number of transcripts expressed at that particular sequence (e.g. `92`), and the ID columns hold a binary value (`1` or `0`) indicating whether the allele that influences the expression of that transcript is Neandertal-inherited.
What I want to do is consolidate the data underneath the binary columns with the data underneath the transcript number columns like so:
GTEX.11DXX.1426.SM.5GIDU
0;25
1;74
1;104
1;92
0;12
...
etc.
I want to accomplish this by partially matching the column name `GTEX.11DXX` with `GTEX.11DXX.1426.SM.5GIDU`, and then getting rid of the binary columns so that only the long column names remain.
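To make the goal concrete, here is a minimal base-R sketch of the transformation on toy data. It is not the real table: the values in `ovary` below are made up, and it assumes a plain `data.frame`, that every ID column name has exactly one dot (`GTEX.<id>`), and that each sample column name is its ID followed by a literal dot. The prefix test uses `startsWith()`, which compares fixed strings rather than regular expressions.

```r
# Toy stand-in for the real table (values are invented for illustration).
ovary <- data.frame(
  GTEX.11DXX.1426.SM.5GIDU = c(25, 74, 104),
  GTEX.11DXX               = c(0, 1, 1),
  GTEX.13X6H.1026.SM.5SIBE = c(49, 44, 3),
  GTEX.13X6H               = c(0, 0, 0),
  check.names = FALSE
)

# Assumption: ID columns are the ones with exactly one dot (GTEX.<id>).
id_cols <- grep("^GTEX\\.[^.]+$", names(ovary), value = TRUE)

for (id in id_cols) {
  # Sample columns start with the ID plus a literal dot (fixed-string prefix).
  sample_cols <- names(ovary)[startsWith(names(ovary), paste0(id, "."))]
  for (s in sample_cols) {
    # Combine as "<binary>;<count>", e.g. "0;25".
    ovary[[s]] <- paste(ovary[[id]], ovary[[s]], sep = ";")
  }
}

# Drop the binary ID columns so only the long sample names remain.
ovary[id_cols] <- NULL
```

On the toy data this leaves only the two long-named columns, with the combined values stored as character strings.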
I've tried using `tidyverse`'s `map(v, ~select_(ovary, ~matches(.)))`, and it kind of works, but it matches even if one character is off, like so:
[[49]]
           GTEX.13X6H.1026.SM.5SIBE GTEX.13X6H GTEX.13X6I GTEX.13X6J GTEX.13X6K
        1:                       49          0          0          0          1
        2:                       44          0          0          0          1
        3:                        3          0          0          0          1
        4:                       23          0          0          0          1
        5:                       78          0          0          0          1
       ---
    80285:                       84          1          0          0          0
    80286:                        1          1          0          0          0
    80287:                        0          1          0          0          0
    80288:                      152          1          0          0          0
    80289:                      120          1          0          0          0
Again, I want it to work like this:
GTEX.13X6H.1026.SM.5SIBE
1: 0;49
2: 0;44
3: 0;3
4: 0;23
Thank you