I have the following data frame:
DATA
structure(list(associated_gene = c(NA, NA, "A4GALT", NA, NA,
"NOT FOUND"), chr_name = c("22", "22", "22", "22", "22", "NOT FOUND"
), chrom_start = c(42693910L, 42693843L, 42693321L, 42693665L,
42693653L, 0L), allele = c("G/A/T", "T/C", "G/C", "C/T", "G/A/T",
"NOT FOUND"), refsnp_id = c("rs778598915", "rs11541159", "rs397514502",
"rs762949801", "rs776304817", "NOT FOUND")), row.names = c("s3a",
"s3b", "s3c", "s3d", "s3e", "s3f"), class = "data.frame")
    associated_gene  chr_name chrom_start    allele   refsnp_id
s3a            <NA>        22    42693910     G/A/T rs778598915
s3b            <NA>        22    42693843       T/C  rs11541159
s3c          A4GALT        22    42693321       G/C rs397514502
s3d            <NA>        22    42693665       C/T rs762949801
s3e            <NA>        22    42693653     G/A/T rs776304817
s3f       NOT FOUND NOT FOUND           0 NOT FOUND   NOT FOUND
I would like to split the allele column at the first "/" into two new columns, Ref and Var, that replace allele and sit between $chrom_start and $refsnp_id.
The ideal output is:
    associated_gene chr_name chrom_start Ref Var   refsnp_id
s3a            <NA>       22    42693910   G A/T rs778598915
s3b            <NA>       22    42693843   T   C  rs11541159
I don't know whether I can call awk from R, but in bash I would do:
awk '{ sub("/", "\t"); print }' allele   # replace only the first "/" with a tab
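A minimal base R sketch of the same idea, assuming no extra packages. It uses sub() to take the text before and after the first "/" and then rebuilds the data frame with Ref and Var where allele was. The question does not say what should happen to rows with no "/" (the NOT FOUND row); here that row is assumed to keep the whole string as Ref and get NA as Var. The names has_slash, ref, var, pos and out are illustrative only.

```r
# Data frame from the question (dput output above)
df <- structure(list(associated_gene = c(NA, NA, "A4GALT", NA, NA,
"NOT FOUND"), chr_name = c("22", "22", "22", "22", "22", "NOT FOUND"
), chrom_start = c(42693910L, 42693843L, 42693321L, 42693665L,
42693653L, 0L), allele = c("G/A/T", "T/C", "G/C", "C/T", "G/A/T",
"NOT FOUND"), refsnp_id = c("rs778598915", "rs11541159", "rs397514502",
"rs762949801", "rs776304817", "NOT FOUND")), row.names = c("s3a",
"s3b", "s3c", "s3d", "s3e", "s3f"), class = "data.frame")

has_slash <- grepl("/", df$allele, fixed = TRUE)

# Ref: everything before the first "/" (the whole string if there is no "/")
ref <- sub("/.*$", "", df$allele)

# Var: everything after the first "/", so "G/A/T" -> "A/T";
# assumption: rows without a "/" (the NOT FOUND row) get NA
var <- ifelse(has_slash, sub("^[^/]*/", "", df$allele), NA_character_)

# Rebuild with Ref and Var in place of allele, i.e. between chrom_start
# and refsnp_id; the row names (s3a, s3b, ...) are taken from df
pos <- match("allele", names(df))
out <- data.frame(df[seq_len(pos - 1)], Ref = ref, Var = var,
                  df[-seq_len(pos)], stringsAsFactors = FALSE)

print(out)
```

Because sub() (unlike gsub()) only acts on the first match, the second "/" in "G/A/T" is left inside Var.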