In a data frame, I would like to create a new column from an existing one. The new column (name="Symbol") should contain only part of each string found in the existing column (name="Description"), extracted with a matching pattern, which in my case is the text that follows the prefix GN=. For cells where the existing column lacks the pattern, "Not available" should be returned in the new column. From here:
View(df[,1:3])
Accession Description Sample1
A0FGR9 Extended synaptotagmin-3 OS=Homo sapiens GN=ESYT3 PE=1 SV=1 - [ESYT3_HUMAN] 117.920
A6NHJ4 Zinc finger protein 860 OS=Homo sapiens GN=ZNF860 PE=1 SV=3 - [ZN860_HUMAN] 30.218
A0A0C4DH68 Immunoglobulin kappa variable 2-24 OS=Homo sapiens GN=IGKV2-24 PE=3 SV=1 - [KV224_HUMAN] 524.706
P0DOX7 Immunoglobulin kappa light chain OS=Homo sapiens PE=1 SV=1 - [IGK_HUMAN] 503.110
I would like to get here:
View(df[,1:4])
Accession Description Symbol Sample1
A0FGR9 Extended synaptotagmin-3 OS=Homo sapiens GN=ESYT3 PE=1 SV=1 - [ESYT3_HUMAN] ESYT3 117.920
A6NHJ4 Zinc finger protein 860 OS=Homo sapiens GN=ZNF860 PE=1 SV=3 - [ZN860_HUMAN] ZNF860 30.218
A0A0C4DH68 Immunoglobulin kappa variable 2-24 OS=Homo sapiens GN=IGKV2-24 PE=3 SV=1 - [KV224_HUMAN] IGKV2-24 524.706
P0DOX7 Immunoglobulin kappa light chain OS=Homo sapiens PE=1 SV=1 - [IGK_HUMAN] Not available 503.110
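To make the requirement concrete, here is a minimal base R sketch of the transformation described above. It is only an illustration under assumptions: the data frame is rebuilt by hand from the four example rows, the gene symbol is assumed to be the run of non-space characters right after GN=, and each description is assumed to contain GN= at most once.

```r
# Toy data frame rebuilt from the example rows (an assumption, not the real data)
df <- data.frame(
  Accession = c("A0FGR9", "A6NHJ4", "A0A0C4DH68", "P0DOX7"),
  Description = c(
    "Extended synaptotagmin-3 OS=Homo sapiens GN=ESYT3 PE=1 SV=1 - [ESYT3_HUMAN]",
    "Zinc finger protein 860 OS=Homo sapiens GN=ZNF860 PE=1 SV=3 - [ZN860_HUMAN]",
    "Immunoglobulin kappa variable 2-24 OS=Homo sapiens GN=IGKV2-24 PE=3 SV=1 - [KV224_HUMAN]",
    "Immunoglobulin kappa light chain OS=Homo sapiens PE=1 SV=1 - [IGK_HUMAN]"
  ),
  Sample1 = c(117.920, 30.218, 524.706, 503.110),
  stringsAsFactors = FALSE
)

# Rows whose Description contains "GN=" followed by at least one non-space character
has_gn <- grepl("GN=[^[:space:]]", df$Description)

# Keep only the captured group (the characters after "GN=" up to the next space);
# rows without the pattern get the fallback label instead
df$Symbol <- ifelse(
  has_gn,
  sub("^.*GN=([^[:space:]]+).*$", "\\1", df$Description),
  "Not available"
)

# Reorder so that Symbol sits right after Description
df <- df[, c("Accession", "Description", "Symbol", "Sample1")]
```

The separate grepl() check is there because sub() returns the input string unchanged when the pattern does not match, so without it the last row would carry the full description rather than "Not available".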
Thank you in advance for the fruitful suggestions.