I'm using R. I have a CSV file of single-cell marker data like the one below, where the 'cluster' value is repeated on every row for each unique gene in the 'gene' column.
dput(markers)
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
APOC1 0 1.696639642 0.939 0.394 0 0 APOC1
APOE 0 1.487160872 0.958 0.475 0 0 APOE
GPNMB 9.30E-269 1.31714457 0.745 0.301 2.49E-264 0 GPNMB
FTL 2.24E-230 0.766844152 1 0.977 6.00E-226 0 FTL
PSAP 2.27E-225 0.98726538 0.925 0.685 6.07E-221 0 PSAP
CTSB 4.84E-211 0.925031015 0.902 0.606 1.29E-206 0 CTSB
CTSS 1.37E-197 0.898457063 0.869 0.609 3.67E-193 0 CTSS
CSTB 8.05E-191 0.853658991 0.918 0.732 2.15E-186 0 CSTB
CTSD 1.23E-187 1.08931251 0.787 0.443 3.30E-183 0 CTSD
IGKC 0 1.560337702 0.998 0.237 0 1 IGKC
IGLC2 0 1.546344857 0.997 0.152 0 1 IGLC2
IGLC3 0 1.342649567 0.967 0.073 0 1 IGLC3
C11orf96 0 1.245172517 0.99 0.253 0 1 C11orf96
COL3A1 0 1.212528128 1 0.343 0 1 COL3A1
LUM 0 1.202452925 0.971 0.143 0 1 LUM
IGHG4 0 0.977399051 0.876 0.092 0 1 IGHG4
HSPG2 0 0.957478533 0.883 0.148 0 1 HSPG2
NNMT 0 0.952577589 0.945 0.213 0 1 NNMT
IGHG1 0 0.913733424 0.861 0.07 0 1 IGHG1
COL6A31 0 1.847828827 0.907 0.192 0 2 COL6A3
PDGFRA 5.38E-292 0.849349193 0.503 0.052 1.44E-287 2 PDGFRA
COL5A21 2.67E-280 1.400314195 0.649 0.105 7.14E-276 2 COL5A2
CALD1 1.11E-275 1.292924443 0.771 0.155 2.98E-271 2 CALD1
CCDC80 1.73E-271 1.168549626 0.706 0.123 4.64E-267 2 CCDC80
COL1A21 1.66E-268 2.004626869 0.966 0.326 4.45E-264 2 COL1A2
DCN1 1.47E-253 1.540631398 0.886 0.254 3.93E-249 2 DCN
COL3A11 3.88E-253 2.216642854 0.955 0.353 1.04E-248 2 COL3A1
FBN1 6.40E-251 0.949521182 0.525 0.07 1.71E-246 2 FBN1
I want to transform my matrix so that the header row holds the unique cluster names and each column lists all the genes from that cluster (second table below). How should I write the code?
dput(markers)
0 1 2
APOC1 IGKC COL6A3
APOE IGLC2 PDGFRA
GPNMB IGLC3 COL5A2
FTL C11orf96 CALD1
PSAP COL3A1 CCDC80
CTSB LUM COL1A2
CTSS IGHG4 DCN
CSTB HSPG2 COL3A1
CTSD NNMT FBN1
I tried this, but the resulting file has no values.
markers = read.csv("./markers.csv", row.names=1, stringsAsFactors=FALSE)
z1 = matrix("", ncol = length(unique(markers$cluster)))
colnames(z1) = unique(markers$cluster)
for (i in 1:nrow(z1)){
  for (j in 1:ncol(z1)){
    genes1 = as.character(markers$gene)[markers$cluster == rownames(z1)[i]]
    z1[i,0] = paste(genes1, collapse="")
  }
}
write.csv(z1, "test.csv")
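One possible way to get this layout in base R, offered as a sketch rather than a definitive fix. The attempt above appears to come out empty for two reasons: z1 has a single row and no row names, so rownames(z1)[i] is NULL and the comparison selects no genes, and z1[i,0] assigns to column index 0, which does not address any cell. The sketch below instead groups the gene names by cluster with split() and binds the groups as columns with cbind(). The inline markers data frame is a small subset of the table above, built only for illustration (the real data would come from read.csv), and padding shorter clusters with "" is my own assumption about how unequal cluster sizes should be handled.

```r
# Toy subset of the marker table from the question.
# Cluster 1 deliberately has one more gene than the others.
markers <- data.frame(
  cluster = c(0, 0, 0, 1, 1, 1, 1, 2, 2, 2),
  gene = c("APOC1", "APOE", "GPNMB",
           "IGKC", "IGLC2", "IGLC3", "LUM",
           "COL6A3", "PDGFRA", "COL5A2"),
  stringsAsFactors = FALSE
)

# One character vector of gene names per cluster, in the original row order.
genes_by_cluster <- split(markers$gene, markers$cluster)

# Clusters can differ in size, so pad each vector with "" up to the longest one.
n_max <- max(lengths(genes_by_cluster))
padded <- lapply(genes_by_cluster, function(g) c(g, rep("", n_max - length(g))))

# Bind the padded vectors as columns; the column names are the cluster labels.
z1 <- do.call(cbind, padded)
print(z1)
```

With the real table, the same steps would follow the read.csv call from the question, and write.csv(z1, "test.csv", row.names = FALSE) would then write the result without an extra column of row numbers.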