I have a data set named my_data with more than 100,000 rows. A sample looks like this:
id source
8166923397733625478 happimobiles
8166923397733625478 Springfit
7301100145962413274 Duroflex
6703062895304712434 happimobiles
6897156268457025524 themrphone
37564799155342281 Sangeetha Mobiles
1159098248970201145 Sangeetha Mobiles
I used the code below and also table(my_data).
library("readxl")
library("data.table")  # needed for setDT() and dcast()
my_data <- read_excel("C:\\Users\\ashishpatodia\\Desktop\\R\\Code\\Sample_Data_Overlap.xlsx",sheet = "10000 sample")
setDT(my_data)
(cohorts <- dcast(unique(my_data)[,cohort:=(source),by=id],cohort~ source, fun.aggregate=length, value.var="cohort"))
I want a matrix in which every id is counted under its own source and also under every other source in which it appears. For example, the id ending in 5478 falls under both happimobiles and Springfit. happimobiles has two ids (8166923397733625478 and 6703062895304712434), so its own count is 2, and one of those ids is shared with Springfit, so the happimobiles/Springfit count is 1.
Expected output:
             happimobiles Springfit Duroflex themrphone Sangeetha
happimobiles            2         1        0          0         0
Springfit               1         1        0          0         0
Duroflex                0         0        1          0         0
themrphone              0         0        0          1         0
Sangeetha               0         0        0          0         1
I have also tried
Pivot<-dcast(my_data,source~source,value.var = "id",function(x) length((x)))
which correctly gives me the number of records for each source, but not the overlaps between sources.
I also tried
crossprod(table(my_data))
But this does not give the correct answer.
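To make the counting rule concrete, here is a minimal sketch on the seven sample rows only, not on the full data. It rests on two assumptions that are not confirmed for the real file: that the ids are stored as character strings (they are too large to be held exactly as doubles), and that repeated (id, source) rows should count only once. The names pairs, incidence and overlap are illustrative.

```r
# Toy data copied from the sample rows above; ids kept as character
# (assumption) because values this large do not fit exactly in a double.
my_data <- data.frame(
  id = c("8166923397733625478", "8166923397733625478", "7301100145962413274",
         "6703062895304712434", "6897156268457025524", "37564799155342281",
         "1159098248970201145"),
  source = c("happimobiles", "Springfit", "Duroflex", "happimobiles",
             "themrphone", "Sangeetha Mobiles", "Sangeetha Mobiles"),
  stringsAsFactors = FALSE
)

# Assumption: each id should count at most once per source,
# so duplicate (id, source) rows are dropped first.
pairs <- unique(my_data[, c("id", "source")])

# Incidence table: one row per id, one column per source, entries 0 or 1.
incidence <- table(pairs$id, pairs$source)

# t(incidence) %*% incidence: the diagonal holds the number of ids per
# source, the off-diagonal cells hold the number of ids shared by two sources.
overlap <- crossprod(incidence)
print(overlap)
```

On this toy input the happimobiles diagonal cell is 2 and the happimobiles/Springfit cell is 1, which is the behaviour described above. Whether the same holds on the full spreadsheet depends on the two assumptions.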
Link to the entire data set, on which I want the code to run:
https://docs.google.com/spreadsheets/d/1HUoRlVVf8EBedj1puXdgtTS6GGeFsXYqjVicUwbc5KE/edit#gid=0