I have a data frame named my_data with more than 100,000 rows. A sample of the data looks like this:
id source
8166923397733625478 happimobiles
8166923397733625478 Springfit
7301100145962413274 Duroflex
6703062895304712434 happimobiles
6897156268457025524 themrphone
37564799155342281 Sangeetha Mobiles
1159098248970201145 Sangeetha Mobiles
Link to the entire data, on which I want the code to run:
https://docs.google.com/spreadsheets/d/1HUoRlVVf8EBedj1puXdgtTS6GGeFsXYqjVicUwbc5KE/edit#gid=0
I want an output in which every id is counted under its own source and also under every other source in which the same id appears.
For example, the id ending in 5478 falls under both happimobiles and Springfit. So happimobiles has ids 8166923397733625478 and 6703062895304712434, which makes its count 2, and 1 of those ids is shared with Springfit.
Output:
              happimobiles  Springfit  Duroflex  themrphone  Sangeetha
happimobiles             2          1         0           0          0
Springfit                1          1         0           0          0
Duroflex                 0          0         1           0          0
themrphone               0          0         0           1          0
Sangeetha                0          0         0           0          2
I have also tried:
Pivot <- dcast(my_data,source~source,value.var = "id",function(x) length((x)))
This correctly gives the number of unique ids within each source, but not the overlaps between sources.
I also tried:
crossprod(table(my_data))
But this does not give the correct answer.
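One possible cause, which I have not confirmed on the full data, is that some (id, source) pairs are repeated. In that case table() holds counts above 1 and crossprod() inflates both the diagonal and the overlaps. Below is a minimal sketch on the sample rows that removes repeated pairs first. The names with_dups, pairs, m and overlap are only illustrative, and the extra repeated row is simulated:

```r
# Sample rows from above; ids are kept as character because they are too
# large to be represented exactly as R doubles.
my_data <- data.frame(
  id = c("8166923397733625478", "8166923397733625478", "7301100145962413274",
         "6703062895304712434", "6897156268457025524", "37564799155342281",
         "1159098248970201145"),
  source = c("happimobiles", "Springfit", "Duroflex", "happimobiles",
             "themrphone", "Sangeetha Mobiles", "Sangeetha Mobiles"),
  stringsAsFactors = FALSE
)

# Assumption: the full data may repeat an (id, source) pair; simulate one repeat.
with_dups <- rbind(my_data, my_data[1, ])

# Drop repeated pairs so that each id counts at most once per source.
pairs <- unique(with_dups[, c("id", "source")])

# id-by-source 0/1 incidence table. crossprod(m) computes t(m) %*% m, so the
# diagonal holds distinct ids per source and the off-diagonal cells hold the
# number of ids shared by two sources.
m <- table(pairs$id, pairs$source)
overlap <- crossprod(m)
print(overlap)
```

If the number of sources is large, xtabs(~ id + source, data = pairs, sparse = TRUE) from base R (which needs the Matrix package) might be a lighter alternative to the dense table, though I have not tried it at this size.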
Is there any other solution that can help me get this kind of output?