Channel: Active questions tagged r - Stack Overflow

Data Concurrency within variable

I have a data frame named my_data with more than 100,000 rows. A sample looks like this:

    id                      source
    8166923397733625478 happimobiles
    8166923397733625478 Springfit
    7301100145962413274 Duroflex
    6703062895304712434 happimobiles
    6897156268457025524 themrphone
    37564799155342281   Sangeetha Mobiles
    1159098248970201145 Sangeetha Mobiles

Link to the full data, which is what I want the code to run on:

https://docs.google.com/spreadsheets/d/1HUoRlVVf8EBedj1puXdgtTS6GGeFsXYqjVicUwbc5KE/edit#gid=0

I want an output in which every id is counted under its own source and also under any other source where the same id appears. For example, the id ending in 5478 falls under both happimobiles and Springfit. So happimobiles has ids 8166923397733625478 and 6703062895304712434, which gives a count of 2, and 1 of those ids is shared with Springfit.

Output:

                       happimobiles  Springfit  Duroflex  themrphone  Sangeetha Mobiles
    happimobiles                  2          1         0           0                  0
    Springfit                     1          1         0           0                  0
    Duroflex                      0          0         1           0                  0
    themrphone                    0          0         0           1                  0
    Sangeetha Mobiles             0          0         0           0                  2
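
For reference, the table above can be reproduced on the seven sample rows with a minimal base R sketch. It assumes the ids are stored as character (19-digit values cannot be represented exactly as doubles) and that repeated (id, source) pairs should count only once; the names `pairs`, `incidence`, `overlap` and `src` are illustrative and not part of the original data.

```r
# Sample rows from the question. Ids are kept as character because
# 19-digit values exceed what a double can store exactly.
my_data <- data.frame(
  id = c("8166923397733625478", "8166923397733625478", "7301100145962413274",
         "6703062895304712434", "6897156268457025524", "37564799155342281",
         "1159098248970201145"),
  source = c("happimobiles", "Springfit", "Duroflex", "happimobiles",
             "themrphone", "Sangeetha Mobiles", "Sangeetha Mobiles"),
  stringsAsFactors = FALSE
)

# Assumption: each id should count at most once per source.
pairs <- unique(my_data[, c("id", "source")])

# Incidence table: one row per id, one column per source, cells are 0 or 1.
incidence <- table(pairs$id, pairs$source)

# t(incidence) %*% incidence: the diagonal holds the number of distinct ids
# per source, off-diagonal cells hold the ids shared by two sources.
overlap <- crossprod(incidence)

# Reorder rows and columns by first appearance in the data.
src <- unique(my_data$source)
overlap <- overlap[src, src]
print(overlap)
```

Whether this scales comfortably to the full data depends on the number of distinct ids, since the dense incidence table has one row per id.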

I have also tried:

    library(reshape2)  # assumption: dcast() from reshape2; data.table also provides one
    Pivot <- dcast(my_data, source ~ source, value.var = "id", fun.aggregate = length)

This correctly gives the number of records for each source, but not the overlaps between sources.

I also tried:

    crossprod(table(my_data))

But this does not give the correct answer either.
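
One possible cause, offered only as a guess because the full data is not shown here, is that some (id, source) pairs occur more than once. `crossprod()` multiplies the raw counts, so a repeated pair inflates both the diagonal and the overlaps. The sketch below uses hypothetical toy data (`toy`, with made-up ids `a`, `b` and sources `X`, `Y`) to show the effect.

```r
# Hypothetical toy data: the pair (a, X) appears twice.
toy <- data.frame(
  id = c("a", "a", "b", "a"),
  source = c("X", "X", "X", "Y"),
  stringsAsFactors = FALSE
)

# Raw counts: the repeated pair enters the cross product with weight 2.
raw <- crossprod(table(toy))

# After dropping duplicate rows, each id counts once per source.
dedup <- crossprod(table(unique(toy)))

print(raw)
print(dedup)
```

Here source X has two distinct ids and shares one with Y, which is what `dedup` reports; `raw` reports 5 and 2 instead.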

Is there another solution that can help me get this kind of output?

