Channel: Active questions tagged r - Stack Overflow

How to merge in R based on several columns and only one unique id

I'm trying to merge two datasets:

dataset 1
id, month, year, postal

dataset 2
id, month, year, postal, Income, name, division

dataset 1
id  year month postal
1   2010 9     j0r1h0
2   2010 8     j0r1h0
....
....
7   2007 6     j3x4p2

dataset 2
id  year month postal name division
1   2010 9     j0r1h0 john starting
2   2010 8     j0r1h0 lili retired

I want to keep all my columns and rows in dataset 1 and get the extra columns from dataset 2, like Income and division.

I get the wrong result, with duplicated month and year fields, when I try:

merge(a, b, by = c("postal", "month", "year"), all.x = TRUE)

This is my expected result:

id year month postal name division
1   2010 9     j0r1h0 john  starting
2   2010 8     j0r1h0 lili  retired
3   2010 7     j1v3c4 verna starting
4   2009 1     j23c5  Greg  medium
5   2007 1     j2j4d3 Greg  medium
6   2008 2     j2p4s3  na   na
7   2007 6     j3x4p2  na   starting

And this is my result:

id year month postal name division
1   2010 9     j0r1h0 john  starting
2   2010 8     j0r1h0 lili  retired
3   2010 8     j0r1h0  na   na
4   2010 7      na     na   na
5   2010 7     j1v3c4 verna starting
6   2009 1     j23c5  Greg  medium
7   2007 1     j2j4d3 Greg  medium
8   2008 2     j2p4s3  na   na
9   2007 6     j3x4p2  na   starting
9   2007 1     j3x4p2  na   starting

My real dataset is over 200000 rows x 16 columns.
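
For reference, here is a minimal sketch of the left join described above, built only from the sample rows shown in the question. It assumes that id identifies the same record in both datasets, so id is included in the key columns; if that is not the case, id would have to be dropped from one side before merging. The names keys, b_unique and merged are made up for the illustration. Merging on every shared column should avoid the duplicated .x/.y fields, and removing repeated key combinations from the second dataset first is one possible way to stop rows of the first dataset from being matched more than once.

```r
# Toy versions of the two datasets, using only rows shown in the question.
# The names a and b follow the merge() call above.
a <- data.frame(
  id     = c(1, 2, 7),
  year   = c(2010, 2010, 2007),
  month  = c(9, 8, 6),
  postal = c("j0r1h0", "j0r1h0", "j3x4p2"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  id       = c(1, 2),
  year     = c(2010, 2010),
  month    = c(9, 8),
  postal   = c("j0r1h0", "j0r1h0"),
  name     = c("john", "lili"),
  division = c("starting", "retired"),
  stringsAsFactors = FALSE
)

# Assumption: these four columns together identify a record in both datasets.
keys <- c("id", "year", "month", "postal")

# Keep one row per key combination in b, so a row of a cannot match twice.
b_unique <- b[!duplicated(b[, keys]), ]

# Left join: all.x = TRUE keeps every row of a; unmatched rows get NA
# in the columns that come from b.
merged <- merge(a, b_unique, by = keys, all.x = TRUE)
merged <- merged[order(merged$id), ]
print(merged)
```

With this setup, nrow(merged) should equal nrow(a), which is a quick check that can also be run on the full dataset.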

