I'm trying to merge two datasets:
dataset 1
id, month, year, postal
dataset 2
id, month, year, postal, Income, name, division
dataset 1
id year month postal
1 2010 9 j0r1h0
2 2010 8 j0r1h0
....
....
7 2007 6 j3x4p2
dataset 2
id year month postal name division
1 2010 9 j0r1h0 john starting
2 2010 8 j0r1h0 lili retired
I want to keep all my columns and rows in dataset 1 and get the extra columns from dataset 2, like Income and division.
I get the wrong result, with duplicated rows for some month and year values, when I try:
merge(a, b, by = c("postal", "month", "year"), all.x = TRUE)
This is my expected result:
id year month postal name division
1 2010 9 j0r1h0 john starting
2 2010 8 j0r1h0 lili retired
3 2010 7 j1v3c4 verna starting
4 2009 1 j23c5 Greg medium
5 2007 1 j2j4d3 Greg medium
6 2008 2 j2p4s3 na na
7 2007 6 j3x4p2 na starting
And this is my result:
id year month postal name division
1 2010 9 j0r1h0 john starting
2 2010 8 j0r1h0 lili retired
3 2010 8 j0r1h0 na na
4 2010 7 na na na
5 2010 7 j1v3c4 verna starting
6 2009 1 j23c5 Greg medium
7 2007 1 j2j4d3 Greg medium
8 2008 2 j2p4s3 na na
9 2007 6 j3x4p2 na starting
9 2007 1 j3x4p2 na starting
My real dataset is larger than 200000 x 16.
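
For reference, here is a minimal sketch of the left join I think I need, using only the sample rows shown above. Including id in the key, and the names keys and dup_keys, are my assumptions and not something I have confirmed on the full data. The duplicate check is there because repeated key combinations in dataset 2 would make merge() return extra rows.

```r
# Sample rows copied from the tables above; the real data is much larger.
a <- data.frame(
  id     = c(1, 2, 7),
  year   = c(2010, 2010, 2007),
  month  = c(9, 8, 6),
  postal = c("j0r1h0", "j0r1h0", "j3x4p2"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  id       = c(1, 2),
  year     = c(2010, 2010),
  month    = c(9, 8),
  postal   = c("j0r1h0", "j0r1h0"),
  name     = c("john", "lili"),
  division = c("starting", "retired"),
  stringsAsFactors = FALSE
)

# Key columns shared by both tables (assumption: id is part of the key).
keys <- c("id", "year", "month", "postal")

# Rows of b whose key combination already appeared earlier in b.
# Any such row would repeat the matching row of a in a left join.
dup_keys <- b[duplicated(b[, keys]), keys]

# Left join: keep every row of a, add name/division where a match exists.
merged <- merge(a, b, by = keys, all.x = TRUE)
```

On the full data, comparing nrow(merged) with nrow(a) should show whether any rows of dataset 1 were multiplied by the merge.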