My problem is as follows: I need to analyse data from several different files with a lot of entries (up to 500.000 per column, 10 columns in total). The files are connected through the use of IDs, e.g. ORDER_IDs. However, the IDs can appear multiple times, e.g. when an order contains multiple order lines. It is also possible, that one ID doesn't appear in one of the files, e.g. because a file with sales data does only have information on the orders shipped, but not those that have not been shippet yet.
So I have different files with different lengths and unique IDs identifying positions that can vary in their appearance (It is there or not) over all the data files. What I want now is to filter one file by ID so that it only shows the IDs listed in another file. Also, the additional columns from the first file should be moved over.
Example of what I have: dt1:
ORDER_ID SKU_ID Quantity_Shipped
12345 678910 100
12346 648392 30
64739 648392 20
dt2:
ORDER_ID Country
12345 DE
12346 DE
55430 SE
90632 JPN
76543 ARG
64739 CH
What I want:
ORDER_ID SKU_ID Quantity_Shipped Country
12345 678910 100 DE
12346 648392 30 DE
64739 648392 20 CH
Originally, the data was imported from a csv file. The approach I used so far has worked when merging two files. When trying to add the information from a third file hoewever, I get only NA as answers. What can I do to fix this?
This is the approach I used so far.
df2 <- data.frame(ORDER_ID = sales[["ORDER_ID"]])
df1 <- data.frame(ORDER_ID = OL[["ORDER_ID"]], SKU_ID = OL[["SKU_ID"]],
QTY_SHIPPED = OL[["QTY_SHIPPED"]], EXPECTED_VOLUME =
OL[["EXPECTED_VOLUME"]])
library(data.table)
dt2 <- data.table(df1)
dt1 <- data.table(df2)
dt3 <- dt2[match(dt1$ORDER_ID, dt2$ORDER_ID), ]