I'm currently trying to open a 48 GB CSV file on my computer. Needless to say, my RAM cannot hold such a huge file, so I'm trying to filter it before reading it in. From what I've researched, the most appropriate way to do this in R is with the sqldf package, more specifically its read.csv.sql function:
df <- read.csv.sql('CIF_FOB_ITIC-en.csv', sql = "SELECT * FROM file WHERE 'Year' IN (2014, 2015, 2016, 2017, 2018)")
However, I got the following error message:
Error: duplicate column name: Measure
Since SQL is case-insensitive, having two columns, one named Measure and another named MEASURE, means their names collide as duplicates. To get around this, I tried passing the header = FALSE argument and replacing 'Year' with V9.
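In other words, something along these lines (with V9 standing in for the ninth column, since with header = FALSE the columns are named V1, V2, and so on):

df <- read.csv.sql('CIF_FOB_ITIC-en.csv', header = FALSE,
                   sql = "SELECT * FROM file WHERE V9 IN (2014, 2015, 2016, 2017, 2018)")

This yielded the following error instead: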
Error in connection_import_file(conn@ptr, name, value, sep, eol, skip) : RS_sqlite_import: CIF_FOB_ITIC-en.csv line 2 expected 19 columns of data but found 24
How should I proceed in this case?
Thanks in advance!