How to filter a very large csv in R prior to opening it?

I'm currently trying to open a 48 GB csv on my computer. Needless to say, my RAM can't hold such a huge file, so I'm trying to filter it before opening it. From what I've researched, the most appropriate way to do this in R is with the sqldf package, more specifically its read.csv.sql function:

df <- read.csv.sql('CIF_FOB_ITIC-en.csv', sql = "SELECT * FROM file WHERE 'Year' IN (2014, 2015, 2016, 2017, 2018)")
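
For reference, in SQLite single quotes create a string literal rather than an identifier, so the 'Year' in the query above is compared as the text "Year" instead of being read as a column. A variant with the identifier double-quoted would look like the sketch below (this assumes the column really is named Year):

library(sqldf)

# Same call, but with the column name double-quoted so SQLite treats
# "Year" as an identifier rather than the string literal 'Year'.
df <- read.csv.sql(
  "CIF_FOB_ITIC-en.csv",
  sql = 'SELECT * FROM file WHERE "Year" IN (2014, 2015, 2016, 2017, 2018)'
)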

However, I got the following message:

Error: duplicate column name: Measure

Since SQL is case insensitive, having two variables, one named Measure and the other MEASURE, results in duplicate column names. To get around this, I tried passing the header = FALSE argument and replacing 'Year' with V9, which produced the following error instead:

Error in connection_import_file(conn@ptr, name, value, sep, eol, skip) : RS_sqlite_import: CIF_FOB_ITIC-en.csv line 2 expected 19 columns of data but found 24
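
For completeness, another route I've seen suggested for filtering a csv that doesn't fit in RAM is chunked reading, for example with readr::read_csv_chunked, keeping only the relevant rows from each chunk. This is just a sketch and assumes the column is literally named Year and the file is a regular comma-separated csv:

library(readr)
library(dplyr)

# Keep only the requested years from each chunk as it is read,
# so the full 48 GB file never has to sit in memory at once.
keep_years <- function(chunk, pos) {
  filter(chunk, Year %in% 2014:2018)
}

df <- read_csv_chunked(
  "CIF_FOB_ITIC-en.csv",
  callback = DataFrameCallback$new(keep_years),
  chunk_size = 100000
)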

How should I proceed in this case?

Thanks in advance!

