[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

Max Lapshin <>
Fri Mar 23 11:30:32 CET 2012


I need to load a large CSV file into memory very fast.

I tried parsing it in pure Erlang, but the results were very poor
(file:read_line in particular is very slow), so I wrote a NIF for it.
The target speed is 1 microsecond per line.


My CSV has a very strict format: only numbers, no quoting, and \n at
the end of each line. I also moved the parsing of integers and dates
into the NIF.
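
To make the "strict format" point concrete, here is a minimal sketch in C
of parsing one such line without any library calls. It is not the code
from csv_reader: the function name parse_int_fields is hypothetical, and
it assumes a comma delimiter and plain decimal integers (the post names
neither the delimiter nor the date layout). Overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parse a line of comma-separated decimal integers
 * (optional leading '-', no quoting, optional trailing '\n').
 * Stores at most max_fields values in out.
 * Returns the number of fields parsed, or -1 on malformed input or if
 * the line has more than max_fields fields. No overflow checking. */
int parse_int_fields(const char *line, size_t len, long *out, size_t max_fields)
{
    assert(line != NULL && out != NULL);
    size_t i = 0;
    size_t n = 0;

    /* Ignore one trailing newline, if present. */
    if (len > 0 && line[len - 1] == '\n')
        len--;

    while (i < len) {
        if (n == max_fields)
            return -1;

        int neg = 0;
        if (line[i] == '-') {
            neg = 1;
            i++;
        }
        /* At least one digit is required. */
        if (i >= len || line[i] < '0' || line[i] > '9')
            return -1;

        long v = 0;
        while (i < len && line[i] >= '0' && line[i] <= '9') {
            v = v * 10 + (line[i] - '0');
            i++;
        }
        out[n++] = neg ? -v : v;

        if (i < len) {
            if (line[i] != ',')
                return -1;      /* unexpected character */
            i++;
            if (i == len)
                return -1;      /* trailing comma */
        }
    }
    return (int)n;
}
```

Because the format is fixed, a single pass over the bytes like this avoids
strtol, locale handling, and any intermediate string copies.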

My results are here: http://github.com/maxlapshin/csv_reader. So far I
get only about 15 microseconds per line: 4.5 seconds for a 300K-line CSV.

Currently I use fgets to read the file line by line. Maybe that is a
bad idea and I should use mmap or read through a 1 MB buffer instead?
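
For reference, the buffered alternative to fgets could look roughly like
the sketch below: read the file in large chunks with fread, split on '\n'
with memchr, and carry any unfinished line over to the next chunk. The
names read_lines_buffered and line_cb are hypothetical, the buffer size is
a parameter (1 MB would be 1 << 20), and whether this is actually faster
than fgets for this workload would need to be measured.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: receives one line without its trailing '\n'. */
typedef void (*line_cb)(const char *line, size_t len, void *ctx);

/* Read f in chunks of up to buf_size bytes and call cb (if non-NULL) once
 * per line. Returns the number of lines seen, or -1 on allocation failure
 * or if a single line does not fit in the buffer. */
long read_lines_buffered(FILE *f, size_t buf_size, line_cb cb, void *ctx)
{
    assert(f != NULL && buf_size > 0);
    char *buf = malloc(buf_size);
    if (buf == NULL)
        return -1;

    size_t have = 0;    /* bytes of an unfinished line carried over */
    long lines = 0;
    size_t got;

    while ((got = fread(buf + have, 1, buf_size - have, f)) > 0) {
        size_t end = have + got;
        size_t start = 0;
        char *nl;

        /* Hand every complete line in the buffer to the callback. */
        while ((nl = memchr(buf + start, '\n', end - start)) != NULL) {
            size_t len = (size_t)(nl - (buf + start));
            if (cb != NULL)
                cb(buf + start, len, ctx);
            lines++;
            start += len + 1;
        }

        /* Move the unfinished tail to the front for the next read. */
        have = end - start;
        if (have == buf_size) {     /* a full buffer with no newline */
            free(buf);
            return -1;
        }
        memmove(buf, buf + start, have);
    }

    /* A final line without a trailing newline still counts. */
    if (have > 0) {
        if (cb != NULL)
            cb(buf, have, ctx);
        lines++;
    }
    free(buf);
    return lines;
}
```

A call such as read_lines_buffered(f, 1 << 20, on_line, state) would then
replace the fgets loop, with on_line doing the per-line field parsing.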


