[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

Bengt Kleberg bengt.kleberg@REDACTED
Fri Mar 23 11:42:57 CET 2012


Greetings,

With a working NIF you probably do not need to consider file:read_file/1
and binary:split/3 (with the global option) on <nl>. Otherwise that is an
alternative to file:read_line/1.
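
A minimal sketch of that alternative, for illustration only. The module
name csv_sketch and the functions load/1, parse/1 and parse_line/1 are
made up here, and the code assumes every field is a plain integer (no
dates, no quoting). It uses binary:split/3 with [global], since
binary:split/2 only splits at the first match, and binary_to_integer/1,
which needs a newer OTP release than was current when this was posted.
It has not been benchmarked against the NIF.

```erlang
-module(csv_sketch).
-export([load/1, parse/1]).

%% Read the whole file into one binary, then parse it in memory.
%% Crashes with badmatch if the file cannot be read.
load(Path) ->
    {ok, Bin} = file:read_file(Path),
    parse(Bin).

%% Split the binary on newlines in one pass. The trim option drops the
%% empty trailing part left by the final "\n"; the filter skips any
%% other blank lines.
parse(Bin) ->
    Lines = binary:split(Bin, <<"\n">>, [global, trim]),
    [parse_line(Line) || Line <- Lines, Line =/= <<>>].

%% Assumption: comma-separated integer fields only.
parse_line(Line) ->
    Fields = binary:split(Line, <<",">>, [global]),
    [binary_to_integer(Field) || Field <- Fields].
```

For example, csv_sketch:parse(<<"1,2\n3,4\n">>) returns [[1,2],[3,4]].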


bengt

On Fri, 2012-03-23 at 11:30 +0100, Max Lapshin wrote:
> I need to load a large CSV file into memory very fast.
> 
> I've tried to use an Erlang parser, but my results were very bad (in fact
> file:read_line is very slow), so I've tried to make a NIF for it.
> The target speed is 1 microsecond per line.
> 
> 
> My CSV has a very strict format: only numbers, no quoting, \n at the
> end. I also moved the parsing of integers and dates into the NIF.
> 
> My results are here: http://github.com/maxlapshin/csv_reader and I get
> only 15 microseconds per line: 4.5 seconds for a 300K-line CSV.
> 
> Currently I use fgets to read the file line by line. Maybe that is a
> bad idea and I should use mmap or implement a 1 MB read buffer?



