[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

Tim Watson watson.timothy@REDACTED
Fri Mar 23 13:55:51 CET 2012


On 23 Mar 2012, at 12:54, Gordon Guthrie wrote:

> Max
> 
> There is a CSV parser for RFC 4180-compliant CSV files, which is now
> being maintained by Eric Merritt:
> https://github.com/afiniate/erfc_parsers/tree/master/src
> 

This appears to read the whole file into memory, so it is probably not very space-efficient for large files.
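
For comparison, here is a minimal C sketch of the fixed-size buffer approach raised in the quoted message below, as opposed to slurping the whole file. It is not taken from csv_reader or erfc_parsers; the names csv_stats and csv_scan are made up for illustration, and it assumes the strict format described in the thread (integer fields only, ',' separators, '\n' line endings, no quoting). Date parsing and the NIF glue are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: counters filled in by csv_scan. */
typedef struct {
    long lines;     /* number of '\n' characters seen */
    long fields;    /* number of non-empty integer fields seen */
    long long sum;  /* sum of all fields, used as a cheap checksum */
} csv_stats;

/*
 * Scan strictly formatted CSV from fp using a buffer of chunk_size bytes.
 * Parser state lives outside the buffer, so a field or line may straddle
 * a chunk boundary and memory use stays bounded by chunk_size.
 * Empty fields are skipped; overflow is not checked in this sketch.
 * Returns 0 on success, -1 on allocation failure, read error, or an
 * unexpected character.
 */
static int csv_scan(FILE *fp, size_t chunk_size, csv_stats *out)
{
    assert(chunk_size > 0);
    char *buf = malloc(chunk_size);
    if (buf == NULL)
        return -1;

    long long value = 0;
    int negative = 0;
    int in_field = 0;
    size_t n;

    memset(out, 0, sizeof *out);

    while ((n = fread(buf, 1, chunk_size, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            char c = buf[i];
            if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                in_field = 1;
            } else if (c == '-' && !in_field) {
                negative = 1;
                in_field = 1;
            } else if (c == ',' || c == '\n') {
                if (in_field) {
                    out->sum += negative ? -value : value;
                    out->fields++;
                }
                if (c == '\n')
                    out->lines++;
                value = 0;
                negative = 0;
                in_field = 0;
            } else {
                free(buf);
                return -1; /* anything else violates the strict format */
            }
        }
    }

    /* Account for a final field that is not followed by '\n'. */
    if (in_field) {
        out->sum += negative ? -value : value;
        out->fields++;
    }

    free(buf);
    return ferror(fp) ? -1 : 0;
}
```

Calling csv_scan(fp, 1 << 20, &stats) would give the 1MB buffer mentioned in the question. Whether this actually beats fgets or mmap on a given file is something to measure rather than assume.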

> Gordon
> 
> On 23 March 2012 10:30, Max Lapshin <max.lapshin@REDACTED> wrote:
>> I need to load a large CSV file into memory very fast.
>> 
>> I've tried using an Erlang parser, but the results were very bad (in fact,
>> file:read_line is very slow), so I've tried writing a NIF for it.
>> The target speed is 1 microsecond per line.
>> 
>> 
>> My CSV has a very strict format: only numbers, no quoting, \n at the
>> end of each line. I also moved the parsing of integers and dates into the NIF.
>> 
>> My results are here: http://github.com/maxlapshin/csv_reader and I get
>> only 15 microseconds per line: 4.5 seconds for a 300K-line CSV.
>> 
>> Currently I use fgets to read the file line by line. Maybe that is a
>> bad idea and I should use mmap or implement a 1MB read buffer?
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
> 
> 
> 
> -- 
> Gordon Guthrie
> CEO hypernumbers
> 
> http://hypernumbers.com
> t: hypernumbers
> +44 7776 251669



