[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

Max Lapshin max.lapshin@REDACTED
Mon Mar 26 10:37:17 CEST 2012


On Mon, Mar 26, 2012 at 12:33 PM, Robert Melton <rmelton@REDACTED> wrote:
>
> Agreed.  Do we have any baseline implementation in pure C or (insert
> fastest language/implementation you are aware of)?  I am working on
> speeding this up (and having a lot of fun!), but I have no idea the
> theory-craft maximum process speed (with proper escaping, etc) on my
> hardware.
>

I really can't understand why parsing should be slower than reading from HDD =)

However, it is slower. Currently I get 950 ms for a 300K-line CSV with
40 float columns when it is read on a cold system, and 820 ms when it
is read from the disk cache.

Copying from the kernel cache and reading all the data byte by byte
while searching for '\n' takes 100 ms (that is the time of wc -l), so
of the 820 ms cached run about 700 ms is spent by Erlang parsing and
creating all the proper objects.
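
For anyone who wants to reproduce that 100 ms floor without wc, here is a
minimal sketch in C of the same kind of work: read the file in chunks and
look at every byte for '\n'. The function names and the 64 KB chunk size are
my own choices for illustration, not taken from wc or from any CSV library,
and it does no CSV escaping at all, so it only bounds the I/O plus scan cost.

```c
#include <stddef.h>
#include <stdio.h>

/* Count '\n' bytes in buf[0..len), one byte at a time. */
size_t count_newlines(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            n++;
    }
    return n;
}

/* Read a file in fixed-size chunks and count its newlines.
 * Returns -1 if the file cannot be opened or a read error occurs.
 * The 64 KB chunk size is an arbitrary assumption. */
long count_lines_in_file(const char *path)
{
    static char chunk[1 << 16];
    long total = 0;
    size_t got;
    int failed;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    while ((got = fread(chunk, 1, sizeof chunk, f)) > 0)
        total += (long)count_newlines(chunk, got);

    failed = ferror(f);
    fclose(f);
    return failed ? -1 : total;
}
```

Timing count_lines_in_file on the same CSV, once cold and once from the disk
cache, should give a figure comparable to the wc -l number above; whatever
the parser takes beyond that is parsing and object creation.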


