[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second
Max Lapshin
max.lapshin@REDACTED
Mon Mar 26 10:37:17 CEST 2012
On Mon, Mar 26, 2012 at 12:33 PM, Robert Melton <rmelton@REDACTED> wrote:
>
> Agreed. Do we have any baseline implementation in pure C or (insert
> fastest language/implementation you are aware of)? I am working on
> speeding this up (and having a lot of fun!), but I have no idea what
> the theory-craft maximum processing speed (with proper escaping, etc.)
> is on my hardware.
>
I really can't understand why parsing should be slower than reading from
the HDD =)
However, it is slower. Currently it takes 950 ms to parse a 300K-line CSV
with 40 float columns on a cold system, and 820 ms when the file is
already in the disk cache.
Copying the data from the kernel cache and reading it byte by byte while
searching for '\n' takes 100 ms (that is the time of wc -l), so Erlang
spends about 700 ms parsing the lines and creating all the proper objects.
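For anyone wanting a C baseline for the 100 ms figure: the wc -l
measurement amounts to roughly the loop in the sketch that follows. This
is a minimal sketch, not the code behind the numbers above; the function
names count_newlines and count_newlines_stream and the 64 KiB chunk size
are hypothetical choices. It only finds line ends and does no field
splitting, quote/escape handling or float conversion, so it gives a lower
bound for a CSV parser rather than a full one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count '\n' bytes in a memory buffer, one byte at a time.
 * This is roughly the per-byte work that `wc -l` does. */
static size_t count_newlines(const char *buf, size_t len)
{
    size_t lines = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            lines++;
    }
    return lines;
}

/* Hypothetical chunk size; any reasonably large buffer works. */
#define CHUNK_SIZE (64 * 1024)

/* Read a stream in fixed-size chunks (fread copies the data from the
 * kernel page cache into user space) and count the newlines in each.
 * The static buffer avoids a large stack frame but makes this function
 * non-reentrant. */
static size_t count_newlines_stream(FILE *f)
{
    static char chunk[CHUNK_SIZE];
    size_t total = 0;
    size_t n;

    assert(f != NULL);
    while ((n = fread(chunk, 1, sizeof chunk, f)) > 0)
        total += count_newlines(chunk, n);
    return total;
}
```

To time it against a real file, open the CSV with fopen(path, "rb") and
call count_newlines_stream() from main(); running it once on a cold
system and again with the file cached should separate the disk cost from
the copy-and-scan cost.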