[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

Dmitry Kolesnikov
Mon Mar 26 22:25:13 CEST 2012


Oh, my bad... I had completely forgotten about these aspects of the VM.
I boosted performance so that Max's original file is now parsed at 2 us per line vs. 3.15 us previously, and a full ETL cycle (see my previous mail) takes just 7.8 us per line vs. 8.39 us.

And a very good hint on Boyer-Moore searching...
- dmitry

On Mar 26, 2012, at 11:05 PM, Tim Watson wrote:

> Max, have you seen
> http://blogtrader.net/blog/tim_bray_s_erlang_exercise2. This states
> "0.93 sec on 1 million lines file on my 4-core linux box" which sounds
> pretty impressive and is based on pure Erlang (with some ets thrown
> into the mix by the looks of things). Might be worth looking at
> whether this can potentially out-perform the NIF!
> 
> On 26 March 2012 12:40, Max Lapshin wrote:
>> 
>> 
>>> 
>>> And what do these numbers look like? Do they repeat? Are they short?
>> 
>> Just like in the example CSV. It is trading data.
>> 
>> 
>>> Or are they high-precision and varying wildly in order of magnitude,
>>> and widely distributed statistically?
>> 
>> 
>> They are very close to each other and vary by no more than a few percent.
>> Do you think it is a good place for optimization?
>> 
>> 
>> In fact, I have achieved good enough results: less than a second. Thanks to
>> the whole community for it.
>> 
>> 
>> _______________________________________________
>> erlang-questions mailing list
>> 
>> http://erlang.org/mailman/listinfo/erlang-questions
>> 
