[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

james james@REDACTED
Mon Mar 26 00:48:27 CEST 2012


 > mmap is the fastest way to read lines if you don't much care about
 > portability.

While I think mmap is useful (I can see it as a way to avoid making the
split sequential, since you only need to divide the file roughly and can
then probe for EoL), I think it's worth qualifying your statement.
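
To make the "divide roughly, then probe for EoL" idea concrete, here is a
minimal sketch in Python (used only as a stand-in; nothing here is from the
OP's code). The names line_aligned_chunks and chunk_file are hypothetical,
and the sketch assumes "\n" line endings and a non-empty file.

```python
import mmap


def line_aligned_chunks(buf, n_chunks):
    """Return (start, end) byte ranges covering buf, each cut just after a newline.

    buf may be a bytes object or an mmap; both support len() and find().
    """
    size = len(buf)
    if size == 0:
        return []
    bounds = [0]
    for i in range(1, n_chunks):
        guess = size * i // n_chunks        # rough, even division of the file
        if guess <= bounds[-1]:
            continue                        # a long line already swallowed this guess
        nl = buf.find(b"\n", guess - 1)     # probe forward for the end of the line
        cut = size if nl == -1 else nl + 1
        if bounds[-1] < cut < size:
            bounds.append(cut)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def chunk_file(path, n_chunks):
    """Map the file read-only and compute line-aligned chunk boundaries."""
    with open(path, "rb") as f:
        # Assumption: the file is not empty (mmap refuses to map zero bytes).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return line_aligned_chunks(m, n_chunks)
```

Each range can then be handed to a separate worker, since no line straddles
a boundary. Only the probes touch the data, so finding the boundaries is
cheap compared with a sequential scan for every line ending.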

mmap is NOT necessarily the fastest way to read lines.

The issue is whether the operating system will perform read-ahead in its
disk system and, if not, how many times you fault and wait for real disk
IO. So it's rather important to know whether the file is already in the
operating system's VM cache or actually has to come from a disk drive.
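
One rough way to check this on a given machine is to time the same scan
through mmap and through plain buffered reads. The sketch below is only an
illustration in Python; the function names and the sample file are
hypothetical. It assumes Python 3.8+ for mmap.madvise, and skips the
read-ahead hint on platforms that lack it.

```python
import mmap
import os
import tempfile
import time


def count_lines_mmap(path, hint_sequential=True):
    """Count newlines through a read-only mapping of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Ask the OS for sequential read-ahead, where the platform supports it.
            if hint_sequential and hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count


def count_lines_read(path, bufsize=1 << 20):
    """Count newlines using ordinary buffered reads of bufsize bytes."""
    count = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(bufsize)
            if not block:
                break
            count += block.count(b"\n")
    return count


def timed(fn, path):
    """Return (result, elapsed seconds) for one call of fn(path)."""
    t0 = time.perf_counter()
    result = fn(path)
    return result, time.perf_counter() - t0


def write_sample(n_lines):
    """Write a small hypothetical CSV file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as f:
        f.write(b"1.5,2.5,3.5\n" * n_lines)
    return path
```

A file that has just been written or read will almost certainly be in the
cache, so timings from a second run say little about the cold case. To see
the disk cost you would need a file larger than RAM, or a cache dropped by
some OS-specific means.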

As a first cut it would be handy to know how many atof calls per second
the OP's hardware can do, compared with how many numbers are in the file.
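
A back-of-envelope version of that measurement might look like the sketch
below. It uses Python's float() as a stand-in for atof, so the absolute rate
will differ from C or Erlang. The token, the sample size and the
fields-per-line figure are all assumptions; the thread does not give them.

```python
import time


def floats_per_second(n=200_000, token="3.14159"):
    """Measure how many string-to-float conversions per second this machine does."""
    tokens = [token] * n
    t0 = time.perf_counter()
    for t in tokens:
        float(t)
    elapsed = max(time.perf_counter() - t0, 1e-9)   # guard against a zero reading
    return n / elapsed


def parse_budget_seconds(n_lines, fields_per_line, rate):
    """Lower bound on time spent purely converting numbers at the given rate."""
    return n_lines * fields_per_line / rate
```

For example, parse_budget_seconds(1_000_000, 10, floats_per_second()) gives
the time needed just for the conversions in a hypothetical file of 1M lines
with 10 numeric fields each. If that alone exceeds a second, no amount of IO
tuning will hit the target.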



