[erlang-questions] Speed of CSV parsing: how to read 1M of lines in 1 second

Sat Mar 24 13:00:21 CET 2012

On 2012-03-24, at 04:07, Toby Thain wrote:
> On 23/03/12 9:03 AM, Tomasz Maciejewski wrote:
>> On 23 March 2012 at 11:30, Max Lapshin
>> <max.lapshin@REDACTED> wrote:
>>> Currently I use fgets to read the file line by line. Maybe that is a
>>> bad idea, and I should use mmap or implement a 1MB read buffer instead?
>> 
>> mmap is the fastest way to read lines if you don't much care about portability.
>> 
> 
> Even Windows offers mmap functionality.
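
For comparison with the per-line fgets() loop in the question, here is a minimal C sketch of the "1MB buffer" alternative: read the file in large chunks with fread() and scan each chunk with memchr(). It only counts lines (no CSV field splitting), and both the helper name count_lines_buffered and the 1 MB size are illustrative assumptions, not code from this thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative buffer size: 1 MB, as suggested in the question. */
#define READ_BUF_SIZE (1024 * 1024)

/* Hypothetical helper: count lines in `path` by reading it in 1 MB chunks
 * instead of calling fgets() once per line. A final line without a
 * trailing '\n' is counted too. Returns -1 on error. */
long count_lines_buffered(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    char *buf = malloc(READ_BUF_SIZE);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }

    long lines = 0;
    char last = '\n';            /* last byte seen so far; '\n' for an empty file */
    size_t n;
    while ((n = fread(buf, 1, READ_BUF_SIZE, f)) > 0) {
        /* memchr() jumps straight from one '\n' to the next in the chunk. */
        const char *p = buf;
        const char *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
        last = buf[n - 1];
    }

    int failed = ferror(f);
    free(buf);
    fclose(f);
    if (failed)
        return -1;
    if (last != '\n')            /* unterminated final line */
        lines++;
    return lines;
}
```

Because lines are only counted, a line split across two chunks needs no special handling; a real CSV parser would have to carry the partial line over into the next chunk.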

Yep. Although the exact options may differ, all modern OSes can memory-map
files, so there really is no reason *not* to use it, especially on a
64-bit OS, where there is no risk of trying to mmap a file bigger than
the virtual address space (VMEM).
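
A minimal POSIX C sketch of the mmap approach, assuming a Unix-like system (on Windows the equivalent calls are CreateFileMapping/MapViewOfFile). It maps the whole file read-only and counts lines with memchr(), so there is no per-line library call and no copy into a user buffer. count_lines_mmap is a hypothetical helper name; on a 32-bit system the mmap() call can fail for files larger than the free address space, which is the 64-bit point made above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count lines in `path` by memory-mapping the whole
 * file. A final line without a trailing '\n' is counted too.
 * Returns -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    /* Scan the mapped bytes directly; pages are faulted in by the kernel. */
    long lines = 0;
    const char *p = data;
    const char *end = data + len;
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }
    if (data[len - 1] != '\n')   /* unterminated final line */
        lines++;

    munmap((void *)data, len);
    return lines;
}
```

Calling count_lines_mmap("data.csv") returns the line count, or -1 if the file cannot be opened, stat'ed or mapped.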

But due to Erlang semantics, I believe mmap would have to be supported
by the VM itself to work correctly behind an ideal API (so that it could
be integrated with binary handling, e.g. as a special kind of refc
binary); otherwise you have to use a file-type interface à la emmap:
https://github.com/krestenkrab/emmap