[erlang-questions] Large Flat file operations - optimization.

dmitry kolesnikov <>
Mon Jun 11 13:51:18 CEST 2012


Hi,

The technique you have chosen will not give you optimal performance or
memory footprint. The best option would be a binary-based parser.
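
To make the binary-based idea concrete, here is a minimal sketch of reading a
file in 1024-byte chunks as binaries and carrying the unfinished last line over
to the next chunk. The module and function names (chunk_lines, fold_lines/3,
split_chunk/2) are made up for this illustration and are not the API of any
library mentioned in this thread; error handling is left out on purpose.

```erlang
-module(chunk_lines).
-export([fold_lines/3, split_chunk/2]).

%% Fold Fun over every line of the file at Path, reading 1024-byte chunks.
%% Fun is called as Fun(LineBinary, Acc) and must return the new Acc.
%% Assumption: a read error simply crashes the caller (no {error, _} clause).
fold_lines(Path, Fun, Acc0) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try
        loop(Fd, <<>>, Fun, Acc0)
    after
        file:close(Fd)
    end.

loop(Fd, Rest, Fun, Acc) ->
    case file:read(Fd, 1024) of
        {ok, Data} ->
            {Lines, Rest1} = split_chunk(Rest, Data),
            loop(Fd, Rest1, Fun, lists:foldl(Fun, Acc, Lines));
        eof when Rest =:= <<>> ->
            Acc;
        eof ->
            %% The last line had no trailing newline.
            Fun(strip_cr(Rest), Acc)
    end.

%% Prepend the leftover of the previous chunk, split on \n, and return
%% the complete lines plus the unfinished tail.
split_chunk(Rest, Data) ->
    Parts = binary:split(<<Rest/binary, Data/binary>>, <<"\n">>, [global]),
    {Lines, [Tail]} = lists:split(length(Parts) - 1, Parts),
    {[strip_cr(L) || L <- Lines], Tail}.

%% Drop a trailing \r so that \r\n line endings do not leak into the data.
strip_cr(Line) when byte_size(Line) > 0 ->
    case binary:last(Line) of
        $\r -> binary:part(Line, 0, byte_size(Line) - 1);
        _   -> Line
    end;
strip_cr(Line) ->
    Line.
```

With the sample data from your mail, split_chunk(<<>>, <<"1,2,3\n3,4,5\n7,8">>)
yields the two complete lines and keeps <<"7,8">> as the remainder for the next
read. Only one chunk plus one partial line is held in memory at a time.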

Take a look at my CSV parser. It does both sequential and parallel
parsing of CSV files. The target was 300k lines in less than a second:
https://github.com/fogfish/csv

Best Regards,
Dmitry >-|-|-*>


On 11.6.2012, at 13.23, Maruthavanan Subbarayan <>
wrote:

 Hi,

I am about to process flat files, which may be CSV or plain line-based files.

I learned about CSV parsing from the link below.

http://blog.vmoroz.com/2011/01/csv-in-erlang.html

But I do not want to keep a big file in Erlang VM memory, and I also want to
handle line-based files.

So I changed the parse code as below so that it separates the lines and
gives me back the remainder, while I read in chunks of 1024 (or X) bytes.

parse1([],Lines,CurrentLine)->
        {lists:reverse([lists:reverse(L) || L <- Lines]),lists:reverse(CurrentLine)};
parse1([$\r|[]],Lines,CurrentLine) ->
        {lists:reverse([lists:reverse(L) || L <- [CurrentLine|Lines]]),[]};
parse1([$\n|[]],Lines,CurrentLine) ->
        {lists:reverse([lists:reverse(L) || L <- [CurrentLine|Lines]]),[]};
parse1([C|T],Lines,CurrentLine) when C == $\r; C == $\n ->
        parse1(T,[CurrentLine|Lines],[]);
parse1([C|T],Lines,CurrentLine) ->
        parse1(T,Lines,[C|CurrentLine]).

Now, to convert each line into a CSV record, I am thinking of the below.
2> {ok,Data}=file:read(IO,1024), {List,Remaining}=parse1(Data,[],[]).  %%Sample data
{["1,2,3","3,4,5"],"7,8"}
3> [string:tokens(N,",")|| N <- List]. %% Sample Data
[["1","2","3"],["3","4","5"]]

But does string:tokens perform well when handling huge data? Can I use
a list comprehension or something else that performs better? Kindly
suggest.
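
One behaviour worth checking alongside speed: string:tokens/2 treats a run of
adjacent separators as a single separator, so empty CSV fields are silently
dropped. A minimal sketch of the difference, with binary:split/3 as one
possible alternative (the module name field_split is made up for this example,
and neither call handles quoted fields):

```erlang
-module(field_split).
-export([demo/0]).

demo() ->
    %% string:tokens/2 collapses adjacent separators, so the empty
    %% second field disappears and the column positions shift.
    ["1", "3"] = string:tokens("1,,3", ","),
    %% binary:split/3 with [global] keeps the empty field as <<>>.
    [<<"1">>, <<>>, <<"3">>] = binary:split(<<"1,,3">>, <<",">>, [global]),
    ok.
```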

Thanks,
Marutha

