[erlang-questions] Large Flat file operations - optimization.

Maruthavanan Subbarayan <>
Mon Jun 11 12:23:29 CEST 2012


Hi,
I am about to process flat files, which may be CSV or simply line-based.
I learned about CSV parsing in Erlang from the link below:
http://blog.vmoroz.com/2011/01/csv-in-erlang.html

But I do not want to hold a big file in the Erlang VM's memory, and I also want the same code to handle plain line-based files.
So I changed the parsing code as below. It splits the data into lines and returns the leftover partial line, so I can read the file in chunks of 1024 (or X) bytes:
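parse1([], Lines, CurrentLine) ->
    %% End of chunk: return the complete lines plus the unfinished last line.
    {lists:reverse([lists:reverse(L) || L <- Lines]), lists:reverse(CurrentLine)};
parse1([$\r, $\n | T], Lines, CurrentLine) ->
    %% Treat CRLF as one line break (otherwise an empty line is emitted).
    parse1(T, [CurrentLine | Lines], []);
parse1([$\r], Lines, CurrentLine) ->
    %% Chunk ends exactly on a line break: nothing is left over.
    {lists:reverse([lists:reverse(L) || L <- [CurrentLine | Lines]]), []};
parse1([$\n], Lines, CurrentLine) ->
    {lists:reverse([lists:reverse(L) || L <- [CurrentLine | Lines]]), []};
parse1([C | T], Lines, CurrentLine) when C == $\r; C == $\n ->
    %% Line break: push the finished (still reversed) line, start a new one.
    parse1(T, [CurrentLine | Lines], []);
parse1([C | T], Lines, CurrentLine) ->
    %% Ordinary character: accumulate in reverse for cheap prepending.
    parse1(T, Lines, [C | CurrentLine]).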
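parse1/3 leaves it to the caller to carry the partial last line over into the next chunk. As a possible alternative (a different technique from the one in the linked post), OTP's file:read_line/1 on a file opened with read_ahead does the buffering and line splitting itself, so only the current line and the read-ahead buffer are held in memory. Below is a minimal sketch, assuming binary mode; the module line_reader and the function fold_lines/3 are hypothetical names, not part of OTP:

```erlang
-module(line_reader).
-export([fold_lines/3]).

%% Hypothetical helper: calls Fun(Line, Acc) for every line of File
%% without loading the whole file. {read_ahead, 65536} makes the runtime
%% buffer reads, so read_line/1 does not go to disk once per line.
fold_lines(File, Fun, Acc0) ->
    {ok, IO} = file:open(File, [read, binary, {read_ahead, 65536}]),
    try
        loop(IO, Fun, Acc0)
    after
        file:close(IO)
    end.

loop(IO, Fun, Acc) ->
    case file:read_line(IO) of
        {ok, Line} ->
            loop(IO, Fun, Fun(strip_eol(Line), Acc));
        eof ->
            Acc;
        {error, Reason} ->
            erlang:error({read_failed, Reason})
    end.

%% read_line/1 keeps the trailing newline; drop it ("\r\n" is handled
%% too). A last line without a newline is returned unchanged.
strip_eol(Line) ->
    [Body | _] = binary:split(Line, [<<"\r\n">>, <<"\n">>]),
    Body.
```

For example, line_reader:fold_lines("data.csv", fun(_Line, N) -> N + 1 end, 0) would count the lines of a (hypothetical) data.csv.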
But now, to turn each line into a CSV record, I am thinking of the following (with IO opened in list mode, since parse1/3 works on character lists):

2> {ok, Data} = file:read(IO, 1024).
3> {List, Remaining} = parse1(Data, [], []).
%% sample result: {["1,2,3","3,4,5"],"7,8"}
4> [string:tokens(N, ",") || N <- List].
%% sample result: [["1","2","3"],["3","4","5"]]
But does string:tokens perform well when handling huge amounts of data? Is there another list comprehension, or some other approach, that would perform better? Any suggestions are welcome.
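One thing worth knowing about string:tokens/2 is that it treats consecutive separators as one, so "1,,3" yields ["1","3"] and empty CSV fields silently disappear. A sketch that works on binary lines instead uses binary:split/3 with the global option, which keeps empty fields. Binaries are also much more compact than character lists (each list element costs a cons cell of two machine words), so for large inputs the data representation likely matters more than the choice of tokenizer. The module csv_fields and its functions are hypothetical names, and quoted fields are deliberately not handled:

```erlang
-module(csv_fields).
-export([split_line/1, split_lines/1]).

%% Split one line (a binary without its newline) on commas.
%% Unlike string:tokens/2, empty fields are kept: <<"1,,3">> -> 3 fields.
%% Assumption: no quoting, so a comma inside "..." is split as well.
split_line(Line) when is_binary(Line) ->
    binary:split(Line, <<",">>, [global]).

%% The same list comprehension as in the question, over binaries.
split_lines(Lines) ->
    [split_line(L) || L <- Lines].
```

If quoted fields must be supported, a character-by-character parser like the one in the linked post is still needed; binary:split/3 only covers the simple case.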
Thanks,
Marutha


