[erlang-questions] Large Flat file operations - optimization.
dmitry kolesnikov
dmkolesnikov@REDACTED
Mon Jun 11 13:51:18 CEST 2012
Hi,
The chosen technique will not give you optimal performance or a small memory
footprint. The best option would be a binary-based parser.
Take a look at my CSV parser. It does both sequential and parallel
parsing of CSV files. The target was 300k lines in less than a second:
https://github.com/fogfish/csv
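A minimal sketch of the binary-based idea, applied to the chunked reading described in the question below. This is not the API of the csv library linked above. The module name `chunk_lines` and the function `split/2` are hypothetical. The sketch assumes each chunk is a binary, for example from `file:read/2` on a file opened in `binary` mode, and it carries the unfinished last line over to the next chunk.

```erlang
%% Hypothetical module: binary-based, chunk-at-a-time line splitting.
-module(chunk_lines).
-export([split/2]).

%% split(Pending, Chunk) -> {CompleteLines, NewPending}
%% Pending is the unfinished line carried over from the previous chunk
%% (use <<>> for the first chunk). Lines are returned without terminators.
split(Pending, Chunk) when is_binary(Pending), is_binary(Chunk) ->
    Data = <<Pending/binary, Chunk/binary>>,
    %% <<"\r\n">> starts one byte before the <<"\n">> inside it, so the
    %% earliest match wins and a CRLF pair is removed as one terminator.
    Parts = binary:split(Data, [<<"\r\n">>, <<"\n">>], [global]),
    %% The last part had no terminator after it: it is either a partial
    %% line or <<>> when the data ended exactly on a line break.
    {Lines, [Rest]} = lists:split(length(Parts) - 1, Parts),
    {Lines, Rest}.
```

A CR at the very end of a chunk stays in `NewPending`, so a CRLF split across two chunks is still treated as a single line break. At end of file, a non-empty `NewPending` is the last line.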
Best Regards,
Dmitry >-|-|-*>
On 11.6.2012, at 13.23, Maruthavanan Subbarayan <maruthavanan_s@REDACTED> wrote:
Hi,
I am about to process flat files, which may be CSV or line-based.
I learned about CSV parsing from the link below:
http://blog.vmoroz.com/2011/01/csv-in-erlang.html
But I do not want to keep a big file in the Erlang VM's memory, and I also want
to handle line-based files.
So I changed the parse code as below, so that it separates the lines and
returns the unfinished remainder, letting me read the file in chunks of 1024
(or X) bytes:
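%% Splits a chunk (a list of characters) into complete lines and the
%% unfinished last line. Lines and CurrentLine are built in reverse.
parse1([], Lines, CurrentLine) ->
    %% End of chunk: complete lines plus the unfinished remainder.
    {lists:reverse([lists:reverse(L) || L <- Lines]), lists:reverse(CurrentLine)};
parse1([$\r, $\n | T], Lines, CurrentLine) ->
    %% Treat CRLF as one terminator so no empty line is produced.
    parse1(T, [CurrentLine | Lines], []);
parse1([$\r], Lines, CurrentLine) ->
    {lists:reverse([lists:reverse(L) || L <- [CurrentLine | Lines]]), []};
parse1([$\n], Lines, CurrentLine) ->
    {lists:reverse([lists:reverse(L) || L <- [CurrentLine | Lines]]), []};
parse1([C | T], Lines, CurrentLine) when C == $\r; C == $\n ->
    parse1(T, [CurrentLine | Lines], []);
parse1([C | T], Lines, CurrentLine) ->
    parse1(T, Lines, [C | CurrentLine]).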
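If the goal is only to avoid loading the whole file, the standard library can already read line by line. This is a different technique from the hand-written chunking above. `file:read_line/1` on a file opened with `read_ahead` buffers reads internally. The module and function names below (`line_reader`, `fold_lines/3`) are hypothetical. The sketch assumes binary mode.

```erlang
%% Hypothetical module: streams a file line by line with file:read_line/1.
-module(line_reader).
-export([fold_lines/3]).

%% fold_lines(Path, Fun, Acc0) calls Fun(Line, Acc) for each line, in
%% file order, and returns the final accumulator. Only the read-ahead
%% buffer is held in memory, never the whole file.
fold_lines(Path, Fun, Acc0) ->
    {ok, Fd} = file:open(Path, [read, binary, raw, read_ahead]),
    try
        loop(Fd, Fun, Acc0)
    after
        file:close(Fd)
    end.

loop(Fd, Fun, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            loop(Fd, Fun, Fun(strip_lf(Line), Acc));
        eof ->
            Acc;
        {error, Reason} ->
            erlang:error({read_line_failed, Reason})
    end.

%% read_line/1 returns the line with its trailing LF (according to the
%% file module documentation, a CR directly before the LF is dropped).
%% Remove the LF if present; the last line may have none.
strip_lf(Line) ->
    case byte_size(Line) of
        0 -> Line;
        N ->
            case binary:at(Line, N - 1) of
                $\n -> binary:part(Line, 0, N - 1);
                _ -> Line
            end
    end.
```

For example, `line_reader:fold_lines("data.csv", fun(_Line, N) -> N + 1 end, 0)` would count the lines of a (hypothetical) `data.csv`.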
But now, to convert each line into a CSV record, I am thinking of the following:
2> {ok, Data} = file:read(IO, 1024), {List, Remaining} = parse1(Data, [], []). %% Sample data
{["1,2,3","3,4,5"],"7,8"}
3> [string:tokens(N,",")|| N <- List]. %% Sample Data
[["1","2","3"],["3","4","5"]]
But does string:tokens perform well on huge data? Is there another list
comprehension or something else that would perform better? Kindly
suggest.
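One alternative to `string:tokens/2` is `binary:split/3` on binary lines. It avoids building a list of characters for every field. Whether it is faster for a given data set is worth measuring, for example with `timer:tc/3`. There is also a behavioural difference: `string:tokens/2` collapses consecutive separators, so an empty CSV field disappears. The module `csv_fields` below is a hypothetical sketch, and it deliberately ignores quoted fields.

```erlang
%% Hypothetical module: splitting CSV lines into fields with binary:split/3.
-module(csv_fields).
-export([fields/1, records/1]).

%% fields(Line) splits one line on commas. binary:split/3 with [global]
%% keeps empty fields, so <<"1,,3">> yields three fields, whereas
%% string:tokens/2 would collapse the ",," and return only two.
%% Quoted fields that contain commas are NOT handled by this sketch.
fields(Line) when is_binary(Line) ->
    binary:split(Line, <<",">>, [global]).

%% records(Lines) converts a list of binary lines into a list of records,
%% the binary counterpart of the string:tokens/2 list comprehension.
records(Lines) ->
    [fields(L) || L <- Lines].
```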
Thanks,
Marutha