<html><head></head><body bgcolor="#FFFFFF"><div>Hi,</div><div><br></div><div>The technique you chose will not give you optimal performance or memory footprint. The best option would be a binary-based parser.</div><div><br>

</div><div>Take a look at my CSV parser. It does both sequential and parallel parsing of CSV files. The target was 300k lines in less than a second.</div><a href="https://github.com/fogfish/csv">https://github.com/fogfish/csv</a><div>
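<div>To illustrate what "binary based" could mean here, below is a minimal sketch. It is not the code from the repository above. It assumes the file is opened with the <code>binary</code> option, that rows end in \n or \r\n, and that fields contain no quoted commas or newlines. The module name csv_bin_sketch and the function parse_chunk/2 are hypothetical names chosen for this example.</div>

```erlang
-module(csv_bin_sketch).
-export([parse_chunk/2]).

%% Parse one chunk of binary data (hypothetical helper, not a library API).
%% Rest0 is the unterminated tail left over from the previous chunk
%% (<<>> on the first call).
%% Returns {Rows, Rest}: Rows is a list of rows, each a list of binary
%% fields; Rest is the incomplete last line to pass in with the next chunk.
parse_chunk(Chunk, Rest0) when is_binary(Chunk), is_binary(Rest0) ->
    Data = <<Rest0/binary, Chunk/binary>>,
    Lines = binary:split(Data, <<"\n">>, [global]),
    %% Everything before the last newline is complete; the final element
    %% is the (possibly empty) unterminated tail.
    {Complete, [Rest]} = lists:split(length(Lines) - 1, Lines),
    Rows = [binary:split(strip_cr(L), <<",">>, [global]) || L <- Complete],
    {Rows, Rest}.

%% Drop a trailing \r so that \r\n line endings are handled as well.
strip_cr(Line) when byte_size(Line) > 0 ->
    Size = byte_size(Line) - 1,
    case Line of
        <<Body:Size/binary, $\r>> -> Body;
        _ -> Line
    end;
strip_cr(Line) ->
    Line.
```

<div>A caller would read with file:read(IO, 1024), pass each chunk together with the previous Rest, and treat a non-empty Rest at eof as the final row. The fields returned by binary:split/3 are sub-binaries that reference the chunk instead of copying it into lists, which is where most of the memory saving comes from.</div>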

<br>Best Regards,<div>Dmitry >-|-|-*></div><div><br></div></div><div><br>On 11.6.2012, at 13.23, Maruthavanan Subbarayan <<a href="mailto:maruthavanan_s@hotmail.com">maruthavanan_s@hotmail.com</a>> wrote:<br><br>

</div><div></div><blockquote type="cite"><div>



<div dir="ltr">

Hi,<div><br></div><div>I am about to process flat files, which may be CSV or plain line-based files.</div><div><br></div><div>I learned about CSV parsing from the link below.</div><div><br></div><div><a href="http://blog.vmoroz.com/2011/01/csv-in-erlang.html">http://blog.vmoroz.com/2011/01/csv-in-erlang.html</a>

</div><div><br></div><div>But I do not want to keep a big file in Erlang VM memory, and I also want to handle line-based files.</div><div><br></div><div>So I changed the parse code as below so that it separates the lines and returns the incomplete remainder, letting me read the file in chunks of 1024 (or X) bytes.</div>

<div><br></div><div><div>parse1([],Lines,CurrentLine)-></div><div>        {lists:reverse([lists:reverse(L) ||L <- Lines]),lists:reverse(CurrentLine)};</div><div>parse1([$\r|[]],Lines,CurrentLine) -></div><div>        {lists:reverse([lists:reverse(L) ||L <- [CurrentLine|Lines]]),[]};</div>

<div>parse1([$\n|[]],Lines,CurrentLine) -></div><div>        {lists:reverse([lists:reverse(L) ||L <- [CurrentLine|Lines]]),[]};</div><div>parse1([C|T],Lines,CurrentLine) when C == $\r; C ==$\n -></div><div>        parse1(T,[CurrentLine|Lines],[]);</div>

<div>parse1([C|T],Lines,CurrentLine) -></div><div>        parse1(T,Lines,[C|CurrentLine]).</div></div><div><br></div><div>But now, to convert each line into a CSV record, I am thinking of the below.</div><div>2> {ok,Data}=file:read(IO,1024), {List,Remaining}=<span style="font-size:10pt">parse1(Data,[],[]).  %% Sample data {["1,2,3","3,4,5"],"7,8"}</span></div>

<div>3> [string:tokens(N,",")|| N <- List]. %% Sample data [["1","2","3"],["3","4","5"]]</div><div><br></div><div>But does string:tokens perform well when handling huge data? Can I use a list comprehension or something else that performs better? Kindly suggest.</div>
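<div>One thing to check alongside speed: string:tokens/2 treats adjacent separators as one, so string:tokens("1,,3", ",") gives ["1","3"] and the empty field is lost. A minimal sketch of a list-based splitter that keeps empty fields follows. It assumes no quoted fields, and the module name fields_sketch and the function split_fields/1 are hypothetical names for this example.</div>

```erlang
-module(fields_sketch).
-export([split_fields/1]).

%% Split one line (a string, i.e. a list of characters) on commas,
%% keeping empty fields. Quoted fields are NOT handled in this sketch.
split_fields(Line) ->
    split_fields(Line, [], []).

%% Field accumulates the current field in reverse; Acc holds the
%% finished fields in reverse order.
split_fields([], Field, Acc) ->
    lists:reverse([lists:reverse(Field) | Acc]);
split_fields([$, | T], Field, Acc) ->
    split_fields(T, [], [lists:reverse(Field) | Acc]);
split_fields([C | T], Field, Acc) ->
    split_fields(T, [C | Field], Acc).
```

<div>With this, [fields_sketch:split_fields(N) || N <- List] could replace the string:tokens comprehension above. Whether it is faster would need to be measured on real data.</div>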

<div><br></div><div>Thanks,</div><div>Marutha</div><div><br></div><div><br></div>                                           </div>

</div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>erlang-questions mailing list</span><br><span><a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a></span><br>

<span><a href="http://erlang.org/mailman/listinfo/erlang-questions">http://erlang.org/mailman/listinfo/erlang-questions</a></span><br></div></blockquote></body></html>