[erlang-questions] Running a regular expression on each line of a file

Dave Challis <>
Tue Jan 11 11:28:32 CET 2011


Ah, thanks, io:get_line/1 makes more sense here.

I solved the problem in the end by adding another clause to the case 
statement:

...
Result = re:run(Text, Re, [{capture, all_but_first, list}]),
case Result of
     {match, Captured} ->
         io:format("~p ~p ~p~n", Captured);
     _False ->
         false
end,
...
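
For context, here is a minimal sketch of how the fragment above might sit 
inside a complete line-reading loop built on io:get_line/1.  The module 
name, main/0 and the three-group pattern are assumptions made up for 
illustration, not my actual code; the pattern just has to yield exactly 
three captures to suit the "~p ~p ~p~n" format string.

```erlang
-module(line_parse).            % hypothetical module name
-export([main/0, parse/1]).

%% Compile the pattern once, up front, then loop over standard input.
%% The pattern is a placeholder: three whitespace-separated fields.
main() ->
    {ok, Re} = re:compile("(\\S+) (\\S+) (\\S+)"),
    parse(Re).

%% Read one line at a time until eof, printing the captures of each
%% matching line and silently skipping lines that do not match.
parse(Re) ->
    case io:get_line('') of
        eof ->
            ok;
        {error, Reason} ->
            {error, Reason};
        Text ->
            case re:run(Text, Re, [{capture, all_but_first, list}]) of
                {match, Captured} ->
                    io:format("~p ~p ~p~n", Captured);
                nomatch ->
                    false
            end,
            parse(Re)
    end.
```

Matching on nomatch explicitly, rather than a catch-all clause, would let 
any unexpected return value from re:run/3 surface as a case_clause error 
instead of being silently ignored.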

It's still pretty slow to run though (~10 minutes to parse ~1 million 
lines).  Interestingly enough, I tried removing the regex and just 
passing the input out unchanged, and the whole thing still takes ~9m30s 
to run, so the bottleneck wasn't in the regex as I'd assumed.

I'm guessing that io:get_line is being pretty slow.  Is there a 
preferred method for faster I/O?



On 10/01/11 17:42, Jesper Louis Andersen wrote:
> On Mon, Jan 10, 2011 at 17:02, Dave Challis<>  wrote:
>
>> parse(Re) ->
>>     case io:get_chars('', 8192) of
>
> Since your data is line-oriented, try io:get_line/1 here. You are not
> going to get a line of input at a time with your approach I think but
> rather 8K. So how simple are your small inputs?
>


-- 
Dave Challis


