[erlang-questions] Running a regular expression on each line of a file
Dave Challis
dsc@REDACTED
Tue Jan 11 11:28:32 CET 2011
Ah, thanks, io:get_line/1 makes more sense here.
I solved the problem in the end by adding another clause to the case
statement:
...
    Result = re:run(Text, Re, [{capture, all_but_first, list}]),
    case Result of
        {match, Captured} ->
            io:format("~p ~p ~p~n", Captured);
        _False ->
            false
    end,
...
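For reference, a minimal self-contained sketch of the same case
statement, with the fall-through clause spelled out as nomatch (the
value re:run/3 returns when nothing matches). The module name, the
function name and the return values {ok, Captured} and skip are made up
for illustration and are not from the original code:

```erlang
-module(match_line).
-export([match_line/2]).

%% Hypothetical helper: Re is a pattern already compiled with re:compile/1.
%% Returns {ok, Captured} with the captured groups as strings, or skip
%% when the line does not match.
match_line(Text, Re) ->
    case re:run(Text, Re, [{capture, all_but_first, list}]) of
        {match, Captured} ->
            {ok, Captured};
        nomatch ->
            skip
    end.
```

With an illustrative three-group pattern such as
re:compile("^(\\S+) (\\S+) (\\S+)$"), a non-matching line falls into the
nomatch clause instead of crashing with a case_clause error.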
It's still pretty slow to run though (~10 minutes to parse ~1 million
lines). Interestingly enough, I tried removing the regex and just
passing the input through unchanged, and the whole thing still takes ~9m30s
to run, so the bottleneck wasn't in the regex as I'd assumed.
I'm guessing that io:get_line is being pretty slow. Is there a
preferred method for faster I/O?
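One option that may be worth trying is sketched below. It assumes the
input can be opened by path rather than read from standard input, and
the module name, function names and buffer size are made up for
illustration. The file is opened in raw binary mode with a read-ahead
buffer and read with file:read_line/1, which avoids going through the
io server for every line. Whether it is actually faster for this
workload is untested here:

```erlang
-module(fast_lines).
-export([count_matches/2]).

%% Hypothetical example: count the lines of the file at Path that match
%% Pattern. The 64 KB read-ahead size is an arbitrary choice.
count_matches(Path, Pattern) ->
    %% Compile the pattern once instead of once per line.
    {ok, Re} = re:compile(Pattern),
    {ok, Fd} = file:open(Path, [read, raw, binary, {read_ahead, 65536}]),
    try
        loop(Fd, Re, 0)
    after
        file:close(Fd)
    end.

%% Read one line at a time until eof. An {error, Reason} result from
%% file:read_line/1 is not handled and will crash with case_clause.
loop(Fd, Re, Count) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            case re:run(Line, Re, [{capture, none}]) of
                match -> loop(Fd, Re, Count + 1);
                nomatch -> loop(Fd, Re, Count)
            end;
        eof ->
            Count
    end.
```

Each Line is a binary that still includes its trailing newline, so a
pattern anchored with $ or a later io:format call may need to account
for that.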
On 10/01/11 17:42, Jesper Louis Andersen wrote:
> On Mon, Jan 10, 2011 at 17:02, Dave Challis<dsc@REDACTED> wrote:
>
>> parse(Re) ->
>> case io:get_chars('', 8192) of
>
> Since your data is line-oriented, try io:get_line/1 here. You are not
> going to get a line of input at a time with your approach I think but
> rather 8K. So how simple are your small inputs?
>
--
Dave Challis
dsc@REDACTED