Running a regular expression on each line of a file

Dave Challis dsc@REDACTED
Mon Jan 10 17:02:19 CET 2011


I've got a file containing lines of the format "<a> <b> <c>", which I'm 
trying to pipe to an Erlang script and pull apart with a regular 
expression.  The script is based on 
http://www.erlang.org/faq/how_do_i.html#id53404 .

It works fine for small input files, but crashes on very large ones 
(e.g. one containing 17 million lines of text).

The crash dump file that is generated suggests that memory use is 
running away somewhere:

=memory
total: 125063616
processes: 8918232
processes_used: 8902304
system: 116145384

I'm fairly new to Erlang, so I may well have structured the code 
incorrectly.  Here's the full module that is causing the problem:

-module(test_parse).
-export([parse/0]).

parse() ->
     {ok, Re} = re:compile("<([^>]+)> <([^>]+)> <([^>]+)>"),
     parse(Re).

parse(Re) ->
     case io:get_chars('', 8192) of
         eof ->
             init:stop();
         Text ->
             Result = re:run(Text, Re, [{capture, all_but_first, list}]),
             case Result of
                 {match, Captured} ->
                     io:format("~p ~p ~p~n", Captured)
             end
     end,
     parse(Re).
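
For comparison, here is a minimal sketch of how the loop could be restructured. It is not a confirmed fix for the crash. It assumes each "<a> <b> <c>" record sits on its own line and that lines which do not match can be skipped. Three things differ from the module above: input is read with io:get_line/1 rather than in 8192-character chunks (re:run without the global option reports only the first match in its subject), a nomatch clause is handled, and the recursive call happens only when a line was actually read, since init:stop() returns immediately and the original loop would keep calling io:get_chars after eof. The module name test_parse_lines and the helpers loop/1 and parse_line/2 are hypothetical names chosen for illustration.

```erlang
-module(test_parse_lines).          % hypothetical module name
-export([parse/0, parse_line/2]).

%% Entry point: compile the pattern once, process stdin, then shut down.
parse() ->
    {ok, Re} = re:compile("<([^>]+)> <([^>]+)> <([^>]+)>"),
    loop(Re),
    init:stop().

%% Read one line at a time; recurse only while there is more input.
%% loop/1 is a hypothetical helper, not part of the original module.
loop(Re) ->
    case io:get_line('') of
        eof ->
            ok;
        {error, Reason} ->
            io:format(standard_error, "read error: ~p~n", [Reason]);
        Line ->
            case parse_line(Re, Line) of
                {match, Captured} ->
                    io:format("~p ~p ~p~n", Captured);
                nomatch ->
                    ok   % assumption: lines that do not fit are skipped
            end,
            loop(Re)
    end.

%% Apply the compiled pattern to a single line (hypothetical helper).
parse_line(Re, Line) ->
    re:run(Line, Re, [{capture, all_but_first, list}]).
```

Under these assumptions it would be run the same way as the original, e.g. cat bigfile.txt | erl -noshell -s test_parse_lines parse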


The script is then run (on the command line, Ubuntu Linux) using:
cat bigfile.txt | erl -noshell -s test_parse parse


Any pointers on what I'm doing wrong would be appreciated!

Thanks,

-- 
Dave Challis
dsc@REDACTED

