Running a regular expression on each line of a file
Dave Challis
dsc@REDACTED
Mon Jan 10 17:02:19 CET 2011
I've got a file containing lines of the format "<a> <b> <c>", which I'm
trying to pipe to an Erlang script and pull apart with a regular
expression. The script is based on
http://www.erlang.org/faq/how_do_i.html#id53404 .
It works fine for small input files, but crashes on very large ones
(e.g. one containing 17 million lines of text).
The crash dump that is generated suggests that something is running
away somewhere:
=memory
total: 125063616
processes: 8918232
processes_used: 8902304
system: 116145384
I'm fairly new to Erlang, so I may well have structured the code
incorrectly. Here's the full module that is causing the problems:
-module(test_parse).
-export([parse/0]).

parse() ->
    {ok, Re} = re:compile("<([^>]+)> <([^>]+)> <([^>]+)>"),
    parse(Re).

parse(Re) ->
    case io:get_chars('', 8192) of
        eof ->
            init:stop();
        Text ->
            Result = re:run(Text, Re, [{capture, all_but_first, list}]),
            case Result of
                {match, Captured} ->
                    io:format("~p ~p ~p~n", Captured)
            end
    end,
    parse(Re).
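
For comparison, here is a minimal sketch of a line-at-a-time variant. It rests on a few observations about the module above: io:get_chars('', 8192) reads up to 8192 characters rather than one line, the inner case has no nomatch clause, and parse(Re) is called again after the outer case whichever branch ran, including eof. The module name test_parse_lines and the helpers loop/1 and parse_line/2 are hypothetical names introduced for illustration, and the sketch assumes each "<a> <b> <c>" record sits on its own line. It has not been run against the 17-million-line file, so treat it as something to try rather than a confirmed fix.

```erlang
-module(test_parse_lines).
-export([parse/0, parse_line/2]).

%% Hypothetical variant of test_parse: reads standard input one line
%% at a time and stops the VM once, after the loop has finished.
parse() ->
    {ok, Re} = re:compile("<([^>]+)> <([^>]+)> <([^>]+)>"),
    loop(Re),
    init:stop().

%% Recurse only when a line was actually read, so eof ends the loop.
loop(Re) ->
    case io:get_line('') of
        eof ->
            ok;
        {error, Reason} ->
            io:format(standard_error, "read error: ~p~n", [Reason]);
        Line ->
            case parse_line(Line, Re) of
                {match, Captured} ->
                    io:format("~p ~p ~p~n", Captured);
                nomatch ->
                    %% Assumption: lines that do not match are skipped.
                    ok
            end,
            loop(Re)
    end.

%% Returns {match, [A, B, C]} with the three captured strings, or nomatch.
parse_line(Line, Re) ->
    re:run(Line, Re, [{capture, all_but_first, list}]).
```

It would be run the same way as the original, with the module name swapped: cat bigfile.txt | erl -noshell -s test_parse_lines parse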
The script is then run (on the command line, Ubuntu Linux) using:
cat bigfile.txt | erl -noshell -s test_parse parse
Any pointers on what I'm doing wrong would be appreciated!
Thanks,
--
Dave Challis
dsc@REDACTED