Parsing big files
Ulf Wiger
etxuwig@REDACTED
Tue Dec 5 10:03:08 CET 2000
Hi Thomas,
I've attached a file that seems to do the job.
Example:
1> fileio:lines("fileio.erl","fileio.erl.out",
                fun(Str) -> ["=== ",Str] end).
ok
> head -5 fileio.erl.out
=== -module(fileio).
=== -author('etxuwig@REDACTED').
===
=== %%-compile(export_all).
=== -export([lines/3]).
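
A transformation that actually changes the format might look something like the sketch below. The module name line_format and the colon-to-tab rule are made up for illustration and are not part of the attached file; the sketch also assumes each line arrives as a flat list of characters, which is what io:get_line/2 returns for a file opened in list mode.

```erlang
-module(line_format).
-export([colons_to_tabs/1]).

%% Hypothetical per-line transformation: replace every ':' with a tab.
%% Only ':' characters are touched, so the trailing newline that
%% io:get_line/2 leaves on the line is passed through unchanged.
colons_to_tabs(Line) ->
    [swap(C) || C <- Line].

%% Map a single character; everything except ':' is kept as is.
swap($:) -> $\t;
swap(C) -> C.
```

It could then be plugged in as fileio:lines(InFile, OutFile, fun line_format:colons_to_tabs/1).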
On Tue, 5 Dec 2000, Thomas Arts wrote:
>
>I have got a large file which consists of about 2 million lines.
>The aim is to parse this file, change the format a little and
>write it back to disk.
>
>No surprise that file:read_file(FileName) helps the erlang runtime
>system to get out of memory. I need a file:open, and thereafter
>read the file in parts and write the changed parts to disk.
>
>I wonder if someone already wrote a transformation program for
>such large files. I want the scanner to present a scanned
>line at a time, such that I can write a line at a time, but it
>would be nice if I don't have to do the bookkeeping on the
>byte level.
>
>/Thomas
>
--
Ulf Wiger tfn: +46 8 719 81 95
Senior System Architect mob: +46 70 519 81 95
Strategic Product & System Management ATM Multiservice Networks
Data Backbone & Optical Services Division Ericsson Telecom AB
-------------- next part --------------
-module(fileio).
-author('etxuwig@REDACTED').

%%-compile(export_all).
-export([lines/3]).

%%% lines(InFile, OutFile, Fun : fun/1) -> ok | {error, Reason}
%%%
%%% Process InFile one line at a time. Each line is passed to Fun, and
%%% the return value (a possibly deep list of chars) is written to OutFile.
%%% The line includes its trailing newline, so Fun should keep it in its
%%% return value.
%%% Example:
%%%
%%%   2> fileio:lines("fileio.erl","fileio.erl.out",
%%%                   fun(Str) -> ["=== ",Str] end).
%%%   ok
%%%
%%% would produce the following output in fileio.erl.out:
%%%
%%%   > head -5 fileio.erl.out
%%%   === -module(fileio).
%%%   === -author('etxuwig@REDACTED').
%%%   ===
%%%   === %%-compile(export_all).
%%%   === -export([lines/3]).

lines(InFile, OutFile, Fun) ->
    case file:open(InFile, [read]) of
        {ok, In} ->
            case file:open(OutFile, [write]) of
                {ok, Out} ->
                    process_files(In, Out, Fun);
                {error, Reason} ->
                    file:close(In),
                    {error, {Reason, OutFile}}
            end;
        {error, Reason} ->
            {error, {Reason, InFile}}
    end.

%% Run the line loop, then close both files even if Fun crashed.
process_files(In, Out, Fun) ->
    Result = (catch process(In, Out, Fun)),
    file:close(In),
    file:close(Out),
    Result.
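
process(In, Out, Fun) ->
    case io:get_line(In, "") of
        eof ->
            ok;
        {error, Reason} ->
            %% A read error must not be handed to Fun as if it were a line.
            {error, Reason};
        Line ->
            ok = io:put_chars(Out, Fun(Line)),
            process(In, Out, Fun)
    end.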