Parsing big files

Ulf Wiger
Tue Dec 5 10:03:08 CET 2000


Hi Thomas,

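I've attached a module (fileio.erl) that seems to do the job. It reads
the input file one line at a time, passes each line to a fun, and
writes whatever the fun returns to the output file.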

Example:

1> fileio:lines("fileio.erl","fileio.erl.out",
                 fun(Str) -> ["=== ",Str] end).
ok


> head -5 fileio.erl.out
=== -module(fileio).
=== -author('').
=== 
=== %%-compile(export_all).
=== -export([lines/3]).
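
For a fun that actually changes the format, something along these
lines might do. This is only a sketch: the module name line_fmt and
the function reverse_fields/1 are made up for illustration, and it
assumes space-separated fields and a reasonably recent OTP (it uses
string:trim/3, string:lexemes/2 and lists:join/2).

```erlang
-module(line_fmt).
-export([reverse_fields/1]).

%% Hypothetical example transformation: reverse the order of the
%% space-separated fields on one line.
%% io:get_line/2 hands over the line with its trailing newline, so we
%% strip it first and put a newline back on the result.
reverse_fields(Line) ->
    Stripped = string:trim(Line, trailing, "\n"),
    Fields = string:lexemes(Stripped, " "),
    %% The result is a deep list of chars, which io:put_chars/2 accepts.
    [lists:join(" ", lists:reverse(Fields)), "\n"].
```

It could then be passed as
fileio:lines("in.txt", "out.txt", fun line_fmt:reverse_fields/1),
where in.txt and out.txt are placeholder file names.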


On Tue, 5 Dec 2000, Thomas Arts wrote:

>
>I have got a large file which consists of about 2 million lines.
>The aim is to parse this file, change the format a little and
>write it back to disk.
>
>No surprise that file:read_file(FileName) makes the Erlang runtime
>system run out of memory. I need to use file:open and then read
>the file in parts, writing the changed parts to disk.
>
>I wonder if someone has already written a transformation program
>for such large files. I want the scanner to present one scanned
>line at a time, so that I can write one line at a time, and it
>would be nice if I didn't have to do the bookkeeping at the
>byte level.
>
>/Thomas
>

-- 
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB
-------------- next part --------------
-module(fileio).
-author('').

%%-compile(export_all).
-export([lines/3]).

%%% lines(InFile, OutFile, Fun : fun/1) -> ok | {error, Reason}
%%%
%%% Process InFile one line at a time. Each line, including its trailing
%%% newline, is passed to Fun, and the return value (a possibly deep list
%%% of chars) is written to OutFile. Fun must keep or re-add the newline
%%% if the output is to stay line-oriented.
%%% Example:
%%%
%%% 2> fileio:lines("fileio.erl","fileio.erl.out",
%%%                 fun(Str) -> ["=== ",Str] end).
%%% ok
%%%
%%% would produce the following output in fileio.erl.out:
%%%
%%% > head -5 fileio.erl.out
%%% === -module(fileio).
%%% === -author('').
%%% ===
%%% === %%-compile(export_all).
%%% === -export([lines/3]).


lines(InFile, OutFile, Fun) ->
    case file:open(InFile, [read]) of
        {ok, In} ->
            case file:open(OutFile, [write]) of
                {ok, Out} ->
                    process_files(In, Out, Fun);
                {error, Reason} ->
                    file:close(In),
                    {error, {Reason, OutFile}}
            end;
        {error, Reason} ->
            {error, {Reason, InFile}}
    end.


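process_files(In, Out, Fun) ->
    %% The catch makes sure both files are closed even if Fun crashes;
    %% a crash is returned to the caller as {'EXIT', Reason}.
    Result = (catch process(In, Out, Fun)),
    file:close(In),
    file:close(Out),
    Result.

process(In, Out, Fun) ->
    case io:get_line(In, "") of
        eof ->
            ok;
        {error, Reason} ->
            %% Read error: stop here instead of passing the tuple to Fun.
            {error, Reason};
        Line ->
            ok = io:put_chars(Out, Fun(Line)),
            process(In, Out, Fun)
    end.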
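
If the io-server round trip per line turns out to be a bottleneck on a
file of a couple of million lines, a variant that reads binaries with
file:read_line/1 might be worth trying. The following is a sketch under
assumptions, not a tested replacement: the module name fileio_bin is
made up, Fun receives each line as a binary (newline included) and must
return iodata, and whether read_ahead helps at all would need measuring.

```erlang
-module(fileio_bin).
-export([lines/3]).

%% lines(InFile, OutFile, Fun) -> ok | {error, Reason}
%% Same idea as fileio:lines/3, but each line is a binary and Fun
%% returns iodata.
lines(InFile, OutFile, Fun) ->
    case file:open(InFile, [read, binary, read_ahead]) of
        {ok, In} ->
            case file:open(OutFile, [write, binary]) of
                {ok, Out} ->
                    %% The after clause closes both files even if Fun
                    %% raises; the exception then reaches the caller.
                    try loop(In, Out, Fun)
                    after
                        file:close(In),
                        file:close(Out)
                    end;
                {error, Reason} ->
                    file:close(In),
                    {error, {Reason, OutFile}}
            end;
        {error, Reason} ->
            {error, {Reason, InFile}}
    end.

loop(In, Out, Fun) ->
    case file:read_line(In) of
        {ok, Line} ->
            case file:write(Out, Fun(Line)) of
                ok -> loop(In, Out, Fun);
                {error, _} = Error -> Error
            end;
        eof ->
            ok;
        {error, _} = Error ->
            Error
    end.
```

Only one line is held in memory at a time, so the size of the input
file should not matter.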

		    

