list size 'causing VM "problems"

Mon Nov 23 14:56:47 CET 2009

It sounds very likely that you are on a 64-bit VM, which would explain
the 2 GB burst, but I think that...

        {ok,Line} ->
            [process_line(Line)|read_lines(IOfd)];

...is not tail recursive: the cons onto the result happens only after
the recursive call returns, so a stack frame is kept for every line
read, which might be what is chewing up 7G or more. What happens if you
change the read_lines function to use an accumulator for the result? e.g...

read_lines(IOfd) ->
    read_lines(IOfd, []).

read_lines(IOfd, Acc) ->
    case file:read_line(IOfd) of
        {ok, Line} ->
            read_lines(IOfd, [process_line(Line) | Acc]);
        eof ->
            lists:reverse(Acc)
    end.
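
For trying this out, here is a minimal sketch of a complete module built
around the accumulator version. The module name md5_lines and the load/1
helper are my own assumptions, not part of the original code, and the
regex in process_line/1 is a simplified rewrite that is meant to match
the same "MD5 (/.file) = <hash>" lines as the original one.

```erlang
-module(md5_lines).
-export([load/1, read_lines/1, process_line/1]).

%% load/1 is a hypothetical helper: open Path for reading, collect all
%% parsed lines, and close the file even if parsing crashes.
load(Path) ->
    {ok, IOfd} = file:open(Path, [read]),
    try
        read_lines(IOfd)
    after
        file:close(IOfd)
    end.

read_lines(IOfd) ->
    read_lines(IOfd, []).

%% The recursive call is the last expression in the {ok, Line} clause,
%% so it is a tail call: results pile up in Acc, not on the stack.
%% An {error, Reason} result is not handled and crashes with case_clause.
read_lines(IOfd, Acc) ->
    case file:read_line(IOfd) of
        {ok, Line} ->
            read_lines(IOfd, [process_line(Line) | Acc]);
        eof ->
            lists:reverse(Acc)
    end.

%% Simplified regex (an assumption, see above); returns {Type, File, Hash}.
process_line(Line) ->
    {match, [Type, File, Hash]} =
        re:run(Line,
               "(.*) \\((.*)\\) = ([0-9a-f]*)\n",
               [{capture, all_but_first, list}]),
    {Type, File, Hash}.
```

Calling md5_lines:load/1 with the path of the md5 output file should
return the list of tuples in file order. Note that the whole result
still lives in memory as one list, so this only removes the stack
growth, not the size of the list itself.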

On Nov 23, 4:09 am, Hendrik Visage <hvj...@REDACTED> wrote:
> Hi there,
>
> Yes, I know this code is not yet optimal (I'm still learning :), but
> it begs a few questions I'd like to understand from the VM etc.
>
> 1) I've run it fine with a small subset, but once I've loaded the 930k
> lines file, the VM sucks up a lot of RAM/virtual memory. Like a burst
> of about 2G (I have a 4G MacBookPro) and then once it returned in the
> erl shell, the VM starts to go ballistic and consumes >7G of
> virtual memory ;(
> Q1: why did the VM exhibit this behaviour? the garbage collector going bad/mad??
>
> 2) I will push the data into an ETS of sorts, as I'll try to find
> duplicate files, but was thinking of an initial pull into a list, and
> then from there do the tests etc. The idea might be to pull in one
> disk, and then compare it to another removable disk's files.
> Q2: Should I rather do this straight into an ETS/DETS?
> Q3: Should I preferably start to consider DETS 'cause of the size??
> Q4: will Mnesia help in this case?
>
> %%--------------------------------------------------------------------
> %% Function: process_line/1
> %% Description: take a properly formatted line, parse it, and
> %%   return the tuple {Type,File,Hash}
> %% Line: "MD5 (/.file) = d41d8cd98f00b204e9800998ecf8427e"
> %% Note: some might be SHA1 in future.
> %%--------------------------------------------------------------------
> process_line(Line) ->
>     {match,[Type,File,Hash]}=
>         re:run(Line,
>                "\(.*\)[ ][\\(]\(.*\)[\\)][ ][=][ ]\([0-9a-f]*\)\n",
>                [{capture,all_but_first,list}]),
>     {Type,File,Hash}.
>
> %%--------------------------------------------------------------------
> %% Function: read_lines/1
> %% Description: read in all the lines from a "properly formatted"
> %%    md5 output on MacOSX, returning a list with the tuples.
> %%--------------------------------------------------------------------
>
> read_lines(IOfd) ->
>     case  file:read_line(IOfd) of
>         {ok,Line} ->
>             [process_line(Line)|read_lines(IOfd)];
>         eof ->
>             []
>     end.
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org