[erlang-questions] Re: list size 'causing VM "problems"
Hendrik Visage
hvjunk@REDACTED
Tue Nov 24 11:25:37 CET 2009
On 11/23/09, Steve Davis <steven.charles.davis@REDACTED> wrote:
> Sounds very likely you are 64 bit which explains the 2 Gig,
Yes, 64-bit Erlang on MacOSX... perhaps I need to recompile as 32-bit while
testing/playing with this one.
> but I
> think that...
>
> {ok,Line} ->
> [process_line(Line)|read_lines(IOfd)];
>
> ...would maybe not be "tail recursive", so the entire stack is being
> built up, hence chewing up 7G or more? What happens if you change the
> read_lines function to use an accumulator for the result? e.g...
Tried this: same kind of trouble in the erl shell.
The symptoms:
the call
List=read_lines(FD).
executes, doesn't *appear* to use much VM space, and outputs a
partial result (being the shell, it only prints part of the list,
truncated with the "..." markers).
Then it *hangs* there without showing the prompt (this is inside
Aquamacs's Erlang shell mode). This is where the system appears to go
"west": the memory utilization (as shown by the MacOSX Activity
Monitor) starts to grow and grow. With the last test (using the
accumulator version below) it grew to 7G, at which point I killed the
beam.smp process.
> read_lines(IOfd) ->
> read_lines(IOfd, []).
>
> read_lines(IOfd, Acc) ->
> case file:read_line(IOfd) of
> {ok, Line} ->
> read_lines(IOfd, [process_line(Line) | Acc]);
> eof ->
> lists:reverse(Acc)
> end.
>
>
>
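A self-contained version of the accumulator approach, for trying it outside the shell, might look like the sketch below. The module name `md5_lines`, the `read_file/1` wrapper, and the anchored regex (which also accepts a final line with no trailing newline) are assumptions, not code from this thread. The file is opened `raw` with `read_ahead`, since line-by-line reads on an unbuffered raw file are slow.

```erlang
%% Sketch (hypothetical module): read an md5(1) listing into a list of
%% {Type, File, Hash} tuples using a tail-recursive accumulator.
-module(md5_lines).
-export([read_file/1]).

read_file(Path) ->
    %% raw + read_ahead: buffered reads; data comes back as lists.
    {ok, Fd} = file:open(Path, [read, raw, read_ahead]),
    try
        read_lines(Fd, [])
    after
        ok = file:close(Fd)
    end.

%% Tail recursive: the recursive call is the last thing evaluated,
%% so the stack does not grow with the number of lines.
read_lines(Fd, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            read_lines(Fd, [process_line(Line) | Acc]);
        eof ->
            lists:reverse(Acc)
    end.

%% Assumed pattern: "TYPE (path) = hexdigits" with an optional newline.
process_line(Line) ->
    {match, [Type, File, Hash]} =
        re:run(Line, "^(\\S+) \\((.*)\\) = ([0-9a-f]+)\\n?$",
               [{capture, all_but_first, list}]),
    {Type, File, Hash}.
```

From the shell, binding the result and asking only for its length, e.g. `L = md5_lines:read_file("md5.txt"), length(L).` (the file name is hypothetical), avoids echoing the list, which may help separate the cost of building the list from the cost of printing it.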
> On Nov 23, 4:09 am, Hendrik Visage <hvj...@REDACTED> wrote:
>> Hi there,
>>
>> Yes, I know this code is not yet optimal (I'm still learning :), but
>> it raises a few questions about the VM etc. that I'd like to understand.
>>
>> 1) I've run it fine with a small subset, but once I load the 930k-line
>> file, the VM sucks up a lot of RAM/virtual memory: a burst of about
>> 2G (I have a 4G MacBook Pro), and then, once it has returned to the
>> erl shell, the VM starts to go ballistic and consumes >7G of
>> virtual memory ;(
>> Q1: Why did the VM exhibit this behaviour? Is the garbage collector going
>> bad/mad??
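Some arithmetic may help with Q1. An Erlang string is a list of integers, and each list cell takes two machine words, so on a 64-bit VM every character costs 16 bytes. The sample line in the `process_line/1` comment is 48 characters including the newline, i.e. 96 words (768 bytes) as a list. The tuple that `process_line/1` keeps holds 41 characters ("MD5", "/.file" and the 32-digit hash), i.e. 82 words, plus 4 words for the tuple and 2 for its cell in the result list: 88 words, or 704 bytes. For 930,000 such lines that is about 655 MB of live data, and real paths are longer than "/.file", so treat this as a lower bound. The sketch below checks the per-term figures with `erts_debug:flat_size/1`, an internal, undocumented BIF that returns a term's size in words; the module name `term_size` is hypothetical.

```erlang
%% Sketch (hypothetical module): measure how much heap a term occupies.
%% erts_debug:flat_size/1 is internal and undocumented; it returns the
%% size of a term in machine words, ignoring any sharing.
-module(term_size).
-export([bytes/1, compare/0]).

%% Size of Term in bytes on this VM (wordsize is 8 on a 64-bit VM).
bytes(Term) ->
    erts_debug:flat_size(Term) * erlang:system_info(wordsize).

%% Returns {BytesAsList, BytesAsBinary} for the sample md5 line.
compare() ->
    Line = "MD5 (/.file) = d41d8cd98f00b204e9800998ecf8427e\n",
    {bytes(Line), bytes(list_to_binary(Line))}.
```

On a 64-bit VM the list figure should come out at 768 bytes; the binary is far smaller, which is one reason to consider keeping paths and hashes as binaries.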
>>
>> 2) I will push the data into an ETS of sorts, as I'm trying to find
>> duplicate files, but was thinking of an initial pull into a list, and
>> then from there do the tests etc. The idea might be to pull in one
>> disk, and then compare it to another removable disk's files.
>> Q2: Should I rather do this straight into an ETS/DETS?
>> Q3: Should I preferably start to consider DETS 'cause of the size??
>> Q4: will Mnesia help in this case?
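For Q2, one way to go straight into ETS is a `bag` table keyed on the hash, so that all files sharing a checksum end up under one key. The sketch below is hypothetical (module `dup_index`, functions `new/0`, `add/2`, `duplicates/1`) and takes the `{Type, File, Hash}` tuples produced by `process_line/1`; `maps:update_with/4` needs OTP 19 or later.

```erlang
%% Sketch (hypothetical module): index files by hash in an ETS bag and
%% report every hash that maps to more than one file.
-module(dup_index).
-export([new/0, add/2, duplicates/1]).

new() ->
    %% bag: one key (the hash) may hold several distinct {Hash, File} objects.
    ets:new(md5_index, [bag]).

add(Tab, {_Type, File, Hash}) ->
    true = ets:insert(Tab, {Hash, File}).

%% Returns a sorted [{Hash, [File]}] for hashes seen with two or more files.
duplicates(Tab) ->
    Grouped = ets:foldl(
                fun({Hash, File}, Acc) ->
                        maps:update_with(Hash, fun(Fs) -> [File | Fs] end,
                                         [File], Acc)
                end, #{}, Tab),
    lists:sort([{H, lists:sort(Fs)}
                || {H, Fs} <- maps:to_list(Grouped), length(Fs) > 1]).
```

If the index has to outlive the VM, the `dets` module offers `bag` tables with similar insert/foldl functions, though a DETS file is limited to 2 GB.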
>>
>> %%--------------------------------------------------------------------
>> %% Function: process_line/1
>> %% Description: takes a properly formatted line, parses it, and
>> %% returns the tuple {Type,File,Hash}
>> %% Line: "MD5 (/.file) = d41d8cd98f00b204e9800998ecf8427e"
>> %% Note: some might be SHA1 in future.
>> %%--------------------------------------------------------------------
>> process_line(Line) ->
>> {match,[Type,File,Hash]}=
>> re:run(Line,
>> "\(.*\)[ ][\\(]\(.*\)[\\)][ ][=][ ]\([0-9a-f]*\)\n",
>> [{capture,all_but_first,list}]),
>> {Type,File,Hash}.
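Since the parsed tuples are what stays in memory, one variant worth trying is to capture binaries instead of lists: a binary carries a few words of overhead plus the bytes themselves, rather than two words per character. The sketch below also swaps in an anchored pattern that tolerates a missing final newline and returns an error tuple instead of crashing on a malformed line; the module name `md5_parse` and the `{error, {bad_line, Line}}` return value are assumptions.

```erlang
%% Sketch (hypothetical module): parse "TYPE (path) = hexdigits" lines,
%% capturing binaries rather than character lists.
-module(md5_parse).
-export([process_line/1]).

%% Accepts a list or a binary line; \S+ also covers "SHA1".
process_line(Line) ->
    case re:run(Line, "^(\\S+) \\((.*)\\) = ([0-9a-f]+)\\n?$",
                [{capture, all_but_first, binary}]) of
        {match, [Type, File, Hash]} -> {Type, File, Hash};
        nomatch -> {error, {bad_line, Line}}
    end.
```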
>>
>> %%--------------------------------------------------------------------
>> %% Function: read_lines/1
>> %% Description: reads in all the lines of "properly formatted"
>> %% md5 output on MacOSX, returning a list of the tuples.
>> %%--------------------------------------------------------------------
>>
>> read_lines(IOfd) ->
>> case file:read_line(IOfd) of
>> {ok,Line} ->
>> [process_line(Line)|read_lines(IOfd)];
>> eof ->
>> []
>> end.
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org
>