[erlang-questions] Unstable erlang compared to java or perl

Petter Egesund petter.egesund@REDACTED
Sun Nov 7 22:15:12 CET 2010


Hi, and thanks for answering!

Yes, the memory should be reclaimed - I think the problem might be
that the garbage collector is failing?!

Summing up the ets memory usage for each thread gives reasonable numbers,
but erlang:memory() shows ets using much more RAM. Calling ets:delete
before the process finishes does not help.
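
One way to check whether a table really goes away with its owner is a small experiment like the one below. It is only a sketch: the module name ets_owner_demo and the table name scratch are made up for illustration, and the tuple stored merely mimics the "small lists of integers" described in the original post.

```erlang
-module(ets_owner_demo).
-export([run/0]).

%% Spawn a process that creates a private table, reports the table id to
%% the caller and then exits. Returns what ets:info/1 says about the table
%% once the owner's 'DOWN' message has arrived.
run() ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        Tab = ets:new(scratch, [ordered_set, private]),
        true = ets:insert(Tab, {[1, 2, 3], [4, 5, 6]}),
        Parent ! {table, self(), Tab}
    end),
    Tab = receive {table, Pid, T} -> T end,
    %% Wait until the owner is gone before asking about the table.
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    ets:info(Tab).
```

If the table has been deleted together with its owner, ets_owner_demo:run() should return undefined. Running it in a loop while watching erlang:memory(ets) would show whether the memory also comes back.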

No, I am only keeping integers in the ets tables, no strings :-) The
program uses binaries a lot, but these seem to be reclaimed by the GC
without problems.

Yes, I know the difference between processes and threads. Only a couple
of small ets tables are global and shared between processes. The large
ones, which I think are the cause of the problem, are only written to
and read by one process.

Full source code? It is unfortunately too long and too data-bound to
make sense. Still hoping for a clue - ets:i() tells me that total ets
usage should be less than 1 GB, but memory() says it is using more
than 5 GB for ets.
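
When comparing the two figures it may be worth checking the units. The ets documentation describes ets:info(Tab, memory) as a number of words, while erlang:memory(ets) is in bytes. The sketch below converts the per-table figures to bytes before comparing; the module name ets_mem_check and the result keys are hypothetical.

```erlang
-module(ets_mem_check).
-export([report/0]).

%% Sum the per-table memory figures (words), convert them to bytes and
%% return them next to the VM-wide ETS figure (bytes).
report() ->
    WordSize = erlang:system_info(wordsize),   % 8 on a 64-bit emulator
    Words = lists:sum([W || Tab <- ets:all(),
                            W <- [ets:info(Tab, memory)],
                            is_integer(W)]),   % skip tables deleted meanwhile
    [{tables_words, Words},
     {tables_bytes, Words * WordSize},
     {vm_ets_bytes, erlang:memory(ets)}].
```

Assuming the mem column of ets:i() is also in words, the three the_synapses tables in the dump add up to 606251772 words, which is about 4.85 GB at 8 bytes per word. That is much closer to the 5562163256 bytes reported by memory() than the raw sum suggests.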

Cheers,

Petter





On Sun, Nov 7, 2010 at 9:37 PM, Ryan Zezeski <rzezeski@REDACTED> wrote:
>
>
> On Sun, Nov 7, 2010 at 9:49 AM, Petter Egesund <petter.egesund@REDACTED>
> wrote:
>>
>> Hi, I have a small program with lots of memory-updates which I try to
>> run in Erlang.
>>
>> The same algorithm works fine in both Java and Perl, but fails in
>> Erlang because the program runs out of memory - and I cannot figure
>> out why. Frustrating, as my Erlang version seems to be the easiest to
>> scale as well as being the most readable.
>>
>> The program is threaded and each thread writes to an ets table which is
>> created at the beginning of the thread. When the thread dies I try to
>> call ets:delete(Table), as described in the manual, but the memory
>> used by the thread never seems to be released.
>>
>> Some facts:
>>
>> - The memory usage of each thread is rather constant. This is
>> confirmed when I use ets:i() to show info about memory usage.
>> - The number of threads is constant - confirmed both by running top
>> and by writing out the number of threads regularly. When a thread
>> dies, I create a new one.
>> - I have tried to end the thread by sending an exit signal as the last
>> statement. This helps some, but does not solve the leak.
>> - I put small lists of 3-4 integers into the ets table as values; the
>> keys are lists of the same size as well.
>> - I garbage-collect each thread before it dies, as well as doing
>> regular global garbage-collects. No help.
>> - The memory reported by ets:i(), summed over all threads, is much
>> lower than what erlang:memory() reports. This might indicate
>> something? It does not seem logical to me, at least.
>> - erlang:memory() reports about half of what top/the OS reports.
>> - I am running on 64-bit Ubuntu with 14A, but I have tried 14B as well.
>>
>> Any clues? Dump from ets:i() and erlang:memory() is like below.
>>
>> Cheers,
>>
>> Petter
>>
>> --- dump ---
>>
>> Number of processes: 27
>> ets:i():
>>  id                          name                        type         size      mem        owner
>>  ----------------------------------------------------------------------------------------------------------------
>>  13                          code                        set          261       10692      code_server
>>  4110                        code_names                  set          58        7804       code_server
>>  6746271765                  the_synapses                ordered_set  5425194   113336012  <0.47.0>
>>  7022018584                  the_synapses                ordered_set  15143493  310909950  <0.48.0>
>>  7774416922                  the_synapses                ordered_set  8794649   182005810  <0.49.0>
>>  ac_tab                      ac_tab                      set          6         848        application_controller
>>  file_io_servers             file_io_servers             set          0         302        file_server_2
>>  global_locks                global_locks                set          0         302        global_name_server
>>  global_names                global_names                set          0         302        global_name_server
>>  global_names_ext            global_names_ext            set          0         302        global_name_server
>>  global_pid_ids              global_pid_ids              bag          0         302        global_name_server
>>  global_pid_names            global_pid_names            bag          0         302        global_name_server
>>  inet_cache                  inet_cache                  bag          0         302        inet_db
>>  inet_db                     inet_db                     set          29        571        inet_db
>>  inet_hosts_byaddr           inet_hosts_byaddr           bag          0         302        inet_db
>>  inet_hosts_byname           inet_hosts_byname           bag          0         302        inet_db
>>  inet_hosts_file_byaddr      inet_hosts_file_byaddr      bag          0         302        inet_db
>>  inet_hosts_file_byname      inet_hosts_file_byname      bag          0         302        inet_db
>>  neurone_counter             neurone_counter             set          258394    1846182    entity_server
>>  neurone_group_counter       neurone_group_counter       set          6         344        entity_group_server
>>  neurone_group_name          neurone_group_name          set          6         426        entity_group_server
>>  neurone_group_name_reverse  neurone_group_name_reverse  set          6         426        entity_group_server
>>  neurone_name                neurone_name                set          258394    11824602   entity_server
>>  neurone_name_reverse        neurone_name_reverse        set          258394    11824602   entity_server
>> memory():         [{total,5568669792},
>>                   {processes,1138936},
>>                   {processes_used,1128120},
>>                   {system,5567530856},
>>                   {atom,349769},
>>                   {atom_used,336605},
>>                   {binary,82704},
>>                   {code,3046365},
>>                   {ets,5562163256}]
>>
>>
>
> Hi Petter, ETS tables are not garbage collected.  Each ETS table has _one_
> owner (a process).  When that owner dies the table is deleted and its
> memory is reclaimed.  You can also delete a table (and reclaim the memory)
> by calling ets:delete/1.  Looking at your memory result, your ETS tables are
> taking up ~5.2GB of data.  However, your binary usage is very low so I'm
> going to take a guess that you are storing a list of strings?  If so you
> should note that on a 64-bit system *each character* in a string will use 16
> bytes of memory!  I highly recommend using binaries where possible when
> dealing with a large amount of data; your program will not only be more
> space efficient but also faster.  I've written a non-trivial Erlang
> application for work and I deal with CSV files that get up to 18 million
> rows.  I make heavy use of binaries and the binary module to parse these
> files and write entries to ETS--you'd be surprised how fast it is!  If you'd
> like I could provide an example.
>
> When you say "thread" do you mean "process"?  You do realize that an OS
> thread and an Erlang process are two completely different things?  IIRC, the
> VM spawns an OS thread per scheduler (along w/ some other threads for I/O
> and such).  Erlang processes are extremely cheap...don't be afraid to make
> thousands or even tens-of-thousands of them.
>
> You should not have to perform manual garbage collection, that seems like a
> code smell to me.  When a process dies its heap will be reclaimed.  Each
> process has its own isolated heap.
>
> Do you have multiple processes all writing to the same ETS table?  If so
> there are some improvements that were made to ETS (and Erlang in general)
> for concurrent writing/reading of an ETS table in 14B that you might want to
> look at.
>
> Finally, it would be helpful to see the full source code.  There is a good
> chance your solution is not optimal for Erlang.  By that, I mean that if
> your translation follows closely from your Java and Perl solutions then
> chances are it's not an optimal Erlang program as the paradigms are vastly
> different.
> -Ryan
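
The "16 bytes per character" figure quoted above can be checked from the shell with a sketch like the following. The module name str_size_demo is made up, and erts_debug:flat_size/1 is an undocumented debugging helper that returns a term's size in words, so treat this as an illustration rather than a supported API.

```erlang
-module(str_size_demo).
-export([sizes/1]).

%% Compare the footprint of a string held as a list with the payload size
%% of the same data held as a binary.
sizes(String) when is_list(String) ->
    WordSize = erlang:system_info(wordsize),
    %% A list of small integers costs one cons cell (two words) per element.
    ListBytes = erts_debug:flat_size(String) * WordSize,
    Bin = list_to_binary(String),
    [{chars, length(String)},
     {list_bytes, ListBytes},
     {binary_payload_bytes, byte_size(Bin)}].
```

On a 64-bit emulator, str_size_demo:sizes("hello") should report 80 bytes for the list against 5 bytes of binary payload. The binary also carries some per-term overhead that byte_size/1 does not show.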

