[erlang-questions] Unstable erlang compared to java or perl

Ryan Zezeski rzezeski@REDACTED
Sun Nov 7 21:37:50 CET 2010


On Sun, Nov 7, 2010 at 9:49 AM, Petter Egesund <petter.egesund@REDACTED> wrote:

> Hi, I have a small program with lots of memory-updates which I try to
> run in Erlang.
>
> The same algorithm works fine in both Java and Perl, but fails in
> Erlang because the program runs out of memory - and I can not figure
> out why. Frustrating, as my Erlang version seems to be the easiest to
> scale as well as being the most readable.
>
> The program is threaded and each thread writes to an ETS table which is
> created at the beginning of the thread. When the thread dies I try to
> do an ets:delete(Table), as described in the manual, but the memory
> used by the thread never seems to be released.
>
> Some facts:
>
> - The memory usage of each thread is rather constant. This is
> confirmed when I use ets:i() to show info about memory usage.
> - The number of threads is constant - confirmed by both running top
> and writing out the number of threads regularly. When a thread dies, I
> create a new one.
> - I have tried to end the thread by sending an exit signal as the last
> statement. This helps some, but does not solve the leak.
> - I put small lists of 3-4 integers into the ETS table as values; the
> keys are lists of the same size as well.
> - I garbage-collect each thread before it dies, as well as doing
> regular global garbage-collects. No help.
> - The memory usage reported by ets:i(), summed over all threads, is
> much lower than what erlang:memory() reports. This might indicate
> something? Does not seem logical to me, at least.
> - The figure from erlang:memory() is about half of what top/the OS reports.
> - I am running on Ubuntu, 64-bit, 14A, but I have tried 14B as well.
>
> Any clues? Dump from ets:i() and erlang:memory() is like below.
>
> Cheers,
>
> Petter
>
> --- dump ---
>
> Number of processes: 27
> ets:i():
>  id                         name                       type        size     mem       owner
>  ----------------------------------------------------------------------------
>  13                         code                       set         261      10692     code_server
>  4110                       code_names                 set         58       7804      code_server
>  6746271765                 the_synapses               ordered_set 5425194  113336012 <0.47.0>
>  7022018584                 the_synapses               ordered_set 15143493 310909950 <0.48.0>
>  7774416922                 the_synapses               ordered_set 8794649  182005810 <0.49.0>
>  ac_tab                     ac_tab                     set         6        848       application_controller
>  file_io_servers            file_io_servers            set         0        302       file_server_2
>  global_locks               global_locks               set         0        302       global_name_server
>  global_names               global_names               set         0        302       global_name_server
>  global_names_ext           global_names_ext           set         0        302       global_name_server
>  global_pid_ids             global_pid_ids             bag         0        302       global_name_server
>  global_pid_names           global_pid_names           bag         0        302       global_name_server
>  inet_cache                 inet_cache                 bag         0        302       inet_db
>  inet_db                    inet_db                    set         29       571       inet_db
>  inet_hosts_byaddr          inet_hosts_byaddr          bag         0        302       inet_db
>  inet_hosts_byname          inet_hosts_byname          bag         0        302       inet_db
>  inet_hosts_file_byaddr     inet_hosts_file_byaddr     bag         0        302       inet_db
>  inet_hosts_file_byname     inet_hosts_file_byname     bag         0        302       inet_db
>  neurone_counter            neurone_counter            set         258394   1846182   entity_server
>  neurone_group_counter      neurone_group_counter      set         6        344       entity_group_server
>  neurone_group_name         neurone_group_name         set         6        426       entity_group_server
>  neurone_group_name_reverse neurone_group_name_reverse set         6        426       entity_group_server
>  neurone_name               neurone_name               set         258394   11824602  entity_server
>  neurone_name_reverse       neurone_name_reverse       set         258394   11824602  entity_server
> memory():         [{total,5568669792},
>                   {processes,1138936},
>                   {processes_used,1128120},
>                   {system,5567530856},
>                   {atom,349769},
>                   {atom_used,336605},
>                   {binary,82704},
>                   {code,3046365},
>                   {ets,5562163256}]
>
>
>
Hi Petter, ETS tables are not garbage collected.  Each ETS table has _one_
owner (a process).  When that owner dies the table is deleted and its
memory is reclaimed.  You can also delete a table (and reclaim the memory)
by calling ets:delete/1.

Looking at your memory result, your ETS tables are taking up ~5.2GB.
However, your binary usage is very low, so I'm going to take a guess that
you are storing lists of strings?  If so, you should note that on a 64-bit
system *each character* in a string will use 16 bytes of memory, because a
string is a list and every list cell takes two 8-byte words.  I highly
recommend using binaries where possible when dealing with a large amount of
data; your program will not only be more space efficient but also faster.
I've written a non-trivial Erlang application for work and I deal with CSV
files that get up to 18 million rows.  I make heavy use of binaries and the
binary module to parse these files and write entries to ETS.  You'd be
surprised how fast it is!  If you'd like I could provide an example.
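
To make the 16-bytes-per-character figure concrete, here is a minimal sketch that measures the same text as a list and as a binary.  The module and function names are made up for illustration.  It relies on erts_debug:flat_size/1, which is a debugging helper that reports a term's size in words, and it assumes a Latin-1 string so that list_to_binary/1 accepts it.

```erlang
-module(term_size_demo).
-export([list_bytes/1, binary_bytes/1]).

%% Heap bytes used by a string held as a list of characters.
%% erts_debug:flat_size/1 reports words (two per list cell), so we
%% multiply by the word size of the running VM (8 on a 64-bit system).
list_bytes(String) when is_list(String) ->
    erts_debug:flat_size(String) * erlang:system_info(wordsize).

%% Payload bytes of the same text held as a binary: one byte per
%% character.  This ignores the few words of per-binary overhead.
binary_bytes(String) when is_list(String) ->
    byte_size(list_to_binary(String)).
```

On a 64-bit VM, term_size_demo:list_bytes("hello") comes to 5 cells * 2 words * 8 bytes, while term_size_demo:binary_bytes("hello") is 5.  The same two-words-per-element cost applies to any list, including short lists of integers, and terms are copied into ETS on insert, so the overhead is paid again for every stored row.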

When you say "thread" do you mean "process"?  You do realize that an OS
thread and an Erlang process are two completely different things?  IIRC, the
VM spawns an OS thread per scheduler (along w/ some other threads for I/O
and such).  Erlang processes are extremely cheap...don't be afraid to make
thousands or even tens-of-thousands of them.
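
A quick way to see how cheap processes are is to spawn a batch of them and look at the VM's process count.  This is only a sketch; spawn_demo and run/1 are hypothetical names, and the count can be off by a few if other processes start or stop at the same moment.

```erlang
-module(spawn_demo).
-export([run/1]).

%% Spawn N processes that each block waiting for a stop message,
%% report how many more processes exist than before, then stop them.
run(N) ->
    Before = erlang:system_info(process_count),
    Pids = [spawn(fun() -> receive stop -> ok end end)
            || _ <- lists:seq(1, N)],
    After = erlang:system_info(process_count),
    [Pid ! stop || Pid <- Pids],
    After - Before.
```

Calling spawn_demo:run(10000) from the shell returns almost immediately.  None of these are OS threads, so top will not show them.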

You should not have to perform manual garbage collection; that seems like a
code smell to me.  When a process dies its heap will be reclaimed.  Each
process has its own isolated heap.
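
The same holds for a table owned by the process: when the owner exits, the table goes with it, with no explicit ets:delete/1 or garbage collection needed.  The sketch below checks this under some assumptions: the names are invented for illustration, the rows are small integer lists like the ones you describe, and it relies on the table already being gone by the time the monitor's 'DOWN' message arrives.  Note that ets:info(Tab, memory) reports words, not bytes, which is worth keeping in mind when comparing ets:i() with erlang:memory().

```erlang
-module(ets_owner_demo).
-export([run/0]).

%% Let a short-lived process own a table, then check what is left
%% of the table once that process has exited.
run() ->
    Parent = self(),
    Owner = spawn(fun() ->
        T = ets:new(scratch, [ordered_set, public]),
        [ets:insert(T, {[N, N + 1, N + 2], [N, N, N]})
         || N <- lists:seq(1, 100000)],
        %% ets:info(T, memory) is in words, not bytes.
        Parent ! {table, self(), T, ets:info(T, memory)},
        receive stop -> ok end
    end),
    Ref = erlang:monitor(process, Owner),
    receive {table, Owner, Tab, Words} -> ok end,
    Owner ! stop,
    receive {'DOWN', Ref, process, Owner, _Reason} -> ok end,
    %% First element: table size in bytes while the owner was alive.
    %% Second element: ets:info/1 on the table after the owner exited.
    {Words * erlang:system_info(wordsize), ets:info(Tab)}.
```

If the second element of the result is undefined, the table no longer exists.  Comparing erlang:memory(ets) before and after a run is a simple way to confirm that the memory came back as well.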

Do you have multiple processes all writing to the same ETS table?  If so
there are some improvements that were made to ETS (and Erlang in general)
for concurrent writing/reading of an ETS table in 14B that you might want to
look at.
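
I am assuming here that the relevant knobs are the write_concurrency and read_concurrency options to ets:new/2; check the ets documentation for your release before relying on them.  A minimal sketch of several processes writing to one shared table, with hypothetical module and function names:

```erlang
-module(ets_shared_demo).
-export([run/2]).

%% Start Workers processes that each insert PerWorker rows into one
%% shared public table, wait for all of them, and return the row count.
run(Workers, PerWorker) ->
    Tab = ets:new(shared, [set, public,
                           {write_concurrency, true},
                           {read_concurrency, true}]),
    Parent = self(),
    Pids = [spawn(fun() ->
                %% Keys are {Worker, Index}, so no two workers collide.
                [ets:insert(Tab, {{W, I}, I})
                 || I <- lists:seq(1, PerWorker)],
                Parent ! {done, self()}
            end) || W <- lists:seq(1, Workers)],
    [receive {done, Pid} -> ok end || Pid <- Pids],
    Size = ets:info(Tab, size),
    ets:delete(Tab),
    Size.
```

The table must be public (or the writers must all be the owner) for other processes to insert into it.  As far as I know, in the releases current at the time of writing, write_concurrency has no effect on ordered_set tables, so it would not help your the_synapses tables as they stand.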

Finally, it would be helpful to see the full source code.  There is a good
chance your solution is not optimal for Erlang.  By that, I mean that if
your translation follows closely from your Java and Perl solutions, then
chances are it's not an optimal Erlang program, as the paradigms are vastly
different.

-Ryan
