[erlang-questions] Unstable erlang compared to java or perl

Morten Krogh mk@REDACTED
Sun Nov 7 22:24:15 CET 2010


Hi

Are you sure that your ets memory information is in bytes, and not in words?
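
To check this: as far as I know, ets:info(Tab, memory) and the mem column of ets:i() count words, while erlang:memory() reports bytes. Below is a minimal sketch of the conversion. The module name ets_mem and its function names are made up for illustration; the word size comes from erlang:system_info(wordsize), which is 8 on a 64-bit VM.

```erlang
-module(ets_mem).
-export([words_to_bytes/1, words_to_bytes/2, total_bytes/0]).

%% Convert a word count to bytes using the running VM's word size.
words_to_bytes(Words) ->
    words_to_bytes(Words, erlang:system_info(wordsize)).

%% Convert a word count to bytes for an explicitly given word size.
words_to_bytes(Words, WordSize) ->
    Words * WordSize.

%% Sum the memory of all ETS tables on this node, in bytes.
%% ets:info/2 returns undefined for a table that disappeared in the
%% meantime, so non-integer results are skipped.
total_bytes() ->
    Words = lists:sum([M || T <- ets:all(),
                            M <- [ets:info(T, memory)],
                            is_integer(M)]),
    words_to_bytes(Words).
```

For the three the_synapses tables in the dump below, ets_mem:words_to_bytes(113336012 + 310909950 + 182005810, 8) gives 4850014176, roughly 4.85 GB. That is the same order of magnitude as the ets figure reported by memory(). On a live node, ets_mem:total_bytes() can be compared directly with erlang:memory(ets).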

Morten.


On 11/7/10 10:15 PM, Petter Egesund wrote:
> Hi, and thanks for answering!
>
> Yes, the memory should be reclaimed - I think the problem might be
> that the garbage collector is failing?!
>
> Summing up the ets memory usage for each thread gives reasonable numbers,
> but erlang:memory() shows ets is using much more RAM - calling
> ets:delete before the process finishes does not help.
>
> No, I am only keeping integers in the ets, no strings :-) The program
> is using binaries a lot, but these seem to be reclaimed by the GC
> without problems.
>
> Yes, I know the difference between processes/threads, only a couple of
> small ets-tables are global and shared between processes. The large
> ones, which I think are the cause of the problem, are only written
> to/read by one process.
>
> Full source code? It is unfortunately too long and too data-bound to
> make sense. Still hoping for a clue - ets:i() tells me that total ets
> should be less than 1 GB, but memory() says it is using more
> than 5 GB for ets.
>
> Cheers,
>
> Petter
>
>
>
>
>
> On Sun, Nov 7, 2010 at 9:37 PM, Ryan Zezeski<rzezeski@REDACTED>  wrote:
>>
>> On Sun, Nov 7, 2010 at 9:49 AM, Petter Egesund<petter.egesund@REDACTED>
>> wrote:
>>> Hi, I have a small program with lots of memory-updates which I try to
>>> run in Erlang.
>>>
>>> The same algorithm works fine in both Java and Perl, but fails in
>>> Erlang because the program runs out of memory - and I cannot figure
>>> out why. Frustrating, as my Erlang version seems to be the easiest to
>>> scale as well as being the most readable.
>>>
>>> The program is threaded and each thread writes to an ets-table which is
>>> created at the beginning of the thread. When the thread dies I try to
>>> do an ets:delete(Table), as described in the manual, but the memory
>>> used by the thread never seems to be released.
>>>
>>> Some facts:
>>>
>>> - The memory usage of each thread is rather constant. This is
>>> confirmed when I use ets:i() to show info about memory usage.
>>> - The number of threads is constant - confirmed by both running top
>>> and writing out the number of threads regularly. When a thread dies, I
>>> create a new one.
>>> - I have tried to end the thread by sending an exit signal as the last
>>> statement. This helps some, but does not solve the leak.
>>> - I put small lists of 3-4 integers into the ets as values; the
>>> keys are lists of the same size as well.
>>> - I garbage-collect each thread before it dies, as well as doing
>>> regular global garbage-collects. No help.
>>> - The memory reported by ets:i(), summed over all threads, is much
>>> lower than what erlang:memory() reports. This might indicate
>>> something? It does not seem logical to me, at least.
>>> - The total from erlang:memory() is about half of what top/the OS reports.
>>> - I am running on Ubuntu, 64-bit, with 14A, but I have tried 14B as well.
>>>
>>> Any clues? The output of ets:i() and erlang:memory() is shown below.
>>>
>>> Cheers,
>>>
>>> Petter
>>>
>>> --- dump ---
>>>
>>> Number of processes: 27
>>> ets:i():
>>>   id                          name                        type         size      mem        owner
>>>   ------------------------------------------------------------------------------------------------------------
>>>   13                          code                        set          261       10692      code_server
>>>   4110                        code_names                  set          58        7804       code_server
>>>   6746271765                  the_synapses                ordered_set  5425194   113336012  <0.47.0>
>>>   7022018584                  the_synapses                ordered_set  15143493  310909950  <0.48.0>
>>>   7774416922                  the_synapses                ordered_set  8794649   182005810  <0.49.0>
>>>   ac_tab                      ac_tab                      set          6         848        application_controller
>>>   file_io_servers             file_io_servers             set          0         302        file_server_2
>>>   global_locks                global_locks                set          0         302        global_name_server
>>>   global_names                global_names                set          0         302        global_name_server
>>>   global_names_ext            global_names_ext            set          0         302        global_name_server
>>>   global_pid_ids              global_pid_ids              bag          0         302        global_name_server
>>>   global_pid_names            global_pid_names            bag          0         302        global_name_server
>>>   inet_cache                  inet_cache                  bag          0         302        inet_db
>>>   inet_db                     inet_db                     set          29        571        inet_db
>>>   inet_hosts_byaddr           inet_hosts_byaddr           bag          0         302        inet_db
>>>   inet_hosts_byname           inet_hosts_byname           bag          0         302        inet_db
>>>   inet_hosts_file_byaddr      inet_hosts_file_byaddr      bag          0         302        inet_db
>>>   inet_hosts_file_byname      inet_hosts_file_byname      bag          0         302        inet_db
>>>   neurone_counter             neurone_counter             set          258394    1846182    entity_server
>>>   neurone_group_counter       neurone_group_counter       set          6         344        entity_group_server
>>>   neurone_group_name          neurone_group_name          set          6         426        entity_group_server
>>>   neurone_group_name_reverse  neurone_group_name_reverse  set          6         426        entity_group_server
>>>   neurone_name                neurone_name                set          258394    11824602   entity_server
>>>   neurone_name_reverse        neurone_name_reverse        set          258394    11824602   entity_server
>>> memory():         [{total,5568669792},
>>>                    {processes,1138936},
>>>                    {processes_used,1128120},
>>>                    {system,5567530856},
>>>                    {atom,349769},
>>>                    {atom_used,336605},
>>>                    {binary,82704},
>>>                    {code,3046365},
>>>                    {ets,5562163256}]
>>>
>>>
>> Hi Petter, ETS tables are not garbage collected.  Each ETS table has _one_
>> owner (a process).  When that owner dies the table is deleted and its
>> memory is reclaimed.  You can also delete a table (and reclaim the memory)
>> by calling ets:delete/1.  Looking at your memory result, your ETS tables are
>> taking up ~5.2GB of data.  However, your binary usage is very low so I'm
>> going to take a guess that you are storing a list of strings?  If so you
>> should note that on a 64-bit system *each character* in a string will use 16
>> bytes of memory!  I highly recommend using binaries where possible when
>> dealing with a large amount of data; your program will not only be more
>> space efficient but also faster.  I've written a non-trivial Erlang
>> application for work and I deal with CSV files that get up to 18 million
>> rows.  I make heavy use of binaries and the binary module to parse these
>> files and write entries to ETS--you'd be surprised how fast it is!  If you'd
>> like I could provide an example.
>>
>> When you say "thread" do you mean "process"?  You do realize that an OS
>> thread and an Erlang process are two completely different things.  IIRC, the VM
>> spawns an OS thread per scheduler (along w/ some other threads for I/O and
>> such).  Erlang processes are extremely cheap...don't be afraid to make
>> thousands or even tens-of-thousands of them.
>>
>> You should not have to perform manual garbage collection, that seems like a
>> code smell to me.  When a process dies its heap will be reclaimed.  Each
>> process has its own isolated heap.
>>
>> Do you have multiple processes all writing to the same ETS table?  If so
>> there are some improvements that were made to ETS (and Erlang in general)
>> for concurrent writing/reading of an ETS table in 14B that you might want to
>> look at.
>>
>> Finally, it would be helpful to see the full source code.  There is a good
>> chance your solution is not optimal for Erlang.  By that, I mean that if
>> your translation follows closely from your Java and Perl solutions then
>> chances are it's not an optimal Erlang program as the paradigms are vastly
>> different.
>> -Ryan


