[erlang-questions] Unstable Erlang compared to Java or Perl
Petter Egesund
petter.egesund@REDACTED
Sun Nov 7 22:15:12 CET 2010
Hi, and thanks for answering!
Yes, the memory should be reclaimed; I think the problem might be
that the garbage collector is failing.
Summing up the ETS memory usage for each thread gives reasonable numbers,
but erlang:memory() shows ETS using much more RAM. Calling
ets:delete/1 before the process finishes does not help.
No, I am only keeping integers in ETS, no strings :-) The program
uses binaries a lot, but these seem to be reclaimed by the GC
without problems.
Yes, I know the difference between processes and threads. Only a couple of
small ETS tables are global and shared between processes. The large
ones, which I think are the cause of the problem, are each written
to and read by only one process.
Full source code? Unfortunately it is too long and too data-bound to
make sense on its own. Still hoping for a clue: ets:i() tells me that the
ETS total should be less than 1 GB, but memory() says ETS is using more
than 5 GB.
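One way to cross-check the two figures from a shell is sketched below, assuming the numbers being compared are the per-table "mem" values shown by ets:i() and the ets entry of erlang:memory(). ets:info(Tab, memory) reports a table's size in words, not bytes, so it has to be multiplied by erlang:system_info(wordsize) (8 on a 64-bit emulator) before it can be compared with erlang:memory(ets), which is in bytes. The module name ets_mem_check is hypothetical.

```erlang
%% Hypothetical helper module: compares per-table ETS memory with the
%% node-wide ETS figure from erlang:memory/1.
-module(ets_mem_check).
-export([compare/0]).

%% Returns {TableBytes, EtsBytes}:
%%   TableBytes - sum of ets:info(T, memory) over all tables, converted
%%                from words to bytes;
%%   EtsBytes   - erlang:memory(ets), which is already in bytes.
compare() ->
    WordSize = erlang:system_info(wordsize),
    Words = lists:foldl(
              fun(Tab, Acc) ->
                      case ets:info(Tab, memory) of
                          undefined -> Acc;   % table deleted since ets:all/0 ran
                          W -> Acc + W
                      end
              end, 0, ets:all()),
    {Words * WordSize, erlang:memory(ets)}.
```

Calling ets_mem_check:compare() in the shell returns both numbers in the same unit, so any remaining gap is not just a words-versus-bytes mix-up.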
Cheers,
Petter
On Sun, Nov 7, 2010 at 9:37 PM, Ryan Zezeski <rzezeski@REDACTED> wrote:
>
>
> On Sun, Nov 7, 2010 at 9:49 AM, Petter Egesund <petter.egesund@REDACTED>
> wrote:
>>
>> Hi, I have a small program with lots of memory updates which I am trying to
>> run in Erlang.
>>
>> The same algorithm works fine in both Java and Perl, but fails in
>> Erlang because the program runs out of memory, and I cannot figure
>> out why. Frustrating, as my Erlang version seems to be the easiest to
>> scale as well as the most readable.
>>
>> The program is threaded, and each thread writes to an ETS table which is
>> created at the beginning of the thread. When the thread dies I call
>> ets:delete(Table), as described in the manual, but the memory
>> used by the thread never seems to be released.
>>
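The table lifecycle described above can be reproduced in isolation with a sketch like the following (the module name ets_owner_demo and the table name scratch are hypothetical). A worker process creates and fills a table and then exits without calling ets:delete/1; once the parent receives the 'DOWN' message, ets:info/2 on the old table returns undefined, because ETS deletes a table when its owner terminates (unless an heir has been configured).

```erlang
%% Hypothetical demo: an ETS table is deleted when its owner exits.
-module(ets_owner_demo).
-export([run/0]).

%% Returns ets:info(Tab, memory) after the owner has exited;
%% expected to be undefined because the table no longer exists.
run() ->
    Parent = self(),
    {Pid, Ref} =
        spawn_monitor(
          fun() ->
                  WorkerTab = ets:new(scratch, [set]),
                  ets:insert(WorkerTab, [{N, N * N} || N <- lists:seq(1, 1000)]),
                  Parent ! {table, self(), WorkerTab}
                  %% The fun returns here, the worker exits normally and
                  %% its table is deleted without an explicit ets:delete/1.
          end),
    Tab = receive {table, Pid, T} -> T end,
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    ets:info(Tab, memory).
```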
>> Some facts:
>>
>> - The memory usage of each thread is rather constant. This is
>> confirmed when I use ets:i() to show info about memory usage.
>> - The number of threads is constant, confirmed both by running top
>> and by printing the number of threads regularly. When a thread dies, I
>> create a new one.
>> - I have tried to end the thread by sending an exit signal as the last
>> statement. This helps somewhat, but does not stop the leak.
>> - I put small lists of 3-4 integers into the ETS tables as values; the
>> keys are lists of the same size.
>> - I garbage-collect each thread before it dies, and also run
>> regular global garbage collections. No help.
>> - When I sum the per-thread memory usage reported by ets:i(), the total
>> is much lower than what erlang:memory() reports. Might this indicate
>> something? It does not seem logical to me, at least.
>> - The total from erlang:memory() is about half of what top (the OS) reports.
>> - I am running Ubuntu, 64-bit, R14A, but I have tried R14B as well.
>>
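As a rough illustration of what short lists cost compared with tuples, the sketch below uses erts_debug:flat_size/1, a debugging function that returns the heap size of a term in words. Each list element needs a two-word cons cell, while a tuple needs one header word plus one word per element; small integers are stored inline in both cases. The module name term_size_demo is hypothetical, and storing keys or values as tuples is only an assumption about an alternative representation, not something the program is known to do.

```erlang
%% Hypothetical helper: heap size (in words) of a 4-integer key stored
%% as a list versus as a tuple.
-module(term_size_demo).
-export([compare/0]).

compare() ->
    AsList  = erts_debug:flat_size([1, 2, 3, 4]),   % 4 cons cells x 2 words
    AsTuple = erts_debug:flat_size({1, 2, 3, 4}),   % 1 header word + 4 elements
    {AsList, AsTuple}.
```

For scale, dividing the mem column by the size column for the three the_synapses tables in the dump in this message gives roughly 20.5 to 21 words per object.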
>> Any clues? The output of ets:i() and erlang:memory() is below.
>>
>> Cheers,
>>
>> Petter
>>
>> --- dump ---
>>
>> Number of processes: 27
>> ets:i():
>> id                         name                       type        size     mem       owner
>> -----------------------------------------------------------------------------------------------------------
>> 13                         code                       set         261      10692     code_server
>> 4110                       code_names                 set         58       7804      code_server
>> 6746271765                 the_synapses               ordered_set 5425194  113336012 <0.47.0>
>> 7022018584                 the_synapses               ordered_set 15143493 310909950 <0.48.0>
>> 7774416922                 the_synapses               ordered_set 8794649  182005810 <0.49.0>
>> ac_tab                     ac_tab                     set         6        848       application_controller
>> file_io_servers            file_io_servers            set         0        302       file_server_2
>> global_locks               global_locks               set         0        302       global_name_server
>> global_names               global_names               set         0        302       global_name_server
>> global_names_ext           global_names_ext           set         0        302       global_name_server
>> global_pid_ids             global_pid_ids             bag         0        302       global_name_server
>> global_pid_names           global_pid_names           bag         0        302       global_name_server
>> inet_cache                 inet_cache                 bag         0        302       inet_db
>> inet_db                    inet_db                    set         29       571       inet_db
>> inet_hosts_byaddr          inet_hosts_byaddr          bag         0        302       inet_db
>> inet_hosts_byname          inet_hosts_byname          bag         0        302       inet_db
>> inet_hosts_file_byaddr     inet_hosts_file_byaddr     bag         0        302       inet_db
>> inet_hosts_file_byname     inet_hosts_file_byname     bag         0        302       inet_db
>> neurone_counter            neurone_counter            set         258394   1846182   entity_server
>> neurone_group_counter      neurone_group_counter      set         6        344       entity_group_server
>> neurone_group_name         neurone_group_name         set         6        426       entity_group_server
>> neurone_group_name_reverse neurone_group_name_reverse set         6        426       entity_group_server
>> neurone_name               neurone_name               set         258394   11824602  entity_server
>> neurone_name_reverse       neurone_name_reverse       set         258394   11824602  entity_server
>> memory(): [{total,5568669792},
>> {processes,1138936},
>> {processes_used,1128120},
>> {system,5567530856},
>> {atom,349769},
>> {atom_used,336605},
>> {binary,82704},
>> {code,3046365},
>> {ets,5562163256}]
>>
>>
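Assuming the mem column of ets:i() is in words (it is the value of ets:info(Tab, memory)), the dump can be checked by hand. The column sums to about 632 million, which looks like "less than 1 GB" if read as bytes, but is about 5.05 GB once multiplied by the 8-byte word of a 64-bit emulator. That accounts for most of the 5562163256 bytes reported for ets by erlang:memory(), leaving a gap of roughly 0.5 GB. The sketch below only redoes that arithmetic; the module name ets_dump_sum is hypothetical and the numbers are copied from the dump.

```erlang
%% Hypothetical helper: re-adds the "mem" column of the ets:i() dump above.
-module(ets_dump_sum).
-export([total_words/0, total_bytes/1]).

%% Values copied from the dump, in table order (units: words).
mem_column() ->
    [10692, 7804,                            % code, code_names
     113336012, 310909950, 182005810,        % the three the_synapses tables
     848,                                    % ac_tab
     302, 302, 302, 302, 302, 302, 302,      % file_io_servers .. inet_cache
     571,                                    % inet_db
     302, 302, 302, 302,                     % inet_hosts_* tables
     1846182, 344, 426, 426,                 % neurone_counter, neurone_group_*
     11824602, 11824602].                    % neurone_name, neurone_name_reverse

total_words() ->
    lists:sum(mem_column()).

%% WordSize is what erlang:system_info(wordsize) returns (8 on 64-bit).
total_bytes(WordSize) ->
    total_words() * WordSize.
```

In the shell, ets_dump_sum:total_words() returns 631771591 and ets_dump_sum:total_bytes(8) returns 5054172728.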
>
> Hi Petter, ETS tables are not garbage collected. Each ETS table has _one_
> owner (a process). When that owner dies, the table is deleted and its
> memory is reclaimed. You can also delete a table (and reclaim the memory)
> by calling ets:delete/1. Looking at your memory result, your ETS tables are
> taking up ~5.2GB of data. However, your binary usage is very low, so I'm
> going to guess that you are storing lists of strings? If so, you
> should note that on a 64-bit system *each character* in a string will use 16
> bytes of memory! I highly recommend using binaries where possible when
> dealing with a large amount of data; your program will not only be more
> space efficient but also faster. I've written a non-trivial Erlang
> application for work, and I deal with CSV files of up to 18 million
> rows. I make heavy use of binaries and the binary module to parse these
> files and write entries to ETS; you'd be surprised how fast it is! If you'd
> like, I could provide an example.
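The 16-bytes-per-character figure follows from the list representation: every character of a string is a cons cell of two words, and a word is 8 bytes on a 64-bit emulator. A minimal sketch that measures this, assuming erts_debug:flat_size/1 (a debugging function returning a term's heap size in words) is acceptable for a quick check; the module name str_vs_bin is hypothetical. For the binary, byte_size/1 gives only the payload, about one byte per ASCII character, and ignores the binary's small fixed overhead.

```erlang
%% Hypothetical helper: memory used by a string as a list vs. as a binary.
-module(str_vs_bin).
-export([compare/1]).

%% Returns {ListBytes, BinaryPayloadBytes} for the given string.
compare(Str) when is_list(Str) ->
    WordSize  = erlang:system_info(wordsize),
    ListBytes = erts_debug:flat_size(Str) * WordSize,  % 2 words per character
    Bin       = unicode:characters_to_binary(Str),    % UTF-8 encoded
    {ListBytes, byte_size(Bin)}.
```

On a 64-bit emulator, str_vs_bin:compare("hello") should give 80 bytes for the list against a 5-byte binary payload.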
> When you say "thread", do you mean "process"? You do realize that an OS
> thread and an Erlang process are two completely different things. IIRC, the VM
> spawns an OS thread per scheduler (along with some other threads for I/O and
> such). Erlang processes are extremely cheap... don't be afraid to create
> thousands or even tens of thousands of them.
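A quick way to see the difference between scheduler threads and Erlang processes from a shell is sketched below; the module name proc_vs_thread and the choice of 10000 idle processes are arbitrary. erlang:system_info(schedulers) returns the number of scheduler threads, while erlang:system_info(process_count) returns the number of live Erlang processes.

```erlang
%% Hypothetical demo: many Erlang processes run on a few scheduler threads.
-module(proc_vs_thread).
-export([demo/1]).

%% Spawns N idle processes, samples the counts, then stops the processes.
%% Returns {SchedulerThreads, LiveProcesses}.
demo(N) ->
    Pids = [spawn(fun() -> receive stop -> ok end end)
            || _ <- lists:seq(1, N)],
    Counts = {erlang:system_info(schedulers),
              erlang:system_info(process_count)},
    [Pid ! stop || Pid <- Pids],
    Counts.
```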
> You should not have to perform manual garbage collection; that seems like a
> code smell to me. When a process dies, its heap is reclaimed. Each
> process has its own isolated heap.
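Since ETS data lives outside process heaps, forcing a garbage collection does not shrink a table. The sketch below (hypothetical module name, arbitrary data) records ets:info(Tab, memory) before and after erlang:garbage_collect/0 and expects the two values to be equal; table memory is freed only by deleting objects, calling ets:delete/1, or the owner exiting.

```erlang
%% Hypothetical demo: garbage collection does not affect ETS memory.
-module(ets_gc_demo).
-export([run/0]).

%% Returns {WordsBeforeGC, WordsAfterGC} for a freshly filled table.
run() ->
    Tab = ets:new(gc_demo, [set]),
    ets:insert(Tab, [{N, [N, N, N]} || N <- lists:seq(1, 10000)]),
    Before = ets:info(Tab, memory),
    true = erlang:garbage_collect(),   % collects this process's heap only
    After = ets:info(Tab, memory),
    true = ets:delete(Tab),            % explicitly release the table
    {Before, After}.
```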
> Do you have multiple processes all writing to the same ETS table? If so,
> there are some improvements to ETS (and Erlang in general) for concurrent
> reading and writing of an ETS table in R14B that you might want to
> look at.
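For tables that are shared, here is a sketch of a public table opened with the {write_concurrency, true} and {read_concurrency, true} options from the ets documentation and filled by several processes at once. The module name shared_tab_demo, the key layout and the counts are hypothetical, and whether these options actually help depends on the access pattern.

```erlang
%% Hypothetical demo: several processes writing to one public table.
-module(shared_tab_demo).
-export([run/2]).

%% Starts Writers processes that each insert PerWriter objects with
%% distinct keys, waits for all of them, and returns the table size.
run(Writers, PerWriter) ->
    Tab = ets:new(shared_demo, [set, public,
                                {write_concurrency, true},
                                {read_concurrency, true}]),
    Parent = self(),
    Pids = [spawn_link(
              fun() ->
                      [ets:insert(Tab, {{W, I}, I}) || I <- lists:seq(1, PerWriter)],
                      Parent ! {done, self()}
              end)
            || W <- lists:seq(1, Writers)],
    [receive {done, Pid} -> ok end || Pid <- Pids],
    Size = ets:info(Tab, size),
    true = ets:delete(Tab),
    Size.
```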
> Finally, it would be helpful to see the full source code. There is a good
> chance your solution is not optimal for Erlang. By that I mean that if
> your translation follows closely from your Java and Perl solutions, then
> chances are it's not an optimal Erlang program, as the paradigms are vastly
> different.
> -Ryan