[erlang-questions] Unstable erlang compared to java or perl
Ryan Zezeski
rzezeski@REDACTED
Sun Nov 7 21:37:50 CET 2010
On Sun, Nov 7, 2010 at 9:49 AM, Petter Egesund <petter.egesund@REDACTED> wrote:
> Hi, I have a small program with lots of memory-updates which I try to
> run in Erlang.
>
> The same algorithm works fine in both Java and Perl, but fails in
> Erlang because the program runs out of memory, and I cannot figure
> out why. Frustrating, as my Erlang version seems to be the easiest to
> scale as well as being the most readable.
>
> The program is threaded and each thread writes to an ETS table which is
> created at the beginning of the thread. When the thread dies I try to
> call ets:delete(Table), as described in the manual, but the memory
> used by the thread never seems to be released.
>
> Some facts:
>
> - The memory usage of each thread is rather constant. This is
> confirmed when I use ets:i() to show info about memory usage.
> - The number of threads is constant, confirmed both by running top
> and by writing out the number of threads regularly. When a thread dies, I
> create a new one.
> - I have tried to end the thread by sending an exit signal as the last
> statement. This helps some, but does not solve the leak.
> - I put small lists of 3-4 integers into the ETS table as values; the
> keys are lists of the same size as well.
> - I garbage-collect each thread before it dies, as well as doing
> regular global garbage collections. No help.
> - The memory reported by ets:i(), summed over all threads, is much
> lower than what erlang:memory() reports. This might indicate
> something? It does not seem logical to me, at least.
> - erlang:memory() reports about half of what top/the OS reports.
> - I am running on Ubuntu, 64-bit, 14A, but I have tried 14B as well.
>
> Any clues? Dump from ets:i() and erlang:memory() is like below.
>
> Cheers,
>
> Petter
>
> --- dump ---
>
> Number of processes: 27
> ets:i():
> id name type size mem owner
> ----------------------------------------------------------------------------
> 13 code set 261 10692 code_server
> 4110 code_names set 58 7804 code_server
> 6746271765 the_synapses ordered_set 5425194 113336012 <0.47.0>
> 7022018584 the_synapses ordered_set 15143493 310909950 <0.48.0>
> 7774416922 the_synapses ordered_set 8794649 182005810 <0.49.0>
> ac_tab ac_tab set 6 848 application_controller
> file_io_servers file_io_servers set 0 302 file_server_2
> global_locks global_locks set 0 302 global_name_server
> global_names global_names set 0 302 global_name_server
> global_names_ext global_names_ext set 0 302 global_name_server
> global_pid_ids global_pid_ids bag 0 302 global_name_server
> global_pid_names global_pid_names bag 0 302 global_name_server
> inet_cache inet_cache bag 0 302 inet_db
> inet_db inet_db set 29 571 inet_db
> inet_hosts_byaddr inet_hosts_byaddr bag 0 302 inet_db
> inet_hosts_byname inet_hosts_byname bag 0 302 inet_db
> inet_hosts_file_byaddr inet_hosts_file_byaddr bag 0 302 inet_db
> inet_hosts_file_byname inet_hosts_file_byname bag 0 302 inet_db
> neurone_counter neurone_counter set 258394 1846182 entity_server
> neurone_group_counter neurone_group_counter set 6 344 entity_group_server
> neurone_group_name neurone_group_name set 6 426 entity_group_server
> neurone_group_name_reverse neurone_group_name_reverse set 6 426 entity_group_server
> neurone_name neurone_name set 258394 11824602 entity_server
> neurone_name_reverse neurone_name_reverse set 258394 11824602 entity_server
> memory(): [{total,5568669792},
> {processes,1138936},
> {processes_used,1128120},
> {system,5567530856},
> {atom,349769},
> {atom_used,336605},
> {binary,82704},
> {code,3046365},
> {ets,5562163256}]
>
>
>
Hi Petter, ETS tables are not garbage collected. Each ETS table has _one_
owner (a process). When that owner dies the table is deleted and its
memory is reclaimed. You can also delete a table (and reclaim the memory)
by calling ets:delete/1. Looking at your memory result, your ETS tables are
taking up ~5.2GB of data. However, your binary usage is very low, so I'm
going to take a guess that you are storing a list of strings? If so, you
should note that on a 64-bit system *each character* in a string will use 16
bytes of memory! I highly recommend using binaries where possible when
dealing with a large amount of data; your program will not only be more
space efficient but also faster. I've written a non-trivial Erlang
application for work and I deal with CSV files that get up to 18 million
rows. I make heavy use of binaries and the binary module to parse these
files and write entries to ETS--you'd be surprised how fast it is! If you'd
like I could provide an example.

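A minimal sketch of the two points above, not taken from Petter's program: the module name ets_notes, the table names and the sample keys are all made up for illustration. The first function shows a table vanishing together with its owner; the second stores the same 1000 keys once as strings (lists) and once as binaries and compares the table sizes. Note that ets:info(Tab, memory) counts words, not bytes, so it is multiplied by the word size here.

```erlang
-module(ets_notes).
-export([owner_death/0, list_vs_binary/0]).

%% A table is deleted when the process that owns it terminates.
owner_death() ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        Scratch = ets:new(scratch, [set, public]),
        ets:insert(Scratch, {key, value}),
        Parent ! {table, Scratch},
        receive stop -> ok end
    end),
    Tab = receive {table, T} -> T end,
    Before = ets:info(Tab, size),      % number of objects while the owner lives
    Pid ! stop,
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    After = ets:info(Tab, size),       % undefined once the table no longer exists
    {Before, After}.

%% Store identical keys as strings and as binaries; return both sizes in bytes.
list_vs_binary() ->
    WordSize = erlang:system_info(wordsize),
    Keys = [integer_to_list(N) ++ "-some-longer-text" || N <- lists:seq(1, 1000)],
    ListTab = ets:new(as_lists, [set]),
    BinTab = ets:new(as_binaries, [set]),
    [ets:insert(ListTab, {K, 0}) || K <- Keys],
    [ets:insert(BinTab, {list_to_binary(K), 0}) || K <- Keys],
    ListBytes = ets:info(ListTab, memory) * WordSize,
    BinBytes = ets:info(BinTab, memory) * WordSize,
    ets:delete(ListTab),               % explicit delete, as with ets:delete/1 above
    ets:delete(BinTab),
    {ListBytes, BinBytes}.
```

Calling ets_notes:list_vs_binary() in the shell returns the two byte counts side by side; the binary table should come out noticeably smaller, since each list element costs two words while a binary packs one character per byte.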
When you say "thread" do you mean "process"? You do realize that an OS
thread and an Erlang process are two completely different things? IIRC, the VM
spawns an OS thread per scheduler (along w/ some other threads for I/O and
such). Erlang processes are extremely cheap; don't be afraid to make
thousands or even tens of thousands of them.

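To make the difference concrete, here is a small sketch (the module name many_procs is hypothetical) that spawns N Erlang processes, waits for each one to report back, and returns that count next to the number of scheduler threads the VM is running.

```erlang
-module(many_procs).
-export([run/1]).

%% Spawn N short-lived processes and wait until every one has replied.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {done, self()} end)
            || _ <- lists:seq(1, N)],
    [receive {done, Pid} -> ok end || Pid <- Pids],
    %% Processes spawned versus OS-level scheduler threads in this VM.
    {length(Pids), erlang:system_info(schedulers)}.
```

many_procs:run(10000) completes quickly on ordinary hardware, and the second element of the result is typically no larger than the number of CPU cores.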
You should not have to perform manual garbage collection; that seems like a
code smell to me. When a process dies its heap will be reclaimed. Each
process has its own isolated heap.

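A rough way to observe this, assuming a made-up module name heap_demo: a worker builds a large list on its own heap, the parent reads the worker's memory footprint with erlang:process_info/2, and after the worker exits the same call returns undefined because the process and its heap are gone. No call to erlang:garbage_collect/0 is involved.

```erlang
-module(heap_demo).
-export([run/0]).

run() ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        Big = lists:seq(1, 1000000),       % lives on this process's own heap
        Parent ! {built, self()},
        receive stop -> length(Big) end    % keep Big referenced until told to stop
    end),
    receive {built, Pid} -> ok end,
    {memory, Alive} = erlang:process_info(Pid, memory),  % bytes used by the worker
    Pid ! stop,
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    Dead = erlang:process_info(Pid, memory),             % undefined for a dead process
    {Alive, Dead}.
```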
Do you have multiple processes all writing to the same ETS table? If so,
there are some improvements that were made to ETS (and Erlang in general)
for concurrent writing/reading of an ETS table in 14B that you might want to
look at.

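For the shared-table case, a hedged sketch: the message above does not name specific options, so using the write_concurrency table option here is an assumption about what is relevant, and the module name shared_tab is made up. Several writer processes insert into one public table and the parent checks the final size.

```erlang
-module(shared_tab).
-export([run/2]).

%% Writers processes each insert PerWriter objects into one shared table.
run(Writers, PerWriter) ->
    %% public: any process may write; write_concurrency is an assumed tuning choice.
    Tab = ets:new(shared, [set, public, {write_concurrency, true}]),
    Parent = self(),
    Pids = [spawn(fun() ->
                [ets:insert(Tab, {{W, I}, I}) || I <- lists:seq(1, PerWriter)],
                Parent ! {done, self()}
            end) || W <- lists:seq(1, Writers)],
    [receive {done, Pid} -> ok end || Pid <- Pids],
    Size = ets:info(Tab, size),
    ets:delete(Tab),
    Size.
```

shared_tab:run(4, 100) returns 400, since every {Writer, Index} key is distinct. If each process only ever writes to its own table, as the original post describes, this option is unlikely to matter.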
Finally, it would be helpful to see the full source code. There is a good
chance your solution is not optimal for Erlang. By that, I mean that if
your translation follows closely from your Java and Perl solutions, then
chances are it's not an optimal Erlang program, as the paradigms are vastly
different.

-Ryan