[erlang-questions] erlang woes

Thu Aug 5 15:55:27 CEST 2010

Are you using pg2 in a distributed system? There was a known bug (it should be fixed by now) where pg2 consumed memory at an alarming rate.

-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On Behalf Of Arun Suresh
Sent: Thursday, August 05, 2010 1:37 AM
To: erlang-questions@REDACTED
Subject: [erlang-questions] erlang woes

Hello folks,

I've been using Erlang for a while now, and I've got a production system up
and running from scratch, but there are some really annoying aspects of the
platform. The most fundamental is that when a node crashes, it is very hard
to figure out exactly why. Almost ALL the time, what I see in the crash dump
is something akin to:

=erl_crash_dump:0.1
Wed Aug  4 21:50:01 2010
Slogan: eheap_alloc: Cannot allocate 1140328500 bytes of memory (of type "heap").
System version: Erlang R13B04 (erts-5.7.5) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
Compiled: Tue May 11 12:37:38 2010
Taints:

at which point I start to comb the SASL logs, and 9 out of 10 times it
is because some critical process has died and the supervisor is busy
restarting it. For example, the other day my node crashed, and from the
SASL logs I see that the HTTP manager for a profile I had created had
crashed like so:

=CRASH REPORT==== 4-Aug-2010::21:47:09 ===
  crasher:
    initial call: httpc_manager:init/1
    pid: <0.185.0>
    registered_name: httpc_manager_store
    exception exit: {{case_clause,
                         [{handler_info,#Ref<0.0.17.61372>,<0.17225.36>,
                              undefined,<0.15665.36>,initiating}]},
                     [{httpc_manager,handle_connect_and_send,5},
                      {httpc_manager,handle_info,2},
                      {gen_server,handle_msg,5},
                      {proc_lib,init_p_do_apply,3}]}
      in function  gen_server:terminate/6
    ancestors: [httpc_profile_sup,httpc_sup,inets_sup,<0.46.0>]
    messages: [{'EXIT',<0.16755.36>,normal},
                  {connect_and_send,<0.16752.36>,#Ref<0.0.17.61366>,

The subsequent messages were related to the supervisor trying to restart the
profile manager and failing.

Now my point is: why did the node have to crash just because the manager
has to be restarted?
And why does the crash dump always keep telling me I'm out of memory?

The problem is, I thought Erlang was built to be fault tolerant. My choice
of Erlang had a LOT to do with doing away with having to code defensively:
"let it crash" and all that. Just make sure you have a supervisor that
restarts your process and everything will just work fine. But my experience
is that, most of the time, simple process restarts bring the whole node
crashing down.

I would deeply appreciate it if someone could tell me whether there is
something fundamentally wrong with the way I'm doing things, or if anyone
has been in my situation and has had some enlightenment.

Thanks in advance,
-Arun