[erlang-questions] Sudden death of Erlang Node
Fri Jan 19 08:31:24 CET 2007
I don't know why you don't get an erl_crash.dump.
But if/when you do get one, I can recommend the
crashdump_viewer. It is excellent!
It runs in WebTool, which is then available at http://localhost:8888/
Eranga Udesh wrote:
> Thanks for the info.
> By "very busy" I mean that the system handles 1000-1500 message
> passes, 750-1000 process spawns, 250-500 Mnesia DB accesses, 500-750
> Erlang port calls, etc., per second.
> However, CPU utilization never goes beyond 25% on any of the 4 CPUs in the
> system, and over 40% of memory is free.
> So even though my symptom seems similar, the cause may not be. I am
> running R11B-2 compiled with SMP support, but the node is not started in
> SMP mode.
> On this particular node I am running a couple of Erlang port drivers
> written in C.
> I guess if I could get an erl_crash.dump, I should be able to find the
> cause of the problem. Why isn't one being generated?
> What methods do I have to identify the issue in a situation like this
> (enabling debugging, crash dumps, etc.)?
> - Eranga
> -----Original Message-----
> From: Serge Aleynikov [mailto:]
> Sent: Thursday, January 18, 2007 8:56 PM
> To: Eranga Udesh
> Subject: Re: [erlang-questions] Sudden death of Erlang Node
> Eranga Udesh wrote:
>> I have a very busy Erlang node running on a quad-processor server with
>> plenty of RAM. The server utilization is quite normal.
> You indicate that you have a "very busy" node, yet its utilization is
> "quite normal". I find these descriptions contradictory. Could you
> state the peak utilization as a percentage of CPU? If it is,
> say, over 90%, that can't be considered normal.
>> However, from time to time the
>> Erlang node dies suddenly without any warning. The erlang.log.x
>> files only show that "heart" couldn't kill the server, plus the node
>> restart info. I also cannot find any erl_crash.dump file. Later I
>> set the ERL_CRASH_DUMP and ERL_CRASH_DUMP_SECONDS environment variables
>> with different values, but no luck. I am using Erlang R11B-2.
> We've experienced a similar issue intermittently with R11B-0 (without
> SMP, which is what we run in production). The details can be
> found in this thread:
> Are you seeing the following message in the log?
> "heart: Wed Dec 13 18:59:54 2006: Erlang has closed."
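To check whether a node has been hitting this same failure, one could scan the run_erl logs for that heart line. The following is a minimal Python sketch, not a tool from this thread: it assumes the logs are the erlang.log.N files written by run_erl into one directory and that the message always has exactly the format quoted above. The names HEART_CLOSED, find_heart_closures and scan_log_dir are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed format, copied from the heart message quoted above:
#   heart: Wed Dec 13 18:59:54 2006: Erlang has closed.
# " +" tolerates ctime-style padding such as "Dec  3".
HEART_CLOSED = re.compile(
    r"heart: (\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}): Erlang has closed\."
)


def find_heart_closures(lines):
    """Return a datetime for every 'Erlang has closed' heart line."""
    found = []
    for line in lines:
        m = HEART_CLOSED.search(line)
        if m:
            # Collapse the double space of single-digit days before parsing.
            stamp = " ".join(m.group(1).split())
            found.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return found


def scan_log_dir(log_dir):
    """Map each erlang.log.* file in log_dir to the closures it records."""
    results = {}
    for path in sorted(Path(log_dir).glob("erlang.log.*")):
        with path.open(errors="replace") as f:
            hits = find_heart_closures(f)
        if hits:
            results[path.name] = hits
    return results
```

For example, scan_log_dir("/path/to/logs") returns something like {"erlang.log.3": [datetime(...)]}, giving the exact restart times to compare against any other monitoring data.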
> I managed to reproduce a similar issue by creating a sustained CPU load of
> 100%. strace showed that at some point the node failed to allocate memory
> in a call to mmap(). After that the node closed all file descriptors,
> which was immediately detected by the "heart" process, and heart in turn
> killed and restarted the node. The only trace left was the error
> message above in the erlang.log.x file.
> I don't know for sure whether this was the same cause as in production
> (the production process, at least, didn't seem to have exhausted
> memory), but the heart message in the log was identical. What else can
> cause an Erlang node to close the pipe connecting it to the heart process?
> I suggest you set up a monitoring process on that machine to log some
> statistics about the node's OS process (such as a timestamp plus
> /proc/PID/status), so that you can correlate process memory with the
> time of the failure.
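A monitoring process of the kind suggested above could be as simple as the following Python sketch. It is an illustration, not something posted in this thread: it assumes Linux (for /proc), that you pass the PID of the emulator's OS process (typically beam, or beam.smp for an SMP emulator), and that the Vm* fields plus the thread count are the ones worth logging. FIELDS, parse_status, sample and monitor are hypothetical names.

```python
import sys
import time
from datetime import datetime
from pathlib import Path

# Assumed selection of /proc/<pid>/status fields: memory sizes and thread count.
FIELDS = ("VmPeak", "VmSize", "VmHWM", "VmRSS", "VmData", "Threads")


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def now():
    """Current local time as an ISO-8601 string, to the second."""
    return datetime.now().isoformat(timespec="seconds")


def sample(pid):
    """Return one timestamped log line for pid, or None if it no longer exists."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None
    info = parse_status(text)
    fields = " ".join(f"{k}={info.get(k, '?')}" for k in FIELDS)
    return f"{now()} pid={pid} {fields}"


def monitor(pid, interval=5.0, out=sys.stdout):
    """Log a sample every `interval` seconds until the process disappears."""
    while True:
        line = sample(pid)
        if line is None:
            print(f"{now()} pid={pid} gone", file=out, flush=True)
            return
        print(line, file=out, flush=True)
        time.sleep(interval)
```

For example, monitor(12345, interval=10) prints one line every 10 seconds until PID 12345 exits. Because heart starts a new OS process when it restarts the node, the loop stops at the failure, and the last lines before "gone" are the ones to compare with the heart timestamp in erlang.log.x.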
> Not sure how helpful this is in your case, but this issue still
> pops up once every couple of months in our production system, followed
> by an automatic restart, and it remains unresolved.