[erlang-questions] all nodes in cluster crashing with eheap_alloc in the same time

Lukas Larsson lukas@REDACTED
Mon Sep 26 10:55:35 CEST 2016


Hello,

On Wed, Sep 21, 2016 at 7:50 PM, Caragea Silviu <silviu.cpp@REDACTED>
wrote:

>
> The only question I have now is:
>
> How can I include more information in the logs before the process dies,
> such as the number of messages in its queue?
>
> We also tried to set up a monitor that triggers well below the limit at
> which the process has to be killed:
>
> Options = [{long_gc, 10000}, {large_heap, 1000000}, busy_port,
> busy_dist_port],
> erlang:system_monitor(self(), Options),
>
> handle_info({monitor, Pid, Type, Details}, State) ->
>     log_system_event({Type, Pid, Details}),
>     {noreply, State};
>
> log_system_event({large_heap, GcPid, Info}) ->
>     LogFun = fun() ->
>         case recon:info(GcPid, messages) of
>             {messages, Messages} ->
>                 ?WARNING_MSG("Large heap (~p): ~p~nProcess info: ~p~n"
>                              "Process state size (words in the heap): ~p~n"
>                              "Message queue(first 10):~p~n",
>                              [GcPid, Info, recon:info(GcPid),
>                               erts_debug:size(recon:get_state(GcPid)),
>                               Messages]);
>             undefined ->
>                 ?WARNING_MSG("Large heap (~p): ~p~nProcess info is not available",
>                              [GcPid, Info])
>         end
>     end,
>     spawn(LogFun);
>
> But unfortunately the processes that have this issue have a lifetime
> shorter than 4 seconds, so this event is never triggered in time.
>
> Any help is appreciated !
>

You could try using max_heap_size with #{ kill => false } and then install
a specialized error_logger handler that listens specifically for that type
of event and retrieves the information you want before killing the process.
Depending on how quickly you need the runaway process to be killed, this may
be acceptable to you.
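
A minimal sketch of that idea follows. The module name heap_limit_logger and
the helpers start_worker/1 and inspect_and_kill/1 are hypothetical. It also
assumes that the emulator reports the event as a formatted error whose format
string contains "maximum heap size reached" and whose argument list includes
the offending pid; please verify the report shape against your OTP release
before relying on it.

```erlang
-module(heap_limit_logger).
-behaviour(gen_event).

-export([install/0, start_worker/1, is_heap_limit_report/1]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Attach this handler to the standard error_logger event manager.
install() ->
    error_logger:add_report_handler(?MODULE, []).

%% Spawn a worker with a soft heap limit (size is in words): the process
%% is not killed by the VM, but a report is sent when the limit is hit.
start_worker(Fun) ->
    spawn_opt(Fun, [{max_heap_size, #{size => 1000000,
                                      kill => false,
                                      error_logger => true}}]).

init(_Args) ->
    {ok, no_state}.

%% ASSUMPTION: the report arrives as {error, GL, {From, Format, Args}},
%% Format mentions "maximum heap size reached", and Args holds the pid.
handle_event({error, _GL, {_From, Format, Args}}, State) when is_list(Args) ->
    case is_heap_limit_report(Format) of
        true ->
            Local = [P || P <- Args, is_pid(P), node(P) =:= node()],
            lists:foreach(fun inspect_and_kill/1, Local);
        false ->
            ok
    end,
    {ok, State};
handle_event(_Other, State) ->
    {ok, State}.

%% True if the format string looks like the heap-limit report.
is_heap_limit_report(Format) when is_list(Format) ->
    try re:run(Format, "maximum heap size reached", [{capture, none}]) of
        match   -> true;
        nomatch -> false
    catch
        _:_ -> false
    end;
is_heap_limit_report(_) ->
    false.

%% Collect diagnostics in a separate process so the event manager is not
%% blocked, then kill the runaway process ourselves.
inspect_and_kill(Pid) ->
    spawn(fun() ->
        Info = erlang:process_info(Pid, [message_queue_len,
                                         total_heap_size,
                                         current_function]),
        error_logger:warning_msg("Heap limit hit by ~p: ~p~n", [Pid, Info]),
        exit(Pid, kill)
    end).

handle_call(_Request, State) -> {ok, ok, State}.
handle_info(_Info, State)    -> {ok, State}.
terminate(_Reason, _State)   -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

Call heap_limit_logger:install() once at startup and spawn the suspect
processes through start_worker/1 (or set the same flag with process_flag/2
inside the process). The process keeps running and growing until the handler
has logged its information, which is the delay mentioned above.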

Another tip: you may want to configure processes that tend to build large
message queues to keep their message queue data off_heap. It will make the
GC a lot happier, but it also means that max_heap_size will not include the
message queue size when doing its analysis.
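
As a small illustration of the off_heap setting, here is a sketch with a
hypothetical module mq_off_heap; the receive loop is only a placeholder for
real work.

```erlang
-module(mq_off_heap).
-export([start/0, loop/0]).

%% Spawn a process whose message queue data is kept off the process heap.
%% The same flag can be set from inside a running process with
%% process_flag(message_queue_data, off_heap).
start() ->
    spawn_opt(fun loop/0, [{message_queue_data, off_heap}]).

%% Placeholder loop: discard messages until told to stop.
loop() ->
    receive
        stop -> ok;
        _Msg -> loop()
    end.
```

The setting can be checked with
erlang:process_info(Pid, message_queue_data). Since the queue is then no
longer counted by max_heap_size, it may be worth watching message_queue_len
separately for such processes.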

Lukas