[erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server

Tue Jan 13 15:07:30 CET 2009

Hi Rickard, thank you very much - this sounds correct to me. The
customer cluster is still running a cron job that effectively does
lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every
ten minutes.
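
As a rough sketch only (the module and function names below are
illustrative, not the actual script), the job amounts to something like:

```erlang
%% Hypothetical module name; the real cron job is just the one-liner above.
-module(gc_all).
-export([run/0]).

%% Force a garbage collection of every process on the local node.
%% erlang:garbage_collect/1 returns true, or false if the process has
%% already exited, so processes that die mid-sweep are simply skipped.
run() ->
    lists:foreach(fun erlang:garbage_collect/1, erlang:processes()).
```

Every process on the node is collected by whichever process runs the
sweep, so the disk_log server is garbage collected from outside too.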

This script was introduced as a stop-gap when we were running a heavily
loaded ejabberd cluster on the 32-bit VM, where an out-of-memory condition
would take down the node and then, because of problems with cross-node
monitor storms, the entire cluster. The cluster now runs on 64-bit
VMs, so we'll revisit the memory consumption problem and avoid using
erlang:garbage_collect/1.

We'll disable the script and see if the problem recurs.

Once again, thank you very much - I'm always very impressed by the level
of support the OTP team gives the Erlang community.

Cheers,
--Geoff

Rickard Green <rickard.s.green@REDACTED> writes:

> Hi Geoff,
>
> I've looked at this and found a bug that may have caused it. When one
> process garbage collects another process, and the process being garbage
> collected receives a message during that garbage collection, the
> process being garbage collected can end up in the state that you
> described.
>
> This kind of garbage collection only happens when someone calls the
> garbage_collect/1 BIF or when code is purged. In the case with the
> disk_log server being stuck I think we can rule out the purge, i.e.,
> if it is this bug that caused your problem another process must have
> garbage collected the disk_log server via the garbage_collect/1
> BIF. Do you have any code that may have garbage collected the disk_log
> server via the garbage_collect/1 BIF? The garbage collect may also
> have been done explicitly in the shell.
>
> Regards,
> Rickard Green, Erlang/OTP, Ericsson AB.