[erlang-bugs] R12B-3/64bit/smp Stuck disk_log_server

Rickard Green rickard.s.green@REDACTED
Wed Jan 14 10:59:19 CET 2009


A source patch can now be downloaded:
  http://www.erlang.org/download/patches/otp_src_R12B-5_OTP-7738.patch
  http://www.erlang.org/download/patches/otp_src_R12B-5_OTP-7738.readme

Regards,
Rickard Green, Erlang/OTP, Ericsson AB.


Rickard Green wrote:
> I'll prepare a source patch fixing the problem. I won't be able to post
> it until tomorrow, though.
> 
> Regards,
> Rickard Green, Erlang/OTP, Ericsson AB.
> 
> 
> Geoff Cant wrote:
>> Hi Rickard, thank you very much - this sounds correct to me. The
>> customer cluster is still running a cron job that effectively does
>> lists:foreach(fun erlang:garbage_collect/1, erlang:processes()) every
>> ten minutes.
>>
>> This script was introduced as a stop-gap measure when running a heavily
>> loaded ejabberd cluster on the 32bit VM where an out of memory condition
>> would take down the node and then the entire cluster due to some
>> problems with cross-node monitor storms. The cluster now runs on 64bit
>> VMs so we'll revisit the memory consumption problem and avoid using
>> erlang:garbage_collect/1.
>>
>> We'll disable the script and see if the problem recurs.
>>
>> Once again, thank you very much - I'm always very impressed by the level
>> of support the OTP team gives the Erlang community.
>>
>> Cheers,
>> --Geoff
>>
>>
>> Rickard Green <rickard.s.green@REDACTED> writes:
>>
>>> Hi Geoff,
>>>
>>> I've looked at this and found a bug that may have caused this. When a
>>> process garbage collects another process and the process being garbage
>>> collected also receives a message during the garbage collection, the
>>> process being garbage collected can end up in the state that you
>>> described.
>>>
>>> This kind of garbage collection only happens when someone calls the
>>> garbage_collect/1 BIF or when code is purged. In the case with the
>>> disk_log server being stuck I think we can rule out the purge, i.e.,
>>> if it is this bug that caused your problem another process must have
>>> garbage collected the disk_log server via the garbage_collect/1
>>> BIF. Do you have any code that may have garbage collected the disk_log
>>> server via the garbage_collect/1 BIF? The garbage collection may also
>>> have been triggered explicitly from the shell.
>>>
>>> Regards,
>>> Rickard Green, Erlang/OTP, Ericsson AB.
>>
>>
> 


