[erlang-bugs] Scheduler Wall Time Statistics live|dead locking a process.
Wed Jul 18 16:30:27 CEST 2012
On Wed, 18 Jul 2012, Fred Hebert wrote:
> Hi there Patrick!
> I can't exactly afford to dump core on one of the servers right now because
> yeah, it would interrupt the service. I could however set one VM up on the
> server that just sits and calls vmstats and does some busy test work, pushing
> it to a fake StatsD server to reproduce it; maybe it could work.
That would be great!
> I can get going setting stuff up; how should I dump the core for it to be
> useful for you?
Oh, just kill -ABRT on the beam pid when it hangs. Don't forget to run
'ulimit -c unlimited' before starting Erlang, though. I tend to forget that
all the time and get very sad when I've reproduced a problem after three
days of trying and get no core whatsoever :)
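
For reference, the two steps above could be scripted roughly as follows.
This is only a sketch: the helper name dump_core and the placeholder pid
are made up for illustration, and where the core file ends up depends on
the core settings of the system.

```shell
#!/bin/sh
# Lift the core size limit for this shell. Processes started from it
# (such as erl) inherit the limit, so this must run before the VM starts.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2

# dump_core PID (hypothetical helper): send SIGABRT to PID so the
# process aborts and, if the limit allows, leaves a core file behind.
dump_core() {
    kill -ABRT "$1"
}

# Example, once the VM hangs (12345 is a placeholder for the beam pid):
#   dump_core 12345
```

The limit is raised first because it only applies to processes started
afterwards; raising it once the VM is already running has no effect on
that VM.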
> On 12-07-18 9:44 AM, wrote:
>> Hi Fred!
>> On Wed, 18 Jul 2012, Fred Hebert wrote:
>>> Hi there,
>>> If you go on erlang-questions, you'll find the following thread I started
>>> regarding one of my gen_servers locking up forever until I try to connect
>>> to the VM:
>>> And the information following it in
>>> The gist of it is that apparently, the gen_server gets stuck while calling
>>> erlang:statistics(scheduler_wall_time). A process info dump on it returns:
>>> with the interesting parts:
>>> I'm unsure what exactly causes the problem, and we're running the VM with
>>> default arguments when it comes to scheduling and layout. It happens even
>>> when the virtual machine is under relatively low load (scheduler active
>>> wall time is less than 5%, but more than 2% of the total wall time when
>>> averaging all cores) and can also happen under higher load.
>> Ouch... Seems like one of the schedulers does not understand that it should
>> report data back to the process. Is there any chance of dumping core of a
>> machine where it hangs, or would that mean interruption of service? I
>> *really* would like to know what the schedulers are doing when they should
>> be reporting back...
>>> Only that process appears affected.
>> Yes, it's just waiting for a message that does not arrive, one that should
>> be sent from the VM when statistics for the scheduler are available...