[erlang-bugs] Scheduler Wall Time Statistics live|dead locking a process.
pan@REDACTED
Wed Jul 18 16:30:27 CEST 2012
Hi!
On Wed, 18 Jul 2012, Fred Hebert wrote:
> Hi there Patrick!
>
> I can't exactly afford to dump core on one of the servers right now because
> yeah, it would interrupt the service. I could however set one VM up on the
> server that just sits and calls vmstats and does some busy test work, pushing
> it to a fake StatsD server to reproduce it; maybe it could work.
That would be great!
>
> I can get going setting stuff up; how should I dump the core for it to be
> useful for you?
Oh, just kill -ABRT on the beam pid when it hangs. Don't forget to 'ulimit
-c unlimited' before starting Erlang, though; I tend to forget that all
the time and get very sad when I've reproduced a problem after three
days of trying and get no core whatsoever :)
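In shell terms, the recipe above amounts to something like this (a sketch; the node name and the pgrep pattern are placeholders, and the core file location is kernel-dependent):

```shell
# Allow unlimited-size core files in this shell; the limit is inherited
# by every process started from it, including the Erlang VM.
ulimit -c unlimited
ulimit -c                      # verify: should print "unlimited"

# Start the node from this same shell, e.g.:
#   erl -sname bugnode
# When the process hangs, abort the VM to force a core dump:
#   kill -ABRT "$(pgrep -f beam.smp)"
# The core file normally lands in the VM's working directory.
```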
>
> Regards,
> Fred.
Cheers,
Patrik
>
> On 12-07-18 9:44 AM, pan@REDACTED wrote:
>> Hi Fred!
>>
>> On Wed, 18 Jul 2012, Fred Hebert wrote:
>>
>>> Hi there,
>>>
>>> If you go on erlang-questions, you'll find the following thread I started
>>> regarding one of my gen_servers locking up forever until I try to connect
>>> to the VM:
>>> http://erlang.org/pipermail/erlang-questions/2012-July/068097.html
>>>
>>> And the information following it in
>>> http://erlang.org/pipermail/erlang-questions/2012-July/068099.html
>>>
>>> The gist of it is that apparently, the gen_server gets stuck while calling
>>> erlang:statistics(scheduler_wall_time). A process info dump on it returns:
>>>
>>> [{registered_name,vmstats_server},
>>> {current_function,{erlang,sched_wall_time,3}},
>>> {initial_call,{proc_lib,init_p,5}},
>>> {status,waiting},
>>> {message_queue_len,2},
>>> {messages,[{system,{<5998.7341.243>,#Ref<5998.0.3810.221818>},get_status},
>>> {system,{<5998.28757.800>,#Ref<5998.0.3811.260443>},get_status}]},
>>> {links,[<5998.918.0>]},
>>> {dictionary,[{random_seed,{17770,13214,15044}},
>>> {'$ancestors',[vmstats_sup,<5998.917.0>]},
>>> {'$initial_call',{vmstats_server,init,1}}]},
>>> {trap_exit,false},
>>> {error_handler,error_handler},
>>> {priority,normal},
>>> {group_leader,<5998.916.0>},
>>> {total_heap_size,122003},
>>> {heap_size,121393},
>>> {stack_size,21},
>>> {reductions,314325681},
>>> {garbage_collection,[{min_bin_vheap_size,46368},
>>> {min_heap_size,233},
>>> {fullsweep_after,65535},
>>> {minor_gcs,23774}]},
>>> {suspending,[]}]
>>> ok
>>>
>>> with the interesting parts:
>>> {current_function,{erlang,sched_wall_time,3}},
>>> {status,waiting},
>>>
>>> I'm unsure what exactly causes the problem, and we're running the VM with
>>> default arguments when it comes to scheduling and layout. It happens even
>>> when the virtual machine is under relatively low load (scheduler active
>>> wall time is less than 5%, but more than 2% of the total wall time when
>>> averaging all cores) and can also happen under higher load.
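For context, the scheduler_wall_time sampling that hangs here normally looks like this (a minimal sketch, not the actual vmstats code; the one-second interval is arbitrary):

```erlang
%% Enable wall-time accounting (off by default), take two snapshots,
%% and derive per-scheduler utilisation from the deltas. Each snapshot
%% is a list of {SchedulerId, ActiveTime, TotalTime} tuples.
erlang:system_flag(scheduler_wall_time, true),
T0 = lists:sort(erlang:statistics(scheduler_wall_time)),
timer:sleep(1000),
T1 = lists:sort(erlang:statistics(scheduler_wall_time)),
Util = [{Id, (A1 - A0) / (W1 - W0)}
        || {{Id, A0, W0}, {Id, A1, W1}} <- lists:zip(T0, T1)].
```

It is the second `erlang:statistics(scheduler_wall_time)` call pattern that blocks in this report, with the calling process stuck in `erlang:sched_wall_time/3`.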
>>
>> Ouch... It seems like one of the schedulers does not understand that it
>> should report data back to the process. Is there any chance of dumping
>> core on a machine where it hangs, or would that mean an interruption of
>> service? I *really* would like to know what the schedulers are doing when
>> they should be reporting back...
>>
>>
>>>
>>> Only that process appears affected.
>>
>> Yes, it's just waiting for a message that never arrives, one that should
>> be sent from the VM when statistics for the scheduler are available...
>>
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>>
>> Cheers,
>> /Patrik
>