[erlang-bugs] Scheduler Wall Time Statistics live|dead locking a process.
Fred Hebert
mononcqc@REDACTED
Wed Jul 18 15:50:08 CEST 2012
Hi there, Patrik!
I can't really afford to dump core on one of the servers right now
because, yes, it would interrupt the service. I could, however, set up a
separate VM on the server that just sits there calling vmstats and doing
some busy test work, pushing its metrics to a fake StatsD server, and try
to reproduce the hang that way. Maybe it will work.
I can start setting that up. How should I dump the core so that it is
useful to you?
Regards,
Fred.
On 12-07-18 9:44 AM, pan@REDACTED wrote:
> Hi Fred!
>
> On Wed, 18 Jul 2012, Fred Hebert wrote:
>
>> Hi there,
>>
>> If you go on erlang-questions, you'll find the following thread I
>> started regarding one of my gen_servers locking up forever until I
>> try to connect to the VM:
>> http://erlang.org/pipermail/erlang-questions/2012-July/068097.html
>>
>> The follow-up message has more information:
>> http://erlang.org/pipermail/erlang-questions/2012-July/068099.html
>>
>> The gist of it is that the gen_server apparently gets stuck while
>> calling erlang:statistics(scheduler_wall_time). Dumping its process
>> info returns:
>>
>> [{registered_name,vmstats_server},
>>  {current_function,{erlang,sched_wall_time,3}},
>>  {initial_call,{proc_lib,init_p,5}},
>>  {status,waiting},
>>  {message_queue_len,2},
>>  {messages,[{system,{<5998.7341.243>,#Ref<5998.0.3810.221818>},get_status},
>>             {system,{<5998.28757.800>,#Ref<5998.0.3811.260443>},get_status}]},
>>  {links,[<5998.918.0>]},
>>  {dictionary,[{random_seed,{17770,13214,15044}},
>>               {'$ancestors',[vmstats_sup,<5998.917.0>]},
>>               {'$initial_call',{vmstats_server,init,1}}]},
>>  {trap_exit,false},
>>  {error_handler,error_handler},
>>  {priority,normal},
>>  {group_leader,<5998.916.0>},
>>  {total_heap_size,122003},
>>  {heap_size,121393},
>>  {stack_size,21},
>>  {reductions,314325681},
>>  {garbage_collection,[{min_bin_vheap_size,46368},
>>                       {min_heap_size,233},
>>                       {fullsweep_after,65535},
>>                       {minor_gcs,23774}]},
>>  {suspending,[]}]
>> ok
>>
>> The interesting parts are:
>>  {current_function,{erlang,sched_wall_time,3}},
>>  {status,waiting},
>>
>> I'm not sure what exactly causes the problem; we're running the VM
>> with the default arguments as far as scheduling and layout go. It
>> happens even when the virtual machine is under relatively low load
>> (averaged across all cores, scheduler active wall time is between 2%
>> and 5% of total wall time), and it can also happen under higher
>> load.
>
> Ouch... It seems like one of the schedulers does not realize that it
> should report data back to the process. Is there any chance of dumping
> core on a machine where it hangs, or would that interrupt the
> service? I would *really* like to know what the schedulers are doing
> when they should be reporting back...
>
>
>>
>> Only that process appears affected.
>
> Yes, it's just waiting for a message that never arrives: one that the
> VM should send once the scheduler statistics are available...
>
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>>
> Cheers,
> /Patrik