[erlang-bugs] Scheduler Wall Time Statistics live|dead locking a process.

Wed Jul 18 15:44:13 CEST 2012

Hi Fred!

On Wed, 18 Jul 2012, Fred Hebert wrote:

> Hi there,
>
> If you go on erlang-questions, you'll find the following thread I started 
> regarding one of my gen_servers locking up forever until I try to connect to 
> the VM: http://erlang.org/pipermail/erlang-questions/2012-July/068097.html
>
> And the information following it in 
> http://erlang.org/pipermail/erlang-questions/2012-July/068099.html
>
> The gist of it is that apparently, the gen_server gets stuck while calling 
> erlang:statistics(scheduler_wall_time). A process info dump on it returns:
>
> [{registered_name,vmstats_server},
> {current_function,{erlang,sched_wall_time,3}},
> {initial_call,{proc_lib,init_p,5}},
> {status,waiting},
> {message_queue_len,2},
> {messages,[{system,{<5998.7341.243>,#Ref<5998.0.3810.221818>},get_status},
> {system,{<5998.28757.800>,#Ref<5998.0.3811.260443>},get_status}]},
> {links,[<5998.918.0>]},
> {dictionary,[{random_seed,{17770,13214,15044}},
>              {'$ancestors',[vmstats_sup,<5998.917.0>]},
>              {'$initial_call',{vmstats_server,init,1}}]},
> {trap_exit,false},
> {error_handler,error_handler},
> {priority,normal},
> {group_leader,<5998.916.0>},
> {total_heap_size,122003},
> {heap_size,121393},
> {stack_size,21},
> {reductions,314325681},
> {garbage_collection,[{min_bin_vheap_size,46368},
>                      {min_heap_size,233},
>                      {fullsweep_after,65535},
>                      {minor_gcs,23774}]},
> {suspending,[]}]
> ok
>
> with the interesting parts:
> {current_function,{erlang,sched_wall_time,3}},
> {status,waiting},
>
> I'm unsure what exactly causes the problem, and we're running the VM with 
> default arguments when it comes to scheduling and layout. It happens even 
> when the virtual machine is under relatively low load (scheduler active wall 
> time is less than 5%, but more than 2% of the total wall time when averaging 
> all cores) and can also happen under higher load.

Ouch... Seems like one of the schedulers does not understand that it 
should report data back to the process. Is there any chance of dumping 
core of a machine where it hangs, or would that mean interruption of 
service? I *really* would like to know what the schedulers are doing when 
they should be reporting back...

>
> Only that process appears affected.

Yes, it's just waiting for a message that does not arrive, one that should 
be sent from the VM when the scheduler statistics are available...
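
Until the cause is found, one possible stopgap is to make the call from a
throwaway process, so that the gen_server itself can give up after a timeout
instead of waiting forever. The following is only a rough sketch: the module
name swt_probe and the function sample/1 are made up for illustration and are
not part of OTP or vmstats. It also assumes that a helper stuck in that wait
can still be killed with exit(Pid, kill), which has not been verified here.

```erlang
-module(swt_probe).
-export([sample/1]).

%% Hypothetical helper: run erlang:statistics(scheduler_wall_time) in a
%% separate, monitored process and wait at most Timeout milliseconds.
%% Returns {ok, Stats}, {error, timeout} or {error, Reason}.
%% Stats is 'undefined' unless the scheduler_wall_time system flag is on.
sample(Timeout) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() ->
        Parent ! {Ref, erlang:statistics(scheduler_wall_time)}
    end),
    receive
        {Ref, Stats} ->
            %% Got the answer; drop the monitor and any pending 'DOWN'.
            erlang:demonitor(MRef, [flush]),
            {ok, Stats};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The helper died before it could reply.
            {error, Reason}
    after Timeout ->
        %% No reply in time: stop monitoring and kill the helper.
        %% A reply sent just before the kill may still arrive later and
        %% would have to be discarded by the caller.
        erlang:demonitor(MRef, [flush]),
        exit(Pid, kill),
        {error, timeout}
    end.
```

A caller would first enable the flag with
erlang:system_flag(scheduler_wall_time, true) and then use something like
swt_probe:sample(5000). This only hides the symptom; it does not explain why
the reply from the schedulers never arrives.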

Cheers,
/Patrik