[erlang-bugs] Scheduler thread spins in futex_wait and sched_yield

Jebu Ittiachen jebu.ittiachen@REDACTED
Fri Jun 15 12:24:18 CEST 2012


Ah well, answering my own question: this looks like the scheduler
compaction feature kicking in and compacting the run queues. I should be
able to disable it with +scl false.
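
For anyone else hitting this, the flag goes on the erl command line (or
via ERL_FLAGS / a release's vm.args); a minimal sketch:

  erl +scl false

As I understand it, +scl controls scheduler compaction of load: with it
set to false, work stays spread across all run queues instead of being
compacted onto as few schedulers as possible.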

--
Jebu

On Fri, Jun 15, 2012 at 1:40 PM, Jebu Ittiachen <jebu.ittiachen@REDACTED> wrote:

> Well, some more findings on this.
> My workload is typically heavy, and I have seen run queue sizes around 600.
>
> The schedulers always drop off in descending order: first scheduler 4,
> then scheduler 3. I have not seen it drop further than that. Apparently
> the schedulers themselves are OK, but the run queue associated with each
> dropped scheduler is empty, at least as far as I can tell by monitoring
> statistics(run_queues). The good news is that I can get a scheduler
> ticking again without a restart: a spawn bound to the affected scheduler
> id gets things back to normal.
> spawn_opt(fun() -> ok end, [{scheduler, 4}]).
> Spawning on scheduler 4 brings back both 3 and 4.
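>
> The monitoring was nothing fancier than a loop around that statistics
> call; a rough sketch, in a throwaway helper module (watch_queues is just
> an ad-hoc name):
>
>   %% print the per-scheduler run queue snapshot once a second
>   watch_queues() ->
>       io:format("~w~n", [statistics(run_queues)]),
>       timer:sleep(1000),
>       watch_queues().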
>
> Extending this, I have been able to keep schedulers from dropping off
> simply by keeping one long-running active process tied to the last
> scheduler id, as sketched below.
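>
> A minimal sketch of that workaround (keep_alive/1 and busy_loop/0 are my
> own throwaway names, and the yield loop is just a stand-in for a
> genuinely busy process):
>
>   keep_alive(SchedId) ->
>       %% pin a permanently runnable process to the given scheduler
>       spawn_opt(fun busy_loop/0, [{scheduler, SchedId}]).
>
>   busy_loop() ->
>       %% yield keeps the process on the run queue without doing real
>       %% work; note this will happily eat a core
>       erlang:yield(),
>       busy_loop().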
>
> Now maybe this is more of a feature than a bug, but if so I'm not sure
> it's helping: once a run queue goes dry I have not seen it come back
> online automatically, and that lags the system, since processes are being
> cleared by 2 threads instead of 4. I also see this behaviour in R14B03.
>
> --
> Jebu
>
>
> On Tue, Jun 12, 2012 at 10:51 AM, Jebu Ittiachen
> <jebu.ittiachen@REDACTED> wrote:
>
>> Hi,
>>   I seem to have hit upon a weird bug in the Erlang scheduler. I'm
>> running R15B01 on 64-bit Linux, with Erlang compiled with HiPE disabled.
>> Erlang starts up with 4 scheduler threads and everything is OK for a
>> while. After a period of time the CPU usage on the machine drops and
>> things start going slow. top -H shows 2 of the 4 threads running at
>> around 15% and the other 2 at 95%; normally all 4 threads show roughly
>> the same CPU utilization. strace on the process shows the two sluggish
>> threads alternating between calls to futex_wait and sched_yield while
>> the other two are doing a lot of other work.
>>
>>   Here is a sample of strace -f -p <pid> | grep <thread id>:
>>
>> 20292 sched_yield( <unfinished ...>
>> 20292 <... sched_yield resumed> )       = 0
>> 20292 sched_yield( <unfinished ...>
>> 20292 <... sched_yield resumed> )       = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 futex(0x1bf2220, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
>> 20292 <... futex resumed> )             = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>> 20292 sched_yield()                     = 0
>>
>>   My only way out of this now is to restart the node, after which it
>> again runs happily for a while before scheduler threads start dropping
>> off. I'd be happy to provide any more dumps/info that may be needed to
>> get to the bottom of this.
>>
>> Thanks
>> --
>> Jebu Ittiachen
>> jebu.ittiachen@REDACTED
>>
>
>

