[erlang-questions] How to avoid long_schedule issue ?

Alex Feng <>
Thu Dec 29 10:22:49 CET 2016

Hi Daniel,

Your answer is very helpful; I have learnt a lot from it.
Thank you very much and Happy New Year.


2016-12-29 0:19 GMT+01:00 Dániel Szoboszlay <>:

> Hi,
>
> First of all, lists:seq/2 is not a BIF; it is a pure Erlang function. And
> the 20-40 ms "long schedule" events are absolutely normal. You should use a
> much larger threshold (I would recommend at least 500 ms) to filter for the
> real outliers only.
>
> Now let's see what more I can tell you about long schedules!
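A minimal sketch of acting on the 500 ms advice, assuming an OTP release whose erlang:system_monitor/2 supports the long_schedule option. The module name ls_monitor and its functions are hypothetical; the message shape {monitor, PidOrPort, long_schedule, Info} is the one documented for erlang:system_monitor/2.

```erlang
%% ls_monitor: hypothetical module name, a sketch only.
-module(ls_monitor).
-export([start/1, format_event/1]).

%% Spawn a process that becomes the system monitor and asks the VM
%% for long_schedule events above ThresholdMs milliseconds.
%% Note: there is only one system monitor per node, so this replaces
%% any settings installed earlier with erlang:system_monitor/2.
start(ThresholdMs) when is_integer(ThresholdMs), ThresholdMs > 0 ->
    spawn(fun() ->
                  erlang:system_monitor(self(), [{long_schedule, ThresholdMs}]),
                  loop()
          end).

loop() ->
    receive
        {monitor, PidOrPort, long_schedule, Info} ->
            io:format("long_schedule: ~p~n", [format_event({PidOrPort, Info})]),
            loop();
        _Other ->
            %% Drop anything else so the mailbox does not grow.
            loop()
    end.

%% Turn an event into a map. For processes, Info carries {timeout, Ms},
%% {in, Location} and {out, Location}, where Location is an {M, F, A}
%% tuple or undefined; for ports there are no in/out entries, so they
%% come out as undefined here.
format_event({PidOrPort, Info}) ->
    #{who => PidOrPort,
      timeout_ms => proplists:get_value(timeout, Info),
      in => proplists:get_value(in, Info),
      out => proplists:get_value(out, Info)}.
```

Calling ls_monitor:start(500) from the shell is enough to start printing events; keep in mind that the in/out locations only say where the process was scheduled in and out, not where the time was actually spent.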
>
> When you monitor long schedules, the schedulers simply note the wall
> clock time when they schedule a process in and compare it with the wall
> clock time when they schedule it out. If more time has passed than the
> threshold you set, you get a message. Unfortunately, this can be quite
> misleading, as the measured time may include periods you wouldn't expect,
> such as:
>    - Other OS processes running on the CPU. The OS can preempt the
>    scheduler thread and give the CPU to some other process for 10-20 ms or
>    more, and the scheduler will not know about this interruption. I'm pretty
>    sure this is the reason for your long schedules, Alex.
>    This is fine as long as the host does not actually become CPU
>    limited. If many OS processes are fighting for the CPU, you will see
>    horrible long schedules all over the place.
>    - The OS performing some time-consuming task for you, for example a page
>    fault that requires reading from swap on disk. Whoops, your scheduler is
>    suspended for tens of milliseconds without noticing it!
>    - The OS performing some time-consuming interrupt handling while your
>    scheduler has the CPU.
>    This is my personal favourite, because it means some totally
>    unrelated OS code (like crappy NIC drivers) runs in your process' context
>    and can log scary-looking messages that all seem to come from the beam
>    process itself. Not to mention long schedules, of course.
>    - The time it takes to grab some internal locks in the VM. If a lock
>    is held by a long-scheduling process, every other process waiting for the
>    same lock will also long schedule.
>    In our production system, for example, ~50% of long schedules come from
>    a single monitoring process that periodically collects process info from
>    other processes. Of course, if a process long schedules for e.g. 1200 ms,
>    the monitoring process may have to wait up to 1200 ms as well to grab the
>    lock on it that is required for fetching process info.
>
> It is also good to know that not all BIFs are preemptible, and those that
> are calculate their reduction cost in very ad hoc-looking ways. For
> example, it looks like the lists:reverse/2 BIF can process 40 list
> elements per reduction, while lists:keyfind/3 can search 10 list elements
> per reduction. Do you think that reversing 40 list elements and looking
> through 10 list elements would take exactly the same wall clock time?
> Probably not. Nor will either of them take exactly as long as an average
> reduction of your application's own Erlang code. These
> reduction cost estimates work fine in most cases, but can be inaccurate
> when you give huge inputs to these functions. If they happen to underestimate
> the real cost on your system, you may still see long schedules even when all
> your BIFs and NIFs are nice and preemptible.
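To see how reductions relate to wall clock time on a particular system, a call can be timed while reading the calling process's reduction counter before and after. A minimal sketch: reds_probe, measure/1 and estimated_reds/2 are hypothetical names, and the 40 and 10 elements-per-reduction rates are the observations quoted above, not documented constants.

```erlang
%% reds_probe: hypothetical helper module, a sketch only.
-module(reds_probe).
-export([measure/1, estimated_reds/2]).

%% Run Fun in the calling process and return
%% {WallMicroseconds, ReductionsCharged}. Both figures include some
%% overhead from the measurement itself, so treat them as rough.
measure(Fun) when is_function(Fun, 0) ->
    {reductions, R0} = erlang:process_info(self(), reductions),
    {Micros, _Result} = timer:tc(Fun),
    {reductions, R1} = erlang:process_info(self(), reductions),
    {Micros, R1 - R0}.

%% Estimated cost using the per-reduction rates observed in the thread
%% (about 40 elements for lists:reverse/2, 10 for lists:keyfind/3).
%% These are observations, not documented constants, and may differ
%% between OTP releases.
estimated_reds(reverse, N) -> ceil_div(N, 40);
estimated_reds(keyfind, N) -> ceil_div(N, 10).

%% Integer division rounding up.
ceil_div(A, B) -> (A + B - 1) div B.
```

For example, with L = lists:seq(1, 1000000) and T = [{I} || I <- L] bound in the shell, compare reds_probe:measure(fun() -> lists:reverse(L) end) with reds_probe:measure(fun() -> lists:keyfind(missing, 1, T) end). Dividing microseconds by reductions gives a rough time-per-reduction figure for each BIF; if the two differ a lot, the reduction count is a poor proxy for time spent.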
>
> Now if you still have interesting long schedules that you want to debug,
> keep in mind that the schedule-in and schedule-out functions are not
> necessarily where the time is wasted. For example, if you
> have a gen_server that, when handling one particular message, calls an
> unfriendly BIF/NIF which doesn't update the reduction count, you will
> typically see that both the schedule-in and schedule-out points are
> gen_server:loop/6. Nothing in the event will point to the BIF/NIF, so
> good luck finding the offender! You have to consider all execution paths
> that may lead from the schedule-in point to the schedule-out point. The
> offender can be any of the functions used on any of these paths.
>
> Finally, a bit about finding the ideal long schedule threshold. 10 ms is
> typically too low: it basically means you will get a long schedule every
> time the OS schedules out your VM thread. But you need to consider the
> latency requirements of your application: if you're doing high-frequency
> trading or ad bidding or whatever, maybe a 10 ms pause would be too much
> for you. In that case you can use such a low threshold, but be sure to turn
> off swapping and pin your schedulers to cores that are used exclusively by
> the VM, and where you have disabled interrupts, tick handling etc. In
> general, for a system where you need to keep latency under T, it makes
> sense to monitor long schedules with a threshold of roughly 0.5 T to 0.8 T.
>
> Both heart and the net ticktime of the distribution protocol give you such
> latency requirements: heart needs a heartbeat message every 60 seconds, and
> the distribution protocol sends one heartbeat message every 15 seconds. So
> long schedules in the 15,000 ms range start to interfere with the
> distribution protocol, and those above 60,000 ms can kill your node. (These
> limits may sound crazy, but I regularly see ~20,000 ms long schedules in
> our systems. Unfortunately.)
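The rule of thumb above (a threshold of about 0.5 T to 0.8 T for a latency budget T) and the heart and distribution limits can be captured in a small helper. A minimal sketch, assuming the defaults quoted in the thread: a 60 s heart timeout and the default net_ticktime of 60 s, with a tick sent every net_ticktime/4 = 15 s. The module ls_budget and its functions are hypothetical names; adjust the constants if your node overrides these defaults.

```erlang
%% ls_budget: hypothetical module name, a sketch only.
-module(ls_budget).
-export([threshold_range/1, tick_interval_ms/1, classify/1]).

%% Assumed defaults (see lead-in): heart expects a beat within 60 s,
%% and net_ticktime defaults to 60 s.
-define(HEART_TIMEOUT_MS, 60000).
-define(DEFAULT_NET_TICKTIME_S, 60).

%% For a latency budget of T ms, return {Low, High}: roughly 0.5 T and
%% 0.8 T, in whole milliseconds as {long_schedule, Time} expects.
threshold_range(T) when is_integer(T), T > 0 ->
    {T * 5 div 10, T * 8 div 10}.

%% Distribution ticks are sent every NetTickTime/4 seconds.
tick_interval_ms(NetTickTimeS) when is_integer(NetTickTimeS), NetTickTimeS > 0 ->
    NetTickTimeS * 1000 div 4.

%% Rough severity of a reported long_schedule timeout, in ms.
classify(TimeoutMs) when is_integer(TimeoutMs), TimeoutMs >= 0 ->
    TickMs = tick_interval_ms(?DEFAULT_NET_TICKTIME_S),
    if
        TimeoutMs >= ?HEART_TIMEOUT_MS -> heart_risk;
        TimeoutMs >= TickMs -> dist_tick_risk;
        true -> below_tick_interval
    end.
```

For example, threshold_range(100) gives {50, 80}, so a system that must respond within 100 ms might monitor with {long_schedule, 50}. A ~20,000 ms event like those mentioned above falls into dist_tick_risk.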
>
> Hope this helps!
> Daniel
> On Wed, 28 Dec 2016 at 16:23 Max Lapshin <> wrote:
>> I'm also very interested in how to properly interpret these warnings =(
