Hi Scott (and Joe)!

Thank you for these tests!

I would say Joe's comment at the end of the test10 gist says it all,
and is spot on:

"This isn't just a NIF problem. Any code that sits in C land and
doesn't accurately contribute towards scheduler reductions can cause
this. So, BIFs that don't estimate work and perform BIF_TRAPs are
also bad. Turns out that the commonly used term_to_binary and
external_size BIFs have this problem."

Joe points out a couple of misbehaving BIFs and NIFs that will cause
this by breaking the scheduling algorithm, and I bet there are more
of them. I can see several problems that need to be fixed:

1) OTP should of course not contain code (BIFs, NIFs, or whatever)
that does not bump reductions or trap properly.
2) If writing NIFs, you should have a way to monitor scheduler
behavior, to easily find long schedules. DTrace is nice, but not
available everywhere...
3) If writing NIFs, you should have a simple way to put the
execution of your code on a separate worker thread.

The answer to (1) is that we continue (or intensify) our work on
adding proper reductions and trapping to BIFs (and NIFs). A first
step would be to just add proper reductions to all relevant BIFs,
which is fairly easy to do. Whenever there is a BIF whose work
depends on the size of its input, it should at least add a cost to
the calling process proportional to that size. Some old BIFs do not
even do that, which really needs to be fixed. Contributions are
always welcome... term_to_binary and external_size are already being
worked on, but there are most probably more problem BIFs out there...

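To illustrate the accounting principle from the Erlang side (the real
fixes belong in the C implementations inside erts, where the cost can
be charged and the work split up accurately), here is a minimal
sketch, applied to the md5 NIF from test10. The wrapper name and the
divisor are made up for illustration; erlang:bump_reductions/1 and
crypto:md5_update/2 are real:

    %% Charge the caller a reduction cost proportional to the input
    %% size before calling a NIF that does not bump reductions itself.
    %% The divisor 64 is an uncalibrated guess, not a measured cost.
    md5_update_counted(Ctx, Bin) ->
        erlang:bump_reductions(max(1, byte_size(Bin) div 64)),
        crypto:md5_update(Ctx, Bin).
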
One step towards (2) is the ability to monitor long schedules in the
system. I've extended erlang:system_monitor/2 with an option to
monitor all schedules and port operations that run for more than a
specified amount of wall clock time. That should at least help in
identifying such problems (the code is not in maint yet, but will be
soon). More monitoring options to inspect scheduler behavior may be
needed, but this is at least a start. As an example, monitoring long
schedules in test10 will inform you that the processes run
uninterrupted for a whopping 1.5 *seconds*. Just adding a reduction
cost to the md5 calls will of course cut that scheduling time to a
tenth.

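Something like the following sketch should work once it lands (the
option name long_schedule and the {monitor, PidOrPort, long_schedule,
Info} message shape follow the usual system_monitor conventions, but
may still change before the code reaches maint):

    %% Report every process or port that is scheduled in for more
    %% than LimitMs milliseconds of wall clock time.
    monitor_long_schedules(LimitMs) ->
        erlang:system_monitor(self(), [{long_schedule, LimitMs}]),
        report_loop().

    report_loop() ->
        receive
            {monitor, PidOrPort, long_schedule, Info} ->
                io:format("long schedule by ~p: ~p~n",
                          [PidOrPort, Info]),
                report_loop()
        end.

Running this with a limit of, say, 100 ms while test10:go() is
executing should then flag the 1.5-second schedules mentioned above.
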
The answer to (3) is "dirty schedulers", which are on the roadmap
for R17.

I think all three things need to be done for the scheduling to work
properly, but not only for that. A schedule that takes too long also
breaks the real-time properties of the VM, so fixing this by poking
the schedulers awake at certain intervals handles one symptom, but
does not remove the cause and does not cure the impact on real-time
behavior...

So, it's not the scheduling algorithm as such that causes this
problem; it's still a problem with uninterrupted C code. These
examples show that some (or many) of our BIFs need to be fixed, that
we need to intensify the work on monitoring options, and that we
need dirty schedulers. At least that's how I see it.

Cheers,
Patrik

On 05/01/2013 12:13 AM, Scott Lystig Fritchie wrote:

> Patrik, there are a couple of synthetic load cases whose end result
> is what we occasionally see Riak and Riak CS doing in the wild.
> Many, many thanks to Joseph Blomstedt for inventing these two
> modules.
>
> test10.erl:
> https://gist.github.com/jtuple/0d9ca553b7e58adcb6f4
>
> test11.erl:
> https://gist.github.com/jtuple/8f12ce9c21471f5d6f01
>
> Both can be used by running the 'go/0' function.
>
> The test10:go() function creates an oscillation between a couple of
> workloads: one that tends toward scheduler collapse, and one that
> tends to wake the schedulers up again.
>
> The test11:go() function uses only a single load that tends toward
> scheduler collapse.
>
> Both of them fail fairly regularly on my 8-core MBP using R15B01,
> R15B03, and R16B.
>
> The io:format() messages are sent while the load is not running,
> with very generous pauses before starting the next phase of the
> workload. If you call io:format() during an unfairly-scheduled
> workload (which these tests excel at creating), the messages can be
> delayed by dozens of seconds.
>
> Note that these synthetic tests use two different functions to cause
> scheduler collapse: test10.erl uses crypto:md5_update/2, a NIF, and
> test11.erl uses erlang:external_size/1, a BIF. It's quite likely
> that erlang:term_to_binary/1 is similarly effective/buggy.
>
> Neither of them fails when using this patch on any of those three VM
> versions:
>
> https://github.com/slfritchie/otp/compare/erlang:maint...disable-scheduler-sleeps
>
> or
>
> https://github.com/slfritchie/otp/tree/disable-scheduler-sleeps
>
> ... when also using "+scl false +zdnfgtse 500:500".
>
> -Scott