[erlang-questions] Dirty NIF - classifying as CPU or I/O bound

Steve Vinoski vinoski@REDACTED
Sun Oct 14 17:29:12 CEST 2018


On Sun, Oct 14, 2018 at 11:08 AM Roger Lipscombe <roger@REDACTED>
wrote:

> On 14 October 2018 at 15:46, Jesper Louis Andersen <
> jesper.louis.andersen@REDACTED> wrote:
>
>> On Sun, Oct 14, 2018 at 2:42 PM Roger Lipscombe <roger@REDACTED>
>> wrote:
>>
>>> If I *don't know* whether the job is going to be CPU bound or I/O bound
>>> (it executes arbitrary code provided by a third party), am I safest to just
>>> classify the dirty job as CPU-bound? Or is this warning hinting at a
>>> disaster of biblical proportions[1] if I even *think* about fudging the
>>> classification?
>>>
>>>
>> Either classification risks being wrong, so you can't really do any of
>> them safely. The two classifications exist because IO resources and CPU
>> resources tend to be consumed orthogonally: If we have many IO bound jobs, we
>> can still run CPU bound jobs and vice versa. But if you don't know what
>> kind of job you are looking at a priori, you have no way to classify it
>> correctly.
>>
>
> Thanks Jesper, I guess my question is rooted in this statement in the docs:
>
> "If you should classify CPU bound jobs as I/O bound jobs, dirty I/O
> schedulers might starve ordinary schedulers."
>

According to git, Rickard Green wrote this, so I'd take it as advice you
shouldn't ignore.

> This, to me, implies that I should probably classify unknown jobs as CPU
> bound, rather than I/O bound, because the documentation only mentions bad
> things happening one way round.
>

That's probably a good approach. One way to mitigate guessing incorrectly
would be to teach your jobs to cooperatively yield, if possible. If there
are points within the tasks where you can get them to reschedule
themselves, then regardless of where they're running, they'll be giving
other jobs a chance to run.
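
To make the yielding idea a bit more concrete, here is a minimal sketch in
plain C. It is not taken from any real NIF: the job_state struct, job_step,
run_to_completion and the SLICE value are all hypothetical names picked for
illustration. In an actual NIF, the loop in run_to_completion would not exist;
instead, whenever job_step reports remaining work, the NIF would return the
result of enif_schedule_nif() (passing ERL_NIF_DIRTY_JOB_CPU_BOUND or
ERL_NIF_DIRTY_JOB_IO_BOUND as the flags argument) so that it gets called again
later, with the state most likely kept in a resource object. That part is left
out here so the sketch compiles without erl_nif.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resumable job: sums `len` bytes, picking up at `pos`. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;    /* index of the next byte to process */
    uint64_t sum;  /* partial result accumulated so far */
} job_state;

/* Assumed slice size; a real value would have to come from measurement. */
enum { SLICE = 4096 };

/* Process at most SLICE bytes, then return so other work can run.
 * Returns 1 if work remains (caller should reschedule), 0 when done. */
static int job_step(job_state *st)
{
    size_t remaining = st->len - st->pos;
    size_t n = remaining < SLICE ? remaining : SLICE;
    size_t end = st->pos + n;

    for (; st->pos < end; st->pos++)
        st->sum += st->data[st->pos];

    return st->pos < st->len;
}

/* Stand-in for the scheduler: keeps calling job_step until it reports
 * completion. Returns the number of slices that were executed. */
static unsigned run_to_completion(job_state *st)
{
    unsigned slices = 0;
    do {
        slices++;
    } while (job_step(st));
    return slices;
}
```

The point of the shape is that each call does a bounded amount of work and all
progress lives in job_state, so it does not matter much which scheduler type
the job ends up on: a misclassified job still hands control back regularly.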

> Based on my limited knowledge of how dirty schedulers work, my instinct
> tells me that classifying jobs as CPU bound when they're I/O bound will
> probably just be less efficient, whereas classifying jobs as I/O bound when
> they're CPU bound will result in trying to run too many jobs at once. But
> I'm just guessing.
>

It would be good if Rickard or Sverker could weigh in here, as I think they
know this code best.

--steve