<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Sun, Oct 14, 2018 at 11:08 AM Roger Lipscombe <<a href="mailto:roger@differentpla.net">roger@differentpla.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 14 October 2018 at 15:46, Jesper Louis Andersen <span dir="ltr"><<a href="mailto:jesper.louis.andersen@gmail.com" target="_blank">jesper.louis.andersen@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span><div style="font-family:arial,helvetica,sans-serif"><span style="font-family:Arial,Helvetica,sans-serif">On Sun, Oct 14, 2018 at 2:42 PM Roger Lipscombe <<a href="mailto:roger@differentpla.net" target="_blank">roger@differentpla.net</a>> wrote:</span><br></div></span><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>If I *don't know* whether the job is going to be CPU bound or I/O bound (it executes arbitrary code provided by a third party), am I safest to just classify the dirty job as CPU-bound? Or is this warning hinting at a disaster of biblical proportions[1] if I even *think* about fudging the classification?</div><div><br></div></div></div></blockquote><div><br></div></span><div><div style="font-family:arial,helvetica,sans-serif">Either classification risks being wrong, so you can't really do either of them safely. The two classifications exist because IO resources and CPU resources tend to be consumed orthogonally: If we have many IO bound jobs, we can still run CPU bound jobs and vice versa. 
But if you don't know what kind of job you are looking at a priori, you have no way to classify it correctly.</div></div></div></div></blockquote><div><br></div><div>Thanks Jesper, I guess my question is rooted in this statement in the docs:</div><div><br></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">"If you should classify CPU bound jobs as I/O bound jobs, dirty I/O schedulers might starve ordinary schedulers."</span></div></div></div></div></blockquote><div><br></div><div>According to git, Rickard Green wrote this, so I'd take it as advice you shouldn't ignore.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>This, to me, implies that I should probably classify unknown jobs as CPU bound, rather than I/O bound, because the documentation only mentions bad things happening one way round.<br></div></div></div></div></blockquote><div><br></div><div>That's probably a good approach. One way to mitigate guessing incorrectly would be to teach your jobs to cooperatively yield, if possible. If there are points within the tasks where you can get them to reschedule themselves, then regardless of where they're running, they'll be giving other jobs a chance to run.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Based on my limited knowledge of how dirty schedulers work, my instinct tells me that classifying jobs as CPU bound when they're I/O bound will probably just be less efficient, whereas classifying jobs as I/O bound when they're CPU bound will result in trying to run too many jobs at once. 
But I'm just guessing.<br></div></div></div></div></blockquote><div><br></div><div>It would be good if Rickard or Sverker could weigh in here, as I think they know this code best.</div><div><br></div><div>--steve</div></div></div>