<div class="gmail_quote"><blockquote style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-left-style: solid;" class="gmail_quote"><div style="word-wrap: break-word;">
<div><div class="im"><div> </div></div><div>BEAM is starting to make use of NUMA, for example when allowing you to control the binding of schedulers to cores. See e.g. </div><div><br></div></div></div></blockquote><div> </div>
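<div>For concreteness, here is a minimal sketch of the kind of scheduler binding being referred to, using the real <code>+sbt</code> emulator flag from the OTP <code>erl</code> documentation (the exact output depends on the host's CPU topology, so none is shown):</div>

```shell
# Start BEAM with the default scheduler-binding strategy (+sbt db),
# then print how schedulers ended up bound to logical processors.
erl +sbt db -noshell \
    -eval 'io:format("~p~n", [erlang:system_info(scheduler_bindings)]), halt().'
```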
<div>The real efficiencies come from processes (and their working sets) being bound to particular spots in the memory hierarchy, though.</div><div>I imagine that, because processes have affinity for schedulers, binding schedulers to cores helps somewhat; but when the runtime decides to migrate a process, it probably needs to consider how far down the memory hierarchy the process (and its working set) will have to move. (These days, L1, L2, L3 and NUMA nodes are all distinct points in that choice.)</div>
<div>Similarly, if one run queue is falling behind while another queue on the same physical core (via hyper-threading) is lightly loaded, it makes little sense to shift load between those two hyper-threads, since they contend for the same functional units and L1 cache.</div>
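<div>The bind types in the <code>erl</code> documentation already speak to this: for instance, <code>ts</code> (thread_spread) fills the first hardware thread of each core before doubling up on hyper-thread siblings. A minimal sketch, printing the CPU topology the runtime detected (output is machine-specific, so none is shown):</div>

```shell
# Spread schedulers across physical cores before reusing
# hyper-thread siblings (+sbt ts), and show the detected topology.
erl +sbt ts -noshell \
    -eval 'io:format("~p~n", [erlang:system_info(cpu_topology)]), halt().'
```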
<div>It sounds like you're already considering these things; just making sure it's stated explicitly in this thread!</div><div> </div><blockquote style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-left-style: solid;" class="gmail_quote">
<div style="word-wrap: break-word;"><div> </div><div> </div><div>Yes, but one thing I learned while at Ericsson was that NEBS-compliant ATCA processor boards don't exactly stay on the leading edge of processor capacity.</div>
</div></blockquote><div> </div><div>That is of course a consideration. On the other hand, I imagine projects run over several years, and what's best price/performance in data centers today will likely trickle down to the telecom industry eventually :-)</div>
<div> </div><div> </div><blockquote style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-left-style: solid;" class="gmail_quote"><div style="word-wrap: break-word;">
<div></div><div>The key, in my experience, is not usually to go as fast as possible, but to deliver reliable and predictable performance that is good enough for the problem at hand. As many have pointed out through the years, Erlang was never about delivering maximum performance.</div>
<div class="im"><div> </div><div> </div></div></div></blockquote><div> </div><div>I'm with you there! If I know that adding another node gives me close-to-linear capacity scaling, that's really powerful.</div>
</div><div> </div><div>Sincerely,</div><div> </div><div>jw</div><div><br> </div>