<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto">Thanks,<div><br></div><div>I did what you suggested. I get the best results when I pin all the vCPUs to a single NUMA node (NUMA 0), but then I'm wasting the rest of my CPU resources. When I try CPU pinning across two NUMA nodes, performance drops by about 50%. I have used every available option to get the CPU topology right (thread siblings on the same core, and so on), but Erlang still doesn't like the dual-NUMA VM.</div><div><br></div><div>It feels like Erlang understands where it is running and adjusts or restricts itself accordingly. The same Erlang application works fine on bare metal, but not on a VM with the same amount of CPU and memory. To see what the emulator actually detects inside the guest, the checks sketched below should show it.<br><br>
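<div><br></div><div>For reference, a rough sketch (not my exact session) of how the topology the emulator detected and the scheduler bindings can be checked from the Erlang shell, both in the guest and on bare metal:</div><div><br></div><div><pre>
%% What the emulator auto-detected; 'undefined' means it could not read a topology.
1> erlang:system_info({cpu_topology, detected}).

%% The topology the schedulers actually use (detected, or defined with the +sct flag).
2> erlang:system_info(cpu_topology).

%% Whether and how schedulers are bound to logical processors; the default is unbound,
%% and binding can be requested with the +sbt emulator flag (for example +sbt db).
3> erlang:system_info(scheduler_bind_type).
4> erlang:system_info(scheduler_bindings).
</pre></div>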
<div dir="ltr">Sent from my iPhone</div><div dir="ltr"><br><blockquote type="cite">On Feb 7, 2020, at 8:45 AM, Ameretat Reith <ameretat.reith@gmail.com> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Feb 7, 2020, 4:47 PM Satish Patel <<a href="mailto:satish.txt@gmail.com">satish.txt@gmail.com</a>> wrote:</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My OpenStack compute host has 32 cores, so I created a 30-vCPU VM<br>
on it with full NUMA awareness. When I run an Erlang application<br>
benchmark on this VM I get very poor performance, but when I create<br>
a new VM with 16 vCPUs (in that case all of the VM's CPUs are pinned<br>
to NUMA node 0), the benchmark results are great.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I don't know how OpenStack utilizes NUMA, but I had the same experience benchmarking QEMU VMs: the best-performing VMs were the ones with all vCPUs pinned to cores on a single host NUMA node. Next came VMs with half of the vCPUs pinned to a cpuset on one NUMA node and the other half to the second node, with the matching NUMA topology set through QEMU arguments; something along the lines of the sketch below. The worst performance came from VMs with no pinning or affinity settings at all, which means QEMU defined two NUMA nodes (like the host) while the vCPUs kept being moved across cores on different physical NUMA nodes. That seems expectable to me. My tests weren't targeting anything Erlang-based; it was Redis and some other things.</div>
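<div dir="auto"><br></div><div dir="auto">Roughly like this, purely as an illustrative sketch (I did it through raw QEMU arguments; the CPU and node numbers are made up and have to match the real host layout, e.g. as reported by numactl --hardware):</div><div dir="auto"><pre>
# Expose two NUMA nodes to an 8-vCPU guest and back each node's memory
# with the matching host node:
qemu-system-x86_64 \
  -smp 8,sockets=2,cores=4,threads=1 \
  -object memory-backend-ram,id=mem0,size=8G,host-nodes=0,policy=bind \
  -object memory-backend-ram,id=mem1,size=8G,host-nodes=1,policy=bind \
  -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
  -numa node,nodeid=1,cpus=4-7,memdev=mem1 \
  ...

# Then pin each vCPU thread to a core on the host node that backs its guest node,
# e.g. with libvirt (or taskset on the vCPU thread IDs when running QEMU directly):
virsh vcpupin DOMAIN 0 0
virsh vcpupin DOMAIN 4 16
# ...and so on for the remaining vCPUs.
</pre></div>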
<div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
But the interesting thing is that when I run the same Erlang<br>
application on bare metal, performance is really good, so I'm trying<br>
to understand why the same application doesn't perform well on a VM.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I think it's because Erlang is smart about NUMA, but in a virtualized environment its knowledge of the NUMA nodes is not reliable without CPU pinning.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Is there any setting in Erlang to make it fit better with NUMA when<br>
running on a virtual machine?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I don't believe it's something Erlang can offer. I would just try pinning the VM's vCPUs to different CPU sets based on their NUMA node and setting the proper topology on the VM; then I would expect Erlang to utilize the NUMA nodes as smartly as it does on bare metal. I don't run OpenStack myself, but I understand the knobs for that on your side are flavor properties along the lines sketched below.</div>
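<div dir="auto"><br></div><div dir="auto">Something like this, as an illustrative sketch only (the flavor name and sizes are made up, and since I haven't run OpenStack, please check the property names against the Nova documentation):</div><div dir="auto"><pre>
# Dedicated (pinned) vCPUs, vCPUs placed on thread siblings, and a two-node guest topology:
openstack flavor create --vcpus 16 --ram 65536 --disk 40 numa.pinned.16
openstack flavor set numa.pinned.16 \
  --property hw:cpu_policy=dedicated \
  --property hw:cpu_thread_policy=require \
  --property hw:numa_nodes=2
</pre></div>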
</div>
</div></blockquote></div></body></html>