Erlang R13B Multicore Efficiency Questions

Tue Jun 2 18:25:34 CEST 2009

Hello again list members,

I have a few questions about the efficiency of Erlang on multi-core
architectures and SMP-enabled kernels.  I am in the process of
testing Erlang for a high performance computing application in a
virtualized clustering environment, using a fibre channel storage fabric
and VMware vSphere ESXi 4.0 as the hypervisor.  With vSphere ESXi 4.0
you have the option of passing up to 8 physical cores into a single VM,
with unprivileged instructions being executed by the VM directly on the
hardware cores; in VMware bare-metal hypervisor environments the CPU is
not virtualized, which gives near-native performance inside the VM.

With a multiple core architecture (say for example an Intel 5300 series
quad core with two hardware processors on the motherboard) there are two
camps of thought within the world of virtualized clustering:

1) Assign only a single CPU core to each VM and use a single processor
kernel/hardware abstraction layer inside the VM's guest operating
system; let the hypervisor handle scheduling and distribution of CPU
resources across multiple VMs.
2) Assign multiple cores into a single VM, use an SMP-enabled
kernel/hardware abstraction layer inside the VM's guest operating
system, and rely upon the guest OS's SMP capabilities and the
application's ability to maximize concurrency.

Each architecture has its merits.  For example, processor affinity can
be used with Option #1 to "pin" a VM onto a specific core.  For a high
performance cluster I could create 8 VMs on a single host, pin VM1 to
core0, VM2 to core1, VM3 to core2, etc., and therefore get an 8:1 hardware
consolidation ratio with near-native performance.  The downside to this
architecture is that extra memory and CPU resources are expended on the
guest OS for each of the 8 VMs, although VMware vSphere/VI3.5 use
transparent page sharing to "deduplicate" redundant 4K memory pages
provided each VM has similar architecture and guest OS components.

Option #2 has the benefit of a single VM without the overhead of 7
additional VMs, but this requires that the guest OS has robust SMP
support and the application (Erlang) has the ability to take advantage
of multiple core architectures.  An added benefit to this architecture
is that VMware ESX/ESXi use a relaxed co-scheduling method that will
detect the idle loop being executed in secondary cores and will
de-schedule those CPU resources to be used by other VMs, so in theory
when those extra cores that are assigned to a single VM are not being
used the hypervisor will free them up for other VMs.

So the question is this: how robust are the SMP/multicore capabilities
of Erlang R13B, and would a single Erlang node given multiple CPU cores
achieve the same level of efficiency and utilization as, for example,
8 separate non-SMP VMs, each with a single dedicated CPU core pinned
via hypervisor processor affinity?
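
To make "efficiency" concrete for anyone who wants to compare the two
layouts, here is a rough sketch of the arithmetic I have in mind.  It is
plain Python rather than Erlang, and the function names are my own
invention for illustration, not part of any tool.  It assumes you have
wall-clock timings or throughput figures from running the same workload
under both options.

```python
from typing import Sequence


def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup of the multi-core run relative to the single-core run."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by core count (1.0 is ideal)."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return speedup(t_single, t_parallel) / cores


def smp_vs_pinned_ratio(smp_throughput: float,
                        per_vm_throughputs: Sequence[float]) -> float:
    """Throughput of one SMP node (Option #2) relative to the summed
    throughput of the pinned single-core VMs (Option #1).
    A value near 1.0 means the two layouts are roughly equivalent."""
    total = sum(per_vm_throughputs)
    if total <= 0:
        raise ValueError("pinned-VM throughput must be positive")
    return smp_throughput / total
```

On the Erlang side, erlang:system_info(schedulers) reports how many
scheduler threads the node is actually running, which is worth checking
before trusting any multi-core timing.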

Thanks in advance

Greg