[erlang-questions] We need a better marketing division :-)

Wed Jan 12 14:05:55 CET 2011

On Wed, Jan 12, 2011 at 10:49, Ulf Wiger <ulf.wiger@REDACTED> wrote:
> …or Haskell, perhaps. :)
>
> The proposal in question means to design a DSL for large-scale
> "embarrassingly parallel" computing. This is not really a domain where
> Erlang is the obvious choice (although it wouldn't necessarily be a very
> bad one). Haskell, OTOH, has some really great facilities for creating DSLs
> and also some pretty advanced research on data parallelism.

The proposal looks to be significantly less ambitious than Haskell. If
the goal is to support the *heterogeneous* environment of a CPU/GPU
fusion merger, then the only problems that will run fast are those
which are

  * Parallelizable: This includes problems which are embarrassingly
parallel, but also those for which it is possible to hide the latency
of communication by doing something else while waiting for information
to transfer between execution units. Traditionally, this is not that
well supported by GPUs.

  * LinAlgebraizable: In the sense that you need to transform the
problem into linear algebra, essentially speaking. To a certain extent
this is true for all programming, but it is much easier to do with,
e.g., video decoding or raytracing than with symbolic manipulation in a
compiler.

  * Sizable: Every block of data operated on must be of roughly equal
size. Otherwise throughput is wasted.

  * BasicBlockable: GPUs operate like an old army infantry unit. If the
sergeant yells "FORWARD!", then the whole unit does so, even though it
consists of 480 people/cores. Same with "STOP!" or "FIRE!". 480 muskets
firing is of course quite dangerous, but it has ramifications for
computation because it means that GPUs prefer long basic blocks of
code with no jumps.
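
To make the 'Sizable' and 'BasicBlockable' points concrete, here is a toy
cost model in Python. It is only a sketch under simplifying assumptions,
not a model of any real GPU: the function names, the lane counts and the
branch costs are all made up for illustration.

```python
# Toy cost model of lockstep execution (illustrative only).

def lockstep_time(work_per_lane):
    """All lanes march together, so the unit is done when the slowest lane is."""
    return max(work_per_lane)

def utilization(work_per_lane):
    """Fraction of the available lane-steps that do useful work."""
    total_slots = len(work_per_lane) * lockstep_time(work_per_lane)
    return sum(work_per_lane) / total_slots

def divergent_time(branch_of_lane, branch_cost):
    """With a jump, the unit walks through every branch that any lane takes,
    one branch after the other, while the other lanes sit idle."""
    return sum(branch_cost[b] for b in set(branch_of_lane))

even = [10, 10, 10, 10]     # 'Sizable': equal blocks, nothing wasted
uneven = [1, 1, 1, 37]      # one big block stalls the other three lanes
cost = {"then": 5, "else": 7}

print(utilization(even))                                      # 1.0
print(round(utilization(uneven), 2))                          # 0.27
print(divergent_time(["then"] * 4, cost))                     # 5
print(divergent_time(["then", "else", "then", "then"], cost)) # 12
```

In this model a single lane taking the other branch more than doubles the
time for the whole unit, which is why long jump-free blocks are preferred.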

Haskell's approach does not per se expect a heterogeneous environment,
and they are researching Nested Data Parallelism in order
to beat the 'Sizable' problem above. It makes their parallelism
approach far more general. And since Haskell is pure, they have a
very clean slate to work on. In other words: If they can't solve the
problem they face, then there is little hope anybody else will in the
foreseeable future.

And Ulf is right: Traditionally, this is not the domain of Erlang at
all. Erlang excels at harnessing concurrency first and foremost. None
of those problems will be solved with the Scala/Haskell approach,
nor with GPUs. If history is a teacher of worth, the GPU idea will die
again, except for supercomputing. The Commodore Amiga of the '80s had a
"GPU" in the form of a 2D blitter, which made it superior to the PCs of
the time. But it lost because the CPUs in the PCs became so fast they
could compete. We've had numerous vector-computer incarnations which
were fast at vector operations, yet still did not break through. Intel
clearly bets on a large number of CPU cores at the moment, perhaps
with small SSE-like instruction sets tied to them. And multicore ARM
looks like the same traditional design. Considering the size of the
heat sinks on GPUs, we can't claim that they are currently
power-efficient, though that might in principle change.

Yet, Erlang has another problem it is solving at the moment: If we
have a concurrent description of a program, which is essentially what
we have in Erlang, then we can try to map this program to multiple
cores. We probably won't achieve perfect speedup, but that hardly
matters. We still get a much higher throughput for a highly concurrent
program. These programs/problems are not addressed by my points above
- so it seems we have a nice niche there, together with perhaps Google
Go :)

> Perhaps I'm reading too much into it, but as many of you know, I have
> been crusading a bit around the "state-space explosion" problem, arguing
> that it's not messaging per-se, but the event-handling semantics
> (lack of selective event handling) that explodes the state space. I'd say
> that we have plenty of empirical evidence that Erlang is great for keeping
> the state space from growing out of control.

I just think they are viewing the state space incorrectly. If you line
it up as a list:

[S1, S2, ..., Sn]

where each Si is a state of a process then of course, it gets nasty.
The interleaving of instructions goes through the roof here as any
state can be advanced if that process is chosen to run. Correctness
arguments now need to take the whole list into account, meaning much
more work and an exploding state space. But Erlang programs are not
like this in reality, even though you could describe their semantics
that way. What happens in Erlang programs is that the state space is
given by

{{S11, S12, ...}, {...}, ...}

in other words as a tree of states. Why? Because of *isolation* of
internal process state, we can take a group of processes with state,
smack a label on one of the processes and call it an "API". The rest
of the system can then view the whole group as a single state with a
few well-designed invariants. In some cases the group is *stateless*
to the outside, so it acts like a functional program. Essentially, this
ties in with selective receive, and it dampens the
state-space explosion. States in a group get interlocked with each
other and they can only advance as a group on the larger scale.
Internally, there might be interleaving which is problematic, but it
has closure and is limited to the group.

Take this view, and the state-space-explosion problem can be tackled.
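
A back-of-the-envelope count shows the difference between the flat list
and the tree view. This is only a sketch with assumed numbers: every
process has k local states, processes are grouped evenly, and each group
shows a small number of abstract "API" states to the outside. The
function names and parameters are hypothetical.

```python
def flat_states(n_procs, k):
    """Flat list [S1, ..., Sn]: any combination of local states can occur."""
    return k ** n_procs

def grouped_states(n_procs, k, group_size, api_states):
    """Tree view: check each group on its own, then treat each group
    as one process with only api_states visible states."""
    groups = n_procs // group_size
    inside = groups * k ** group_size   # interleaving stays closed within a group
    outside = api_states ** groups      # what the rest of the system sees
    return inside + outside

print(flat_states(12, 4))           # 16777216
print(grouped_states(12, 4, 3, 2))  # 272
```

With 12 processes of 4 states each, the flat view has about 16.8 million
combinations, while four groups of three with a two-state API leave 272
states to reason about in this simplified count.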

> The interesting thing is that Scala *does* support erlang-style concurrency,
> to a greater extent than most languages. Why, then, would they say this?

It is a grant proposal, so it is important that it breeds fear of
impending computational doom, so that the grant is given :)

> The intersection between language, libraries, prominent applications and
> community somehow define the core set of concepts. For Erlang, lightweight
> concurrency, fault-tolerance and powerful message passing are all right
> smack in the core, and thus permeate everything we do, for better and for
> worse.

Indeed. The approach to heterogeneous computation is right now at the
point where you define a small program called a *kernel* which is then
automatically run on each core in the infantry unit. Each soldier gets
its own target (as in: its own piece of the vector-array to process).
The CPU-sarge then yells "FIRE!" and counts the dead amongst the
enemy. The language of the kernel can be chosen freely although for
familiarity a C- or FORTRAN-like syntax and semantics tend to be
chosen. The rest is simply plumbing to get it to run correctly and
marshal data around in the system. You could add this to Erlang if you
wanted to, but we already have NIFs that give you a practical solution
here and now, and for the core value of Erlang it wouldn't do that much.
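
The kernel model described above can be sketched in a few lines of
Python. This is an illustration of the shape of the model, not of any
real GPU API: `kernel`, `launch` and the hit rule are hypothetical, and
the "device" here is just a sequential loop standing in for the plumbing.

```python
def kernel(i, targets):
    """The small program every core runs; i is this core's own index."""
    return 1 if targets[i] < 0.5 else 0   # made-up rule for a "hit"

def launch(kernel_fn, targets):
    """Stand-in for the plumbing: run the kernel once per element.
    A real runtime would run these in lockstep on the device."""
    return [kernel_fn(i, targets) for i in range(len(targets))]

targets = [0.1, 0.9, 0.3, 0.7, 0.2]   # one target per soldier
hits = launch(kernel, targets)        # "FIRE!"
print(sum(hits))                      # the CPU counts the result: 3
```

Everything outside `kernel` is marshalling, which is the part a NIF would
wrap if you wanted this from Erlang today.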

I'd much rather gamble on a world with multicore CPUs for Erlang.
These already exist and they are needed very much for servers.
Currently, running a web server off your GPU is a distant prospect,
not even on the horizon. So even while your desktop computer might
go the CPU/GPU heterogeneous way, I don't think the servers will - for
anything but supercomputing needs.

-- 
J.