[erlang-questions] Application granularity (was: Parallel Shootout & a style question)
Fri Sep 5 08:33:14 CEST 2008
... in response to a flurry of messages about automatically
parallelizing list comprehensions ...
If I go through all the code I have written and count the number of
list comprehensions relative to the total amount of code, there are
not that many occurrences to worry about. The length of each
comprehension's data set is not that great in typical code, and
unless it is a very data parallel algorithm such as matrix
manipulation, there is little to be gained overall. Mats' tossed-off
figure of 10% would overestimate the likely benefit in the code I
typically have to implement.
... also references to tuning the number of processes to the number
of cores ...
Tuning to a specific hardware configuration is folly unless you have
only one implementation site and never plan on modifying the setup.
I really would not recommend this approach to programming, unless you
have a specific problem that can only be solved today by a
carefully tuned solution. I think the majority of cases do not fall
into this category.
In general, the Erlang approach is to isolate sequential code within
a collection of processes. The great effort comes in architecting a
good organization and hierarchy of logic so that failures are
contained and work is spread to maximum effect. What is desired is an
efficient and responsive _application_ rather than an efficient
snippet of code sprinkled here and there.
In terms of performance, I look to scalability -- the same code
should run faster on a newer machine without any tweaks. Tweaks may
improve things more, but they should be unnecessary to get the basic
speedup.
Quite a while back (a couple of decades ago) I remember hearing about
attempts to parallelize code. No one could seem to get a linear
speed up with the number of processors. One day it was announced
that a direct linear speed up had been achieved and it seemed the
number of processors could be increased without loss of linearity.
This alchemy was performed by turning the approach upside down.
Instead of trying to decompose an algorithm into components that were
independent and could efficiently parallelize, the implementors chose
to multiply the problem by a few orders of magnitude. They
replicated the algorithm and scaled up the problem to produce more
work than the processors could achieve. Adding more processors just
made it run faster.
Over the last few years I have been contemplating the state of
applications, operating systems and the benefits that Erlang offers.
The biggest advantage is that processes are lightweight and can be
treated as equivalent to data structures when designing an
architecture. Doing so affords an approach to constructing
applications that is far different from the monolithic structures
that we currently face, where one failure crashes your entire browser
(at least until Google Chrome came out).
Instead of futzing with automating the handling of a single vector, I
submit you should spend your time trying to figure out how to
structure your application so that it can have at least 1000
processes. When you move from 4 cores to 8 to 32 or 64, you should
see linear speedup in your application without modifying anything.
And all the compiler tools that we currently use will work to your
advantage without change.
If your application ends up with a bunch of large vectors and lots of
computation, partition the data to make lots of processes. If it
doesn't have large data or computational requirements, partition the
software components so that they are easier to test and debug and
they can operate on separate processors or cores.
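As a rough illustration of partitioning data into many processes, here
is a minimal sketch. The module name pchunk and the functions chunk/2
and pmap_chunks/3 are made up for this example and are not from any
library. It splits a list into fixed-size chunks, maps over each chunk
in its own process, and reassembles the results in the original order.

```erlang
-module(pchunk).
-export([chunk/2, pmap_chunks/3]).

%% Split List into sublists of at most N elements (N > 0).
%% Hypothetical helper, written for this sketch only.
chunk([], _N) ->
    [];
chunk(List, N) when is_integer(N), N > 0 ->
    case length(List) > N of
        true ->
            {Head, Tail} = lists:split(N, List),
            [Head | chunk(Tail, N)];
        false ->
            [List]
    end.

%% Apply F to every element of List, one process per chunk.
%% The chunk size is independent of the number of cores; the
%% scheduler decides where each process runs.
pmap_chunks(F, List, ChunkSize) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                               Parent ! {Ref, lists:map(F, Chunk)}
                           end),
                Ref
            end || Chunk <- chunk(List, ChunkSize)],
    %% Receive in spawn order so results line up with the input.
    lists:append([receive {Ref, Result} -> Result end || Ref <- Refs]).
```

For example, pchunk:pmap_chunks(fun(X) -> X * X end, lists:seq(1, 100000), 100)
would spawn 1000 short-lived processes. Nothing in the call mentions
the core count, which is the point: the same code can be moved to a
larger machine unchanged.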
With the future of hardware continuing towards many core, the new
measure of the quality of application architecture will be the
granularity of the components. They should be small, task specific,
fault isolating, transparently distributable and interfaced with
minimal messaging. Whenever you are confronted with making an
algorithm more complicated versus keeping it simpler by introducing
more processes, go with more processes. If your first implementation
is fast enough (even if it is 10% slower than it could be), future
upgrades will automatically scale.
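To make "small, task specific, fault isolating" a little more
concrete, here is a minimal sketch under my own assumptions. The
module name isolate and the function run/1 are hypothetical. It runs a
task in its own monitored process, so a crash in the task becomes an
ordinary return value for the caller instead of taking the caller down.

```erlang
-module(isolate).
-export([run/1]).

%% Run Fun in a separate, monitored process.
%% Returns {ok, Result} if Fun returns, {error, Reason} if it dies.
run(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

In a real application a supervisor would normally take this role; the
sketch only shows that the isolation costs a handful of lines, with
the sequential code inside Fun left as simple as before.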
I believe the compiler writers and tool builders should focus on
making it easier to produce more numerous, but smaller processes,
rather than trying to make the sequential code eke out an additional
10% of performance. I want my code to run 1000x faster when I get a
1000 core machine. I likely will need 100,000 processes to realize