Erlang on the niagara
Joe Armstrong (AL/EAB)
joe.armstrong@REDACTED
Thu Jun 8 16:50:11 CEST 2006
I've been having fun :-)
How can you parallelise an existing (Erlang) program?
As an experiment I parallelised my document generator, this is a program
which batch converts my new wiki-file documentation format into HTML.
As a small test I ran this on a directory of 63 wiki files which are
converted into HTML files, totalling 1.35 MBytes of HTML (ie 21.9
KBytes/file on average) - this is a very typical batch processing
task, with an extremely simple logical structure.
I usually write batch converters something like this:
all() ->
Dir = "../wik",
Files = file:files(Dir, "*.wik"),
lists:foreach(fun(I) -> xform_file(I) end, Files),
...
Parallelising this is easy, all I needed was a simple "parallel
foreach",
actually I just made a pmap (parallel map) and used that instead. pmap
is easy:
pmap(F, L) ->
S = self(),
Pids = map(fun(I) ->
spawn(fun() -> do_f(S, F, I) end)
end, L),
gather(Pids).
gather([H|T]) ->
receive
{H, Ret} -> [Ret|gather(T)]
end;
gather([]) ->
[].
do_f(Parent, F, I) ->
Parent ! {self(), (catch F(I))}.
Then I just replaced the call to lists:foreach with pmap
Now I could run on the niagara.
The results were as follows:
#CPUs Speedup
1 0,953
2 1,855
3 2,679
4 3,44
5 4,012
6 4,624
7 5,093
8 5,46
9 5,73
10 6,11
11 6,108
12 6,47
13 6,58
14 6,8
15 6,67
16 7
17 6,99
18 7,29
19 6,97
20 6,97
21 6,74
22 6,86
23 7,07
24 6,85
What I'm measuring here is the speedup as a function of the number of
schedulers enabled in the SMP erlang. The speedup factor is just
computed
as the pmap/foreach ratio - ie I did the conversion twice, one with
foreach
the second time with pmap.
As you can see we get near linear speedup for 1-7 CPUs - then a gradual
tailing off, with a plateau reached at 16 CPUs, above 16 CPUs we can do
no better.
Realistically a speedup of 7 was achieved.
Why can't we speed up beyond a factor 7 - who knows? - at some stage
things do get serialised, there is after all only one disk on the
machine, and the SMP erlang has to serialise all disk I/O, even if the
programs generating data run in parallel, also, the CPU caches will
get filled and have to swap into shared memory.
Nevertheless, I'm encouraged by this result? Why? because
- this kind of program is a typical of a large class
of programs (ie a lot of my program just do boring things
to large numbers of files)
Usually batch processes on individual files are sequential
often there is not much intrinsic concurrency in a typical
file In -> file Out program.
The concurrency comes from the fact that the top loop of
the program can be changed from a:
forAllFiles Do ... end
structure to a:
forAllFiles DoInParallel ... end
structure.
Now this only requires a ONE LINE change to the program
(change a foreach to a pmap)
- My program went 7 times faster.
At this stage this is very nice - usually when a program is
complete
there is little that can be done to speed it up - since I
usually
chose my data structures and algorithms with some care. Sure I
*could* speed things up with messy optimised code, but the code
would
be less beautiful, and more difficult to maintain.
Changing a foreach to a pmap means the programs goes faster
(x 7) AND stays beautiful - the best of both worlds.
More information about the erlang-questions
mailing list