[erlang-questions] MPI and Erlang. (fun experiment)

Fri Jun 17 10:33:18 CEST 2011

> On Fri, 2011-06-17 at 05:07 +0200, Ale wrote:
>> Hello all,
>> 
>> Just sharing with whom might ever be interested, a fun (for me, might
>> not be for you) experiment. I tried to port a MPI implementation of
>> Simpson's rule[0], I had made for a course in HPC, to Erlang. I wasn't
>> expecting Erlang to beat C... but, well see the results:

(1) You are not comparing apples with apples:  in the C code the 'sin'
    function is provided to the kernel code via a macro definition,
    which means that on x86 a compiler might (with certain command line
    parameters) inline that and just generate the x87 'sin' instruction;
    in the Erlang code math:sin/1 is passed as a parameter and invoked in
    what I believe to be the slowest available way to invoke a function.

(2) You tell us nothing about how the C code was compiled.
    Between different C compilers and different compilation options,
    a factor of 3 on the same machine is possible.

(3) You tell us nothing about how the Erlang code was compiled.

(4) You have told the C compiler all about the relevant types;
    you have told the Erlang compiler nothing about them.
    This is the kind of code where HiPE can make quite a difference,
    *if* you give it a few clues.

    Take

	compute(Idx, Acc, A, H, Fun) ->
	    X = A + Idx * H,
	    case Idx rem 2 of
	        0 ->
	            NewAcc = Acc + apply(Fun, [X]) * 2;
	        _ ->
	            NewAcc = Acc + apply(Fun, [X]) * 4
	    end,
	    NewAcc.

    as an example.  First off, what _is_ the point of NewAcc?
    Why not write this as

	compute(Idx, Acc, A, H, Fun) ->
	    Acc + Fun(A + Idx*H) * ((Idx band 1)*2 + 2).

    and then give HiPE a hint by writing

	compute(Idx, Acc, A, H, Fun)
	  when is_integer(Idx),
	       is_float(Acc), is_float(A), is_float(H),
	       is_function(Fun, 1)
	    -> Acc + Fun(A + Idx*H) * ((Idx band 1)*2 + 2).

    Of course the most help would come from a -spec, in which
    you can tell HiPE that Fun takes float arguments and delivers
    a float result.
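
    A -spec for compute/5 might look something like the sketch below.
    The module name simpson_spec is made up for illustration, and the
    types simply restate the guards above, plus the assumption that Fun
    maps a float to a float.  How much a given compiler makes of the
    spec is something to measure, not assume.

```erlang
-module(simpson_spec).
-export([compute/5]).

%% Hypothetical module; the spec restates the guards of the clause
%% and adds the argument and result types of Fun.
-spec compute(Idx, Acc, A, H, Fun) -> float() when
      Idx :: integer(),
      Acc :: float(),
      A   :: float(),
      H   :: float(),
      Fun :: fun((float()) -> float()).
compute(Idx, Acc, A, H, Fun)
  when is_integer(Idx),
       is_float(Acc), is_float(A), is_float(H),
       is_function(Fun, 1) ->
    %% The weight is 2 for even Idx and 4 for odd Idx.
    Acc + Fun(A + Idx*H) * ((Idx band 1)*2 + 2).
```

    The guards are kept alongside the spec, since a spec by itself
    checks nothing at run time.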

    Now take the loop that drives compute/5:

    cycle(Start, End, Step, Fun, Args, Acc) ->
        NextStep = Start + Step,
        case NextStep > End of 
            true ->
                Acc;
            false ->
                NewAcc = apply(Fun, [Start, Acc|Args]),
                cycle(NextStep, End, Step, Fun, Args, NewAcc)
        end.

    cycle(Start, End, Args, Acc) ->
        cycle(Start, End, 1, fun compute/5, Args, Acc).

    Here I note that Fun has arity 5 and is called with [Start,Acc|Args],
    so Args must have 3 elements.  We could do

	cycle(Start, End, [A,H,Fun], Acc) ->
	    cycle(Start, End, 1, Acc, fun (S, R) ->
		compute(S, R, A, H, Fun)
	    end).

	cycle(Start, End, Step, Acc, Fun) ->
	    Next_Step = Start + Step,
	    if Next_Step > End -> Acc
             ; true -> cycle(Next_Step, End, Step, Fun(Start, Acc), Fun)
	    end.

    and it's natural to revise this to

	cycle(Start, End, [A,H,Fun], Acc)
	  when is_integer(Start), is_integer(End),
	       is_float(A), is_float(H), is_float(Acc),
	       is_function(Fun, 1)
	    -> cycle(Start, End, 1, Acc, fun (I, R) ->
	           R + Fun(A + I*H)*((I band 1)*2 + 2)
	       end).

	cycle(Start, End, Step, Acc, Update)
	  when is_integer(Start), is_integer(End), is_integer(Step),
	       is_float(Acc),
	       is_function(Update, 2)
	    -> Next_Step = Start + Step,
	       if Next_Step > End -> Acc
		; true -> cycle(Next_Step, End, Step, Update(Start, Acc), Update)
	       end.

    As before, a more up-to-date way to help HiPE is to use -spec declarations,
    which can tell the compiler what the arguments of Update are and what the
    result is.
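
    As a sketch of what those declarations could look like for the two
    cycle functions: the module name simpson_cycle and the type names
    integrand() and update() are invented here, and the float-to-float
    type of the integrand is an assumption about how the code is used.

```erlang
-module(simpson_cycle).
-export([cycle/4, cycle/5]).

%% Hypothetical type names, introduced only for this sketch.
-type integrand() :: fun((float()) -> float()).
-type update()    :: fun((integer(), float()) -> float()).

%% Args is the three-element list [A, H, Fun].
-spec cycle(integer(), integer(), [float() | integrand()], float()) -> float().
cycle(Start, End, [A,H,Fun], Acc)
  when is_integer(Start), is_integer(End),
       is_float(A), is_float(H), is_float(Acc),
       is_function(Fun, 1) ->
    cycle(Start, End, 1, Acc,
          fun (I, R) -> R + Fun(A + I*H)*((I band 1)*2 + 2) end).

%% Update takes the current index and accumulator and returns the
%% new accumulator.
-spec cycle(integer(), integer(), integer(), float(), update()) -> float().
cycle(Start, End, Step, Acc, Update)
  when is_integer(Start), is_integer(End), is_integer(Step),
       is_float(Acc),
       is_function(Update, 2) ->
    Next_Step = Start + Step,
    if Next_Step > End -> Acc
     ; true -> cycle(Next_Step, End, Step, Update(Start, Acc), Update)
    end.
```

    For instance, simpson_cycle:cycle(0, 4, [0.0, 0.25, fun(X) -> X end], 0.0)
    visits indices 0 through 3 and returns 5.0.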

The big issue here is that with adequate type information, HiPE can avoid
boxing and unboxing a lot of floating point numbers, so it's quite likely
that you can get a factor of 2 to 4 speedup fairly easily.
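
Whether that speedup actually shows up is easy enough to check by timing
the same loop compiled both ways.  The following is only a rough sketch:
the module simpson_bench and the function time_sum/1 are made up for this
message, and the loop calls math:sin/1 directly rather than through a
fun, so it is not a copy of the original program.

```erlang
-module(simpson_bench).
-export([time_sum/1]).

%% Hypothetical harness: sums N weighted samples of math:sin/1 over
%% [0, pi) and returns {Microseconds, Sum} as reported by timer:tc/1.
time_sum(N) when is_integer(N), N > 0 ->
    H = math:pi() / N,
    timer:tc(fun() -> loop(0, N, 0.0, H) end).

%% Tail-recursive accumulation with weight 2 for even I, 4 for odd I.
loop(I, N, Acc, _H) when I >= N -> Acc;
loop(I, N, Acc, H) ->
    loop(I + 1, N, Acc + math:sin(I*H) * ((I band 1)*2 + 2), H).
```

On a release that includes HiPE, compiling once with plain erlc and once
with erlc +native, then calling simpson_bench:time_sum(1000000) several
times in each case, should give a first impression of the difference.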