[erlang-questions] Performance Testing with C++, Java, Ruby and Erlang

Fri Sep 7 15:22:41 CEST 2007

shahzad bhatti wrote:
> I posted that blog entry yesterday and didn't expect that much 
> controversy. I am learning Erlang and just trying any interesting 
> distributed problems I see. This was not meant as Erlang bashing, 
 > in fact I like Erlang a lot.

With comments enabled, you might have received a lot of helpful
pointers on how to improve the code, but this forum will also
work. I've not spent much time trying to understand the outline
of your code, but some comments about programming style might
be in order:

You should strive to eliminate the get() and put() operations,
and have the functions return all the structures that were
modified. For example:

355 set_node_value(Round) ->
356     Config = get(config),
357     ProcIds = lists:seq(0, config:get_num_procs(Config)-1),
358     lists:foreach(
359         fun(Id) ->
360            set_node_value(Round, Id) end, ProcIds).
361
362 set_node_value(Round, Id) ->
363     L = repository:get_rank_list(Round, Id),
364     lists:foreach(
365         fun(Path) ->
366             set_node_value(Round, Id, Path) end, L).
367
368 set_node_value(_Round, _Id, Path) ->
369     Nodes = get(nodes),
370     Node = get_node(Path),
371     Value = find_majority(Path),
372     Nodes1 = dict:store(Path, node:set_output(Node, Value), Nodes),
373     put(nodes, Nodes1).

The set_node_value/1 function is called from within a lists:foreach(),
when you should rather use e.g. lists:foldl() with Nodes as an
accumulator. That would eliminate all the get() and put() operations,
as the updated Nodes variable is passed on to the next iteration.

set_node_value(Round, #config{proc_ids = Proc_ids, nodes = Nodes}) ->
    lists:foldl(
        fun(Id, Ns) ->
            set_node_value(Round, Id, Nodes)
        end, Nodes, ProcIds).

set_node_value(Round, Id, Nodes) ->
     L = repository:get_rank_list(Round, Id),
     lists:foldl(
         fun(Path, Ns) ->
             Node = get_node(Path),
             Value = find_majority(Path),
             dict:store(Path, node:set_output(Node, Value), Ns).
         end, Nodes, L).

These are just minor changes, to illustrate the point:

- No need to build Proc_ids in every iteration. Do it once,
   since it's static, and store it in your record. Also, keep
   nodes in the record, rather than in the process dictionary.
   Besides, each Erlang process has a unique Id - no need to
   invent a corresponding value, when the Pid will do nicely.
- No need to have a separate module that selects attributes
   from a record. It's much faster to do it inline, using
   record syntax.

You can apply this to many places in the code.
And if you then look at find_majority/1:

393 find_majority(Path) ->
394     Counts = dict:new(),
395     Counts1 = dict:store(?ONE, 0, Counts),
396     Counts2 = dict:store(?ZERO, 0, Counts1),
397     Counts3 = dict:store(?UNKNOWN, 0, Counts2),
398     put(counts, Counts3),
399     L = repository:get_children_path(Path),
400     N = length(L),
401     lists:foreach(fun(Child) ->
402         increment_count(Child) end, L),
403     Counts4 = get(counts),
404     OneCount = dict:fetch(?ONE, Counts4),
405     ZeroCount = dict:fetch(?ZERO, Counts4),

Here, you rely on the process dictionary to save
the side-effect of increment_count/1, so that you
can retrieve Counts4 right afterwards. You also use dict
to keep track of the only three counters that
increment_count/1 can update. It would be _much_ more
efficient to simply fold over the list with the counters
as an accumulator:

lists:foldl(fun(Child, {C1,C2,C3}) ->
               Node = get_node(Child),
               Node of ?ONE     -> {C1+1,C2,C3};
                       ?ZERO    -> {C1,C2+1,C3};
                       ?UNKNOWN -> {C1,C2,C3+1}
               end, {0,0,0}, L)

(There are several other ways to write this, achieving
the same effect.)

Finally, in repository.erl:

519 table_lookup(Tab, Key) ->
520     Result = ets:lookup(Tab, Key),
521     case Result of
522         []  ->
523             [];
524         error ->
525             [];
526         {ok, Value} ->
527             lists:reverse(Value);
528         [{Key, Value}] ->
529             lists:reverse(Value)
530     end.

ets:lookup/2 _only_ returns a (possibly empty) list
of objects. error and {ok,Value} are not possible
return values.

Also, it seems odd that you reverse Value at every
lookup, but prepend to the (reversed) list at insert.
The insert sequence [1,2,3,4,5] would then give the
value list [5,3,1,2,4]. Is this really what you want?
If so, you should write a comment explaining the
virtues of the arrangement.

BR,
Ulf W