[erlang-questions] Why so slow kmeans implementation

Tue Feb 24 10:53:35 CET 2015

On Tue, Feb 24, 2015 at 9:40 AM, Andrea Peruffo
<andrea.peruffo1982@REDACTED> wrote:
> This:
> https://github.com/andreaferretti/kmeans
> is a benchmark of different languages on a kmeans clustering algo.
>
> I made an implementation of it, but it is terribly slow...
>
> One question is about the "dict" data structure I used: is it the right fit
> for this use case, or is there something better?
> I also tried "ets", but it looks even slower...
>

Using sofs was 10 times faster on my computer:

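%% Group the elements of L by the value of Fn(Element);
%% returns a list of groups (each group is a list of elements).
groupBy(L, Fn) ->
    Family = groupBy(L, Fn, []),
    values(Family).

%% Drop the keys, keeping only the grouped values.
values([]) -> [];
values([{_, V} | T]) ->
    [V | values(T)].

%% Pair each element with the hash of its key, then let sofs turn
%% the {Key, Element} relation into a family of {Key, [Element]}.
groupBy([H|T], Fn, Acc) ->
    Pair = {erlang:phash2(Fn(H)),H},
    groupBy(T, Fn, [Pair|Acc]);
groupBy([], _, Acc) ->
    L0 = sofs:relation(Acc),
    L = sofs:relation_to_family(L0),
    sofs:to_external(L).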
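
To see what the three sofs calls do, here is a minimal self-contained
sketch on a toy list. The module name sofs_group_demo and its function
names are made up for illustration, and unlike the code above it uses
Fn(X) itself as the key rather than its phash2 hash:

```erlang
-module(sofs_group_demo).
-export([group_by/2, demo/0]).

%% Group the elements of L by Fn(Element) using sofs.
group_by(L, Fn) ->
    %% Build the list of {Key, Element} pairs.
    Pairs = [{Fn(X), X} || X <- L],
    %% relation/1 turns the pairs into a set of tuples;
    %% relation_to_family/1 collects the elements that share a key.
    Family = sofs:relation_to_family(sofs:relation(Pairs)),
    %% to_external/1 gives back plain Erlang terms: [{Key, [Element]}].
    [Vs || {_Key, Vs} <- sofs:to_external(Family)].

%% Split 1..6 into even and odd numbers.
demo() ->
    group_by([1, 2, 3, 4, 5, 6], fun(X) -> X rem 2 end).
```

Calling sofs_group_demo:demo() should return [[2,4,6],[1,3,5]]. Since
sofs works on sets, identical {Key, Element} pairs collapse into one, so
whether this grouping suits kmeans depends on whether the input can
contain duplicate points.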

https://github.com/andreaferretti/kmeans/pull/12

/Bjorn

-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB