[erlang-questions] Why so slow kmeans implementation
Björn Gustavsson
bjorn@REDACTED
Tue Feb 24 10:53:35 CET 2015
On Tue, Feb 24, 2015 at 9:40 AM, Andrea Peruffo
<andrea.peruffo1982@REDACTED> wrote:
> This:
> https://github.com/andreaferretti/kmeans
> is a benchmark of different languages on a kmeans clustering algo.
>
> I made an implementation of it, but it is terribly slow...
>
> My question is about the "dict" data structure I used: is this the proper
> use case for it, or is there anything better?
> I have tried "ets", but it looks even slower...
>
Using sofs was 10 times faster on my computer:
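
%% Group the elements of L by the value Fn returns for each element.
groupBy(L, Fn) ->
    TableId = groupBy(L, Fn, []),
    values(TableId).

%% Drop the keys and keep only the grouped elements.
values([]) -> [];
values([{_, V} | T]) ->
    [V | values(T)].

%% Pair each element with a hash of its key, then let sofs do the grouping.
groupBy([H|T], Fn, Acc) ->
    Pair = {erlang:phash2(Fn(H)),H},
    groupBy(T, Fn, [Pair|Acc]);
groupBy([], _, Acc) ->
    L0 = sofs:relation(Acc),
    L = sofs:relation_to_family(L0),
    sofs:to_external(L).
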
https://github.com/andreaferretti/kmeans/pull/12
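
To make the grouping step easier to follow, here is a minimal sketch of what the three sofs calls do on a toy input. The module name sofs_group_demo and the functions group_by/2 and demo/0 are made up for illustration and are not part of the pull request. This variant uses the key itself rather than its phash2 hash. Note that a sofs relation is a set, so identical {Key, Element} pairs collapse into one.

```erlang
-module(sofs_group_demo).
-export([group_by/2, demo/0]).

%% Group the elements of L by the key Fn returns for each element.
%% Hypothetical helper: it uses the key directly instead of a hash.
group_by(L, Fn) ->
    %% Build {Key, Element} pairs.
    Pairs = [{Fn(X), X} || X <- L],
    %% A relation is a set of pairs (sorted, duplicates removed).
    Rel = sofs:relation(Pairs),
    %% A family maps each key to the set of elements sharing it.
    Fam = sofs:relation_to_family(Rel),
    %% Back to plain Erlang terms: [{Key, [Element]}]; drop the keys.
    [Vs || {_Key, Vs} <- sofs:to_external(Fam)].

%% Groups 1..6 by remainder modulo 3.
demo() ->
    group_by([1, 2, 3, 4, 5, 6], fun(X) -> X rem 3 end).
    %% returns [[3,6],[1,4],[2,5]]
```

After compiling with c(sofs_group_demo), calling sofs_group_demo:demo() in the shell shows the groups ordered by key (0, 1, 2).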
/Bjorn
--
Björn Gustavsson, Erlang/OTP, Ericsson AB