[erlang-questions] Why so slow kmeans implementation

Tue Feb 24 18:28:02 CET 2015

The main problem is the behaviour of dict:append/3. From documentation:

For example append/3 could be defined as:

append(Key, Val, D) ->
    update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).

dict module is implemented in pure erlang which means Old ++ [Val] is what
exactly happens behind. When Old part is growing it becomes the problem due
its quadratic complexity. The solution is use dict:update/4 in the way:

preppend(Key, Val, D) ->
    dict:update(Key, fun (Old) -> [Val|Old] end, [Val], D).

With this fix, it is around 24 times faster on mine computer.
https://github.com/andreaferretti/kmeans/pull/14

Faster process dictionary version is available at
https://github.com/pichi/kmeans/tree/proc_dict
It is not good practice but can be an inspiration how to write fast code
when there is not other option like NIF, C-port or C-node.

On Tue, Feb 24, 2015 at 9:40 AM, Andrea Peruffo <
andrea.peruffo1982@REDACTED> wrote:

> This:
> https://github.com/andreaferretti/kmeans
> is a benchmark of different languages on a kmeans clustering algo.
>
> I made an implementation of it but is terribly slow...
>
> A question is about the data structure "dict" I used, is that the proper
> use case or there is anything better?
> I have checked with "ets" but looks even slower...
>
> Is there anything wrong with the provided implementation?
>
> Thanks in advance.
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150224/3d423a7c/attachment.htm>