[erlang-questions] Why so slow kmeans implementation

Hynek Vychodil vychodil.hynek@REDACTED
Tue Feb 24 18:28:02 CET 2015

The main problem is the behaviour of dict:append/3. From documentation:

For example append/3 could be defined as:

append(Key, Val, D) ->
    update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).

dict module is implemented in pure erlang which means Old ++ [Val] is what
exactly happens behind. When Old part is growing it becomes the problem due
its quadratic complexity. The solution is use dict:update/4 in the way:

preppend(Key, Val, D) ->
    dict:update(Key, fun (Old) -> [Val|Old] end, [Val], D).

With this fix, it is around 24 times faster on mine computer.

Faster process dictionary version is available at
It is not good practice but can be an inspiration how to write fast code
when there is not other option like NIF, C-port or C-node.

On Tue, Feb 24, 2015 at 9:40 AM, Andrea Peruffo <
andrea.peruffo1982@REDACTED> wrote:

> This:
> https://github.com/andreaferretti/kmeans
> is a benchmark of different languages on a kmeans clustering algo.
> I made an implementation of it but is terribly slow...
> A question is about the data structure "dict" I used, is that the proper
> use case or there is anything better?
> I have checked with "ets" but looks even slower...
> Is there anything wrong with the provided implementation?
> Thanks in advance.
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20150224/3d423a7c/attachment.htm>

More information about the erlang-questions mailing list