[erlang-questions] Strange performance degradation in dict when it's storing lists

Kyungho Yun inanun@REDACTED
Mon Aug 22 22:01:30 CEST 2016


Hi there.

The various size of return values could be confusable.
So I tested original source and some modified source; value() ->
lists:seq(0,255),
though I couldn't get such a big value in OTP 19.

But I guess GC might be the cause.

and the reference of binary may use for messaging in same node.



On Mon, Aug 22, 2016 at 10:53 PM, Sverker Eriksson <
sverker.eriksson@REDACTED> wrote:

> Looks like you measure the latency of send + dict:find + receive
> that sometimes happens to get preemted by very heavy calls
> to lists:delete.
>
> /Sverker
>
>
>
> On 08/22/2016 10:35 AM, Park, Sungjin wrote:
>
> I observed a strange performance degradation in dict.  Let me share the
> code I used in the test first.
>
>
>     -module(data).
>     -export([start_link/1, get/1, get_concurrent/1]).
>     -export([init/0]).
>
>     start_link() ->
>       proc_lib:start_link(?MODULE, init, []).
>
>     init() ->
>       register(?MODULE, self()),
>       % Initialize data:
>       % 0 => [],
>       % 1 => [1],
>       % 2 => [1,2]
>       % ...
>       Dict = lists:foldl(
>         fun (Key, Dict0) -> dict:store(Key, value(Key), Dict0) end,
>         dict:new(), lists:seq(0, 255)
>       ),
>       proc_lib:init_ack({ok, self()}),
>       loop(Dict).
>
>     value(Key) ->
>         lists:seq(1, Key).
>
>     loop(Dict) ->
>       receive
>         {get, Key, From} ->
>           case dict:find(Key, Dict) of
>             {ok, Value} -> From ! Value;
>             error -> From ! undefined
>           end;
>         _ ->
>           ok
>       end,
>       loop(Dict).
>
>     get(Key) ->
>       ?MODULE ! {get, Key, self()},
>       receive
>         Value -> Value
>       end.
>
>     %% Run get N times and return average execution time.
>     -spec get_concurrent(integer()) -> number().
>     get_concurrent(N) ->
>       Profiler = self(),
>       Workers = [
>          prof_lib:spawn_link(
>            fun () ->
>              Key = erlang:system_time() rem 255,
>              Result = timer:tc(?MODULE, get, [Key]),
>              Profiler ! {self(), Result}
>            end
>          ) || _ <- lists:seq(1, N)
>        ],
>        Ts = receive_all(Workers, []),
>        lists:sum(Ts) / length(Ts).
>
>     receive_all([], Ts) ->
>       Ts;
>     receive_all(Workers, Ts) ->
>       receive
>         {Worker, {T, _}} -> receive_all(lists:delete(Worker, Workers), [T |
> Ts])
>       end.
>
>
> When I ran the test in the shell, I got.
>
>     1> data:start_link().
>     {ok, <0.6497.46>}
>     2> timer:tc(data, get, [5]).
>     {23,[1,2,3,4,5]}
>
>
> I could get a value in 23 microseconds and expected something not too
> slower results for concurrent get but,
>
>     3> data:get_concurrent(100000).
>     19442.828
>
>
> The value 19442.828 microseconds seemed to be too big a value so I tested
> with different values such as large binaries and tuples.  And this time the
> same get_concurrent(100000) gave me 200 something microseconds.
>
> I also tried the same with an ets instead of a dict, but there was no such
> performance degradation by the value type.
>
>
>
>
> _______________________________________________
> erlang-questions mailing listerlang-questions@REDACTED://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160823/89862f4b/attachment.htm>


More information about the erlang-questions mailing list