[erlang-questions] ETS and CPU

Wed Mar 16 18:33:50 CET 2016

I was curious enough to try it:

-module(ets_vs_msg).

-export([start/1]).

-export([ets/2, ets_h/2, msg/2, arg/2]).

-define(Tab, ?MODULE).

-define(MapSize, 100000). %% 100000 is 2.87 MB

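%% Runs each variant three times with N processes; every result is the
%% wall-clock time from timer:tc/2 in microseconds, divided by N.
start(N) ->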
    Map = gen_map(),
    ets_init(Map),
    [[{X, element(1, timer:tc(fun ?MODULE:X/2, [N, Map]))/N}
      || X <- [ets_h, ets, msg, arg]]
     || _ <- lists:seq(1, 3)].

gen_map() ->
    gen_map(?MapSize).

gen_map(N) ->
    maps:from_list([{X, []} || X <- lists:seq(1, N)]).

ets_init(Map) ->
    (catch ets:new(?Tab, [named_table])),
    ets:insert(?Tab, {foo, Map}).

ets(N, _Msg) ->
    Pids = [ spawn_link(fun loop/0) || _ <- lists:seq(1, N) ],
    [ Pid ! {ets, self()} || Pid <- Pids],
    [ receive {ok, Pid} -> ok end || Pid <- Pids ].

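ets_h(N, Msg) ->
    %% Same as ets/2, but each process starts with a heap (in words) large
    %% enough to hold the looked-up map without growing.
    Size = 2*erts_debug:flat_size(Msg),
    Pids = [ spawn_opt(fun loop/0, [link, {min_heap_size,Size}])
             || _ <- lists:seq(1, N) ],
    [ Pid ! {ets, self()} || Pid <- Pids],
    [ receive {ok, Pid} -> ok end || Pid <- Pids ].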

msg(N, Msg) ->
    Pids = [ spawn_link(fun loop/0) || _ <- lists:seq(1, N) ],
    [ Pid ! {msg, self(), Msg} || Pid <- Pids],
    [ receive {ok, Pid} -> ok end || Pid <- Pids ].

arg(N, Msg) ->
    Pids = [ spawn_link(fun() -> init(Msg) end) || _ <- lists:seq(1, N) ],
    [ Pid ! {do, self()} || Pid <- Pids],
    [ receive {ok, Pid} -> ok end || Pid <- Pids ].

init(_) ->
    loop().

loop() ->
    receive
        {ets, From} ->
            ets:lookup(?Tab, foo),
            From;
        {msg, From, _Msg} ->
            From;
        {do, From} ->
            From
    end ! {ok, self()}.

Reading from ETS into a process with a preallocated heap (ets_h) is the
clear winner (times are microseconds per process):

40> ets_vs_msg:start(1000).
[[{ets_h,805.83},{ets,2383.31},{msg,4492.15},{arg,3957.693}],
 [{ets_h,918.221},
  {ets,2379.459},
  {msg,4651.258},
  {arg,4028.799}],
 [{ets_h,927.538},
  {ets,2370.421},
  {msg,4519.885},
  {arg,4057.264}]]
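
To see what the preallocated heap changes, here is a rough sketch (not part
of the benchmark above) that reports a process's total heap size, in words,
before and after a single ets:lookup/2. The module name heap_probe, the
function probe/2 and the table name are made up for illustration; it only
uses process_info/2 with total_heap_size, so it shows heap growth but does
not count garbage collections directly.

```erlang
-module(heap_probe).

-export([probe/2]).

%% probe(Map, Opts) -> {HeapBefore, HeapAfter}, both in words.
%% Opts are extra spawn_opt/2 options, e.g. [] or [{min_heap_size, Size}].
%% Hypothetical helper, for illustration only.
probe(Map, Opts) ->
    Tab = ets:new(heap_probe_tab, []),
    true = ets:insert(Tab, {foo, Map}),
    Parent = self(),
    Pid = spawn_opt(
            fun() ->
                    {total_heap_size, B} =
                        process_info(self(), total_heap_size),
                    %% The lookup copies the whole map onto this heap.
                    [{foo, M}] = ets:lookup(Tab, foo),
                    {total_heap_size, A} =
                        process_info(self(), total_heap_size),
                    %% Using M here keeps the copied map alive until
                    %% after the second measurement.
                    Parent ! {self(), B, A, map_size(M)}
            end, [link | Opts]),
    receive
        {Pid, Before, After, _MapSize} ->
            ets:delete(Tab),
            {Before, After}
    end.
```

Calling heap_probe:probe(Map, []) and then
heap_probe:probe(Map, [{min_heap_size, 2*erts_debug:flat_size(Map)}]) should
show a small starting heap in the first case and an already large one in the
second, which is the difference between the ets and ets_h variants.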

But there is a catch. If I look at CPU utilisation, only ets_h and ets use
all cores/schedulers (i7 with 4 HT in my case), which indicates that the msg
and arg versions both copy the map from a single process. In my case,
sending the message from more processes would give at most a 4x speed-up
for the msg and arg versions.
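
For what it's worth, sending from more than one process could look roughly
like the sketch below. I have not measured it. The module name msg_par, the
function run/3 and the helper deal/2 are hypothetical names, not part of the
benchmark above. Note that handing the map to each sender through the fun's
closure is still one copy per sender made by the calling process.

```erlang
-module(msg_par).

-export([run/3]).

%% run(N, S, Msg): start N workers and S sender processes; each sender
%% sends Msg to its share of the workers, so the per-worker copies are
%% made by S processes instead of one. Hypothetical sketch, unmeasured.
run(N, S, Msg) when N > 0, S > 0 ->
    Parent = self(),
    Workers = [spawn_link(fun worker/0) || _ <- lists:seq(1, N)],
    %% Spawning a sender copies Msg into it once (via the closure).
    [spawn_link(fun() ->
                        [W ! {msg, Parent, Msg} || W <- Share],
                        ok
                end)
     || Share <- deal(Workers, S)],
    %% Workers reply to the caller, not to their sender.
    [receive {ok, W} -> ok end || W <- Workers],
    ok.

worker() ->
    receive
        {msg, From, _Msg} ->
            From ! {ok, self()}
    end.

%% Deal Items round-robin into S lists (some may be empty if S > N).
deal(Items, S) ->
    Indexed = lists:zip(lists:seq(0, length(Items) - 1), Items),
    [[Item || {K, Item} <- Indexed, K rem S =:= Bucket]
     || Bucket <- lists:seq(0, S - 1)].
```

It can be timed the same way as the other variants, for example with
timer:tc(fun msg_par:run/3, [1000, 4, Map]).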

On Wed, Mar 16, 2016 at 5:20 PM, Sverker Eriksson <
sverker.eriksson@REDACTED> wrote:

> Well, I would expect copy_shallow (from ETS) to be less CPU intensive
> than copy_struct (from process).
>
> However, as indicated by others, ets:lookup on such a big map will probably
> trigger a garbage collection on the process, which will lead to
> yet another copy of the big map.
>
> The spawn(fun() -> do_something(BigMap) end) on the other hand will
> allocate a big enough heap for the process from the start and only do
> one copy of the big map.
>
> /Sverker, Erlang/OTP
>
>
>
> On 03/16/2016 10:43 AM, Alex Howle wrote:
>
> Assuming that when you say "win" you mean that ets:lookup should be more
> efficient (and less CPU intensive) then I'm seeing the opposite.
> On 15 Mar 2016 11:32, "Sverker Eriksson" <sverker.eriksson@REDACTED>
> wrote:
>
>> Each successful ets:lookup call is a copy operation of the entire term
>> from ETS to the process heap.
>>
>> If you are comparing ets:lookup of big map
>> to sending big map in message then I would expect
>> ets:lookup to win, as copy_shallow (used by ets:lookup)
>> is optimized to be faster than copy_struct (used by send).
>>
>>
>> /Sverker, Erlang/OTP
>>
>>
>> On 03/15/2016 09:52 AM, Alex Howle wrote:
>>
>> I've been experiencing an issue and was wondering if anyone else has any
>> experience in this area. I've stripped back the problem to its bare bones
>> for the purposes of this mail.
>>
>>
>>
>> I have an Erlang 18.1 application that uses ETS to store an Erlang map
>> structure. Using erts_debug:flat_size/1 I can approximate the map's size to
>> be 1MB. Upon the necessary activity trigger the application spawns about 25
>> short-lived processes to perform the main work of the application. This
>> activity trigger is fired roughly 9 times a second under normal operating
>> conditions. Each of these 25 processes performs one ets:lookup/2 call to
>> read from the map.
>>
>>
>>
>> What I've found is that the above implementation has a CPU profile that
>> is quite "expensive" - each of the CPU cores (40 total comprised of 2
>> Processors with 10 hyperthreaded cores) frequently runs at 100%. The
>> machine in question also has 32GB RAM of which about 9GB is used at peak.
>> There is no swap usage whatsoever. Examination shows that copy_shallow is
>> performing the most work.
>>
>>
>>
>> After changing the implementation so that the 25 spawned processes no
>> longer read from the ETS table to retrieve the map structure and, instead
>> the map is passed to the processes on spawn, the CPU usage on the server is
>> considerably lower.
>>
>>
>>
>> Can anyone offer advice as to why I'm seeing the differing CPU profiles?
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>