<div dir="ltr">I have tried parallel version of msg and arg<div><br></div><div><div>msg_p(N, Msg) -></div><div> do_p(fun msg/2, N, Msg).</div><div><br></div><div>arg_p(N, Msg) -></div><div> do_p(fun arg/2, N, Msg).</div><div><br></div><div>do_p(F, N, Msg) -></div><div> Schedulers = erlang:system_info(schedulers),</div><div> Parent = self(),</div><div> N2 = N div Schedulers,</div><div> Pids = [spawn_link(fun() -> F(N2, Msg), Parent ! {ok, self()} end)</div><div> || _ <- lists:seq(1, Schedulers) ],</div><div> [ receive {ok, Pid} -> ok end || Pid <- Pids].</div><div><br></div></div><div>and it performs better but still worse than ets but I don't know how it would behave on HW with 40 CPUs/schedulers</div><div><div><br></div><div>[[{ets_h,787.688},</div><div> {ets,2215.42},</div><div> {msg_p,2525.365},</div><div> {msg,4964.156},</div><div> {arg_p,2780.5},</div><div> {arg,4248.214}],</div><div> [{ets_h,901.369},</div><div> {ets,2343.145},</div><div> {msg_p,2368.203},</div><div> {msg,5062.984},</div><div> {arg_p,2073.172},</div><div> {arg,4260.998}],</div><div> [{ets_h,906.705},</div><div> {ets,2423.889},</div><div> {msg_p,3135.662},</div><div> {msg,5069.39},</div><div> {arg_p,2186.49},</div><div> {arg,4268.753}]]</div></div><div><br></div><div>Setting initial heap size in msg helps little bit</div><div><br></div><div><div>msg(N, Msg) -></div><div> Size = 2*erts_debug:flat_size(Msg),</div><div> Pids = [ spawn_opt(fun loop/0, [link, {min_heap_size,Size}]) || _ <- lists:seq(1, N) ],</div><div> [ Pid ! {msg, self(), Msg} || Pid <- Pids],</div><div> [ receive {ok, Pid} -> ok end || Pid <- Pids ].</div><div><br></div></div><div><div>[[{ets_h,823.901},</div><div> {ets,2200.168},</div><div> {msg_p,1974.292},</div><div> {msg,4678.855},</div><div> {arg_p,2082.779},</div><div> {arg,4666.294}],</div><div> [{ets_h,906.677},</div><div> {ets,2033.719},</div><div> {msg_p,2092.892},</div><div> {msg,4665.692},</div><div> {arg_p,2005.953},</div><div> {arg,4707.86}],</div><div> [{ets_h,902.813},</div><div> {ets,2290.883},</div><div> {msg_p,2041.713},</div><div> {msg,4655.373},</div><div> {arg_p,2011.422},</div><div> {arg,4659.18}]]</div></div><div><br></div><div>So I think sending message could be reasonably faster than ets version on HW with 40 CPUs. Anyway storing or sending map this big doesn't seem good design.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 6:33 PM, Hynek Vychodil <span dir="ltr"><<a href="mailto:vychodil.hynek@gmail.com" target="_blank">vychodil.hynek@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I was curious enough to try it:<div><div><br></div><div>-module(ets_vs_msg).</div><div><br></div><div>-export([start/1]).</div><div><br></div><div>-export([ets/2, ets_h/2, msg/2, arg/2]).</div><div><br></div><div>-define(Tab, ?MODULE).</div><div><br></div><div>-define(MapSize, 100000). 
On Wed, Mar 16, 2016 at 6:33 PM, Hynek Vychodil <vychodil.hynek@gmail.com> wrote:

I was curious enough to try it:

-module(ets_vs_msg).

-export([start/1]).

-export([ets/2, ets_h/2, msg/2, arg/2]).

-define(Tab, ?MODULE).

-define(MapSize, 100000). %% 100000 is 2.87 MB

start(N) ->
    Map = gen_map(),
    ets_init(Map),
    [[{X, element(1, timer:tc(fun ?MODULE:X/2, [N, Map]))/N}
      || X <- [ets_h, ets, msg, arg]]
     || _ <- lists:seq(1, 3)].

gen_map() ->
    gen_map(?MapSize).

gen_map(N) ->
    maps:from_list([{X, []} || X <- lists:seq(1, N)]).

ets_init(Map) ->
    (catch ets:new(?Tab, [named_table])),
    ets:insert(?Tab, {foo, Map}).

ets(N, _Msg) ->
    Pids = [spawn_link(fun loop/0) || _ <- lists:seq(1, N)],
    [Pid ! {ets, self()} || Pid <- Pids],
    [receive {ok, Pid} -> ok end || Pid <- Pids].

ets_h(N, Msg) ->
    Size = 2*erts_debug:flat_size(Msg),
    Pids = [spawn_opt(fun loop/0, [link, {min_heap_size, Size}]) || _ <- lists:seq(1, N)],
    [Pid ! {ets, self()} || Pid <- Pids],
    [receive {ok, Pid} -> ok end || Pid <- Pids].

msg(N, Msg) ->
    Pids = [spawn_link(fun loop/0) || _ <- lists:seq(1, N)],
    [Pid ! {msg, self(), Msg} || Pid <- Pids],
    [receive {ok, Pid} -> ok end || Pid <- Pids].

arg(N, Msg) ->
    Pids = [spawn_link(fun() -> init(Msg) end) || _ <- lists:seq(1, N)],
    [Pid ! {do, self()} || Pid <- Pids],
    [receive {ok, Pid} -> ok end || Pid <- Pids].

init(_) ->
    loop().

loop() ->
    receive
        {ets, From} ->
            ets:lookup(?Tab, foo),
            From;
        {msg, From, _Msg} ->
            From;
        {do, From} ->
            From
    end ! {ok, self()}.

Reading from ets with a prepared heap is the clear winner:

40> ets_vs_msg:start(1000).
[[{ets_h,805.83},{ets,2383.31},{msg,4492.15},{arg,3957.693}],
 [{ets_h,918.221},
  {ets,2379.459},
  {msg,4651.258},
  {arg,4028.799}],
 [{ets_h,927.538},
  {ets,2370.421},
  {msg,4519.885},
  {arg,4057.264}]]

But there is a catch. If I look at CPU utilisation, only ets_h and ets use all cores/schedulers (an i7 with 4 hyperthreaded cores in my case), which indicates that both the msg and arg versions copy the map from a single process. In my case, sending the message from more processes would lead to at most a 4x speed-up for the msg and arg versions.
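As a side note on the "2.87 MB" comment in the module above: erts_debug:flat_size/1 returns a size in words, so on a 64-bit VM the figure can be checked roughly like this (the exact number depends on the OTP release's map implementation):

1> Map = maps:from_list([{X, []} || X <- lists:seq(1, 100000)]).
2> erts_debug:flat_size(Map) * erlang:system_info(wordsize) / (1024*1024).
%% should come out close to the 2.87 (MB) quoted above,
%% i.e. roughly 3-4 heap words per key/value pair.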
<div bgcolor="#FFFFFF" text="#000000">
Well, I would expect copy_shallow (from ETS) to be less CPU
intensive<br>
than copy_struct (from process).<br>
<br>
However, as indicated by others, ets:lookup on such a big map will
probably<br>
trigger a garbage collection on the process, which will lead to<br>
yet another copy of the big map.<br>
<br>
The spawn(fun() -> do_something(BigMap) end) on the other hand
will<br>
allocate a big enough heap for the process form the start and only
do<br>
one copy of the big map.<br>
<br>
/Sverker, Erlang/OTP<div><div><br>
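A minimal sketch of how that extra garbage collection could be observed, assuming the ets_vs_msg table from the mail above is populated and still owned by a live process (show_gc/0 is only an illustrative helper):

show_gc() ->
    Self = self(),
    spawn_link(fun() ->
        {garbage_collection, Before} = process_info(self(), garbage_collection),
        [{foo, Map}] = ets:lookup(ets_vs_msg, foo),
        {garbage_collection, After} = process_info(self(), garbage_collection),
        Self ! {gc, proplists:get_value(minor_gcs, Before),
                    proplists:get_value(minor_gcs, After),
                    erts_debug:flat_size(Map)}
    end),
    receive
        {gc, MinorBefore, MinorAfter, Words} ->
            %% The minor_gcs counter typically goes up when a freshly spawned
            %% process (default small heap) has to receive a term this big,
            %% whereas a process spawned with the map in its fun gets a
            %% sufficiently large heap from the start.
            {MinorBefore, MinorAfter, Words}
    end.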
On 03/16/2016 10:43 AM, Alex Howle wrote:

Assuming that when you say "win" you mean that ets:lookup should be more efficient (and less CPU intensive), then I'm seeing the opposite.
<div class="gmail_quote">On 15 Mar 2016 11:32, "Sverker Eriksson"
<<a href="mailto:sverker.eriksson@ericsson.com" target="_blank">sverker.eriksson@ericsson.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Each successful
ets:lookup call is a copy operation of the entire term<br>
from ETS to the process heap.<br>
<br>
If you are comparing ets:lookup of big map<br>
to sending big map in message then I would expect<br>
ets:lookup to win, as copy_shallow (used by ets:lookup)<br>
is optimized to be faster than copy_struct (used by send).<br>
<br>
<br>
/Sverker, Erlang/OTP<br>
<br>
<br>
On 03/15/2016 09:52 AM, Alex Howle wrote:

I've been experiencing an issue and was wondering if anyone else has any experience in this area. I've stripped the problem back to its bare bones for the purposes of this mail.

I have an Erlang 18.1 application that uses ETS to store an Erlang map structure. Using erts_debug:flat_size/1 I can approximate the map's size to be 1 MB. Upon the necessary activity trigger, the application spawns about 25 short-lived processes to perform the main work of the application. This trigger fires roughly 9 times a second under normal operating conditions. Each of these 25 processes performs one ets:lookup/2 call to read the map.

What I've found is that the above implementation has quite an "expensive" CPU profile: each of the CPU cores (40 in total, comprising 2 processors with 10 hyperthreaded cores each) frequently runs at 100%. The machine in question also has 32 GB RAM, of which about 9 GB is used at peak. There is no swap usage whatsoever. Examination shows that copy_shallow is doing the most work.

After changing the implementation so that the 25 spawned processes no longer read the map from the ETS table, and the map is instead passed to the processes on spawn, the CPU usage on the server is considerably lower.

Can anyone offer advice as to why I'm seeing the differing CPU profiles?
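For concreteness, the two variants being compared could be sketched roughly like this; config_tab, conf and do_work/1 are placeholders, not the application's real names:

%% Variant 1: each worker looks the map up itself; the ~1 MB term is copied
%% from ETS onto the worker's heap (and may force that heap to grow).
start_worker_ets() ->
    spawn_link(fun() ->
                   [{conf, Map}] = ets:lookup(config_tab, conf),
                   do_work(Map)
               end).

%% Variant 2: the map is captured in the spawn fun, so the new process is
%% created with a heap already big enough to hold it.
start_worker_arg(Map) ->
    spawn_link(fun() -> do_work(Map) end).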