<div dir="ltr">I have tried parallel versions of msg and arg:<div><br></div><div><div>msg_p(N, Msg) -></div><div>&nbsp;&nbsp;do_p(fun msg/2, N, Msg).</div><div><br></div><div>arg_p(N, Msg) -></div><div>&nbsp;&nbsp;do_p(fun arg/2, N, Msg).</div><div><br></div><div>do_p(F, N, Msg) -></div><div>&nbsp;&nbsp;Schedulers = erlang:system_info(schedulers),</div><div>&nbsp;&nbsp;Parent = self(),</div><div>&nbsp;&nbsp;N2 = N div Schedulers,</div><div>&nbsp;&nbsp;Pids = [spawn_link(fun() -> F(N2, Msg), Parent ! {ok, self()} end)</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|| _ <- lists:seq(1, Schedulers) ],</div><div>&nbsp;&nbsp;[ receive {ok, Pid} -> ok end || Pid <- Pids].</div><div><br></div></div><div>They perform better but are still worse than ets. I don't know how they would behave on hardware with 40 CPUs/schedulers:</div><div><div><br></div><div>[[{ets_h,787.688},</div><div>&nbsp;{ets,2215.42},</div><div>&nbsp;{msg_p,2525.365},</div><div>&nbsp;{msg,4964.156},</div><div>&nbsp;{arg_p,2780.5},</div><div>&nbsp;{arg,4248.214}],</div><div>&nbsp;[{ets_h,901.369},</div><div>&nbsp;{ets,2343.145},</div><div>&nbsp;{msg_p,2368.203},</div><div>&nbsp;{msg,5062.984},</div><div>&nbsp;{arg_p,2073.172},</div><div>&nbsp;{arg,4260.998}],</div><div>&nbsp;[{ets_h,906.705},</div><div>&nbsp;{ets,2423.889},</div><div>&nbsp;{msg_p,3135.662},</div><div>&nbsp;{msg,5069.39},</div><div>&nbsp;{arg_p,2186.49},</div><div>&nbsp;{arg,4268.753}]]</div></div><div><br></div><div>Setting the initial heap size in msg helps a little:</div><div><br></div><div><div>msg(N, Msg) -></div><div>&nbsp;&nbsp;Size = 2*erts_debug:flat_size(Msg),</div><div>&nbsp;&nbsp;Pids = [ spawn_opt(fun loop/0, [link, {min_heap_size,Size}]) || _ <- lists:seq(1, N) ],</div><div>&nbsp;&nbsp;[ Pid ! 
{msg, self(), Msg} || Pid <- Pids],</div><div>&nbsp;&nbsp;[ receive {ok, Pid} -> ok end || Pid <- Pids ].</div><div><br></div></div><div><div>[[{ets_h,823.901},</div><div>&nbsp;{ets,2200.168},</div><div>&nbsp;{msg_p,1974.292},</div><div>&nbsp;{msg,4678.855},</div><div>&nbsp;{arg_p,2082.779},</div><div>&nbsp;{arg,4666.294}],</div><div>&nbsp;[{ets_h,906.677},</div><div>&nbsp;{ets,2033.719},</div><div>&nbsp;{msg_p,2092.892},</div><div>&nbsp;{msg,4665.692},</div><div>&nbsp;{arg_p,2005.953},</div><div>&nbsp;{arg,4707.86}],</div><div>&nbsp;[{ets_h,902.813},</div><div>&nbsp;{ets,2290.883},</div><div>&nbsp;{msg_p,2041.713},</div><div>&nbsp;{msg,4655.373},</div><div>&nbsp;{arg_p,2011.422},</div><div>&nbsp;{arg,4659.18}]]</div></div><div><br></div><div>So I think sending a message could be reasonably faster than the ets version on hardware with 40 CPUs. Anyway, storing or sending a map this big doesn't seem like good design.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 6:33 PM, Hynek Vychodil <span dir="ltr"><<a href="mailto:vychodil.hynek@gmail.com" target="_blank">vychodil.hynek@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I was curious enough to try it:<div><div><br></div><div>-module(ets_vs_msg).</div><div><br></div><div>-export([start/1]).</div><div><br></div><div>-export([ets/2, ets_h/2, msg/2, arg/2]).</div><div><br></div><div>-define(Tab, ?MODULE).</div><div><br></div><div>-define(MapSize, 100000). 
%% 100000 is 2.87 MB</div><div><br></div><div>start(N) -></div><div>&nbsp;&nbsp;Map = gen_map(),</div><div>&nbsp;&nbsp;ets_init(Map),</div><div>&nbsp;&nbsp;[[{X, element(1, timer:tc(fun ?MODULE:X/2, [N, Map]))/N}</div><div>&nbsp;&nbsp;&nbsp;|| X <- [ets_h, ets, msg, arg]]</div><div>&nbsp;&nbsp;&nbsp;|| _ <- lists:seq(1, 3)].</div><div><br></div><div>gen_map() -></div><div>&nbsp;&nbsp;gen_map(?MapSize).</div><div><br></div><div>gen_map(N) -></div><div>&nbsp;&nbsp;maps:from_list([{X, []} || X <- lists:seq(1, N)]).</div><div><br></div><div>ets_init(Map) -></div><div>&nbsp;&nbsp;(catch ets:new(?Tab, [named_table])),</div><div>&nbsp;&nbsp;ets:insert(?Tab, {foo, Map}).</div><div><br></div><div>ets(N, _Msg) -></div><div>&nbsp;&nbsp;Pids = [ spawn_link(fun loop/0) || _ <- lists:seq(1, N) ],</div><div>&nbsp;&nbsp;[ Pid ! {ets, self()} || Pid <- Pids],</div><div>&nbsp;&nbsp;[ receive {ok, Pid} -> ok end || Pid <- Pids ].</div><div><br></div><div>ets_h(N, Msg) -></div><div>&nbsp;&nbsp;Size = 2*erts_debug:flat_size(Msg),</div><div>&nbsp;&nbsp;Pids = [ spawn_opt(fun loop/0, [link, {min_heap_size,Size}]) || _ <- lists:seq(1, N) ],</div><div>&nbsp;&nbsp;[ Pid ! {ets, self()} || Pid <- Pids],</div><div>&nbsp;&nbsp;[ receive {ok, Pid} -> ok end || Pid <- Pids ].</div><div><br></div><div>msg(N, Msg) -></div><div>&nbsp;&nbsp;Pids = [ spawn_link(fun loop/0) || _ <- lists:seq(1, N) ],</div><div>&nbsp;&nbsp;[ Pid ! {msg, self(), Msg} || Pid <- Pids],</div><div>&nbsp;&nbsp;[ receive {ok, Pid} -> ok end || Pid <- Pids ].</div><div><br></div><div>arg(N, Msg) -></div><div>&nbsp;&nbsp;Pids = [ spawn_link(fun() -> init(Msg) end) || _ <- lists:seq(1, N) ],</div><div>&nbsp;&nbsp;[ Pid ! {do, self()} || Pid <- Pids],</div><div>&nbsp;&nbsp;[ receive {ok, Pid} -> ok end || Pid <- Pids ].</div><div><br></div><div>init(_) -></div><div>&nbsp;&nbsp;loop().</div><div><br></div><div>loop() -></div><div>&nbsp;&nbsp;receive</div><div>&nbsp;&nbsp;&nbsp;&nbsp;{ets, From} -></div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ets:lookup(?Tab, foo),</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;From;</div><div>&nbsp;&nbsp;&nbsp;&nbsp;{msg, From, _Msg} -></div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;From;</div><div>&nbsp;&nbsp;&nbsp;&nbsp;{do, From} -></div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;From</div><div>&nbsp;&nbsp;end ! 
{ok, self()}.</div></div><div><br></div><div>Reading from ets with a prepared heap is the clear winner:</div><div><br></div><div><div>40> ets_vs_msg:start(1000).</div><div>[[{ets_h,805.83},{ets,2383.31},{msg,4492.15},{arg,3957.693}],</div><div>&nbsp;[{ets_h,918.221},</div><div>&nbsp;{ets,2379.459},</div><div>&nbsp;{msg,4651.258},</div><div>&nbsp;{arg,4028.799}],</div><div>&nbsp;[{ets_h,927.538},</div><div>&nbsp;{ets,2370.421},</div><div>&nbsp;{msg,4519.885},</div><div>&nbsp;{arg,4057.264}]]</div></div><div><br></div><div>But there is a catch. If I look at CPU utilisation, only ets_h and ets use all cores/schedulers (i7 with 4 HT in my case), which indicates that both the msg and arg versions copy the map from a single process. In my case, sending the message from more processes would lead to at most a 4x speed-up for the msg and arg versions.</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 5:20 PM, Sverker Eriksson <span dir="ltr"><<a href="mailto:sverker.eriksson@ericsson.com" target="_blank">sverker.eriksson@ericsson.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Well, I would expect copy_shallow (from ETS) to be less CPU
intensive<br>
than copy_struct (from process).<br>
<br>
However, as indicated by others, ets:lookup on such a big map will
probably<br>
trigger a garbage collection on the process, which will lead to<br>
yet another copy of the big map.<br>
<br>
The spawn(fun() -> do_something(BigMap) end) on the other hand
will<br>
allocate a big enough heap for the process from the start and only
do<br>
one copy of the big map.<br>
<br>
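A minimal sketch of the two variants, assuming a hypothetical module heap_sketch and table name heap_sketch_tab (neither comes from the original application). The 2*flat_size heap estimate is borrowed from the ets_h variant in this thread and is not a tuned value:<br>
<pre>-module(heap_sketch).
-export([run/0]).

%% Hypothetical module and names; only the OTP calls are real.
run() -&gt;
    BigMap = maps:from_list([{X, []} || X &lt;- lists:seq(1, 100000)]),
    Tab = ets:new(heap_sketch_tab, []),        % default access is protected: others may read
    true = ets:insert(Tab, {foo, BigMap}),
    Words = 2 * erts_debug:flat_size(BigMap),  % flat_size/1 reports a size in words
    Parent = self(),
    %% Variant 1: BigMap is part of the fun's environment and is copied at spawn.
    P1 = spawn_link(fun() -&gt;
             Parent ! {done, self(), maps:size(BigMap)}
         end),
    %% Variant 2: pre-size the heap, then copy the map out of ETS with ets:lookup/2.
    P2 = spawn_opt(fun() -&gt;
             [{foo, Map}] = ets:lookup(Tab, foo),
             Parent ! {done, self(), maps:size(Map)}
         end, [link, {min_heap_size, Words}]),
    %% Collect the map size reported by each worker.
    [receive {done, P, Size} -&gt; Size end || P &lt;- [P1, P2]].
</pre>
Calling heap_sketch:run() from the shell runs both workers once; wrapping each variant in timer:tc/1 would be one way to compare them on a given machine.<br>
<br>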
/Sverker, Erlang/OTP<div><div><br>
<br>
<br>
<div>On 03/16/2016 10:43 AM, Alex Howle
wrote:<br>
</div>
<blockquote type="cite">
<p dir="ltr">Assuming that when you say "win" you mean that
ets:lookup should be more efficient (and less CPU intensive)
then I'm seeing the opposite.</p>
<div class="gmail_quote">On 15 Mar 2016 11:32, "Sverker Eriksson"
<<a href="mailto:sverker.eriksson@ericsson.com" target="_blank">sverker.eriksson@ericsson.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Each successful
ets:lookup call is a copy operation of the entire term<br>
from ETS to the process heap.<br>
<br>
If you are comparing ets:lookup of a big map<br>
to sending a big map in a message, then I would expect<br>
ets:lookup to win, as copy_shallow (used by ets:lookup)<br>
is optimized to be faster than copy_struct (used by send).<br>
<br>
<br>
/Sverker, Erlang/OTP<br>
<br>
<br>
<div>On 03/15/2016 09:52 AM, Alex Howle wrote:<br>
</div>
<blockquote type="cite">
<p dir="ltr">I've been experiencing an issue and was
wondering if anyone else has any experience in this
area. I've stripped back the problem to its bare bones
for the purposes of this mail.</p>
<p dir="ltr">&nbsp;</p>
<p dir="ltr">I have an Erlang 18.1 application that uses
ETS to store an Erlang map structure. Using
erts_debug:flat_size/1 I can approximate the map's size
to be 1MB. Upon the necessary activity trigger the
application spawns about 25 short-lived processes to
perform the main work of the application. This activity
trigger is fired roughly 9 times a second under normal
operating conditions. Each of these 25 processes
performs one ets:lookup/2 call to read from the map.</p>
<p dir="ltr">&nbsp;</p>
<p dir="ltr">What I've found is that the above
implementation has a CPU profile that is quite
"expensive" - each of the CPU cores (40 total comprised
of 2 Processors with 10 hyperthreaded cores) frequently
runs at 100%. The machine in question also has 32GB RAM
of which about 9GB is used at peak. There is no swap
usage whatsoever. Examination shows that copy_shallow is
performing the most work.</p>
<p dir="ltr">&nbsp;</p>
<p dir="ltr">After changing the implementation so that the
25 spawned processes no longer read from the ETS table
to retrieve the map structure and instead the map is
passed to the processes on spawn, the CPU usage on the
server is considerably lower.<br>
</p>
<p dir="ltr">&nbsp;</p>
<p dir="ltr">Can anyone offer advice as to why I'm seeing
the differing CPU profiles?<br>
</p>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
erlang-questions mailing list
<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a>
</pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br></div>