That works fine in the simple case, but I'm contemplating repeatedly
adjusting weights deep within a nested data structure. Your approach
would result in creating an altered copy of the entire structure for
each recursion. This is probably only about 1KB or so of information,
so doing this a few times isn't a problem, but doing it millions of
times quickly becomes a problem.

This can be addressed by either ETS or the process dictionary, and
those allow the internal structure to be safely modified. With the
process dictionary it's safe because the information is never
exported from the process (except for I/O, which must be
special-cased). Similarly, a private ETS table can handle it without
problems. And so can a global ETS table, as then a unique
process-specific id (NOT the pid, as this needs to survive restarts)
can be used as part of the key. So those three methods would work.
The question in my mind is how to predict the tradeoffs as it scales
up. I suspect that the process dictionary would use the least memory,
though possibly the global ETS table would. A private ETS table seems
the most natural approach, but it looks, to my naive eyes, as if it
would scale poorly with respect to memory use.
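
For concreteness, here is roughly the global-ETS variant I have in
mind. A minimal sketch (the table name, the {UnitId, NodePath} key
shape, and the function names are placeholders of mine, nothing
settled):

-module(wstore).
-export([init/0, put_weight/3, get_weight/2]).

%% One global, named, public table; each logical unit keys its
%% entries by a unit id that survives restarts (so NOT its pid).
init() ->
    ets:new(weights, [named_table, public, set]),
    ok.

%% Overwrites one entry in place, rather than rebuilding the
%% enclosing structure.
put_weight(UnitId, NodePath, W) ->
    true = ets:insert(weights, {{UnitId, NodePath}, W}),
    ok.

get_weight(UnitId, NodePath) ->
    case ets:lookup(weights, {UnitId, NodePath}) of
        [{_Key, W}] -> W;
        []          -> undefined
    end.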

What I'd really like is to use a Mnesia system which kept a cache of
active entries but didn't require everything to be rolled in from
disk. AFAICT, however, my choices with a Mnesia table are to keep
everything in memory or to keep everything rolled out to disk.
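
As I understand it, these are the only storage choices Mnesia offers;
a sketch of the one that comes closest (the record shape is just
illustrative):

-module(mstore).
-export([create/1]).

-record(entry, {key, value}).

create(Nodes) ->
    %% ram_copies: whole table in RAM only.
    %% disc_copies: whole table in RAM *and* logged to disk.
    %% disc_only_copies: table kept on disk, entries read in on
    %% access; the closest built-in thing to "roll in only what's
    %% active".
    mnesia:create_table(entry,
                        [{attributes, record_info(fields, entry)},
                         {disc_only_copies, Nodes}]).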

I also haven't been able to determine whether processes that are
waiting to receive a message can be rolled out to inactive memory.
There are some indications ("use enough processes, but not too many")
that they can't. This means that I need to adapt my memory use rather
carefully to the systems being run on. If background processes keep
activating every live process to check its status, I could easily end
up with severe thrashing. And *THAT* will affect the design. If I
need to hand-manage the caching, then I lose a lot of the benefits
that I'm hoping to get from Erlang.
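
One thing that looks relevant is erlang:hibernate/3, which at least
shrinks a waiting process's footprint until its next message arrives.
A minimal sketch (the module name and message shapes are mine, just
for illustration):

-module(dormant).
-export([start/1, loop/1]).

start(State) ->
    spawn(fun() -> loop(State) end).

loop(State) ->
    receive
        {update, F} ->
            loop(F(State))
    after 60000 ->
        %% Discards the call stack and garbage-collects the heap,
        %% then waits; the next incoming message restarts us in
        %% dormant:loop(State).
        erlang:hibernate(?MODULE, loop, [State])
    end.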

The basic design calls for a huge number of "processes" to be doing
n x m communication, and the simple design calls for each "process"
to be able to send messages to every other "process", though only a
subset of the messages would actually be sent. My first sketch of a
design called for each "process" to be mapped to a separate Erlang
process, but this doesn't work, because Erlang doesn't like to have
that many processes. Even this simple design, however, required
figuring on allowing 1000 inputs and 1000 outputs to each "process",
and probably well over 100,000 "processes". Most of them would be
idle most of the time, but all would need to be "activatable" when
messaged, and all would need to become dormant when just waiting for
a message. The idea is not a neural net, but it has certain
similarities.
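
What I'm considering instead is multiplexing: one Erlang process per
shard of logical "processes", with each unit's state held in a map
keyed by unit id. A rough sketch (all names and the routing message
shape are only illustrative):

-module(shard).
-export([start/0, loop/1]).

start() ->
    spawn(fun() -> loop(#{}) end).

loop(Units) ->
    receive
        {to, UnitId, Msg} ->
            UState0 = maps:get(UnitId, Units, undefined),
            UState1 = handle(UnitId, Msg, UState0),
            %% Only the touched map entry is rebuilt, not the whole
            %% structure.
            loop(Units#{UnitId => UState1});
        stop ->
            ok
    end.

%% Placeholder for per-unit logic; a real version would dispatch on
%% Msg and might send messages onward to other units' shards.
handle(_UnitId, _Msg, UState) ->
    UState.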

Now if I could actually have one process per "process", then your
proposal, which I recognize as the normal Erlang approach, would make
sense, but that isn't going to work. In that case it could be done by
having lots of variables, so that there wouldn't be any need to
modify deeply nested items, and thus not much would need to be
copied.
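
By "lots of variables" I mean keeping state as several shallow
pieces, along these lines (a sketch; the field and function names are
just illustrative):

-module(shallow).
-export([new/0, add_in_weight/3]).

-record(unit, {in_weights = #{}, out_weights = #{}, bias = 0.0}).

new() -> #unit{}.

%% Updating one weight rebuilds only the small in_weights map and the
%% record's top tuple; the rest of the state is shared, not copied.
add_in_weight(Key, Delta, U = #unit{in_weights = W}) ->
    U#unit{in_weights = maps:update_with(Key,
                                         fun(V) -> V + Delta end,
                                         Delta, W)}.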

As for KISS, that's a great approach, but it doesn't reveal scaling
problems. When one is implementing an approach one should always
KISS, but when deciding which approach to try, it's important to pick
one that will still work when the system approaches its initial
design goal.

On 02/07/2018 03:45 PM, zxq9@zxq9.com wrote:
> On Wednesday, 7 February 2018 at 8:56:01 JST Charles Hixson wrote:
>> ...so passing the state as function parameters would entail huge
>> amounts of copying. (Essentially I'd be modifying nodes deep within
>> trees.)
>>
>> Mutable state would allow me to avoid the copying, and the state is
>> not exported from the process...
>
> You seem to be confused a bit about the nature of mutability. If I
> set a variable X and in my service loop alter X, the next time the
> service loop recurses (loops) X will be a different value -- it will
> have mutated -- but within the context of a single call of the
> service loop function, the thing labelled X at the time of the
> function call will be immutable.
>
> -module(simple).
> -export([start/1]).
>
> start(X) ->
>     spawn(fun() -> loop(X) end).
>
> loop(X) ->
>     ok = io:format("X is ~p~n", [X]),
>     receive
>         {add, Y} ->
>             NewX = X + Y,
>             loop(NewX);
>         {sub, Y} ->
>             NewX = X - Y,
>             loop(NewX);
>         stop ->
>             ok = io:format("Bye!~n"),
>             exit(normal);
>         Unexpected ->
>             ok = io:format("I don't understand ~tp~n", [Unexpected]),
>             loop(X)
>     end.
>
> 1> c(simple).
> {ok,simple}
> 2> P = simple:start(10).
> X is 10
> <0.72.0>
> 3> P ! {add, 15}.
> X is 25
> {add,15}
> 4> P ! {sub, 100}.
> X is -75
> {sub,100}
>
> That is all there is to state maintenance, and this is how
> gen_servers work. This is also the form that has the least
> mysterious memory management model in the normal case, and the form
> that gives you all that nifty memory isolation and fault tolerance
> Erlang is famous for. Note that X is *not* copied every time we
> enter loop/1. If we send a message containing X to another process,
> though, *then* X is copied into the context of the process receiving
> that message.
>
> It doesn't matter at all what sort of a structure X is. Here it is a
> number, but it could be anything. Gigantic tuples chock full of maps
> and gb_trees and other process references and lists of things and
> queues and whatnot are the norm -- and none of this causes trouble
> in the normal case.
>
> As for mucking around in deep tree structures, altering nodes in
> trees does not necessarily entail making a copy of the whole tree.
> To you as a programmer there are two versions of the data which are
> effectively distinct, but that does not necessarily mean that they
> are two complete versions of the data in memory. The nature of
> copying (or whether copying happens at all under the hood) and how
> fast things can be garbage collected has to do with the nature of
> the task and what kind of data structures you are using. Because of
> immutability you *actually* get to share more data in the underlying
> implementation than otherwise.
>
> Fred provided a great explanation a while back here:
> http://erlang.org/pipermail/erlang-questions/2015-December/087040.html
>
> The general approach to performance issues -- whether memory, I/O
> bottlenecks, messaging bottlenecks, or raw thunk time -- is to start
> out writing your processes in the vanilla way, using state variables
> in a loop, and only stepping away from that when some extreme
> deficiency is demonstrated. If you are going to be spawning a ton of
> processes at once to do things, then you've really got no way of
> knowing what is going to break first until you actually have some
> working code and can see it break for yourself. People get
> themselves into trouble with the process dictionary, ETS, NIFs, etc.
> all the time because the use cases often do not warrant the use of
> these techniques.
>
> So keep it simple. Write an example of what you want to do. Try it
> out. You might wind up just saturating your processor or memory bus
> way before you hit an actual space problem. If something breaks, try
> to measure why -- but right now, without telling anyone the kind of
> data you're dealing with, or what kinds of operations you're doing,
> or any example code that is known to break in a certain way at a
> certain scale, we can't really give you much helpful advice.
>
> -Craig