[erlang-questions] Private ets table vs. Process directory

Thu Feb 8 00:45:05 CET 2018

On Wednesday, February 7, 2018, 8:56:01 JST, Charles Hixson wrote:
> ...so passing the state as function parameters would 
> entail huge amounts of copying.  (Essentially I'd be modifying nodes 
> deep within trees.)
> 
> Mutable state would allow me to avoid the copying, and the state is not 
> exported from the process...

You seem to be a bit confused about the nature of mutability. If I bind a variable X and my service loop calls itself with a new value, then the next time the service loop recurses (loops) X will have a different value. From the outside the state appears to have mutated, but within the context of a single call of the service loop function the thing labelled X at the time of the function call is immutable.

-module(simple).
-export([start/1]).

start(X) ->
  spawn(fun() -> loop(X) end).

loop(X) ->
  ok = io:format("X is ~p~n", [X]),
  receive
    {add, Y} ->
      NewX = X + Y,
      loop(NewX);
    {sub, Y} ->
      NewX = X - Y,
      loop(NewX);
    stop ->
      ok = io:format("Bye!~n"),
      exit(normal);
    Unexpected ->
      ok = io:format("I don't understand ~tp~n", [Unexpected]),
      loop(X)
  end.

1> c(simple).
{ok,simple}
2> P = simple:start(10).
X is 10
<0.72.0>
3> P ! {add, 15}.
X is 25
{add,15}
4> P ! {sub, 100}.
X is -75
{sub,100}

That is all there is to state maintenance, and this is how gen_servers work. This is also the form that has the least mysterious memory management model in the normal case, and the form that gives you all that nifty memory isolation and fault tolerance Erlang is famous for. Note that X is *not* copied every time we enter loop/1. If we send a message containing X to another process, though, *then* X is copied into the context of the process receiving that message.
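
For comparison, here is a minimal sketch of the same counter written as a gen_server. The module name simple_server and the API functions add/2, sub/2 and value/1 are hypothetical names chosen for illustration, not anything from the original question. The state X is simply handed back to the behaviour at the end of each callback, exactly as loop/1 hands it to itself.

```erlang
-module(simple_server).
-behaviour(gen_server).

%% Hypothetical client API, for illustration only.
-export([start_link/1, add/2, sub/2, value/1]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2]).

start_link(X) -> gen_server:start_link(?MODULE, X, []).

add(Pid, Y) -> gen_server:cast(Pid, {add, Y}).
sub(Pid, Y) -> gen_server:cast(Pid, {sub, Y}).
value(Pid)  -> gen_server:call(Pid, value).

%% The initial state is whatever was passed to start_link/1.
init(X) -> {ok, X}.

%% Reply with the current state and keep it unchanged.
handle_call(value, _From, X) -> {reply, X, X}.

%% Return the new state; the behaviour loops with it for us.
handle_cast({add, Y}, X) -> {noreply, X + Y};
handle_cast({sub, Y}, X) -> {noreply, X - Y}.
```

Calling simple_server:value(Pid) sends the state back in a reply message, so that is the point where a copy is made, not the callbacks themselves.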

It doesn't matter at all what sort of a structure X is. Here it is a number, but it could be anything. Gigantic tuples chock full of maps and gb_trees and other process references and lists of things and queues and whatnot are the norm -- and none of this causes trouble in the normal case.

As for mucking around in deep tree structures, altering nodes in trees does not necessarily entail making a copy of the whole tree. To you as a programmer there are two versions of the data which are effectively distinct, but that does not necessarily mean that they are two complete versions of the data in memory. The nature of copying (or whether copying happens at all under the hood) and how fast things can be garbage collected has to do with the nature of the task and what kind of data structures you are using. Because of immutability you *actually* get to share more data in the underlying implementation than otherwise.
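
A rough way to see this sharing for yourself is sketched below, under a few assumptions: the module name sharing is made up for illustration, and erts_debug:same/2 is an undocumented debugging helper that reports whether two terms are the same object in memory, so treat it as a diagnostic aid rather than something to use in production code. The sketch updates one key of a gb_tree and then checks that a large value stored under a different key is the very same term in both versions of the tree.

```erlang
-module(sharing).
-export([demo/0]).

demo() ->
    %% A large list built at runtime, stored under key a.
    Big = lists:seq(1, 100000),
    Tree0 = gb_trees:from_orddict([{a, Big}, {b, 1}, {c, 2}]),
    %% "Modify" the node for key b. Tree0 is still valid afterwards.
    Tree1 = gb_trees:update(b, 42, Tree0),
    Old = gb_trees:get(a, Tree0),
    New = gb_trees:get(a, Tree1),
    %% Undocumented helper: true if both names refer to the same
    %% term in memory, i.e. the big list was not copied.
    erts_debug:same(Old, New).
```

Only the nodes on the path to the changed key are rebuilt; everything hanging off them is referenced, not duplicated.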

Fred provided a great explanation a while back here:
http://erlang.org/pipermail/erlang-questions/2015-December/087040.html

The general approach to performance issues -- whether memory, I/O bottlenecks, messaging bottlenecks, or raw thunk time -- is to start out writing your processes in the vanilla way using state variables in a loop, and to step away from that only when some extreme deficiency is demonstrated. If you are going to be spawning a ton of processes at once to do things then you've really got no way of knowing what is going to break first until you actually have some working code and can see it break for yourself. People get themselves into trouble with the process dictionary, ETS, NIFs, etc. all the time because the use cases often do not warrant the use of these techniques.

So keep it simple. Write an example of what you want to do. Try it out. You might wind up just saturating your processor or memory bus way before you hit an actual space problem. If something breaks, try to measure why. Right now, though, without knowing the kind of data you're dealing with, what kinds of operations you're doing, or seeing any example code that is known to break in a certain way at a certain scale, we can't really give you much helpful advice.
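
As a starting point for that kind of measurement, something like the following sketch may be enough. The module name measure, the function run/1 and the choice of a gb_tree of integers are all assumptions made for illustration; substitute your real data and operations. It times N in-place-looking updates with timer:tc/1 and reports the calling process's memory with erlang:process_info/2.

```erlang
-module(measure).
-export([run/1]).

%% Build a tree of N keys, update every key once, and report the cost.
run(N) ->
    Tree0 = gb_trees:from_orddict([{K, 0} || K <- lists:seq(1, N)]),
    {Micros, Tree1} = timer:tc(fun() -> bump_all(N, Tree0) end),
    {memory, Bytes} = erlang:process_info(self(), memory),
    ok = io:format("~p updates took ~p us; process memory is ~p bytes~n",
                   [N, Micros, Bytes]),
    {Micros, gb_trees:size(Tree1)}.

%% Each update returns a new tree that becomes the state for the next step.
bump_all(0, Tree) -> Tree;
bump_all(K, Tree) -> bump_all(K - 1, gb_trees:update(K, K, Tree)).
```

Running measure:run/1 at a few different sizes gives you actual numbers to reason about, and to post here, before reaching for ETS or the process dictionary.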

-Craig