[erlang-questions] What happens with atoms across different systems?

Tue Jun 18 16:56:28 CEST 2013

On 2013-06-15 15:57 , Yves S. Garret wrote:
> Hello,
>
> I'm reading the book Erlang and OTP in Action and came across this
> part: "In Erlang, an atom is a special kind of string constant that
> is identified only by the characters in the string, so that two atoms
> are always considered to be exactly the same if they have the same
> character representation. But internally, these strings are stored in
> a table and are referred to by the table index, so that checking
> atoms for equivalence at runtime amounts to comparing two small
> integers; and each time you use an atom, it takes up only one word of
> memory. (The index number used for any particular atom is
> automatically assigned at runtime and can vary from one run of the
> system to the next; there is no way, and no need, for the user to
> know this.)"
>
> Now, this got me thinking.  What if I have two systems that send
> messages to one another.  I update one and include a new atom that
> I'm using.  This atom gets sent to system #2 that cannot recognize it
> just yet.  Does that mean that that message will sit in the receiving
> thread's queue until it can recognize it or can be handled/thrown
> away?

The atom tables on the two nodes are completely separate, and apart from 
pre-defined atoms needed for the runtime system, the indexes used will
depend on the order in which atoms are added (which mostly depends on the
order in which code is loaded). Hence, 'foo' can be entry 4711 on one
node and entry 12345 on another node. But the runtime representation of 
an atom (a single word containing the index and a type tag) is local to 
the node, and these indexes are never passed between nodes.
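
As a rough illustration of the point that only the name matters, here is a
minimal sketch. The module name atom_identity_demo is made up for this
example, and erlang:system_info(atom_count) is assumed to be available
(it was added in OTP 20, so it will not work on older releases). The table
index itself is never visible from Erlang code; the most you can observe is
that the table grows when a new name is seen.

```erlang
-module(atom_identity_demo).
-export([run/0]).

%% Hypothetical demo module, not part of OTP.
run() ->
    %% An atom built from text at runtime is the very same term as the
    %% literal atom: identity is decided by the name alone.
    true = (list_to_atom("foo") =:= foo),

    %% Creating a name that has not been seen before adds a table entry.
    %% (Demo only: creating atoms dynamically is normally discouraged,
    %% since atoms are never garbage collected.)
    Before = erlang:system_info(atom_count),
    Name = "demo_atom_" ++ integer_to_list(erlang:unique_integer([positive])),
    New = list_to_atom(Name),
    After = erlang:system_info(atom_count),
    true = After > Before,

    %% Converting back gives the name, never an index.
    Name = atom_to_list(New),
    ok.
```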

Instead, when a node sends a term to another node, the Erlang 
distribution protocol keeps a cache which assigns a temporary number to 
the atoms that have been sent, so the first time an atom is seen on the 
wire, it is passed as a string plus the cache entry number, and then for 
repeated occurrences only the number is passed, to save bandwidth. The 
receiving node then converts the incoming atoms to its own local 
representation. This all happens before the message is passed on to the 
actual receiving process, so by the time it ends up in the message 
queue, the atoms already exist.
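
You can get a feel for this from the shell with term_to_binary/1, which
produces the external term format that the distribution protocol builds
on. Treat the following as a sketch only: it does not go through the
distribution layer, so no atom cache is involved, and the module name
atom_wire_demo is made up for the example. The exact tag bytes differ
between OTP releases, so the code only checks that the atom's name is
present in the encoded form.

```erlang
-module(atom_wire_demo).
-export([run/0]).

%% Hypothetical demo module, not part of OTP.
run() ->
    %% Encode an atom in the external term format.
    Bin = term_to_binary(foo),

    %% The encoded form carries the characters of the name, not the
    %% sender's atom table index.
    true = (binary:match(Bin, <<"foo">>) =/= nomatch),

    %% Decoding looks the name up in (or adds it to) the local atom
    %% table and yields the local representation of 'foo'.
    foo = binary_to_term(Bin),
    ok.
```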

If the receiving process isn't yet running code with a receive clause
that matches messages tagged with the new atoms, those messages will stay
in its mailbox until such code is loaded and run (as long as you don't have
a catch-all clause that will swallow any currently unknown messages).
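
The mailbox behaviour can be sketched within a single process. The module
name mailbox_demo and the tags old_tag and new_tag are invented for the
example; the two receive expressions stand in for the "old" and the
"updated" version of the receiving code.

```erlang
-module(mailbox_demo).
-export([run/0]).

%% Hypothetical demo module, not part of OTP.
run() ->
    %% Two messages arrive; the first uses a tag the old code ignores.
    self() ! {new_tag, 42},
    self() ! {old_tag, 1},

    %% "Old" code: only knows old_tag. Selective receive skips the
    %% non-matching message and leaves it in the mailbox.
    1 = receive {old_tag, X} -> X after 0 -> timeout end,
    {messages, Pending} = process_info(self(), messages),
    true = lists:member({new_tag, 42}, Pending),

    %% "Updated" code: has a clause for new_tag, so the waiting
    %% message is picked up now.
    42 = receive {new_tag, Y} -> Y after 0 -> timeout end,
    {messages, Rest} = process_info(self(), messages),
    false = lists:member({new_tag, 42}, Rest),
    ok.
```

Had the first receive contained a catch-all clause such as `_ -> ignored`,
it would have matched {new_tag, 42} first and dropped it, which is the case
the parenthesis above warns about.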

> As a follow-up question, does it make sense to use atoms in this
> fashion?

Sure, it's being done all the time.

     /Richard