[erlang-questions] What happens with atoms across different systems?
Richard Carlsson
carlsson.richard@REDACTED
Tue Jun 18 16:56:28 CEST 2013
On 2013-06-15 15:57 , Yves S. Garret wrote:
> Hello,
>
> I'm reading the book Erlang and OTP in Action and came across this
> part: "In Erlang, an atom is a special kind of string constant that
> is identified only by the characters in the string, so that two atoms
> are always considered to be exactly the same if they have the same
> character representation. But internally, these strings are stored in
> a table and are referred to by the table index, so that checking
> atoms for equivalence at runtime amounts to comparing two small
> integers; and each time you use an atom, it takes up only one word of
> memory. (The index number used for any particular atom is
> automatically assigned at runtime and can vary from one run of the
> system to the next; there is no way, and no need, for the user to
> know this.)"
>
> Now, this got me thinking. What if I have two systems that send
> messages to one another. I update one and include a new atom that
> I'm using. This atom gets sent to system #2 that cannot recognize it
> just yet. Does that mean that that message will sit in the receiving
> thread's queue until it can recognize it or can be handled/thrown
> away?
The atom tables on the two nodes are completely separate, and apart from
pre-defined atoms needed for the runtime system, the indexes used
depend on the order in which atoms are added (which in turn mostly depends
on the order in which code is loaded). Hence, 'foo' can be entry 4711 on one
node and entry 12345 on another node. But the runtime representation of
an atom (a single word containing the index and a type tag) is local to
the node, and these indexes are never passed between nodes.
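
As a rough illustration of the local atom table, here is a minimal sketch.
The module and function names are made up for this example, and it assumes
an OTP release where erlang:system_info(atom_count) is available (OTP 20 or
later). It does not expose the index itself, only the fact that the same
characters always give the same atom and that the table grows only when a
name is seen for the first time.

```erlang
-module(atom_table_demo).
-export([same_atom/0, table_growth/0]).

%% Two atoms with the same characters are the same atom on this node.
same_atom() ->
    list_to_atom("foo") =:= foo.

%% Create an atom with a (very likely) unused name twice and report how
%% much the node-local atom table grew after each call.
table_growth() ->
    Name = "demo_atom_" ++ integer_to_list(erlang:unique_integer([positive])),
    Before = erlang:system_info(atom_count),
    A1 = list_to_atom(Name),
    Mid = erlang:system_info(atom_count),
    A2 = list_to_atom(Name),
    After = erlang:system_info(atom_count),
    {A1 =:= A2, Mid - Before, After - Mid}.
```

On a quiet node the second and third elements would normally be 1 and 0,
but other processes may add atoms concurrently, so treat the counts as
indicative only. Note that atoms are not garbage collected, so creating
them dynamically like this is for demonstration, not for production code.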
Instead, when a node sends a term to another node, the Erlang
distribution protocol keeps a cache which assigns a temporary number to
the atoms that have been sent, so the first time an atom is seen on the
wire, it is passed as a string plus the cache entry number, and then for
repeated occurrences only the number is passed, to save bandwidth. The
receiving node then converts the incoming atoms to its own local
representation. This all happens before the message is passed on to the
actual receiving process, so by the time it ends up in the message
queue, the atoms already exist.
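
The point that atoms travel by name rather than by index can be checked
locally with a small sketch. It uses term_to_binary/1, which produces the
external term format; that is related to, but not the same as, the
distribution protocol's atom cache described above, so this only shows the
"passed as a string" part. The module and function names are hypothetical.

```erlang
-module(atom_wire_demo).
-export([encoded_by_name/1, roundtrip/1]).

%% True if the atom's characters appear literally in its external
%% encoding, i.e. the name is serialized, not a node-local table index.
encoded_by_name(Atom) when is_atom(Atom) ->
    Bin = term_to_binary(Atom),
    Name = atom_to_binary(Atom, utf8),
    binary:match(Bin, Name) =/= nomatch.

%% Decoding converts the name back into the local atom representation,
%% so the decoded term compares equal to the original.
roundtrip(Term) ->
    binary_to_term(term_to_binary(Term)) =:= Term.
```

For example, atom_wire_demo:encoded_by_name(foo) and
atom_wire_demo:roundtrip({hello, foo}) should both return true. The exact
bytes of the encoding differ between OTP releases, which is why the sketch
searches for the name instead of comparing against a fixed binary.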
If the receiving node hasn't yet loaded any code that will accept
messages tagged with the new atoms, the messages will stay in the
mailbox until the code is loaded (as long as you don't have a catch-all
clause that will swallow any currently unknown messages).
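
A minimal sketch of that mailbox behaviour follows. It is only a
simulation: instead of actually loading new code, the process switches
from a hypothetical old_loop/1 to a hypothetical new_loop/1 when it gets
an 'upgrade' message, and the tag new_tag stands in for the newly
introduced atom.

```erlang
-module(mailbox_demo).
-export([run/0]).

run() ->
    Parent = self(),
    Pid = spawn(fun() -> old_loop(Parent) end),
    Pid ! {new_tag, 42},          % no clause in old_loop/1 matches this
    Pid ! {ping, self()},
    receive pong -> ok end,       % old_loop/1 has now scanned its mailbox
    {message_queue_len, Waiting} = process_info(Pid, message_queue_len),
    Pid ! upgrade,                % stands in for loading the new code
    Result = receive {handled, V} -> V after 1000 -> timeout end,
    {Waiting, Result}.

%% "Old" code: has no catch-all clause, so unknown messages stay queued.
old_loop(Parent) ->
    receive
        {ping, From} -> From ! pong, old_loop(Parent);
        upgrade -> new_loop(Parent)
    end.

%% "New" code: understands the new tag and picks up the waiting message.
new_loop(Parent) ->
    receive
        {new_tag, Value} -> Parent ! {handled, Value}
    end.
```

mailbox_demo:run() returns {1, 42}: one message was waiting in the mailbox
while only the old clauses existed, and it was handled once the new clause
became available. Adding a catch-all clause such as `_Other -> ...` to
old_loop/1 would instead consume and drop the {new_tag, 42} message.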
> As a follow-up question, does it make sense to use atoms in this
> fashion?
Sure, it's being done all the time.
/Richard