[erlang-questions] Glossary

Mon Jul 20 15:18:29 CEST 2009

Apologies. This got long. I think it is reasonably accurate. Others
will correct my mistakes I'm sure.

>>> Node
>>
>> It's an operating system process. A virtual machine which contains
>> processes. Large systems often consist of several nodes distributed
>> across many hosts for redundancy, load balancing or some other reason.
>> Processes can send messages between themselves across nodes and across
>> hosts (network transparent message passing).
>
> Node = (approx) a single processor/computer/machine?
> That's how I had it.  Then one of those is 'the erlang node', i.e. where
> whole programs are started from.

Hmmm. A node is a program/process started in the operating system on a
machine. You would see it as a task under Windows task manager or a
process in UNIX top. It is a virtual machine program for running
erlang code in. When a node program starts it is configured to
internally start various erlang processes based on what that node is
designed to achieve. Erlang nodes can start other nodes, but systems
don't always emanate from a nucleus. Starting everything from one node
can be detrimental to fault tolerance (which is a big reason for using
erlang), because that node becomes a single point of failure.
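To make "a node is an OS process" concrete, here is a minimal sketch.
The module name node_info is my own invention for illustration; the
calls it makes (node/0, nodes/0, os:getpid/0, erlang:system_info/1)
are standard.

```erlang
-module(node_info).
-export([describe/0]).

%% Returns a snapshot of the node this code is running in.
describe() ->
    #{name => node(),          % 'nonode@nohost' unless started with -sname/-name
      connected => nodes(),    % other nodes this node is currently connected to
      os_pid => os:getpid(),   % PID of the VM itself, as shown by top or Task Manager
      process_count => erlang:system_info(process_count)}.  % erlang processes inside
```

The os_pid value should match what the operating system shows for the
running VM, while process_count counts the light-weight erlang
processes living inside that one OS process.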

>>> Process
>>
>> A light-weight state machine with a mailbox for receiving messages
>> from other processes or system IO resources. In java you might know
>> them as green threads. The node schedules processes to run when there
>> is something for them to do (like a message in the mailbox).
>
> This seems the key bit. Yet least natural to get hold of.
> Receiving messages is a key part.

An erlang program as a whole is event driven. An external event (eg.
IO or timer) will cause a message to arrive at a process (eg. data
from a socket) which will cause other messages to flow between
processes, aggregating data, checking what should be done. Some
processes are connected to the outside world (eg. network socket, file
handle) and data will flow out of the erlang node via those.

Erlang was designed for highly asynchronous applications with lots of
partially completed tasks running concurrently and safely (not
interfering with each other). Because erlang processes CANNOT access
the information stored in any other process, the only way to get to
that info is to ask nicely and wait for the response message. This
might seem inefficient to a C coder who would normally just
dereference the memory location but it removes a litany of programming
problems:
 * Access to a process's data is sequential: the owner handles one
message at a time (no data races on shared memory, no mutexes to
deadlock on).
 * All data access is via a uniform mechanism so extending access to
another node or across the globe is very easy and reliable.
 * Because state cannot be shared, when a process suffers a terminal
fault (eg. unhandled situation), only that small process is killed.
The rest of the system is unaffected (unless a process has chosen to
link to the failed one) and keeps on ticking. So what would typically
be a segfault or abort in a C program becomes an internal process
restart, typically performed by a supervisor process. Errors are
contained and the system keeps running.
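The fault containment described in the last point can be sketched as
follows. The module and function names are made up for illustration;
spawn_monitor/3 and the 'DOWN' message are standard. The worker
crashes on purpose and the caller merely gets told about it.

```erlang
-module(isolation_demo).
-export([run/0, worker/1]).

%% Hypothetical worker: multiplying an atom raises badarith and
%% terminates this process only.
worker(X) ->
    X * 2.

%% Start the worker, watch it, and carry on after it dies.
run() ->
    {Pid, Ref} = spawn_monitor(?MODULE, worker, [not_a_number]),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The crash arrives as an ordinary message.
            {worker_died, Reason, caller_still_alive}
    after 1000 ->
        timeout
    end.
```

Calling isolation_demo:run() should return a tuple whose second
element describes the crash (a badarith reason with a stack trace),
and the calling process continues normally. A supervisor does
essentially this and then starts a replacement.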

There are no(?) calls in erlang that block the whole node. AFAIK those
that look like blocking calls are artificial constructs which suspend
only the calling process while it waits for a particular message to be
delivered.
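A minimal sketch of such a construct, assuming a hypothetical module
ask_demo that keeps one value in a process. read/1 looks like a
blocking function call, but it is only a send followed by a receive
that waits for the matching reply.

```erlang
-module(ask_demo).
-export([start/1, read/1, write/2, loop/1]).

%% Start a process whose only job is to own Value.
start(Value) ->
    spawn(?MODULE, loop, [Value]).

%% The owner handles one message at a time.
loop(Value) ->
    receive
        {get, From, Ref} ->
            From ! {Ref, Value},
            loop(Value);
        {set, New} ->
            loop(New);
        stop ->
            ok
    end.

%% Asynchronous: send and return immediately.
write(Pid, New) ->
    Pid ! {set, New},
    ok.

%% Looks blocking: ask nicely, then wait for the reply tagged with Ref.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {get, self(), Ref},
    receive
        {Ref, Value} -> {ok, Value}
    after 1000 ->
        {error, timeout}
    end.
```

Only the process calling read/1 is suspended; every other process on
the node keeps being scheduled. The unique reference lets the caller
pick its reply out of the mailbox even if other messages arrive first.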

> Is it true that a process is generally a single module ('a chunk of code')?
> Even if code from other modules is used?
> And what's the relationship between scheduling, the VM and processes/modules?

A process is started by specifying to the erlang node the
module+function+arguments to start the process running (called
spawning). This is often abbreviated to MFA
(Module/Function/Arguments). A process runs until that function
returns, the code calls the exit function, or it gets killed. OTP
(which you are probably using)
provides some common boiler-plate process templates like gen_server.
These use one module to drive core process behaviour. So in this
respect you are correct. And yes from that core module the code can
run code from other modules.
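As a rough sketch of both styles, here is a hypothetical module
mfa_demo (names are mine, not from any real application). spawn_plain/0
starts a process from an MFA directly; the rest is about the smallest
gen_server I can write, with one module driving the process.

```erlang
-module(mfa_demo).
-behaviour(gen_server).
-export([spawn_plain/0, greet/1]).
-export([start/0, increment/1, value/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Plain spawn: Module, Function, Arguments.
spawn_plain() ->
    spawn(?MODULE, greet, [self()]),
    receive
        {hello_from, Pid} -> Pid
    after 1000 ->
        timeout
    end.

%% The new process runs this, then ends because the function returns.
greet(Parent) ->
    Parent ! {hello_from, self()}.

%% gen_server client functions: run in the caller's process.
start() -> gen_server:start(?MODULE, 0, []).
increment(Pid) -> gen_server:cast(Pid, increment).
value(Pid) -> gen_server:call(Pid, value).

%% gen_server callbacks: run inside the server process.
init(Start) -> {ok, Start}.
handle_call(value, _From, N) -> {reply, N, N}.
handle_cast(increment, N) -> {noreply, N + 1}.
```

Typical use would be {ok, P} = mfa_demo:start(), then
mfa_demo:increment(P) and mfa_demo:value(P). The receive loop itself
lives in the gen_server library module, which is an example of a
process running code from more than one module.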

The erlang VM is like an OS kernel: it receives events when there is work
for a process to perform (eg. timer expires, file descriptor/handle
gets data, etc) and manages swapping CPU time between the internal
erlang processes to achieve this goal. Modules are just containers of
code which processes call as they run. You can think of modules as
shared libraries that all the erlang processes can use.

Unsolicited advice: Erlang programming syntax can seem just plain
wrong to a newcomer. If you hit one of these issues where you think
erlang must be the dumbest most inefficient language on earth, ask
about it. Erlang is different, but it works the way it does for very
good reasons.

--
  Rich