[erlang-questions] Glossary

Richard O'Keefe ok@REDACTED
Tue Jul 21 00:11:16 CEST 2009


On Jul 20, 2009, at 8:28 PM, Dave Pawson wrote:
>>> Node
>> Node = (approx) a single processor/computer/machine?
> That's how I had it.  Then one of those is 'the erlang node', i.e.
> where whole programs are started from.

There is no "THE" Erlang node.  You can have any number of
communicating nodes, and any number of programs running on
them, and any of the programs might be started on any of the
nodes.

With virtualisation these days, a single board might be
hosting several operating systems, each of which might
count as a "node" in the conventional internet sense.

So perhaps the best analogy these days is "an Erlang node is
like a loaded and running guest operating system on a
virtualised host".
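
As a rough illustration of "any of the programs might be started on
any of the nodes", here is a minimal sketch.  The module name
node_demo is made up for this example, and it only uses the local
node so that it runs without any distribution set up; with a
connected node you would pass that node's name to spawn/2 instead.

```erlang
-module(node_demo).
-export([run/0]).

%% Start a process on a named node and ask it where it is running.
%% The node used here is the local one (an assumption made so the
%% example needs no cluster); any connected node name would do.
run() ->
    Parent = self(),
    Pid = spawn(node(), fun() -> Parent ! {self(), node()} end),
    receive
        %% node() is this node, nodes() lists the other connected nodes.
        {Pid, Where} -> {node(), nodes(), Where}
    after 1000 ->
        timeout
    end.
```

The first and third elements of the result are the same here only
because the process was deliberately started on the local node.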

>>> Process
>>
>> A light-weight state machine with a mailbox for receiving messages
>> from other processes or system IO resources. In java you might know
>> them as green threads. The node schedules processes to run when there
>> is something for them to do (like a message in the mailbox).
>
> This seems the key bit. Yet least natural to get hold of.

Erlang processes are precisely the same as operating system
tasks in the B6700 MCP (logically separate stacks sharing an
address space) or threads in Microsoft's Singularity operating
system.  They are lightweight because the compiler guarantees
that they *cannot* interfere with each other's data structures,
so hardware protection schemes (and the heavy costs of
switching them) are not needed.  (This is one reason why a
Foreign Function Interface for Erlang is a Bad Idea:  load
FFI code and you lose the guarantees, completely.)

There is little or no *conceptual* difference between Erlang
processes and Unix or Windows processes (without using
intra-OS-process threads).  They are isolated from each other,
they run when they are ready to run and the system has
capacity to run them, they communicate via channels (or
possibly through file system objects).  The differences are
*practical*:  Erlang processes take much less space, are
much cheaper to create, and don't have heavy switching costs,
so they are much more "thinkable".
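
To make "cheap to create" and "communicate via channels" concrete,
here is a small sketch.  The module and function names are invented
for the illustration; the only things assumed are the ordinary
spawn/1, send and receive primitives.

```erlang
-module(proc_demo).
-export([run/1]).

%% Spawn N processes.  Each one computes a square and sends it back
%% to the parent's mailbox, tagged with its own pid.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), I * I} end)
            || I <- lists:seq(1, N)],
    %% Pid is already bound in each iteration, so this receive picks
    %% out that process's reply whatever order the replies arrived in.
    [receive {Pid, Sq} -> Sq end || Pid <- Pids].
```

Calling proc_demo:run(10000) creates ten thousand processes, which is
not something you would normally try with Unix processes.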
>
> Receiving messages is a key part.
> Is it true that a process is generally a single module ('a chunk of  
> code')?

No.
>
> Even if code from other modules is used?

I thought I understood the question, now I'm not so sure.

Since a process does not come into existence until a spawn()
of some kind is executed, and since such a spawn() must occur
*somewhere*, that is in some module, you can associate a
module with each process.  Since each process starts out
running some function, and that function must be in some
module, you can associate a module with each process.  There
is no reason for those two modules to be the same, and they
very often aren't.  The code that the process spends most of
its time executing may come from other modules entirely; the
process may not even care if neither of the two associated
modules exists any more.  (By the way, I wrote that "you"
can associate modules with processes.  Erlang associates
the starting function's module with the process, but not the
spawning function's module.)
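
A small sketch of the two associated modules.  The module name
spawn_demo is hypothetical; timer:sleep/1 is used as the starting
function only because it is a standard function that keeps the
process alive long enough to inspect.

```erlang
-module(spawn_demo).
-export([run/0]).

%% The spawn is executed here, in spawn_demo, but the new process
%% starts out running timer:sleep/1, which lives in another module.
run() ->
    Pid = spawn(timer, sleep, [5000]),
    %% The runtime records the starting function, not the spawner.
    Info = process_info(Pid, initial_call),
    exit(Pid, kill),
    Info.
```

The result names timer, not spawn_demo: nothing in it records where
the spawn was executed.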

More importantly, OTP provides a number of "behaviours".
These are commonly useful patterns of concurrency that are
more robustly engineered than anything you might initially
have constructed yourself.  Processes implemented using
behaviours are an intimate mix of *two* modules: the
behaviour module and the callback module.
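
For example, here is a minimal sketch of a callback module for the
gen_server behaviour.  The module name counter and its API are made
up for the illustration, and the sketch assumes a recent OTP release,
where the callbacks it omits (handle_info, terminate, code_change)
are optional.

```erlang
-module(counter).
-behaviour(gen_server).

%% Client API (runs in the caller's process).
-export([start_link/0, increment/1, value/1]).
%% Callbacks (run in the server process, called by gen_server).
-export([init/1, handle_call/3, handle_cast/2]).

start_link()   -> gen_server:start_link(?MODULE, 0, []).
increment(Pid) -> gen_server:cast(Pid, increment).
value(Pid)     -> gen_server:call(Pid, value).

%% The state is just the current count.
init(Count) -> {ok, Count}.

handle_call(value, _From, Count) -> {reply, Count, Count}.

handle_cast(increment, Count) -> {noreply, Count + 1}.
```

The receive loop, the call/reply protocol and the error handling all
live in the gen_server module; counter only says what to do with
each request.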
>
> And what's the relationship between scheduling, the VM and
> processes/modules?

Erlang        UNIX
------        ----
VM            architecture
node          computer
process       process
module        .o or .so file
scheduling    scheduling

> Less understood here. I guess resource is a bit vague!

No, "resource" just means in Erlang what it means in other
languages.  (Apple's use of the term "resource" in Mac OS
is related but distinct because the notion of sharing is
unimportant.)

A resource is something that you get, use for
a while, and release.  It's like checking out a book from
a library, reading it for a while, and returning it.  The
thing about resources is
  - other people/processes may want the same things
  - they have to be given back or others will never get them
Think of files, databases, devices:

   You have to get a connection to the file or database or device.
   You get to use the resource _through_ the connection.
   You HAVE to close the connection.

This concept is extremely important in most programming
languages, with the possible exception of Limbo.  The reason
is that even in a garbage collected language, manual return of
resources is important so that resources are returned for
others to use *promptly*.  C++ programmers tend to rely on
destructors, via the "Resource Acquisition Is Initialisation"
design pattern, but the nearest analogue of destructors in
Java (finalizers) does not work the same way, and so manual
release of resources is still needed.  (Necula has a very nice
paper about this.  The point is that a lot of Java gets this
wrong, thanks to exception (mis-)handling.)

The concept is important for Erlang for the same reason.
Once you have gained a resource, you MUST arrange for it to
be given back, even if your process gets an exception, or is
killed by a process at the other end of a link exiting.

So don't worry about what a resource is.
Worry about "what MUST I arrange to give back?"
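
A minimal sketch of the get/use/give-back shape for a file, using
try ... after so that the file is closed even if reading raises an
exception.  The module and function names are hypothetical.  Note
that an "after" clause does not run if the process is killed
outright; that case has to be covered by links or monitors.

```erlang
-module(resource_demo).
-export([first_line/1]).

%% Get a connection, use the resource through it, give it back.
first_line(Path) ->
    {ok, Fd} = file:open(Path, [read]),
    try
        %% Returns {ok, Line}, eof, or {error, Reason}.
        file:read_line(Fd)
    after
        %% Runs whether read_line returned normally or raised.
        ok = file:close(Fd)
    end.
```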

>>> Task

Look that one up in an English dictionary,
not in an Erlang book.  A task might be something that
a human has to do, not just a computer.


