Erlang question on Artima blog
Ulf Wiger
ulf.wiger@REDACTED
Sat Mar 13 22:24:29 CET 2004
On Sat, 13 Mar 2004 08:56:25 -0800 (PST), Isaac Gouy <igouy2@REDACTED>
wrote:
> There's a question about the granularity of Erlang's approach to
> fault-tolerance (and the relationship to Design by Contract) on the
> Artima blog:
>
> http://www.artima.com/forums/flat.jsp?forum=226&thread=37878
>
> Could some experienced individual please provide answers? ;-)
I was going to, but the Artima user registration service seemed
to have problems sending me a password for my account. When I wrote
their webmaster and was asked to reply to some generated email
address, I got an NDN back. Oh, well...
Basically, the answer was going to be something like:
1) If system failure is not an option, you have to go with
hardware redundancy, so a process might be allowed to
crash the processor/OS it is running on. This is important,
as it allows you to write "kernel processes" that have to be
assumed correct for the node to be operational. In Erlang,
you can build a system using multiple "Erlang nodes", where
distribution aspects can be either transparent or explicit,
depending on the role of your program. This is how redundancy
is normally implemented, and it can be done in several ways,
depending on requirements:
a) Hot standby: typically, a process on another computer
would monitor the active process, and the two would
employ some replication protocol to stay in sync.
This implies quite explicit exception handling on the
part of the standby process. However, the logic required
can be packaged as a reusable framework, so that the
process assuming the active role is notified through a
simple callback function.
b) Cold standby: The Erlang nodes can be configured so that
the applications running on one node will be restarted
on another in case of failure. The applications can detect
that they are starting due to "failover" from another node,
or they can start as they normally do.
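The hot standby arrangement in 1a) can be sketched roughly as follows. This is a minimal illustration, not a production replication protocol: the module name standby_sketch, its functions, and the message formats are all made up for this example, and both processes run on a single node for brevity. The monitor and message-passing calls it uses work the same way when the pids live on different Erlang nodes.

```erlang
-module(standby_sketch).
-export([start_active/1, start_standby/2, update/2]).

%% All names and message shapes in this module are hypothetical.

%% The active process owns the state and forwards every change
%% to the standby, if one has registered.
start_active(State) ->
    spawn(fun() -> active_loop(State, none) end).

%% Synchronous update: returns once the new state has been forwarded.
update(Active, NewState) ->
    Active ! {update, self(), NewState},
    receive
        {updated, Active} -> ok
    after 5000 ->
        exit(timeout)
    end.

%% Start a standby that monitors Active and calls TakeOver(State, Reason)
%% when Active goes down. Returns once the standby holds a replica.
start_standby(Active, TakeOver) ->
    Caller = self(),
    Standby = spawn(fun() ->
        Ref = erlang:monitor(process, Active),
        Active ! {standby, self()},
        receive
            {replica, State} ->
                Caller ! {in_sync, self()},
                standby_loop(Ref, State, TakeOver);
            {'DOWN', Ref, process, _Pid, Reason} ->
                %% Active was already gone: take over with no replica.
                Caller ! {in_sync, self()},
                TakeOver(undefined, Reason)
        end
    end),
    receive {in_sync, Standby} -> Standby end.

active_loop(State, Standby) ->
    receive
        {standby, Pid} ->
            Pid ! {replica, State},
            active_loop(State, Pid);
        {update, From, NewState} ->
            forward(Standby, NewState),
            From ! {updated, self()},
            active_loop(NewState, Standby)
    end.

forward(none, _State) -> ok;
forward(Pid, State) -> Pid ! {replica, State}, ok.

%% The standby keeps the latest replica until the monitor fires,
%% then hands it to the TakeOver callback.
standby_loop(Ref, State, TakeOver) ->
    receive
        {replica, NewState} ->
            standby_loop(Ref, NewState, TakeOver);
        {'DOWN', Ref, process, _Pid, Reason} ->
            TakeOver(State, Reason)
    end.
```

The TakeOver fun plays the role of the "simple callback function" mentioned in 1a). For the cold standby case in 1b), OTP's distributed application support is configured through the kernel application's distributed parameter, and the application callback Mod:start(StartType, StartArgs) may be given {failover, Node} as StartType; the exact conditions are described in the OTP documentation.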
2) A process crash does not have to lead to a node crash.
Erlang's "process linking" concept can be used in a variety
of ways.
a) The default behaviour is that if a process dies, all
processes linked to it will also die. This is called
"cascading exit", and allows you to clean up a fairly
large amount of work automatically.
b) A process that wants to take action when another dies
can trap exits. Example: if process A wants to open
a file, the file library spawns a process B that opens
the file and acts as a middle man; B becomes A's file
handle. If A dies, B, having linked itself to A and
trapping exits, detects this (it receives an 'EXIT'
message from A), closes the file, and then exits.
c) Supervisors are special processes built on the linking
concept. If a supervised process dies, it is restarted
with default values by its supervisor. If necessary,
the supervisor can be configured to restart a group
of processes, as this may simplify the re-synchronization.
If the restart frequency exceeds a configured limit, the
supervisor exits, and lets the next-level supervisor
handle the situation (escalated restart.)
d) Re-acquiring a process handle may not be necessary. A
process can register itself using a logical name, and
other processes wanting to talk to it can use the
logical name as the destination for message sending.
After a crash, the new process registers under the
same name, and other processes may never know the
difference.
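The "middle man" pattern from 2b) might look something like the sketch below. The module name exit_guard and the Cleanup callback are hypothetical, and a real file library does considerably more; the point is only to show trap_exit, link, and the 'EXIT' message working together.

```erlang
-module(exit_guard).
-export([start/1]).

%% Hypothetical helper: spawn a process B that links itself to the
%% caller A, traps exits, and runs Cleanup(Reason) when A dies.
start(Cleanup) ->
    Owner = self(),
    Guard = spawn(fun() ->
        %% Turn incoming exit signals into ordinary messages.
        process_flag(trap_exit, true),
        link(Owner),
        Owner ! {guard_ready, self()},
        receive
            {'EXIT', Owner, Reason} ->
                %% A is gone: release whatever B was holding for it.
                Cleanup(Reason)
        end
    end),
    %% Wait until the link is in place before returning to the caller.
    receive {guard_ready, Guard} -> Guard end.
```

Because links are bidirectional, the owner (which does not trap exits here) would also die if the guard terminated abnormally, which is the cascading exit described in 2a).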
3) Erlang doesn't really use Design by Contract, but relies
rather heavily on pattern matching. For example, the
function file:open(File, Mode) is defined so that it
returns {ok, FileDescr} or {error, Reason}. A typical
call to this function would be written:
    {ok, Fd} = file:open("foo.txt", [read]).
This means that the caller will assert that the returned
value is a 2-tuple where the first element is the constant
'ok', and the second is some object that becomes bound to
the unbound variable Fd. If the function instead returned e.g.
{error, enoent}, the match would fail and the caller would
crash with a badmatch exception. This is called
"programming for the correct case", and is widely used in
Erlang. It works wonderfully for both large and small
systems.
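As a small illustration of "programming for the correct case", here is a sketch of a function that reads the first line of a file. The module and function names are invented for the example; file:open/2, file:read_line/1 and file:close/1 are the standard library calls.

```erlang
-module(correct_case).
-export([first_line/1]).

%% Only the successful path is written out. If the file cannot be
%% opened, the first match fails and the calling process crashes with
%% {badmatch, {error, Reason}}; no error-handling code is needed here.
first_line(Path) ->
    {ok, Fd} = file:open(Path, [read]),
    Result = file:read_line(Fd),   % {ok, Line} | eof | {error, Reason}
    ok = file:close(Fd),
    Result.
```

Calling correct_case:first_line/1 on a file that does not exist fails with a badmatch on {error, enoent}, which a linked process or supervisor can then act on.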
Pattern matching can also be used on the inputs to a
function. For example the function hd(List), extracting
the first element from a linked list, could be written:
    hd([Head|_]) -> Head.
This means that the function only accepts as input a
list containing at least one element (_ is a "don't care"
pattern, which in this case matches the tail of the list).
Any other input will cause a function_clause exception.
This could also be written explicitly as:
    hd([Head|_]) -> Head;
    hd(Other) -> erlang:error(function_clause, [Other]).
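To see the function_clause behaviour in a module that compiles as-is, the sketch below uses the name head/1 rather than hd/1, since hd/1 is already an auto-imported BIF. The module name and the safe_head/1 wrapper are assumptions made for illustration only.

```erlang
-module(head_demo).
-export([head/1, safe_head/1]).

%% Accepts only a list with at least one element; anything else
%% raises a function_clause error.
head([Head | _]) -> Head.

%% A caller that does want to handle the failure can catch the error
%% explicitly. Most Erlang code would simply let the process crash.
safe_head(List) ->
    try head(List) of
        Value -> {ok, Value}
    catch
        error:function_clause -> {error, no_head}
    end.
```

Here head_demo:head([1, 2, 3]) returns 1, while head_demo:head([]) raises function_clause and head_demo:safe_head([]) returns {error, no_head}.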
/Uffe
--
Ulf Wiger