
Error Handling



Exit Signals are Sent when Processes Crash

When a process crashes (e.g. failure of a BIF or a pattern match) Exit Signals are sent to all processes to which the failing process is currently linked.

Dies and sends signal to linked processes
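The behaviour above can be sketched as follows. This is a minimal illustration, not part of the original course: the module name exit_signal_demo and the helper crash/1 are hypothetical. The calling process traps exits only so that the exit signal can be observed as a message instead of killing the caller.

```erlang
-module(exit_signal_demo).
-export([run/0]).

%% Returns the exit reason that a crashing process sends to a linked process.
run() ->
    Old = process_flag(trap_exit, true),     % observe the signal as a message
    Pid = spawn_link(fun() -> crash(error) end),
    Reason = receive
                 {'EXIT', Pid, Why} -> Why
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, Old),            % restore the caller's setting
    Reason.

%% Hypothetical helper: the pattern match fails unless Value is {ok, _}.
crash(Value) ->
    {ok, _} = Value.
```

The returned reason should be a tuple whose first element is {badmatch, error}, followed by a stack trace.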


Exit Signals propagate through Links

Suppose we have a number of processes which are linked together, as in the following diagram. Process A is linked to B, and B is linked to C (the links are shown by the arrows).

Now suppose process A fails - exit signals start to propagate through the links:

Exit signals propagating, A to B to C...

These exit signals eventually reach all the processes which are linked together.

The rule for propagating errors is: If the process which receives an exit signal, caused by an error, is not trapping exits then the process dies and sends exit signals to all its linked processes.
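The propagation rule can be tried out with a small sketch. The module name chain_demo and the helpers idle/0 and down_reason/1 are hypothetical. Monitors do not appear in the text above; they are used here only to observe from outside the chain why each process ended.

```erlang
-module(chain_demo).
-export([run/0]).

%% Builds the chain A - B - C, makes A fail, and returns why each process died.
run() ->
    Parent = self(),
    C = spawn(fun idle/0),
    B = spawn(fun() -> link(C), Parent ! {linked, self()}, idle() end),
    receive {linked, B} -> ok end,
    A = spawn(fun() ->
                  link(B),
                  Parent ! {linked, self()},
                  receive fail -> exit(a_failed) end
              end),
    receive {linked, A} -> ok end,
    %% Monitors are used only to observe the deaths from outside the chain.
    Refs = [{Name, erlang:monitor(process, Pid)}
            || {Name, Pid} <- [{a, A}, {b, B}, {c, C}]],
    A ! fail,
    [{Name, down_reason(Ref)} || {Name, Ref} <- Refs].

%% None of the processes traps exits, so each one dies on an exit signal.
idle() ->
    receive stop -> ok end.

down_reason(Ref) ->
    receive
        {'DOWN', Ref, process, _Pid, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

All three processes should be reported with the reason a_failed, since the reason travels unchanged from A to B and from B to C.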


Processes can trap exit signals

In the following diagram P1 is linked to P2 and P2 is linked to P3. An error occurs in P1 - the error propagates to P2. P2 traps the error and the error is not propagated to P3.

Process traps exit and does not propagate exit

P2 has the following code:

    process_flag(trap_exit, true),    % P2 must trap exits to get 'EXIT' messages
    receive
        {'EXIT', P1, Why} ->
            ... exit signals ...;
        {P3, Msg} ->
            ... normal messages ...
    end
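A runnable version of this scenario might look like the sketch below. The module name trap_demo and the helper idle/0 are hypothetical, and the code only checks that P2 and P3 survive the failure of P1.

```erlang
-module(trap_demo).
-export([run/0]).

%% P1 is linked to P2, P2 is linked to P3, and P2 traps exits.
run() ->
    Parent = self(),
    P3 = spawn(fun idle/0),
    P2 = spawn(fun() ->
                   process_flag(trap_exit, true),
                   link(P3),
                   P1 = spawn_link(fun() -> exit(p1_failed) end),
                   receive
                       {'EXIT', P1, Why} -> Parent ! {trapped, Why}
                   end,
                   idle()
               end),
    Trapped = receive {trapped, Reason} -> Reason after 1000 -> timeout end,
    Alive = {is_process_alive(P2), is_process_alive(P3)},
    exit(P2, kill),                   % clean up; P3 follows through the link
    {Trapped, Alive}.

idle() ->
    receive stop -> ok end.
```

The result should be {p1_failed, {true, true}}: P2 received the exit signal as a message, and neither P2 nor P3 died.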


Complex Exit signal Propagation

Suppose we have the following set of processes and links:

Bidirectional links in chain of processes

The process marked with a double ring is an error trapping process.

Process that traps exit stops propagation

If an error occurs in any of A, B, or C then all of these processes will die (through propagation of errors). Process D will be unaffected.


Exit Signal Propagation Semantics


Robust Systems can be made by Layering

By building a system in layers we can make a robust system. Level1 traps and corrects errors occurring in Level2. Level2 traps and corrects errors occurring in the application level.

In a well-designed system we can arrange that application programmers will not have to write any error handling code, since all error handling is isolated to deeper levels in the system.

Hierarchical layered trapping - supervision
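One way to picture the layering is a lower-level process that traps exits and restarts an application-level worker. The sketch below is an assumption-laden illustration: the module layer_demo and the functions keeper/1, worker/0 and wait_started/0 are hypothetical names, and restarting is only one possible way of "correcting" an error.

```erlang
-module(layer_demo).
-export([run/0]).

run() ->
    Parent = self(),
    Keeper = spawn(fun() -> keeper(Parent) end),
    W1 = wait_started(),
    W1 ! {divide, self(), 1, 0},          % crashes the worker (badarith)
    W2 = wait_started(),                  % the keeper has started a new worker
    W2 ! {divide, self(), 10, 2},
    Answer = receive {result, R} -> R after 1000 -> timeout end,
    exit(Keeper, kill),                   % stop the demo; the worker dies too
    {W1 =/= W2, Answer}.

%% Lower layer: traps exits and replaces a worker that has died.
keeper(Parent) ->
    process_flag(trap_exit, true),
    Worker = spawn_link(fun worker/0),
    Parent ! {started, Worker},
    receive
        {'EXIT', Worker, _Why} -> keeper(Parent)
    end.

%% Application layer: contains no error handling code at all.
worker() ->
    receive
        {divide, From, A, B} ->
            From ! {result, A div B},
            worker()
    end.

wait_started() ->
    receive {started, Pid} -> Pid after 1000 -> exit(timeout) end.
```

The worker is written as if nothing could go wrong; the division by zero is handled entirely by the layer below it.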


Primitives For Exit Signal Handling

What really happens is as follows: Each process has an associated mailbox - Pid ! Msg sends the message Msg to the mailbox associated with the process Pid.

The receive .. end construct attempts to remove messages from the mailbox of the current process. Exit signals which arrive at a process either cause the process to crash (if the process is not trapping exit signals) or are treated as normal messages and placed in the process mailbox (if the process is trapping exit signals). A process starts trapping exit signals by evaluating process_flag(trap_exit, true). Exit signals are sent implicitly (for example, as a result of evaluating a BIF with incorrect arguments) or explicitly (using exit(Pid, Reason) or exit(Reason)).

If Reason is the atom normal, the receiving process ignores the signal (if it is not trapping exits). When a process terminates without an error it sends normal exit signals to all linked processes. Don't say you didn't ask!
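The three cases described above (a normal signal is ignored, any other signal kills a non-trapping process, and a trapping process gets a message) can be checked with a short sketch. The module signal_demo and the functions plain/1 and trapper/1 are hypothetical; the monitor is used only to observe the death of the first process.

```erlang
-module(signal_demo).
-export([run/0]).

run() ->
    Parent = self(),
    Plain = spawn(fun() -> plain(Parent) end),
    exit(Plain, normal),                  % ignored: Plain is not trapping exits
    Plain ! ping,
    Survived = receive pong -> true after 1000 -> false end,
    Ref = erlang:monitor(process, Plain),
    exit(Plain, some_error),              % kills Plain
    Died = receive {'DOWN', Ref, process, Plain, Why} -> Why
           after 1000 -> timeout end,
    Trapper = spawn(fun() -> trapper(Parent) end),
    receive {ready, Trapper} -> ok after 1000 -> exit(timeout) end,
    exit(Trapper, some_error),            % arrives as an ordinary message
    Trapped = receive {trapped, From, Reason} -> {From =:= Parent, Reason}
              after 1000 -> timeout end,
    {Survived, Died, Trapped}.

%% Does not trap exits: answers pings until an exit signal kills it.
plain(Parent) ->
    receive ping -> Parent ! pong, plain(Parent) end.

%% Traps exits: the exit signal is placed in the mailbox as {'EXIT', From, Reason}.
trapper(Parent) ->
    process_flag(trap_exit, true),
    Parent ! {ready, self()},
    receive {'EXIT', From, Reason} -> Parent ! {trapped, From, Reason} end.
```

The expected result is {true, some_error, {true, some_error}}.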


A Robust Server

The following server assumes that a client process will send an alloc message to allocate a resource and then send a release message to deallocate the resource.

This is unreliable - what happens if the client crashes before it sends the release message?

top(Free, Allocated) ->
    receive
        {Pid, alloc} ->
            top_alloc(Free, Allocated, Pid);
        {Pid, {release, Resource}} ->
            Allocated1 = delete({Resource,Pid}, Allocated),
            top([Resource|Free], Allocated1)
    end.

top_alloc([], Allocated, Pid) ->
    Pid ! no,
    top([], Allocated);

top_alloc([Resource|Free], Allocated, Pid) ->
    Pid ! {yes, Resource},
    top(Free, [{Resource,Pid}|Allocated]).

This is the top loop of an allocator with no error recovery. Free is a list of unreserved resources. Allocated is a list of pairs {Resource, Pid} - showing which resource has been allocated to which process.
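The weakness can be demonstrated with a client that dies while holding the only resource. The sketch below wraps the allocator loop in a hypothetical module leaky_alloc; the resource name r1 and the function run/0 are illustrative assumptions, and delete/2 is the list utility used by the allocator.

```erlang
-module(leaky_alloc).
-export([run/0]).

run() ->
    Server = spawn(fun() -> top([r1], []) end),
    %% This client allocates r1 and then crashes without releasing it.
    {_Client, Ref} = spawn_monitor(fun() ->
                                       Server ! {self(), alloc},
                                       receive {yes, _} -> exit(crashed) end
                                   end),
    receive {'DOWN', Ref, process, _, _} -> ok end,
    Server ! {self(), alloc},             % r1 was never released
    Reply = receive
                no -> no;
                {yes, Resource} -> {yes, Resource}
            after 1000 -> timeout
            end,
    exit(Server, kill),
    Reply.

top(Free, Allocated) ->
    receive
        {Pid, alloc} ->
            top_alloc(Free, Allocated, Pid);
        {Pid, {release, Resource}} ->
            Allocated1 = delete({Resource, Pid}, Allocated),
            top([Resource|Free], Allocated1)
    end.

top_alloc([], Allocated, Pid) ->
    Pid ! no,
    top([], Allocated);
top_alloc([Resource|Free], Allocated, Pid) ->
    Pid ! {yes, Resource},
    top(Free, [{Resource, Pid}|Allocated]).

delete(H, [H|T]) -> T;
delete(X, [H|T]) -> [H|delete(X, T)].
```

The second request is answered with no: the resource stays allocated to a process that no longer exists.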


Allocator with Error Recovery

The following is a reliable server. If a client crashes after it has allocated a resource and before it has released the resource, then the server will automatically release the resource.

The server is linked to the client during the time interval when the resource is allocated. If an exit message comes from the client during this time the resource is released.

top_recover_alloc([], Allocated, Pid) ->
    Pid ! no,
    top_recover([], Allocated);

top_recover_alloc([Resource|Free], Allocated, Pid) ->
    %% Link to the client so that its exit signal reaches the server.
    link(Pid),
    Pid ! {yes, Resource},
    top_recover(Free, [{Resource,Pid}|Allocated]).

%% Assumes the server process has called process_flag(trap_exit, true).
top_recover(Free, Allocated) ->
    receive
        {Pid, alloc} ->
            top_recover_alloc(Free, Allocated, Pid);
        {Pid, {release, Resource}} ->
            unlink(Pid),
            Allocated1 = delete({Resource, Pid}, Allocated),
            top_recover([Resource|Free], Allocated1);
        {'EXIT', Pid, _Reason} ->
            %% No need to unlink.
            Resource = lookup(Pid, Allocated),
            Allocated1 = delete({Resource, Pid}, Allocated),
            top_recover([Resource|Free], Allocated1)
    end.

Not handled: multiple allocations to the same process. Before doing the unlink(Pid) we should check that the process has not allocated more than one device.
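One possible way to deal with multiple allocations is sketched below. It is not the course's solution: the module alloc_multi and the functions release/4 and reclaim/3 are hypothetical helpers that the release and 'EXIT' clauses of the server loop could call, and the standard lists module is used in place of the hand-written utilities.

```erlang
-module(alloc_multi).
-export([release/4, reclaim/3]).

%% For {Pid, {release, Resource}}: unlink only when Pid holds nothing else.
%% Returns the new {Free, Allocated} pair.
release(Pid, Resource, Free, Allocated) ->
    %% Crash on a release of something Pid does not hold.
    true = lists:member({Resource, Pid}, Allocated),
    Allocated1 = lists:delete({Resource, Pid}, Allocated),
    case lists:keymember(Pid, 2, Allocated1) of
        true  -> ok;                      % still holds a resource: stay linked
        false -> unlink(Pid)
    end,
    {[Resource|Free], Allocated1}.

%% For {'EXIT', Pid, Reason}: give back every resource that Pid held.
%% Returns the new {Free, Allocated} pair.
reclaim(Pid, Free, Allocated) ->
    {Mine, Others} = lists:partition(fun({_, P}) -> P =:= Pid end, Allocated),
    {[R || {R, _} <- Mine] ++ Free, Others}.
```

For example, if a process holds r1 and r2, releasing r1 leaves the link in place, and an exit signal from that process returns both resources to the free list.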


Allocator Utilities

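%% Remove the first occurrence of an element from a list.
delete(H, [H|T]) ->
    T;
delete(X, [H|T]) ->
    [H|delete(X, T)].

%% Find the resource allocated to Pid.
lookup(Pid, [{Resource,Pid}|_]) ->
    Resource;
lookup(Pid, [_|Allocated]) ->
    lookup(Pid, Allocated).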
