The Details of Error Handling
Links
A link defines an error propagation path between two processes. Links are bidirectional: if two processes are linked and one of them dies, an exit signal is sent to the other. The set of processes currently linked to a given process is called the link set of that process.
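Here is a minimal sketch of link propagation, assuming a single local node. The module name links_demo and the reason boom are hypothetical. Pid1 spawns and links to Pid2, the link set of Pid2 is read with erlang:process_info/2, and then Pid1 dies with a non-normal reason. The monitor is there only so the calling process can observe Pid2's death without being linked to anything itself.

```erlang
-module(links_demo).
-export([run/0]).

%% Returns {LinkSetIsJustPid1, ExitReasonOfPid2}.
run() ->
    Parent = self(),
    Pid1 = spawn(fun() ->
                     %% spawn_link creates Child and the link atomically.
                     Child = spawn_link(fun() -> receive never -> ok end end),
                     Parent ! {child, Child},
                     receive crash -> exit(boom) end
                 end),
    Pid2 = receive {child, C} -> C end,
    %% Links are bidirectional, so Pid1 is in the link set of Pid2.
    {links, Links} = erlang:process_info(Pid2, links),
    %% Observation only: lets us wait for Pid2 to die.
    Ref = erlang:monitor(process, Pid2),
    Pid1 ! crash,
    %% Pid2 does not trap exits, so the exit signal boom kills it too.
    Reason = receive {'DOWN', Ref, process, Pid2, R} -> R end,
    {Links =:= [Pid1], Reason}.
```

Calling links_demo:run() should return {true, boom}. Pid2 dies with the same reason that was carried by the exit signal.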
Exit signals
An exit signal is generated by a process when the process dies. The signal is broadcast to all processes in the link set of the dying process. It carries an argument giving the reason why the process died, and the reason can be any Erlang term. The reason is set explicitly when the process calls the primitive exit(Reason), or implicitly when an error occurs. For example, if a program tries to divide a number by zero, the exit reason is badarith (in current Erlang/OTP releases, the tuple {badarith, Stack}, where Stack is the stack trace).
When a process has successfully evaluated the function it was spawned with, it will die with the exit reason normal.
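To see these exit reasons in practice, the hypothetical helper below runs a fun in a fresh process and returns the reason that process died with. It uses spawn_monitor/1 so the calling process is not linked to a process that may crash. The divide/2 function is a made-up helper that hides the division by zero from the compiler. On a current Erlang/OTP release, the arithmetic error shows up as {badarith, Stack} and not as the bare atom.

```erlang
-module(exit_reasons).
-export([reason_of/1, divide/2]).

%% Run Fun in a new process and return the reason it terminated with.
reason_of(Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive {'DOWN', Ref, process, Pid, Reason} -> Reason end.

%% Hypothetical helper: dividing by a runtime zero raises badarith.
divide(A, B) -> A / B.
```

For example, reason_of(fun() -> done end) returns normal, and reason_of(fun() -> exit(custom_reason) end) returns custom_reason.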
In addition, a process Pid1 can explicitly send an exit signal X to a process Pid2 by evaluating exit(Pid2, X). The sending process does not die; it resumes execution after it has sent the signal. If Pid2 is trapping exits, it receives the message {'EXIT', Pid1, X}, exactly as if the originating process had died. Using this mechanism, Pid1 can deliberately “fake” its own death.
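Here is a sketch of this "fake death" mechanism, assuming a single local node. The names fake_exit and fake_death are made up. Pid2 traps exits, and Pid1 calls exit(Pid2, fake_death) and keeps running afterwards. Pid2 then receives the same 'EXIT' message a dying Pid1 would have produced. No link between the two processes is needed.

```erlang
-module(fake_exit).
-export([run/0]).

%% Returns true if Pid2 received {'EXIT', Pid1, fake_death}
%% while Pid1 itself kept running.
run() ->
    Parent = self(),
    Pid2 = spawn(fun() ->
                     process_flag(trap_exit, true),  % convert signals to messages
                     Parent ! ready,
                     receive Msg -> Parent ! {pid2_got, Msg} end
                 end),
    receive ready -> ok end,  % make sure trap_exit is set before signalling
    Pid1 = spawn(fun() ->
                     exit(Pid2, fake_death),            % send the exit signal...
                     Parent ! {still_running, self()}   % ...and carry on
                 end),
    receive {still_running, Pid1} -> ok end,
    receive {pid2_got, Got} -> Got =:= {'EXIT', Pid1, fake_death} end.
```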
System processes
When a process receives a non-normal exit signal, it too will die unless it is a special kind of process called a system process. When a system process receives an exit signal Why from a process Pid, the exit signal is converted to the message {'EXIT', Pid, Why} and added to the mailbox of the system process.
Calling the BIF process_flag(trap_exit, true) turns a normal process into a system process that can trap exits.
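A system process is typically linked to worker processes and reacts to their 'EXIT' messages. A minimal sketch, with trap_demo and worker_failed as hypothetical names: the worker dies, and the linked system process survives and receives the exit signal as an ordinary message.

```erlang
-module(trap_demo).
-export([run/0]).

%% Returns the exit reason that the system process received as a message.
run() ->
    Parent = self(),
    spawn(fun() ->
              process_flag(trap_exit, true),  % become a system process
              Worker = spawn_link(fun() -> exit(worker_failed) end),
              %% The worker's exit signal arrives as an 'EXIT' message.
              receive
                  {'EXIT', Worker, Why} -> Parent ! {trapped, Why}
              end
          end),
    receive {trapped, Reason} -> Reason end.
```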
When an exit signal arrives at a process, several different things can happen. The outcome depends on the state of the receiving process (whether it traps exits) and on the value of the exit signal, as given by the following table, where X stands for any reason not matched by an earlier row with the same trap_exit value:
trap_exit   Exit signal   Action
true        kill          Die: broadcast the exit signal killed to the link set.
true        X             Add {'EXIT', Pid, X} to the mailbox.
false       normal        Continue: do nothing; the signal vanishes.
false       kill          Die: broadcast the exit signal killed to the link set.
false       X             Die: broadcast the exit signal X to the link set.
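The rows of the table can be checked with the sketch below. Module and function names are hypothetical. outcome/2 starts a fresh process with the given trap_exit flag, sends it the exit signal with exit/2, and then sends it a ping message. Signals between one pair of processes arrive in the order they were sent. That means the ping is handled only after the exit signal has taken effect, so the result shows which row applied.

```erlang
-module(signal_table).
-export([outcome/2]).

%% Returns {died, Reason}, {mailbox, Msg} or continued.
outcome(TrapExit, Signal) ->
    Parent = self(),
    Target = spawn(fun() -> target(Parent, TrapExit) end),
    receive {ready, Target} -> ok end,
    Ref = erlang:monitor(process, Target),
    exit(Target, Signal),  % the exit signal under test
    Target ! ping,         % delivered after the signal (same sender)
    receive
        {'DOWN', Ref, process, Target, Reason} ->
            {died, Reason};                   % a "Die" row
        {got, Target, ping} ->
            erlang:demonitor(Ref, [flush]),
            continued;                        % signal vanished
        {got, Target, Msg} ->
            erlang:demonitor(Ref, [flush]),
            {mailbox, Msg}                    % signal became an 'EXIT' message
    end.

target(Parent, TrapExit) ->
    process_flag(trap_exit, TrapExit),
    Parent ! {ready, self()},
    %% Report the first thing that reaches the mailbox.
    receive Msg -> Parent ! {got, self(), Msg} end.
```

For example, outcome(false, normal) returns continued, outcome(true, kill) returns {died, killed}, and outcome(true, boom) returns {mailbox, {'EXIT', Parent, boom}}, where Parent is the calling process.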
Monitors
Monitors are an alternative to links. A process Pid1 can create a monitor for Pid2 by calling the BIF erlang:monitor(process, Pid2), which returns a reference Ref.
If Pid2 terminates with exit reason Reason, the following 'DOWN' message is sent to Pid1:
    {'DOWN', Ref, process, Pid2, Reason}
If Pid2 does not exist, the 'DOWN' message is sent immediately with Reason set to noproc.
Monitors are unidirectional: only the monitoring process is notified, and the monitored process is not affected. Repeated calls to erlang:monitor(process, Pid) create several independent monitors, and each one sends its own 'DOWN' message when Pid terminates.
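A minimal sketch of the 'DOWN' behaviour described above, assuming a single local node. The module name monitor_demo and the reason stopped are hypothetical. down_reasons/0 places two monitors on the same process and collects one 'DOWN' message per reference. noproc/0 monitors a process that has already terminated.

```erlang
-module(monitor_demo).
-export([down_reasons/0, noproc/0]).

%% Two monitors on one process: two distinct refs, two 'DOWN' messages.
down_reasons() ->
    Pid = spawn(fun() -> receive stop -> exit(stopped) end end),
    Ref1 = erlang:monitor(process, Pid),
    Ref2 = erlang:monitor(process, Pid),
    Pid ! stop,
    R1 = receive {'DOWN', Ref1, process, Pid, Why1} -> Why1 end,
    R2 = receive {'DOWN', Ref2, process, Pid, Why2} -> Why2 end,
    {Ref1 =/= Ref2, R1, R2}.

%% Monitoring a process that no longer exists yields 'DOWN' with noproc.
noproc() ->
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', Ref, process, Pid, normal} -> ok end,  % Pid is now dead
    Ref2 = erlang:monitor(process, Pid),
    receive {'DOWN', Ref2, process, Pid, Reason} -> Reason end.
```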
A monitor can be removed by calling erlang:demonitor(Ref). Monitors can also be created for processes identified by a registered name, including processes at other nodes, by passing RegName or {RegName, Node} instead of a pid.
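Here is a sketch of monitoring by registered name and removing a monitor, assuming a single local node. The name demo_server is hypothetical. erlang:demonitor/2 with the flush option also discards any 'DOWN' message already queued for that reference. On current releases, a monitor created by name reports the monitored object as {RegName, Node} in its 'DOWN' message, not as a pid.

```erlang
-module(demonitor_demo).
-export([run/0]).

%% Returns {{Object, Reason}, StaleDownArrived}.
run() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    true = register(demo_server, Pid),           % hypothetical name
    %% Monitor by name, then remove the monitor again.
    Ref1 = erlang:monitor(process, demo_server),
    true = erlang:demonitor(Ref1, [flush]),
    %% A second monitor by name is left in place.
    Ref2 = erlang:monitor(process, demo_server),
    Pid ! stop,
    Down = receive {'DOWN', Ref2, process, Obj, Reason} -> {Obj, Reason} end,
    %% No 'DOWN' message ever arrives for the removed monitor.
    Stale = receive {'DOWN', Ref1, _, _, _} -> true after 0 -> false end,
    {Down, Stale}.
```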
Hidden Nodes
In a distributed Erlang system, it is sometimes useful to connect to a node without also connecting to all other nodes. An example is an operation and maintenance (O&M) tool used to inspect the status of a system without disturbing it. A hidden node can be used for this purpose.
A hidden node is a node started with the command-line flag -hidden. Connections between hidden nodes and other nodes are not transitive; they must be set up explicitly. Also, hidden nodes do not show up in the list of nodes returned by nodes(); nodes(hidden) or nodes(connected) must be used instead. This means, for example, that a hidden node is not added to the set of nodes that global keeps track of.
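The three views of connected nodes can be compared with a small helper such as the one below. The module name node_view and the node name probe in the comment are hypothetical. On a node that is not distributed, all three lists are simply empty.

```erlang
-module(node_view).
-export([summary/0]).

%% A hidden node might be started with, for example:
%%   erl -sname probe -hidden
summary() ->
    #{visible   => nodes(),           % same as nodes(visible); hidden nodes excluded
      hidden    => nodes(hidden),     % hidden nodes (C nodes, for example)
      connected => nodes(connected)}. % all connections, visible and hidden
```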
C Nodes
A C node is a C program written to act as a hidden node in a distributed Erlang system. The library Erl_Interface contains functions for this purpose.
epmd
The Erlang Port Mapper Daemon, epmd, is started automatically on every host where an Erlang node is started. It is responsible for mapping symbolic node names to machine addresses (in practice, the port on which each node on that host listens for distribution connections).