<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:10.5pt;
font-family:Consolas;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:Consolas;}
.MsoChpDefault
{mso-style-type:export-only;}
@page Section1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=SV link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>Hi,
as you have discovered, there are a few different restart strategies available
when designing a supervisor (e.g. one_for_one, one_for_all). One can of course come
up with a more or less infinite number of such strategies, each one with its
own twist.<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>The
main idea, and the problem to solve at the time the supervisor behaviour was designed,
was that you have a set of more or less permanent processes that implement a
subsystem. There should not be any ‘illegal’ terminations (that is,
terminations that cause the supervisor to act) amongst the children. But, as we know, no
non-trivial system is completely correct, so an occasional failure and the
restart that follows must be allowed. Repeated failures, however, may indicate
that the problem concerns more than this subsystem, hence the need to
eventually escalate the restarts (or, as you have discovered, have the supervisor kill all
its children and itself).<o:p></o:p></span></p>
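<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>As
a minimal sketch of such a subsystem supervisor (the module names subsystem_sup and
my_worker and the values 3 and 60 are assumptions made up for illustration, not
taken from your system), the restart limit and the sliding window are the two
numbers that follow the strategy in the return value of init/1:<o:p></o:p></span></p>
<pre>
-module(subsystem_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() -&gt;
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) -&gt;
    %% Allow at most MaxR restarts within any window of MaxT seconds.
    %% One restart more than that inside the window makes this supervisor
    %% terminate all its children and then itself, which its own
    %% supervisor in turn sees as a failed child (the escalation).
    MaxR = 3,    % illustrative value
    MaxT = 60,   % illustrative value, in seconds
    Worker = {my_worker,                     % hypothetical child id
              {my_worker, start_link, []},   % hypothetical worker module
              permanent, 5000, worker, [my_worker]},
    {ok, {{one_for_one, MaxR, MaxT}, [Worker]}}.
</pre>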
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>If
you really want to achieve a situation where failures never escalate above a
certain supervisor, you can bump up the max-restart threshold and at the same
time shorten the sliding window. It is a not uncommon mistake (in normal usage
of supervisors :-) ) to have too high a max-restart intensity in combination with
too short a sliding window at a higher-level supervisor. The time between
escalation attempts by a lower-level supervisor for the same error may then be
longer than that window, so the higher-level supervisor does not consider two
failures amongst its children to be the same error, and therefore never
escalates to its own superior supervisor.<o:p></o:p></span></p>
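<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>A
rough sketch of the "high threshold, short window" setting could look like this.
The module names lenient_sup and my_worker are hypothetical, and the values 1000
and 1 are arbitrary examples rather than recommendations:<o:p></o:p></span></p>
<pre>
-module(lenient_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() -&gt;
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) -&gt;
    %% High restart threshold, short sliding window: this supervisor
    %% only gives up if more than 1000 restarts occur within 1 second.
    MaxR = 1000, % arbitrary example value
    MaxT = 1,    % arbitrary example value, in seconds
    Worker = {my_worker,                     % hypothetical child id
              {my_worker, start_link, []},   % hypothetical worker module
              permanent, 5000, worker, [my_worker]},
    {ok, {{one_for_one, MaxR, MaxT}, [Worker]}}.
</pre>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>With
values like these the supervisor will in practice keep restarting a failing
child without taking the healthy ones down, although a child that crashes
immediately on every start could in principle still reach even this limit.<o:p></o:p></span></p>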
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'><o:p> </o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>Best
Regards<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Courier New"'>Lennart<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Calibri","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoPlainText><span lang=EN-GB>-------------------------------------------------------------<o:p></o:p></span></p>
<p class=MsoPlainText><span lang=EN-GB>Lennart Öhman direct
: +46 8 587 623 27<o:p></o:p></span></p>
<p class=MsoPlainText><span lang=EN-GB>Sjöland & Thyselius Telecom AB
cellular: +46 70 552 6735<o:p></o:p></span></p>
<p class=MsoPlainText>Hälsingegatan 43, 10th floor fax : +46 8 667 82 30<o:p></o:p></p>
<p class=MsoPlainText>SE-113 31 STOCKHOLM, SWEDEN email :
lennart.ohman@st.se<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoNormal><span lang=EN-GB style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'>
<p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:
"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;
font-family:"Tahoma","sans-serif"'> erlang-questions-bounces@erlang.org
[mailto:erlang-questions-bounces@erlang.org] <b>On Behalf Of </b>steve ellis<br>
<b>Sent:</b> 20 March 2009 20:42<br>
<b>To:</b> erlang-questions@erlang.org<br>
<b>Subject:</b> [erlang-questions] to supervise or not to supervise<o:p></o:p></span></p>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal style='margin-bottom:12.0pt'>New to supervision trees and
trying to figure out when to use them (and when not to)...<br>
<br>
I have a bunch of spawned processes created through spawn_link. We want these
processes to stay running indefinitely. If one exits in an error state, we want
to restart it N times. After N, we want to error log it, and stop trying to
restart it. Perfect job for a one_for_one supervisor, right?<br>
<br>
Well sort of. The problem is that when the max restarts for the error process
is reached, the supervisor terminates all its children and itself. Ouch! (At
least in our case). We'd rather that the supervisor just keep supervising all
the children that are ok and not swallow everything up.<br>
<br>
The Design Principles appear to be saying that swallowing everything up is what
supervisors are supposed to do when max restarts is reached, which leaves me a
little puzzled. Why would you want to kill the supervisor just because a child
process is causing trouble? Seems a little harsh.<br>
<br>
Is this a case of me thinking supervisors are good for too many things? Is it
that our case is better handled by simply spawning these processes and trapping
exits on them, and restarting/error logging in the trap exit?<br>
<br>
Thanks!<br>
<br>
Steve<br>
<br>
<o:p></o:p></p>
</div>
</body>
</html>