I just realized that the process that spawns my standalone supervisors would be linked by default to the supervisors through its call to start_link to start the supervisors in the first place. So when a supervisor dies because it has reached its max restarts, the calling gen_server process gets an exit signal, and as long as it is trapping exits, that arrives in its handle_info callback as {'EXIT', DeadSupervisorPid, shutdown} (the reached_max_restart_intensity detail shows up in the SASL supervisor report rather than in the exit reason). This is basic error handling stuff and it is where I would write my code to do something with the error.
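Roughly what I'm picturing, as an untested sketch (my_sup stands in for the real standalone supervisor, and the error_msg calls are just placeholders):

-module(sup_watcher).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Without trap_exit the supervisor's death would kill this process
    %% instead of arriving as a message.
    process_flag(trap_exit, true),
    {ok, SupPid} = my_sup:start_link(),
    {ok, SupPid}.

%% The supervisor gave up after reaching its max restart intensity.
handle_info({'EXIT', SupPid, shutdown}, SupPid) ->
    error_logger:error_msg("supervisor ~p reached max restarts~n", [SupPid]),
    %% react here: alert someone, start a fresh supervisor, etc.
    {noreply, SupPid};
%% Some other linked process died.
handle_info({'EXIT', Pid, Reason}, State) ->
    error_logger:error_msg("~p exited with reason ~p~n", [Pid, Reason]),
    {noreply, State};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.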
And now as I read the docs on handle_info/2, I see that it is where all other messages (including these exit signals) get delivered, which seems to answer my other question.

So I think I'm on the right track. Please someone let me know if I'm missing something. Thanks!
Steve

On Sun, Mar 22, 2009 at 10:58 AM, steve ellis <steve.e.123@gmail.com> wrote:
Thanks Lennart and Mihai! Very helpful information. Lennart, it's good to know about the intent behind supervisor's original design.

I like Mihai's suggestion of having one supervisor supervise each process. This would get us most of the way there, and it would be easy to implement.
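If I understand the suggestion right, the setup would look roughly like this (untested sketch; the module names, the worker ids, and the restart limits are all made up):

%% One big supervisor whose children are per-worker supervisors.
-module(top_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% transient: not restarted when it terminates with reason 'shutdown',
    %% which is what happens when it gives up on its worker.
    ChildSpecs = [{{worker_sup, Id},
                   {worker_sup, start_link, [Id]},
                   transient, infinity, supervisor, [worker_sup]}
                  || Id <- [1, 2, 3]],
    {ok, {{one_for_one, 5, 10}, ChildSpecs}}.

%% One dedicated supervisor per worker (in its own file): N restarts,
%% then only this branch dies; the siblings keep running under top_sup.
-module(worker_sup).
-behaviour(supervisor).
-export([start_link/1, init/1]).

start_link(Id) ->
    supervisor:start_link(?MODULE, Id).

init(Id) ->
    Child = {my_worker, {my_worker, start_link, [Id]},
             permanent, 5000, worker, [my_worker]},
    %% at most 5 restarts within 60 seconds, then give up
    {ok, {{one_for_one, 5, 60}, [Child]}}.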
But is there any way in OTP to see when a supervisor reaches its max restarts? I know this is logged by the SASL error logger. But how would I trap/detect this event in my code to do something with it?

It doesn't look like supervisor has a function like gen_server's handy terminate/2.
Maybe it would make more sense in our case to have one gen_server process monitor a child gen_server process. The child could call a function in the parent when it terminates. This way we'd have access to the terminate function of the monitoring/supervising gen_server. The problem with this, though, is that we'd have to implement our own restart strategy, which is exactly the part supervisor is so good at.
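A rough, untested sketch of what I mean, using a monitor rather than having the child call back into the parent (my_worker and the limit of 5 are made up):

-module(diy_watcher).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-define(MAX_RESTARTS, 5).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, start_worker(0)}.

%% Start the worker unlinked and monitor it; state is {Pid, Ref, RestartCount}.
start_worker(Count) ->
    {ok, Pid} = my_worker:start(),
    Ref = erlang:monitor(process, Pid),
    {Pid, Ref, Count}.

handle_info({'DOWN', Ref, process, Pid, normal}, {Pid, Ref, _Count}) ->
    {stop, normal, undefined};
handle_info({'DOWN', Ref, process, Pid, Reason}, {Pid, Ref, Count})
  when Count < ?MAX_RESTARTS ->
    error_logger:warning_msg("worker ~p died (~p), restarting~n", [Pid, Reason]),
    {noreply, start_worker(Count + 1)};
handle_info({'DOWN', Ref, process, Pid, Reason}, {Pid, Ref, _Count}) ->
    error_logger:error_msg("worker ~p died (~p), giving up~n", [Pid, Reason]),
    {stop, normal, undefined};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.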
This might be related to something more general that I've been wondering about (which I should post as a question in a new thread): how to tap into the SASL error logger so my system can do something with those events. For example, I'd like to send these events to another machine via TCP.
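For the TCP part, I imagine something like a gen_event handler added to error_logger, along these lines (untested; the module name, host, and port are invented):

-module(tcp_log_forwarder).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

init({Host, Port}) ->
    {ok, Sock} = gen_tcp:connect(Host, Port, [binary]),
    {ok, Sock}.

%% Every error_logger event passes through here, including the SASL
%% reports, e.g. {error_report, GL, {Pid, supervisor_report, Report}}.
handle_event(Event, Sock) ->
    gen_tcp:send(Sock, io_lib:format("~p.~n", [Event])),
    {ok, Sock}.

handle_call(_Req, Sock) -> {ok, ok, Sock}.
handle_info(_Msg, Sock) -> {ok, Sock}.
terminate(_Reason, Sock) -> gen_tcp:close(Sock).
code_change(_OldVsn, Sock, _Extra) -> {ok, Sock}.

It would be installed with error_logger:add_report_handler(tcp_log_forwarder, {"somehost", 5555}).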
Thanks!

Steve

On Fri, Mar 20, 2009 at 5:29 PM, Mihai Balea <mihai@hates.ms> wrote:
On Mar 20, 2009, at 3:42 PM, steve ellis wrote:
New to supervision trees and trying to figure out when to use them (and when not to)...

I have a bunch of spawned processes created through spawn_link. We want these processes to stay running indefinitely. If one exits in an error state, we want to restart it N times. After N, we want to error log it and stop trying to restart it. Perfect job for a one_for_one supervisor, right?

Well, sort of. The problem is that when the max restarts for the failing process is reached, the supervisor terminates all its children and itself. Ouch! (At least in our case.) We'd rather that the supervisor just keep supervising all the children that are OK and not swallow everything up.

The Design Principles appear to be saying that swallowing everything up is what supervisors are supposed to do when max restarts is reached, which leaves me a little puzzled. Why would you want to kill the supervisor just because one child process is causing trouble? Seems a little harsh.

Is this a case of me thinking supervisors are good for too many things? Is our case better handled by simply spawning these processes, trapping exits on them, and restarting/error logging in the trap exit handler?
As far as I know, the standard supervisor cannot behave the way you want it to.

So, at least until this type of behavior is added to the standard supervisor, you can work around it with double layers of supervision. Basically, have one dedicated supervisor for each process you want to supervise, and in turn set up each dedicated supervisor as a transient child of one big supervisor.
Mihai