<div dir="ltr">Although I agree that the "normal message triggers a marker" algorithm solves the problem perfectly, I would also like to add a method to the table that could give a reasonably good approximation of the message queue delay without sending any marker messages. Just for the fun of it.<div><br></div><div>Once again, our process has two states: doing a measurement and not doing a measurement. When it receives a message while not doing a measurement it shall:</div><div><ul><li>Save the current length of its message queue (N)</li><li>Save the current time (T1)</li><li>Switch to doing a measurement state</li></ul><div>While doing a measurement simply count the number of messages received until we reach N messages. At this point save the current time (T2), and return to not doing a measurement mode.</div></div><div><br></div><div>T2 - T1 is a <i>lower limit</i> on the message queue delay (for this N-th message), because the message we are processing at T2 was already in the queue at T1. The actual delay may be somewhat larger, since the message entered the queue at some T0 < T1 time.</div><div><br></div><div>We can also get an <i>upper limit</i> on the message queue delay if we'd check the time (T3) at the processing of the N+1:th message: message N+1 was not yet in the message queue at T1 and got out of the queue at T3, so spent at most T3 - T1 time waiting.</div><div><br></div><div>One more thing to note: T3 = T2 + the time spent processing the N-th message + the time spent between handling two messages (which is: gen_server overhead, waiting for the next message to arrive and waiting for the CPU to run). Since gen_server overhead is low and we can assume there was no time spent waiting for the next message (that would mean the queue is empty, and the delay is 0), we can approximate T3 as T2 + time spent processing the N-th message (the error is the time spent waiting for the CPU).</div><div><br></div><div>So when we finish processing the N-th message we will have a definite lower limit and a quite accurate upper limit for the message queue delay. They are not measured for the same message though, but as long as there are no sudden jumps or drops in the message queue delay it doesn't really matter.</div><div><br></div><div>It's up to you whether you want to report both the upper and lower limits as metrics or calculate a single value from them (e.g. take either of them, calculate the average or whatever).</div><div><br></div><div>In case of gen_servers one possible caveat is that there may be system messages in the queue as well which are not delivered to your callback module. This means your message counting will be off by one. But system messages are extremely rare and unless the process is otherwise idle they won't cause a significant measurement error.</div><div><br></div><div>Cheers,</div><div>Daniel</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-04-29 17:50 GMT+02:00 Jesper Louis Andersen <span dir="ltr"><<a href="mailto:jesper.louis.andersen@gmail.com" target="_blank">jesper.louis.andersen@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Wed, Apr 29, 2015 at 11:12 AM, Roger Lipscombe <span dir="ltr"><<a href="mailto:roger@differentpla.net" target="_blank">roger@differentpla.net</a>></span> wrote:<br></span><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="overflow:hidden">I thought of posting a message to self(), with a timestamp and then<br>
measuring the delta. I could then post another one as soon as that<br>
message is processed. Obviously, this leaves the process continually<br>
looping, which is not good.<br></div></blockquote></span></div><br>I don't think this should be thrown out. It is salvagable, I think, and here is how:</div><div class="gmail_extra"><br></div><div class="gmail_extra">We update our gen_server with a new element in the state record:</div><div class="gmail_extra"><br></div><div class="gmail_extra">-record(state, { ..., marker_active = false }).</div><div class="gmail_extra"><br></div><div class="gmail_extra">Whenever we get a normal message, we check this state:</div><div class="gmail_extra"><br></div><div class="gmail_extra">handle_call(Msg, From, #state { marker_active = false } = State) -></div><div class="gmail_extra"> self() ! {marker, erlang:monotonic_time()},</div><div class="gmail_extra"> handle_call(Msg, From, State#state { marker_active = true });</div><div class="gmail_extra">....</div><div class="gmail_extra"><br>Of course handle_cast/2 and handle_info/2 needs the same treatment. This means whenever there is flow in the mailbox, we enable a marker with a timestamp.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Now, we handle the marker:</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">handle_info({marker, Old}, State) -></div><div class="gmail_extra"> Diff = erlang:monotonic_time() - Old,</div><div class="gmail_extra"> report_to_hdrhistogram(erlang:convert_time_unit(Diff, native, milli_seconds)),</div><div class="gmail_extra"> {noreply, State#state { marker_active = false }}</div><div class="gmail_extra">...</div><div><br></div></div><div class="gmail_extra">This means once we get a marker, we process its queue sojourn latency, but we avoid reinjecting a new marker. The next processed message will do that.</div><div class="gmail_extra"><br></div><div class="gmail_extra">In turn, if you have 1 message going in, there will be exactly one marker. But if other processes manages to enter 100 messages between a marker and the next one, then we have one marker per 100 messages. The process will never spin if there is nothing to process. An alternative is to add some sampling interval delay:</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">handle_info({marker, Old}, State) -></div><div class="gmail_extra"> Diff = erlang:monotonic_time() - Old,</div><div class="gmail_extra"> report_to_hdrhistogram(erlang:convert_time_unit(Diff, native, milli_seconds)),</div><div class="gmail_extra"> timer:send_after(50, self(), rearm_marker),</div><div class="gmail_extra"> {noreply, State};</div><div class="gmail_extra">handle_info(rearm_marker, State) -></div><div class="gmail_extra"> {noreply, State#state { marker_active = false }};</div><div class="gmail_extra">...</div><div class="gmail_extra"><br></div><div class="gmail_extra">which closes off the problem where you have markers all over the place all the time. </div><div><br></div><div>The essential idea is to use the arrival of a "normal" message as a trigger for invoking a time sojourn marker. And never have the marker itself reinject a marker so it can't cycle. In turn, there has to be a flux through your mailbox queue for the markers to get added, and if there is no flux, no marker will ever get added. The really astute reader will prove my ramblings :)</div><span class="HOEnZb"><font color="#888888"><div><br></div>-- <br><div>J.</div>
</font></span></div></div>
<br>_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
<br></blockquote></div><br></div>