[erlang-questions] How to do this in Erlang -- any suggestions ?

Edmond Begumisa <>
Wed Jun 15 02:02:01 CEST 2011


Been thinking about this some more...

First you need to detect/gather the events, then you need a way for your  
actors to subscribe to the events, then you need a way for the actors to  
filter and process the gathered events.

Some possibilities:

1. DETECTING/ACCUMULATING EVENTS

I think you have two choices here: send messages or trace messages.  
Sending messages to whatever process is collecting them is easy so I won't  
bother with an example. But it means you have to hard-code the events in  
advance, which might be restrictive for your purposes. Personally, I think  
you'd have better luck with tracing (as long as you limit tracing to  
a few processes - like your supervisors).
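For comparison, the hard-coded message-sending version might look  
something like the sketch below -- the module name and report/1 function  
are made up for illustration, not a real API:

```erlang
%% Hypothetical sketch of the message-sending alternative: workers
%% report events explicitly to a registered collector process.
-module(event_collector).
-export([start/0, report/1, loop/1]).

start() ->
    Pid = spawn(?MODULE, loop, [[]]),
    register(?MODULE, Pid),
    Pid.

%% Workers must call this at every point of interest -- i.e. the
%% events are hard-coded in advance, which is the restriction above.
report(Event) ->
    ?MODULE ! {event, erlang:now(), Event}.

loop(Acc) ->
    receive
        {event, Time, Event} ->
            loop([{Time, Event} | Acc])
    end.
```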

Using the previous "network_down" example, I think all you'd have to do is  
trace messages received by supervisor Y, then filter those for 'EXIT'  
messages (NOTE: I haven't tested this AT ALL, this is purely thinking  
aloud)...

=== detector.erl ===

start_collecting(Node, MaxMicroSecs, MaxMsgCnt) ->
     {ok, Tracer} = dbg:tracer(Node, process,
                               {fun handle_trace/2,
                                %% Accumulate messages you're interested in
                                {MaxMicroSecs, MaxMsgCnt, null, []}}),
     {ok, _} = dbg:p(sup_y, [r]), % Trace messages received by supervisor Y
     Tracer.

handle_trace({trace, _, 'receive', {'EXIT',_,_}} = Trace,
              {MaxMicroSecs, MaxMsgCnt, null, []})
     ->
     handle_trace(Trace, {MaxMicroSecs, MaxMsgCnt, erlang:now(), []});
handle_trace({trace, _, 'receive', {'EXIT',_,_} = Msg},
              {MaxMicroSecs, MaxMsgCnt, LastTime0, Acc0})
     ->
     HeadTime = erlang:now(),
     Acc1 = [{HeadTime, Msg} | Acc0],
     case length(Acc1) =:= MaxMsgCnt of
         true ->
             collector:report_events(Acc1),
             {MaxMicroSecs, MaxMsgCnt, null, []};
         false ->
             case timer:now_diff(HeadTime, LastTime0) > MaxMicroSecs of
                 true ->
                     %% Drop entries that have fallen out of the time window
                     Acc2 = lists:dropwhile(
                                 fun ({T, _}) ->
                                     timer:now_diff(HeadTime, T) > MaxMicroSecs
                                 end, lists:reverse(Acc1)),
                     [{LastTime1, _} | _] = Acc2,
                     {MaxMicroSecs, MaxMsgCnt, LastTime1, lists:reverse(Acc2)};
                 false ->
                     {MaxMicroSecs, MaxMsgCnt, LastTime0, Acc1}
             end
     end;
handle_trace(_, Data) ->
     Data.

===

Now, if you called...

> detector:start_collecting(Node, 1000000, 30).

... collector:report_events/1 should be called with a list of 30 {Time,  
ExitMessage} tuples detected over a period of 1 second (I hope.)

I suppose you could achieve the same by using SASL with a custom event  
handler instead, then responding to supervisor reports, but the advantage  
of the above method is that it could easily be modified to handle messages  
other than 'EXIT' messages.

2. SUBSCRIBERS:

You'll require an easy way for many actors to respond to the same events  
and to add/remove themselves as possible handlers. I see two shortcuts for  
this: gen_event or et_collector.

For gen_event, just follow the documentation -- collector.erl would be a  
gen_event behaviour and collector:report_events/1 would call  
gen_event:notify(Ref, L) where L is the {Time, ExitMessage} list already  
grouped/accumulated by detector.erl. Then your actors call  
gen_event:add_handler, etc, etc.
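A minimal sketch of that wiring might look like the following (module and  
handler names are illustrative; the callbacks are the standard gen_event  
ones):

```erlang
%% === collector.erl === thin wrapper around a gen_event manager
-module(collector).
-export([start_link/0, report_events/1]).

start_link() ->
    gen_event:start_link({local, ?MODULE}).

%% Called by detector.erl with the grouped {Time, ExitMessage} list.
report_events(L) ->
    gen_event:notify(?MODULE, {exit_seq, L}).

%% === my_actor.erl === one subscriber; added with
%%   gen_event:add_handler(collector, my_actor, []).
-module(my_actor).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

init(Args) -> {ok, Args}.

%% Filter/process the grouped events here.
handle_event({exit_seq, L}, State) ->
    io:format("got ~p exit events~n", [length(L)]),
    {ok, State};
handle_event(_, State) ->
    {ok, State}.

handle_call(_Req, State)  -> {ok, ok, State}.
handle_info(_Info, State) -> {ok, State}.
terminate(_Arg, _State)   -> ok.
code_change(_Old, State, _Extra) -> {ok, State}.
```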

However, I see advantages to trying out et_collector instead. It will take  
some hacking, but I think it would be worth it. It has its own  
dictionary service and a built-in way of filtering events. Not to mention  
a nice GUI (et_viewer) that you can use later to view events post-mortem.  
(A quick look at its source reveals that you'd probably end up rewriting a  
lot of the same code yourself anyway.) From the (sparse) documentation...

"...The Collector is a generic full-fledged framework that allows  
processes to 'subscribe' to the Events that it collects. One Collector can  
serve several Viewers... The architecture does also allow you to implement  
your own Viewer program as long as it complies to the Collector/Viewer  
protocol..."

So your actors would be non-visual viewers created from a module written  
by you, based on code inspired by et_viewer.erl. Then  
collector:report_events/1 could look like...

report_events(L) ->
    %% et:report_event/5 is (DetailLevel, From, To, Label, Contents);
    %% the collector picks these up via its trace pattern.
    et:report_event(99, sup_y, alarm, exit_seq, L).

Then all subscribers would be notified and could do their thing. Best of  
all, you can later use the standard et_viewer to see...

    sup_y        alarm
     |   exit_seq  |
     |------>------|
     |             |

Clicking on exit_seq would display {Time, ExitMsg} list L.

3. PROCESSING:

With gen_event, it would be a matter of applying whatever statistics you  
want to apply to the {Time, ExitMsg} list L passed to each handler, then  
performing your automatic actions based on the stats. Erlang being a  
soft-real-time system, I think you'd usually get to respond in time even  
though messages would arrive asynchronously.
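For instance, the earlier "network_down" rule (30 'EXIT's in 1 sec, 60%  
of them containing {tcp_error, _}) might reduce to something like this  
inside a handler -- an untested sketch, using alarm_handler:set_alarm/1  
(the SASL alarm API) and the hypothetical sup_y name from above:

```erlang
%% Untested sketch: L is the {Time, ExitMessage} list a handler receives.
%% An 'EXIT' message is {'EXIT', Pid, Reason}, and per the earlier rule
%% we count {tcp_error, _} reasons.
check_network_down([]) ->
    ok;
check_network_down(L) ->
    Total   = length(L),
    TcpErrs = length([ok || {_Time, {'EXIT', _, {tcp_error, _}}} <- L]),
    case TcpErrs / Total >= 0.6 of
        %% your alarm handler can then send the SMS
        true  -> alarm_handler:set_alarm({network_down, sup_y});
        false -> ok
    end.
```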

With et, it would be the same except you'd perform the stats and automated  
actions within the subscribers.

What's left is finding a way of doing the stat processing and auto actions  
based on a set of instructions coming from non-programmers. I honestly  
have ZERO suggestions there :(

- Edmond -


On Wed, 15 Jun 2011 03:06:58 +1000, Banibrata Dutta  
<> wrote:

> On Tue, Jun 14, 2011 at 10:09 PM, Edmond Begumisa <
> > wrote:
>
>> The more I think about it, the more I think Banibrata is really onto
>> something here!
>>
>
> :-) , well, the more I read about CEP (Complex Event Processing), the  
> more I
> feel, people have been onto this thing for quite a while now. The  
> use-cases
> all seem very real.
>
>
>> A generic tool of this nature could be rather useful. If such a tool
>> allowed me to say things like...
>>
>> "If 30 'EXIT' messages appear in supervision subtree Y over a period of
>> 1sec and 60% of these contain {tcp_error, _} then call
>> sasl:set_alarm({'network_down', Y})" in which case an SMS would be sent  
>> to
>> my phone.
>>
>
> Precisely. It is extremely close to what I was thinking.
>
>>
>> I can think of a million uses for such a tool :)
>>
>
> Indeed. This is the reason, the more I think of FSM (in gen_fsm terms,  
> for
> instance), the more I get the feeling of getting into something that is
> "static", something that is embedded deep within an application. Mihai  
> here,
> mentioned DSL to represent the patterns & actions, and EPL (Event  
> Processing
> Language) in the CEP terminology, appears to be just that. Thanks to  
> Darach
> for recommending a really interesting book as well.
>
> And to answer your questions from the previous mail -- by real-time I  
> mean
> soft real-time, or more like nearly synchronous processing, and not an
> offline/batch processing for post-facto analysis. Also all this  
> processing,
> and looking into to time window to correlate events, needs to be done by  
> the
> computer. No human looking here... well, other than to fix things if they
> are broken.
>
> On Tue, 14 Jun 2011 06:13:56 +1000, Mihai Balea <> wrote:
>
>
>> On Jun 13, 2011, at 9:12 AM, Banibrata Dutta wrote:
>>
>>
>>> Well, I haven't. In fact, starting with a small set of possible  
>>> patterns,
>>> seems to be the most intuitive approach. When I bring in the thought of
>>> permitting non-programmers to write "something" (e.g. natural language
>>> rules, or a simple DSL snippet), and then such a thing modifying the  
>>> FSM,
>>> without me having to code anything further in Erlang, is when I am  
>>> hitting a
>>> mental roadblock. I've coded fairly complex FSM's in C/C++ using the
>>> table-based approach. Does gen_fsm take a similar approach ? Will it  
>>> be a
>>> good, natural fit ? Maybe a rather naive question for people extremely
>>> familiar with Erlang, but I'm still very much on the learning curve's
>>> positive vector.
>>>
>>
>> Erlang gen_fsms are quite simple and standard, they define a bunch of
>> states and use pattern matching to select actions for each state/event
>> combination. Pretty much equivalent to a table-based fsm, only there's  
>> no
>> distinct table to speak of.
>>
>>
>>
>>> however you might want to make them a bit more flexible. Maybe have a
>>> bunch of processes running pattern recognizers, say one process per  
>>> pattern
>>> or class of patterns.
>>>
>>> My natural tendency was to think more in terms of 1 process per source,
>>> e.g. 1st event from a source causes an Erlang process to be created,  
>>> which
>>> then embodies the entire possible FSM. The main issue with pattern  
>>> based
>>> processes (if I understood it correctly), is determining the right  
>>> process
>>> to hand-off the event to, and once handed-off, backtracking would mean  
>>> extra
>>> work.
>>>
>>> Maybe you can filter events based on source so certain recognizers will
>>> only get certain events.
>>>
>>> Yes, that was the starting point, as I had imagined.
>>>
>>
>> Event forwarding based on source is technically easy in Erlang, just  
>> use a
>> process registry (proc, gproc, even the built in process registrar) and  
>> use
>> use
>> sources as keys.
>> The problem with this approach is you might have at least some  
>> correlation
>> between sources. Events originating from different sources but  
>> belonging to
>> the same pattern (i.e. server going black due to switch failing in some
>> other room, due to power outage and the like).
>>
>>
>>> If you have a low correlation between event sources maybe you can even
>>> design your system to distribute the processing of unrelated events to
>>> different nodes in a cluster (big scalability gain if you can do that).
>>>
>>> Excellent thought... indeed. Source based sharding.
>>>
>>> Another way of approaching this would be to duplicate the event stream  
>>> to
>>> multiple nodes in your cluster and have each node only look for certain
>>> subsets of patterns.
>>>
>>> Actually, my bad, I should've mentioned, I wanted to do all this in
>>> Real-Time. So I don't really have a pattern-string, so to say. I don't
>>> know, without that, if I can really take this approach. Or can I ?
>>>
>>
>> I think you can. If your patterns are independent, or at least can be
>> grouped in independent categories, what would prevent you from assigning
>> sets of independent patterns to different nodes? Not saying you need to  
>> do
>> that, but it might be helpful if you find out that recognizing and
>> processing thousands of patterns take up too much cpu.
>>
>> Mihai
>>
>>


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


