[erlang-questions] Is it possible to add an filter property for the erlang process?

Fri Jun 18 00:38:19 CEST 2010

On Jun 17, 2010, at 4:29 PM, litao cheng wrote:

> hi, buddies.
> my question is if it's possible to add a "filter" property for the erlang
> process, so that the process can drop uninteresting messages. then we can
> reduce needless memory consumption.

The idea is at least 12 years old, so you are definitely
not the first person to think it might be useful.
The presentation of "abstract patterns" pointed out that
they are the ideal candidates for filtering; it's safe to
run them in the context of the sender, so it's possible
for unwanted messages to (sometimes) be dropped even before
they are copied.

However, we don't actually need such a mechanism, because we
can program it using another process.  Instead of

	A ------> B

we use

	A ------> Filter ------> B

where the identity of B is known only to the Filter process,
so no other process can send to it.

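%% Spawn Fun behind a filter.  The pid returned is the filter's;
%% the target's pid never leaves the filter process.
filtered_spawn(Fun, Predicate)
  when is_function(Fun, 0), is_function(Predicate, 1) ->
    spawn(fun () ->
        Pid = spawn_link(Fun),
        filtered_spawn_loop(Pid, Predicate)
    end).

%% Forward a message only when Predicate returns true; drop the rest.
filtered_spawn_loop(Pid, Predicate) ->
    receive Message ->
        case Predicate(Message) of
            true  -> Pid ! Message;
            false -> ok
        end,
        filtered_spawn_loop(Pid, Predicate)
    end.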
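
To show how such a filter might be used, here is a small sketch.
The module name filter_demo, the helper loop/2 and the function run/0
are made up for this illustration; the filter itself is copied in so
that the module compiles on its own.  It assumes a release with named
funs (OTP 17 or later).

```erlang
-module(filter_demo).
-export([run/0]).

%% Same idea as filtered_spawn/2, repeated so this module stands alone.
filtered_spawn(Fun, Predicate)
  when is_function(Fun, 0), is_function(Predicate, 1) ->
    spawn(fun () ->
        Pid = spawn_link(Fun),
        loop(Pid, Predicate)
    end).

loop(Pid, Predicate) ->
    receive Message ->
        case Predicate(Message) of
            true  -> Pid ! Message;
            false -> ok
        end,
        loop(Pid, Predicate)
    end.

run() ->
    Parent = self(),
    %% B (hypothetical) reports whatever reaches it back to the caller.
    B = fun Loop() ->
            receive Msg -> Parent ! {got, Msg}, Loop() end
        end,
    %% Only {foo, _} tuples are allowed through.
    Filter = filtered_spawn(B, fun ({foo, _}) -> true; (_) -> false end),
    Filter ! {bar, world},      % dropped by the filter
    Filter ! {foo, hello},      % forwarded to B
    Result = receive {got, Reply} -> Reply after 1000 -> timeout end,
    exit(Filter, kill),         % B is linked to the filter, so it dies too
    Result.
```

Calling filter_demo:run() should hand back {foo, hello}; the
{bar, world} message never reaches B.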

If you want process B to be able to change the Predicate,
it's not _enormously_ hard except that now B must know
Filter as well as Filter knowing B.

%% As before, but the target is first told who its filter is.
filtered_spawn(Fun, Predicate)
  when is_function(Fun, 0), is_function(Predicate, 1) ->
    spawn(fun () ->
        Pid = spawn_link(fun () -> filtered_spawn_target(Fun) end),
        Pid ! self(),
        filtered_spawn_loop(Pid, Predicate)
    end).

%% The first message the target sees is the filter's pid;
%% remember it in the process dictionary, then run Fun.
filtered_spawn_target(Fun) ->
    receive Filter ->
        put(my_filter_pid, Filter),
        Fun()
    end.

%% To be called by the target process (B) only.
set_filter(Predicate)
  when is_function(Predicate, 1) ->
    get(my_filter_pid) ! {set_filter, self(), Predicate}.

filtered_spawn_loop(Pid, Predicate) ->
    receive
        %% Pid is already bound, so only the target can change the filter.
        {set_filter, Pid, Predicate1} ->
            filtered_spawn_loop(Pid, Predicate1);
        Message ->
            case Predicate(Message) of
                true -> Pid ! Message;
                _    -> ok
            end,
            filtered_spawn_loop(Pid, Predicate)
    end.

This is of course UNTESTED CODE meant solely for the purpose of
illustrating an idea.

Concurrency ALWAYS seems to introduce nasty subtle points,
and there's one here.  There is in fact a difference between
filtering done inside the message sending machinery and
filtering done using a filter process.  The filter process
discards messages *only* when it is running; while it is not
scheduled, any number of messages could build up in its mailbox.
With filtering done inside the machinery, the filtering cannot
be delayed like that.
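
One rough way to see the build-up is to hold a filter process still and
look at its mailbox.  This is only a sketch: the module backlog_demo
and its functions are invented for the illustration, and
erlang:suspend_process/1 (documented as a debugging aid) is used here
merely as a stand-in for "the scheduler has not run the filter yet".

```erlang
-module(backlog_demo).
-export([run/0]).

%% A filter that rejects everything; nothing is forwarded,
%% so no target process is needed for this experiment.
drop_all() ->
    receive _ -> drop_all() end.

%% Number of messages currently waiting in Pid's mailbox.
queue_len(Pid) ->
    {message_queue_len, N} = erlang:process_info(Pid, message_queue_len),
    N.

run() ->
    Filter = spawn(fun drop_all/0),
    %% Assumption: suspending the process approximates it being unscheduled.
    true = erlang:suspend_process(Filter),
    lists:foreach(fun (I) -> Filter ! {junk, I} end, lists:seq(1, 1000)),
    Backlog = queue_len(Filter),    % unwanted messages still held in memory
    true = erlang:resume_process(Filter),
    exit(Filter, kill),
    Backlog.
```

The number returned is the count of unwanted messages that were
occupying memory even though every one of them was going to be dropped.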

I once read a science fiction story in Analog magazine
where the plot gimmick was that a machine had been developed
for restoring people's health by restoring the balance of
their fields (or some such bafflegab) rather than by means of
antibiotics &c.  There was a growing problem of "Box addicts";
people who felt sick if they didn't take a treatment with the
Box at least every day.  It turned out that some "extinct" disease
(smallpox? plague?) had been released from an archaeological dig,
and the "addicts" didn't have a psychological problem, they were
genuinely sick with a fatal disease that the Box could help them
survive with but not actually cure.

The point here is that a filtering process or filtering machinery
is just like the Box.  It helps a seriously sick system keep
running despite a bug that would otherwise have taken it down.
I can't help thinking that simply dropping bad messages and
stumbling on will *prevent* the necessary detection and repair of
the problem for longer than anyone would want.

The Erlang philosophy is not "keep going at all costs, including
sanity" but "let it crash".  We want bad messages to *crash*
something so that we find out.  (This is one of the things UBF
is for.)

Messages that are a *legal* part of the protocol but currently have
no effect really ought to be handled by explicit code that accepts
them and does nothing with them.  It can be hard enough for a
maintenance programmer to find out what a protocol is without moving
part of it elsewhere and hiding it.
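
For the logging case, that might look something like the sketch below.
Everything in it is assumed for the sake of illustration: the module
name log_sink, the {log, Level, Text} and {set_level, Level} message
shapes, integer levels where higher means more severe, and the Owner
process that accepted records are passed on to.

```erlang
-module(log_sink).
-export([start/2]).

%% Owner receives {logged, Level, Text} for every accepted record.
start(Owner, MinLevel) when is_pid(Owner), is_integer(MinLevel) ->
    spawn(fun () -> loop(Owner, MinLevel) end).

loop(Owner, MinLevel) ->
    receive
        {set_level, Level} when is_integer(Level) ->
            loop(Owner, Level);
        {log, Level, Text} when is_integer(Level), Level >= MinLevel ->
            Owner ! {logged, Level, Text},
            loop(Owner, MinLevel);
        {log, Level, _Text} when is_integer(Level) ->
            %% Legal, but below the current level: accept it and do nothing.
            loop(Owner, MinLevel);
        Other ->
            %% Not part of the protocol: let it crash so somebody finds out.
            exit({bad_message, Other})
    end.
```

The whole protocol, including the part that is deliberately ignored,
is then visible in one place.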

>
> like this:
> % spawn a process with filter : {foo, _}
> PidA = spawn(?MODULE, do_something, [], [{filter, {foo, _}}]),
>
> % the message {foo, hello} will be saved to PidA message queue,
> % because it match the filter
> PidA ! {foo, hello},
>
> % this message will be dropped
> PidA ! {bar, world},
>
> % reset the filter
> process_flag(filter, [_|_]),
>
> % this message will be stored in message queue
> PidA ! "hello world",
> ok.
>
> by this feature, I can write the code like this (in my logging library,
> which is similar to the python logging module):
> {ok, _} = logging:start_logger("logname"),
> Logger = logging:get_logger("logname"),
> Logger:set_level(?CRITICAL),
> Logger:debug("log"),
> Logger:warning("log"),
> Logger:critical("log"),
> ok
>
> logger is a parameterized module, logging is a gen_server process; in the
> logging module:
> handle_call({set_level, Level}, _From, State) ->
>    process_flag(filter, #log_record{level = MsgLevel} when MsgLevel > Level),
>    {reply, ok, State};
>
> in my logging library, I have two methods to resolve this problem:
> * dynamically compile the logger module, so that the debug, info, warning,
>   and error functions check whether the log is allowed
> * in the logging gen_server process's handle_xxx function, test whether the
>   log record is allowed
> both solutions have flaws.
>
> I want to know if this process filter feature is valuable?
> thanks