[erlang-questions] word filtering

Robert Virding
Wed Jun 6 22:14:35 CEST 2007


I did exactly this way back in the Bluetail days. The input was a file
containing all the words you wanted to detect, one per line. An AWK
script (or it could have been Erlang) generated a leex input file from
it, which was then compiled and run on the message. It was fast, but I
can't remember how fast.
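
To give an idea of the generation step, here is a rough sketch in Python.
It is not the original AWK script, which is not shown here, and the names
leex_escape, generate_xrl and the bad_word token tag are made up for
illustration. It assumes the usual leex file layout (Definitions, Rules,
Erlang code) and relies on lex-style longest-match, first-rule-wins
behaviour so that only whole words are reported.

```python
import re


def leex_escape(word):
    """Backslash-escape every character that is not an ASCII letter or digit,
    so the word is treated literally inside a leex regular expression."""
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", word)


def generate_xrl(words):
    """Return the text of a leex input (.xrl) file for the given word list.

    `words` is any iterable of strings, e.g. the lines of the word file.
    Blank entries and duplicates are dropped.
    """
    cleaned = sorted({w.strip() for w in words if w.strip()})
    if not cleaned:
        raise ValueError("word list is empty")
    alternation = "|".join(leex_escape(w) for w in cleaned)
    return "\n".join([
        "Definitions.",
        "",
        "Rules.",
        "",
        # A listed word becomes a token (hypothetical tag: bad_word).
        f"({alternation}) : {{token, {{bad_word, TokenLine, TokenChars}}}}.",
        # Any other run of letters/digits is skipped. Being a longer match,
        # it wins over a listed word that is only a prefix of it.
        "[a-zA-Z0-9]+ : skip_token.",
        # Everything else (spaces, punctuation, newlines) is skipped too.
        "(.|\\n) : skip_token.",
        "",
        "Erlang code.",
        "",
    ])
```

Writing the result of generate_xrl(open("words.txt")) to, say,
word_filter.xrl (both file names are placeholders) gives a file that leex
can turn into an Erlang module. Matching here is case-sensitive, so the
message would have to be lowercased first, or the generator extended.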

The funny thing about doing it this way is that, when the input words
changed, it was probably faster to regenerate the leex file and recompile
it than to keep the words in a smart database and update that.

If you want, I will see if I can find my old code, provided
Bluetail/Alteon/Nortel don't mind, though I doubt they know or care. :-)

Robert

ok wrote:
> On 5 Jun 2007, at 4:00 pm, shehan wrote:
>> I want to write a spam-detecting (word filtering) function. I already
>> know that regexp can be used for that, but it is just string comparison
>> and too slow for high-volume use (e.g. 500 text messages/sec). Can
>> somebody tell me whether there is any method in Erlang to filter words
>> faster than regexp?
> 
> There are regular expressions, and then again, there are regular
> expressions.  More precisely, there are various regular expression
> library modules for Erlang, which all build some kind of data
> structure which has to be interpreted at run time, but there is also
> Leex, an Erlang equivalent of lex/flex.  See
> http://trapexit.erlang-consulting.com/forum/viewtopic.php?p=20845&sid=3c7cc47cd5cb6a75d401d0e5694dfec9
> 
> What you get with Leex is Erlang source code which you can compile
> as usual (even to native code, using HiPE).  I would expect this to
> cope with 500 text messages per second.
> 
> There are other approaches.