[erlang-questions] [eeps] EEP 9

Vat Raghavan machinshin2002@REDACTED
Tue Mar 4 21:25:59 CET 2008

I really like this EEP, and I can't wait for it (or something quite similar :) ) to be part of OTP. At least in part,
it will mollify those who complain about Erlang's string manipulation support.

Though I think a better name for the module would be binary_string or something along those lines.

According to the EEP, the reference implementation was given to the OTP team along with the EEP. It seems to me
that, by the 'many eyes' theory, such an implementation should also be available to all, either at the EEP site (preferred) or at the author's website or what have you.

As to your question, Paul: the EEP suggests either Aho-Corasick or Boyer-Moore, so I think some profiling would be required before any implementation decision could be made. Even so, we're still in the API design phase at the moment, and I don't think the choice of implementation is very relevant yet.

I do like the suggestion about binary:match. I often find when I search strings that I want not only the index of the match, but also the string that was found:

-Maybe binary:match(<<"hello, world\n">>,[<<"xx">>,<<" ">>])
                should return {Needle, Index} (i.e. {<<" ">>, 7}) instead?
                or perhaps {Index, NeedleLength} i.e. {7, 1}?
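To make the two candidate return shapes concrete, here is a minimal sketch written against the binary module as it later shipped with OTP, where binary:match/2 returns {Start, Length} with offsets counted from zero (so the space is reported at 6, not 7). The module name match_demo and the helper match_with_needle/2 are made up for illustration and are not part of the EEP or of OTP.

```erlang
-module(match_demo).
-export([match_with_needle/2]).

%% Hypothetical helper (not part of the EEP or of OTP): wraps
%% binary:match/2 so the caller gets the matched needle back
%% together with its zero-based byte offset.
match_with_needle(Subject, Needles) ->
    case binary:match(Subject, Needles) of
        nomatch ->
            nomatch;
        {Start, _Length} = Part ->
            %% binary:part/2 extracts the matched bytes from Subject.
            {binary:part(Subject, Part), Start}
    end.
```

Since the needle can always be recovered from {Start, Length} with binary:part/2, the {Index, NeedleLength} shape loses no information, while {Needle, Index} saves the caller that extra step.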

Re: Unicode. Perhaps it would be better to have two separate libraries for ASCII vs. Unicode?
Also, how would the module handle different encodings (UTF-8, UTF-16, UTF-32, etc.)?
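
The encoding question matters because a byte-oriented search sees a different byte sequence for the same character in each encoding. A small sketch, assuming the unicode module from later OTP releases; the module name encoding_demo and the function encodings/1 are made up for illustration:

```erlang
-module(encoding_demo).
-export([encodings/1]).

%% Hypothetical helper: encodes a list of code points in three
%% encodings so the resulting byte sequences can be compared.
encodings(Chars) ->
    [{Enc, unicode:characters_to_binary(Chars, unicode, Enc)}
     || Enc <- [utf8, {utf16, big}, {utf32, big}]].
```

For the single code point 16#E5 this gives two bytes in UTF-8, two in UTF-16 and four in UTF-32, so both the needle bytes and the reported byte offsets depend on the encoding the caller chose.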


Without the hope that things will get better, that our inheritors will know a world that is fuller and richer than our own,
life is pointless, and evolution is vastly overrated
 -- Delenn

----- Original Message ----
From: Paul Fisher <pfisher@REDACTED>
To: erlang-questions <erlang-questions@REDACTED>
Sent: Tuesday, March 4, 2008 11:52:21 AM
Subject: Re: [erlang-questions] [eeps] EEP 9

On Tue, 2008-03-04 at 17:15 +0100, Raimo Niskanen wrote:
> EEP 9 is recognized by the EEP editor(s).
> http://www.erlang.org/eeps/

Overall, good show. This is progress in a very good direction. A
couple of questions:

1) Can the reference implementation be made available publicly as a
patch to R12B-1? (Or actually in any fashion would be great.)

2) Which algorithm was chosen for binary:match()? For multiple
keywords, Aho-Corasick would be great, especially if the interface was
something like this:

      MatchContext = binary:match_compile([<<"the">>, <<"big">>,
                                           <<"frog">>]),
      Value = <<"when we had a frog, he was big">>,
      [{3, 14}, {2, 27}] = binary:match(MatchContext, Value)

Where the result tuples were keyword # and byte offset.
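
For comparison, the same result shape can be sketched on top of the binary module as it later shipped with OTP, using binary:compile_pattern/1 and binary:matches/2. The module name multi_match_demo and the helpers keyword_matches/2 and keyword_number/2 are made up for illustration; binary:match_compile as proposed above is not an existing function.

```erlang
-module(multi_match_demo).
-export([keyword_matches/2]).

%% Hypothetical helper: returns [{KeywordNumber, ByteOffset}] for every
%% non-overlapping occurrence of any keyword in Subject. Keywords are
%% numbered from 1 in list order; byte offsets count from 0.
keyword_matches(Keywords, Subject) ->
    Compiled = binary:compile_pattern(Keywords),
    [{keyword_number(binary:part(Subject, Part), Keywords), Start}
     || {Start, _Len} = Part <- binary:matches(Subject, Compiled)].

%% Position of the matched bytes in the keyword list, starting at 1.
keyword_number(Found, Keywords) ->
    keyword_number(Found, Keywords, 1).

keyword_number(Found, [Found | _], N) -> N;
keyword_number(Found, [_ | Rest], N) -> keyword_number(Found, Rest, N + 1).
```

Called with the three keywords and the subject from the example above, this yields [{3, 14}, {2, 27}]: "frog" is keyword 3 at byte 14 and "big" is keyword 2 at byte 27.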

More comments later, once I have some more time to consider the rest of
the document.


