[erlang-questions] Fwd: [eeps] EEP 9
Fredrik Svahn
fredrik.svahn@REDACTED
Fri Mar 7 23:13:50 CET 2008
Thanks for your comments! Please see answers below:
On Tue, Mar 4, 2008 at 9:25 PM, Vat Raghavan <machinshin2002@REDACTED> wrote:
> I really like this EEP, and I can't wait for it (or something quite
> similar :) ) to be part of OTP. At least in part, it will mollify
> those who complain about Erlang's string manipulation support.
>
> Though, I think a better name for the module would be binary_string
> or something along those lines.
There seem to be several similar suggestions: binary_string,
string_as_binary and bstring. I personally prefer binary_string.
> According to the EEP, the reference implementation was given to the
> OTP team along with the EEP. It seems to me that, according to the
> 'many eyes' theory, such an implementation should also be available
> to all, either at the EEP site (preferred) or at the author's website
> or what have you.
>
> As to your question, Paul: the EEP suggests either Aho-Corasick or
> Boyer-Moore, so I think some profiling would be required before any
> implementation decision could be made. Even so, we're more in the API
> design phase at the moment, and I don't think it matters much now
> which implementation is finally used.
The code has been made available, although you should probably regard
it as an example of what the code could look like rather than a final
implementation. I had fun writing it, but I am sure that the OTP team
will make it into something better and faster.
> I do like the suggestion about binary:match; when I search strings I
> often want not only the index of the match, but also the string that
> was found:
>
> [eep]
> - Maybe binary:match(<<"hello, world\n">>, [<<"xx">>, <<" ">>])
> should return {Needle, Index} (i.e. {<<" ">>, 7}) instead?
The downside is that it might be slower, especially for the split
function. We should probably do some measurements to see if it
matters. Having {Needle, Index} makes for more beautiful code than
{NeedleNumber, Index}.
> Or perhaps {Index, NeedleLength}, i.e. {7, 1}?
In retrospect this is probably not a very good idea: to know which
pattern matched, you would first have to extract it from the binary.
> [/eep]
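For comparison, here is a rough sketch of the three return shapes discussed above. It is not the reference implementation: the module name match_sketch and its functions are hypothetical, indices are 1-based to agree with the {<<" ">>, 7} example, and a naive byte-by-byte scan stands in for Aho-Corasick or Boyer-Moore.

```erlang
-module(match_sketch).
-export([match_needle/2, match_number/2, match_length/2]).

%% Hypothetical API sketch, not the EEP reference implementation.
%% All three variants share one naive scan and differ only in the
%% shape of the result. Indices are 1-based.

%% {Needle, Index}: returns the matched needle itself.
match_needle(Hay, Needles) ->
    case find(Hay, Needles) of
        {_N, Needle, Index} -> {Needle, Index};
        nomatch -> nomatch
    end.

%% {NeedleNumber, Index}: 1-based position of the needle in the list.
match_number(Hay, Needles) ->
    case find(Hay, Needles) of
        {N, _Needle, Index} -> {N, Index};
        nomatch -> nomatch
    end.

%% {Index, NeedleLength}: the caller must slice Hay to see what matched.
match_length(Hay, Needles) ->
    case find(Hay, Needles) of
        {_N, Needle, Index} -> {Index, byte_size(Needle)};
        nomatch -> nomatch
    end.

%% Leftmost match wins; at the same position, earlier needles win.
%% Naive O(N*M) scan used only for illustration.
find(Hay, Needles) -> find(Hay, Needles, 1).

find(<<>>, _Needles, _Index) ->
    nomatch;
find(Bin = <<_, Rest/binary>>, Needles, Index) ->
    case prefix(Bin, Needles, 1) of
        {N, Needle} -> {N, Needle, Index};
        none -> find(Rest, Needles, Index + 1)
    end.

%% Returns {NeedleNumber, Needle} for the first needle Bin starts with.
prefix(_Bin, [], _N) ->
    none;
prefix(Bin, [Needle | More], N) ->
    Size = byte_size(Needle),
    case Bin of
        <<Prefix:Size/binary, _/binary>> when Prefix =:= Needle ->
            {N, Needle};
        _ ->
            prefix(Bin, More, N + 1)
    end.
```

With the example from the EEP, match_needle/2 gives {<<" ">>, 7}, match_number/2 gives {2, 7} and match_length/2 gives {7, 1}; only the first tells the caller directly which needle matched, while the last requires slicing the haystack first.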
>
>
> Re: Unicode. Perhaps it would be better to have two separate libraries
> for ASCII vs. Unicode?
> Also, how would the module handle different encodings (UTF-8, UTF-16,
> UTF-32, etc.)?
Either that, or default to ASCII with an optional Encoding parameter.
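A minimal sketch of the optional-parameter approach, using a hypothetical bstring_sketch:len/1,2 (character count) as the example operation: the one-argument form defaults to ascii and the two-argument form takes the encoding explicitly. The module name and the encoding atoms are assumptions, not part of the EEP, and the sketch assumes well-formed input.

```erlang
-module(bstring_sketch).
-export([len/1, len/2]).

%% Hypothetical API: number of characters in a binary string.
%% len/1 defaults to ascii (one byte per character).
len(Bin) ->
    len(Bin, ascii).

%% len/2 takes the encoding explicitly. Assumes well-formed input;
%% utf16/utf32 segments use Erlang's default big-endian byte order.
len(Bin, ascii) -> byte_size(Bin);
len(Bin, utf8)  -> length([C || <<C/utf8>> <= Bin]);
len(Bin, utf16) -> length([C || <<C/utf16>> <= Bin]);
len(Bin, utf32) -> length([C || <<C/utf32>> <= Bin]).
```

Keeping one module with a default argument avoids duplicating every function across two libraries; the cost is an extra clause per supported encoding.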
BR /Fredrik