[erlang-questions] [eeps] EEP 9

Fredrik Svahn fredrik.svahn@REDACTED
Sat Mar 8 22:34:42 CET 2008

I like this proposal. There is already a proposal to make a copy of
the string module for binaries and call it binary_string (and possibly
with variants for ascii/utf-x). If it could be merged with the current
string module in a backward compatible fashion, all the better.

As you say, implementation could become quite messy in some cases when
there is a mix of data types. Consider for instance finding "cat" in
["c",<<"a">>, "ts and dogs"] if there are different algorithms for
lists and binaries. I guess this would require the definition of
string() to be widened somewhat to equal the definition of an iolist()
(as defined in e.g. the manual page for the kernel/erlang module).
Could this cause any trouble somewhere? I do not know. I would like to
have some comments on this proposal.
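
To make the mixed-type case concrete, here is a naive sketch. It is not part of the EEP, and the module name iolist_search is made up for illustration. It simply flattens both arguments to plain lists and reuses string:str/2, which sidesteps the "two algorithms" problem at the cost of copying the whole subject:

```erlang
-module(iolist_search).
-export([str/2]).

%% Hypothetical helper: accepts any iolist() for both arguments.
%% Returns the 1-based position of Key in the flattened data,
%% or 0 if it is not found (same convention as string:str/2).
str(IoList, Key) ->
    Flat = binary_to_list(iolist_to_binary(IoList)),
    FlatKey = binary_to_list(iolist_to_binary(Key)),
    string:str(Flat, FlatKey).
```

With this, iolist_search:str(["c",<<"a">>, "ts and dogs"], "cat") finds the key at position 1 even though it spans three fragments. A real implementation would presumably want to avoid the flattening step.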

Anyway, to sum up where we are now: we are looking into splitting the
proposed set of functions into two parts, one binary_string (possibly
merged with the current string module depending on the feedback this
proposal gets) and one binary module.

The string/binary_string module should have the functions in string
today plus a BIF version of str/2 which is faster and which could search
for more than one key (this is the match function in the current
version of the EEP). In addition to speeding up tokens/2 I propose a
split/2 function which takes a separator string/binary rather than a
list of separator chars. Something probably needs to be done about
utf-x support as well; I am not sure if this is too big for this EEP
or not.
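
The difference between tokens/2 and a separator-string split/2 can be sketched in plain Erlang for list strings. This is only an illustration of the intended semantics, under the assumption that empty fields are kept; the module name split_sketch is made up, and a BIF would of course look nothing like this:

```erlang
-module(split_sketch).
-export([split/2]).

%% split(String, Separator) -> [String]
%% Separator is matched as a whole string, not as a set of chars.
split(String, Sep) when Sep =/= [] ->
    split(String, Sep, [], []).

%% Cur holds the current field in reverse, Acc the finished fields.
split([], _Sep, Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split([C | Rest] = String, Sep, Cur, Acc) ->
    case lists:prefix(Sep, String) of
        true ->
            Tail = lists:nthtail(length(Sep), String),
            split(Tail, Sep, [], [lists:reverse(Cur) | Acc]);
        false ->
            split(Rest, Sep, [C | Cur], Acc)
    end.
```

For example, split_sketch:split("a,b, c", ", ") gives ["a,b","c"], whereas string:tokens("a,b, c", ", ") treats the comma and the space as two independent separator chars and gives ["a","b","c"].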

The binary module would only need functions for binaries which are not
strings, e.g.
nth/2 (different from sub_binary, this gets the nth byte)
match/2,3 (useful even if it isn't a string)
split/2,3 (useful even if it isn't a string)
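
As an illustration of nth/2, a pure Erlang version could look roughly like the sketch below. The argument order (position first, 1-based, as in lists:nth/2) and the module name binary_sketch are assumptions, not something the EEP has settled:

```erlang
-module(binary_sketch).
-export([nth/2]).

%% nth(N, Binary) -> Byte
%% Returns the Nth byte (1-based) as an integer 0..255,
%% unlike a sub-binary, which would return a 1-byte binary.
nth(N, Bin) when is_integer(N), N >= 1, N =< byte_size(Bin) ->
    Skip = N - 1,
    <<_:Skip/binary, Byte, _/binary>> = Bin,
    Byte.
```

So binary_sketch:nth(2, <<"abc">>) returns the integer $b, and an out-of-range position fails with function_clause.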

In addition I would like to add functions to transform and filter
binaries in a more efficient fashion than can be done by binary
comprehensions, e.g. by a lookup table as suggested by Jay here:
http://www.duomark.com/erlang/publications/acm2005.pdf (chapter 4.1 is
especially interesting). This would mean some new built-in functions,
binary:translate, binary:extract and possibly some more.
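
The lookup-table idea can be sketched in plain Erlang as below. This only shows the intended semantics of something like binary:translate; the module name, the function names and the use of a 256-element tuple as the table are all assumptions, and since it is written with a binary comprehension it does not give the speed-up a built-in function would be after:

```erlang
-module(translate_sketch).
-export([table/1, translate/2, upcase/1]).

%% Build a 256-entry lookup table from a fun mapping byte -> byte.
table(Fun) ->
    list_to_tuple([Fun(B) || B <- lists:seq(0, 255)]).

%% Replace every byte of Bin by its entry in the table.
%% Tuples are 1-based, hence the B + 1.
translate(Bin, Table) when is_binary(Bin), tuple_size(Table) =:= 256 ->
    << <<(element(B + 1, Table))>> || <<B>> <= Bin >>.

%% Example table: ASCII upper-casing, all other bytes unchanged.
upcase(Bin) ->
    Table = table(fun(B) when B >= $a, B =< $z -> B - 32;
                     (B) -> B
                  end),
    translate(Bin, Table).
```

Here translate_sketch:upcase(<<"cats and dogs">>) returns <<"CATS AND DOGS">>. In real use the table would be built once and reused across calls.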

BR /Fredrik

On Sat, Mar 8, 2008 at 6:06 PM, Vlad Balin <gaperton@REDACTED> wrote:
> Yes, and there is another idea, which would lead to a more general approach.
>  If we allow these functions to work on iolists (!) we could keep just
>  a single module, "strings". That would be the ideal approach from the
>  application programmer's point of view:
>  1) Concatenation of binaries is more expensive than making iolist out
>  of 2 binaries. Therefore, we'll take benefit of extremely cheap string
>  concatenations.
>  2) The split function can be used to reformat an iolist. Quite an
>  intelligent operation, which is more general than just string manipulation.
>  I understand that this approach requires more effort to implement than
>  the original proposal, but
>  1) It features backward compatibility with the string module.
>  2) It opens up extended possibilities for working with binary strings
>  which are not present in current languages, making Erlang one of the
>  most advanced languages for string manipulation. Such as "lazy" string
>  concatenations with iolists.
>  2008/3/8, Vlad Balin <gaperton@REDACTED>:
> > One more issue. Take this function as an example.
>  >
>  >  split(Binary, SplitKeys) -> List
>  >             Binary = binary()
>  >             SplitKeys = binary() | [binary()]
>  >
>  >
>  >  Wouldn't it be useful to allow integers in SplitKeys in place of
>  >  binaries? We can treat them as 1-byte binaries (should be easy to
>  >  implement), and it will make code more readable in cases when keys
>  >  consist of 1 character.
>  >
>  >  With this option, we can just write
>  >
>  >  split( Buffer, "\n " )
>  >
>  >  instead of
>  >
>  >  > binary:match(<<"hello, world\n">>,[<<"\n">>,<<" ">>]).
>  >
>  >  It can be applied to many functions in this module, and it should
>  >  increase code readability in general.
>  >
>  >  Thanks,
>  >
>  > Vlad.
>  >
