[erlang-questions] [eeps] EEP 9
Sat Mar 8 22:34:42 CET 2008
I like this proposal. There is already a proposal to make a copy of
the string module for binaries and call it binary_string (and possibly
with variants for ascii/utf-x). If it could be merged with the current
string module in a backward compatible fashion, all the better.
As you say, implementation could become quite messy in some cases when
there is a mix of data types. Consider for instance finding "cat" in
["c",<<"a">>, "ts and dogs"] if there are different algorithms for
lists and binaries. I guess this would require the definition of
string() to be widened somewhat to equal the definition of an iolist()
(as defined in e.g. the manual page for the kernel/erlang module).
Could this cause any trouble somewhere? I do not know. I would like to
have some comments on this proposal.
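To make the mixed-type case concrete, here is a minimal sketch that sidesteps the "different algorithms" problem by flattening the iolist first. The module and function names are hypothetical and not part of the proposal. binary:match/2 is the function as it exists in later OTP releases, not necessarily the match function described in the EEP.

```erlang
-module(iolist_find).
-export([find/2]).

%% Hypothetical helper, not part of the EEP: flatten both arguments to
%% binaries, then search. Returns {Start, Length} with a zero-based
%% Start, or the atom nomatch.
find(IoList, Needle) ->
    Haystack = iolist_to_binary(IoList),
    binary:match(Haystack, iolist_to_binary(Needle)).
```

Flattening copies the whole input, so this only illustrates the intended result. A real implementation would presumably walk the iolist without building an intermediate binary.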
Anyway, to sum up where we are now: we are looking into splitting the
proposed set of functions into two parts, one binary_string (possibly
merged with the current string module depending on the feedback this
proposal gets) and one binary module.
The string/binary_string module should have the functions in string
today + a bif-version of str/2 which is faster and which could search
for more than one key (this is the match function in the current
version of the EEP). In addition to speeding up tokens/2 I propose a
split/2 function which takes a separator string/binary rather than a
list of separator chars. Something probably needs to be done about
utf-x support as well; I am not sure if this is too big for this EEP or
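The difference between a list of separator characters and a separator string can be illustrated as follows. This is only a sketch: string:split/3 is the function that appeared in later OTP releases, used here as a stand-in for the split/2 proposed above, and the module name is hypothetical.

```erlang
-module(split_vs_tokens).
-export([demo/0]).

%% Compare per-character separators (tokens/2) with a whole-string
%% separator (string:split/3 from later OTP releases).
demo() ->
    Input = "one, two,three",
    %% Every character in ", " is a separator on its own.
    Tokens = string:tokens(Input, ", "),
    %% Only the exact two-character sequence ", " separates.
    Parts = string:split(Input, ", ", all),
    {Tokens, Parts}.
```

With tokens/2 the input breaks at every comma and every space, while the separator-string variant breaks only where the comma is followed by a space.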
The binary module would only need functions for binaries which are not
nth/2 (different from sub_binary, this gets the nth byte)
match/2,3 (useful even if it isn't a string)
split/2,3 (useful even if it isn't a string)
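A rough sketch of how these three functions could behave on data that is not a string. The argument order and 1-based indexing of nth/2 are assumptions made by analogy with lists:nth/2. binary:match/2 and binary:split/3 are the versions found in later OTP releases, and the module name is hypothetical.

```erlang
-module(binary_sketch).
-export([nth/2, demo/0]).

%% Assumed semantics: return the Nth byte, counting from 1 as
%% lists:nth/2 does. Fails with function_clause when N is out of range.
nth(N, Bin) when is_integer(N), N >= 1, N =< byte_size(Bin) ->
    Skip = N - 1,
    <<_:Skip/binary, Byte, _/binary>> = Bin,
    Byte.

%% Zero bytes act as record separators in some non-textual payload.
demo() ->
    Packet = <<16#CA, 16#FE, 0, 1, 2, 0, 3>>,
    {nth(2, Packet),
     binary:match(Packet, <<0>>),
     binary:split(Packet, <<0>>, [global])}.
```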
In addition I would like to add functions to transform and filter
binaries in a more efficient fashion than can be done by binary
comprehensions, e.g. by a lookup table as suggested by Jay here:
http://www.duomark.com/erlang/publications/acm2005.pdf (chapter 4.1 is
especially interesting). This would mean some new built-in functions,
binary:translate, binary:extract and possibly some more.
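As an illustration of the lookup-table idea, the following sketch models what a translate function might compute. It is plain Erlang built on a binary comprehension, so it shows only the assumed semantics, not the speed a built-in function would give. The module name, the 256-byte table format and the argument order are all assumptions.

```erlang
-module(translate_sketch).
-export([upcase_table/0, translate/2]).

%% Build a 256-byte table in which position B holds the replacement
%% for byte value B. This one maps ASCII a-z to A-Z and leaves every
%% other byte unchanged.
upcase_table() ->
    list_to_binary([upcase(B) || B <- lists:seq(0, 255)]).

upcase(B) when B >= $a, B =< $z -> B - 32;
upcase(B) -> B.

%% Replace every byte of Bin with its table entry. binary:at/2 uses a
%% zero-based position, so the byte value indexes the table directly.
translate(Bin, Table) when byte_size(Table) =:= 256 ->
    << <<(binary:at(Table, B))>> || <<B>> <= Bin >>.
```

The table is built once and can be reused for every call, which is what makes the approach attractive for a built-in function.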
On Sat, Mar 8, 2008 at 6:06 PM, Vlad Balin <gaperton@REDACTED> wrote:
> Yes, and there is another idea, which would lead to a more general approach.
> If we allow these functions to work on iolists (!) we could keep just
> a single module, "strings". That would be the ideal approach from the
> application programmer's point of view:
> 1) Concatenation of binaries is more expensive than making iolist out
> of 2 binaries. Therefore, we'll take benefit of extremely cheap string
> 2) Function split can be used to reformat an iolist. Quite an intelligent
> operation, which is more general than just string manipulation.
> I understand that this approach requires more effort to implement than
> the original proposal, but
> 1) It features backward compatibility with the string module.
> 2) It opens extended possibilities for working with binary strings, which
> are not present in current languages, making Erlang one of the most
> advanced languages for string manipulation. Such as "lazy" string
> concatenation with iolists.
> 2008/3/8, Vlad Balin <gaperton@REDACTED>:
> > One more issue. Take this function as an example.
> > split(Binary, SplitKeys) -> List
> > Binary = binary()
> > SplitKeys = binary() | [binary()]
> > Wouldn't it be useful to allow integers in SplitKeys in place of
> > binaries? We can treat them as 1-byte binaries (should be easy to
> > implement), and it will make code more readable in cases when keys
> > consist of 1 character.
> > With this option, we can just write
> > split( Buffer, "\n " )
> > instead of
> > > binary:match(<<"hello, world\n">>,[<<"\n">>,<<" ">>]).
> > It can be applied to many functions in this module, and it should
> > increase code readability in general.
> > Thanks,
> > Vlad.