[eeps] EEP 31
Patrik Nyblom
pan@REDACTED
Thu Dec 10 11:47:21 CET 2009
Hi Robert and all others!
Thanks for the input, I'll comment on the different points below:
On Thu, 10 Dec 2009, Robert Virding wrote:
> Almost missed this.
Yes, that was close :)
>
> I have a few comments, in no specific order:
>
> - I understand that giving list of binaries for a pattern means that they
> are alternatives. I could not see where this was stated and I wonder if it
> not a little confusing?
I can't really see what they would otherwise be - what would be the
alternative interpretation? I could clarify that of course.
>
> - Does 're' always return the first *shortest* match? I thought that it
> returned the first matching alternative irrespective of length. Could be
> wrong though. Always returning the shortest is very restrictive.
If you are to return non-overlapping matches and want to return the first
match, you could either select the longest or the shortest at the first
matching point. Without an option to control that, you of course have to
select one and be consistent. The "shortest" behaviour is consistent with
re, but not with regexp, which uses the other strategy. The regexp module
therefore does not return the largest possible number of non-overlapping
matches, as opposed to re. I think the re behaviour is the best and have
chosen that for the binary module.
I can possibly see a use for the alternative strategy, maybe an option? I
am however unsure if it's worth the implementation effort. Maybe...
Examples:
27> re:run("abc","ab|abc",[global]).
{match,[[{0,2}]]}
28> regexp:matches("abc","ab|abc").
{match,[{1,3}]}
29> re:run("abc","ab|abc|c",[global]).
{match,[[{0,2}],[{2,1}]]}
30> regexp:matches("abc","ab|abc|c").
{match,[{1,3}]}
>
> - Are first/1 and last/1 really necessary?
>
No. None of the access methods are necessary - they are convenience
functions.
I quote the EEP:
"Decomposition of binaries are usually done by using bit-syntax. However
some common operations are useful to have as ordinary functions, both for
performance and to support a more traditional functional programming
style."
Access functions can be made more efficient than bit syntax as they need
not build a match-context, but that is an implementation detail. The most
important reason is the programming style argument.
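To illustrate the style argument, here is a minimal sketch comparing the
bit-syntax way of getting the first and last byte with the proposed
convenience functions. It assumes binary:first/1 and binary:last/1 behave
as described in the EEP; the module and helper names (access_sketch,
first_bs, last_bs, agree) are made up for the example.

```erlang
-module(access_sketch).
-export([first_bs/1, last_bs/1, agree/1]).

%% First byte via bit syntax; fails with function_clause on an empty binary.
first_bs(<<First, _/binary>>) -> First.

%% Last byte via bit syntax: skip byte_size(Bin) - 1 bytes, then take one.
last_bs(Bin) when byte_size(Bin) > 0 ->
    Skip = byte_size(Bin) - 1,
    <<_:Skip/binary, Last>> = Bin,
    Last.

%% True when the convenience functions (assumed to exist as proposed)
%% agree with the bit-syntax versions above.
agree(Bin) ->
    binary:first(Bin) =:= first_bs(Bin) andalso
        binary:last(Bin) =:= last_bs(Bin).
```

The bit-syntax versions work fine, but binary:last(Bin) says what is meant
without the size arithmetic.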
> - This is the first time we have a copy function.
Yes. The usage is twofold and not really attractive. The need for such a
"cloning" function is however obvious from discussions on
erlang-questions.
Having a lists:duplicate-like function is also nice, but reusing the name
from the list module would be slightly confusing because of the copy/1
function and also due to the parameter ordering.
Code that does not state directly that it is making a single copy, and
why, is harder to understand. To copy sub-binaries, the strategy in
today's code is:
Copy = list_to_binary([Bin]).
Not really obvious... To instead write:
Copy = binary:copy(Bin)
seems more attractive, especially if the documentation states why this
could be necessary.
Suggestions on other naming?
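A minimal sketch of the twofold usage, assuming copy/1 and copy/2 work as
proposed in the EEP (subject first, count second). The module and function
names (copy_sketch, detach, repeat) are only for illustration.

```erlang
-module(copy_sketch).
-export([detach/1, repeat/2]).

%% Use 1: copy a sub-binary so that it no longer references the
%% (possibly much larger) binary it was matched out of.
detach(Sub) when is_binary(Sub) ->
    binary:copy(Sub).

%% Use 2: build a binary consisting of N copies of Bin, much like
%% lists:duplicate/2 but with the subject as the first argument.
repeat(Bin, N) when is_binary(Bin), is_integer(N), N >= 0 ->
    binary:copy(Bin, N).
```

Note the argument order: lists:duplicate(N, Elem) takes the count first,
while the proposed copy takes the binary first, which is one reason for
not reusing the name duplicate.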
>
> - It seems like 'binary' indexes binaries from 0. Is this wise? While
> indexing them from 1 may not have been a good choice having two different
> standards must surely be much worse and be a source of future confusion. I
> know that 're' does this but I think that was a bad mistake!
We (OTP), reluctantly, made the decision to have zero-based indices as a
rule for binary-oriented modules although Erlang is traditionally
one-based. The foremost reason is the hassle of using one-based indices in
bit syntax (the only thing you can do with a one-based index is to make it
zero-based; it is useless in bit syntax until that is done). Having
different bases in different binary-oriented modules
would add to the confusion (and make for a less convenient interface).
So, the design rule is now that all indices in binaries are zero-based. I
obviously won't make this module an exception.
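A small sketch of why zero-based positions are convenient with bit syntax:
the position can be used directly as the number of bytes to skip. This
assumes match/2 returns {Pos, Len} with a zero-based Pos, or nomatch, as
proposed; index_sketch and extract are hypothetical names.

```erlang
-module(index_sketch).
-export([extract/2]).

%% Find Pattern in Bin and pull the matching part out with bit syntax.
%% The zero-based Pos is exactly the number of bytes to skip, so no
%% "Pos - 1" adjustment is needed before matching.
extract(Bin, Pattern) ->
    case binary:match(Bin, Pattern) of
        {Pos, Len} ->
            <<_:Pos/binary, Found:Len/binary, _/binary>> = Bin,
            {Pos, Found};
        nomatch ->
            nomatch
    end.
```

With a one-based position, every such match would first have to subtract
one.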
>
> - I would probably add to_list/1 and from_list/1 for completeness even
> though they are the same as the bifs. Perhaps a future path would be to
> phase out the 'erlang' bifs and move them into 'binary'?
Yes - good idea. I will add those. Moving the BIFs to the binary module
would of course break backward compatibility, but adding them here and
maybe deprecating the old ones for future removal (probably at the time of
my retirement) is possible and probably desirable.
>
> - I am really serious about numbering from 0.
Understood. I won't change that, however, unless you buy Ericsson or OTP
and enforce new design rules ;) ... In which case I would not change it
anyway, because you would fire me on the spot, I suspect :D
>
> - The function name part seems strange, perhaps just because it is new. How
> about using sub? I liked the names used in 'regexp', taken from the ancient
> masters. :-)
I used names from other modules when functionality and interface were very
similar. That resulted in match/matches/split from regexp (as re has the
multifunctional run function, which I do not want to use here), but replace
instead of sub, as sub can easily be confused both with "sub-binary" and
with subtract in this general module (named after the datatype and not the
functionality).
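To show how the four names read together, here is a minimal sketch. It
assumes the functions and the [global] option work as proposed in the EEP;
names_sketch and demo are made-up names.

```erlang
-module(names_sketch).
-export([demo/0]).

%% Run the four search-related functions on the same subject and
%% return their results in a tuple.
demo() ->
    Bin = <<"a,b,c">>,
    Sep = <<",">>,
    {binary:match(Bin, Sep),                      % first match only
     binary:matches(Bin, Sep),                    % all matches
     binary:split(Bin, Sep, [global]),            % split at every match
     binary:replace(Bin, Sep, <<";">>, [global])}. % replace every match
```

Calling names_sketch:demo() should give
{{1,1}, [{1,1},{3,1}], [<<"a">>,<<"b">>,<<"c">>], <<"a;b;c">>}.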
>
> That's about all for now. I'll be back,
I'll have to start implementing, so be back soon :)
To summarize:
- I'll clarify that multiple search strings mean they are alternatives.
- I'll think about an option for getting the longest matches instead of
the shortest when having multiple alternatives if that is really
interesting to anyone. What do others think?
- Any suggestion for other interface/interfaces than the copy-functions?
Including the spelled out twofold functionality?
- Zero based indices will not change, although we fully understand the
objection.
- The suggested list conversion functions will be added (if no one
objects strongly, with good, valid and sound arguments :)).
- Unless a large part of the community is for a change to the name sub
(instead of replace, I suppose you mean, not instead of part?), I
will not change that.
Thanks again for taking the time to read and comment on the EEP!
Cheers,
/Patrik
>
> Robert
>
> 2009/11/26 Patrik Nyblom <pan@REDACTED>
>
>> A new EEP is submitted.
>>
>> EEP 31 is a slight rework of a part of EEP 9.
>>
>> Comments on this EEP are accepted until 10-Dec-2009. Please send all
>> comments and suggestions to this mailing list (eeps@REDACTED).
>>
>> Best regards,
>>
>> /The OTP team/Patrik
>>
>> ________________________________________________________________
>> eeps mailing list. See http://www.erlang.org/faq.html
>> eeps (at) erlang.org
>>
>>
>