[eeps] EEP 31

Thu Dec 10 11:47:21 CET 2009

Hi Robert and all others!

Thanks for the input, I'll comment on the different points below:

On Thu, 10 Dec 2009, Robert Virding wrote:

> Almost missed this.

Yes, that was close :)

>
> I have a few comments, in no specific order:
>
> - I understand that giving list of binaries for a pattern means that they
> are alternatives. I could not see where this was stated and I wonder if it
> not a little confusing?

I can't really see what they would otherwise be - what would be the 
alternative interpretation? I could clarify that of course.

>
> - Does 're' always return the first *shortest* match? I thought that it
> returned the first matching alternative irrespective of length. Could be
> wrong though. Always returning the shortest is very restrictive.

If you are to return non-overlapping matches and want to return the first 
match, you can select either the longest or the shortest alternative at the 
first matching point. Without an option to control that, you of course have 
to select one and be consistent. The "shortest" behaviour is consistent with 
re, but not with regexp, which uses the other strategy. The regexp module 
therefore does not return the largest possible number of non-overlapping 
matches, as opposed to re. I think the re behaviour is the better one and 
have chosen it for the binary module.

I can possibly see a use for the alternative strategy, perhaps as an option? 
I am, however, unsure whether it's worth the implementation effort. Maybe...

Examples:
27> re:run("abc","ab|abc",[global]).
{match,[[{0,2}]]}
28> regexp:matches("abc","ab|abc").
{match,[{1,3}]}
29> re:run("abc","ab|abc|c",[global]).
{match,[[{0,2}],[{2,1}]]}
30> regexp:matches("abc","ab|abc|c").
{match,[{1,3}]}

>
> - Are first/1 and last/1 really necessary?
>
No. None of the access methods are necessary - they are convenience 
functions.

I quote the EEP:
"Decomposition of binaries are usually done by using bit-syntax. However 
some common operations are useful to have as ordinary functions, both for 
performance and to support a more traditional functional programming 
style."

Access functions can be made more efficient than bit syntax, as they need 
not build a match context, but that is an implementation detail; the most 
important reason is the programming style argument.
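To illustrate the style argument, here is a minimal sketch of what first/1 and last/1 amount to when written with plain bit syntax. The module name binary_access is hypothetical, and the sketch only assumes the semantics discussed here (the first and last byte of a non-empty binary, returned as an integer); it is not the proposed implementation.

```erlang
-module(binary_access).
-export([first/1, last/1]).

%% Hypothetical convenience functions written with plain bit syntax.
%% first/1 returns the first byte of a non-empty binary as an integer.
first(<<First, _/binary>>) ->
    First.

%% last/1 returns the last byte of a non-empty binary as an integer.
%% The prefix to skip must be computed before the pattern can use it.
last(Bin) when is_binary(Bin), byte_size(Bin) > 0 ->
    Skip = byte_size(Bin) - 1,
    <<_:Skip/binary, Last>> = Bin,
    Last.
```

Even in this small case, a call such as binary_access:last(Bin) reads more directly than the size arithmetic and pattern it hides.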

> - This is the first time we have a copy function.

Yes. Its usage is twofold, which is not really attractive. The need for 
such a "cloning" function is, however, obvious from discussions on 
erlang-questions.

Having a lists:duplicate-like function is also nice, but reusing the name 
from the lists module would be slightly confusing, both because of the 
copy/1 function and because of the parameter ordering.

Not stating directly why you would need a copy function that makes only 
one copy would make programs less understandable. Today, the usual way to 
copy a sub-binary in code is:

Copy = list_to_binary([Bin]).

Not really obvious... To instead write:

Copy = binary:copy(Bin).

seems more attractive, especially if the documentation states why this 
could be necessary.
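A minimal sketch of the situation where copying matters: a small sub-binary matched out of a large binary keeps the whole large binary alive for as long as it is referenced. The module and function names are hypothetical; binary:copy/1 is used as proposed here (it is available in OTP releases that include the binary module), and list_to_binary([Key]) is the older workaround mentioned above.

```erlang
-module(copy_demo).
-export([extract_key/1, extract_key_old/1]).

%% Hypothetical example: take the first 4 bytes of a possibly large binary.
%% Without a copy, Key would be a sub-binary still referencing all of the
%% original binary, so the large binary could not be garbage collected.
extract_key(<<Key:4/binary, _/binary>>) ->
    binary:copy(Key).

%% The same thing written with the pre-existing, less obvious workaround.
extract_key_old(<<Key:4/binary, _/binary>>) ->
    list_to_binary([Key]).
```

The first version states its intent in the function name, which is the point of having a dedicated copy function.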

Suggestions on other naming?

>
> - It seems like 'binary' indexes binaries from 0. Is this wise? While
> indexing them from 1 may not have been a good choice having two different
> standards must surely be much worse and be a source of future confusion. I
> know that 're' does this but I think that was a bad mistake!

We (OTP), reluctantly, made the decision to have zero-based indices as a 
rule for binary-oriented modules, although Erlang is traditionally 
one-based. The main reason is the hassle of using one-based indices with 
bit syntax (the only thing you can do with a one-based index is convert it 
to a zero-based one; it is useless in bit syntax until that is done). 
Having different bases in different binary-oriented modules would add to 
the confusion (and make for a less convenient interface).
So the design rule is now that all indices in binaries are zero-based. I 
obviously won't make this module an exception.
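A minimal sketch of the bit syntax argument, using a hypothetical module index_demo: a zero-based position can be used directly as the size of the skipped prefix in a pattern, while a one-based position first has to be converted.

```erlang
-module(index_demo).
-export([byte_at/2, byte_at_one_based/2]).

%% Zero-based: Pos is exactly the number of bytes to skip, so it can be
%% used as the segment size in the pattern without any arithmetic.
byte_at(Bin, Pos) when is_binary(Bin), Pos >= 0, Pos < byte_size(Bin) ->
    <<_:Pos/binary, Byte, _/binary>> = Bin,
    Byte.

%% One-based: N must first be turned into a zero-based offset before the
%% pattern can use it.
byte_at_one_based(Bin, N) when is_binary(Bin), N >= 1, N =< byte_size(Bin) ->
    Skip = N - 1,
    <<_:Skip/binary, Byte, _/binary>> = Bin,
    Byte.
```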

>
> - I would probably add to_list/1 and from_list/1 for completeness even
> though they are the same as the bifs. Perhaps a future path would be to
> phase out the 'erlang' bifs and move them into 'binary'?

Yes - good idea. I will add those. Moving the BIFs to the binary module 
would of course break backward compatibility, but adding them here and 
maybe deprecating the old ones for future removal (probably at the time of 
my retirement) is possible and probably desirable.
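A minimal sketch of the "add them here" approach, assuming the new functions simply delegate to the existing BIFs so that nothing in the erlang module has to change. The module name binary_conv is hypothetical and stands in for the binary module.

```erlang
-module(binary_conv).
-export([to_list/1, from_list/1]).

%% Hypothetical wrappers: both delegate to the existing BIFs in the erlang
%% module, so old code keeps working while the new names live next to the
%% other binary functions.
to_list(Bin) when is_binary(Bin) ->
    binary_to_list(Bin).

from_list(List) when is_list(List) ->
    list_to_binary(List).
```

Delegating keeps a single implementation, so deprecating the old BIFs later would only be a matter of documentation and warnings.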

>
> - I am really serious about numbering from 0.

Understood. I won't, however, change that unless you buy Ericsson or OTP and 
enforce new design rules ;) ... In which case I would not change it anyway, 
because you would fire me on the spot, I suspect :D

>
> - The function name part seems strange, perhaps just because it is new. How
> about using sub? I liked the names used in 'regexp', taken from the ancient
> masters. :-)

I used names from other modules when functionality and interface were very 
similar. That resulted in match/matches/split from regexp (as re has the 
multifunctional run function, whose name I do not want to use here), but 
replace instead of sub, as sub can easily be confused both with 
"sub-binary" and with subtract in this general module (named after the 
datatype and not the functionality).

>
> That's about all for now. I'll be back,

I'll have to start implementing, so be back soon :)

To summarize:

- I'll clarify that multiple search strings mean they are alternatives.
- I'll think about an option for getting the longest matches instead of
   the shortest when having multiple alternatives if that is really
   interesting to anyone. What do others think? 
- Any suggestions for interfaces other than the copy functions, including
   ones that spell out the twofold functionality?
- Zero-based indices will not change, although we fully understand the
   objection.
- The suggested list conversion functions will be added (if no one
   objects strongly, with good, valid and sound arguments :)).
- Unless a large part of the community is in favour of the name sub
   (instead of replace, I suppose you mean, not instead of part?), I
   will not change that.

Thanks again for taking the time to read and comment on the EEP!

Cheers,
/Patrik
>
> Robert
>
> 2009/11/26 Patrik Nyblom <pan@REDACTED>
>
>> A new EEP is submitted.
>>
>> EEP 31 is a slight rework of a part of EEP 9.
>>
>> Comments on this EEP are accepted until 10-Dec-2009. Please send all
>> comments and suggestions to this mailing list (eeps@REDACTED).
>>
>> Best regards,
>>
>> /The OTP team/Patrik
>>
>> ________________________________________________________________
>> eeps mailing list. See http://www.erlang.org/faq.html
>> eeps (at) erlang.org
>>
>>
>