[eeps] EEP 31

Robert Virding rvirding@REDACTED
Fri Dec 11 15:56:10 CET 2009


Comments follow.

2009/12/10 Patrik Nyblom <pan@REDACTED>

>
>> - I understand that giving a list of binaries for a pattern means that they
>> are alternatives. I could not see where this was stated, and I wonder if it
>> is not a little confusing?
>>
>
>
> I can't really see what they would otherwise be - what would be the
> alternative interpretation? I could clarify that of course.
>

Well, the only (?) other place where you interpret lists of binaries is in
io_lists, and there it means the concatenation of the binaries. This is what
I thought of when I first saw it, though I couldn't understand why you didn't
just have one binary. :-)
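
To make the two readings concrete, here is a minimal sketch. It assumes the
match/2 function from the EEP is available as binary:match/2 and returns
{Start, Length}; the module and function names are made up for illustration.

```erlang
-module(pattern_list_demo).
-export([as_iolist/0, as_alternatives/0]).

%% In an iolist, a list of binaries means their concatenation.
as_iolist() ->
    iolist_to_binary([<<"ab">>, <<"cd">>]).

%% As a search pattern (assumed binary:match/2), the same kind of list
%% means "any one of these".
as_alternatives() ->
    binary:match(<<"xxcd">>, [<<"ab">>, <<"cd">>]).
```

The first call gives <<"abcd">>; the second finds <<"cd">> at offset 2 and
gives {2,2}, since the concatenation <<"abcd">> never occurs in the subject.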

>> - Does 're' always return the first *shortest* match? I thought that it
>> returned the first matching alternative irrespective of length. Could be
>> wrong though. Always returning the shortest is very restrictive.
>>
>
> If you are to return non-overlapping matches and want to return the first
> match, you could either select the longest or the shortest at the first
> matching point. Without an option to control that, you of course have to
> select one and be consistent. The "shortest" behaviour is consistent with
> re, but not with regexp, which uses the other strategy. The regexp module
> therefore does not return the largest number of non-overlapping matches
> possible, as opposed to re. I think the re behaviour is the best and have
> chosen that for the binary module.
>

No, no, no. First, 'regexp' follows POSIX and always returns the longest
match where possible; there is no way of affecting this. The 're' module
does the same as Perl, being PCRE, and it *does* take the lexically first
alternative in a match pattern. This gives you some control; Friedl shows
how you can use it. So from your example:

1> re:run("abc","ab|abc",[global]).
{match,[[{0,2}]]}
2> re:run("abc","abc|ab",[global]).
{match,[[{0,3}]]}                                 <===
3> re:run("abc","ab|abc|c",[global]).
{match,[[{0,2}],[{2,1}]]}
4> re:run("abc","abc|ab|c",[global]).
{match,[[{0,3}]]}                                 <===

I think that if you are going to have alternative binaries then it would be
best to choose the same way. There will be a difficulty in explaining what
you mean by first of the first, pretty much like trying to explain how you
pick messages off the message queue in a receive with multiple patterns.
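
As a rough sketch of what "first of the first" could mean for binaries: scan
the subject from the left, and at each position try the alternatives in list
order, taking the first one that fits. The module below is hypothetical and
naive, not the proposed implementation; it only pins down the semantics.

```erlang
-module(first_alt).
-export([match/2]).

%% Hypothetical helper: leftmost position wins, and at that position
%% the alternative that comes first in the list wins.
match(Subject, Alternatives) when is_binary(Subject), is_list(Alternatives) ->
    match(Subject, Alternatives, 0).

match(Subject, Alternatives, Pos) when Pos =< byte_size(Subject) ->
    <<_:Pos/binary, Rest/binary>> = Subject,
    case first_prefix(Rest, Alternatives) of
        {ok, Len} -> {Pos, Len};
        none -> match(Subject, Alternatives, Pos + 1)
    end;
match(_Subject, _Alternatives, _Pos) ->
    nomatch.

%% Return the length of the first alternative that is a prefix of Rest.
first_prefix(_Rest, []) ->
    none;
first_prefix(Rest, [Alt | Alts]) ->
    Len = byte_size(Alt),
    case Rest of
        <<Alt:Len/binary, _/binary>> -> {ok, Len};
        _ -> first_prefix(Rest, Alts)
    end.
```

With this, first_alt:match(<<"abc">>, [<<"ab">>, <<"abc">>]) gives {0,2} and
first_alt:match(<<"abc">>, [<<"abc">>, <<"ab">>]) gives {0,3}, the same as
the first two 're' calls.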

>> - This is the first time we have a copy function.
>>
>
>
> Yes. The usage is twofold and not really attractive. The need for such a
> "cloning" function is however obvious from discussions on erlang-questions.
>

One thing that worries me with having a "cloning" function like copy, and
the other functions like referenced_byte_size is that with them you try to
make global decisions with local data. If you get it wrong it can go so
horribly wrong.
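
A small sketch of the kind of code I mean, assuming copy/1 ends up as
binary:copy/1 (the module and function names here are invented):

```erlang
-module(copy_demo).
-export([keep_prefix/2]).

%% Prefix is a sub-binary that may keep all of Big alive.
%% Copying it is a purely local decision: this function cannot know
%% whether someone else still holds Big, in which case the copy only
%% costs extra memory. binary:referenced_byte_size/1 has the same
%% problem, as it only reports on the one binary you hand it.
keep_prefix(Big, N) ->
    <<Prefix:N/binary, _/binary>> = Big,
    binary:copy(Prefix).
```

Whether the copy helps or hurts depends entirely on what the rest of the
system does with Big, which is exactly what this function can't see.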


>
>> - I am really serious about numbering from 0.
>>
>
>
> Understood. I won't however change that unless you buy Ericsson or OTP and
> enforce new design rules ;) ... In which case I would not change it anyway
> because you would fire me on the instant I suspect :D


Well, be careful, I have a few shares and I will start saving money so I can
buy up the company! :-)


> To summarize:
>
> - I'll clarify that multiple search strings mean they are alternatives.
> - I'll think about an option for getting the longest matches instead of
>  the shortest when having multiple alternatives if that is really
>  interesting to anyone. What do others think?
> - Any suggestion for other interface/interfaces than the copy-functions?
>  Including the spelled out twofold functionality?
>

Here, take the first alternative as 're' does, which takes the lexically
first one; see the 're' example above.


> - Zero based indices will not change, although we fully understand the
>  objection.
>

Then I think you should have as a goal to make all binary/bitstring
functions be 0-based. One way is to deprecate the BIFs in 'erlang' and
replace them, either there or in 'binary'.

The first to go should be binary_to_list/3, which uses positions so you can't
extract a zero-length binary. Have bin_to_list(Binary, Start, Length)
instead!
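
Roughly what I have in mind, as a sketch only (the module name is invented,
and Start is assumed to be a 0-based byte offset):

```erlang
-module(bin_list_demo).
-export([bin_to_list/3]).

%% Start is a 0-based byte offset and Length a byte count, so
%% Length = 0 is perfectly legal and gives the empty list.
bin_to_list(Binary, Start, Length) ->
    <<_:Start/binary, Part:Length/binary, _/binary>> = Binary,
    binary_to_list(Part).
```

So bin_list_demo:bin_to_list(<<1,2,3>>, 1, 2) gives [2,3] and
bin_list_demo:bin_to_list(<<1,2,3>>, 1, 0) gives [].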


> - The suggested list conversion functions will be added (if no one
>  objects strongly, with good, valid and sound arguments :)).
> - Unless a large part of the community is for a change to the name sub
>  (instead of replace, I suppose you mean, not instead of part?), I
>  will not change that.
>

In this case I was thinking more along the lines of subbin instead of part.
But I would prefer sub/gsub instead of replace as well. :-)

Robert

