[eeps] EEP 31
Robert Virding
rvirding@REDACTED
Fri Dec 11 15:56:10 CET 2009
Comments follow.
2009/12/10 Patrik Nyblom <pan@REDACTED>
>
>> - I understand that giving a list of binaries for a pattern means that they
>> are alternatives. I could not see where this was stated and I wonder if it is
>> not a little confusing?
>>
>
>
> I can't really see what they would otherwise be - what would be the
> alternative interpretation? I could clarify that of course.
>
Well, the only (?) other place where you interpret lists of binaries is in
io_lists, and there it
means the concatenation of the binaries. This is what I thought of when I
first saw it, though I couldn't understand why you didn't just have one
binary. :-)
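
To make the iolist reading concrete, here is a minimal sketch (the module name iolist_demo is made up for illustration). In an iolist the list of binaries is simply flattened into one binary, which is the opposite of treating the elements as alternatives:

```erlang
-module(iolist_demo).
-export([concat/0]).

%% In an iolist, a list of binaries denotes their concatenation,
%% so the two elements below are joined into a single binary.
concat() ->
    iolist_to_binary([<<"ab">>, <<"c">>]).
```

Here iolist_demo:concat() gives <<"abc">>, one binary rather than a choice between <<"ab">> and <<"c">>.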
>> - Does 're' always return the first *shortest* match? I thought that it
>> returned the first matching alternative irrespective of length. Could be
>> wrong though. Always returning the shortest is very restrictive.
>>
>
> If you are to return non-overlapping matches and want to return the first
> match, you could either select the longest or the shortest at the first
> matching point. Without an option to control that, you of course have to
> select one and be consistent. The "shortest" behaviour is consistent with
> re, but not with regexp, which uses the other strategy. The regexp module
> does therefore not return the largest number of non-overlapping matches
> possible, as opposed to re. I think the re behaviour is the best and have
> chosen that for the binary module.
>
No, no, no. First 'regexp' follows POSIX and always returns the longest
match where possible; there is no way of affecting this. The 're' module
does the same as Perl, being PCRE, and it *does* take the lexically first
alternative in a match pattern. This gives you some control; Friedl shows
how you can use it. So from your example:
1> re:run("abc","ab|abc",[global]).
{match,[[{0,2}]]}
2> re:run("abc","abc|ab",[global]).
{match,[[{0,3}]]} <===
3> re:run("abc","ab|abc|c",[global]).
{match,[[{0,2}],[{2,1}]]}
4> re:run("abc","abc|ab|c",[global]).
{match,[[{0,3}]]} <===
I think that if you are going to have alternative binaries then it would be
best to choose the same way. There will be a difficulty in explaining what
you mean by first of the first, pretty much like trying to explain how you pick
messages off the message queue in a receive with multiple patterns.
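
As a rough sketch of that rule applied to binaries (first_alt is a hypothetical module written for this mail, not part of the proposed binary module): scan the subject from the left and, at each position, try the alternatives in list order, returning the first one that fits.

```erlang
-module(first_alt).
-export([match/2]).

%% Hypothetical helper, not an OTP function.
%% Returns {Pos, Len} for the earliest position at which some
%% alternative matches; at that position the alternative that comes
%% first in the list wins, regardless of its length.
match(Subject, Alternatives) when is_binary(Subject) ->
    match(Subject, Alternatives, 0).

match(Subject, _Alts, Pos) when Pos > byte_size(Subject) ->
    nomatch;
match(Subject, Alts, Pos) ->
    <<_:Pos/binary, Rest/binary>> = Subject,
    case try_alts(Rest, Alts) of
        {ok, Len} -> {Pos, Len};
        none -> match(Subject, Alts, Pos + 1)
    end.

%% Try each alternative in order as a prefix of Rest.
try_alts(_Rest, []) ->
    none;
try_alts(Rest, [Alt | Alts]) ->
    Len = byte_size(Alt),
    case Rest of
        <<Alt:Len/binary, _/binary>> -> {ok, Len};
        _ -> try_alts(Rest, Alts)
    end.
```

With this, first_alt:match(<<"abc">>, [<<"ab">>, <<"abc">>]) gives {0,2}, while swapping the alternatives gives {0,3}, mirroring calls 1 and 2 in the 're' session.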
>> - This is the first time we have a copy function.
>>
>
>
> Yes. The usage is twofold and not really attractive. The need for such a
> "cloning" function is however obvious from discussions on erlang-questions.
>
One thing that worries me with having a "cloning" function like copy, and
the other functions like referenced_byte_size is that with them you try to
make global decisions with local data. If you get it wrong it can go so
horribly wrong.
>
>> - I am really serious about numbering from 0.
>>
>
>
> Understood. I won't however change that unless you buy Ericsson or OTP and
> enforce new design rules ;) ... In which case I would not change it anyway
> because you would fire me on the instant I suspect :D
Well, be careful I have a few shares and I will start saving money so I can
buy up the company! :-)
> To summarize:
>
> - I'll clarify that multiple search strings mean they are alternatives.
> - I'll think about an option for getting the longest matches instead of
> the shortest when having multiple alternatives if that is really
> interesting to anyone. What do others think?
> - Any suggestion for other interface/interfaces than the copy-functions?
> Including the spelled out twofold functionality?
>
Here take the first alternative as in 're' which takes the lexically first,
see example.
> - Zero based indices will not change, although we fully understand the
> objection.
>
Then I think you should have as a goal to make all binary/bitstring
functions be 0-based. One way is to deprecate the BIFs in 'erlang' and
replace them, either there or in 'binary'.
The first to go should be binary_to_list/3 which uses positions so you can't
extract a zero-length binary. Have bin_to_list(Binary, Start, Length)
instead!
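
A minimal sketch of what is meant, assuming a zero-based Start and a Length in bytes (the module name zero_based is made up, and this is only one way such a function could be written, not the actual implementation):

```erlang
-module(zero_based).
-export([bin_to_list/3]).

%% Hypothetical start/length variant of binary_to_list/3.
%% Start is zero-based and Length may be 0, which yields [].
%% Fails with badmatch if the range lies outside the binary.
bin_to_list(Bin, Start, Length) ->
    <<_:Start/binary, Part:Length/binary, _/binary>> = Bin,
    binary_to_list(Part).
```

So zero_based:bin_to_list(<<"abc">>, 0, 2) gives "ab" and zero_based:bin_to_list(<<"abc">>, 1, 0) gives [], whereas the existing binary_to_list(<<"abc">>, 1, 2) takes one-based first and last positions and also gives "ab".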
> - The suggested list conversion functions will be added (if no one
> objects strongly, with good, valid and sound arguments :)).
> - Unless a large part of the community is for a change of the name sub
> (for instead of replace I suppose you mean, not instead of part?), I
> will not change that.
>
In this case I was thinking more along the lines of subbin instead of part. But
I would prefer sub/gsub instead of replace as well. :-)
Robert