[erlang-questions] erlang-questions Digest, Vol 17, Issue 45

Fri Oct 10 07:24:03 CEST 2008

On Thu, Oct 9, 2008 at 11:53 PM, Richard O'Keefe <ok@REDACTED> wrote:

> On 10 Oct 2008, at 3:11 pm, Edwin Fine wrote:
>
>> Ok, then to preserve the Principle Of Least Astonishment, let string:split
>> accept a regular expression, which is just a string with special RE
>> operators. If the string contains no RE operators, use an optimized special
>> case of split (like the one you wrote) that does not use an RE engine. Get
>> the best of both worlds.
>>
>
> No, that *violates* the principle of least astonishment, Big Time!
>
> First, absolutely nothing else whatever in the 'string' module
> has anything to do with regular expressions.  This would be
> highly exceptional and very confusing.
>

I disagree. Take for example the String classes of Ruby (
http://www.ruby-doc.org/core/classes/String.html#M000818), JavaScript (
http://www.w3schools.com/jsref/jsref_split.asp) and Java (
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)),
all of which use split with an RE. It's very intuitive to use and doesn't
seem odd or out of place in a string class or module at all. One could argue
that strings and regular expressions are inextricably intertwined (Python,
on the other hand, puts its RE-based split in the re module as far as I can
see, so it's not universally true).

> Second, something that splits strings using regular expressions
> would be expected in the 'regexp' module, not the 'string' module.
> In fact, regexp:split/2 already exists.  It would be rather
> astonishing to replicate this in the wrong module.
>

I think it's a matter of taste or opinion. I don't think split belongs in
the regexp module. The new re module correctly does not include it. I think
regexp:split should be deprecated. In fact, the entire regexp module should
be deprecated: enough people have used it and been burned by its lack of
performance (it is written in Erlang when it should have been written in C)
that it warrants an entire Caveat section in the Erlang Efficiency Guide.

> Third, if there were a string:split/2 that used regular
> expressions, that would make it very incompatible with
> string:join/2, which doesn't.  I thought we were wanting
>        string:join(string:split(String, Sep), Sep)
> to give back Sep.
>

Did you mean to write "to give back String"?

Yes, that is what we want, and it is what will happen, because it is
meaningless to join using an RE, isn't it? You can't join things together
with a regular expression. So Sep would *have* to be a literal string. As it
happens, split and join work exactly like this in other languages and don't
seem to overly confuse people.
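
A minimal sketch of the round-trip property under discussion, assuming a
literal (non-RE) separator. The module name sep_demo and the functions
unjoin/2 and join/2 are hypothetical illustrations written for this example;
they are not existing OTP functions and not anyone's proposed implementation.

```erlang
-module(sep_demo).
-export([unjoin/2, join/2]).

%% Hypothetical unjoin/2: split String at every literal occurrence of Sep.
%% Sep must be a non-empty string and is never treated as a regular expression.
unjoin(String, Sep = [_ | _]) ->
    unjoin(String, Sep, [], []).

%% Piece is the current field (reversed); Acc is the finished fields (reversed).
unjoin([], _Sep, Piece, Acc) ->
    lists:reverse([lists:reverse(Piece) | Acc]);
unjoin(String = [C | Rest], Sep, Piece, Acc) ->
    case lists:prefix(Sep, String) of
        true ->
            %% Separator found: close the current field and skip past Sep.
            unjoin(lists:nthtail(length(Sep), String), Sep, [],
                   [lists:reverse(Piece) | Acc]);
        false ->
            unjoin(Rest, Sep, [C | Piece], Acc)
    end.

%% Hypothetical join/2, same argument order as string:join/2:
%% concatenate the fields with Sep between each adjacent pair.
join([], _Sep) -> [];
join([H | T], Sep) ->
    lists:append([H | [Sep ++ S || S <- T]]).
```

With these definitions, sep_demo:unjoin("a.b.c", ".") gives ["a","b","c"],
and sep_demo:join(sep_demo:unjoin(S, Sep), Sep) gives back S, including for
inputs with empty fields such as "a..b". The same "." literal is used on both
sides, with no escaping.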

> In short, if you want something called 'split' that preserves
> the principle of least astonishment, use regexp:split/2.
> If you want an opposite of string:join/2, DON'T make it use
> regular expressions, and DON'T call it split/2.
>

So we are stuck with contriving some name because of history (lists:split
should have been lists:split_at, as in Haskell; regexp already has split).
This, in spite of there being more than enough precedents amongst other
languages that use split as a method of their String classes.

> By the way, I have a use for
>        string:join(reverse([Stuff|tail(reverse(
>        string:unjoin(Thing, ".")))]), ".")
> so I would be very unhappy to have to write one of
> those string literals as "\\." and the other NOT.
>

That's a VERY good point. It's a glaring inconsistency. Other languages deal
with that by having a regular expression type and special syntax (e.g. /abc/
or %r{abc}) to avoid confusion, and (IIRC) if a string type is passed
instead of an RE type, the receiving method treats it as a non-RE. One would
then be able to use the same literal in both join and unjoin. Maybe Erlang
should have a new regular expression type and syntax, seeing as it is going
to be used in more and more applications that do heavy text processing.
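
One way to picture the "separate RE type" idea without new syntax is to tag
regular expressions explicitly and treat every plain string as a literal.
This is only a sketch: the module name sep_kind_demo and the {re, Pattern}
tag are hypothetical, and it relies on string:split/3 and re:split/3 as found
in current Erlang/OTP releases, which postdate this thread.

```erlang
-module(sep_kind_demo).
-export([split/2]).

%% Hypothetical convention: {re, Pattern} marks a regular expression;
%% a plain string is always a literal separator.
split(String, {re, Pattern}) ->
    re:split(String, Pattern, [{return, list}]);
split(String, Sep) when is_list(Sep) ->
    string:split(String, Sep, all).
```

Under this convention sep_kind_demo:split("a.b.c", ".") treats "." literally
and gives ["a","b","c"], so the same "." can be handed to a join function
unchanged. Only a caller who asks for an RE, as in
sep_kind_demo:split("a.b.c", {re, "\\."}), has to think about escaping.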

> There is, or should be, a regular pattern
>       words <-> unwords
>       lines <-> unlines
>       unjoin <-> join
> Oh well, sort of regular...
>
>> Sorry, and no offense meant, but that's really horribly ugly.
>>
>
> So you really hate it that English has "tie" and "untie",
> "do" and "undo", "wary" and "unwary", amongst many pairs?
>

Maybe it's because I'm accustomed to the words "tie" and "untie" and the
others you mentioned, and unlines, unwords and so on just sound ghastly to
me due to lack of familiarity. I honestly can't intuit what unwords and
unlines do. Well, that's not true. If I think about it, I suppose lines
would split a string into lines (similarly "words" would split a string into
words), and unlines would "undo" the lines operation and join them back
together. But I did have to think about it for a while. Maybe I'm not that
smart.

>
>> Reminds me of that sentence, "A not unblack dog chased a not unbrown
>> rabbit across a not ungreen field."
>>
>
> Straw man.  The nasty thing about that sentence is the double
> negations.  There is no double negation in unjoin.

Yes, that is true, but in my eyes it's not only the double negation that's
ugly. I was citing that sentence to show the ugliness of using "un" in
places that are unusual. Note: "un" is not unusual in "unusual."

> "unwords" and "unlines" are not my inventions, they are
> from the Haskell standard Prelude.

There is no intended irony or sarcasm in this: seeing as the inventors of
Haskell are undoubtedly smarter than I am, I defer to their superior
intelligence, but those words are still ugly to me. Sorry.

>
>> Should the opposite of down then be "undown"? Languages (well, I can vouch
>> for two, anyway) contain many complementary words that are not syntactically
>> constructable by adding or removing "un-".
>>
>
> What of it?  The words/unwords lines/unlines pattern DOES exist
> in the language Erlang copied many of its list processing
> function names from.  It's not my invention.  Indeed, the
> Haskell community is as familiar with "unfold" as with "fold".
> In this specific context, UNdoing the effect of a list
> operation, "un-" is an excellent cultural fit.

Fold and unfold are words in the English dictionary, and I can't swear to it
because I am too lazy to look it up, but I would suspect that "unwords" and
"unlines" are not. Still, I take your point. One can get used to almost
anything given time. Not being a Haskell initiate, the usage looks strange
to me.

> The fact that for example "add" and "subtract" are not related
> in that way really doesn't signify anything, just as the fact
> that there are many colours doesn't mean that black is a bad
> choice.

But you wouldn't create a function named "unadd" when "subtract" is a more
acceptable usage - would you?

>> Like split and join, for example. Hard and soft. Big and small. Break and
>> mend. Wake and sleep.
>>
>
> Join is also related to sever, disunite, unfasten, disconnect, unyoke,
> separate, put asunder, unlink, disassociate, disaffiliate, resign,
> detach, disengage, leave, part, divide, quit, ...
>

Interesting. One of the words you wrote above gave me an idea. How about
meeting halfway? How about two new functions, string:separate(String,
Separator) and string:unseparate(List, Separator)? No clash and it makes
even more sense (to me) than split and join.

> Split is also related to unite, unify, connect, fragmént (stress shown),
> and another whole lot of words.
>
> It's not as if "split" and "join" were each other's _only_ relative.
>

Point(s) well taken. How do you feel about "separate" and "unseparate",
then?