bug or feature?

Thu Apr 20 05:29:34 CEST 2006

I wrote a short clear solution to the "remove a prefix if present otherwise
do nothing" problem that uses only existing features and functions.

Serge Aleynikov <serge@REDACTED> wrote:
	In a general case you are absolutely right as the complexity of the 
	proposed solution is still linear.  Would anyone care that the other 
	approach would work 2 to 3 times faster?

Which other approach?  The message I was replying to had two.

	When it comes to soft-real-time applications with extensive
	prefix analysis, I would.

If you have EXTENSIVE prefix analysis, ALL the solutions on the table
must count as inefficient.  This is actually a wonderful example of what
I have come to call "the PERL effect".  The language offers you a feature
that makes it simple, straightforward, incredibly easy ... to do the wrong
thing.

What do I mean in this case?  All three of the solutions (the two that work
and the one that isn't (yet) legal syntax) require O(|Prefix|) time to check
for a prefix.  If we have N prefixes, an average of P characters long, the
time to check them all is O(P.N).  Using a binary search tree takes this
down to O(P.lg N).  Using a trie instead takes the time down to O(P).  And
the key trick is to see the problem as doing MANY prefix checks, not as
doing one.  The difference between a built-in reversible ++ and one you
program hardly matters compared with an O(N) speedup.

Of course, if "extensive prefix analysis" means analysing many strings
using a small number of prefixes, it's easy enough to generate an
Erlang function (checking for those particular prefixes) *as a data structure*
and cause it to be compiled, so that even in that case the main result from
the generalised "feature" would be to prevent you thinking of a better
solution.

	A more rhetorical question would be that if '++' is a "syntactic
	sugar" why wouldn't that syntactic sugar apply for bound Prefix
	variables as well to make it more generic?

The term "syntactic sugar" refers to a construct which can be simply and
mechanically replaced by some other construct(s) in the language and
exists only to make the language more convenient.

The pattern
    "c1...cn" ++ Pat
is justly called "syntactic sugar" because it is syntactic sugar *FOR*
something, namely
    [$c1|[$c2|...[$cn|Pat]...]]
with a KNOWN number of pairs.  The pattern
    Var ++ Pat
should it ever exist, would NOT be syntactic sugar at all; there is nothing
that could replace it, no combination of existing pattern notation could be
made to do the same thing.

To say that "Var ++ Pat" would not be syntactic sugar is not to say that it
would not be a useful extension but only to say that it WOULD be a non-trivial
extension of the language.  You couldn't hack it just be a small change to
the front end of the compiler, as literal++Pat was.  Personally, I believe
that it would NOT be a useful extension to the language; it is just too
specialised, and the problems it is good for have better solutions.