bug or feature?

Richard A. O'Keefe <>
Mon Apr 24 00:21:37 CEST 2006


I wrote:
    > Of course, if "extensive prefix analysis" means analysing many strings
    > using a small number of prefixes, it's easy enough to generate an
    > Erlang function (checking for those particular prefixes) *as a data structure*
    > and cause it to be compiled, so that even in that case the main result from
    > the generalised "feature" would be to prevent you thinking of a better
    > solution.

Serge Aleynikov <> replied:
	Actually this is exactly the problem and the approach we took that 
	triggered this email thread.  Given a relatively small number of 
	prefixes (unknown at compile time) and analysing many strings, we tried 
	to generate a function doing a pattern match of a string against each 
	prefix.  It looked like having a fun(Prefix ++ Suffix, Prefix) notation 
	would be a convenient representation for this task, when we realised it 
	was an illegal expression.
	
No, from the sound of it, the approach you tried is NOT "exactly the
approach" that I'm talking about but very different indeed.  The approach
I am talking about would NOT use fun(Prefix++Suffix, Prefix)  -- in which
I for one would expect the first occurrence of Prefix to count as NOT BOUND.

Let's take an example.  Suppose the prefixes are
    http:
    https:
    ftp:
    tftp:

Here's the function:

    f(S0) ->
	[C1|S1] = S0,
	case C1
	  of $h ->
	     [C2|S2] = S1,
	     case C2
	       of $t ->
	          [C3|S3] = S2,
	          case C3
	            of $t ->
	               [C4|S4] = S3,
	               case C4
	                 of $p ->
	                    [C5|S5] = S4,
	                    case C4
	                      of $: -> /* Suffix is */ S5
	                       ; $s ->
	                         [C6|S6] = S5,
	                         case C6
	                           of $: -> /* Suffix is */ S6
	                            ; _  -> S0
	                         end
	                       ; _ -> S0
	                    end

oh the heck with it, you get the idea.  It's a trie.  It's not the kind of
code you want to write by hand (which is why I didn't want to finish it),
but it's the kind of code that is very easy to generate by computer.
It's fast:  each character of the input is examined at most once.
It generates no intermediate data structures.	          
And it has absolutely NO use for Prefix++Suffix in any form anywhere.

Ideally, if you wrote

    f("http:"  ++ S) -> S;
    f("https:" ++ S) -> S;
    f("ftp:"   ++ S) -> S;
    f("tftp:"  ++ S) -> S;
    f(            S) -> S.

the compiler would generate essentially the same trie.  Certainly that's
what I would expect a Haskell or SML compiler to do.  I don't know what
the current Erlang compiler will do with it, which is why I mention the
possibility of generating your own trie code.  It's really easy to do,
once you have got your head around how to represent Erlang code as data
structures.




More information about the erlang-questions mailing list