[erlang-questions] examples for erlang with joins

Richard O'Keefe ok@REDACTED
Tue Mar 10 00:28:32 CET 2009


On 9 Mar 2009, at 9:40 pm, Zvi wrote:

>
> Actually, I simplified my example from mochiweb. When I wrote the
> message I was thinking about this:
>
> tokenize_script(Bin, S=#decoder{offset=O}, Start) ->
>   case Bin of
>        %% Just a look-ahead, we want the end_tag separately
>        <<_:O/binary, $<, $/, SS, CC, RR, II, PP, TT, _/binary>>
>                  when (SS =:= $s orelse SS =:= $S) andalso
>                          (CC =:= $c orelse CC =:= $C) andalso
>                          (RR =:= $r orelse RR =:= $R) andalso
>                          (II =:= $i orelse II =:= $I) andalso
>                          (PP =:= $p orelse PP =:= $P) andalso
>                          (TT =:= $t orelse TT =:= $T) -> ...
>
> i.e. match ignoring case "</script" closing tag:
>
> <<_:O/binary, $<, $/, $s^$S, $c^$C, $r^$R, $i^$I, $p^$P, $t^$T, _/binary>>

Please, _please_ do not use ^ to mean "or".
We already have four ways to say "or" in Erlang (;, or, orelse, bor).
We do not need a fifth.

The problem before us is "recognise an HTML end-tag in a binary".
An end-tag for "script" has the form
	"</[Ss][Cc][Rr][Ii][Pp][Tt][[:space:]]*>"
using POSIX regular expression syntax.

The most natural way to express this would be something like

     Script_End = "^</[Ss][Cc][Rr][Ii][Pp][Tt][[:space:]]*>",
     case eep9:match(Bin, Script_End, {O, byte_size(Bin)})
	of 0 -> ...not matched...
	 ; _ -> ...matched...
     end

and then we could start worrying about some kind of parse transform
to precompile the regular expression.
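For comparison, here is a sketch of the same test written against the `re` module in the standard library (PCRE-based), swapped in for the proposed `eep9:match/3` interface. The module and function names (`script_end_re`, `match_at/2`) are hypothetical. One detail matters: the leading `^` has been dropped in favour of the `anchored` option, because in PCRE `^` only matches at the start of the whole subject, not at the offset handed to `re:run/3`.

```erlang
-module(script_end_re).
-export([match_at/2]).

%% Hypothetical helper: does Bin contain a "</script>" end-tag (any
%% letter case, optional white space before ">") starting exactly at
%% byte offset O?  Uses the stdlib re module, not the proposed eep9.
match_at(Bin, O) when is_binary(Bin), is_integer(O), O >= 0, O =< byte_size(Bin) ->
    %% No leading "^": it would never match for O > 0.  The 'anchored'
    %% option pins the match to the start offset instead.
    {ok, MP} = re:compile("</[Ss][Cc][Rr][Ii][Pp][Tt][[:space:]]*>", [anchored]),
    case re:run(Bin, MP, [{offset, O}]) of
        {match, [{O, Len}]} -> {matched, Len};  % Len = length of the tag
        nomatch             -> not_matched
    end.
```

Compiling the pattern on every call is wasteful; a real version would compile it once, or at compile time as suggested above.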

We had a mail thread on disjunctive patterns, was it last year?
Note that disjunctive patterns don't quite meet the need here.
What if there is a "</script-wizard>" end-tag?  A really robust
check does need to look for 0 or more white space characters
followed by ">", and disjunctive patterns don't help with that.

They are certainly a very heavy sledgehammer to crack the nut of
either-case matching.

(With reference to the title of this thread, this is an example
of a pattern with "or" in it.  Joins are all about patterns
with "and".)
>

> In my code I usually just convert everything to lower case before
> matching, but this is less efficient.

Is it?  Have you measured it?
Take this example again:

     case Bin
       of <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
          when (S bor 32) =:= $s,
	      (C bor 32) =:= $c,
	      (R bor 32) =:= $r,
	      (I bor 32) =:= $i,
	      (P bor 32) =:= $p,
	      (T bor 32) =:= $t
	   -> ...
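Why is one `bor 32` test enough? In ASCII, upper- and lower-case letters differ only in bit 5 (value 32), and a lower-case target such as `$s` already has that bit set, so exactly two byte values satisfy `(X bor 32) =:= $s`. The hypothetical helper below just enumerates them to make the point concrete.

```erlang
-module(case_bits).
-export([preimages/1]).

%% Return every byte value X (0..255) for which (X bor 32) =:= Target.
%% For a lower-case letter Target this is exactly [Upper, Lower],
%% because the two cases differ only in bit 5 (value 32).
preimages(Target) ->
    [X || X <- lists:seq(0, 255), (X bor 32) =:= Target].
```

`case_bits:preimages($s)` gives `"Ss"`, and `case_bits:preimages($S)` gives `[]`, since `X bor 32` always has bit 5 set; the constant must therefore be the lower-case one. The trick also pairs Latin-1 letters such as 192 and 224, but it pairs 215 and 247 as well, which are not letters at all, so it should be applied only to cased letters.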

I suspect this is faster than the version full of orelses;
it is certainly shorter.  Maybe a macro:

     -define(LC(X), ((X) bor 32)).
	% Use this only when X is known to be a Latin-1 cased
	% letter or is about to be tested for being one.

     case Bin
       of <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
          when ?LC(S) =:= $s, ?LC(C) =:= $c, ?LC(R) =:= $r,
	      ?LC(I) =:= $i, ?LC(P) =:= $p, ?LC(T) =:= $t
	   -> ...
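The macro version can be extended to cover the "</script-wizard>" concern: after the six letters, skip zero or more white-space bytes and then insist on ">". A minimal sketch follows; the module and function names are hypothetical, and "white space" is assumed here to mean the HTML space characters (space, tab, LF, FF, CR).

```erlang
-module(script_end).
-export([end_tag_at/2]).

%% Use this only when X is a Latin-1 cased letter or is about to be
%% compared against a lower-case letter constant.
-define(LC(X), ((X) bor 32)).

%% Hypothetical helper: if a "</script" end-tag (any case, optional
%% white space, then ">") starts at byte offset O of Bin, return
%% {ok, Next}, where Next is the offset just past the ">".
end_tag_at(Bin, O) ->
    case Bin of
        <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
          when ?LC(S) =:= $s, ?LC(C) =:= $c, ?LC(R) =:= $r,
               ?LC(I) =:= $i, ?LC(P) =:= $p, ?LC(T) =:= $t ->
            close_tag(Bin, O + 8);          % skip "</" plus six letters
        _ ->
            nomatch
    end.

%% Skip white space, then require ">".  Anything else (for example the
%% "-" in "</script-wizard>") means this is not a script end-tag.
close_tag(Bin, Pos) ->
    case Bin of
        <<_:Pos/binary, W, _/binary>>
          when W =:= $\s; W =:= $\t; W =:= $\n; W =:= $\f; W =:= $\r ->
            close_tag(Bin, Pos + 1);
        <<_:Pos/binary, $>, _/binary>> ->
            {ok, Pos + 1};
        _ ->
            nomatch
    end.
```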

Right now, without 'andalso' and 'orelse', that's probably
about as good as the match is going to get, and it's not _too_
horrible to write.

I haven't actually been able to compare the code, because
the first version crashes erlc.  Pleasantly, my simplified
version does not.
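Whether lower-casing first really is slower can only be settled by measuring. Here is a minimal sketch of a harness that times the two approaches with `timer:tc/1`; the module and function names are hypothetical, the lower-casing is ASCII-only, and no timings are claimed here, since results will depend on input size, offset, and OTP release.

```erlang
-module(lc_bench).
-export([via_lowercase/2, via_bor/2, bench/4]).

%% Approach 1: lower-case the whole binary (ASCII only), then match
%% the tag literally.  Note this builds a new binary on every call.
via_lowercase(Bin, O) ->
    case lower(Bin) of
        <<_:O/binary, "</script", _/binary>> -> true;
        _ -> false
    end.

%% Approach 2: match the raw bytes and fold case in the guard.
via_bor(Bin, O) ->
    case Bin of
        <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
          when (S bor 32) =:= $s, (C bor 32) =:= $c, (R bor 32) =:= $r,
               (I bor 32) =:= $i, (P bor 32) =:= $p, (T bor 32) =:= $t ->
            true;
        _ ->
            false
    end.

%% Map A-Z to a-z, leave every other byte alone.
lower(Bin) ->
    << <<(if C >= $A, C =< $Z -> C + 32; true -> C end)>> || <<C>> <= Bin >>.

%% Call Fun(Bin, O) N times; return elapsed microseconds.
bench(Fun, Bin, O, N) ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, Bin, O, N) end),
    Micros.

repeat(_Fun, _Bin, _O, 0) -> ok;
repeat(Fun, Bin, O, N) -> _ = Fun(Bin, O), repeat(Fun, Bin, O, N - 1).

%% Example (in the shell); "</SCRIPT>" starts at offset 12:
%%   Bin = <<"<p>hello</p></SCRIPT>">>,
%%   lc_bench:bench(fun lc_bench:via_lowercase/2, Bin, 12, 100000),
%%   lc_bench:bench(fun lc_bench:via_bor/2, Bin, 12, 100000).
```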




