[erlang-questions] examples for erlang with joins
Richard O'Keefe
ok@REDACTED
Tue Mar 10 00:28:32 CET 2009
On 9 Mar 2009, at 9:40 pm, Zvi wrote:
>
> Actually, I simplified my example from mochiweb. When I wrote the
> message I was thinking about this:
>
> tokenize_script(Bin, S=#decoder{offset=O}, Start) ->
>     case Bin of
>         %% Just a look-ahead, we want the end_tag separately
>         <<_:O/binary, $<, $/, SS, CC, RR, II, PP, TT, _/binary>>
>           when (SS =:= $s orelse SS =:= $S) andalso
>                (CC =:= $c orelse CC =:= $C) andalso
>                (RR =:= $r orelse RR =:= $R) andalso
>                (II =:= $i orelse II =:= $I) andalso
>                (PP =:= $p orelse PP =:= $P) andalso
>                (TT =:= $t orelse TT =:= $T) -> ...
>
> i.e. match ignoring case "</script" closing tag:
>
> <<_:O/binary, $<, $/, $s^$S, $c^$C, $r^$R, $i^$I, $p^$P, $t^$T, _/binary>>

Please, _please_ do not use ^ to mean "or".
We already have four ways to say "or" in Erlang (;, or, orelse, bor).
We do not need a fifth.

The problem before us is "recognise an HTML end-tag in a binary".
An end-tag for "script" has the form
    "</[Ss][Cc][Rr][Ii][Pp][Tt][[:space:]]*>"
using POSIX regular expression syntax.
The most natural way to express this would be something like

    Script_End = "^</[Ss][Cc][Rr][Ii][Pp][Tt][[:space:]]*>",
    case eep9:match(Bin, Script_End, {O, byte_size(Bin)})
      of 0 -> ...not matched...
       ; _ -> ...matched...
    end

and then we could start worrying about some kind of parse transform
to precompile the regular expression.
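
As a rough sketch of the same idea with a library that does exist, here is a version using the stdlib `re` module in place of the `eep9:match/3` call above. The module name `script_end_re` and the function `is_script_end/2` are made up for illustration, and I am assuming that `anchored` combined with `{offset, O}` pins the match to byte offset O, which is how I read the `re` documentation.

```erlang
-module(script_end_re).
-export([is_script_end/2]).

%% Returns true if a "script" end-tag starts at byte offset O of Bin.
%% Hypothetical helper; re:run/3 stands in for eep9:match/3.
is_script_end(Bin, O)
  when is_binary(Bin), is_integer(O), O >= 0, O =< byte_size(Bin) ->
    Pattern = "</script[[:space:]]*>",
    %% caseless          : accept any mix of upper and lower case
    %% anchored + offset : the match must begin exactly at offset O
    %% {capture, none}   : we only want match / nomatch
    Opts = [caseless, anchored, {offset, O}, {capture, none}],
    re:run(Bin, Pattern, Opts) =:= match.
```

For example, `script_end_re:is_script_end(<<"ab</ScRiPt >">>, 2)` should accept, while a `</script-wizard>` tag at the same offset should be rejected because `-` is neither white space nor `>`.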

We had a mail thread on disjunctive patterns, was it last year?
Note that disjunctive patterns don't quite meet the need here.
What if there is a "</script-wizard>" end-tag? A really robust
check does need to look for 0 or more white space characters
followed by ">", and disjunctive patterns don't help with that.
They are certainly a very heavy sledgehammer to crack the nut of
either-case matching.

(With reference to the title of this thread, this is an example
of a pattern with "or" in it. Joins are all about patterns
with "and".)

>
> In my code I usually just convert everything to lower case before
> matching, but this is less efficient.
Is it? Have you measured it?
Take this example again:
    case Bin
      of <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
         when (S bor 32) =:= $s,
              (C bor 32) =:= $c,
              (R bor 32) =:= $r,
              (I bor 32) =:= $i,
              (P bor 32) =:= $p,
              (T bor 32) =:= $t
      -> ...
I suspect this is faster than the version full of orelses;
it is certainly shorter. Maybe a macro:
    -define(LC(X), ((X) bor 32)).
    % Use this only when X is known to be a Latin-1 cased
    % letter or is about to be tested for being one.

    case Bin
      of <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
         when ?LC(S) =:= $s, ?LC(C) =:= $c, ?LC(R) =:= $r,
              ?LC(I) =:= $i, ?LC(P) =:= $p, ?LC(T) =:= $t
      -> ...
Right now, without 'andalso' and 'orelse', that's probably
about as good as the match is going to get, and it's not _too_
horrible to write.
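
To make that concrete, here is a minimal self-contained sketch that wraps the macro version in a function and adds the "0 or more white space characters followed by >" check, so that "</script-wizard>" is rejected. The module name `script_end_bin` and the helpers `is_script_end/2` and `closes_tag/1` are invented for this example, and the set of white space characters is my assumption about what [[:space:]] should cover here.

```erlang
-module(script_end_bin).
-export([is_script_end/2]).

%% Fold a cased letter to lower case by setting bit 5 (value 32).
%% Only meaningful when the result is compared against a lower-case letter.
-define(LC(X), ((X) bor 32)).

%% Returns true if a "script" end-tag starts at byte offset O of Bin.
is_script_end(Bin, O) ->
    case Bin of
        <<_:O/binary, $<, $/, S, C, R, I, P, T, Rest/binary>>
          when ?LC(S) =:= $s, ?LC(C) =:= $c, ?LC(R) =:= $r,
               ?LC(I) =:= $i, ?LC(P) =:= $p, ?LC(T) =:= $t ->
            closes_tag(Rest);
        _ ->
            false
    end.

%% Skip white space, then require ">". Anything else (such as the
%% "-" in "</script-wizard>") means this is not the tag we want.
closes_tag(<<$>, _/binary>>) ->
    true;
closes_tag(<<W, Rest/binary>>)
  when W =:= $\s; W =:= $\t; W =:= $\n; W =:= $\r; W =:= $\f ->
    closes_tag(Rest);
closes_tag(_) ->
    false.
```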
I haven't actually been able to compare the code, because
the first version crashes erlc. Pleasantly, my simplified
version does not.
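
For anyone who wants to measure rather than guess, something along these lines should do on a system where both versions compile. It is only a sketch: the module name `script_end_bench`, the test binary, and the helper `repeat/4` are all made up, it assumes a release that has `timer:tc/1`, and a single `timer:tc` run is a crude measurement, so treat the numbers as a hint rather than a result.

```erlang
-module(script_end_bench).
-export([run/1, with_orelse/2, with_bor/2]).

%% The guard style from the quoted mochiweb code.
with_orelse(Bin, O) ->
    case Bin of
        <<_:O/binary, $<, $/, SS, CC, RR, II, PP, TT, _/binary>>
          when (SS =:= $s orelse SS =:= $S) andalso
               (CC =:= $c orelse CC =:= $C) andalso
               (RR =:= $r orelse RR =:= $R) andalso
               (II =:= $i orelse II =:= $I) andalso
               (PP =:= $p orelse PP =:= $P) andalso
               (TT =:= $t orelse TT =:= $T) ->
            true;
        _ ->
            false
    end.

%% The "bor 32" guard style.
with_bor(Bin, O) ->
    case Bin of
        <<_:O/binary, $<, $/, S, C, R, I, P, T, _/binary>>
          when (S bor 32) =:= $s, (C bor 32) =:= $c, (R bor 32) =:= $r,
               (I bor 32) =:= $i, (P bor 32) =:= $p, (T bor 32) =:= $t ->
            true;
        _ ->
            false
    end.

%% Time N calls of each version; results are in microseconds.
run(N) when is_integer(N), N >= 0 ->
    Prefix = <<"<script>var x = 1;">>,
    Bin = <<Prefix/binary, "</SCRIPT>">>,
    O = byte_size(Prefix),   % offset of the end-tag in Bin
    {T1, ok} = timer:tc(fun() -> repeat(fun with_orelse/2, Bin, O, N) end),
    {T2, ok} = timer:tc(fun() -> repeat(fun with_bor/2, Bin, O, N) end),
    [{orelse_us, T1}, {bor_us, T2}].

%% Call F(Bin, O) N times, checking that it matches every time.
repeat(_F, _Bin, _O, 0) ->
    ok;
repeat(F, Bin, O, N) when N > 0 ->
    true = F(Bin, O),
    repeat(F, Bin, O, N - 1).
```

Calling `script_end_bench:run(1000000)` a few times and comparing the two figures would at least show whether the difference is worth caring about.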