[erlang-questions] leex and case-insensitive match

Fri Jun 5 03:52:46 CEST 2015

One issue is exactly what it is you want to do.
Do you want all word-like tokens treated case-insensitively?
Or do you want a specific, fixed set of word-like things matched
case-insensitively and an open set of others matched case-sensitively?
Are we talking about an SQL dialect here?

Does case insensitivity apply to
 - the ASCII letters
 - the Latin-1 letters
 - the cased letters currently in the Basic Multilingual Plane
 - the cased letters currently anywhere in Unicode
 - any codepoints that may be designated as cased in past,
   present, or future versions of Unicode
and does it
 - require identity of accents or not (one way of writing
   French preserves accents when you uppercase, another way
   does not)
 - pay attention to the current locale (the famous
   English -vs- Turkish 'what is the capital of "i"' issue)
?
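
To make two of those questions concrete, here is a small Erlang sketch.
It assumes OTP 20 or later, since string:uppercase/1 is newer than this
message, and the module and function names are made up for illustration.
It uses only the locale-independent Unicode case mapping.

```erlang
-module(case_demo).
-export([upper_len/1, same_upper/2]).

%% Upper-case a single code point and return the result as a flat
%% list of code points (string:uppercase/1 returns chardata).
upper(CodePoint) ->
    unicode:characters_to_list(string:uppercase([CodePoint])).

%% Number of code points produced by upper-casing one code point.
upper_len(CodePoint) ->
    length(upper(CodePoint)).

%% True when two code points have the same locale-independent
%% upper-case form.
same_upper(A, B) ->
    upper(A) =:= upper(B).
```

case_demo:upper_len(16#DF) (U+00DF, sharp s) gives 2, because the
upper-case form is "SS", and case_demo:same_upper($i, 16#131) (U+0131,
dotless i) gives true, because both letters map to "I". A one-for-one
mapping of letters cannot express either fact.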

It wouldn't actually be all that hard to make re_parse in
leex.erl do something with (?i): it's a matter of mapping
x to (x|X), with the complication that in Unicode the
other-case version of a letter is not necessarily a single
codepoint.
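
Until leex does that itself, the same mapping can be applied by hand
before a keyword goes into the .xrl file. What follows is a minimal
sketch for ASCII letters only, not a patch to leex.erl; the module and
function names are hypothetical.

```erlang
-module(ci_pattern).
-export([expand/1]).

%% Rewrite a literal keyword into a regular expression that matches
%% it in any mixture of ASCII upper and lower case, by mapping each
%% letter x to (x|X).  Every other character is copied through
%% unchanged, so regex metacharacters must already be escaped by
%% the caller.
-spec expand(string()) -> string().
expand(Word) ->
    lists:flatmap(fun expand_char/1, Word).

%% ASCII upper and lower case letters differ by exactly 32.
expand_char(C) when C >= $a, C =< $z ->
    "(" ++ [C, $|, C - 32] ++ ")";
expand_char(C) when C >= $A, C =< $Z ->
    "(" ++ [C + 32, $|, C] ++ ")";
expand_char(C) ->
    [C].
```

For example, ci_pattern:expand("select") returns
"(s|S)(e|E)(l|L)(e|E)(c|C)(t|T)", which can be pasted into a leex rule.
Nothing here attempts the Unicode cases, where the other-case form may
be more than one codepoint.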

Wait: I tell a lie.  Doing _anything_ with Unicode is
somewhere between so-hard-big-companies-don't-get-it-right-
first-time to beyond-human-powers.