[erlang-questions] leex and case-insensitive match
ok@REDACTED
Fri Jun 5 03:52:46 CEST 2015
One issue is exactly what it is you want to do.
Do you want all word-like tokens treated case-insensitively?
Do you want a specific, fixed set of word-like things treated
case-insensitively and an open set treated case-sensitively?
Are we talking about an SQL dialect here?
Does case insensitivity apply to
- the ASCII letters
- the Latin-1 letters
- the cased letters currently in the Basic Multilingual Plane
- the cased letters currently anywhere in Unicode
- any codepoints that may be designated as cased in past,
present, or future Unicode
and does it
- require identity of accents or not (one way of writing
French preserves accents when you uppercase, another way
does not)
- pay attention to the current locale (the famous
English -vs- Turkish 'what is the capital of "i"' issue)
?
It wouldn't actually be all that hard to make re_parse in
leex.erl do something with (?i): it's a matter of mapping
x to (x|X), bearing in mind that in Unicode the other-case
version of a letter is not necessarily a single codepoint.
Wait: I tell a lie. Doing _anything_ with Unicode is
somewhere between so-hard-big-companies-don't-get-it-right-
first-time to beyond-human-powers.
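
For the ASCII-only case, the x to (x|X) mapping can be sketched
outside leex as a preprocessing step on keywords. This is only an
illustration under stated assumptions: the module ci_regex and the
function from_keyword/1 are hypothetical names, not part of leex,
and the input is assumed to be a literal keyword made of ASCII
letters, digits and underscores (no regex metacharacters, no
Unicode).

```erlang
-module(ci_regex).
-export([from_keyword/1]).

%% Hypothetical helper, not part of leex.
%% Expands a literal ASCII keyword into a regular expression string
%% that matches the keyword in any mixture of upper and lower case.
from_keyword(Keyword) when is_list(Keyword) ->
    lists:flatmap(fun expand/1, Keyword).

%% Lower-case ASCII letter: emit (x|X).
expand(C) when C >= $a, C =< $z -> [$(, C, $|, C - 32, $)];
%% Upper-case ASCII letter: emit (x|X) as well, lower case first.
expand(C) when C >= $A, C =< $Z -> [$(, C + 32, $|, C, $)];
%% Digits and underscore have no case and are copied unchanged.
expand(C) when C >= $0, C =< $9 -> [C];
expand($_) -> [$_].
%% Anything else (metacharacters, non-ASCII) is deliberately rejected
%% with a function_clause error rather than guessed at.
```

The resulting string could be pasted into the Rules section of a
.xrl file by hand or by a small generator script. It does nothing
about the Unicode questions above.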