Leex And Character Encodings

Gordon Guthrie gordon@REDACTED
Fri Aug 20 09:30:43 CEST 2010


I'm hitting some problems with the character encoding for Leex.

I have a front end which is submitting proper Unicode in UTF-8 format and
the UTF-8 is round-tripping correctly - I submit it from the web page in a
jQuery post, it is processed on the back end and then returned to the front
end in UTF-8, where it displays correctly...

During the back end processing I need to feed it through leex to generate
user actions - certain posts contain a domain specific language.

The DSL is fairly well specified and strings in it are quoted, so they just
pass through the lexer in UTF-8 and are fine and dandy, and we do some
processing on them in Unicode - by running language parsers over the lexical
token stream. The UTF-8 just streams through the parser as a single
character stream and we don't care...

The problem is that the white space elements of the DSL get knocked about,
so two consecutive spaces are turned into Â.
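
One possible explanation, offered as an assumption rather than something
confirmed above: the browser sends the second of two consecutive spaces as a
non-breaking space (U+00A0). In UTF-8 that code point is the two bytes
0xC2 0xA0, and anything that treats each byte as a Latin-1 character sees
0xC2 as Â followed by a lone 0xA0. A minimal sketch of that byte-level
effect, in Python purely for illustration (the variable names are
hypothetical and nothing here is leex code):

```python
# Assumption: the front end sent "a", a space, NO-BREAK SPACE (U+00A0), "b".
text = "a \u00a0b"

# UTF-8 encodes U+00A0 as the two bytes 0xC2 0xA0.
utf8_bytes = text.encode("utf-8")

# Reading those bytes one per character (Latin-1) splits the pair:
# 0xC2 becomes "Â" and 0xA0 becomes a separate non-breaking space.
misread = utf8_bytes.decode("latin-1")

print(list(utf8_bytes))
print(repr(misread))
```

If this is what is happening, the Â is not a corrupted space but the first
byte of a two-byte sequence being read on its own.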

It seems to me that I can't expect leex to work with UTF-8 natively and I
just have to suck it up and create a whitespace lexical token that matches Â.
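
An alternative to matching Â in the lexer, sketched under the assumption
that the stray character is a UTF-8 encoded non-breaking space (U+00A0):
decode the input to code points first and fold U+00A0 into a plain space
before lexing. The helper below is hypothetical and written in Python only
to show the idea; it is not part of leex.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def normalise_whitespace(raw: bytes) -> str:
    """Decode UTF-8 input and replace non-breaking spaces with plain spaces.

    Hypothetical helper for illustration only.
    """
    text = raw.decode("utf-8")  # work on code points, not individual bytes
    return text.replace(NBSP, " ")


# Input bytes: "1", space, 0xC2 0xA0 (U+00A0 in UTF-8), "+", space, "2".
print(repr(normalise_whitespace(b"1 \xc2\xa0+ 2")))
```

Note that this also rewrites non-breaking spaces inside quoted strings,
which may or may not be wanted for the DSL. On the Erlang side the decoding
step could be done with unicode:characters_to_list/2 before the input
reaches the lexer.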

Or am I just being a fool?

Gordon

-- 
Gordon Guthrie
CEO hypernumbers

http://hypernumbers.com
t: hypernumbers
+44 7776 251669

