[erlang-questions] Leex And Character Encodings
Richard O'Keefe
ok@REDACTED
Mon Aug 23 00:41:11 CEST 2010
On Aug 21, 2010, at 8:37 PM, Gordon Guthrie wrote:
> The problem comes when I put spaces in the white space:
> = 1 + 2 "=Â 1 +Â Â Â Â 2" = 1 +
> 2 #ERROR!
>
> The expression round trips fine but (unlike the previous examples) the
> server-side expression returns an error for the value because the expression
> doesn't match any valid syntax.
>
> Tabs are expanded to white spaces so the only problem (I think) is with
> multiple white spaces - which is why I think just adding a lexical token to
> make  the same as 2 spaces would work.
It's not clear to me what precisely is mangling the spaces.
What _is_ clear is that "Â " is precisely what you see when
the Latin-1 No-Break-Space is first converted to UTF-8 and
then displayed by something expecting Latin-1.
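
The effect is easy to reproduce. The following is only an illustrative sketch in Python (not Erlang or Leex), and the variable names are made up for the example. It encodes U+00A0 as UTF-8, which yields the two bytes 16#C2,16#A0, and then decodes those bytes as if they were Latin-1:

```python
# Illustration only: reproduce the "Â " mojibake from a no-break space.
nbsp = "\u00a0"                          # NO-BREAK SPACE (16#A0 in Latin-1)

utf8_bytes = nbsp.encode("utf-8")        # two bytes: 16#C2, 16#A0
misread = utf8_bytes.decode("latin-1")   # each byte read as one Latin-1 character

# misread is now two characters: U+00C2 ("Â") followed by U+00A0,
# which most displays show as "Â" plus something that looks like a space.
print([hex(b) for b in utf8_bytes])
print([hex(ord(c)) for c in misread])
```

So one no-break space in the input becomes two visible characters after the misreading, which is why a single mangled space shows up as "Â ".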
1. How do no-break-space characters turn up?
2. What is it that is rendering them as if they were encoded
in Latin-1 rather than UTF-8?
3. In any case, if you are going to hack it, you should make
the 16#C2,16#20 sequence equivalent to ONE space, not two.
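
As a rough sketch of point 3, again in Python rather than as a Leex rule, a pre-lexing clean-up could collapse each mangled sequence to a single space. The function name normalise_nbsp is hypothetical, and handling both a trailing 16#A0 and a trailing 16#20 is an assumption, since it is not clear which byte actually follows 16#C2 in the data:

```python
def normalise_nbsp(text: str) -> str:
    """Map each (possibly mangled) no-break space to ONE ordinary space.

    Hypothetical helper for illustration. Caveat: it will also rewrite a
    genuine "Â" that happens to be followed by a space.
    """
    # Order matters: handle the two-character sequences before a bare NBSP.
    for seq in ("\u00c2\u00a0",   # "Â" + NBSP: UTF-8 bytes misread as Latin-1
                "\u00c2 ",        # "Â" + plain space (16#C2,16#20)
                "\u00a0"):        # a correctly decoded no-break space
        text = text.replace(seq, " ")
    return text


# One mangled space becomes one space; two mangled spaces become two.
cleaned = normalise_nbsp("=\u00c2\u00a01 +\u00c2\u00a0\u00c2\u00a02")
print(repr(cleaned))
```

This only hides the symptom. Answering questions 1 and 2 would fix the cause.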