[erlang-questions] Leex And Character Encodings
Gordon Guthrie
gordon@REDACTED
Mon Aug 23 08:23:24 CEST 2010
Richard
This all makes more sense now...
1. How do no-break-space characters turn up?
The input is coming from a browser and there will be a library in there that
is saying "hey, many spaces, you will need to convert them to keep them".
2. What is it that is rendering them as if they were encoded
in Latin-1 rather than UTF-8?
Erlang io:format - we just store utf-8 in the database...
3. In any case, if you are going to hack it, you should make
the 16#C2,16#A0 sequence equivalent to ONE space, not two.
I just need to recognise 16#C2, 16#A0 as white space - it has no
significance in the lexer...
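
A minimal sketch of the encoding issue in this thread, written in Python rather
than Erlang, and assuming the bytes in question are the UTF-8 encoding of
U+00A0 NO-BREAK SPACE (16#C2, 16#A0). It shows why the character displays as
"Â " when the bytes are read as Latin-1, and one possible way to fold each
no-break space into a single ordinary space before the text reaches the lexer.
The helper name normalise_spaces is hypothetical and is not part of leex.

```python
# Sketch only: normalise_spaces is a hypothetical helper, not part of leex.

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# One no-break space is two bytes in UTF-8: 0xC2 0xA0.
utf8_bytes = NBSP.encode("utf-8")

# Reading those two bytes as Latin-1 yields two characters:
# "Â" (0xC2) followed by a Latin-1 no-break space (0xA0).
mojibake = utf8_bytes.decode("latin-1")


def normalise_spaces(formula_utf8: bytes) -> str:
    """Decode UTF-8 input and map each no-break space to ONE ordinary space."""
    return formula_utf8.decode("utf-8").replace(NBSP, " ")


if __name__ == "__main__":
    # Assumed input: a formula in which the browser sent no-break spaces.
    raw = "=\u00a01 +\u00a0\u00a0\u00a0\u00a02".encode("utf-8")
    print(utf8_bytes.hex())
    print(repr(mojibake))
    print(normalise_spaces(raw))
```

Decoding before matching means the no-break space is handled as one character,
which is the "ONE space, not two" point above. A lexer working directly on
bytes would instead have to match the two-byte sequence as a unit.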
Cheers
Gordon
On 22 August 2010 23:41, Richard O'Keefe <ok@REDACTED> wrote:
>
> On Aug 21, 2010, at 8:37 PM, Gordon Guthrie wrote:
> > The problem comes when I put spaces in the white space:
> > = 1 + 2 "=Â 1 +Â Â Â Â 2" = 1 +
> > 2 #ERROR!
> >
> > The expression round trips fine but (unlike the previous examples) the
> > server-side expression returns an error for the value because the
> > expression doesn't match any valid syntax.
> >
> > Tabs are expanded to white spaces so the only problem (I think) is with
> > multiple white spaces - which is why I think just adding a lexical token
> > to make "Â " the same as 2 spaces would work.
>
> It's not clear to me what precisely is mangling the spaces.
> What _is_ clear is that "Â " is precisely what you see when
> the Latin-1 No-Break-Space is first converted to UTF-8 and
> then displayed by something expecting Latin-1.
>
> 1. How do no-break-space characters turn up?
> 2. What is it that is rendering them as if they were encoded
> in Latin-1 rather than UTF-8?
> 3. In any case, if you are going to hack it, you should make
> the 16#C2,16#A0 sequence equivalent to ONE space, not two.
>
>
--
Gordon Guthrie
CEO hypernumbers
http://hypernumbers.com
t: hypernumbers
+44 7776 251669