[erlang-questions] Leex And Character Encodings

Mon Aug 23 08:23:24 CEST 2010

Richard

This all makes more sense now...

1.  How do no-break-space   characters turn up?

The input is coming from a browser and there will be a library in there that
is saying "hey, many spaces, you will need to &nbsp; them to keep them".

2.  What is it that is rendering them as if they were encoded
   in Latin-1 rather than UTF-8?

Erlang io:format - we just store utf-8 in the database...
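
A minimal sketch of that effect, assuming the stored value is a UTF-8
binary. The module name nbsp_demo is made up for illustration;
unicode:characters_to_binary/1 and the ~s / ~ts control sequences are
standard OTP. What actually appears on screen depends on the terminal's
encoding, so no output is shown here.

```erlang
-module(nbsp_demo).
-export([run/0]).

run() ->
    %% U+00A0 (no-break space) encoded as UTF-8 is the two bytes C2 A0.
    Bin = unicode:characters_to_binary([16#A0]),
    <<16#C2, 16#A0>> = Bin,
    %% ~s treats each byte as a Latin-1 character, so the binary is
    %% printed as two characters: 16#C2 ("Â") followed by 16#A0.
    io:format("[~s]~n", [Bin]),
    %% ~ts decodes the binary as UTF-8 first, giving one character.
    io:format("[~ts]~n", [Bin]),
    ok.
```

Calling nbsp_demo:run() in a shell should make the difference visible:
the first line shows the stray "Â", the second does not.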

3.  In any case, if you are going to hack it, you should make
   the 16#C2,16#A0 sequence equivalent to ONE space, not two.

I just need to recognise 16#C2, 16#A0 (the UTF-8 encoding of U+00A0) as
white space - it has no significance in the lexer...
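
Something along these lines is what I have in mind. This is only a
sketch, assuming the lexer is handed the raw UTF-8 bytes as a list of
integers (for example via binary_to_list/1), so the no-break space
arrives as the two bytes 16#C2,16#A0. The WS macro is modelled on the one
in the leex documentation; it is not the rule set from our real lexer. If
the input were decoded to code points first, the pattern would be the
single character \240 instead.

```erlang
%% Hypothetical fragment of a leex .xrl file.

Definitions.

%% \302\240 is octal for the bytes 16#C2,16#A0 (UTF-8 no-break space).
%% [\000-\s] covers the ordinary control characters and the space.
WS = ([\000-\s]|\302\240)

Rules.

%% Any run of ordinary or no-break spaces is dropped by the lexer, so
%% it does not matter whether one is counted as one space or two.
{WS}+ : skip_token.

Erlang code.
```

The rules for the real tokens would go alongside the skip_token rule in
the Rules section.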

Cheers

Gordon

On 22 August 2010 23:41, Richard O'Keefe <ok@REDACTED> wrote:

>
> On Aug 21, 2010, at 8:37 PM, Gordon Guthrie wrote:
> > The problem comes when I put spaces in the white space:
> > =  1 +     2                  "=Â  1 +Â Â Â Â  2"                  =  1 +
> >  2                #ERROR!
> >
> > The expression round trips fine but (unlike the previous examples) the
> > server-side expression returns an error for the value because the
> > expression doesn't match any valid syntax.
> >
> > Tabs are expanded to white spaces so the only problem (I think) is with
> > multiple white spaces - which is why I think just adding a lexical token
> > to make Â the same as 2 spaces would work.
>
> It's not clear to me what precisely is mangling the spaces.
> What _is_ clear is that "Â " is precisely what you see when
> the Latin-1 No-Break-Space is first converted to UTF-8 and
> then displayed by something expecting Latin-1.
>
> 1.  How do no-break-space   characters turn up?
> 2.  What is it that is rendering them as if they were encoded
>    in Latin-1 rather than UTF-8?
> 3.  In any case, if you are going to hack it, you should make
>    the 16#C2,16#A0 sequence equivalent to ONE space, not two.
>
>

-- 
Gordon Guthrie
CEO hypernumbers

http://hypernumbers.com
t: hypernumbers
+44 7776 251669