[erlang-questions] Leex And Character Encodings

Mon Aug 23 10:13:11 CEST 2010

Richard

> 3.  In any case, if you are going to hack it, you should make
>    the 16#C2,16#20 sequence equivalent to ONE space, not two.

I am getting 16#C2, 16#A0, not 16#C2, 16#20, which I think is right for
non-breaking spaces...
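
A minimal sketch to check those byte values, written in Python purely for
illustration (it is not leex code, and the helper name nbsp_to_space is
hypothetical). It assumes the input is valid UTF-8 and shows that U+00A0
encodes to 16#C2, 16#A0, that reading those bytes as Latin-1 gives "Â"
followed by a no-break space, and one way to map each no-break space to ONE
ordinary space:

```python
# U+00A0 NO-BREAK SPACE, the character under discussion.
NBSP = "\u00a0"

# Encoded as UTF-8 it becomes the two bytes 16#C2, 16#A0.
utf8_bytes = NBSP.encode("utf-8")

# Reading those bytes as Latin-1 yields "Â" (U+00C2) followed by a
# no-break space (U+00A0), i.e. the mangled text seen in the thread.
mojibake = utf8_bytes.decode("latin-1")


def nbsp_to_space(data: bytes) -> bytes:
    """Hypothetical helper: map each UTF-8 no-break space to ONE ASCII space."""
    return data.replace(b"\xc2\xa0", b" ")


if __name__ == "__main__":
    print([hex(b) for b in utf8_bytes])
    print(repr(mojibake))
    print(nbsp_to_space("=\u00a0 1 +\u00a0\u00a0 2".encode("utf-8")))
```

The replacement works on bytes because, in valid UTF-8, the pair 16#C2, 16#A0
can only be the encoding of U+00A0.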

Gordon

On 22 August 2010 23:41, Richard O'Keefe <ok@REDACTED> wrote:

>
> On Aug 21, 2010, at 8:37 PM, Gordon Guthrie wrote:
> > The problem comes when I put spaces in the white space:
> > =  1 +     2    "=Â  1 +Â Â Â Â  2"    =  1 +     2    #ERROR!
> >
> > The expression round trips fine but (unlike the previous examples) the
> > server-side expression returns an error for the value because the
> > expression doesn't match any valid syntax.
> >
> > Tabs are expanded to white spaces so the only problem (I think) is with
> > multiple white spaces - which is why I think just adding a lexical token
> > to make Â the same as 2 spaces would work.
>
> It's not clear to me what precisely is mangling the spaces.
> What _is_ clear is that "Â " is precisely what you see when
> the Latin-1 No-Break-Space is first converted to UTF-8 and
> then displayed by something expecting Latin-1.
>
> 1.  How do no-break-space characters turn up?
> 2.  What is it that is rendering them as if they were encoded
>    in Latin-1 rather than UTF-8?
> 3.  In any case, if you are going to hack it, you should make
>    the 16#C2,16#20 sequence equivalent to ONE space, not two.
>
>

-- 
Gordon Guthrie
CEO hypernumbers

http://hypernumbers.com
t: hypernumbers
+44 7776 251669