[erlang-questions] Which Erlang JSON parser?

Steve Vinoski vinoski@REDACTED
Thu Jul 29 23:10:50 CEST 2010


On Thu, Jul 29, 2010 at 3:07 PM, Alexander Kotelnikov <sacha@REDACTED> wrote:
>>>>>> On Thu, 29 Jul 2010 16:05:56 +0200
>>>>>> "RV" == Robert Virding <rvirding@REDACTED> wrote:
> RV>
>>> The former did not build for me because of some rebar issues. The latter
>>> did after some changes to Makefiles. A little problem with it is that I
>>> do not understand, how it decodes unicode:
> 1> eep0018:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
>>> <<208,179,208,190,209,128,208,190,208,180>>
> RV>
> RV> As I said earlier a binary is a sequence of bytes without any other
> RV> internal information and when you print a binary this is what you see,
> RV> the *bytes* of which it is composed. In this case each of the utf-8
> RV> encoded characters uses 2 bytes in big endian order, which is what you
> RV> see. Apparently it works as it should.
>
> Really?
>
> 5> (208 bsl 8) + 179.
> 53427
> 6> 16#433.
> 1075
>
> I guess, something is wrong.

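These calculations treat the two bytes as a single 16-bit big-endian
integer, which is not how UTF-8 works. A code point in the range
U+0080..U+07FF is split across two bytes of the form 110xxxxx
10xxxxxx, so the two bytes for U+0433 are calculated as follows: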

1> ((16#433 bsr 6) band 16#1F) bor 16#C0.
208
2> (16#433 band 16#3F) bor 16#80.
179

which are the first two bytes of the binary returned by json_to_term
above; the remaining byte pairs encode the other four characters in
the same way. See <http://en.wikipedia.org/wiki/UTF-8> for details.
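
The two shell expressions above can be wrapped up as a minimal sketch,
in case it helps to experiment with other code points. The module name
utf8_sketch and the functions encode2/1 and decode2/2 are made up for
illustration and are not part of eep0018 or OTP; the sketch only
handles the two-byte range U+0080..U+07FF.

```erlang
-module(utf8_sketch).
-export([encode2/1, decode2/2]).

%% Hypothetical helper: encode a code point in U+0080..U+07FF as the
%% two UTF-8 bytes 110xxxxx 10xxxxxx. Other ranges are not handled
%% and fail with function_clause.
encode2(CP) when CP >= 16#80, CP =< 16#7FF ->
    Lead = ((CP bsr 6) band 16#1F) bor 16#C0,  % top 5 bits + 110 prefix
    Cont = (CP band 16#3F) bor 16#80,          % low 6 bits + 10 prefix
    <<Lead, Cont>>.

%% Hypothetical helper: the reverse step. Strip the 110 and 10
%% prefixes and rejoin the 5 + 6 payload bits into a code point.
decode2(Lead, Cont) when Lead band 16#E0 =:= 16#C0,
                         Cont band 16#C0 =:= 16#80 ->
    ((Lead band 16#1F) bsl 6) bor (Cont band 16#3F).
```

With this compiled, utf8_sketch:encode2(16#433) returns <<208,179>> and
utf8_sketch:decode2(208, 179) returns 1075, i.e. 16#433. The standard
tools agree: <<16#433/utf8>> also evaluates to <<208,179>>, and
unicode:characters_to_list/1 applied to the binary from json_to_term
returns [1075,1086,1088,1086,1076].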

--steve

