[erlang-questions] Which Erlang JSON parser?
Robert Virding
rvirding@REDACTED
Thu Jul 29 16:05:56 CEST 2010
Hi,
I am afraid I can't help you by suggesting a better JSON parser, but I
can explain what you are seeing. The basic thing to remember is that a
binary internally is a sequence of bytes/bits; there is no extra
information saying how to interpret these bytes/bits. All interpretation
is done when you create or access the binary; it is then that you say
whether it is a 32 bit float, or a 3 bit integer, or a utf-8 encoded
character. That means that there is nothing to stop you from creating a
binary of utf-8 characters and then decoding it as 5 bit integers. Not
useful perhaps, but perfectly legal.
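To make that concrete, here is a minimal sketch of reading the same bits under three different interpretations. The module name bin_views and the function demo/0 are made up for illustration; only the bit syntax itself is standard Erlang.

```erlang
-module(bin_views).
-export([demo/0]).

%% Build a binary from utf-8 encoded characters, then read the very same
%% bits back under different interpretations.
demo() ->
    %% Two Cyrillic characters, each encoded as 2 bytes of utf-8.
    Bin = <<16#433/utf8, 16#43E/utf8>>,
    %% View 1: the raw bytes.
    Bytes = binary_to_list(Bin),
    %% View 2: utf-8 characters, which gives back the codepoints.
    <<C1/utf8, C2/utf8>> = Bin,
    %% View 3: 5 bit integers. 32 bits hold six of them, with 2 bits left over.
    <<A:5, B:5, C:5, D:5, E:5, F:5, Rest:2>> = Bin,
    {Bytes, [C1, C2], [A, B, C, D, E, F], Rest}.
```

Nothing in Bin records which of these views is the "right" one; the type specifiers in each pattern decide it at the moment of access.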
So:
On 29 July 2010 11:01, Alexander Kotelnikov <sacha@REDACTED> wrote:
> Hello.
>
> It is a terrible story. I needed a JSON parser to deal with JSON data in
> my Erlang program.
>
> At first I picked json_eep
> (http://github.com/jchris/erlang-json-eep-parser.git) which worked quite
> fine, but later I found out that it is not able to parse (some!) escaped
> unicode characters:
> 28> json_eep:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
> ** exception error: bad argument
>      in function  list_to_binary/1
>         called as list_to_binary([1075,1086,1088,1086,1076])
>      in call from json_grammar:yeccpars2_9/7
>      in call from json_grammar:yeccpars0/2
>      in call from json_eep:json_to_term/1
>
> My guess is that just a little change near list_to_binary should fix the
> problem.
You are right in that the problem is the call to list_to_binary.
list_to_binary is a very low-level function: it expects its input to
be a possibly nested list of byte values, 0 - 255 (and binaries). Here,
obviously, this has not been done properly and the code is trying to
call list_to_binary with a list of the unicode codepoint values, all of
which are greater than 255.
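The failure is easy to reproduce outside the parser. The sketch below uses the codepoint list from the error message and contrasts list_to_binary/1 with unicode:characters_to_binary/1 from the standard unicode module. The module name codepoints and demo/0 are hypothetical, and whether this is the right change for json_eep itself is an assumption, since I have not looked at its grammar file.

```erlang
-module(codepoints).
-export([demo/0]).

demo() ->
    %% The list from the error message: unicode codepoints, not bytes.
    Codepoints = [1075,1086,1088,1086,1076],
    %% list_to_binary/1 only accepts byte values (0..255), so it raises badarg.
    Low = try list_to_binary(Codepoints)
          catch error:badarg -> badarg
          end,
    %% unicode:characters_to_binary/1 encodes each codepoint as utf-8 instead.
    Utf8 = unicode:characters_to_binary(Codepoints),
    {Low, Utf8}.
```

The second element of the result is a utf-8 encoded binary, so a parser that goes this way has chosen utf-8 as its representation for JSON strings.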
> Then I start investigation of other parsers. I found around 7. Most of
> them not eep0018 parsers. So I tried
> http://github.com/davisp/eep0018.git
> and
> http://github.com/dizzyd/eep0018.git (both are based on yajl).
>
> The former did not build for me because of some rebar issues. The latter
> did after some changes to Makefiles. A little problem with it is that I
> do not understand, how it decodes unicode:
> 1> eep0018:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
> <<208,179,208,190,209,128,208,190,208,180>>
As I said earlier, a binary is a sequence of bytes without any other
internal information, and when you print a binary this is what you see:
the *bytes* of which it is composed. In this case each of the utf-8
encoded characters uses 2 bytes, with the most significant bits in the
first byte, which is what you see. Apparently it works as it should.
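One way to convince yourself is to interpret those bytes as utf-8 and check that the original codepoints come back. This is a small sketch with a made-up module name, utf8_check; it also decodes the first character by hand to show where the bits go.

```erlang
-module(utf8_check).
-export([demo/0]).

demo() ->
    %% The binary returned by eep0018:json_to_term/1 above.
    Bin = <<208,179,208,190,209,128,208,190,208,180>>,
    %% Interpret the bytes as utf-8 to get the codepoints back.
    Codepoints = unicode:characters_to_list(Bin, utf8),
    %% The first character by hand: a 2 byte utf-8 sequence looks like
    %% 110xxxxx 10xxxxxx and carries 5 + 6 payload bits.
    <<2#110:3, Hi:5, 2#10:2, Lo:6, _/binary>> = Bin,
    First = (Hi bsl 6) bor Lo,
    {Codepoints, First}.
```

To see the text rather than the bytes in the shell, io:format("~ts~n", [Bin]) prints the binary as utf-8 characters, provided the terminal can display them.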
> PS And, just in case if anyone cares, none of these parsers implements
> json_to_term/2.
As yet there is no "standard" JSON parser and converter. Hopefully we
will see one soon; using NIFs it should be possible to do an efficient
one. If we can agree on the Erlang representation. :-)
Robert