[erlang-questions] Which Erlang JSON parser?

Paul Davis paul.joseph.davis@REDACTED
Thu Jul 29 22:19:43 CEST 2010


On Thu, Jul 29, 2010 at 5:01 AM, Alexander Kotelnikov <sacha@REDACTED> wrote:
> Hello.
>
> It is a terrible story. I needed a JSON parser to deal with JSON data in
> my Erlang program.
>
> At first I picked json_eep
> (http://github.com/jchris/erlang-json-eep-parser.git) which worked quite
> fine, but later I found out that it is not able to parse (some!) escaped
> unicode characters:
> 28> json_eep:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
> ** exception error: bad argument
>     in function  list_to_binary/1
>        called as list_to_binary([1075,1086,1088,1086,1076])
>     in call from json_grammar:yeccpars2_9/7
>     in call from json_grammar:yeccpars0/2
>     in call from json_eep:json_to_term/1
>
> My guess is that just a little change near list_to_binary should fix the
> problem.
>
>
> Then I started investigating other parsers. I found around 7. Most of
> them are not eep0018 parsers. So I tried
> http://github.com/davisp/eep0018.git
> and
> http://github.com/dizzyd/eep0018.git (both are based on yajl).
>
> The former did not build for me because of some rebar issues. The latter
> did after some changes to Makefiles. A little problem with it is that I
> do not understand how it decodes unicode:
> 1> eep0018:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
> <<208,179,208,190,209,128,208,190,208,180>>
>
> Probably authors of these modules read this list and will clarify and/or
> fix if there is something to fix.
>
> Thanks, A.
>
> PS And, just in case if anyone cares, none of these parsers implements
> json_to_term/2.
>

Alexander,

I'm the author of http://github.com/davisp/eep0018.git. If you send me
any of the rebar errors you had while building, I can take a look. I
haven't heard of anyone having issues with it, but my first guess is
that it's an issue with which version of Erlang you're using. If memory
serves, that version requires R13B and hasn't been upgraded to the R14
NIF API yet.

I think the issue you're having with the Unicode decoding is that most
decoders are going to return strings as UTF-8 encoded binaries by
default, so the bytes you got back are the UTF-8 encoding of the same
five characters. To get a list of Unicode code points you'll need to
do something like:

{ok, Bin} = json:decode("\"\\u0433\\u043e\\u0440\\u043e\\u0434\""),
CodePoints = xmerl_ucs:from_utf8(Bin).
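
For comparison, here is a minimal sketch of the same round trip using
only the stdlib unicode module instead of xmerl_ucs. It assumes a
release that ships that module (R13 or later, as far as I know). The
module name utf8_demo and both function names are made up for
illustration and are not part of any of the parsers discussed here. It
also hints at why list_to_binary/1 raised badarg in the json_eep
trace: it only accepts byte values 0..255, and these code points are
all above 255.

```erlang
-module(utf8_demo).
-export([code_points/1, to_utf8/1]).

%% Decode a UTF-8 binary (what most JSON decoders return for strings)
%% into a list of Unicode code points.
code_points(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        List when is_list(List) -> List;
        %% {error, _, _} or {incomplete, _, _}: input was not valid UTF-8
        _ -> erlang:error(badarg, [Bin])
    end.

%% Encode a list of code points as a UTF-8 binary. list_to_binary/1
%% cannot do this because it rejects integers above 255.
to_utf8(CodePoints) when is_list(CodePoints) ->
    case unicode:characters_to_binary(CodePoints, unicode, utf8) of
        Bin when is_binary(Bin) -> Bin;
        _ -> erlang:error(badarg, [CodePoints])
    end.
```

With this, utf8_demo:code_points(<<208,179,208,190,209,128,208,190,208,180>>)
should give [1075,1086,1088,1086,1076], and utf8_demo:to_utf8/1 turns
that list back into the same ten bytes.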


As to EEP0018 parsers, I don't really think there's a single parser
that you can point to as most compliant. The EEP itself is mostly a
collection of all the various points of disagreement in the
implementations that existed at the time. Until someone's parser
either gets included in the standard library or just gains enough
common usage, I doubt there will ever be a finalized version of what's
EEP compliant and what's not.

If you're looking for a good JSON parser, I would recommend either the
mochijson2.erl parser or one of the C implementations. mochijson2 is
pretty solid and has been tested extensively. The C-based ones can
give some pretty good speedups for large decoding operations though,
so that might be an argument against mochijson2. And IIRC, Alisdair's
latest decoder is stream-based, so you can use it for incremental
parsing if you need that.

HTH,
Paul

