Fw: [erlang-questions] Which Erlang JSON parser?

alisdair sullivan alisdairsullivan@REDACTED
Thu Jul 29 20:16:54 CEST 2010





----- Forwarded Message ----
From: alisdair sullivan <alisdairsullivan@REDACTED>
To: Robert Virding <rvirding@REDACTED>
Sent: Thu, July 29, 2010 11:15:44 AM
Subject: Re: [erlang-questions] Which Erlang JSON parser?


I have an (undocumented) json parser at http://github.com/talentdeficit/json. it 
implements the interface recommended in eep0018 with a few additions. it also 
parses naked json values (ie, 'true', 'false', '1', '"a string"') when passed 
the option {strict, false} (default true) and (c style, /* ... */) comments when 
passed the option {comments, true} (default false). the encoder will encode 
naked json values when passed the option {strict, false} also.

it relies on jsx (available at http://github.com/talentdeficit/jsx), my port of 
yajl to erlang.

to install jsx into ERL_LIBS, run 'make', then './rebar install'. to install 
json, run './rebar compile', then './rebar install' (each from the root of the 
respective project).

I'm working on documentation, but I hadn't planned to 'go public' quite yet. I'm 
confident in the underlying jsx project, less so on json, though we've been 
using the decoder in production with no problems yet.


On 29 July 2010 11:01, Alexander Kotelnikov <sacha@REDACTED> wrote:
> Hello.
>
> It is a terrible story. I needed a JSON parser to deal with JSON data in
> my Erlang program.
>
> At first I picked json_eep
> (http://github.com/jchris/erlang-json-eep-parser.git) which worked quite
> fine, but later I found out that it is not able to parse (some!) escaped
> unicode characters:
> 28> json_eep:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
> ** exception error: bad argument
>     in function  list_to_binary/1
>        called as list_to_binary([1075,1086,1088,1086,1076])
>     in call from json_grammar:yeccpars2_9/7
>     in call from json_grammar:yeccpars0/2
>     in call from json_eep:json_to_term/1
>
> My guess is that just a little change near list_to_binary should fix the
> problem.

You are right in that the problem is the call to list_to_binary/1.
list_to_binary/1 is a very low-level function: it expects its input to
be a, possibly nested, list of byte values, 0 - 255. Here, obviously,
that conversion has not been done, and the code is calling
list_to_binary/1 with a list of Unicode code point values.
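
A minimal sketch of the difference, written as Erlang shell expressions. It
assumes an OTP release that ships the unicode module (R13 or later); the
variable names are illustrative and are not taken from json_eep.

```erlang
%% Code points taken from the failing call in the trace above.
CodePoints = [1075,1086,1088,1086,1076].

%% list_to_binary/1 accepts only integers 0..255, so this fails with badarg.
{'EXIT', {badarg, _}} = (catch list_to_binary(CodePoints)).

%% unicode:characters_to_binary/1 encodes the code points as UTF-8 instead.
Utf8 = unicode:characters_to_binary(CodePoints).
```

Whether this is the right fix inside json_eep depends on how the grammar
builds its strings, which is not shown here.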

> Then I started investigating other parsers. I found around 7, most of
> them not eep0018 parsers. So I tried
> http://github.com/davisp/eep0018.git
> and
> http://github.com/dizzyd/eep0018.git (both are based on yajl).
>
> The former did not build for me because of some rebar issues. The latter
> did after some changes to Makefiles. A little problem with it is that I
> do not understand, how it decodes unicode:
> 1> eep0018:json_to_term("\"\\u0433\\u043e\\u0440\\u043e\\u0434\"").
> <<208,179,208,190,209,128,208,190,208,180>>

As I said earlier, a binary is a sequence of bytes without any other
internal information, and when you print a binary this is what you see:
the *bytes* of which it is composed. In this case each of the UTF-8
encoded characters takes 2 bytes (UTF-8 is a byte sequence, so byte
order does not come into it), which is what you see. Apparently it
works as it should.
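
To check this by hand, the binary can be decoded back into code points. This
is a sketch as Erlang shell expressions, again assuming the unicode module
and the utf8 bit syntax type are available (R13 or later); the variable names
are illustrative.

```erlang
%% The binary returned by eep0018:json_to_term/1 above.
Bin = <<208,179,208,190,209,128,208,190,208,180>>.

%% Decoding the UTF-8 bytes gives back the five original code points.
[1075,1086,1088,1086,1076] = unicode:characters_to_list(Bin).

%% Five characters at two bytes each make a ten-byte binary.
10 = byte_size(Bin).

%% The utf8 segment type pulls one character off the front of the binary.
<<First/utf8, Rest/binary>> = Bin.

%% To print the text rather than the bytes, use the ~ts format directive:
%% io:format("~ts~n", [Bin]).
```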

> PS And, just in case anyone cares, none of these parsers implements
> json_to_term/2.

As yet there is no "standard" JSON parser and converter. Hopefully we
will see one soon; using NIFs it should be possible to do an efficient
one, if we can agree on the Erlang representation. :-)

Robert
