[erlang-questions] EEP0018/native JSON parsing
Tue Jan 27 02:52:43 CET 2009
I'm looking for a faster JSON parser too, and was about to tackle this, so
thanks for sharing your work. (I found that Erlang, even with +native
compilation, takes ~800 milliseconds to parse a 1MB JSON document, whereas Perl's
JSON::XS (in C) takes 5 milliseconds for the same data. The reverse, encoding,
is just as bad.)
I haven't tried your code yet, but I did glance briefly at it:
I noticed a couple of things that I suspect are bottlenecks:
- the code uses the 'output()' callback in the driver entry structure to
receive the input JSON document. That means that Erlang is having to copy
- the code returns the encoded result via driver_output2() (rather than
driver_output_binary or driver_outputv). This means that the encoded
response is being copied.
- the code generates the result Erlang term via ei_x_encode*. This
copies the string data, and also encodes it to Erlang's external-term format
(which is rather similar to a binary form of JSON). The Erlang side then
has to decode from the external-term format (via erlang:binary_to_term
on the receiving end).
I think it is possible to avoid all of these, and hopefully achieve the
JSON::XS-level speed mentioned above.
1. Use outputv() to receive the input JSON document as one large binary
(no copying). e.g. If the data was received via gen_tcp socket in binary
mode, only small binary header pointing to the data would be sent to the
2. Parse with yajl, and generate the result term with the little
term-construction language defined for driver_output_term() see
In particular, with ERL_DRV_BINARY, strings (keys and values) can be
sub-binaries of the original input binary, so there is no copying.
3. Send the result back to Erlang via driver_send_term or
driver_output_term. This directly transfers ownership of the constructed
term to the VM, so there is
a) no copying, of either the term structure, or the key/value
strings (which are still sub-binaries of the larger input binary)
and b) no encoding or decoding through an intermediate format (i.e. no use
of the external term format)
5. Use EDTK for all of the boilerplate. This implements the above
outputv / driver_send_term mechanism, but with one current problem.
Currently the size of the result term is very limited (it's only been used
for sending small results). I would need to make that dynamic. (This is
the main reason I haven't done it yet -- might require some surgery).
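To make the "sub-binaries instead of copies" idea in steps 1-3 concrete, here is a minimal standalone sketch in plain C. It does not use the driver API at all: `slice_t` and `collect_string_slices` are hypothetical names invented for this illustration, and the scanner only looks for double-quoted strings rather than parsing real JSON. The point is only that each key or value can be described by an (offset, length) pair into the original input, which is the kind of information a sub-binary term entry needs, so no string bytes are duplicated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a view into the input buffer, analogous to a sub-binary. */
typedef struct {
    size_t offset;  /* start of the string body within the input */
    size_t len;     /* length in bytes, excluding the quotes */
} slice_t;

/*
 * Record every double-quoted string in `input` as an (offset, len) slice.
 * No string bytes are copied. Escapes are handled only well enough to skip
 * an escaped byte; a real parser would have to copy strings that contain
 * escapes, since their decoded form differs from the raw input bytes.
 * Returns the number of slices stored, at most `max`.
 */
size_t collect_string_slices(const char *input, size_t input_len,
                             slice_t *out, size_t max)
{
    size_t count = 0;
    size_t i = 0;

    assert(input != NULL && out != NULL);
    while (i < input_len && count < max) {
        if (input[i] != '"') {
            i++;
            continue;
        }
        size_t start = ++i;              /* first byte after the opening quote */
        while (i < input_len && input[i] != '"') {
            if (input[i] == '\\' && i + 1 < input_len)
                i++;                     /* skip the escaped byte */
            i++;
        }
        if (i >= input_len)
            break;                       /* unterminated string: stop scanning */
        out[count].offset = start;
        out[count].len = i - start;
        count++;
        i++;                             /* step past the closing quote */
    }
    return count;
}
```

In a driver, each such slice would become one binary entry in the term array handed back to the VM, pointing at the input binary rather than at freshly allocated memory.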
One consequence of making sub-binaries of the larger input binary is that
the larger input binary cannot be garbage-collected until every key and
value sub-binary has been destroyed. That may cause excessive memory usage
in some scenarios. So I'm planning on implementing an option to create
copies for keys and/or values. However, that only causes 1 copy, as step
(3) can still transfer the new binaries back to Erlang without copying the
However, I'm not sure when I'll get to tackle this -- perhaps in the next
few weeks, but it might be further out than that.
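The memory trade-off described above (sub-binaries pin the large input binary, copies do not) can be sketched with a small reference-counted buffer in plain C. This is only a model of the behaviour, not the VM's actual binary implementation: `buf_t`, `value_t` and all functions below are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer standing in for the large input binary. */
typedef struct {
    long refs;
    size_t len;
    char *data;
} buf_t;

/* A key or value: either borrows a range of the parent or owns a private copy. */
typedef struct {
    buf_t *parent;      /* non-NULL while this value pins the parent buffer */
    char *copy;         /* non-NULL when the bytes were copied */
    const char *bytes;  /* points into parent->data or at `copy` */
    size_t len;
} value_t;

buf_t *buf_new(const char *src, size_t len)
{
    buf_t *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(len ? len : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, src, len);
    b->refs = 1;
    b->len = len;
    return b;
}

/* Drop one reference; returns 1 if the buffer was actually freed. */
int buf_release(buf_t *b)
{
    assert(b != NULL);
    if (--b->refs > 0)
        return 0;
    free(b->data);
    free(b);
    return 1;
}

/* Returns 0 on success, -1 on a bad range or allocation failure. */
int value_init(value_t *v, buf_t *b, size_t off, size_t len, int copy)
{
    if (off > b->len || len > b->len - off)
        return -1;
    v->len = len;
    if (copy) {
        v->copy = malloc(len ? len : 1);
        if (v->copy == NULL)
            return -1;
        memcpy(v->copy, b->data + off, len);   /* the single copy */
        v->parent = NULL;
        v->bytes = v->copy;
    } else {
        b->refs++;                             /* keeps the whole parent alive */
        v->parent = b;
        v->copy = NULL;
        v->bytes = b->data + off;
    }
    return 0;
}

/* Returns 1 if destroying this value freed the parent buffer. */
int value_destroy(value_t *v)
{
    if (v->parent != NULL)
        return buf_release(v->parent);
    free(v->copy);
    return 0;
}
```

With the borrowing variant, one surviving short key is enough to keep the whole input document in memory; with the copying variant the input can be released as soon as parsing is done, at the cost of one copy per string.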
>> 2. I need some hints regarding "parallel execution".
>> The native driver does not support multithreading (and why should it? It
>> complicates things where OTP can do that kind of stuff by itself
This very much depends. Erlang 'multithreading' is at the Erlang
process-level only (i.e. Erlang has its own process scheduler). If Erlang
calls a driver, then by default the OS thread running the erlang process
scheduler is blocked until the driver returns.
That's fine for short operations, but e.g. if parsing a really huge JSON
object, you don't want to block the Erlang scheduler for tens/hundreds of
milliseconds (literally the entire VM is stalled, or part of the VM in SMP
mode, when there are multiple scheduler OS threads).
Drivers can use driver_async() to run work in a separate thread-pool, to
avoid blocking the scheduler. However, other Erlang drivers use that thread
pool for significant slow tasks, e.g. file IO.
EDTK implements private thread-pools to work around that problem. My plan
was to test the size of the input JSON document and do a blocking operation
if it is small (say < 100KB) and a threadpool operation otherwise.
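The size test could look roughly like the following plain C sketch. All names here (`INLINE_PARSE_LIMIT`, `parse_mode_t`, `choose_parse_mode`, `dispatch_parse`) are hypothetical, the two callbacks stand in for "parse right now in the scheduler thread" and "hand the job to a thread pool", and the 100KB cut-off is just the guess from the paragraph above; the right value would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cut-off, taken from the "say < 100KB" guess; needs benchmarking. */
#define INLINE_PARSE_LIMIT ((size_t)100 * 1024)

typedef enum { PARSE_INLINE, PARSE_IN_THREADPOOL } parse_mode_t;

/* Hypothetical callback type for either execution strategy. */
typedef void (*parse_fn)(const char *input, size_t len, void *ctx);

/* Small inputs are parsed while blocking the caller; large ones are deferred. */
parse_mode_t choose_parse_mode(size_t input_len)
{
    return input_len < INLINE_PARSE_LIMIT ? PARSE_INLINE : PARSE_IN_THREADPOOL;
}

/* Route one document to the chosen strategy and report which one was used. */
parse_mode_t dispatch_parse(const char *input, size_t len,
                            parse_fn run_inline, parse_fn submit_to_pool,
                            void *ctx)
{
    parse_mode_t mode = choose_parse_mode(len);

    assert(run_inline != NULL && submit_to_pool != NULL);
    if (mode == PARSE_INLINE)
        run_inline(input, len, ctx);
    else
        submit_to_pool(input, len, ctx);
    return mode;
}
```

Keeping the decision in one small function makes it easy to tune the limit later, or to replace the byte count with a better estimate of parse time.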
>>With the current code the driver gets loaded only once.
>>Therefore on a multicore machine only one CPU gets really used.
If you are not running Erlang in SMP mode then it will only use a single CPU
for driver operations unless you use the driver_async() feature or private
threadpools like EDTK's.
If you are running in SMP mode with more than one scheduler OS thread, then
the level of driver concurrency depends on the 'locking mode' for the
driver. By default conservative 'driver-level' locking is used, as many
drivers were written before SMP was implemented and are not thread-safe.
However, if your driver is thread-safe then you can easily switch to
port-level locking, and if multiple Erlang processes each have their own
open port using the driver, then Erlang scheduler threads can call the
driver simultaneously. However, if you do this you still need to worry
about the effects of blocking the Erlang VM schedulers for any significant
amount of time. That's why private threadpools are still important.
"In the runtime system with SMP support, drivers are locked either on
driver level or port level (driver instance level). By default driver level
locking will be used, i.e., only one emulator thread will execute code in
the driver at a time. If port level locking is used, multiple emulator
threads may execute code in the driver at the same time. There will only be
one thread at a time calling driver call-backs corresponding to the same
port, though. In order to enable port level locking set the
driver_entry <http://www.erlang.org/doc/man/driver_entry.html> used by the
driver. When port level locking is used it is the responsibility of the
driver writer to synchronize all accesses to data shared by the ports
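As a small illustration of what "synchronize all accesses to data shared by the ports" means in practice, here is a standalone C11 sketch. It does not use the driver API; `port_state_t`, `record_parse` and `total_bytes` are hypothetical names. Per-port state needs no locking because callbacks for one port are never run concurrently, while anything shared between ports (here a global byte counter) must be protected, in this case with an atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared by every port (driver instance): accesses must be synchronized. */
static atomic_ulong total_bytes_parsed;

/* Hypothetical per-port state: only one thread at a time ever touches it. */
typedef struct {
    unsigned long docs_parsed;
} port_state_t;

/* Called once per parsed document from a port's callback. */
void record_parse(port_state_t *port, size_t input_len)
{
    assert(port != NULL);
    port->docs_parsed++;   /* per-port data: callbacks for one port are serialized */
    atomic_fetch_add(&total_bytes_parsed, (unsigned long)input_len);
}

/* Read the shared counter; safe to call from any thread. */
unsigned long total_bytes(void)
{
    return atomic_load(&total_bytes_parsed);
}
```

A parser that keeps all of its state in the per-port structure and shares nothing global would not need even this much.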
I hope this helps,
On Sun, Jan 25, 2009 at 1:49 PM, Enrico Thierbach <
> Hi guys,
> I have just finished what I would call the first stage of the native
> JSON parser implementation. This is the state as of now at
> http://github.com/pboy/eep0018/. Please see the readme file.
> In short, this is the status:
> - I parse everything that comes along like mochijson2 and rabbitmq
> - optionally I can parse according to eep0018
> - my code runs 6 times as fast as mochijson2/rabbitmq at JSON input
> of a certain size, and is usually not slower on very small JSON input.
> Jan tried the module along with couchdb, and found one issue regarding
> UTF8 characters; besides that everything seemed to run fine and
> much faster. The utf8 parsing issue is resolved (or better: worked
> around: the JSON parser and the CouchDB tests have different ideas on
> what is valid UTF8).
> What would be next?
> 1. I would like to invite you all to review and try the code.
> 2. I need some hints regarding "parallel execution". The native driver
> does not support multithreading (and why should it? It only
> complicates things where OTP can do that kind of stuff by itself
> already.) With the current code the driver gets loaded only once.
> Therefore on a multicore machine only one CPU gets really used. Is it
> somehow possible to load the driver multiple times? The only way I see
> so far is having the driver compiled and installed multiple times with
> different names; but I guess there is a better way. The code itself
> should luckily run in a parallel situation.
> 3. and finally I'll have to tackle the Erlang->JSON issue. I don't
> expect a speedup as big.
> Please see my next mail for some comments on EEP0018.
> A wee piece of ruby every monday: http://1rad.wordpress.com/
> erlang-questions mailing list