Hi Enrico,<br><br>I'm looking for a faster JSON parser too, and was about to tackle this, so thanks for sharing your work. (I found that Erlang, even with +native compilation, takes ~800 milliseconds to parse a 1MB JSON document, whereas Perl's JSON::XS (in C) takes 5 milliseconds for the same data. Encoding in the reverse direction is just as bad.)<br><br>I haven't tried your code yet, but I did glance briefly at it, and I noticed a few things that I suspect are bottlenecks:<br>

<br>  - the code uses the 'output()' callback in the driver entry structure to receive the input JSON document.  That means that Erlang has to copy the input.<br>  - the code returns the encoded result via driver_output2() (rather than driver_output_binary or driver_outputv).  This means that the encoded response is also copied.<br>

  - the code generates the result Erlang term via ei_x_encode*.   This copies the string data, and also encodes it to Erlang's external term format (which is rather similar to a binary form of JSON).  The Erlang side then has to decode from the external term format (via erlang:binary_to_term on the receiving end).<br>

<br>I think it is possible to avoid all of these, and hopefully achieve the JSON::XS-level speed mentioned above.<br>e.g.<br><br>  1. Use outputv() to receive the input JSON document as one large binary (no copying).  For example, if the data was received via a gen_tcp socket in binary mode, only a small binary header pointing to the data would be sent to the driver.<br>

<br>  2. Parse with yajl, and generate the result term with the little term-construction language defined for driver_output_term(); see <a href="http://www.erlang.org/doc/man/erl_driver.html#driver_output_term">http://www.erlang.org/doc/man/erl_driver.html#driver_output_term</a><br>

      In particular, with ERL_DRV_BINARY, strings (keys and values) can be sub-binaries of the original input binary, so there is no copying.<br><br>  3. Send the result back to Erlang via driver_send_term or driver_output_term.   This directly transfers ownership of the constructed term to the VM, so there is<br>

        a) no copying of either the term structure or the key/value strings (which are still sub-binaries of the larger input binary)<br>  and b) no encoding or decoding through an intermediate format (i.e. no use of the external term format)<br>

<br>  4.  Use EDTK for all of the boilerplate.  EDTK implements the above outputv / driver_send_term mechanism, but with one current problem: the size of the result term is very limited (it has only been used for sending small results), so I would need to make that dynamic.  (This is the main reason I haven't done it yet -- it might require some surgery.)<br>

<br>One consequence of making sub-binaries of the larger input binary is that the larger input binary cannot be garbage-collected until every key and value sub-binary has been destroyed.  That may cause excessive memory usage in some scenarios.  So I'm planning on implementing an option to create copies for keys and/or values.   However, that only causes 1 copy, as step (3) can still transfer the new binaries back to Erlang without copying the data.<br>
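<br>As a rough illustration of the trade-off between referencing the input and copying out of it, here is a minimal C sketch. It does not use the erl_driver API at all: json_slice, make_slice and copy_slice are hypothetical names invented for this example, and an offset/length pair merely stands in for a sub-binary.<br>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sub-binary: an offset/length pair that
 * refers into the original input buffer instead of owning any bytes. */
typedef struct {
    size_t offset;
    size_t length;
} json_slice;

/* Describe input[start, start + len) without copying any bytes.
 * The input buffer must stay alive for as long as the slice is used. */
json_slice make_slice(size_t input_len, size_t start, size_t len)
{
    assert(start <= input_len && len <= input_len - start);
    json_slice s = { start, len };
    return s;
}

/* Optional copy mode: duplicate the bytes (one copy) so that the large
 * input buffer can be released independently of this key or value.
 * Returns NULL on allocation failure; the caller frees the result. */
char *copy_slice(const char *input, json_slice s)
{
    char *out = malloc(s.length + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, input + s.offset, s.length);
    out[s.length] = '\0';
    return out;
}
```

In the slice mode nothing is copied but the whole input stays pinned; in the copy mode each key or value costs one copy and the input can be freed as soon as parsing is done.<br>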

<br>However, I'm not sure when I'll get to tackle this -- perhaps in the next few weeks, but it might be further out than that.<br><br><br>

>> 2. I need some hints regarding "parallel execution". <br><br>>> The native driver does not support multithreading (and why should it? It only<br>>> complicates things where OTP can do that kind of stuff by itself<br>

>> already.) <br><br>This very much depends.  Erlang 'multithreading' is at the Erlang process-level only (i.e. Erlang has its own process scheduler).   If Erlang calls a driver, then by default the OS thread running the erlang process scheduler is blocked until the driver returns.  <br>

<br>That's fine for short operations, but e.g. if parsing a really huge JSON object, you don't want to block the Erlang scheduler for tens/hundreds of milliseconds (literally the entire VM is stalled, or part of the VM in SMP mode, when there are multiple scheduler OS threads).     <br>

<br>Drivers can use driver_async() to run work in a separate thread pool, to avoid blocking the scheduler.  However, other Erlang drivers use that same thread pool for significantly slow tasks, e.g. file IO.    <br><br>EDTK implements private thread pools to work around that problem.   My plan was to test the size of the input JSON document and do a blocking operation if it is small (say under 100KB) and a thread-pool operation otherwise.<br>
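<br>The size test could look roughly like the sketch below. It is plain C with no driver calls: SMALL_INPUT_LIMIT, parse_mode, choose_parse_mode and dispatch_parse are hypothetical names, and the cutoff is just the 100KB figure mentioned above, not a measured value.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cutoff, taken from the 100KB figure mentioned in the text. */
#define SMALL_INPUT_LIMIT ((size_t)100 * 1024)

typedef enum { RUN_BLOCKING, RUN_IN_THREADPOOL } parse_mode;

/* Small documents are parsed inline on the scheduler thread;
 * anything at or above the cutoff goes to a thread pool. */
parse_mode choose_parse_mode(size_t input_len)
{
    return input_len < SMALL_INPUT_LIMIT ? RUN_BLOCKING : RUN_IN_THREADPOOL;
}

/* Hypothetical callback type for the two ways of running the parser. */
typedef void (*parse_fn)(const char *input, size_t len);

/* Route one document to the inline or pooled parser based on its size. */
parse_mode dispatch_parse(const char *input, size_t len,
                          parse_fn run_inline, parse_fn run_pooled)
{
    assert(run_inline != NULL && run_pooled != NULL);
    parse_mode mode = choose_parse_mode(len);
    if (mode == RUN_BLOCKING)
        run_inline(input, len);
    else
        run_pooled(input, len);
    return mode;
}
```

The right cutoff would have to come from measuring how long the scheduler can tolerate being blocked on the target machine.<br>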

<br><br>>>With the current code the driver gets loaded only once.<br><br>That's fine.<br><br><br>>>Therefore on a multicore machine only one CPU gets really used. <br><br>If you are not running Erlang in SMP mode then it will only use a single CPU for driver operations unless you use the driver_async() feature or private threadpools like EDTK.  <br>

<br>If you are running in SMP mode with more than one scheduler OS thread, then the level of driver concurrency depends on the 'locking mode' for the driver.  By default conservative 'driver-level' locking is used, as many drivers were written before SMP was implemented and are not thread-safe.   However, if your driver is thread-safe then you can easily switch to port-level locking, and if multiple Erlang processes each have their own open port using the driver, then Erlang scheduler threads can call the driver simultaneously.    However, if you do this you still need to worry about the effects of blocking the Erlang VM schedulers for any significant amount of time.  That's why private threadpools are still important.<br>

<br>   From      <a href="http://www.erlang.org/doc/man/erl_driver.html">http://www.erlang.org/doc/man/erl_driver.html</a><br><p>   "In the runtime system with SMP support, drivers are locked either

on driver level or port level (driver instance level). By default

driver level locking will be used, i.e., only one emulator thread

will execute code in the driver at a time. If port level locking

is used, multiple emulator threads may execute code in the driver

at the same time. There will only be one thread at a time calling

driver call-backs corresponding to the same port, though. In order

to enable port level locking set the <span class="code">ERL_DRV_FLAG_USE_PORT_LOCKING</span>

<a href="http://www.erlang.org/doc/man/driver_entry.html#driver_flags">driver flag</a> in

the <a href="http://www.erlang.org/doc/man/driver_entry.html">driver_entry</a>

used by the driver. When port level locking is used it is the

responsibility of the driver writer to synchronize all accesses

to data shared by the ports (driver instances)."

</p>
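<br>To make the last sentence of that quote concrete, here is a small C11 sketch of the distinction between per-port data and data shared by all ports. It is not driver code: port_state, total_docs_parsed and record_parse are hypothetical names, and a C11 atomic is only one possible way to synchronize a shared counter.<br>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared by every port instance. With port-level locking, callbacks for
 * different ports may run at the same time, so access must be synchronized;
 * here an atomic counter is used. Static storage starts at zero. */
static atomic_ulong total_docs_parsed;

/* Per-port state. Only one thread at a time runs callbacks for a given
 * port, so these fields need no extra synchronization. */
typedef struct {
    unsigned long docs_parsed;
} port_state;

/* Record one parsed document for this port and return the new global total. */
unsigned long record_parse(port_state *port)
{
    assert(port != NULL);
    port->docs_parsed++;                                  /* private to the port */
    return atomic_fetch_add(&total_docs_parsed, 1) + 1;   /* shared, atomic */
}
```

A parser that keeps all of its working state in the per-port structure has nothing shared to protect, which is the easiest way to make a driver safe for port-level locking.<br>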

I hope this helps,<br><br>Chris<br><br><br><div class="gmail_quote">On Sun, Jan 25, 2009 at 1:49 PM, Enrico Thierbach <span dir="ltr"><<a href="mailto:enrico.thierbach@googlemail.com">enrico.thierbach@googlemail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi guys,<br>

<br>

I have just finished what I would call the first stage of the native<br>

JSON parser implementation. This is the state as of now at<br>

<a href="http://github.com/pboy/eep0018/" target="_blank">http://github.com/pboy/eep0018/</a>, Please see the readme file.<br>

<br>

In short, this is the status:<br>

<br>

- I parse everything that comes along like mochijson2 and rabbitmq<br>

- optionally I can parse according to eep0018<br>

- my code runs 6 times as fast as  mochijson2/rabbitmq at JSON input<br>

of a certain size, and is usally not slower on very small JSON input.<br>

<br>

jan tried the module along with couchdb, and find one issue regarding<br>

UTF8 characters; besides of that everything seemed to run fine and<br>

much faster. The utf8 parsing issue is resolved (or better: worked<br>

around: the JSON parser and the CouchDB tests have different ideas on<br>

what is valid UTF8).<br>

<br>

What would be next?<br>

<br>

1. I would like to invite you all to review and try the code.<br>

<br>

2. I need some hints regarding "parallel execution". The native driver<br>

does not support multithreading (and why should it? It only<br>

complicates things where OTP can do that kind of stuff by itself<br>

already.) With the current code the driver gets loaded only once.<br>

Therefore on a multicore machine only one CPU gets really used. Is it<br>

somehow possible to load the driver multiple times? The only way I see<br>

so far is having the driver compiled and installed multiple times with<br>

different names; but I guess there is a better way. The code itself<br>

should luckily run in a parallel situation.<br>

<br>

3. and finally I'll have to tackle the Erlang->JSON issue. I don't<br>

expect a speedup as big.<br>

<br>

Please see my next mail for some comments on EEP0018.<br>

<br>

/eno<br>

<br>

_______________________________________________<br>

erlang-questions mailing list<br>

<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>

<a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>

</blockquote></div><br>