For what it's worth, I ended up going with erlsom for XML parsing because of its speed and how easy it makes streaming parses.<br><br>I would just loop over my buffer with:<br>erlsom:simple_form(Buffer)<br><br>which returns<br>{ok,Xml,Remainder}<br>
<br>or, if there isn't a complete doc in the buffer, it raises {error,"Malformed:..."}, so basically I had<br><br>stream_loop(Data) -><br> case catch erlsom:simple_form(Data) of<br> {ok,Xml,Rest} -> server ! {newXML,Xml}, stream_loop(Rest);<br>
 {error,"Malformed"++_} -> server ! {needMore,Data}<br> end.<br><br>(note: not real code, just an example)<br><br>This may be possible with xmerl as well. I had both working with simple straight parsing, but I don't recall xmerl's parser offering "return struct + leftover bytes" as a feature.<br>
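<br>A slightly fuller sketch of that loop, for illustration only: it appends each new chunk to the unparsed remainder and peels off every complete document it can find. The module name xml_buffer_sketch and the feed/drain helpers are made up for this example, and it assumes erlsom:simple_form/1 returns {ok, Doc, Rest} for a complete document and fails otherwise. The parser is passed in as a fun so the loop can be tried without erlsom installed.<br><br>

```erlang
-module(xml_buffer_sketch).
-export([feed/2, feed/3]).

%% feed(Buffer, Chunk) -> {Docs, Rest}
%% Assumption: erlsom:simple_form/1 returns {ok, Doc, Rest} for a complete
%% document and throws or returns an error for an incomplete one.
feed(Buffer, Chunk) ->
    feed(fun erlsom:simple_form/1, Buffer, Chunk).

%% ParseFun is injectable (hypothetical design choice for this sketch).
feed(ParseFun, Buffer, Chunk) ->
    drain(ParseFun, Buffer ++ Chunk, []).

%% Peel complete documents off the front of Data until the parser gives up.
drain(_ParseFun, [], Docs) ->
    {lists:reverse(Docs), []};
drain(ParseFun, Data, Docs) ->
    case catch ParseFun(Data) of
        {ok, Doc, Rest} ->
            drain(ParseFun, Rest, [Doc | Docs]);
        _NeedMore ->
            %% Anything else is treated as "incomplete, wait for more bytes".
            %% Genuinely malformed input would sit in the buffer forever, so a
            %% real router would want a size limit or a stricter error check.
            {lists:reverse(Docs), Data}
    end.
```

The caller keeps Rest and passes it back in as Buffer when the next chunk arrives from the socket.<br>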
<br>In my testing with a smallish sample XML document (399 bytes), I think I was getting about 90 usec/msg with erlsom and 125 usec/msg with xmerl (running on a dual quad-core 3 GHz box with 8 GB RAM, not that that matters for a single-threaded test like this).<br>
<br>Both are perfectly speedy (and faster than any Java parser I've used), but for this particular app (an XML router) speed is king.<br><br>Plus, I found erlsom's "simple form" a bit easier to deal with at the time, though I'm no longer sure why.<br>
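<br>For anyone who wants to repeat that kind of per-message measurement, here is a minimal sketch using timer:tc/1 from OTP. The module name parse_bench_sketch and the function usec_per_msg/3 are invented for this example, and it is not the harness that produced the numbers above.<br><br>

```erlang
-module(parse_bench_sketch).
-export([usec_per_msg/3]).

%% Runs ParseFun(Msg) N times in the calling process and returns the
%% average wall-clock time per call in microseconds (as a float).
usec_per_msg(ParseFun, Msg, N) when is_function(ParseFun, 1), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(ParseFun, Msg, N) end),
    Micros / N.

%% Call the parser N times, discarding each result.
repeat(_ParseFun, _Msg, 0) ->
    ok;
repeat(ParseFun, Msg, N) ->
    _ = ParseFun(Msg),
    repeat(ParseFun, Msg, N - 1).
```

It could be called with, say, fun(X) -> erlsom:simple_form(X) end or fun(X) -> xmerl_scan:string(X) end and a sample document, using a large N so that start-up noise averages out.<br>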
<br>It does solve your problem in that you get your XML doc(s) as soon as you receive the bytes.<br><br>Hope that helps,<br><br>-David<br><br><br><div class="gmail_quote">On Mon, Dec 8, 2008 at 6:10 PM, Peter Sabaini <span dir="ltr"><<a href="mailto:peter@sabaini.at">peter@sabaini.at</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hm, as an afterthought -- this still doesn't solve the original problem, does<br>
it?<br>
<br>
Say I have this on my input stream:<br>
<br>
% telnet localhost 2345<br>
Trying 127.0.0.1...<br>
Connected to localhost.local.<br>
Escape character is '^]'.<br>
<doc><br>
a<br>
</doc><br>
<foo /><br>
<br>
-----<br>
<br>
then I only get the <doc>a</doc> structure back as soon as <foo /> is entered,<br>
correct?<br>
<br>
Thanks,<br>
<font color="#888888">peter.<br>
</font><div><div></div><div class="Wj3C7c"><br>
<br>
<br>
On Tuesday 09 December 2008 00:30:42 Peter Sabaini wrote:<br>
> On Monday 08 December 2008 23:53:59 Ulf Wiger wrote:<br>
> > True, you can't really use it directly, but you can copy<br>
> > the code. Basically, the read_chunk/2 function should<br>
> > be replaced by something along the lines of:<br>
> ><br>
> > read_chunk(Sofar) -><br>
> > receive<br>
> > {tcp, _Socket, Bin} -><br>
> > {ok, iolist_to_binary([Sofar, Bin])};<br>
> > {tcp_closed, _Socket} -><br>
> > eof<br>
> > end.<br>
><br>
> Ok...<br>
><br>
> > (View this as pseudo code.)<br>
> ><br>
> > You should probably use gen_tcp:recv() instead, or<br>
> > at least an {active, once} socket.<br>
><br>
> At the moment, this is for "trusted" clients only, so I can code this<br>
> rather liberally, without fear that somebody could abuse that -- is that<br>
> what you meant?<br>
><br>
> > But you need to<br>
> > rewrite xmerl_eventp:stream/2 slightly.<br>
><br>
> Ok, I'll try that and report any outcome, maybe other people find this<br>
> useful too.<br>
><br>
> Thanks,<br>
> peter.<br>
><br>
> > The complication, when you get down to it, is that the<br>
> > stream continuation fun must take care not to break<br>
> > up the stream in the wrong place. This is because xmerl<br>
> > doesn't use a proper tokenizer, but does a one-pass<br>
> > parse which relies rather heavily on pattern matching.<br>
> ><br>
> > This is what the find_good_split() function is for.<br>
> ><br>
> > BR,<br>
> > Ulf W<br>
> ><br>
> > 2008/12/8 Peter Sabaini <<a href="mailto:peter@sabaini.at">peter@sabaini.at</a>>:<br>
> > > On Monday 08 December 2008 23:09:39 Ulf Wiger wrote:<br>
> > >> Hi Peter,<br>
> > >><br>
> > >> Have you looked at the module xmerl_eventp in xmerl?<br>
> > >><br>
> > >> You might even be able to use it directly.<br>
> > ><br>
> > > Yes, I suspected that this module might do what I need --<br>
> > > unfortunately, being the thick-skulled newbie that I am, I haven't been<br>
> > > able to figure out how... The docs here<br>
> > > <a href="http://www.erlang.org/doc/man/xmerl_eventp.html" target="_blank">http://www.erlang.org/doc/man/xmerl_eventp.html</a> are pretty succinct.<br>
> > > Aren't the functions in xmerl_eventp for scanning files? Or could I use<br>
> > > those also with a TCP socket?<br>
> > ><br>
> > > Thanks,<br>
> > > peter.<br>
> > ><br>
> > >> BR,<br>
> > >> Ulf W<br>
> > >><br>
> > >> 2008/12/8 Peter Sabaini <<a href="mailto:peter@sabaini.at">peter@sabaini.at</a>>:<br>
> > >> > Hi list,<br>
> > >> ><br>
> > >> > I am trying to get xmerl to parse a stream of data coming in via a<br>
> > >> > TCP socket. The goal would be for xmerl to return xmlRecords as soon<br>
> > >> > as one is complete.<br>
> > >> ><br>
> > >> > I use the continuation function option of xmerl and so far that<br>
> > >> > works ok; unfortunately I only get an xmlRecord as soon as the next<br>
> > >> > xml element starts. Is there a way to tell xmerl to "evaluate<br>
> > >> > eagerly"?<br>
> > >> ><br>
> > >> > Below is the test code I used; any help much appreciated. Is this<br>
> > >> > even possible? Or am I completely on the wrong track and should use<br>
> > >> > a SAX model instead?<br>
> > >> ><br>
> > >> > -- snip --<br>
> > >> ><br>
> > >> > -module(ap).<br>
> > >> > -compile(export_all).<br>
> > >> ><br>
> > >> > start_server() -><br>
> > >> > {ok, Listen} = gen_tcp:listen(2345, [binary, {packet, raw},<br>
> > >> > {reuseaddr, true},<br>
> > >> > {active, true}]),<br>
> > >> > spawn(fun() -> par_connect(Listen) end).<br>
> > >> ><br>
> > >> > par_connect(Listen) -><br>
> > >> > {ok, _Socket} = gen_tcp:accept(Listen),<br>
> > >> > spawn(fun() -> par_connect(Listen) end),<br>
> > >> > io:format("par_c ~n", []),<br>
> > >> > X = xmerl_scan:string("", [{continuation_fun, fun continue/3}]),<br>
> > >> > io:format("X: ~p ~n", [X]).<br>
> > >> ><br>
> > >> > continue(Continue, Exception, GlobalState) -><br>
> > >> > io:format("entered continue/3 ~n", []),<br>
> > >> > receive<br>
> > >> > {tcp, _Socket, Bin} -><br>
> > >> > Str = binary_to_list(Bin),<br>
> > >> > io:format("got Str ~p ~n", [Str]),<br>
> > >> > Continue(Str, GlobalState);<br>
> > >> > {tcp_closed, _} -><br>
> > >> > io:format("Server socket closed~n" ),<br>
> > >> > Exception(GlobalState)<br>
> > >> > end.<br>
> > >> ><br>
> > >> > main() -><br>
> > >> > start_server().<br>
> > >> ><br>
> > >> ><br>
> > >> > -- snip --<br>
> > >> ><br>
> > >> > --<br>
> > >> > Peter Sabaini<br>
> > >> > <a href="http://sabaini.at/" target="_blank">http://sabaini.at/</a><br>
> > >> ><br>
> > >> ><br>
> > >> > _______________________________________________<br>
> > >> > erlang-questions mailing list<br>
> > >> > <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
> > >> > <a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>
> > ><br>
> > > --<br>
> > > Peter Sabaini<br>
> > > <a href="http://sabaini.at/" target="_blank">http://sabaini.at/</a><br>
</div></div></blockquote></div><br>
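<br>As a footnote to the quoted thread: Ulf's suggestion of an {active, once} socket for read_chunk could look roughly like the sketch below. The module name read_chunk_sketch and the extra Socket argument are assumptions made for this example. It assumes the socket was opened in binary mode and that the calling process is the socket's controlling process.<br><br>

```erlang
-module(read_chunk_sketch).
-export([read_chunk/2]).

%% read_chunk(Socket, Sofar) -> {ok, Binary} | eof | {error, Reason}
%% Asks for exactly one TCP message, then appends it to what we have so far.
read_chunk(Socket, Sofar) ->
    case inet:setopts(Socket, [{active, once}]) of
        ok ->
            receive
                {tcp, Socket, Bin} ->
                    {ok, iolist_to_binary([Sofar, Bin])};
                {tcp_closed, Socket} ->
                    eof;
                {tcp_error, Socket, Reason} ->
                    {error, Reason}
            end;
        {error, _Reason} ->
            %% The socket is already closed or otherwise unusable.
            eof
    end.
```

Because the socket goes back to passive mode after each message, a fast sender cannot flood the process mailbox, which is the usual reason for preferring {active, once} over {active, true}.<br>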