[erlang-questions] xmerl scan stream

Ulf Wiger
Mon Dec 8 23:53:59 CET 2008


True, you can't really use it directly, but you can copy
the code. Basically, the read_chunk/2 function should
be replaced by something along the lines of:

read_chunk(Sofar) ->
    receive
        {tcp, _Socket, Bin} ->
            {ok, iolist_to_binary([Sofar, Bin])};
        {tcp_closed, _Socket} ->
            eof
    end.

(View this as pseudo code.)

You should probably use gen_tcp:recv() instead, or
at least an {active, once} socket. But you need to
rewrite xmerl_eventp:stream/2 slightly.
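
A minimal sketch of those two alternatives, for illustration only. The
module name chunk_recv and the function read_chunk_once/2 are made up
here, and passing the socket as the first argument is an assumption about
how a rewritten stream function would call it. The socket is assumed to
be opened with [binary, {packet, raw}, {active, false}].

```erlang
-module(chunk_recv).
-export([read_chunk/2, read_chunk_once/2]).

%% Passive-mode variant: block in gen_tcp:recv/2 until data arrives.
%% Returns {ok, Binary} with the new data appended to Sofar, or eof.
read_chunk(Socket, Sofar) ->
    case gen_tcp:recv(Socket, 0) of        % 0 = take whatever is available
        {ok, Bin} ->
            {ok, iolist_to_binary([Sofar, Bin])};
        {error, closed} ->
            eof;
        {error, Reason} ->
            exit({recv_failed, Reason})
    end.

%% {active, once} variant: ask for exactly one message, then wait for it.
%% The caller must be the controlling process of Socket.
read_chunk_once(Socket, Sofar) ->
    case inet:setopts(Socket, [{active, once}]) of
        ok ->
            receive
                {tcp, Socket, Bin} ->
                    {ok, iolist_to_binary([Sofar, Bin])};
                {tcp_closed, Socket} ->
                    eof;
                {tcp_error, Socket, Reason} ->
                    exit({tcp_error, Reason})
            end;
        {error, _} ->
            %% Socket already closed locally; treat as end of stream.
            eof
    end.
```

Either way the parser process only takes data when it asks for it, so a
fast sender cannot flood its mailbox the way an {active, true} socket can.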

The complication, when you get down to it, is that the
stream continuation fun must take care not to break
up the stream in the wrong place. This is because xmerl
doesn't use a proper tokenizer, but does a one-pass
parse which relies rather heavily on pattern matching.

This is what the find_good_split() function is for.
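
To illustrate the idea (this is not the find_good_split() code from
xmerl_eventp, just a simplified, hypothetical helper): buffer the incoming
bytes and hand the parser only the part that ends right after the last
">", keeping the remainder for the next round. The names split_sketch and
split_after_last_gt/1 are invented for this sketch, and the rule is
cruder than what a real implementation needs.

```erlang
-module(split_sketch).
-export([split_after_last_gt/1]).

%% Splits Bin right after its last ">" so that the head ends on a tag
%% boundary. Returns {Head, Rest}; Head is <<>> if Bin has no ">" yet.
split_after_last_gt(Bin) when is_binary(Bin) ->
    case binary:matches(Bin, <<">">>) of
        [] ->
            %% Nothing safe to hand over; keep buffering.
            {<<>>, Bin};
        Matches ->
            {Pos, 1} = lists:last(Matches),
            HeadLen = Pos + 1,
            <<Head:HeadLen/binary, Rest/binary>> = Bin,
            {Head, Rest}
    end.
```

In a continuation fun, Head would go to the parser (as a list, via
binary_to_list/1) and Rest would be carried along as the new Sofar.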

BR,
Ulf W

2008/12/8 Peter Sabaini:
> On Monday 08 December 2008 23:09:39 Ulf Wiger wrote:
>> Hi Peter,
>>
>> Have you looked at the module xmerl_eventp in xmerl?
>>
>> You might even be able to use it directly.
>
> Yes, I suspected that this module might do what I need -- unfortunately, being
> the thick-skulled newbie that I am, I haven't been able to figure out how...
> The docs here http://www.erlang.org/doc/man/xmerl_eventp.html are pretty
> succinct. Aren't the functions in xmerl_eventp for scanning files? Or could I
> use those also with a TCP socket?
>
> Thanks,
> peter.
>
>> BR,
>> Ulf W
>>
>> 2008/12/8 Peter Sabaini:
>> > Hi list,
>> >
>> > I am trying to get xmerl to parse a stream of data coming in via a TCP
>> > socket. The goal would be for xmerl to return xmlRecords as soon as one
>> > is complete.
>> >
>> > I use the continuation function option of xmerl and so far that works ok;
>> > unfortunately I only get an xmlRecord as soon as the next xml element
>> > starts. Is there a way to tell xmerl to "evaluate eagerly"?
>> >
>> > Below is the test code I used; any help much appreciated. Is this even
>> > possible? Or am I completely on the wrong track and should use a SAX
>> > model instead?
>> >
>> >  -- snip --
>> >
>> > -module(ap).
>> > -compile(export_all).
>> >
>> > start_server() ->
>> >    {ok, Listen} = gen_tcp:listen(2345, [binary, {packet, raw},
>> >                                         {reuseaddr, true},
>> >                                         {active, true}]),
>> >    spawn(fun() -> par_connect(Listen) end).
>> >
>> > par_connect(Listen) ->
>> >    {ok, _Socket} = gen_tcp:accept(Listen),
>> >    spawn(fun() -> par_connect(Listen) end),
>> >    io:format("par_c ~n", []),
>> >    X = xmerl_scan:string("", [{continuation_fun, fun continue/3}]),
>> >    io:format("X: ~p ~n", [X]).
>> >
>> > continue(Continue, Exception, GlobalState) ->
>> >    io:format("entered continue/3 ~n", []),
>> >    receive
>> >        {tcp, _Socket, Bin} ->
>> >            Str = binary_to_list(Bin),
>> >            io:format("got Str ~p ~n", [Str]),
>> >            Continue(Str, GlobalState);
>> >        {tcp_closed, _} ->
>> >            io:format("Server socket closed~n" ),
>> >            Exception(GlobalState)
>> >    end.
>> >
>> > main() ->
>> >    start_server().
>> >
>> >
>> >  -- snip --
>> >
>> > --
>> >  Peter Sabaini
>> >  http://sabaini.at/
>> >
>> >
>> > _______________________________________________
>> > erlang-questions mailing list
>> > 
>> > http://www.erlang.org/mailman/listinfo/erlang-questions
>
> --
>  Peter Sabaini
>  http://sabaini.at/
>
>
>


