[erlang-bugs] xmerl_sax_parser:stream/2 doesn't know when to stop

Lars Thorsen
Fri Nov 12 13:22:41 CET 2010

The problem you have encountered has nothing to do with the patch
http://www.erlang.org/cgi-bin/ezmlm-cgi/4/46754 .

The effect is also expected and I shall try to explain why.

According to the XML specification, a document is defined as:

document ::= prolog element Misc*
Misc     ::= Comment | PI | S

If you just give a string/binary to xmerl_sax_parser:stream/2 without a continuation function,
the parser assumes that this is all the input it will get and therefore ends parsing (there is no
default continuation function).

But if you have defined a continuation function, it is called after the initial string/binary has been
parsed to see whether there is more input. In your case the problem is that your function never returns
anything that stops the input. io:get_line returns at least a newline even if you just hit enter, and a
newline is legal input at the end of an XML document. As far as the parser knows, 10 newlines could be
followed by a processing instruction at the end.

When you type an x before hitting enter, the parser gets something that is not allowed at the end of the
document. It then reports that it reached the end of a correct document and returns the remaining input
as a rest value. If your input consists of just one document, the rest should be [] (or <<>>, depending
on the input). But this function parses a stream, and the rest can be the start of a new document on that
stream. The rest can then be used as the start of a new call to xmerl_sax_parser:stream/2.
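
A rough sketch of reusing the rest value to read several documents in a row. It assumes the whole input
is already in memory and that no continuation function is used; the module name multi_doc and the
function parse_all/2 are made up for illustration and are not part of xmerl.

```erlang
-module(multi_doc).
-export([parse_all/2]).

%% Parse every document found in Input, one stream/2 call per document.
%% Returns {ok, [EventState]} with one final event state per document,
%% or {error, ParserError} carrying the error tuple returned by the parser.
%% Assumption: Options contains no continuation_fun, so each call sees
%% only the input passed to it.
parse_all(Input, Options) ->
    parse_all(Input, Options, []).

%% An empty rest ([] or <<>>, depending on the input type) means the
%% stream is exhausted.
parse_all([], _Options, Acc) ->
    {ok, lists:reverse(Acc)};
parse_all(<<>>, _Options, Acc) ->
    {ok, lists:reverse(Acc)};
parse_all(Input, Options, Acc) ->
    case xmerl_sax_parser:stream(Input, Options) of
        {ok, EventState, Rest} ->
            %% Rest may be the start of the next document on the stream.
            parse_all(Rest, Options, [EventState | Acc]);
        Error ->
            {error, Error}
    end.
```

Called as multi_doc:parse_all("<a/><b/>", []), this should make two calls to stream/2, the second one
starting from the rest returned by the first.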

If you just want to read one document from the input, the continuation function must return [] or <<>>
(depending on the input) to end the document.

ContFun = fun(S) -> case io:get_line(">> ") of "\n" -> {"", S}; X -> {X, S} end end.
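
The same idea works for a non-interactive source. Below is a minimal sketch, assuming the document
arrives as a list of string chunks; the module name chunk_feed and its functions are hypothetical. The
continuation state holds the chunks not yet handed to the parser, and the function returns the empty
string once they run out.

```erlang
-module(chunk_feed).
-export([cont_fun/0, parse_chunks/1]).

%% Continuation function over a list of string chunks.
%% The continuation state is the list of chunks still to be delivered.
%% Note: an empty chunk in the middle of the list would also be taken
%% as the end of input.
cont_fun() ->
    fun([]) ->
            {[], []};              % nothing left: empty string ends the input
       ([Chunk | More]) ->
            {Chunk, More}          % hand the next chunk to the parser
    end.

%% Start parsing with the first chunk and let the parser pull the
%% remaining ones through the continuation function.
parse_chunks([First | More]) ->
    xmerl_sax_parser:stream(First,
                            [{continuation_fun, cont_fun()},
                             {continuation_state, More}]).
```

For binary input the clauses would return <<>> instead of [], matching the type of the data given to
stream/2.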

Regards Lars Thorsen
OTP Team

Per Melin wrote:
> (I've only tested with R13B04 but there is nothing in the release
> notes that indicates that this would be different in R14B.)
> Without the continuation_fun option xmerl_sax_parser:stream/2 works as
> I would expect. But with a continuation_fun it needs to be fed at
> least one extra character after a complete document.
> 1> EventFun = fun(E, _, S) -> erlang:display(E), S end.
> #Fun<erl_eval.18.105910772>
> 2> ContFun = fun(S) -> {io:get_line(">> "), S} end.
> #Fun<erl_eval.6.13229925>
> 3> xmerl_sax_parser:stream("<foo />", [{event_fun, EventFun},
> {continuation_fun, ContFun}]).
> startDocument
> {startElement,[],"foo",{[],"foo"},[]}
> {endElement,[],"foo",{[],"foo"}}
> Here it has called ContFun and is waiting for additional input. It
> will accept anything that is not whitespace, and then return it as a
> rest. Let's give it an "x".
>>> x
> endDocument
> {ok,undefined,"x\n"}
> 4>
> It seems this used to be a problem even without continuation_fun, but
> the patch described here painted over it:
> http://www.erlang.org/cgi-bin/ezmlm-cgi/4/46754