SV: xmerl usage question

Chandrashekhar Mullaparthi <>
Thu Apr 15 11:10:10 CEST 2004


Hi Fredrik,

On 14 Apr 2004, at 12:33, Fredrik Linder wrote:

> Hi Chandrashekhar
>
> First: It is an error in an external DTD to declare an element or
> its attribute list (<!ATTLIST) twice. I think it is allowed to
> re-define elements in the <!DOCTYPE tag however (not sure about this
> though).
>
> So back to xmerl:
>
> When xmerl parses the given string it will insert all <!ELEMENT and
> <!ATTLIST information it finds into the rules (except the #FIXED
> information), and if such information is already there it will
> generate the error you've seen. Exactly as it should.

The problem here is not that I have duplicate elements in my DTD. The 
problem is that when the same DTD is parsed again when parsing the next 
chunk of XML data, the parser complains about duplicate elements, 
because that element already exists in the rules table from the 
previous parse.

>
> Normally xmerl creates a *new* table for each call to 
> xmerl_scan:string/X, and hence will not generate that error.

There seems to be a bug where the table is not deleted after the parse, 
so the number of ETS tables keeps growing until no more ETS tables can 
be created (the default limit seems to be 1400) and the node then 
crashes. I haven't tracked down this bug yet.
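
One way to check for this kind of leak is to count the ETS tables 
before and after a batch of parses. The sketch below is only an 
illustration: the module name ets_leak_check and the sample document 
are made up, and it assumes xmerl is on the code path.

```erlang
-module(ets_leak_check).
-export([run/1]).

%% Hypothetical helper: parse the same small document N times and
%% report the number of ETS tables before and after the parses.
run(N) ->
    %% Sample document with an internal DTD subset (an assumption,
    %% not the document from the original problem report).
    Xml = "<?xml version=\"1.0\"?>"
          "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]>"
          "<a>hi</a>",
    Before = length(ets:all()),
    lists:foreach(
      fun(_) ->
              %% xmerl_scan:string/1 returns {ParsedDocument, Rest}
              {_Doc, _Rest} = xmerl_scan:string(Xml)
      end,
      lists:seq(1, N)),
    After = length(ets:all()),
    %% A difference that grows with N suggests tables are left behind.
    {Before, After, After - Before}.
```

Calling ets_leak_check:run(100) and then run(200) and comparing the 
third element of the results should show whether tables accumulate. 
Other processes creating or deleting tables at the same time would 
skew the counts, so this is best run on an otherwise idle node.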

>
> Am I correct in guessing that you would like to read the DTD files
> only once? If so, what you probably need to do on the second and
> later runs is *not* read into the rules table, and later swap to the
> one you read earlier. You would probably also need to play with the
> fetch options when playing with the rules options in this way.
>
> And now a little about how we use this to read the DTD files only
> once:
>
> What we do is first parse a fake XML document with all the DTD
> information in it, using the {prolog, stop} and rules options, to
> initialize the rules.
>
> Later we set the fetch options to choose the rules table that
> matches the incoming DTD spec (instead of reading the DTD files),
> and switch to that rule set when the <!DOCTYPE declaration ends.

This sounds like a good idea. I will try it out.

thanks
Chandru

PS: I'm using xmerl-0.18



