"Terseness in XML markup is of minimal importance"

Fri Jan 27 09:52:05 CET 2006

Well, here's a laugh, just in from the Python list:

"Just want to check which xml parser you guys have found to be the
quickest. I have xml documents with 250 000 records or more and the
processing of these documents are taking way to long. The validation is
the main problem. Any module names, non validating would be find to,
would help a lot."
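
For what it's worth, the usual answer to that question is a streaming, non-validating parse. Here is a minimal sketch using the standard library's xml.etree.ElementTree.iterparse. The <record> tag name and the little in-memory sample are my own assumptions, since the poster never showed the actual document layout.

```python
import io
import xml.etree.ElementTree as ET

def count_records(source, tag="record"):
    """Stream through an XML document without validating it.

    `tag` is a hypothetical record element name; adjust to the real file.
    """
    count = 0
    # iterparse hands back each element once its end tag has been read
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # discard the record's children and text once handled
    return count

# Tiny in-memory stand-in for the 250 000-record file
sample = io.BytesIO(
    b"<records>" + b"<record><id>1</id></record>" * 1000 + b"</records>"
)
print(count_records(sample))
```

Nothing here checks the document against a DTD or schema, only well-formedness, which is probably what "non validating would be fine too" was asking for.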

I agree that people producing really big XML files are doing foolish
things, but actually there is a market there, a very real niche market:

http://www.datapower.com/products/xa35.html

DataPower was recently bought by IBM, wasn't it? Hmm.

On 1/26/06, David Hopwood <david.nospam.hopwood@REDACTED> wrote:
> Joe Armstrong (AL/EAB) wrote:
> > This is pure lunacy - design goal 10 in
> > http://www.w3.org/TR/2003/PER-xml-20031030/
> > says:
> >
> >      " 10. Terseness in XML markup is of minimal importance. "
> >
> >   But terseness of expression *is* important if you have lots of data;
> > this implies that you should not use XML when there is lots of data.
> >
> >   Using XML for voluminous data is a sure sign of bad design
> >
> >   << in another project I bumped into, XML was being used to represent
> >      a quantity that had three discrete states.
> >
> >      THREE STATES CAN BE REPRESENTED IN TWO BITS
> >
> >      But they chose XML - the declaration of a single state looks about
> >      190 bytes - and they had *lots* of records, which they stored in a
> >      big database.
> >
> >      Now the database was slow, so they bought more memory, it was
> >      still slow, so they wanted to go distributed - so they asked me
> >      since "Joe knows something about distributed programming" >>
> >
> >    Mindless use of XML is a sure sign of excruciatingly bad design.
> >
> >    Idea - grade moderately difficult - XML should compress very nicely -
> > since the same tags get repeated over and over again, thus in LZSS
> > compression duplicated tags will appear as pointers.
>
> Duplicated byte strings will appear as pointers, but these will usually
> not start and end at boundaries of duplicated tags.
>
> In general I think this *kind* of idea for how to work around problems
> with XML (for example) by making things even more complicated, is part of
> the problem. Why can't we just point and laugh at the silly people who are
> designing systems that use 300+ MByte XML files?
>
> --
> David Hopwood <david.nospam.hopwood@REDACTED>
>
>
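
To put rough numbers on Joe's two-bits point and his compression idea, here is a quick sketch. Everything in it is made up for illustration: the state names, the tag names, the namespace URL and the random test data. Also, zlib's DEFLATE (an LZ77-family scheme, so a cousin of LZSS rather than LZSS itself) stands in for the compressor he had in mind.

```python
import random
import zlib

STATES = ("on", "off", "unknown")  # hypothetical three-state quantity

def to_xml(values):
    # One verbose element per record; tag names and namespace are invented.
    body = "".join(
        '<record><deviceStatus xmlns="http://example.com/ns/status">'
        "<currentState>%s</currentState></deviceStatus></record>\n" % STATES[v]
        for v in values
    )
    return ("<records>\n" + body + "</records>\n").encode("ascii")

def pack_two_bits(values):
    # Four 2-bit values per byte, lowest bits first.
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        out[i // 4] |= v << (2 * (i % 4))
    return bytes(out)

def unpack_two_bits(data, n):
    # Inverse of pack_two_bits for the first n values.
    return [(data[i // 4] >> (2 * (i % 4))) & 3 for i in range(n)]

rng = random.Random(0)  # fixed seed so repeated runs use the same data
values = [rng.randrange(3) for _ in range(10000)]

xml = to_xml(values)
packed = pack_two_bits(values)

print("xml bytes:        ", len(xml))
print("xml, deflated:    ", len(zlib.compress(xml, 9)))
print("two bits/record:  ", len(packed))
```

Compression does claw back the repeated tags, but whoever reads the file still has to inflate and parse all of the expanded text, whereas the packed form is just an array lookup.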