"Terseness in XML markup is of minimal importance"

David Hopwood
Thu Jan 26 18:34:08 CET 2006


Joe Armstrong (AL/EAB) wrote:
> This is pure lunacy - design goal 10 in
> http://www.w3.org/TR/2003/PER-xml-20031030/
> says:
> 
>      " 10. Terseness in XML markup is of minimal importance. "
>  
>   But terseness of expression *is* important if you have lots of data;
> this implies that you should not use XML when there is lots of data.
> 
>   Using XML for voluminous data is a sure sign of bad design
> 
>   << in another project I bumped into, XML was being used to represent
>      a quantity that had three discrete states.
> 
>      THREE STATES CAN BE REPRESENTED IN TWO BITS
> 
>      But they chose XML - the declaration of a single state looked to be
>      about 190 bytes - and they had *lots* of records, which they stored
>      in a big database.
> 
>      Now the database was slow, so they bought more memory; it was still
>      slow, so they wanted to go distributed - so they asked me, since
>      "Joe knows something about distributed programming" >>
> 
>    Mindless use of XML is a sure sign of excruciatingly bad design.
> 
>    Idea - grade moderately difficult - XML should compress very nicely -
> since the same tags get repeated over and over again, thus in LZSS
> compression duplicated tags will appear as pointers.

Duplicated byte strings will appear as pointers, but these will usually
not start and end at boundaries of duplicated tags.
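
To make the point about boundaries concrete, here is a minimal sketch of a
greedy LZSS-style tokenizer in Python. It is an illustration under stated
assumptions, not the algorithm of any particular compressor: the function
names, the sample markup, and the parameters (4096-byte window, match
lengths of 3 to 18 bytes) are all hypothetical choices, and real
implementations find matches and encode tokens differently.

```python
def lzss_tokens(data: bytes, window: int = 4096, min_len: int = 3, max_len: int = 18):
    """Greedy LZSS-style tokenizer (illustrative, brute-force search).

    Emits either a literal byte (int) or an (offset, length) pointer
    back into the already-seen data.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Try every earlier start position inside the window.
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lzss_decode(tokens) -> bytes:
    """Rebuild the original bytes from literals and (offset, length) pointers."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(tok)
    return bytes(out)


def pointer_spans(data: bytes):
    """Return the byte strings that the tokenizer replaced with pointers."""
    spans, pos = [], 0
    for tok in lzss_tokens(data):
        if isinstance(tok, tuple):
            spans.append(data[pos:pos + tok[1]])
            pos += tok[1]
        else:
            pos += 1
    return spans


# Hypothetical markup; the original thread does not show any real records.
SAMPLE = b"<s>on</s><s>off</s><s>on</s>"

if __name__ == "__main__":
    print(pointer_spans(SAMPLE))
```

On this sample the pointers cover `<s>o`, `</s><s>o` and `n</s>`: each one
straddles a tag and part of its neighbouring content, and none corresponds
to exactly one whole tag.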

In general, I think this *kind* of idea, working around problems with XML
(for example) by making things even more complicated, is part of the
problem. Why can't we just point and laugh at the silly people who are
designing systems that use 300+ MByte XML files?

-- 
David Hopwood



