XML and Erlang

Richard A. O'Keefe ok@REDACTED
Fri Jun 24 04:55:58 CEST 2005

I wrote:
	> It looks as though the biggest space win for Erlang might be 
	> representing parsed character data and attribute values other
	> than enumeration values as binaries rather than lists.

I had a reason for recommending binaries rather than atoms.

"Ulf Wiger (AL/EAB)" <ulf.wiger@REDACTED> replied:
	Presumably, with string=atom, you would get the added advantage
	of "compression", since each unique atom is stored only once,
	which is not usually the case for binaries ...

It so happens that my C library for processing XML already does
"hash consing" for *everything*, not just strings.  I've never seen
an example where this didn't help; on the other hand, I've never seen
an example where it helped *much*.  A space saving of about 10% is not
unreasonable (and even most of that comes from attribute values).  The
real payoff is that
when you transform from one XML format to another, the new tree shares
lots of space with the old, which is something that happens in
functional languages anyway.
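
For concreteness, here is a minimal Erlang sketch of hash consing.  It is
not the C library mentioned above.  The module name, the function names,
and the {Tag, Attrs, Children} node shape are all assumptions made up for
this illustration.  The table is an ordinary map from a term to its first
(canonical) copy, and the tree is interned bottom-up so that equal
subtrees end up as the same term.

```erlang
-module(hashcons).
-export([new/0, intern/2, intern_tree/2]).

%% Hypothetical intern table: a map from a term to its canonical copy.
new() -> #{}.

%% Return the canonical copy of Term and the (possibly extended) table.
intern(Term, Table) ->
    case Table of
        #{Term := Canonical} -> {Canonical, Table};
        #{} -> {Term, Table#{Term => Term}}
    end.

%% Intern a whole tree bottom-up.  The node shape {Tag, Attrs, Children},
%% with binaries as character data, is an assumption for this sketch.
intern_tree({Tag, Attrs, Children}, T0) ->
    {Attrs1, T1} = lists:mapfoldl(fun intern/2, T0, Attrs),
    {Children1, T2} = lists:mapfoldl(fun intern_tree/2, T1, Children),
    intern({Tag, Attrs1, Children1}, T2);
intern_tree(Text, T) when is_binary(Text) ->
    intern(Text, T).
```

The table has to be threaded through every call, which is the price of
doing this in user code.  Whether the saving is worth the extra map is
exactly the "helps, but not much" question above.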

	It would of course also bring the added disadvantage of potentially
	filling the atom table, since it's not garbage collected, and 
	cannot be manually purged either.

Exactly so.  With binaries, you could always do the old LOGIX trick:
if you test whether two binaries are equal, and they are, change the
variable that pointed to the newer copy to point to the older, so that
the newer copy can eventually become garbage.
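
Erlang variables are single-assignment, so a program cannot redirect an
existing variable itself; the trick as described would have to live in
the runtime.  A user-level approximation, as a sketch only, is to have
the comparison hand back the older copy for both arguments and let the
caller carry on with the returned values.  The module and function names
here are made up for the illustration.

```erlang
-module(logix_eq).
-export([share_if_equal/2]).

%% Compare two binaries.  When they are equal, return the older one in
%% both positions, so that a caller who continues with the returned
%% values no longer refers to the newer copy.
share_if_equal(Old, New) when is_binary(Old), is_binary(New) ->
    case Old =:= New of
        true  -> {equal, Old, Old};
        false -> {different, Old, New}
    end.
```

A caller would write something like
{_, A1, B1} = logix_eq:share_if_equal(A, B) and use A1 and B1 from then
on.  The newer copy can only be collected once nothing else refers to it.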
