[erlang-questions] xmerl producing atoms

Tue Mar 13 06:53:35 CET 2007

Dear list,

I'm currently working on a system that parses user-provided XML data
using xmerl. The problem I see is that xmerl creates a new atom for
every distinct element name or namespace URI it parses from the input.
This is not a big deal if you work with a limited number of schemas and
"internal" users, but when you have to accept input from the internet,
your node could quickly be taken down by filling up the atom table. The
documentation ("Efficiency Guide" / 7.1 "Memory") says that atoms are
not garbage-collected.
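
To make the concern concrete, here is a minimal sketch of how the atom
table can be observed growing. The module and function names are made up
for illustration, and it assumes a runtime where
erlang:system_info(atom_count) is available (it was added in much later
OTP releases than the one current at the time of writing), so treat it
as an illustration rather than a tested reproduction.

```erlang
-module(xmerl_atom_demo).
-export([element_name/1, count_new_atoms/1]).

%% Record definitions for #xmlElement{} shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string and return the root element's name.
%% xmerl_scan returns the name as an atom, not a string or binary.
element_name(Xml) ->
    {#xmlElement{name = Name}, _Rest} = xmerl_scan:string(Xml),
    Name.

%% Parse N documents whose root element names differ, and return how
%% much the atom table grew. Assumption: erlang:system_info(atom_count)
%% exists on the runtime in use.
count_new_atoms(N) ->
    Before = erlang:system_info(atom_count),
    lists:foreach(
      fun(I) ->
              Tag = "attacker_tag_" ++ integer_to_list(I),
              _ = element_name("<" ++ Tag ++ "/>")
      end,
      lists:seq(1, N)),
    erlang:system_info(atom_count) - Before.
```

Calling xmerl_atom_demo:count_new_atoms(N) on a fresh node should report
growth of at least N, since each previously unseen tag name becomes a
permanent atom; the exact figure may be higher because loading modules
also creates atoms.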

I came across an earlier post from 2005 describing this issue:
http://www.erlang.org/ml-archive/erlang-questions/200502/msg00070.html

I have looked at alternative parsers like ErlSom, but it seems to work
against a pre-compiled schema, and for my application I have to accept
any XML document without knowing its structure. Plus, I make heavy use
of hook_fun in xmerl_scan.

My questions: is there a (possibly undocumented) option for telling
xmerl to produce binaries or strings instead of atoms? Or are there
plans to garbage-collect atoms in the near future?

Thanks in advance,

igwan