[erlang-questions] Rant: I hate parsing XML with Erlang

Joe Armstrong <>
Tue Oct 23 16:58:43 CEST 2007

This indicates that you don't want an XML parser. If the XML (HTML) is not
well formed, then you probably just want a tag parser.

My guess is that if you tokenise the input into a sequence of tags and
then pattern match over the tags, you'll get what you want.

The tokenised file looks like this:

      {eTag, img,[{src,"..."}]},
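As a rough illustration of the tokenising step, here is a minimal sketch of a tolerant tag tokeniser. It is not Joe's code and not any existing library: the module name `tag_tokens` and the token shapes `{tag, Name, Attrs}`, `{endTag, Name}` and `{text, String}` are assumptions (the post only shows `{eTag, img, [{src, "..."}]}`). Because it never checks nesting, missing end tags cannot make it fail. It does not handle comments, CDATA, `<script>` bodies or entity decoding.

```erlang
-module(tag_tokens).
-export([tokenise/1]).

%% Hypothetical token shapes (an assumption, not a standard format):
%%   {tag, Name, Attrs}  start or self-closing tag, e.g. {tag, img, [{src, "a.png"}]}
%%   {endTag, Name}      end tag such as </p>
%%   {text, String}      character data between tags
%% Tag and attribute names become lower-case atoms. list_to_atom/1 on
%% untrusted input can exhaust the atom table; acceptable for a sketch only.

tokenise([]) ->
    [];
tokenise("</" ++ T) ->
    {Name, Rest} = read_name(T),
    [{endTag, Name} | tokenise(skip_past_gt(Rest))];
tokenise("<" ++ T) ->
    {Name, Rest0} = read_name(T),
    {Attrs, Rest1} = read_attrs(Rest0, []),
    [{tag, Name, Attrs} | tokenise(Rest1)];
tokenise(Str) ->
    %% Everything up to the next '<' is text. Nesting is never checked,
    %% so missing end tags are simply absent from the token list.
    {Text, Rest} = lists:splitwith(fun(C) -> C =/= $< end, Str),
    [{text, Text} | tokenise(Rest)].

read_name(Str) ->
    {Name, Rest} = lists:splitwith(fun is_name_char/1, Str),
    {list_to_atom(string:lowercase(Name)), Rest}.

is_name_char(C) ->
    not lists:member(C, " \t\r\n/>=").

%% Read attributes up to '>' or '/>'; an unterminated tag just ends the input.
read_attrs(Str0, Acc) ->
    case skip_ws(Str0) of
        [] ->
            {lists:reverse(Acc), []};
        "/>" ++ Rest ->
            {lists:reverse(Acc), Rest};
        ">" ++ Rest ->
            {lists:reverse(Acc), Rest};
        [C | Rest] when C =:= $/; C =:= $= ->
            read_attrs(Rest, Acc);                    % stray character: skip it
        Str1 ->
            {Key, Str2} = read_name(Str1),
            case skip_ws(Str2) of
                "=" ++ Str3 ->
                    {Val, Str4} = read_value(skip_ws(Str3)),
                    read_attrs(Str4, [{Key, Val} | Acc]);
                Other ->
                    read_attrs(Other, [{Key, ""} | Acc])   % valueless attribute
            end
    end.

%% Quoted ('...' or "...") or bare attribute value.
read_value([Q | T]) when Q =:= $"; Q =:= $' ->
    {Val, Rest} = lists:splitwith(fun(C) -> C =/= Q end, T),
    {Val, drop_one(Rest)};
read_value(Str) ->
    lists:splitwith(fun(C) -> not lists:member(C, " \t\r\n>") end, Str).

skip_ws([C | T]) when C =:= $\s; C =:= $\t; C =:= $\r; C =:= $\n ->
    skip_ws(T);
skip_ws(Str) ->
    Str.

%% Drop everything up to and including the next '>'.
skip_past_gt(Str) ->
    drop_one(lists:dropwhile(fun(C) -> C =/= $> end, Str)).

drop_one([]) -> [];
drop_one([_ | T]) -> T.
```

For example, `tag_tokens:tokenise("<li>one<li>two")` returns `[{tag,li,[]},{text,"one"},{tag,li,[]},{text,"two"}]` even though both `li` end tags are missing, which is exactly the kind of input a strict parser rejects.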

Then you write patterns to extract the content.

This is described here
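Once the input is a flat list of tokens, extracting content is ordinary Erlang pattern matching over that list. This sketch assumes hypothetical tokens of the form `{tag, Name, Attrs}`, `{endTag, Name}` and `{text, String}` (the post itself only shows `{eTag, img, [{src, "..."}]}`); the module `tag_patterns` and its functions are illustrative names, not an existing API.

```erlang
-module(tag_patterns).
-export([img_srcs/1, links/1, title/1]).

%% Assumed (hypothetical) token shapes:
%%   {tag, Name, Attrs} | {endTag, Name} | {text, String}

%% The src attribute of every img tag, in document order.
img_srcs([{tag, img, Attrs} | T]) ->
    case lists:keyfind(src, 1, Attrs) of
        {src, Src} -> [Src | img_srcs(T)];
        false      -> img_srcs(T)
    end;
img_srcs([_ | T]) ->
    img_srcs(T);
img_srcs([]) ->
    [].

%% {Href, Text} for every <a href=...> immediately followed by text.
%% A two-token pattern does the work; the </a> is never required.
links([{tag, a, Attrs}, {text, Text} | T]) ->
    case lists:keyfind(href, 1, Attrs) of
        {href, Href} -> [{Href, Text} | links(T)];
        false        -> links(T)
    end;
links([_ | T]) ->
    links(T);
links([]) ->
    [].

%% Text right after the first <title> tag, or error if there is none.
title([{tag, title, _}, {text, Text} | _]) ->
    {ok, Text};
title([_ | T]) ->
    title(T);
title([]) ->
    error.
```

Given `[{tag,title,[]},{text,"News"},{tag,a,[{href,"/1"}]},{text,"First"},{tag,img,[{src,"a.png"}]}]`, `img_srcs/1` returns `["a.png"]`, `links/1` returns `[{"/1","First"}]` and `title/1` returns `{ok,"News"}`. Because the patterns look at neighbouring tokens rather than at a tree, a missing `</a>` or `</title>` makes no difference.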


From what has been posted, I get the following picture:

1) There are lots of XML libraries around (I have a 6-pack);
   other people have mentioned libraries that I was unaware of.
2) The code for these cannot be found in one place.
3) The documentation on how to use them is non-existent.

The solution is to:

- move all the code to one site
- organise it
- document it

This is a lot of work.


On 10/23/07, Anders Nygren <> wrote:
> On 10/23/07, Joel Reymont <> wrote:
> >
> > On Oct 23, 2007, at 2:46 PM, Kevin A. Smith wrote:
> >
> > > FWIW, I tried writing a very permissive feedparser but lost
> > > interest partially due to the ugliness of Erlang's XML parsing APIs.
> >
> > Running yaws_html:parse/1 on a sample RSS feed works just fine. I
> > suspect you can't get any more permissive than that.
> I tried to use it a couple of years ago and it was of no help to me since
> it actually requires correct HTML, which the sites I tried to scrape
> refused to provide (missing end tags and so on).
> Anders
