[erlang-questions] Rant: I hate parsing XML with Erlang
Tue Oct 23 16:58:43 CEST 2007
This indicates that you don't want an XML parser. If the XML (HTML) is not
well formed then you probably just want a tag parser.
My guess is that if you tokenise the input into a sequence of tags and
match over the tags you'll get what you want.
The tokenised file looks like this
Then you write patterns to extract the content
this is described here
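
As a rough illustration of the tokenise-then-match idea, here is a minimal
sketch in Python. It is not the code referred to above. The token shapes
("open", "close", "text"), the function names tokenise and contents_of, and
the sample input are all assumptions made up for this example. The point is
only that a flat tag tokeniser never checks nesting, so missing end tags do
not stop it.

```python
import re

# Hypothetical token shapes, chosen for this sketch only:
#   ("open", name), ("close", name), ("text", string)
# The regex accepts any <name ...> or </name>; attributes are skipped, not parsed.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9:_-]*)[^>]*>")


def tokenise(source):
    """Split markup into a flat list of tag and text tokens, with no nesting checks."""
    tokens = []
    pos = 0
    for m in TAG_RE.finditer(source):
        # Anything between the previous tag and this one is text.
        text = source[pos:m.start()].strip()
        if text:
            tokens.append(("text", text))
        kind = "close" if m.group(1) else "open"
        tokens.append((kind, m.group(2).lower()))
        pos = m.end()
    tail = source[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


def contents_of(tokens, name):
    """Return the text that directly follows each opening tag called `name`."""
    found = []
    for tok, nxt in zip(tokens, tokens[1:]):
        if tok == ("open", name) and nxt[0] == "text":
            found.append(nxt[1])
    return found


# Badly formed input: the <li> elements are never closed.
sample = "<ul><li>first<li>second</ul>"
tokens = tokenise(sample)
items = contents_of(tokens, "li")
```

In Erlang the second step would more naturally be written as function
clauses that pattern match on the head of the token list, but the shape of
the solution is the same: tokenise first, then match on the flat sequence.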
From what has been posted I get the following picture
1) There are lots of XML libraries around (I have a 6-pack)
other people have mentioned libraries that I was unaware of
2) The code for these cannot be found in one place
3) The documentation for how to use these is non-existent
The solution is
- move all code to one site
- organise it
- document it
This is a lot of work.
On 10/23/07, Anders Nygren <> wrote:
> On 10/23/07, Joel Reymont <> wrote:
> > On Oct 23, 2007, at 2:46 PM, Kevin A. Smith wrote:
> > > FWIW, I tried writing a very permissive feedparser but lost
> > > interest partially due to the ugliness of Erlang's XML parsing APIs.
> > Running yaws_html:parse/1 on a sample RSS feed works just fine. I
> > suspect you can't get anymore permissive than that.
> I tried to use it a couple of years ago and it was of no help to me since
> it actually requires correct HTML. Which the sites I tried to scrape
> refused to provide, (missing end tags and so on).