[erlang-questions] Rant: I hate parsing XML with Erlang
Wed Oct 24 00:25:33 CEST 2007
On 23-Oct-07, at 12:09 PM, Joe Armstrong wrote:
> I've seen some work on parsing badly formed HTML.
> If I remember rightly you keep a stack of the currently open tags
> then stacks for things like <font>, <b>, <i> tags etc. So you end up
> with several small stacks. Each new open or close tag pushes things
> onto or pops things off these stacks.
> When you hit raw data you pattern match over the stacks to figure out
> what to do.
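For what it's worth, here is a rough sketch of that stack idea. It is in
Python rather than Erlang, purely for illustration, and leans on the
standard library's html.parser as a tolerant tokenizer. The names
StackScraper, scrape, FORMATTING and VOID are made up for this example,
and the recovery rules (ignore stray close tags, pop back to the nearest
matching open tag) are my assumptions, not a description of any existing
parser.

```python
from html.parser import HTMLParser

FORMATTING = ("font", "b", "i")            # tags that get their own small stack
VOID = {"br", "hr", "img", "input", "link", "meta"}  # never have a close tag


class StackScraper(HTMLParser):
    """Hypothetical sketch: track open tags on stacks, tolerate bad nesting."""

    def __init__(self):
        super().__init__()
        self.open_tags = []                          # stack of open structural tags
        self.styles = {t: [] for t in FORMATTING}    # one small stack per formatting tag
        self.chunks = []                             # (text, enclosing tag, active styles)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return                                   # nothing to keep open
        if tag in self.styles:
            self.styles[tag].append(dict(attrs))
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.styles:
            if self.styles[tag]:                     # ignore a close with no open
                self.styles[tag].pop()
        elif tag in self.open_tags:
            # Pop back to the nearest matching open tag, implicitly
            # closing anything that was left open inside it.
            while self.open_tags.pop() != tag:
                pass
        # Otherwise it is a stray close tag: ignore it.

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # "Look over the stacks" to decide what this raw data is.
        enclosing = self.open_tags[-1] if self.open_tags else None
        active = tuple(t for t in FORMATTING if self.styles[t])
        self.chunks.append((text, enclosing, active))


def scrape(html):
    parser = StackScraper()
    parser.feed(html)
    parser.close()                                   # flush any trailing text
    return parser.chunks
```

Because <b> and <i> live on separate stacks, overlapping markup such as
<b>bold <i>both</b> italic</i> does not confuse the structural stack:
each piece of text is simply tagged with whichever formatting stacks are
non-empty when it arrives.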
> As an aside, it occurred to me that Mozilla is probably pretty good at
> screen scraping
Not to mention lynx.
> (or whatever it's called) - so it should be possible to write
> a Firefox extension to do this that talks through a socket to Erlang.
> <somebody told me this was easy, but they obviously knew more than
> I do>
> You could then use Erlang as a coordination language controlling
> a load of firefoxes on different machines, telling them to go get
> pages and
> scrape the pages for data which they send back to Erlang.
> If we could use Firefox as a component then we could avoid reinventing
> the wheel (again).
> On 10/23/07, Kevin A. Smith wrote:
>> Possibly. My understanding was that it still required well-formed
>> documents to function. A lot of feeds feature varying amounts of
>> "well-formedness", sadly.
>> On Oct 23, 2007, at 10:01 AM, Joel Reymont wrote:
>>> On Oct 23, 2007, at 2:46 PM, Kevin A. Smith wrote:
>>>> FWIW, I tried writing a very permissive feedparser but lost
>>>> interest partially due to the ugliness of Erlang's XML parsing
>>> Running yaws_html:parse/1 on a sample RSS feed works just fine. I
>>> suspect you can't get any more permissive than that.
>> erlang-questions mailing list