[erlang-questions] Rant: I hate parsing XML with Erlang

Toby Thain toby@REDACTED
Wed Oct 24 00:25:33 CEST 2007


On 23-Oct-07, at 12:09 PM, Joe Armstrong wrote:

> I've seen some work on parsing badly formed HTML.
>
> If I remember rightly, you keep a stack of the currently open tags,
> then stacks for things like <font>, <b>, <i> tags etc. So you end up
> with several small stacks. Each new open or close tag pushes things
> onto or pops things off these stacks.
>
> When you hit raw data you pattern match over the stacks to figure out
> what to do.
>
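
A rough sketch of the stack idea described above, in Python rather than
Erlang for brevity. It assumes the standard library's html.parser as the
tokenizer; StackScraper, scrape, FORMAT_TAGS and VOID_TAGS are made-up
names for illustration, not anything from yaws or xmerl.

```python
from html.parser import HTMLParser

# Formatting tags that each get their own small stack (assumption:
# just the <font>, <b>, <i> examples mentioned in the quoted text).
FORMAT_TAGS = {"font", "b", "i"}
# Tags that never have a close tag, so they are never pushed.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class StackScraper(HTMLParser):
    """Hypothetical tolerant scraper built on several small stacks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []                         # structural tags
        self.format = {t: [] for t in FORMAT_TAGS}  # one stack per tag
        self.chunks = []                            # collected results

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in FORMAT_TAGS:
            self.format[tag].append(dict(attrs))
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in FORMAT_TAGS:
            # A stray close tag with nothing open is ignored.
            if self.format[tag]:
                self.format[tag].pop()
        elif tag in self.open_tags:
            # Pop back to the matching open tag, implicitly closing
            # whatever was left open inside it.
            while self.open_tags.pop() != tag:
                pass
        # Otherwise the tag was never opened: ignore it.

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Look at the stacks to decide what this raw data is.
        enclosing = self.open_tags[-1] if self.open_tags else None
        active = frozenset(t for t, s in self.format.items() if s)
        self.chunks.append((text, enclosing, active))


def scrape(html):
    """Return a list of (text, enclosing tag, active formatting tags)."""
    parser = StackScraper()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Because the formatting tags live on their own stacks, misnested input
such as <b>bold <i>both</b> italic</i> still gives sensible answers, and
an unclosed <p> is closed implicitly when its enclosing tag closes.
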
> As an aside, it occurred to me that Mozilla is probably pretty good at
> screen scraping

Not to mention lynx.

--Toby

> (or whatever it's called) - so it should be possible to write
> a Firefox extension to do this that talks through a socket to Erlang.
> <somebody told me this was easy, but they obviously knew more than  
> I do>
>
> You could then use Erlang as a coordination language controlling
> a load of Firefoxes on different machines, telling them to go get
> pages and scrape them for data, which they send back to Erlang.
>
> If we could use Firefox as a component then we could avoid reinventing
> the wheel (again).
>
> /Joe
>
>
> On 10/23/07, Kevin A. Smith <kevin@REDACTED> wrote:
>> Possibly. My understanding was that it still required well-formed
>> documents to function. A lot of feeds feature varying amounts of
>> "well-formedness", sadly.
>>
>> --Kevin
>> On Oct 23, 2007, at 10:01 AM, Joel Reymont wrote:
>>
>>>
>>> On Oct 23, 2007, at 2:46 PM, Kevin A. Smith wrote:
>>>
>>>> FWIW, I tried writing a very permissive feed parser but lost
>>>> interest, partly due to the ugliness of Erlang's XML parsing
>>>> APIs.
>>>
>>> Running yaws_html:parse/1 on a sample RSS feed works just fine. I
>>> suspect you can't get any more permissive than that.
>>>
>>> --
>>> http://wagerlabs.com
>>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>



