[erlang-questions] Rant: I hate parsing XML with Erlang

Zvi <>
Tue Oct 23 22:56:20 CEST 2007


Using IE on Windows or Firefox on Linux is actually the best way to implement
Web automation, i.e. a bot logging into a website/webapp, clicking links and
buttons, etc. This way not only malformed HTML but virtually any Web
technology, such as cookies, JavaScript, AJAX, Flash, plugins and Java
applets, can be supported.
The only problems with this approach:
1. It requires many more resources (i.e. it is more heavyweight than just
parsing HTML).
2. When running multiple Firefox instances on the same node, there can be
security problems.
3. In a server environment, it would have to be possible to run Firefox in
headless mode (i.e. without X).
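
To make the coordination idea from the quoted message below a bit more
concrete, here is a minimal sketch of the Erlang side only. Everything in it
is an assumption for illustration: the module name scrape_coordinator, the
function names, the 30 second timeout, and above all FetchFun, which stands in
for whatever would actually drive a browser (that part is not shown).

```erlang
-module(scrape_coordinator).
-export([scrape/2, stub_fetch/1]).

%% Hypothetical sketch: Erlang coordinating a set of browser workers.
%% FetchFun is a placeholder for the code that would talk to a real
%% browser; nothing here drives IE or Firefox.

%% scrape(Urls, FetchFun) -> [{Url, Result}], in the same order as Urls.
scrape(Urls, FetchFun) ->
    Parent = self(),
    %% One worker process per URL; each reply is tagged with a unique ref.
    Jobs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, safe_fetch(FetchFun, Url)} end),
                {Ref, Url}
            end || Url <- Urls],
    [collect(Ref, Url) || {Ref, Url} <- Jobs].

%% Wait for one worker's reply; give up after 30 seconds (arbitrary value).
collect(Ref, Url) ->
    receive
        {Ref, Result} -> {Url, Result}
    after 30000 ->
        {Url, {error, timeout}}
    end.

%% A failing fetch must not take the coordinator down.
safe_fetch(FetchFun, Url) ->
    try FetchFun(Url) of
        Data -> {ok, Data}
    catch
        Class:Reason -> {error, {Class, Reason}}
    end.

%% Stand-in for a real browser: pretends the page title is the URL itself.
stub_fetch(Url) ->
    [{title, Url}].
```

It could be tried with scrape_coordinator:scrape(Urls, fun
scrape_coordinator:stub_fetch/1). Since each browser is heavyweight (problem 1
above), a real setup would presumably cap the number of workers per node
rather than start one per URL as this sketch does.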

Zvi


Joe Armstrong-2 wrote:
> 
> The point is (or was) that firefox has code to parse virtually any kind of
> broken warped incomprehensible html - letting firefox figure out the
> "meaning" of deeply crippled and totally incomprehensible html and then
> scanning the result (the generated DOM) seems a lot easier than figuring out
> how to parse crippled HTML yourself - using other stuff as components to do
> what they are good at doesn't seem that crazy to me.
> 
> /Joe
> 
> 
> 
> On 10/23/07, Joel Reymont <> wrote:
>>
>> On Oct 23, 2007, at 4:09 PM, Joe Armstrong wrote:
>>
>> > You could then use Erlang as a coordination language controlling
>> > a load of firefoxes on different machines, telling them to go get
>> > pages and scrape the pages for data which they send back to Erlang.
>>
>> This is nuts!!! /With all due respect to Joe/
>>
>> --
>> http://wagerlabs.com

-- 
View this message in context: http://www.nabble.com/Rant%3A-I-hate-parsing-XML-with-Erlang-tf4676760.html#a13373590
Sent from the Erlang Questions mailing list archive at Nabble.com.



