[erlang-questions] Rant: I hate parsing XML with Erlang

Michael McDaniel erlangx@REDACTED
Wed Oct 24 01:00:56 CEST 2007


 On MS Windows I use classMechanizeIE.php
 (http://www.cgi-interactive-uk.com/com_functions_php_ie.html) and a
 small PHP script to grab pages by controlling MS Internet Explorer
 through its COM interface.  My Erlang program manages the various
 jobs and parses the resulting text files, sending alerts as
 needed.
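
 The arrangement above could be sketched roughly as follows.  This is
 only an illustration under stated assumptions, not the actual program:
 the module name page_watch, the script name fetch_page.php, the *.txt
 file layout and the marker text are all hypothetical.  The PHP side is
 assumed to write each fetched page to a text file, and an "alert" here
 is simply a tuple returned to the caller.

```erlang
-module(page_watch).
-export([run_job/2, scan_file/2, check_dir/2]).

%% Run one fetch job by shelling out to a (hypothetical) PHP script that
%% drives Internet Explorer and writes the page text to OutFile.
%% The quoting is naive; Url and OutFile are assumed to be trusted strings.
run_job(Url, OutFile) ->
    Cmd = "php fetch_page.php \"" ++ Url ++ "\" \"" ++ OutFile ++ "\"",
    _Output = os:cmd(Cmd),
    OutFile.

%% Return {alert, File} if File contains Marker (a non-empty binary),
%% {alert, {File, Reason}} if it cannot be read, and ok otherwise.
scan_file(File, Marker) ->
    case file:read_file(File) of
        {ok, Bin} ->
            case binary:match(Bin, Marker) of
                nomatch -> ok;
                {_Pos, _Len} -> {alert, File}
            end;
        {error, Reason} ->
            {alert, {File, Reason}}
    end.

%% Scan every *.txt file in Dir and collect only the alerts.
check_dir(Dir, Marker) ->
    Files = filelib:wildcard(filename:join(Dir, "*.txt")),
    [A || F <- Files, {alert, _} = A <- [scan_file(F, Marker)]].
```

 For example, page_watch:check_dir("out", <<"Login failed">>) would
 return one {alert, File} tuple for each text file under "out" that
 contains the marker; what to do with the alerts (mail, log, and so on)
 is left to the caller.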

 My preference would have been something built into Erlang for the
 COM control, but Comet is no longer part of the distribution
 (I do not know whether it would have been suitable for the task anyway).
 I had to use Internet Explorer as the browser because the environment
 in which I am doing this task checks for a valid login when you go
 to the page (that is, the server somehow knows whether you are logged
 in to your workstation and requires MS Internet Explorer in order to
 authenticate automatically when you go to specific pages).  Simply
 using http:request/4 or lynx or telnet would not authenticate
 properly.


~Michael


On Tue, Oct 23, 2007 at 02:58:03PM -0700, YC wrote:
> Agreed - utilizing firefox or IE will further allow you to handle javascript
> generated DOMs much more easily than having to write a javascript parser
> yourself, which will enable handling of a much larger set of pages.
> 
> But is this *easy* to do within Erlang?
> 
> On 10/23/07, Joe Armstrong <erlang@REDACTED> wrote:
> 
>     The point is (or was) that firefox has code to parse virtually any kind
>     of broken, warped, incomprehensible html - letting firefox figure out the
>     "meaning" of deeply crippled and totally incomprehensible html and then
>     scanning the result (the generated DOM) seems a lot easier than figuring
>     out how to parse crippled HTML yourself - using other stuff as components
>     to do what they are good at doesn't seem that crazy to me.
> 
>     /Joe
> 
> 
> 
>     On 10/23/07, Joel Reymont <joelr1@REDACTED> wrote:
>     >
>     > On Oct 23, 2007, at 4:09 PM, Joe Armstrong wrote:
>     >
>     > > You could then use Erlang as a coordination language controlling
>     > > a load of firefoxes on different machines, telling them to go get
>     > > pages and scrape the pages for data which they send back to Erlang.
>     >
>     > This is nuts!!! /With all due respect to Joe/
>     >
>     > --
>     > http://wagerlabs.com
>     >
>     _______________________________________________
>     erlang-questions mailing list
>     erlang-questions@REDACTED
>     http://www.erlang.org/mailman/listinfo/erlang-questions


-- 
Michael McDaniel
Portland, Oregon, USA
http://autosys.us
+1 503 283 5284


