[erlang-questions] Rant: I hate parsing XML with Erlang
Michael McDaniel
erlangx@REDACTED
Wed Oct 24 01:00:56 CEST 2007
On MS Windows I use classMechanizeIE.php
(http://www.cgi-interactive-uk.com/com_functions_php_ie.html) and a
small PHP script to grab pages by controlling MS Internet Explorer
through its COM interface. My Erlang program manages the various
jobs and parses the resulting text files, sending alerts as needed.
My preference would have been something built into Erlang for the
COM control, but Comet is no longer part of the distribution (and I
do not know whether it would have been suitable for the task anyway).
I had to use Internet Explorer as the browser because the environment
in which I am doing this task checks for a valid login when you go
to the page (that is, the server somehow knows whether you are logged
in to your workstation and requires MS Internet Explorer in order to
authenticate you automatically when you visit specific pages). Simply
using http:request/4 or lynx or telnet would not authenticate
properly.
~Michael
On Tue, Oct 23, 2007 at 02:58:03PM -0700, YC wrote:
> Agreed - utilizing firefox or IE will further allow you to handle javascript
> generated DOMs much more easily than having to write a javascript parser
> yourself, which will enable handling of a much larger set of pages.
>
> But is this *easy* to do within Erlang?
>
> On 10/23/07, Joe Armstrong <erlang@REDACTED> wrote:
>
> The point is (or was) that firefox has code to parse virtually any kind of
> broken, warped, incomprehensible html - letting firefox figure out the
> "meaning" of deeply crippled and totally incomprehensible html and then
> scanning the result (the generated DOM) seems a lot easier than figuring
> out how to parse crippled HTML yourself - using other stuff as components
> to do what they are good at doesn't seem that crazy to me.
>
> /Joe
>
>
>
> On 10/23/07, Joel Reymont <joelr1@REDACTED> wrote:
> >
> > On Oct 23, 2007, at 4:09 PM, Joe Armstrong wrote:
> >
> > > You could then use Erlang as a coordination language controlling
> > > a load of firefoxes on different machines, telling them to go get
> > > pages and
> > > scrape the pages for data which they send back to Erlang.
> >
> > This is nuts!!! /With all due respect to Joe/
> >
> > --
> > http://wagerlabs.com
> >
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
--
Michael McDaniel
Portland, Oregon, USA
http://autosys.us
+1 503 283 5284