[erlang-questions] Beginner Screen Scraping, and Auto-login/input on 3rd party web app?

Jeroen Koops koops.j@REDACTED
Wed Dec 29 17:29:22 CET 2010


Yes, mochiweb has the mochiweb_html module, which parses HTML.
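
A rough sketch of how the pieces could fit together, in case it helps. It assumes mochiweb is on the code path and that mochiweb_html:parse/1 returns nested {Tag, Attrs, Children} tuples with binary tags and text. The module name scrape_sketch, the function names, and the "price" class are made up for illustration; the real marketplace page will need its own selectors.

```erlang
-module(scrape_sketch).
-export([fetch/1, prices/1, find_by_class/2, text/1]).

%% Fetch a page body as a binary with httpc (part of inets, ships with OTP).
%% Url must be a string, e.g. "http://example.com/listing" (hypothetical).
fetch(Url) ->
    {ok, _} = application:ensure_all_started(inets),
    case httpc:request(get, {Url, []}, [], [{body_format, binary}]) of
        {ok, {{_Vsn, 200, _Reason}, _Headers, Body}} ->
            {ok, Body};
        {ok, {{_Vsn, Status, _Reason}, _Headers, _Body}} ->
            {error, {http_status, Status}};
        {error, Reason} ->
            {error, Reason}
    end.

%% Parse HTML with mochiweb_html (third-party, needs mochiweb) and return
%% the text of every element whose class attribute is exactly "price".
%% The "price" class is an assumption, not something the real site promises.
prices(Html) ->
    Tree = mochiweb_html:parse(Html),
    [text(Node) || Node <- find_by_class(Tree, <<"price">>)].

%% Walk the tree and collect, in document order, every element whose
%% class attribute equals Class. Text, comments and other nodes are skipped.
find_by_class({Tag, Attrs, Children} = Node, Class)
  when is_binary(Tag), is_list(Attrs), is_list(Children) ->
    Rest = lists:flatmap(fun(C) -> find_by_class(C, Class) end, Children),
    case proplists:get_value(<<"class">>, Attrs) of
        Class -> [Node | Rest];
        _ -> Rest
    end;
find_by_class(_Other, _Class) ->
    [].

%% Concatenate all text nodes beneath an element.
text({Tag, _Attrs, Children}) when is_binary(Tag), is_list(Children) ->
    iolist_to_binary([text(C) || C <- Children]);
text(Bin) when is_binary(Bin) ->
    Bin;
text(_Other) ->
    <<>>.
```

From the shell you would call scrape_sketch:fetch/1 with the listing URL and pass the returned body to scrape_sketch:prices/1. The tree-walking functions are plain Erlang, so they can be tried on a hand-built tuple without mochiweb or a network connection.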

On Wed, Dec 29, 2010 at 5:24 PM, Jesse Gumm <sigmastar@REDACTED> wrote:
> Hi there,
>
> I'd recommend looking into the httpc module, which is an HTTP client.
>
> As for parsing the data, you could check out the 're' module for regular expressions. It would mostly work, but HTML is not a regular language, so regexes can't parse it reliably. I don't know of an Erlang module for parsing an HTML DOM tree; perhaps mochiweb provides something.
>
> -Jesse
>
>
> --
> Jesse Gumm
> Sigma Star Systems
> 414.940.4866
> On Dec 29, 2010 10:19 AM, JETkoten <jetkoten@REDACTED> wrote:
>
> On 12/24/10 12:05 PM, Alain O'Dea wrote:
> > On 2010-12-24, at 11:08, JETkoten <jetkoten@REDACTED> wrote:
> >
> >> [...] Hi Everyone,
> >>
> >> I'm (very) new to Erlang, and hoping to get some basic experience with it.
> >>
> >> I really learn best by doing something I'm interested in. I have a "pet project" that I would like to implement now in Erlang.
> >>
> >> Here it is:
> >>
> >> I have a large personal library of books and find that I don't need many of them anymore. I'd like to create a program that will help me manage my online sales on a marketplace site, by automatically checking competing sellers' prices at a set time interval and then logging into the marketplace site and adjusting my prices according to a formula I'd set based on the other prices.
> >>
> >> I did a Google search on Erlang "screen scraping" and saw some options:
> >>
> >> www_tools, Yaws parser, xmerl, mochiweb
> >>
> >> However, none of the posts that suggest those are less than 2 years old... which is the best/easiest way, and/or are there newer, better options now?
> >>
> >> Any ideas?
> >>
> >> Thanks in advance,
> >> Jack
>
> > Hi Jack:
> >
> > [...] Gradually it probably makes sense to switch to native Erlang utilities if you find them to perform or integrate better.
> >
> > [...]
> >
> > Eventually it makes sense to use OTP and supervisors to consistently handle agent crashes. If you find yourself writing a lot of try/catch logic, then stop and refactor to OTP. Erlang and OTP in Action http://manning.com/logan is the best book for this.
> >
> > Cheers and Merry Christmas,
> > Alain
>
> Hi Alain,
>
> Thanks very much for your reply.
>
> So, I do want to try and implement this with the native Erlang
> utilities, and am not sure where to begin. I started writing a module
> and then tried to think of what kind of functions I could use to perform
> these tasks, but I don't know how to access a website to screen scrape
> with Erlang or how/where to efficiently store and retrieve the
> price/title data that I would scrape.
>
> Would Mnesia or something like Riak be good for the storage part?
>
> I also don't know how to get my program to log in to the site after
> calculating the new price from the scraped data and then change the
> price on the marketplace site...
>
> In the OTP version, would I be looking at gen_server or maybe gen_fsm to
> complete the tasks? I looked through the Erlang and OTP in Action book
> in a bookstore, but it seems too advanced for me at this point to get
> much benefit from.
>
> I'm truly a beginner here, so any concrete steps/tools anyone can offer
> would be a huge help! I've been looking through the online tutorials and
> books, but can't seem to find much about Erlang and WWW related tasks
> like these.
>
> Thanks again,
> Jack