anybody written a web cache?

Joe Armstrong erlang@REDACTED
Wed Feb 10 21:29:57 CET 2010


Hello,

Has anybody written a simple web cache?

What I'd like are:

     - own policy for time-to-live (i.e. many (most) web pages come back with
       idiotic cache-control directives that force page refreshes)

     - rate limiting (per site), i.e. I don't want to annoy sites by
       requesting too much data in a short time
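
For illustration, the two requirements above can be sketched roughly as
follows. This is a minimal in-memory sketch, not an existing library, and it
is written in Python rather than Erlang only to keep it short. The names
PoliteCache, http_get, ttl and min_interval are all hypothetical, and the
one-hour TTL and two-second per-host gap are assumed defaults, not
recommendations from the post.

```python
import time
import urllib.request
from urllib.parse import urlsplit


def http_get(url):
    """Plain fetcher; the server's cache headers are deliberately not consulted."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


class PoliteCache:
    """Cache with its own time-to-live and a minimum gap between requests per host."""

    def __init__(self, fetch=http_get, ttl=3600.0, min_interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                # callable: url -> body
        self.ttl = ttl                    # seconds an entry stays fresh (our policy)
        self.min_interval = min_interval  # seconds between requests to one host
        self.clock = clock
        self.sleep = sleep
        self.entries = {}                 # url -> (fetched_at, body)
        self.last_request = {}            # host -> time of last upstream request

    def get(self, url):
        now = self.clock()
        hit = self.entries.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                 # still fresh under our own TTL

        # Cache miss: wait if this host was contacted too recently.
        host = urlsplit(url).netloc
        last = self.last_request.get(host)
        if last is not None:
            wait = self.min_interval - (now - last)
            if wait > 0:
                self.sleep(wait)

        self.last_request[host] = self.clock()
        body = self.fetch(url)
        self.entries[url] = (self.clock(), body)
        return body
```

Usage would be along the lines of cache = PoliteCache(ttl=6 * 3600) followed
by cache.get(url). The clock and sleep parameters are only there so the
behaviour can be exercised without real waiting or network access.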

I'm interested in writing some web-aggregation/mashup software. Why?
Because I'm fed up with all the crap that most sites deliver.

<example>I've been looking at ads for apartments in Stockholm. One of the
largest sites carrying such ads is rather slow: if you click on the data for
one apartment, 134 HTTP GET requests are issued, and most are marked
uncacheable. The generated HTML is crap. The same is true for hotels - try
booking a hotel somewhere ...</example>

Seems pretty easy to fetch and cache data on apartments for sale, and hotels,
extract the *information* content and re-mesh the information.

If the individual sites rate-limit queries from my machine we could
set up a distributed net to request the data from random hosts :-)

Could be a fun project

/Joe

