Trapexit Open Source Erlang Project Crawler

Péter Szilágyi
Mon Nov 30 17:43:31 CET 2009


Hello everyone,

We are proud to announce the Trapexit project crawler: a web crawler that goes through popular open source hosting sites (GitHub, BitBucket, SourceForge and Google Code) and indexes the Erlang components and projects it finds. It tries to collect a lot of generic information about each project (name, description, owners, home page, project page, language breakdown, RSS/Atom feeds, etc.) and presents it in one big searchable index. As an extra, if a project has forks (even across multiple hosting sites), the crawler tries to identify them (and the original project) and display them accordingly on the index page. Also, to make the whole tool more useful for the community, we've implemented a rating system through which anyone with a Trapexit account can rate a component based on a multitude of criteria.
You can access the tool at the Trapexit website: 

http://projects.trapexit.org

Have a go and let us have your feedback (preferably on the Trapexit forums), and if you have a Trapexit account, rate the applications you use. Should you find an error, or an open source Erlang component that we missed, please contact us so that we can investigate. In case you're interested, the components used to create the crawler and the interface were: lhttpc (for downloading web pages), CouchDB as the storage engine for both the crawler and the website, couchbeam as the database interface, and the Nitrogen Web Framework as the front end for the website.

Hope you like it,
  Peter

PS: Sometimes the server has a hiccup and things slow down for a bit. We're still trying to figure out the reason behind this.

---
Péter Szilágyi
Erlang Training & Consulting Ltd.
http://www.erlang-consulting.com


More information about the erlang-questions mailing list