[erlang-questions] wikipedia rendering engine
Mon Jun 30 14:13:36 CEST 2008
As I am partially to blame for the noise around the wikirenderer, I will add
my two cents.
For our experiments, we used the XML dumps available at
http://download.wikimedia.org. We have a small Java program which converts
the XML dump to Erlang terms (http://www.zib.de/schuett/dumpreader.tgz). E.g.
converting the Bavarian dump:
java -jar dumpreader.jar /home/schuett/barwiki-20080225-pages-meta-history.xml
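
For readers who just want to see the idea, here is a minimal sketch of such a
conversion in Python. It is not the dumpreader code, and the term shape
{revision, Title, Timestamp, Text} is only an assumption made for this
illustration; the real tool may emit something quite different. It streams
the dump and prints one Erlang term per revision.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def erl_binary(text):
    """Render text as an Erlang UTF-8 binary literal."""
    escaped = (text or "").replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n")
    return '<<"' + escaped + '"/utf8>>'


def dump_to_terms(source):
    """Yield one Erlang term (as a string) per <revision> in the dump.

    `source` is a file name or a binary file object. The term layout is
    hypothetical, chosen only for this sketch.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            fields = {local(child.tag): child.text for child in elem}
            yield "{revision, %s, %s, %s}." % (
                erl_binary(title),
                erl_binary(fields.get("timestamp")),
                erl_binary(fields.get("text")),
            )
        elif name == "page":
            elem.clear()  # drop the finished page's children to save memory
```

Writing the yielded lines to a file should give something that
file:consult/1 can read back on the Erlang side, one term per revision.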
But you still have to parse the MediaWiki text and convert it to HTML.
For this last step we currently have two solutions:
1. Early experiments used flexbisonparse
(http://svn.wikimedia.org/viewvc/mediawiki/trunk/flexbisonparse/) to convert
the mediawiki text to XML and XSLT to convert the XML to HTML.
2. The current code is based on plog4u/bliki (see
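
To give a feeling for what this last step involves, here is a toy converter
for a very small subset of the markup (headings, bold, italics, internal
links). It is neither flexbisonparse nor bliki, just a sketch in Python; the
function names and the "/wiki/" link prefix are assumptions made for the
example, and real wiki text (templates, tables, nested lists) needs far more
than a few regular expressions.

```python
import html
import re


def render_inline(text):
    """Convert bold, italic and [[link]] markup inside one paragraph."""
    text = html.escape(text, quote=False)  # escape &, < and > only
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    def link(match):
        # [[Target|label]] or plain [[Target]]
        target = match.group(1)
        label = match.group(2) or target
        # The /wiki/ prefix is a hypothetical URL scheme for this sketch.
        return '<a href="/wiki/%s">%s</a>' % (target.replace(" ", "_"), label)

    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", link, text)


def render(wikitext):
    """Render blank-line separated blocks as headings or paragraphs."""
    out = []
    for block in re.split(r"\n\s*\n", wikitext.strip()):
        heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", block.strip())
        if heading:
            level = len(heading.group(1))
            body = render_inline(heading.group(2))
            out.append("<h%d>%s</h%d>" % (level, body, level))
        else:
            out.append("<p>%s</p>" % render_inline(" ".join(block.split())))
    return "\n".join(out)
```

Calling render() on a page's text returns an HTML fragment, one element per
block.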
On Monday 30 June 2008, Joe Armstrong wrote:
> On Mon, Jun 30, 2008 at 1:36 PM, Jan Lehnardt <> wrote:
> > On Jun 30, 2008, at 13:23, Joe Armstrong wrote:
> >> Is there a REST interface so that I can retrieve the latest version of
> >> the MediaWiki markup for a specific page with, for example,
> >> a wget command.
> > You can get bulk dumps
> > http://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_do_I_get..
> > Why would you do individual scraping? In order to keep up to date with
> > changes that happened between the last dump and now()?
> To get a few test cases to test my parser on *before* download the entire
> Also I suspect the dumps are in MySQL format with xml junk - so it might
> not be a trivial job to extract the raw data. I (presumably) will have to
> install MySQL and
> turn some XML stuff into the raw data (just guessing here) - thought
> that could be a job for a
> volunteer :-)
> > Cheers
> > Jan
> > --
> >> Has anybody made an erlang interface to scrape individual pages from
> >> the wikipedia - or to bulk convert the entire
> >> wikipedia to erlang terms :-)
> >> /Joe
> >> On Mon, Jun 30, 2008 at 11:39 AM, Joe Armstrong <> wrote:
> >>> Hi,
> >>> I was at the Erlang Exchange and heard the *magnificent* talk
> >>> "Building a transactional distributed data store with Erlang", by
> >>> Alexander Reinefeld.
> >>> I'll be blogging this as soon as I have the URL of the video of the
> >>> talk.
> >>> (in advance of this there was a talk at the Google conference on
> >>> scalability
> >>> http://video.google.com/videoplay?docid=-6526287646296437003&q=erlang+s
> >>> oh and they also seem to have won the SCALE 2008 prize at the
> >>> CCGrid conference in Lyon but there is zero publicity about this AFAICS
> >>> )
> >>> We (collectively) promised to help Alexander - I promised to provide
> >>> him with a
> >>> rendering engine (in Erlang) for the wikipedia markup language.
> >>> Before I start hacking has anybody done this before?
> >>> /Joe Armstrong
> >> --
> >> [Kopia av detta meddelande skickas till FRA för övervakningsändamål.
> >> De vill ju ändå läsa min e-post.]
> >> [A copy of this mail has been sent to
> >> FRA for monitoring purposes. FRA wants to read all my e-mail and has
> >> been allowed to do so by the Swedish parliament - in violation of article
> >> 12 of the UN Universal Declaration of Human Rights]
> >> _______________________________________________
> >> erlang-questions mailing list
> >> http://www.erlang.org/mailman/listinfo/erlang-questions