[erlang-questions] wikipedia rendering engine

Joe Armstrong erlang@REDACTED
Mon Jun 30 14:20:58 CEST 2008


Rock and roll....

Can you be more explicit than http://download.wikimedia.org? Can you
point me to a specific file that I can download that works with your
dump reader?

Thanks

/Joe


On Mon, Jun 30, 2008 at 2:13 PM, Thorsten Schuett <schuett@REDACTED> wrote:
> Hi all,
>
> as I am partially to blame for the noise around the wikirenderer, I will add
> my two cents.
>
> For our experiments, we used the XML dumps available at
> http://download.wikimedia.org. We have a small Java program which converts
> the XML dump to Erlang terms (http://www.zib.de/schuett/dumpreader.tgz). For
> example, to convert the Bavarian dump:
> java -jar dumpreader.jar /home/schuett/barwiki-20080225-pages-meta-history.xml
>
> But you still have to parse the MediaWiki text and convert it to HTML.
> For the last step we currently have two solutions:
>
> 1. Early experiments used flexbisonparse
> (http://svn.wikimedia.org/viewvc/mediawiki/trunk/flexbisonparse/) to convert
> the MediaWiki text to XML and XSLT to convert the XML to HTML.
>
> 2. The current code is based on plog4u/bliki (see
> http://matheclipse.org/en/Java_Wikipedia_API).
>
> Thorsten
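
The dump-to-Erlang-terms step described above can be sketched roughly as
follows. This is not the dumpreader program itself (which is Java and whose
output format is not shown in this thread); it is a minimal Python sketch that
assumes the usual <page>/<title>/<revision>/<text> layout of the XML dumps and
invents a hypothetical term shape, {page, Title, [RevisionText]}, with each
string written as a binary of UTF-8 bytes. The names pages, erlang_binary and
to_erlang_term, and the sample namespace URI, are illustrative only.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def erlang_binary(text):
    """Render text as an Erlang binary literal of UTF-8 bytes, e.g. <<72,105>>."""
    return "<<" + ",".join(str(b) for b in text.encode("utf-8")) + ">>"


def pages(source):
    """Yield (title, [revision texts]) for each <page> in a dump stream."""
    title, revisions = None, []
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            revisions.append(elem.text or "")
        elif name == "page":
            yield title, revisions
            title, revisions = None, []
            elem.clear()  # drop the finished page's children to limit memory use


def to_erlang_term(title, revisions):
    """Hypothetical term shape: {page, Title, [RevisionText, ...]}."""
    revs = ",".join(erlang_binary(r) for r in revisions)
    return "{page," + erlang_binary(title) + ",[" + revs + "]}."


# Tiny stand-in for a real dump file; the namespace URI is illustrative.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Hi</title>
    <revision><text>ab</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, revisions in pages(io.BytesIO(SAMPLE)):
        print(to_erlang_term(title, revisions))
```

Each printed line ends with a full stop, so a file of such lines should be
readable from Erlang with file:consult/1. Writing the strings as byte lists
is verbose but sidesteps any question of source-file encoding.
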
>
> On Monday 30 June 2008, Joe Armstrong wrote:
>> On Mon, Jun 30, 2008 at 1:36 PM, Jan Lehnardt <jan@REDACTED> wrote:
>> > On Jun 30, 2008, at 13:23, Joe Armstrong wrote:
>> >> Is there a REST interface so that I can retrieve the latest version of
>> >> the MediaWiki markup for a specific page with, for example,
>> >> a wget command?
>> >
>> > You can get bulk dumps
>> > http://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_do_I_get..
>> >.
>> >
>> > Why would you do individual scraping? In order to keep up to date with
>> > changes that happened between the last dump and now()?
>>
>> To get a few test cases to test my parser on *before* downloading the
>> entire thing.
>>
>> Also I suspect the dumps are in MySQL format with XML junk, so it might
>> not be a trivial job to extract the raw data. I (presumably) will have to
>> install MySQL and turn some XML stuff into the raw data (just guessing
>> here). I thought that could be a job for a volunteer :-)
>>
>> /Joe
>>
>> > Cheers
>> > Jan
>> > --
>> >
>> >> Has anybody made an Erlang interface to scrape individual pages from
>> >> Wikipedia, or to bulk convert the entire Wikipedia to Erlang
>> >> terms? :-)
>> >>
>> >> /Joe
>> >>
>> >> On Mon, Jun 30, 2008 at 11:39 AM, Joe Armstrong <erlang@REDACTED> wrote:
>> >>> Hi,
>> >>>
>> >>> I was at the Erlang Exchange and heard the *magnificent* talk
>> >>>
>> >>> "Building a transactional distributed data store with Erlang", by
>> >>> Alexander Reinefeld.
>> >>>
>> >>> I'll be blogging this as soon as I have the URL of the video of the
>> >>> talk.
>> >>>
>> >>> (In advance of this there was a talk at the Google conference on
>> >>> scalability:
>> >>>
>> >>> http://video.google.com/videoplay?docid=-6526287646296437003&q=erlang+scalable&ei=cZ9oSLiDNIiCiwLL9fGwCA&hl=en
>> >>>
>> >>> Oh, and they also seem to have won the SCALE 2008 prize at the
>> >>> CCGrid conference in Lyon, but there is zero publicity about this
>> >>> AFAICS.)
>> >>>
>> >>> We (collectively) promised to help Alexander - I promised to provide
>> >>> him with a
>> >>> rendering engine (in Erlang) for the wikipedia markup language.
>> >>>
>> >>> Before I start hacking, has anybody done this before?
>> >>>
>> >>> /Joe Armstrong
>> >>
>> >> --
>> >> fra@REDACTED; ingvar.akesson@REDACTED
>> >>
>> >> [Kopia av detta meddelande skickas till FRA för övervakningsändamål.
>> >> De vill ju ändå läsa min e-post.]
>> >>
>> >> [A copy of this mail has been sent to
>> >> FRA for monitoring purposes. FRA wants to read all my e-mail and has
>> >> been allowed to do so by the Swedish parliament - in violation of article
>> >> 12 of the UN Universal Declaration of Human Rights]
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://www.erlang.org/mailman/listinfo/erlang-questions
>
>
>





