[erlang-questions] Project volunteers
Joe Armstrong
erlang@REDACTED
Wed Jul 2 10:05:10 CEST 2008
I've been thinking. What I'd like to do is follow the approach
described in http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html
doing as much as possible in Erlang. This will be a good test of my
Erlang toolset. Then I'll rewrite the rendering pipeline.
Then I'd like to play with CouchDB to store the index and the derived
dataset obtained by parsing the page dumps.
The *real* Wikipedia has a complex data model described at
http://www.mediawiki.org/wiki/Manual:Database_layout
It would be very interesting to see what this looks like in a
schema-free Key->TypedTuple data store.
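
To make the Key->TypedTuple idea concrete, here is a minimal sketch of such a
store. It is an illustration under stated assumptions only: it is written in
Python rather than Erlang, it keeps everything in an in-memory dictionary
instead of CouchDB, and the "page" and "revision" tags with their fields are
invented for the example rather than taken from the MediaWiki layout.

```python
# Hypothetical sketch: a schema-free store that maps keys to tagged tuples.
# The tags ("page", "revision") and their fields are invented for
# illustration; they are not the MediaWiki schema.

store = {}


def put(key, typed_tuple):
    """Store a tuple whose first element names its type."""
    tag = typed_tuple[0]
    if not isinstance(tag, str):
        raise TypeError("first element must be a type tag")
    store[key] = typed_tuple


def get(key, expected_tag):
    """Fetch a tuple and check that it carries the expected type tag."""
    value = store[key]
    if value[0] != expected_tag:
        raise TypeError(f"{key!r} holds {value[0]!r}, not {expected_tag!r}")
    return value


# A page points at its latest revision; the revision holds the wiki text.
put(("page", "Erlang"), ("page", "Erlang", 42))
put(("revision", 42), ("revision", 42, "Erlang is a ..."))

# Following the reference replaces what would be a join in a relational schema.
_, title, rev_id = get(("page", "Erlang"), "page")
_, _, text = get(("revision", rev_id), "revision")
```

The store itself knows nothing about pages or revisions. The tag in each
tuple carries the type, so new kinds of record can be added without changing
any schema.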
This problem is interesting to me because the data volumes are large and the
content is of reasonable quality.
Cheers
/Joe Armstrong
On Mon, Jun 30, 2008 at 11:50 AM, Joe Armstrong <erlang@REDACTED> wrote:
> Hi Guys,
>
> I've been at the Erlang Exchange and have come back with a headful of ideas.
>
> I have an idea for a fun project.
>
> Make an offline stand-alone version of Wikipedia for places
> without internet access.
> Distribute to the world.
>
> I thought to use the following:
>
> - Erlang
> - CouchDB
> - MochiWeb
>
> Jobs to do:
>
> - convert Wikipedia dumps (MySQL format) to CouchDB
> - make a rendering engine to convert wiki text to HTML
> - compress data dumps to make the entire Wikipedia as small as possible
> - shoehorn into a low-power "one laptop for every child" computer
> - make a distribution package
> - release manager (set up groups)
> - write documentation
>
>
> /Joe Armstrong
>
--
fra@REDACTED; ingvar.akesson@REDACTED
[Kopia av detta meddelande skickas till FRA för övervakningsändamål.
De vill ju ändå läsa min e-post.]
[A copy of this mail has been sent to
FRA for monitoring purposes. FRA wants to read all my e-mail and has
been allowed to do so by the Swedish parliament - in violation of article
12 of the UN Universal Declaration of Human Rights]