[erlang-questions] wikipedia rendering engine
Sat Jul 5 02:14:31 CEST 2008
Joe Armstrong wrote:
>>> We (collectively) promised to help Alexander - I promised to provide him with a
>>> rendering engine (in Erlang) for the wikipedia markup language.
>>> Before I start hacking has anybody done this before?
>> What exactly do you mean by a 'rendering engine'? Translating the
>> markup language (its name is MediaWiki, by the way) to something else?
> I want a number of functions
> mediaWiki_to_rtf(bin()) -> rtf().
> rtf_to_html(rtf()) -> html().
> rtf_to_pdf(rtf()) -> pdf().
> etc., where rtf(), html(), and pdf() are abstract data types representing
> (abstracted) rich text, HTML, PDF, etc.
> The rendering engine is a wrapper around these routines to display the
> result in a browser, generate PDF, etc.
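As a rough illustration of the abstract-type idea, here is a minimal sketch of what rtf() and rtf_to_html/1 might look like. The module name, the inline()/rtf() term shapes, and the escape helper are all assumptions made for this example; nothing here is an existing library, and it returns an iolist rather than a dedicated html() type.

```erlang
-module(wiki_render_sketch).
-export([rtf_to_html/1]).
-export_type([rtf/0, inline/0]).

%% Hypothetical "abstract rich text": a list of paragraphs, each a
%% list of possibly nested inline elements.
-type inline() :: {text, binary()}
                | {bold, [inline()]}
                | {italic, [inline()]}.
-type rtf() :: [{para, [inline()]}].

%% Render the abstract rich text as an HTML iolist.
-spec rtf_to_html(rtf()) -> iolist().
rtf_to_html(Doc) ->
    [["<p>", inlines(Is), "</p>"] || {para, Is} <- Doc].

inlines(Is) ->
    [inline(I) || I <- Is].

inline({text, T})    -> escape(T);
inline({bold, Is})   -> ["<b>", inlines(Is), "</b>"];
inline({italic, Is}) -> ["<i>", inlines(Is), "</i>"].

%% Escape the three characters that are special in HTML text,
%% byte by byte (safe for UTF-8, since only ASCII bytes change).
escape(Bin) ->
    << <<(esc(C))/binary>> || <<C>> <= Bin >>.

esc($<) -> <<"&lt;">>;
esc($>) -> <<"&gt;">>;
esc($&) -> <<"&amp;">>;
esc(C)  -> <<C>>.
```

With a representation like this, rtf_to_pdf/1 would be another walk over the same terms, so the hard part stays in mediaWiki_to_rtf/1.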
In my experience it is very hard to convert the MediaWiki format to an
AST. It's pretty easy to get 80% working, but the remaining 20% is
really tough. I wasn't even able to find a renderer that was
compatible with MediaWiki; most of them use simple regexp substitutions
which work OK "most of the time". That was a year ago, though; maybe
things have changed.
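To make the regexp approach concrete, here is a minimal sketch of such a substitution-based renderer. The module and function names are hypothetical, and it handles only bold and italic, using the standard re module.

```erlang
-module(wiki_regex_sketch).
-export([to_html/1]).

%% Naive renderer: two global regexp substitutions, bold first and
%% italic second. This is the "works most of the time" approach,
%% not a real parser.
-spec to_html(binary()) -> binary().
to_html(Bin) ->
    Opts = [global, {return, binary}],
    %% '''text''' becomes <b>text</b> (lazy match, so shortest span wins)
    Bold = re:replace(Bin, <<"'''(.+?)'''">>, <<"<b>\\1</b>">>, Opts),
    %% whatever ''text'' remains becomes <i>text</i>
    re:replace(Bold, <<"''(.+?)''">>, <<"<i>\\1</i>">>, Opts).
```

On simple input such as '''bold''' and ''italic'' this gives the expected <b>bold</b> and <i>italic</i>. On the nested example quoted below, '''this is bold '' and this part italic as well ''''', it produces <b>this is bold <i> and this part italic as well </b></i>, with the tags closed in the wrong order.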
>> It's not a trivial task you have set yourself. There are some elements
>> that are quite complex, for example the fact that '' is italics and
>> ''' is bold. Notice the difference between:
>> '''this is bold'''
>> '''this is italic, starting with a ' ''
>> '''this is bold '' and this part italic as well '''''
> This is almost trivial :-)
It seems so until you get into the gory details of templates, tables, math
formulas and the like.
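Even for the apostrophe case alone, a first step might be to tokenise a line into text and apostrophe runs before deciding what is bold or italic. The sketch below assumes, from my reading of MediaWiki's behaviour, that a run of four is a literal apostrophe followed by bold and that runs longer than five keep the extra apostrophes as text; the module name and token shapes are hypothetical.

```erlang
-module(wiki_quotes_sketch).
-export([tokens/1]).

-type token() :: {text, string()} | italic | bold | bold_italic.

%% Split one line into text chunks and apostrophe-run markers.
%% Adjacent text tokens are not merged.
-spec tokens(string()) -> [token()].
tokens(Line) ->
    tokens(Line, []).

tokens([], Acc) ->
    lists:reverse(Acc);
tokens([$\' | _] = Rest, Acc) ->
    %% Take the whole run of apostrophes and classify it by length.
    {Run, Tail} = lists:splitwith(fun(C) -> C =:= $\' end, Rest),
    tokens(Tail, lists:reverse(classify(length(Run)), Acc));
tokens(Rest, Acc) ->
    %% Take everything up to the next apostrophe as plain text.
    {Text, Tail} = lists:splitwith(fun(C) -> C =/= $\' end, Rest),
    tokens(Tail, [{text, Text} | Acc]).

%% Assumed mapping from run length to markers (see the note above).
classify(1) -> [{text, "'"}];
classify(2) -> [italic];
classify(3) -> [bold];
classify(4) -> [{text, "'"}, bold];
classify(5) -> [bold_italic];
classify(N) when N > 5 -> [{text, lists:duplicate(N - 5, $\')}, bold_italic].
```

For '''this is italic, starting with a ' '' this yields one bold marker and one italic marker that do not pair up, so a second pass still has to reinterpret the bold marker as a literal apostrophe plus italic. That rebalancing step is not shown here.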
We ended up screen-scraping, with a few heuristics to handle the most
common cases. The requirements had to be relaxed significantly.
Vlad Skvortsov, vss@REDACTED, http://vss.73rus.com