[erlang-questions] wikipedia rendering engine

Vlad Skvortsov vss@REDACTED
Sat Jul 5 02:14:31 CEST 2008

Joe Armstrong wrote:
>>> We (collectively) promised to help Alexander - I promised to provide him with a
>>> rendering engine (in Erlang) for the wikipedia markup language.
>>> Before I start hacking has anybody done this before?
>> What exactly do you mean by a 'rendering engine'? Translating the
>> markup language (its name is Mediawiki, by the way) to something else?
> I want a number of functions
>      mediaWiki_to_rtf(bin()) -> rtf().
>      rtf_to_html(rtf()) -> html().
>      rtf_to_pdf(rtf()) -> pdf().
> etc. where rtf(), html(), pdf() are abstract data types representing
> (abstracted) rich text, html, and pdf() etc.
> The rendering engine is a wrapper round these routines to display the
> result in a browser or generate PDF etc.

In my experience it is very hard to convert MediaWiki format to an
AST. Well, it's pretty easy to get 80% working, but the remaining 20%
are really tough. I couldn't even find a renderer that was fully
compatible with MediaWiki; most of them use simple regexp substitutions
which work OK "most of the time". That was a year ago, though; maybe
things have changed.

>> It's not a trivial task you have set yourself. There are some elements
>> that are quite complex, for example the fact that '' is italics and
>> ''' is bold. Notice the difference between:
>> '''this is bold'''
>> '''this is italic, starting with a ' ''
>> '''this is bold '' and this part italic as well '''''
> This is almost trivial :-)

It seems so until you get into the gory details of templates, tables,
math formulas and the like.

We ended up screen-scraping, with a few heuristics to handle the most
common cases. The requirements had to be relaxed significantly.
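To make the bold/italic example quoted above concrete, here is a minimal Erlang sketch of only the first, easy step: splitting a line into text and markers for runs of apostrophes. The module `wiki_quotes` and the function `tokenize/1` are hypothetical, not part of any existing library. The run-length rules it assumes (2 = italic, 3 = bold, 4 = one literal apostrophe plus bold, 5 or more = bold-italic with any extra apostrophes kept as text) follow MediaWiki's behaviour as I understand it. It deliberately skips the rebalancing MediaWiki does when a line contains an odd number of both bold and italic markers, which is exactly what the second example needs.

```erlang
-module(wiki_quotes).
-export([tokenize/1]).

%% Hypothetical helper: a naive first pass over one line of wikitext.
%% Returns a list of {text, String} tokens and the atoms italic, bold
%% and bold_italic. It does NOT balance unmatched markers.
tokenize(Line) ->
    tokenize(Line, [], []).

%% Acc holds the current text run in reverse; Tokens is built in reverse.
tokenize([], Acc, Tokens) ->
    lists:reverse(flush(Acc, Tokens));
tokenize([$' | _] = S, Acc, Tokens) ->
    {Run, Rest} = lists:splitwith(fun(C) -> C =:= $' end, S),
    case length(Run) of
        1 ->
            %% A lone apostrophe is ordinary text.
            tokenize(Rest, [$' | Acc], Tokens);
        N ->
            {Literal, Marker} = classify(N),
            %% Leading literal apostrophes belong to the preceding text.
            tokenize(Rest, [], [Marker | flush(Literal ++ Acc, Tokens)])
    end;
tokenize([C | Rest], Acc, Tokens) ->
    tokenize(Rest, [C | Acc], Tokens).

%% Map a run of N >= 2 apostrophes to {LiteralApostrophes, Marker}
%% (assumed rules, see the lead-in).
classify(2) -> {"", italic};
classify(3) -> {"", bold};
classify(4) -> {"'", bold};
classify(N) when N >= 5 -> {lists:duplicate(N - 5, $'), bold_italic}.

%% Emit the accumulated text, if any, as a token.
flush([], Tokens) -> Tokens;
flush(Acc, Tokens) -> [{text, lists:reverse(Acc)} | Tokens].
```

For example, `wiki_quotes:tokenize("'''this is bold'''")` returns `[bold,{text,"this is bold"},bold]`. For the second example it returns a bold marker and an italic marker that never close; MediaWiki's parser instead reinterprets one `'''` as a literal apostrophe followed by `''`. That rebalancing step, not the tokenizing, is where "almost trivial" stops being true.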

Vlad Skvortsov, vss@REDACTED, http://vss.73rus.com
