[erlang-questions] Rant: I hate parsing XML with Erlang
Thomas Lindgren
thomasl_erlang@REDACTED
Tue Oct 23 16:27:19 CEST 2007
--- Joel Reymont <joelr1@REDACTED> wrote:
>
> On Oct 23, 2007, at 2:30 PM, Sean Hinde wrote:
>
> > Take a look at yaws_html.erl. That is quite a nice parser that
> > doesn't produce the same bloat as xmerl
>
> Are there any examples of using yaws_html as well as the output that
> it produces? Would be nice to include in this thread.
1> yaws_html:parse("ShowLetter.html").
{ehtml,[],
 [{html,[],
   [{head,[],
     [{title,[],"Yahoo! Mail - thomasl_erlang@REDACTED"},
      {script,[],
       "\n<!-- \n\tif(typeof top.frames[\"wmailmain\"] != \"undefined\") window.open(\"http://mail.yahoo.com\", \"_top\");\n// -->\n"},
      {link,[{rel,"stylesheet"},
             {href,"ShowLetter_files/mail_blue_all.css"},
             {type,"text/css"},
             {media,"all"}]},
      {script,[{src,"ShowLetter_files/mailcommonlib.js"}],[]},
...
and so on
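
To give an idea of how a term like this can be consumed, here is a minimal sketch that collects every value of a given attribute from the tree. The module name ehtml_walk and the function attr_values/2 are made up for illustration and are not part of yaws. The sketch assumes the tree only contains the shapes visible in the output above: {Tag, Attrs, Body}, {Tag, Attrs}, lists of those, and strings.

```erlang
-module(ehtml_walk).
-export([attr_values/2]).

%% Collect every value of attribute Key, in document order.
%% Hypothetical helper: assumes nodes look like {Tag, Attrs, Body}
%% or {Tag, Attrs}, and that a Body is a string or a list of nodes.
attr_values(Key, {_Tag, Attrs, Body}) when is_list(Attrs) ->
    [V || {K, V} <- Attrs, K =:= Key] ++ attr_values(Key, Body);
attr_values(Key, {_Tag, Attrs}) when is_list(Attrs) ->
    [V || {K, V} <- Attrs, K =:= Key];
attr_values(Key, [H | T]) ->
    %% A body list; characters of a plain string fall through to
    %% the last clause and contribute nothing.
    attr_values(Key, H) ++ attr_values(Key, T);
attr_values(_Key, _Other) ->
    [].
```

Calling ehtml_walk:attr_values(href, Tree) on the result of yaws_html:parse/1 should then return the href strings in the order they appear. The repeated ++ is fine for a page-sized tree but is not tuned for large inputs.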
However, note that yaws_html (from Yaws 1.68 in this case)
apparently isn't robust enough to handle unclosed tags,
and perhaps other malformed input. You get parse errors
instead, which might not be what you want for a real
HTML processor. Good luck.
(For extra credit, write an xmerl -> ehtml converter.)
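
A rough starting point for that exercise might look like the sketch below. It is not a tested converter: the module name xmerl_to_ehtml and its functions are invented here, and it assumes that an xmerl element maps to {Name, Attrs, Body}, that text nodes map to strings, and that comments, processing instructions and other node types can simply be dropped.

```erlang
-module(xmerl_to_ehtml).
-export([from_string/1, convert/1]).

%% Record definitions for #xmlElement{}, #xmlAttribute{}, #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string with xmerl and convert the root element.
from_string(Xml) ->
    {Root, _Rest} = xmerl_scan:string(Xml),
    convert(Root).

%% Map one xmerl element to an ehtml-style {Tag, Attrs, Body} tuple.
convert(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    {Name,
     [{A#xmlAttribute.name, A#xmlAttribute.value} || A <- Attrs],
     lists:flatmap(fun child/1, Content)}.

%% Convert a single content node to a list of zero or one body items.
child(#xmlElement{} = E)   -> [convert(E)];
child(#xmlText{value = V}) -> [V];
child(_Other)              -> [].   % assumption: drop comments, PIs, etc.
```

Whitespace-only text nodes are kept as they come out of xmerl_scan, so the body lists can be noisier than what yaws_html produces. Namespaces are ignored entirely.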
Best,
Thomas