[erlang-questions] built-in xml parser
Dale Harvey
harveyd@REDACTED
Sat Jun 27 10:21:16 CEST 2009
R13B01 has been released with a new XML SAX parser module, xmerl_sax_parser,
which is supposedly at around the same performance level as erlsom:
http://erlang.org/doc/man/xmerl_sax_parser.html
http://erlang.org/download/otp_src_R13B01.readme
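
For anyone who wants to try it, here is a minimal sketch of how the SAX
interface might be used, going by the xmerl_sax_parser man page linked above.
The module name sax_demo and the function element_names/1 are made up for
illustration; only xmerl_sax_parser:stream/2 and its event_fun/event_state
options come from OTP. It assumes a small, well-formed document passed as a
binary.

```erlang
-module(sax_demo).
-export([element_names/1]).

%% Collect the local names of all start tags, in document order.
%% The event fun receives every SAX event together with a location and
%% the accumulated state, and returns the new state.
element_names(Xml) when is_binary(Xml) ->
    EventFun =
        fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, Acc) ->
                [LocalName | Acc];
           (_OtherEvent, _Loc, Acc) ->
                %% Ignore characters, endElement, startDocument, etc.
                Acc
        end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    {ok, Names, _Rest} = xmerl_sax_parser:stream(Xml, Options),
    lists:reverse(Names).
```

Calling sax_demo:element_names(<<"<a><b/><c>text</c></a>">>) should give the
element names as strings. Unlike the regex approach quoted below, the parser
checks well-formedness, so a malformed document makes the pattern match on
{ok, ...} fail instead of returning tokens.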
2009/6/26 Praveen Ray <praveen.ray@REDACTED>
> Jeffrey Friedl will be proud of you!
> On Thu, Jun 25, 2009 at 3:14 PM, Per Melin <per.melin@REDACTED> wrote:
>
> > Joel Reymont:
> > > I used to avoid regular expressions but then the new 're' module
> > > became part of OTP.
> > >
> > > I'm now using regular expressions with abandon!
> > >
> > > Is there a chance that future versions of OTP come with a built-in XML
> > > parser based on a C library, just like 're'?
> >
> > The solution is obvious; use 're' to parse XML.
> >
> > -module(regexml).
> >
> > -export([parse/1]).
> >
> > -define(XML_RE, "[^<]+|<(?:!(?:--(?:[^-]*-(?:[^-][^-]*-)*->?)?"
> > "|\\[CDATA\\[(?:[^\\]]*](?:[^\\]]+])*]+"
> > "(?:[^\\]>][^\\]]*](?:[^\\]]+])*]+)*>)?"
> > "|DOCTYPE(?:[ \\n\\t\\r]+(?:[A-Za-z_:]"
> > "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+(?:(?:[A-Za-z_:]"
> > "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*|\"[^\"]*\""
> > "|'[^']*'))*(?:[ \\n\\t\\r]+)?"
> > "(?:\\[(?:<(?:!(?:--[^-]*-(?:[^-][^-]*-)*->"
> > "|[^-](?:[^\\]\"'><]+|\"[^\"]*\"|'[^']*')*>)"
> > "|\\?(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*(?:\\?>"
> > "|[\\n\\r\\t ][^?]*\\?+(?:[^>?][^?]*\\?+)*>))"
> > "|%(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*;|[ \\n\\t\\r]+)*](?:[ \\n\\t\\r]+)?)?>?)?)?"
> > "|\\?(?:(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*(?:\\?>"
> > "|[\\n\\r\\t ][^?]*\\?+(?:[^>?][^?]*\\?+)*>)?)?"
> > "|/(?:(?:[A-Za-z_:]|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+)?>?)?|(?:(?:[A-Za-z_:]"
> > "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+(?:[A-Za-z_:]"
> > "|[^\\x00-\\x7F])(?:[A-Za-z0-9_:.-]"
> > "|[^\\x00-\\x7F])*(?:[ \\n\\t\\r]+)?="
> > "(?:[ \\n\\t\\r]+)?(?:\"[^<\"]*\""
> > "|'[^<']*'))*(?:[ \\n\\t\\r]+)?/?>?)?)").
> >
> > parse(String) ->
> > re:run(String, ?XML_RE, [{capture, all, list}, global]).
> >
> > ---
> >
> > Adapted from http://www.cs.sfu.ca/~cameron/REX.html ("XML Shallow
> > Parsing with Regular Expressions").
> >
> > ________________________________________________________________
> > erlang-questions mailing list. See http://www.erlang.org/faq.html
> > erlang-questions (at) erlang.org
> >
> >
>
>
> --
> Yellowfish Technologies Inc
> http://www.yellowfish.biz
> praveen.ray@REDACTED
> (888) 817 2969 x 233
> gtalk/skype: praveenray
>