xmerl simple out of CDATA blocks

Anthony Molinaro anthonym@REDACTED
Thu Nov 18 01:00:35 CET 2010


Hi,

  So I noticed after some searching that while you can read CDATA blocks
with xmerl, you can't seem to write them out.  For instance:

Erlang R14B (erts-5.8.1) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.8.1  (abort with ^G)
1> I = "<HTMLResource><![CDATA[\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n]]></HTMLResource>".
"<HTMLResource><![CDATA[\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n]]></HTMLResource>"
2> {X,_} = xmerl_scan:string (I).
{{xmlElement,'HTMLResource','HTMLResource',[],
             {xmlNamespace,[],[]},
             [],1,[],
             [{xmlText,[{'HTMLResource',1}],
                       1,[],
                       "\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n",
                       cdata}],
             [],"/home/molinaro/tmp",undeclared},
 []}
3> O = lists:flatten (xmerl:export_simple_content ([X], xmerl_xml)).
"<HTMLResource>\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n</HTMLResource>"
4> I = O.
** exception error: no match of right hand side value "<HTMLResource>\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n</HTMLResource>"
5>

However, with the following patch

--- a/xmerl.erl   2010-09-29 11:13:00.000000000 -0700
+++ b/xmerl.erl   2010-11-01 11:23:54.000000000 -0700
@@ -185,6 +185,8 @@
 %%     Content = [Element]
 %%     Callback = [atom()]
 %% @doc Exports normal XML content directly, without further context.
+export_content([#xmlText{value = Text, type = cdata} | Es], Callbacks) ->
+    [ "<![CDATA[", Text, "]]>" | export_content(Es, Callbacks) ]; 
 export_content([#xmlText{value = Text} | Es], Callbacks) ->
     [apply_text_cb(Callbacks, Text) | export_content(Es, Callbacks)];
 export_content([#xmlPI{} | Es], Callbacks) ->

I get the behavior I want.

Erlang R14B (erts-5.8.1) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.8.1  (abort with ^G)
1> I = "<HTMLResource><![CDATA[\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n]]></HTMLResource>".
"<HTMLResource><![CDATA[\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n]]></HTMLResource>"
2> {X,_} = xmerl_scan:string (I).
{{xmlElement,'HTMLResource','HTMLResource',[],
             {xmlNamespace,[],[]},
             [],1,[],
             [{xmlText,[{'HTMLResource',1}],
                       1,[],
                       "\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n",
                       cdata}],
             [],"/home/molinaro",undeclared},
 []}
3> O = lists:flatten (xmerl:export_simple_content ([X], xmerl_xml)).
"<HTMLResource><![CDATA[\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n]]></HTMLResource>"
4> I = O.
"<HTMLResource><![CDATA[\n<html>\n  <body>\n    <h1>Hello World</h1>\n    <a href=\"http://www.example.com/?foo=bar&baz=bob\">Bye</a>\n  </body>\n</html>\n]]></HTMLResource>"

That said, I suspect this sort of patch would be unacceptable as is, since it
may be invalid for certain formatters (i.e., it assumes cdata is always
rendered the same way).  I don't see any way to do this with the current code,
though, as there is only a '#text#' callback, which receives the text from an
#xmlText element without the type.  This means you can't have different rules
based on the type.

So I'm basically looking for a little guidance before I start hacking a
larger patch.  What sort of behavior would be acceptable?  Maybe a new
callback '#cdata#' with a default of using '#text#'?  Or a '#text#'/2
function which takes the type?  Other ideas?  Or maybe this patch is
fine?
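
To make the '#cdata#' idea concrete, here is a minimal sketch of a callback
with a fallback to '#text#'.  It is not xmerl's actual code: the module name
cdata_export_sketch and the helpers export_text/2, apply_text/2 and
first_exporting/3 are made up for illustration, and the callback lookup is
simplified to "first module in the list that exports the function".  Only
the #xmlText record and xmerl_xml's '#text#'/1 callback come from xmerl.

```erlang
-module(cdata_export_sketch).
-export(['#cdata#'/1, export_text/2]).

-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical '#cdata#' callback: wrap the text in a CDATA section
%% without escaping it.  Note that it does not handle text that itself
%% contains "]]>", which cannot appear inside a CDATA section.
'#cdata#'(Text) ->
    ["<![CDATA[", Text, "]]>"].

%% Render one #xmlText{} node.  For cdata nodes, use the first callback
%% module that exports '#cdata#'/1; if none does, fall back to '#text#'/1
%% so existing formatters keep their current behavior.
export_text(#xmlText{value = Text, type = cdata}, Callbacks) ->
    case first_exporting(Callbacks, '#cdata#', 1) of
        {ok, M} -> M:'#cdata#'(Text);
        none    -> apply_text(Callbacks, Text)
    end;
export_text(#xmlText{value = Text}, Callbacks) ->
    apply_text(Callbacks, Text).

%% Simplified stand-in for the existing '#text#' dispatch.
apply_text(Callbacks, Text) ->
    case first_exporting(Callbacks, '#text#', 1) of
        {ok, M} -> M:'#text#'(Text);
        none    -> Text
    end.

%% Return the first module in the list that exports F/A.
first_exporting([M | Ms], F, A) ->
    _ = code:ensure_loaded(M),
    case erlang:function_exported(M, F, A) of
        true  -> {ok, M};
        false -> first_exporting(Ms, F, A)
    end;
first_exporting([], _F, _A) ->
    none.
```

With this, export_text(Node, [cdata_export_sketch, xmerl_xml]) would write
cdata nodes back out as CDATA sections, while export_text(Node, [xmerl_xml])
would behave as it does today, since xmerl_xml has no '#cdata#' callback.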

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@REDACTED>

