When will xmerl handle unicode_char()?

Roessner, Silvester silvester.roessner@REDACTED
Tue Jun 30 08:09:47 CEST 2009


Hi all,

xmerl (R13B2) still uses (only) its own Unicode management.
Is it planned that xmerl will also support unicode_char()?

I ran into the following problem because I receive a UTF-8 string from .NET via OTP.NET.



PASSES: First run with no umlaut:

	(czv_rx_bridge@REDACTED)12> f().
	ok
	(czv_rx_bridge@REDACTED)13> {ok, B} = file:read_file("V:/test.xml").
	{ok,<<"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<test>This is a test</test>">>}
	(czv_rx_bridge@REDACTED)14> U = unicode:characters_to_list(B).
	"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<test>This is a test</test>"
	(czv_rx_bridge@REDACTED)15> xmerl_scan:string(U).
	{{xmlElement,test,test,[],
	             {xmlNamespace,[],[]},
	             [],1,[],
	             [{xmlText,[{test,1}],1,[],"This is a test",text}],
	             [],"C:/Documents and Settings/visrn/workspace",undeclared},
	 []}


FAILS: Second run with a single umlaut:

	(czv_rx_bridge@REDACTED)16> f().
	ok
	(czv_rx_bridge@REDACTED)17> {ok, B} = file:read_file("V:/test.xml").
	{ok,<<"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<test>This is a test with umlaut ä</test>">>}
	(czv_rx_bridge@REDACTED)18> U = unicode:characters_to_list(B).
	"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<test>This is a test with umlaut ä</test>"
	(czv_rx_bridge@REDACTED)19> xmerl_scan:string(U).
	3265- fatal: {error,{wfc_Legal_Character,{error,{bad_character,228}}}}
	** exception exit: {fatal,
	                       {{error,
	                            {wfc_Legal_Character,{error,{bad_character,228}}}},
	                        {file,file_name_unknown},
	                        {line,2},
	                        {col,36}}}
	     in function  xmerl_scan:fatal/2
	     in call from xmerl_scan:scan_char_data/5
	     in call from xmerl_scan:scan_content/11
	     in call from xmerl_scan:scan_element/12
	     in call from xmerl_scan:scan_document/2
	     in call from xmerl_scan:string/2

PASSES: But when xmerl decodes the list itself as UTF-8 bytes, everything works:

	(czv_rx_bridge@REDACTED)20> S = binary_to_list(B).
	"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<test>This is a test with umlaut ä</test>"
	(czv_rx_bridge@REDACTED)21> xmerl_scan:string(S).
	{{xmlElement,test,test,[],
	             {xmlNamespace,[],[]},
	             [],1,[],
	             [{xmlText,[{test,1}],
	                       1,[],"This is a test with umlaut ä",text}],
	             [],"C:/Documents and Settings/visrn/workspace",undeclared},
	 []}
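
The workaround from the last run can be wrapped up as a minimal sketch. It assumes the document declares UTF-8 (or has no encoding declaration) and that xmerl_scan:string/1 expects the raw bytes rather than code points, as the runs above suggest for R13B2. The module name xmerl_utf8_sketch and the function scan_chars/1 are hypothetical and not part of xmerl.

```erlang
-module(xmerl_utf8_sketch).
-export([scan_chars/1]).

%% Hypothetical helper, not part of xmerl or OTP.
%% Takes unicode chardata (e.g. the code point list returned by
%% unicode:characters_to_list/1), re-encodes it as UTF-8 and hands
%% the resulting byte list to xmerl_scan:string/1.
scan_chars(Chars) ->
    case unicode:characters_to_binary(Chars, unicode, utf8) of
        Utf8 when is_binary(Utf8) ->
            xmerl_scan:string(binary_to_list(Utf8));
        Error ->
            %% {error, _, _} or {incomplete, _, _}: input was not valid chardata
            {error, Error}
    end.
```

With the shell session above, xmerl_utf8_sketch:scan_chars(U) should then behave like xmerl_scan:string(S), since both pass xmerl the same UTF-8 bytes.",
{Root, _} = xmerl_utf8_sketch:scan_chars(Doc),
test = element(2, Root).
</test>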