[erlang-bugs] eunit_surefire doesn't ensure proper UTF-8 encoding
Magnus Henoch
magnus@REDACTED
Wed Nov 13 13:38:59 CET 2013
Compile the following module and run eunit_xml_encoding_bug:doit() from
an Erlang shell:
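-module(eunit_xml_encoding_bug).
-compile(export_all).
-include_lib("eunit/include/eunit.hrl").

%% Run this module's tests with the Surefire XML reporter.
doit() ->
    eunit:test(?MODULE, [{report, {eunit_surefire, []}}]).

%% The test prints character 128 followed by a newline; EUnit captures
%% it and eunit_surefire copies it into the XML report.
my_test_() ->
    ?_test(io:format([128,10])).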
This creates a file called TEST-eunit_xml_encoding_bug.xml, which claims
to be in UTF-8 (its first line is '<?xml version="1.0" encoding="UTF-8" ?>')
but contains an improperly encoded character. Most XML tools refuse to
process such a file. For example, xmllint says:
$ xmllint /tmp/TEST-eunit_xml_encoding_bug.xml
/tmp/TEST-eunit_xml_encoding_bug.xml:4: parser error : Input is not proper UTF-8, indicate encoding !
And opening the file in Firefox yields:
XML Parsing Error: not well-formed
Location: file:///tmp/TEST-eunit_xml_encoding_bug.xml
Line Number 4, Column 17:
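To locate the offending byte without an XML parser, a small check like the following may help. This is a minimal sketch, not part of EUnit: the module utf8_check and its functions check_file/1 and check_binary/1 are hypothetical names. It relies only on file:read_file/1 and unicode:characters_to_binary/3, which returns an {error, ...} or {incomplete, ...} tuple carrying the cleanly decoded prefix when the input is not valid UTF-8.

```erlang
%% Hypothetical diagnostic (not part of EUnit): report the byte offset
%% of the first sequence in a file that is not valid UTF-8.
-module(utf8_check).
-export([check_file/1, check_binary/1]).

-spec check_file(file:filename()) ->
          ok | {invalid_utf8_at, non_neg_integer()} | {error, term()}.
check_file(Path) ->
    case file:read_file(Path) of
        {ok, Bin} -> check_binary(Bin);
        {error, _} = Err -> Err
    end.

-spec check_binary(binary()) -> ok | {invalid_utf8_at, non_neg_integer()}.
check_binary(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Valid when is_binary(Valid) ->
            ok;
        {_ErrorOrIncomplete, Valid, _Rest} ->
            %% Valid is the prefix that decoded cleanly; since input and
            %% output are both UTF-8, its size is the offset of the first
            %% bad byte in the original binary.
            {invalid_utf8_at, byte_size(Valid)}
    end.
```

Usage, from the shell: utf8_check:check_file("/tmp/TEST-eunit_xml_encoding_bug.xml").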
I came across this problem when running a QuickCheck property inside
EUnit. The property printed random binary data with io:format("~p"),
and sometimes that output contained high bytes which were valid
Latin-1 but invalid UTF-8.
Since eunit_surefire declares its output files to be in UTF-8, I think
it should check that the contents of <system-out> and similar elements
are properly encoded and, if they are not, do something about it, e.g.
convert from Latin-1 to UTF-8 or insert replacement characters (U+FFFD).
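Both fixes suggested above can be sketched with unicode:characters_to_binary/3 from stdlib. This is a minimal sketch under the assumption that the captured output is available as a binary before it is written into <system-out>; the module utf8_sanitize and the functions ensure_utf8/1 and replace_invalid/1 are hypothetical and do not exist in eunit_surefire.

```erlang
%% Hypothetical helpers (not part of eunit_surefire): make captured
%% output safe to embed in a document that declares encoding="UTF-8".
-module(utf8_sanitize).
-export([ensure_utf8/1, replace_invalid/1]).

%% Option 1: if Bin is already valid UTF-8, return it unchanged.
%% Otherwise assume the whole binary is Latin-1 (every byte is a valid
%% Latin-1 character) and transcode it to UTF-8.
-spec ensure_utf8(binary()) -> binary().
ensure_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Utf8 when is_binary(Utf8) ->
            Utf8;
        {_ErrorOrIncomplete, _Valid, _Rest} ->
            unicode:characters_to_binary(Bin, latin1, utf8)
    end.

%% Option 2: keep valid UTF-8 sequences and replace each offending byte
%% with U+FFFD REPLACEMENT CHARACTER.
-spec replace_invalid(binary()) -> binary().
replace_invalid(Bin) when is_binary(Bin) ->
    replace_invalid(Bin, []).

replace_invalid(Bin, Acc) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Good when is_binary(Good) ->
            iolist_to_binary(lists:reverse([Good | Acc]));
        {_ErrorOrIncomplete, Good, <<_Bad, Rest/binary>>} ->
            %% Drop one bad byte, emit U+FFFD, and resume decoding.
            replace_invalid(Rest, [<<16#FFFD/utf8>>, Good | Acc])
    end.
```

The Latin-1 fallback is lossless for output like the QuickCheck case, where the high bytes really were Latin-1, but it re-encodes the whole binary even if parts of it were already valid UTF-8. The replacement approach preserves valid sequences at the cost of discarding the bad bytes.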
Regards,
Magnus