[erlang-bugs] eunit_surefire doesn't ensure proper UTF-8 encoding

Magnus Henoch magnus@REDACTED
Wed Nov 13 13:38:59 CET 2013


Compile the following module and run eunit_xml_encoding_bug:doit() from
an Erlang shell:

-module(eunit_xml_encoding_bug).

-compile(export_all).

-include_lib("eunit/include/eunit.hrl").

doit() ->
    eunit:test(?MODULE, [{report, {eunit_surefire,[]}}]).

my_test_() ->
    ?_test(io:format([128,10])).

This creates a file called TEST-eunit_xml_encoding_bug.xml which claims
to be in UTF-8 (its first line is '<?xml version="1.0" encoding="UTF-8" ?>')
but contains an improperly encoded character: the byte 128 from the test
output, which is not valid UTF-8 on its own.  Most XML tools will refuse
to do anything with such an XML file.  For example, xmllint says:

$ xmllint /tmp/TEST-eunit_xml_encoding_bug.xml 
/tmp/TEST-eunit_xml_encoding_bug.xml:4: parser error : Input is not proper UTF-8, indicate encoding !

And opening the file in Firefox yields:

XML Parsing Error: not well-formed
Location: file:///tmp/TEST-eunit_xml_encoding_bug.xml
Line Number 4, Column 17:

I came across this problem when running a QuickCheck property inside
EUnit.  The QuickCheck property would output random binary data with
io:format("~p"), and sometimes that data would contain bytes above 127
that were valid Latin-1 but invalid UTF-8.

As eunit_surefire declares its output files to be in UTF-8 encoding, I
think it should check that the contents of <system-out> etc. are properly
encoded, and if they are not, fix them, e.g. by converting from Latin-1
to UTF-8 or by inserting replacement characters (U+FFFD).
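A minimal sketch of the Latin-1 fallback, assuming the captured output is available as a binary before it is written into <system-out>. The module utf8_sanitize and the function sanitize/1 are hypothetical names for illustration, not part of eunit_surefire; only unicode:characters_to_binary/3 is a real OTP call.

```erlang
-module(utf8_sanitize).
-export([sanitize/1]).

%% Hypothetical helper (not part of eunit_surefire): return Bin as valid
%% UTF-8. Runs that already decode as UTF-8 are kept unchanged; each byte
%% that does not is reinterpreted as Latin-1 and re-encoded as UTF-8.
-spec sanitize(binary()) -> binary().
sanitize(Bin) when is_binary(Bin) ->
    sanitize(Bin, []).

%% Acc holds the already-converted chunks in reverse order.
sanitize(Bin, Acc) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Good when is_binary(Good) ->
            %% The remaining input is valid UTF-8: join the chunks and stop.
            iolist_to_binary(lists:reverse(Acc, [Good]));
        {Tag, Good, Rest} when Tag =:= error; Tag =:= incomplete ->
            %% Good is the valid prefix; Rest starts at the first byte that
            %% is not (or not yet) a complete UTF-8 sequence.
            <<Byte, Tail/binary>> = iolist_to_binary(Rest),
            %% Treat that single byte as Latin-1 and re-encode it.
            Fixed = unicode:characters_to_binary([Byte], latin1, utf8),
            sanitize(Tail, [Fixed, Good | Acc])
    end.
```

With the test case above, sanitize(<<128,10>>) returns <<194,128,10>> (U+0080 followed by a newline), while input that is already valid UTF-8 passes through untouched. Falling back byte by byte, rather than reinterpreting the whole output as Latin-1, keeps correctly encoded characters elsewhere in the same output intact. For the U+FFFD variant, Fixed would instead be <<16#FFFD/utf8>>.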

Regards,
Magnus
