[erlang-questions] Character encodings and lager

Roger Lipscombe
Mon Aug 3 17:12:31 CEST 2015


I'm tracking down a crash in one of our custom lager backends. The
relevant piece of code is, cut down, something like the following:

    Message = lager_msg:message(Msg),
    JSON = mochijson2:encode({struct, [{"msg", list_to_binary(Message)}]}).

When I call it with the following...

    lager:log(info, self(), "~p", [<<178, 179>>]).

...it crashes with an exception: {ucs,{bad_utf8_character_code}}

Now, I know that's not valid UTF-8: the two bytes are superscript-2
and superscript-3, as encoded in Latin-1.
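For illustration, the difference shows up directly with the standard
unicode module. This is only a minimal sketch; the module name
latin1_demo and its function names are made up for the example.

```erlang
-module(latin1_demo).
-export([as_utf8/0, from_latin1/0]).

%% Interpret the two bytes as UTF-8 input. This fails, because
%% 178 (16#B2) is a continuation byte and cannot start a sequence.
%% Returns {error, <<>>, <<178,179>>}.
as_utf8() ->
    unicode:characters_to_binary(<<178, 179>>, utf8, utf8).

%% Interpret the same bytes as Latin-1 and re-encode them as UTF-8.
%% Returns <<194,178,194,179>>.
from_latin1() ->
    unicode:characters_to_binary(<<178, 179>>, latin1, utf8).
```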

Cutting this down further, I get:

    Message = [60,60,178,179,62,62].
    mochijson2:encode({struct, [{"msg", list_to_binary(Message)}]}).
    ** exception exit: {ucs,{bad_utf8_character_code}}

So, my question would -- usually -- be: "how do I convert the Latin-1
string to UTF-8?".

However, the binary isn't supposed to contain anything outside the
32-127 ASCII range. In fact, it should be an uppercase hexadecimal
string: [A-F0-9] in ASCII.
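The constraint above can be expressed as a small check, with a fallback
that hex-dumps anything unexpected so that only ASCII ever reaches the
JSON encoder. This is a sketch under assumptions, not what the backend
actually does: the module name hex_guard, the function names and the
"raw:" prefix are all hypothetical.

```erlang
-module(hex_guard).
-export([is_upper_hex/1, printable/1]).

%% True if every byte is one of 0-9 or A-F (an empty binary passes).
is_upper_hex(Bin) when is_binary(Bin) ->
    lists:all(fun(C) ->
                  (C >= $0 andalso C =< $9) orelse
                  (C >= $A andalso C =< $F)
              end, binary_to_list(Bin)).

%% Return the binary unchanged when it is valid uppercase hex;
%% otherwise return "raw:" followed by a hex dump of its bytes.
%% The prefix is an arbitrary marker chosen for this example.
printable(Bin) when is_binary(Bin) ->
    case is_upper_hex(Bin) of
        true  -> Bin;
        false ->
            Hex = << <<(hex_digit(B bsr 4)), (hex_digit(B band 15))>>
                     || <<B>> <= Bin >>,
            <<"raw:", Hex/binary>>
    end.

%% Map a nibble (0..15) to its uppercase ASCII hex digit.
hex_digit(N) when N < 10 -> $0 + N;
hex_digit(N) -> $A + N - 10.
```

With this, hex_guard:printable(<<178, 179>>) gives <<"raw:B2B3">>, which
also keeps the corrupt bytes visible in the log for later diagnosis.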

Note: In the original crash, the string was sent from an embedded
device, and it appears to have garbage in it because of some kind of
corruption in configuration NVRAM.

So, I have an actual *binary*, which usually only contains valid hex
characters (in ASCII), but occasionally has bytes outside this range.
How do I get that into mochijson2, via lager, without anything
crashing?

I tried the following:

    mochijson2:encode({struct, [{"msg",
        unicode:characters_to_binary(Message)}]}).

...which works, but am I going to get burnt if I start using UTF-8 in
my logging once we move to Erlang 17 or 18?
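One caveat with the call above: unicode:characters_to_binary/1 does not
raise on bad input, it returns an {error, ...} or {incomplete, ...}
tuple, for example when the message is an iolist containing a binary
that is not valid UTF-8. A defensive wrapper might look like the
following sketch. The module name safe_utf8 and the Latin-1 fallback
policy are assumptions for illustration, not something lager or
mochijson2 provides.

```erlang
-module(safe_utf8).
-export([to_utf8/1]).

%% Chars is chardata: a list of code points, a binary, or a mix.
%% Input that is already valid UTF-8 passes through unchanged.
to_utf8(Chars) ->
    case unicode:characters_to_binary(Chars) of
        Bin when is_binary(Bin) ->
            Bin;
        _ErrorOrIncomplete ->
            %% Some embedded binary was not valid UTF-8. Flatten to
            %% raw bytes and reinterpret every byte as Latin-1.
            %% Limitation: iolist_to_binary/1 raises badarg if the
            %% list also contains integers above 255.
            unicode:characters_to_binary(iolist_to_binary(Chars),
                                         latin1, utf8)
    end.
```

Note that integers in a list are always treated as code points, so a
list holding UTF-8 *bytes* (rather than a binary) would end up
double-encoded. Keeping real UTF-8 text in binaries avoids that.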

How do others deal with this kind of thing in Erlang?

Regards,
Roger.

