[erlang-questions] Is this proper way to convert a latin string to utf8 string

Raimo Niskanen raimo+erlang-questions@REDACTED
Mon Aug 3 13:30:22 CEST 2015


On Mon, Jul 27, 2015 at 03:30:26AM +0800, 王昊 wrote:
> I think my question was wrong. It is not a latin1-encoded string. It is actually a UTF-8 encoded string, but Erlang reads it as [232,191,153], i.e. a list of bytes that looks like a latin1 charlist.
> 
> A simple list_to_binary already gives me back the UTF-8 binary. It's just that I need a list to accommodate other parts of the program.
> 
> I am using a web framework (Chicagoboss). I post the data to Erlang from a web form as a UTF-8 encoded string in Chinese. Erlang reads it as [232,191,153], although this is just one single Chinese character. I want to consume it later via ajax on the client side.
> 
> But this piece of information is part of a long blob of JSON data, which has to be converted to a binary before it is sent down the wire. So, for it to be converted correctly as one part of the whole assembled JSON, it first needs to be turned into a utf8 list (a list of Unicode code points) like this:
> 
> asn1rt:utf8_binary_to_list(list_to_binary([232,191,153])),

1> unicode:characters_to_list(list_to_binary([232,191,153])).
[36825]

/ Raimo
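
To spell out the one-liner above: the list [232,191,153] holds the three UTF-8 bytes of a single character, so it has to be decoded as UTF-8 rather than re-encoded from latin1. Below is a minimal sketch, not part of the original thread. The module name utf8_bytes and both function names are made up for illustration; the only library calls used are list_to_binary/1, unicode:characters_to_list/2 and unicode:characters_to_binary/3.

```erlang
-module(utf8_bytes).
-export([bytes_to_codepoints/1, codepoints_to_utf8/1]).

%% Hypothetical helper: take a list of UTF-8 bytes (each element 0..255)
%% and decode it into a list of Unicode code points.
%% list_to_binary/1 raises badarg if an element is larger than 255.
bytes_to_codepoints(Bytes) when is_list(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes), utf8) of
        Chars when is_list(Chars) ->
            {ok, Chars};
        {error, _Decoded, _Rest} = Error ->
            %% The bytes are not valid UTF-8.
            Error;
        {incomplete, _Decoded, _Rest} = Incomplete ->
            %% The input ends in the middle of a multi-byte character.
            Incomplete
    end.

%% Hypothetical helper: encode a list of code points (for example the
%% assembled JSON text) as a UTF-8 binary before sending it on the wire.
codepoints_to_utf8(Chars) ->
    unicode:characters_to_binary(Chars, unicode, utf8).
```

In the shell, utf8_bytes:bytes_to_codepoints([232,191,153]) gives {ok,[36825]}, and utf8_bytes:codepoints_to_utf8([36825]) gives back <<232,191,153>>. Treating the same list as latin1 instead, as in the unicode:characters_to_binary(..., latin1, utf8) call quoted below, encodes each byte as its own character and yields six bytes.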



> 
> This gives me [36825], which represents the same Chinese character as <<232,191,153>>. You can test this with
> io:format("~ts~n",[[36825]]). and io:format("~ts~n",[<<232,191,153>>]). Both output the same character: 这
> 
> Then later, asn1rt:utf8_list_to_binary converts all the JSON data together to a binary.
> 
> 
> 
> 
> --
> Hao
> 
>   
> 
> At 2015-07-27 00:14:16, "Jesper Louis Andersen" <jesper.louis.andersen@REDACTED> wrote:
> 
> 
> 
> On Sun, Jul 26, 2015 at 3:11 PM, 王昊 <jusfeel@REDACTED> wrote:
> 
> Hi,
> Does anyone know if this is the proper way to convert a latin string to a utf-8 string?
> 
> 
> {ok, S} = asn1rt:utf8_binary_to_list(list_to_binary([232,191,153])).
> io:format("~ts~n",[S]).
> 
> Use the `unicode` module for character conversion:
> 
> 
> 1> unicode:characters_to_binary([232,191,153], latin1, utf8).
> <<195,168,194,191,194,153>>
> 2> io:format("~ts~n", [v(1)]).
> 
> 
> It prints as three characters:
> 
> 
> LATIN SMALL LETTER E WITH GRAVE
> INVERTED QUESTION MARK
> (unbound 0x0099 part of the Latin-1 supplement range)
> 
> 
> I don't know if this is correct for you.
> 
> 
> What are you trying to do generally? That is, what is the problem you are having? Perhaps we can give better help if we know your situation.
> 
> 
> --
> 
> J.



-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB


