[erlang-questions] Is this proper way to convert a latin string to utf8 string

王昊 jusfeel@REDACTED
Sun Jul 26 21:30:26 CEST 2015


I think my question was wrong. It is not a Latin-1 encoded string. It is actually a UTF-8 encoded string, but Erlang reads it as [232,191,153], a list of bytes that looks like a Latin-1 charlist.

A simple list_to_binary already gives me back the UTF-8 binary. It's just that I need a list to accommodate another part of the program.

I am using a web framework (Chicago Boss). I post Chinese text from a web form into Erlang as a UTF-8 encoded string. Erlang reads it as [232,191,153], even though this is just one single Chinese character. Later on I want to consume it via ajax on the client side.

But this piece of information sits inside a long blob of JSON data, which needs to be converted to a binary before it is sent down the wire. So for this piece to be converted correctly as part of the whole assembled JSON, it first needs to be turned into a list of Unicode code points, like this:

asn1rt:utf8_binary_to_list(list_to_binary([232,191,153])),

This gives me {ok,[36825]}, and [36825] represents the same Chinese character as <<232,191,153>>. You can test this with
io:format("~ts~n",[[36825]]). and io:format("~ts~n",[<<232,191,153>>]). Both output the same character: 这

Then later, asn1rt:utf8_list_to_binary converts all the JSON data together to a binary.
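
The same round trip can probably also be written with the standard `unicode` module instead of `asn1rt`. The following is only a minimal sketch: the module name `utf8_demo` and both function names are made up for illustration, and it assumes the incoming list really holds valid UTF-8 bytes (on bad input `unicode:characters_to_list/2` returns an `{error, ...}` or `{incomplete, ...}` tuple instead of a list).

```erlang
-module(utf8_demo).
-export([bytes_to_codepoints/1, codepoints_to_utf8/1]).

%% Hypothetical helper: reinterpret a list of UTF-8 bytes (as received
%% from the web form) as a list of Unicode code points.
%% Assumes Bytes is valid UTF-8; otherwise an error tuple is returned.
bytes_to_codepoints(Bytes) when is_list(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

%% Hypothetical helper: encode a list of code points as a UTF-8 binary,
%% e.g. right before the assembled JSON is sent down the wire.
codepoints_to_utf8(CodePoints) ->
    unicode:characters_to_binary(CodePoints, unicode, utf8).
```

With the bytes from this message, `utf8_demo:bytes_to_codepoints([232,191,153])` gives `[36825]`, and `utf8_demo:codepoints_to_utf8([36825])` gives back `<<232,191,153>>`.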




--
Hao

  

At 2015-07-27 00:14:16, "Jesper Louis Andersen" <jesper.louis.andersen@REDACTED> wrote:



On Sun, Jul 26, 2015 at 3:11 PM, 王昊 <jusfeel@REDACTED> wrote:

Hi,
Does anyone know if this is a proper way to convert a Latin-1 string to a UTF-8 string?


{ok, S} = asn1rt:utf8_binary_to_list(list_to_binary([232,191,153])).
io:format("~ts~n",[S]).

Use the `unicode` module for character conversion:


1> unicode:characters_to_binary([232,191,153], latin1, utf8).
<<195,168,194,191,194,153>>
2> io:format("~ts~n", [v(1)]).


It prints as three characters:


LATIN SMALL LETTER E WITH GRAVE
INVERTED QUESTION MARK
(unbound 0x0099 part of the Latin-1 supplement range)


I don't know if this is correct for you.
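
A small sketch of why the result above has six bytes: with `latin1` as the input encoding, each of the three input bytes is treated as a code point of its own (232, 191, 153), and every code point above 127 takes two bytes in UTF-8. The module and function names here are hypothetical and only serve the illustration.

```erlang
-module(latin1_demo).
-export([as_latin1/1, utf8_size/1]).

%% Treat every element of Bytes as one Latin-1 character and
%% re-encode the resulting characters as UTF-8.
as_latin1(Bytes) when is_list(Bytes) ->
    unicode:characters_to_binary(Bytes, latin1, utf8).

%% Number of bytes in the UTF-8 encoding produced by as_latin1/1.
utf8_size(Bytes) ->
    byte_size(as_latin1(Bytes)).
```

For `[232,191,153]` this yields `<<195,168,194,191,194,153>>`, three characters of two bytes each, which is a different result from decoding the same three bytes as one UTF-8 character.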


What are you trying to do in general? That is, what is the problem you are having? Perhaps we can give better help if we know your situation.


--

J.