[erlang-questions] cookbook entry #1 - unicode/UTF-8 strings

Francesco Cesarini francesco@REDACTED
Thu Oct 20 10:42:56 CEST 2011


Joe,

it would be great if what has already been done on an earlier version of 
the Erlang cookbook could be reviewed and integrated:

http://trapexit.com/Category:CookBook

It has 89 articles in 9 categories.

Rgds,
Francesco



On 19/10/2011 11:14, Joe Armstrong wrote:
> cookbook # 1 - draft 1
>
> <aside>
>   We're going to write a cookbook.
>
>   This will be free in electronic form (PDF, EPUB),
>   and you will be able to buy a paper version (print on demand, POD).
>
>   The development model is
>
>    - a few authors
>    - many reviewers (you are the reviewers)
>      the reviewers report errors/suggest changes
>      the authors make the changes
>
>   We hope the POD version will generate some income, which
>   will be split according to the contributions. Authors
>   will be paid, as will reviewers whose suggestions are incorporated.
>
>   Payment (if we make a profit) will be in direct relation to the size
>   of the contribution.
>
>   Expensive things like professional proofreading will be paid for
>   by sponsorship, crowdsourcing, or some other financing.
>
>   To start the ball rolling I have some text below.
>
>   Please comment on this text. If your comments are accepted, one day you
>   might get paid :-)
>
>   Note: 1) By commenting you are implicitly agreeing that if your comments
>   are accepted into the final text then you will be subject to the
>   licensing conditions of that text. The text will always be free and
>   open source.
>
> </aside>
>
> Cookbook Question:
>
> I have often seen the words "UTF-8 string" used in sentences like
> "Java has UTF-8 strings". What does this mean when applied to Erlang?
>
> ----------------------------------------------------------------------
>
> Answer:
>
> In Erlang, strings are syntactic sugar for "lists of integers".
>
> Imagine the string "10(Euro)" - (Euro) is the glyph representing the
> Euro currency symbol.
>
> The term "UTF-8 string" representing "10(Euro)" in Erlang could
> mean one of two things:
>
>     Either a) [49,48,8364]           (i.e. it's a list of three Unicode
>                                       code points)
>     Or     b) [49,48,226,130,172]    (i.e. it's the UTF-8 encoding of the
>                                       Unicode characters)
>
> So the words "UTF-8 string" might mean a) or might mean b).
>
> Erlang folks have always said "unicode/UTF-8 is easy in Erlang
> since strings are just lists of integers" - by this we mean that
> Erlang programs should always manipulate strings given the type a)
> interpretation. *All* library functions assume the type a) encoding.
>
> The type b) interpretation only has meaning when you write data to a
> file etc. and should be as invisible to the user as possible (but when
> things go wrong and you get the wrong character printed you need to
> understand the difference)
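
A small sketch of why type a) is the one to manipulate. It can be pasted into the erl shell; the variable names are illustrative and not part of the draft. List functions such as length/1 and lists:reverse/1 give sensible answers on code points, but they see the UTF-8 bytes of type b) as five unrelated characters.

```erlang
%% Type a): one integer per Unicode code point.
CodePoints = [49,48,8364].
%% Type b): the UTF-8 bytes of the same text.
Utf8Bytes = [49,48,226,130,172].

%% length/1 counts list elements, so only a) gives the character count.
3 = length(CodePoints).
5 = length(Utf8Bytes).

%% Reversing a) reverses the characters.
[8364,48,49] = lists:reverse(CodePoints).

%% Reversing b) scrambles the three bytes of the Euro sign, so the
%% result is no longer valid UTF-8 and decoding reports an error.
Broken = list_to_binary(lists:reverse(Utf8Bytes)).
{error, _, _} = unicode:characters_to_list(Broken, utf8).
```

Each pattern match doubles as a check: if any expectation were wrong, the shell would report a badmatch.
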
>
> Question 1) How can we get a Unicode character into a list?
>              In other words, what does a string literal look like?
>
>     >  X = "10\x{20ac}".
>     [49,48,8364]
>
>     This is not described in my book since the change came after the
>     book was published (is it in the other Erlang books yet?)	
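
For reference, a short sketch of equivalent spellings for the same value, assuming a shell that accepts the \x{...} escape as in the example above. The braces hold the code point in hexadecimal, and the $ notation gives the code point of a single character.

```erlang
%% \x{...} takes the code point in hexadecimal; 16#20AC is 8364.
8364 = 16#20AC.
%% $ followed by a character (or an escape) yields its code point.
8364 = $\x{20ac}.

%% All three spellings build the same three-element list.
X = "10\x{20ac}".
X = [49,48,8364].
X = [$1, $0, 16#20AC].
```

Because X is already bound after the first line, the last two lines only succeed if the lists are identical.
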
>
> Question 2) How can we convert between representations a) and b) above?
>
>     Easy - though one has to dig in the documentation a bit.
>
>     >  B = unicode:characters_to_binary(X, unicode, utf8).
>     <<49,48,226,130,172>>
>     >  unicode:characters_to_list(B).
>     [49,48,8364]
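
One thing worth knowing when things go wrong: the conversion functions in the unicode module do not always return a list or a binary. A sketch for the erl shell follows; the variable names are illustrative. On truncated or invalid input the result is a tagged tuple holding what was decoded so far and the bytes that were left over.

```erlang
%% The full UTF-8 encoding of "10" followed by the Euro sign.
Whole = <<49,48,226,130,172>>.
[49,48,8364] = unicode:characters_to_list(Whole, utf8).

%% Drop the last byte, as if a read stopped in the middle of a character.
Cut = binary:part(Whole, 0, 4).
{incomplete, [49,48], <<226,130>>} = unicode:characters_to_list(Cut, utf8).

%% A byte that cannot start a UTF-8 sequence gives an error tuple instead.
{error, [49,48], <<255>>} = unicode:characters_to_list(<<49,48,255>>, utf8).
```

Code that reads data in chunks can keep the leftover bytes from an incomplete result and prepend them to the next chunk.
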
>
> Question 3) Can I write "10(Euro)" in an editor which supports
> Unicode/UTF-8, and does the Erlang tool chain support this?
>
> Will "erlc foo.erl" automatically detect that foo.erl is Unicode
> encoded and do the right thing when scanning and tokenising strings?
>
>     Answer: I don't know.
>
> Question 4)  Can string literals be improved on?
>
> I hope so -- In HTML I can say (I hope) &euro;
>
> I'd like to say:
>
>        X = "10&euro;" in Erlang
>
>        People who know far more about this than I do can tell me if this
>        is OK.
>
>
> ----------------------------------------------------------------------

-- 
Erlang Solutions Ltd.
http://www.erlang-solutions.com



