[erlang-questions] cookbook entry #1 - unicode/UTF-8 strings

Joe Armstrong erlang@REDACTED
Thu Oct 20 10:49:10 CEST 2011


On Thu, Oct 20, 2011 at 10:42 AM, Francesco Cesarini
<francesco@REDACTED> wrote:
> Joe,
>
> it would be great if what has already been done on an earlier version of the
> Erlang cookbook could be reviewed and integrated:

Absolutely - we won't start from scratch - and we want some kind of
commenting system like
http://book.realworldhaskell.org/read/

I'm digging around to see how this can be achieved.

/Joe



> http://trapexit.com/Category:CookBook
>
> It has 89 articles in 9 categories.
>
> Rgds,
> Francesco
>
>
>
> On 19/10/2011 11:14, Joe Armstrong wrote:
>>
>> cookbook # 1 - draft 1
>>
>> <aside>
>>  We're going to write a cookbook.
>>
>>  This will be free in an electronic version (PDF, epub),
>>  and you will be able to buy a paper version (POD).
>>
>>  The development model is
>>
>>   - a few authors
>>   - many reviewers (you are the reviewers)
>>     the reviewers report errors/suggest changes
>>     the authors make the changes
>>
>>  We hope the POD version will generate some income;
>>  this will be split according to the contributions. Authors
>>  will be paid, as will reviewers whose suggestions are incorporated.
>>
>>  Payment (if we make a profit) will be in direct relation to the size
>> of the contribution.
>>
>>  Expensive things like professional proofreading will be paid for by
>>  sponsorship, crowdsourcing, or some other form of financing.
>>
>>  To start the ball rolling I have some text below.
>>
>>  Please comment on this text. If your comments are accepted one day you
>> might get paid :-)
>>
>>  Note: 1) By commenting you are implicitly agreeing that if your comments
>> are accepted into the final text then you will be subject to the
>> licensing conditions of that text. The text will always be free and
>> open source.
>>
>> </aside>
>>
>> Cookbook Question:
>>
>> I have often seen the words "UTF-8 string" used in sentences like
>> "Java has UTF-8 strings". What does this mean when applied to Erlang?
>>
>> ----------------------------------------------------------------------
>>
>> Answer:
>>
>> In Erlang, strings are syntactic sugar for "lists of integers".
>>
>> Imagine the string "10(Euro)", where (Euro) is the glyph representing the
>> Euro currency symbol.
>>
>> The term "UTF-8 string" representing "10(Euro)" in Erlang could
>> mean one of two things:
>>
>>    Either a) [49,48,8364]           (i.e. it's a list of three Unicode
>>                                      code points)
>>    Or     b) [49,48,226,130,172]    (i.e. it's the UTF-8 encoding of the
>>                                      Unicode characters)
>>
>> So the words "UTF-8 string" might mean a) or might mean b).
>>
>> Erlang folks have always said "unicode/UTF-8 is easy in Erlang
>> since strings are just lists of integers". By this we mean that
>> Erlang programs should always manipulate strings using the type a)
>> interpretation. *All* library functions assume the type a) encoding.
>>
>> The type b) interpretation only has meaning when you write data to a
>> file etc. and should be as invisible to the user as possible. (But when
>> things go wrong and the wrong character gets printed, you need to
>> understand the difference.)
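
The two interpretations can be put side by side in a small module. This is
only a sketch: the module name euro_repr and its function names are made up
for illustration and are not part of the original post. It builds the
type a) list and derives the type b) bytes from it with
unicode:characters_to_binary/1.

```erlang
-module(euro_repr).
-export([code_points/0, utf8_bytes/0, lengths/0]).

%% Interpretation a): one integer per Unicode code point.
code_points() ->
    [49, 48, 8364].

%% Interpretation b): the UTF-8 encoding of a), one integer per byte.
utf8_bytes() ->
    binary_to_list(unicode:characters_to_binary(code_points())).

%% Three characters, but five bytes once encoded.
lengths() ->
    {length(code_points()), length(utf8_bytes())}.
```

Calling euro_repr:lengths() makes the practical difference visible: length/1
counts characters on a) but bytes on b).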
>>
>> Question 1) How can we get a Unicode character into a list?
>>             Or: what does a string literal look like?
>>
>>    >  X = "10\x{20ac}".
>>    [49,48,8364]
>>
>>    This is not described in my book since the change came after the
>>    book was published (is it in the other Erlang books yet?)
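
A short sketch of equivalent spellings of that literal, assuming a
hypothetical module name euro_literal (not from the original post). It only
relies on the \x{...} escape shown above, on $-character literals and on
base-16 integer syntax.

```erlang
-module(euro_literal).
-export([forms/0, all_equal/0]).

%% Three spellings of the same three-character string.
forms() ->
    ["10\x{20ac}",            % hex escape inside a string literal
     [$1, $0, 16#20AC],       % character literals plus a hex integer
     [49, 48, 8364]].         % plain decimal code points

%% true when every spelling denotes the same list.
all_equal() ->
    [First | Rest] = forms(),
    lists:all(fun(F) -> F =:= First end, Rest).
```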
>>
>> Question 2) How can we convert between representations a) and b) above?
>>
>>    Easy - though one has to dig in the documentation a bit.
>>
>>    >  B = unicode:characters_to_binary(X, unicode, utf8).
>>    <<49,48,226,130,172>>
>>    >  unicode:characters_to_list(B).
>>    [49,48,8364]
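
The two conversions can be wrapped so that bad input is reported instead of
silently passed on. A minimal sketch, assuming a hypothetical module
euro_convert and made-up error atoms; the unicode functions themselves return
{error, ...} or {incomplete, ...} tuples when the data cannot be converted.

```erlang
-module(euro_convert).
-export([encode/1, decode/1]).

%% Code-point list (type a) -> UTF-8 binary (type b).
encode(Chars) ->
    case unicode:characters_to_binary(Chars, unicode, utf8) of
        Bin when is_binary(Bin) -> {ok, Bin};
        {error, _Done, _Rest} -> {error, bad_code_point};
        {incomplete, _Done, _Rest} -> {error, incomplete}
    end.

%% UTF-8 binary (type b) -> code-point list (type a).
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        List when is_list(List) -> {ok, List};
        {error, _Done, _Rest} -> {error, invalid_utf8};
        {incomplete, _Done, _Rest} -> {error, truncated_utf8}
    end.
```

For example, decoding <<49,48,226,130>> (the Euro sign with its last byte
missing) is reported as truncated rather than returned as a shorter string.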
>>
>> Question 3) Can I write "10(Euro)" in an editor which supports
>> unicode/UTF-8, and does the Erlang tool chain support this?
>>
>> Will "erlc foo.erl" automatically detect that foo.erl is unicode
>> encoded and do the right thing when scanning and tokenising strings?
>>
>>    Answer: I don't know.
>>
>> Question 4)  Can string literals be improved on?
>>
>> I hope so. In HTML I can say (I hope) &euro;
>>
>> I'd like to say:
>>
>>       X = "10&euro;" in Erlang
>>
>>       People who know far more about this than I do can tell me if this
>> is OK.
>>
>>
>> ----------------------------------------------------------------------
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>
> --
> Erlang Solutions Ltd.
> http://www.erlang-solutions.com