[erlang-questions] cookbook entry #1 - unicode/UTF-8 strings
Joe Armstrong
erlang@REDACTED
Thu Oct 20 10:49:10 CEST 2011
On Thu, Oct 20, 2011 at 10:42 AM, Francesco Cesarini
<francesco@REDACTED> wrote:
> Joe,
>
> it would be great if what has already been done on an earlier version of the
> Erlang cookbook could be reviewed and integrated:
Absolutely. We won't start from scratch, and we want some kind of
commenting system like
http://book.realworldhaskell.org/read/
I'm digging around to see how this can be achieved.
/Joe
> http://trapexit.com/Category:CookBook
>
> It has 89 articles in 9 categories.
>
> Rgds,
> Francesco
>
>
>
> On 19/10/2011 11:14, Joe Armstrong wrote:
>>
>> cookbook # 1 - draft 1
>>
>> <aside>
>> We're going to write a cookbook.
>>
>> This will be free in electronic form (PDF, ePub), and you will be
>> able to buy a paper version (POD, print on demand).
>>
>> The development model is:
>>
>>   - a few authors
>>   - many reviewers (you are the reviewers)
>>
>> The reviewers report errors and suggest changes; the authors make
>> the changes.
>>
>> We hope the POD version will generate some income. This will be
>> split according to contributions: authors will be paid, as will
>> reviewers whose suggestions are incorporated.
>>
>> Payment (if we make a profit) will be proportional to the size
>> of the contribution.
>>
>> Expensive things, like professional proofreading, will be financed
>> by sponsorship, by crowdsourcing, or otherwise.
>>
>> To start the ball rolling I have some text below.
>>
>> Please comment on this text. If your comments are accepted one day you
>> might get paid :-)
>>
>> Note: by commenting you implicitly agree that, if your comments
>> are accepted into the final text, they will be subject to the
>> licensing conditions of that text. The text will always be free and
>> open source.
>>
>> </aside>
>>
>> Cookbook Question:
>>
>> I have often seen the words "UTF-8 string" used in sentences like
>> "Java has UTF-8 strings". What does this mean when applied to Erlang?
>>
>> ----------------------------------------------------------------------
>>
>> Answer:
>>
>> In Erlang, strings are syntactic sugar for lists of integers.
>>
>> Imagine the string "10(Euro)", where (Euro) stands for the glyph of
>> the euro currency sign (Unicode code point U+20AC, decimal 8364).
>>
>> The term "UTF-8 string" representing "10(Euro)" in Erlang could
>> mean one of two things:
>>
>> Either a) [49,48,8364] (i.e. it's a list of three Unicode code
>> points)
>> Or b) [49,48,226,130,172] (i.e. it's the list of bytes in the UTF-8
>> encoding of those characters)
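A minimal sketch of the two interpretations in code. The module name euro_demo and its functions are hypothetical, chosen only for illustration; the one library call is unicode:characters_to_binary/1 from the standard unicode module, which encodes to UTF-8 by default.

```erlang
-module(euro_demo).
-export([code_points/0, utf8_bytes/0, lengths/0]).

%% Interpretation a): one integer per Unicode code point.
code_points() ->
    "10\x{20ac}".                        % [49,48,8364]

%% Interpretation b): the UTF-8 encoding of the same text,
%% one integer per byte.
utf8_bytes() ->
    binary_to_list(unicode:characters_to_binary(code_points())).

%% Same text, different lengths: 3 characters versus 5 bytes.
lengths() ->
    {length(code_points()), length(utf8_bytes())}.
```

In the shell, euro_demo:lengths() returns {3,5}: three characters, five bytes.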
>>
>> So the words "UTF-8 string" might mean a) or might mean b).
>>
>> Erlang folks have always said "Unicode/UTF-8 is easy in Erlang
>> since strings are just lists of integers". By this we mean that
>> Erlang programs should always manipulate strings using the type a)
>> interpretation. *All* library functions assume the type a)
>> interpretation.
>>
>> The type b) interpretation only matters when you write data to a
>> file etc., and it should be as invisible to the user as possible (but
>> when things go wrong and the wrong character is printed, you need to
>> understand the difference).
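One way to keep b) at the edges of a program, sketched as a hypothetical module utf8_file: the program passes type a) lists around, encodes them to UTF-8 only when writing a file, and decodes them straight back when reading. This assumes the file is meant to hold UTF-8; file:write_file/2, file:read_file/1 and the unicode conversion functions are standard, while the module and function names are illustrative.

```erlang
-module(utf8_file).
-export([save/2, load/1]).

%% Encode at the boundary: Text is a type a) list of code points,
%% and only the bytes written to disk are type b) UTF-8.
save(Path, Text) ->
    file:write_file(Path, unicode:characters_to_binary(Text, unicode, utf8)).

%% Decode at the boundary: read the raw UTF-8 bytes and turn them
%% back into a list of code points.
load(Path) ->
    {ok, Bin} = file:read_file(Path),
    unicode:characters_to_list(Bin, utf8).
```

For example, utf8_file:save("euro.txt", "10\x{20ac}") writes the five bytes 49,48,226,130,172, and utf8_file:load("euro.txt") gives back [49,48,8364].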
>>
>> Question 1) How can we get Unicode characters into a list? Or, put
>> another way, what does a string literal look like?
>>
>> > X = "10\x{20ac}".
>> [49,48,8364]
>>
>> This is not described in my book since the change came after the
>> book was published (is it in the other Erlang books yet?)
>>
>> Question 2) How can we convert between representations a) and b) above?
>>
>> Easy, though one has to dig into the documentation a bit.
>>
>> > B = unicode:characters_to_binary(X, unicode, utf8).
>> <<49,48,226,130,172>>
>> > unicode:characters_to_list(B).
>> [49,48,8364]
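Going from b) back to a) can fail when the bytes are not valid UTF-8. In that case unicode:characters_to_list/2 does not return a list: it returns {error, Converted, Rest}, or {incomplete, Converted, Rest} if the input stops in the middle of a multi-byte sequence. A sketch of a wrapper that turns these into tagged results (utf8_decode and decode/1 are hypothetical names):

```erlang
-module(utf8_decode).
-export([decode/1]).

%% Decode UTF-8 bytes (type b) into code points (type a),
%% reporting bad input instead of passing a tuple on as a "string".
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        List when is_list(List) ->
            {ok, List};
        {incomplete, Decoded, Rest} ->
            %% Input ended partway through a multi-byte sequence.
            {error, {incomplete, Decoded, Rest}};
        {error, Decoded, Rest} ->
            %% Rest starts at a byte sequence that is not valid UTF-8.
            {error, {invalid, Decoded, Rest}}
    end.
```

For example, decode(<<49,48,226,130>>) reports an incomplete sequence, because the last two bytes are only part of the euro sign's three-byte encoding.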
>>
>> Question 3) Can I write "10(Euro)" in an editor that supports
>> Unicode/UTF-8, and does the Erlang toolchain support this?
>>
>> Will "erlc foo.erl" automatically detect that foo.erl is unicode
>> encoded and do the right thing when scanning and tokenising strings?
>>
>> Answer: I don't know.
>>
>> Question 4) Can string literals be improved on?
>>
>> I hope so. In HTML I can (I think) write &euro;
>>
>> I'd like to be able to write this in Erlang:
>>
>> X = "10&euro;"
>>
>> People who know far more about this than I do can tell me if this
>> is OK.
>>
>>
>> ----------------------------------------------------------------------
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>
> --
> Erlang Solutions Ltd.
> http://www.erlang-solutions.com