[erlang-questions] cookbook entry #1 - unicode/UTF-8 strings
Francesco Cesarini
francesco@REDACTED
Thu Oct 20 10:42:56 CEST 2011
Joe,
it would be great if what has already been done on an earlier version of
the Erlang cookbook could be reviewed and integrated:
http://trapexit.com/Category:CookBook
It has 89 articles in 9 categories.
Rgds,
Francesco
On 19/10/2011 11:14, Joe Armstrong wrote:
> cookbook # 1 - draft 1
>
> <aside>
> We're going to write a cookbook.
>
> This will be free in an electronic version (PDF, epub),
> and you will be able to buy a paper version (print on demand, POD).
>
> The development model is
>
> - a few authors
> - many reviewers (you are the reviewers)
> the reviewers report errors/suggest changes
> the authors make the changes
>
> The POD version we hope will generate some income
> this will be split according to the contributions. Authors
> will be paid as will reviewers whose suggestions are incorporated.
>
> Payment (if we make a profit) will be in direct relation to the size
> of the contribution
>
> Expensive things, like professional proofreading, will be paid for
> through sponsorship, crowdsourcing, or some other form of financing.
>
> To start the ball rolling I have some text below.
>
> Please comment on this text. If your comments are accepted one day you
> might get paid :-)
>
> Note: 1) By commenting you are implicitly agreeing that if your comments
> are accepted into the final text then you will be subject to the
> licensing conditions of that text. The text will always be free and
> open source.
>
> </aside>
>
> Cookbook Question:
>
> I have often seen the words "UTF-8 string" used in sentences like
> "Java has UTF-8 strings". What does this mean when applied to Erlang?
>
> ----------------------------------------------------------------------
>
> Answer:
>
> In Erlang, strings are syntactic sugar for "lists of integers".
>
> Imagine the string "10(Euro)" - (Euro) is the glyph representing the
> Euro currency symbol.
>
> The term "UTF-8 string" representing "10(Euro)" in Erlang could
> mean one of two things:
>
> Either a) [49,48,8364] (i.e. it's a list of three Unicode code points)
> Or b) [49,48,226,130,172] (i.e. it's the UTF-8 encoding of the
> Unicode characters, one integer per byte)
>
> So the words "UTF-8 string" might mean a) or might mean b).
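
To make the two readings concrete, here is a minimal sketch. The module name euro_repr and its function names are made up for illustration; the only library calls used are unicode:characters_to_binary/1 and binary_to_list/1.

```erlang
-module(euro_repr).
-export([code_points/0, utf8_bytes/0]).

%% Interpretation a): one list element per Unicode code point.
code_points() ->
    [49, 48, 16#20AC].   % $1, $0, U+20AC (8364 decimal)

%% Interpretation b): one list element per byte of the UTF-8 encoding.
%% characters_to_binary/1 reads the list as code points and emits UTF-8.
utf8_bytes() ->
    binary_to_list(unicode:characters_to_binary(code_points())).
```

With this sketch, length(euro_repr:code_points()) is 3 while length(euro_repr:utf8_bytes()) is 5, because the Euro sign takes three bytes in UTF-8.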
>
> Erlang folks have always said "unicode/UTF-8 is easy in Erlang
> since strings are just lists of integers" - by this we mean that
> Erlang programs should always manipulate strings given the type a)
> interpretation. *All* library functions assume type a) encoding.
>
> The type b) interpretation only has meaning when you write data to a
> file etc. and should be as invisible to the user as possible (but when
> things go wrong and you get the wrong character printed you need to
> understand the difference)
>
> Question 1) How can we get a Unicode character into a list item,
> or what does a string literal look like?
>
> > X = "10\x{20ac}".
> [49,48,8364]
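
As a small check that the escaped literal really is the type a) list, the sketch below compares it with the same list written out element by element. The module name euro_literal is hypothetical.

```erlang
-module(euro_literal).
-export([same/0]).

%% Compare the escaped string literal with an explicit list of code points.
same() ->
    Escaped  = "10\x{20ac}",          % string literal with a hex escape
    Explicit = [$1, $0, 16#20AC],     % the same code points spelled out
    Escaped =:= Explicit.
```

Calling euro_literal:same() should return true, since both expressions denote [49,48,8364].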
>
> This is not described in my book since the change came after the
> book was published (is it in the other Erlang books yet?)
>
> Question 2) How can we convert between representations a) and b) above?
>
> Easy - though one has to dig in the documentation a bit.
>
> > B = unicode:characters_to_binary(X, unicode, utf8).
> <<49,48,226,130,172>>
> > unicode:characters_to_list(B).
> [49,48,8364]
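
One thing the shell session above does not show is what happens when the bytes are not valid UTF-8. The following is a sketch only, assuming a wrapper module (utf8_convert, a made-up name) around the same unicode functions. The {ok, ...} and {error, ...} return shapes are this sketch's own convention, not something the library prescribes.

```erlang
-module(utf8_convert).
-export([encode/1, decode/1]).

%% Code points (type a) -> UTF-8 binary (the bytes of type b).
%% Note: for a list that is not valid character data, the library
%% returns an error tuple instead of a binary; not handled here.
encode(CodePoints) ->
    unicode:characters_to_binary(CodePoints, unicode, utf8).

%% UTF-8 binary -> code points, with the failure cases made explicit.
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        {error, Decoded, Rest} ->
            %% Rest starts with a byte sequence that is not valid UTF-8.
            {error, {invalid, Decoded, Rest}};
        {incomplete, Decoded, Rest} ->
            %% The input ended in the middle of a multi-byte character.
            {error, {truncated, Decoded, Rest}};
        List when is_list(List) ->
            {ok, List}
    end.
```

For example, utf8_convert:decode(utf8_convert:encode([49,48,8364])) gives back {ok,[49,48,8364]}, while a binary cut off inside the Euro sign, such as <<49,48,226,130>>, is reported as truncated rather than silently mangled.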
>
> Question 3) Can I write "10(Euro)" in an editor which supports
> unicode/UTF-8, and does the Erlang tool chain support this?
>
> Will "erlc foo.erl" automatically detect that foo.erl is unicode
> encoded and do the right thing when scanning and tokenising strings?
>
> Answer: I don't know?
>
> Question 4) Can string literals be improved on?
>
> I hope so -- in HTML I can say (I hope) &#8364;
>
> I'd like to say:
>
> X = "10&#8364;" in Erlang
>
> People who know far more about this than I do can tell me if this
> is OK
>
>
> ----------------------------------------------------------------------
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
--
Erlang Solutions Ltd.
http://www.erlang-solutions.com