[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Ilya Khaprov
Sat Jan 14 17:00:52 CET 2017


>> Given that both Erlang and Elixir are implemented on top of BEAM, the wheel might not need reinventing?



Why does Elixir implement Unicode in Elixir? You would have to rewrite it anyway.



Ilya



From: Oliver Korpilla
Sent: Saturday, January 14, 2017 06:53 PM
To: Michał Muskała
Cc: Erlang Questions
Subject: Re: [erlang-questions] Erlang basic doubts about String, message passing and context switching overhead



Could the Unicode support in elixir serve as a starting point?

https://hexdocs.pm/elixir/1.3.3/String.html#content

String.upcase/1 and String.downcase/1 seem to be Unicode-aware. And a lot of effort seems to have gone into scenarios like this:

"For example, the codepoint “é” is two bytes:

iex> byte_size("é")
2"
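
A minimal sketch of what "Unicode-aware" means for the functions named above, assuming a reasonably recent Elixir. It uses only String.upcase/1, String.downcase/1, String.length/1 and byte_size/1; the sample string is an illustration, not something taken from the linked docs.

```elixir
# Illustration only: the sample string is not from the linked docs.
# "michał", with ł written as an escape for U+0142.
name = "micha\u0142"

# Case mapping goes beyond ASCII: U+0142 (ł) maps to U+0141 (Ł).
upper = String.upcase(name)
true = upper == "MICHA\u0141"
true = String.downcase(upper) == name

# Length in characters and size in bytes differ for non-ASCII text.
6 = String.length(name)
7 = byte_size(name)   # ł takes two bytes in UTF-8
```

Each `=` line is a pattern match, so the script raises a MatchError if any of these expectations does not hold.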

Given that both Erlang and Elixir are implemented on top of BEAM, the wheel might not need reinventing? I know engineers and programmers love inventing stuff, and this discussion seems to point in that direction, but...

Cheers,
Oliver



Sent: Friday, 13 January 2017 at 23:34
From: "Michał Muskała"
To: "Richard A. O'Keefe", "Steve Davis", "Jesper Louis Andersen"
Cc: "Erlang Questions"
Subject: Re: [erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

I fully agree there are no languages that deal with strings perfectly. That said, there are those that are better at it and those that aren't so good. A language where I need to look for a library to upcase or downcase my own name fits into the second group in my book.

Michał.
On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen wrote:

Richard is indeed right, depending on what your definition of "String" is.

If a "String" is "an array of characters from some alphabet", then you need to take into account that strings are sequences of Unicode code points in practice. This is also the most precise definition from a technical point of view.

When I wrote my post, I was--probably incorrectly--assuming the older notion of a "String", where the representation is either ASCII or something like ISO-8859-15. In this case, a string coincides with a stream of bytes.

Data needs parsing. A lot of data comes in as some kind of stringy representation: UTF-8, byte array (binary), and so on.

And of course, that isn't the whole story, since there are examples of input which are not string-like in their form.


On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe wrote:

On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
> Strings are really just streams of bytes.

That was true a long time ago.  Maybe.
But it isn't anywhere near accurate as a description
of Unicode:
  - Unicode is made of 21-bit code points, not bytes.
  - Most possible code points are not defined.
  - Some of those that are defined are defined as
    "it is illegal to use this".
  - Unicode sequences have *structure*; it is simply
    not the case that every sequence of allowable
    Unicode code points is a legal Unicode string.
  - As a special case of that, if s is a non-empty
    valid Unicode string, it is not true that every
    substring of s is a valid Unicode string.

In case you were thinking of UTF-8, not all byte
sequences are valid UTF-8.
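
A few of these points can be checked directly. The following is only a sketch in Elixir (chosen because the thread already quotes iex), with sample values that are illustrative assumptions rather than anything from the messages above. It covers the byte-level and combining-character cases only, not every structural rule Unicode imposes.

```elixir
# A sketch only; the sample values are illustrative, not from the thread.
e_acute = "\u00E9"   # U+00E9, "é"

# One code point, but two bytes once encoded as UTF-8.
[0xE9] = String.to_charlist(e_acute)
<<0xC3, 0xA9>> = e_acute

# Cutting the binary between those two bytes leaves an invalid fragment,
# so a byte-level "substring" of valid text need not be valid text.
fragment = binary_part(e_acute, 0, 1)
false = String.valid?(fragment)

# Some byte values never occur in UTF-8 at all, for example 0xFF.
false = String.valid?(<<0xFF>>)

# Structure above the code point level: "e" followed by U+0301
# (combining acute accent) is two code points but one grapheme.
combined = "e\u0301"
2 = length(String.codepoints(combined))
1 = String.length(combined)
```

Each `=` line is a pattern match, so the script stops with a MatchError if one of the claims does not hold on the Elixir version in use.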

Byte streams are as important as you say, but it's
really hard to see the software for a radar or a
radio telescope as processing strings...
_______________________________________________
erlang-questions mailing list
http://erlang.org/mailman/listinfo/erlang-questions