[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Richard A. O'Keefe ok@REDACTED
Fri Jan 13 02:33:56 CET 2017



On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
> Strings are really just streams of bytes.

That was true a long time ago.  Maybe.
But it isn't anywhere near accurate as a description
of Unicode:
  - Unicode is made of 21-bit code points, not bytes.
  - Most possible code points are not assigned to anything.
  - Some of those that are defined are defined as
    "it is illegal to use this" (the surrogates, for
    example, which are reserved for UTF-16).
  - Unicode sequences have *structure*; it is simply
    not the case that every sequence of allowable
    Unicode code points is a legal Unicode string.
  - As a special case of that, if s is a non-empty
    valid Unicode string, it is not true that every
    substring of s is a valid Unicode string.
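
A few of the points above can be checked mechanically. What follows is
only a rough sketch in Python, not something from the original
discussion. The helper names (count_unassigned, encodes_as_utf8,
starts_with_combining_mark) are made up for illustration, and the counts
depend on the Unicode version bundled with the Python in use. Treating a
substring that begins with a combining mark as "not a valid string" is
one reading of the last point, not the only one.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def count_unassigned():
    """Count code points that this Python's Unicode database marks 'Cn'."""
    return sum(
        1
        for cp in range(MAX_CODE_POINT + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    )


def encodes_as_utf8(cp):
    """Return True if the single code point cp can be encoded as UTF-8."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python refuses to encode surrogate code points (U+D800..U+DFFF).
        return False


def starts_with_combining_mark(s):
    """Return True if s begins with a combining character (assumed criterion)."""
    return bool(s) and unicodedata.combining(s[0]) != 0


if __name__ == "__main__":
    # The largest code point fits in 21 bits.
    print(MAX_CODE_POINT.bit_length())
    # More than half of the code space is unassigned.
    print(count_unassigned() > (MAX_CODE_POINT + 1) // 2)
    # 'A' encodes; a lone surrogate does not.
    print(encodes_as_utf8(0x41), encodes_as_utf8(0xD800))
    # "e" followed by U+0301 COMBINING ACUTE ACCENT: slicing off the base
    # letter leaves a mark with nothing to combine with.
    s = "e\u0301"
    print(starts_with_combining_mark(s), starts_with_combining_mark(s[1:]))
```

The substring check is deliberately crude: it looks only at the first
code point, so it says nothing about other kinds of structure.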

In case you were thinking of UTF-8, not all byte
sequences are valid UTF-8.
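
To make that concrete, here is a minimal sketch in Python that relies on
the strict UTF-8 decoder in the standard library. The function name
is_valid_utf8 is invented for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    samples = [
        b"\xc3\xa9",      # U+00E9 encoded correctly
        b"\xc3",          # lead byte with its continuation byte missing
        b"\xff",          # a byte that never occurs in UTF-8
        b"\xc0\xaf",      # overlong encoding of "/"
        b"\xed\xa0\x80",  # encoded surrogate U+D800
    ]
    for sample in samples:
        print(sample, is_valid_utf8(sample))
```

Only the first sample is accepted. The second one also shows that a
prefix of a valid UTF-8 byte string need not be valid UTF-8 itself.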

Byte streams are as important as you say, but it's
really hard to see the software for a radar or a
radio telescope as processing strings...



