[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead
Jesper Louis Andersen
Fri Jan 13 13:20:05 CET 2017
Richard is indeed right, depending on what your definition of "String" is.
If a "String" is "An array of characters from some alphabet", then you need
to take into account that strings are, in practice, sequences of Unicode
code points. This is also the most precise definition from a technical
point of view.
When I wrote my post, I was--probably incorrectly--assuming the older
notion of a "String" where the representation is either ASCII or something
like ISO-8859-15. In this case, a string coincides with a stream of bytes.
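A minimal sketch of that difference, in Python purely for illustration (the thread is about Erlang, and the helper name `encoded_lengths` is made up for this example). It assumes only the standard `str.encode` codecs: in ISO-8859-15 every character is one byte, so the string and the byte stream line up, while in UTF-8 they do not.

```python
def encoded_lengths(text):
    """Return the code point count and the byte length in two encodings."""
    return {
        "code_points": len(text),
        "iso8859_15": len(text.encode("iso-8859-15")),
        "utf_8": len(text.encode("utf-8")),
    }


# "cafe" with an acute accent on the e, a space, and the euro sign.
sample = "caf\u00e9 \u20ac"
print(encoded_lengths(sample))
```

For this sample the count is six code points and six ISO-8859-15 bytes, but nine UTF-8 bytes, because the accented letter takes two bytes and the euro sign takes three.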
Data needs parsing. A lot of data comes in as some kind of stringy
representation: UTF-8, byte array (binary), and so on.
And of course, that isn't the whole story, since there are examples of
input which are not string-like in their forms.
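As a rough illustration of "data needs parsing", here is a small Python sketch (again only for illustration; `parse_utf8` is a hypothetical helper, not something from the thread). It treats incoming bytes as untrusted and only hands back text if the bytes decode as strict UTF-8.

```python
def parse_utf8(data):
    """Decode bytes as strict UTF-8; return None if they are not valid."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


euro = "\u20ac".encode("utf-8")        # three bytes: e2 82 ac

whole = parse_utf8(euro)               # the euro sign as a one-character string
truncated = parse_utf8(euro[:2])       # cut mid-character, so rejected (None)
stray = parse_utf8(b"\xff")            # 0xff never appears in UTF-8 (None)

print(whole == "\u20ac", truncated is None, stray is None)
```

The truncated case is the one that tends to bite in practice: slicing a byte stream at an arbitrary offset can split a multi-byte character, so a slice of valid UTF-8 is not necessarily valid UTF-8.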
On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe <ok@REDACTED> wrote:
> On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
> > Strings are really just streams of bytes.
> That was true a long time ago. Maybe.
> But it isn't anywhere near accurate as a description
> of Unicode:
> - Unicode is made of 21-bit code points, not bytes.
> - Most possible code points are not defined.
> - Some of those that are defined are defined as
> "it is illegal to use this".
> - Unicode sequences have *structure*; it is simply
> not the case that every sequence of allowable
> Unicode code points is a legal Unicode string.
> - As a special case of that, if s is a non-empty
> valid Unicode string, it is not true that every
> substring of s is a valid Unicode string.
> In case you were thinking of UTF-8, not all byte
> sequences are valid UTF-8.
> Byte streams are as important as you say, but it's
> really hard to see the software for a radar or a
> radio telescope as processing strings...
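Some of the points in the quoted list can be checked mechanically. The following is a sketch in Python, used only for illustration, relying on the standard `unicodedata` module. The helper names `describe` and `utf8_encodable` are invented here, and mapping general category "Cn" to "unassigned" and "Cs" to "surrogate" is an interpretation of the list above, not something stated in the thread.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # the largest Unicode code point


def describe(cp):
    """Classify a code point number using Python's bundled Unicode data."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return "outside Unicode"
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        return "unassigned"   # not assigned, or a noncharacter such as U+FFFF
    if category == "Cs":
        return "surrogate"    # reserved for UTF-16, not a character by itself
    return "assigned"


def utf8_encodable(cp):
    """Report whether a single code point can be encoded as UTF-8."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(MAX_CODE_POINT.bit_length())   # number of bits needed for a code point
print(describe(0x41))                # LATIN CAPITAL LETTER A
print(describe(0xFFFF))              # a noncharacter
print(describe(0xD800), utf8_encodable(0xD800))
```

The largest code point fits in 21 bits, and a lone surrogate such as U+D800 is a code point that exists but cannot be encoded as UTF-8, which matches the "it is illegal to use this" case.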