<div dir="ltr"><div><div><div><div>Richard is indeed right, depending on what your definition of "String" is.<br><br></div>If a "String" is "an array of characters from some alphabet", then you need to take into account that strings are, in practice, sequences of Unicode code points. This is also the most precise definition from a technical point of view.<br><br></div>When I wrote my post, I was--probably incorrectly--assuming the older notion of a "String", where the representation is either ASCII or something like ISO-8859-15. In that case every character is a single byte, so a string coincides with a stream of bytes.<br><br></div>Data needs parsing. A lot of data comes in as some kind of stringy representation: UTF-8, byte array (binary), and so on.<br><br></div>And of course, that isn't the whole story, since some input is not string-like in form at all.<br><br></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe &lt;<a href="mailto:ok@cs.otago.ac.nz">ok@cs.otago.ac.nz</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">
<br class="gmail_msg">
On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:<br class="gmail_msg">
> Strings are really just streams of bytes.<br class="gmail_msg">
<br class="gmail_msg">
That was true a long time ago. Maybe.<br class="gmail_msg">
But it isn't anywhere near accurate as a description<br class="gmail_msg">
of Unicode:<br class="gmail_msg">
- Unicode is made of 21-bit code points, not bytes.<br class="gmail_msg">
- Most possible code points are not defined.<br class="gmail_msg">
- Some of those that are defined are defined as<br class="gmail_msg">
"it is illegal to use this".<br class="gmail_msg">
- Unicode sequences have *structure*; it is simply<br class="gmail_msg">
not the case that every sequence of allowable<br class="gmail_msg">
Unicode code points is a legal Unicode string.<br class="gmail_msg">
- As a special case of that, if s is a non-empty<br class="gmail_msg">
valid Unicode string, it is not true that every<br class="gmail_msg">
substring of s is a valid Unicode string.<br class="gmail_msg">
<br class="gmail_msg">
In case you were thinking of UTF-8, not all byte<br class="gmail_msg">
sequences are valid UTF-8.<br class="gmail_msg">
<br class="gmail_msg">
Byte streams are as important as you say, but it's<br class="gmail_msg">
really hard to see the software for a radar or a<br class="gmail_msg">
radio telescope as processing strings...<br class="gmail_msg">
<br class="gmail_msg">
</blockquote></div>
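<div>A small sketch of the distinction discussed above, in Python (chosen only for illustration; the thread itself names no language). The helper name <code>is_valid_utf8</code> and the sample word are assumptions made for the example. It shows that under a single-byte encoding such as ISO-8859-15 every byte decodes to a character, while under UTF-8 the byte count and the code point count differ and an arbitrary slice of bytes may not be valid.</div>

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Older notion: one byte per character, so every byte value is a character.
latin9_text = bytes(range(256)).decode("iso-8859-15")

# Unicode notion: the string is a sequence of code points; bytes are an encoding.
text = "na\u00efve"             # 5 code points, one of them non-ASCII
encoded = text.encode("utf-8")  # the non-ASCII code point takes two bytes

# Cutting the byte sequence in the middle of a multi-byte character
# leaves a prefix that is no longer valid UTF-8.
truncated = encoded[:3]

print(len(text), len(encoded))
print(is_valid_utf8(encoded), is_valid_utf8(truncated))
print(is_valid_utf8(b"\xff\xfe"))
```

<div>This only covers validity at the encoding level; the structural constraints on code point sequences that Richard mentions are not checked here.</div>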