[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Fri Jan 13 23:34:55 CET 2017

I fully agree that no language deals with strings perfectly. That said, some are better at it and some aren't so good. A language where I need to look for a library to upcase or downcase my own name fits into the second group in my book.
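
To make the complaint concrete, here is a minimal sketch in Python. Python is used only because Unicode case mapping is built into its str type; this is not Erlang code, and the helper name ascii_upper is hypothetical. It contrasts a Unicode-aware upcase with the a-z-only rule that a byte-oriented string routine would apply:

```python
def ascii_upper(s):
    """Upcase only the letters a-z, as a byte-oriented routine would.

    Hypothetical helper for illustration; not taken from any library.
    """
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


name = "Michał"

# Unicode-aware mapping: every letter, including the last one, is upcased.
print(name.upper())

# ASCII-only mapping: the non-ASCII final letter stays in lower case.
print(ascii_upper(name))

# Case mapping is not always one-to-one: the result can be longer.
print("straße".upper())
```

The point of the comparison is only that correct case conversion needs Unicode tables, which is why it matters whether they ship with the language or have to be found elsewhere.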

Michał.

On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen <jesper.louis.andersen@REDACTED>, wrote:
> Richard is indeed right, depending on what your definition of "String" is.
>
> If a "String" is "An array of characters from some alphabet", then you need to take into account that strings are sequences of Unicode code points in practice. This is also the most precise definition from a technical point of view.
>
> When I wrote my post, I was--probably incorrectly--assuming the older notion of a "String" where the representation is either ASCII or something like ISO-8859-15. In this case, a string coincides with a stream of bytes.
>
> Data needs parsing. A lot of data comes in as some kind of stringy representation: UTF-8, byte array (binary), and so on.
>
> And of course, that isn't the whole story, since there are examples of input which are not string-like in their forms.
>
>
> > On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe <ok@REDACTED> wrote:
> > >
> > >
> > > On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
> > > > Strings are really just streams of bytes.
> > >
> > > That was true a long time ago.  Maybe.
> > > But it isn't anywhere near accurate as a description
> > > of Unicode:
> > >   - Unicode is made of 21-bit code points, not bytes.
> > >   - Most possible code points are not defined.
> > >   - Some of those that are defined are defined as
> > >     "it is illegal to use this".
> > >   - Unicode sequences have *structure*; it is simply
> > >     not the case that every sequence of allowable
> > >     Unicode code points is a legal Unicode string.
> > >   - As a special case of that, if s is a non-empty
> > >     valid Unicode string, it is not true that every
> > >     substring of s is a valid Unicode string.
> > >
> > > In case you were thinking of UTF-8, not all byte
> > > sequences are valid UTF-8.
> > >
> > > Byte streams are as important as you say, but it's
> > > really hard to see the software for a radar or a
> > > radio telescope as processing strings...
> > >