[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Sat Jan 14 15:56:21 CET 2017

We need support for locales before we can do proper operations on text. 
Just Unicode isn't enough.
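
To make that concrete, here is a minimal sketch. It is in Python rather than Erlang, only because its standard library already ships the Unicode case tables. The names `TURKIC_UPPER` and `upper_for_locale` are made up for illustration and are not an existing API; a real implementation would take its rules from locale data rather than a two-entry table.

```python
# Default (locale-independent) Unicode case mapping vs. a locale-aware one.
# TURKIC_UPPER and upper_for_locale are hypothetical names, for illustration only.

# Turkish and Azeri distinguish dotted and dotless i:
#   i (U+0069) -> İ (U+0130),  ı (U+0131) -> I (U+0049)
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}


def upper_for_locale(text, lang=None):
    """Uppercase text, applying the Turkish/Azeri i rules when asked to."""
    if lang in ("tr", "az"):
        # Apply the locale-specific overrides before the default mapping.
        text = "".join(TURKIC_UPPER.get(ch, ch) for ch in text)
    return text.upper()


default = upper_for_locale("istanbul")        # no locale information
turkish = upper_for_locale("istanbul", "tr")  # Turkish rules
german = upper_for_locale("straße")           # ß expands to two letters
```

Without a locale the first letter comes out as a plain "I", which is wrong for Turkish text; with `"tr"` it becomes "İ". The German example also shows that uppercasing can change the length of a string.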

On 01/14/2017 03:18 PM, John Doe wrote:
> Indeed, Unicode uppercase/lowercase is one of the most essential
> string features that doesn't exist yet in the Erlang stdlib. I'm aware
> of the problems with some letters and scripts, such as German SS or
> Turkish I, but having upper/lower in the stdlib is still a must, IMO.
> The problem is that uppercase/lowercase would require support for
> Unicode normalization.
>
> 2017-01-14 1:34 GMT+03:00 Michał Muskała <michal@REDACTED>:
>
>     I fully agree there are no languages that deal with strings
>     perfectly. That said there are those that are better at it and those
>     that aren't so good. A language where I need to look for a library
>     to upcase or downcase my own name fits into the second group in my
>     book.
>
>
>     Michał.
>
>     On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen
>     <jesper.louis.andersen@REDACTED>, wrote:
>>     Richard is indeed right, depending on what your definition of
>>     "String" is.
>>
>>     If a "String" is "An array of characters from some alphabet", then
>>     you need to take into account that strings are, in practice,
>>     sequences of Unicode code points. This is also the most precise
>>     definition from a technical point of view.
>>
>>     When I wrote my post, I was--probably incorrectly--assuming the
>>     older notion of a "String" where the representation is either
>>     ASCII or something like ISO-8859-15. In this case, a string
>>     coincides with a stream of bytes.
>>
>>     Data needs parsing. A lot of data comes in as some kind of stringy
>>     representation: UTF-8, byte array (binary), and so on.
>>
>>     And of course, that isn't the whole story, since there are
>>     examples of input which are not string-like in their forms.
>>
>>
>>     On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe
>>     <ok@REDACTED> wrote:
>>
>>
>>
>>         On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
>>         > Strings are really just streams of bytes.
>>
>>         That was true a long time ago.  Maybe.
>>         But it isn't anywhere near accurate as a description
>>         of Unicode:
>>           - Unicode is made of 21-bit code points, not bytes.
>>           - Most possible code points are not defined.
>>           - Some of those that are defined are defined as
>>             "it is illegal to use this".
>>           - Unicode sequences have *structure*; it is simply
>>             not the case that every sequence of allowable
>>             Unicode code points is a legal Unicode string.
>>           - As a special case of that, if s is a non-empty
>>             valid Unicode string, it is not true that every
>>             substring of s is a valid Unicode string.
>>
>>         In case you were thinking of UTF-8, not all byte
>>         sequences are valid UTF-8.
>>
>>         Byte streams are as important as you say, but it's
>>         really hard to see the software for a radar or a
>>         radio telescope as processing strings...
>>
>>     _______________________________________________
>>     erlang-questions mailing list
>>     erlang-questions@REDACTED
>>     http://erlang.org/mailman/listinfo/erlang-questions
>
>
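
Richard's remarks quoted above, that not all byte sequences are valid UTF-8 and that a piece of a valid string need not be valid itself, are easy to check at the byte level. A rough sketch, again in Python rather than Erlang; `is_valid_utf8` is a helper name made up for this example:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = "Michał".encode("utf-8")  # 'ł' (U+0142) is encoded as two bytes

whole_ok = is_valid_utf8(encoded)              # the complete string
cut_ok = is_valid_utf8(encoded[:-1])           # slice that ends inside 'ł'
lone_ok = is_valid_utf8(b"\xff")               # 0xFF never appears in UTF-8
surrogate_ok = is_valid_utf8(b"\xed\xa0\x80")  # encoded surrogate U+D800
```

Only `whole_ok` is true. The sliced binary is a perfectly good stream of bytes, but it is no longer text, so "string" and "stream of bytes" cannot be treated as the same thing.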

-- 
Loïc Hoguin
https://ninenines.eu