[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Sat Jan 14 15:18:40 CET 2017

Indeed, Unicode uppercase/lowercase is one of the most essential string
features that does not yet exist in the Erlang stdlib. I'm aware of the
problems with some letters and scripts, such as the German SS (ß) or the
Turkish I, but having upper/lower in the stdlib is still a must, IMO. The
problem is that uppercase/lowercase would require support for Unicode
normalization.
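
To make the point concrete, here is a rough sketch in Python (not Erlang, and
not a proposal for any stdlib API; I use Python only because its standard
library already ships these operations). The helper name upper_nfc is
hypothetical. It shows that case mapping can change a string's length, and
that two spellings of the same visible text only compare equal after
uppercasing if they are also normalized.

```python
import unicodedata


def upper_nfc(text: str) -> str:
    """Uppercase with the locale-independent mapping, then normalize to NFC.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", text.upper())


# German sharp s: one code point uppercases to two, so the length changes.
sharp_s = "stra\u00dfe"  # "straße"

# The same visible "é" as two different code point sequences.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(len(sharp_s), len(sharp_s.upper()))
    print(composed.upper() == decomposed.upper())
    print(upper_nfc(composed) == upper_nfc(decomposed))
    # The default mapping is not Turkish-aware: "I" lowercases to "i",
    # where Turkish orthography expects the dotless U+0131.
    print("I".lower() == "i")
```

Note that the Turkish case cannot be fixed by normalization at all; it needs
locale-specific tailoring, which this sketch does not attempt.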

2017-01-14 1:34 GMT+03:00 Michał Muskała <michal@REDACTED>:

> I fully agree there are no languages that deal with strings perfectly.
> That said there are those that are better at it and those that aren't so
> good. A language, where I need to look for a library to upcase or downcase
> my own name, fits into the second group in my book.
>
> Michał.
>
> On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen <
> jesper.louis.andersen@REDACTED>, wrote:
>
> Richard is indeed right, depending on what your definition of "String" is.
>
> If a "String" is "an array of characters from some alphabet", then you
> need to take into account that strings are Unicode code points in
> practice. This is also the most precise definition from a technical point
> of view.
>
> When I wrote my post, I was--probably incorrectly--assuming the older
> notion of a "String" where the representation is either ASCII or something
> like ISO-8859-15. In this case, a string coincides with a stream of bytes.
>
> Data needs parsing. A lot of data comes in as some kind of stringy
> representation: UTF-8, byte array (binary), and so on.
>
> And of course, that isn't the whole story, since there are examples of
> input which are not string-like in their forms.
>
>
> On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe <ok@REDACTED>
> wrote:
>
>>
>>
>> On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
>> > Strings are really just streams of bytes.
>>
>> That was true a long time ago.  Maybe.
>> But it isn't anywhere near accurate as a description
>> of Unicode:
>>   - Unicode is made of 21-bit code points, not bytes.
>>   - Most possible code points are not defined.
>>   - Some of those that are defined are defined as
>>     "it is illegal to use this".
>>   - Unicode sequences have *structure*; it is simply
>>     not the case that every sequence of allowable
>>     Unicode code points is a legal Unicode string.
>>   - As a special case of that, if s is a non-empty
>>     valid Unicode string, it is not true that every
>>     substring of s is a valid Unicode string.
>>
>> In case you were thinking of UTF-8, not all byte
>> sequences are valid UTF-8.
>>
>> Byte streams are as important as you say, but it's
>> really hard to see the software for a radar or a
>> radio telescope as processing strings...
>>
>> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>