<div dir="ltr">Indeed, Unicode uppercase/lowercase conversion is one of the most essential string features that does not yet exist in the Erlang stdlib. I'm aware of the problems with some letters and scripts, such as the German SS or the Turkish I, but having upper/lower in the stdlib is still a must, IMO. The problem is that uppercase/lowercase would require support for Unicode normalization.</div><div class="gmail_extra"><br><div class="gmail_quote">2017-01-14 1:34 GMT+03:00 Michał Muskała <span dir="ltr"><<a href="mailto:michal@muskala.eu" target="_blank">michal@muskala.eu</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>

<div name="messageBodySection" style="font-size:14px;font-family:-apple-system,BlinkMacSystemFont,sans-serif">

<p style="margin:0px;line-height:normal;font-family:'Helvetica Neue';color:rgb(51,51,51)">I fully agree there are no languages that deal with strings perfectly. That said, there are those that are better at it and those that aren't so good. A language where I need to look for a library to upcase or downcase my own name fits into the second group, in my book.</p><span class="HOEnZb"><font color="#888888">

</font></span></div><span class="HOEnZb"><font color="#888888">

<div name="messageSignatureSection" style="font-size:14px;font-family:-apple-system,BlinkMacSystemFont,sans-serif"><br>

Michał.</div>

</font></span><div name="messageReplySection" style="font-size:14px;font-family:-apple-system,BlinkMacSystemFont,sans-serif"><div><div class="h5"><br>

On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen <<a href="mailto:jesper.louis.andersen@gmail.com" target="_blank">jesper.louis.andersen@gmail.<wbr>com</a>>, wrote:<br>

</div></div><blockquote type="cite" style="margin:5px 5px;padding-left:10px;border-left:thin solid #1abc9c"><div><div class="h5">

<div dir="ltr">

<div>

<div>

<div>

<div>Richard is indeed right, depending on what your definition of "String" is.<br>

<br></div>

If a "String" is "an array of characters from some alphabet", then you need to take into account that strings are, in practice, sequences of Unicode code points. This is also the most precise definition from a technical point of view.<br>

<br></div>

When I wrote my post, I was--probably incorrectly--assuming the older notion of a "String" where the representation is either ASCII or something like ISO-8859-15. In this case, a string coincides with a stream of bytes.<br>

<br></div>

Data needs parsing. A lot of data comes in as some kind of stringy representation: UTF-8, byte array (binary), and so on.<br>

<br></div>

And of course, that isn't the whole story, since there are examples of input which are not string-like in their forms.<br>

<br></div>

<br>

<div class="gmail_quote">

<div dir="ltr">On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe <<a href="mailto:ok@cs.otago.ac.nz" target="_blank">ok@cs.otago.ac.nz</a>> wrote:<br></div>

<blockquote class="gmail_quote" style="margin:5px 5px;padding-left:10px;border-left:thin solid #e67e22"><br class="m_-4967730909517647241gmail_msg">

<br class="m_-4967730909517647241gmail_msg">

On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:<br class="m_-4967730909517647241gmail_msg">

> Strings are really just streams of bytes.<br class="m_-4967730909517647241gmail_msg">

<br class="m_-4967730909517647241gmail_msg">

That was true a long time ago.  Maybe.<br class="m_-4967730909517647241gmail_msg">

But it isn't anywhere near accurate as a description<br class="m_-4967730909517647241gmail_msg">

of Unicode:<br class="m_-4967730909517647241gmail_msg">

  - Unicode is made of 21-bit code points, not bytes.<br class="m_-4967730909517647241gmail_msg">

  - Most possible code points are not defined.<br class="m_-4967730909517647241gmail_msg">

  - Some of those that are defined are defined as<br class="m_-4967730909517647241gmail_msg">

    "it is illegal to use this".<br class="m_-4967730909517647241gmail_msg">

  - Unicode sequences have *structure*; it is simply<br class="m_-4967730909517647241gmail_msg">

    not the case that every sequence of allowable<br class="m_-4967730909517647241gmail_msg">

    Unicode code points is a legal Unicode string.<br class="m_-4967730909517647241gmail_msg">

  - As a special case of that, if s is a non-empty<br class="m_-4967730909517647241gmail_msg">

    valid Unicode string, it is not true that every<br class="m_-4967730909517647241gmail_msg">

    substring of s is a valid Unicode string.<br class="m_-4967730909517647241gmail_msg">
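<div>A minimal sketch of the first few points in the list above, written in Python purely for illustration (nothing here is an Erlang API, and the helper names <code>is_surrogate</code> and <code>can_encode_utf8</code> are hypothetical). It assumes that "illegal to use" refers at least to the surrogate range U+D800..U+DFFF, and it shows that the code space ends at U+10FFFF, which needs 21 bits.</div>
<pre>
```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point; needs 21 bits


def is_surrogate(cp):
    # U+D800..U+DFFF are reserved for UTF-16 and must not appear on their own.
    return cp in range(0xD800, 0xE000)


def can_encode_utf8(cp):
    # chr() rejects values outside 0..0x10FFFF (ValueError);
    # encode() rejects lone surrogates (UnicodeEncodeError).
    try:
        chr(cp).encode("utf-8")
        return True
    except (ValueError, UnicodeEncodeError):
        return False


if __name__ == "__main__":
    for cp in (0x41, 0xD800, MAX_CODE_POINT, MAX_CODE_POINT + 1):
        print(hex(cp), is_surrogate(cp), can_encode_utf8(cp))
```
</pre>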

<br class="m_-4967730909517647241gmail_msg">

In case you were thinking of UTF-8, not all byte<br class="m_-4967730909517647241gmail_msg">

sequences are valid UTF-8.<br class="m_-4967730909517647241gmail_msg">
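<div>The UTF-8 remark can be checked with a small sketch, again in Python and only as an illustration. The helper name <code>is_valid_utf8</code> is hypothetical; it relies on the strict UTF-8 decoder in the Python standard library. The truncated example also shows, at the byte level, how a slice of a valid string can stop being valid.</div>
<pre>
```python
def is_valid_utf8(data):
    # Strict decoding raises UnicodeDecodeError on any malformed sequence.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    encoded = "Michał".encode("utf-8")   # the final letter takes two bytes
    truncated = encoded[:-1]             # cuts that two-byte sequence in half
    samples = {
        "well-formed": encoded,
        "truncated": truncated,
        "stray 0xFF byte": b"\xff",
        "overlong NUL": b"\xc0\x80",
    }
    for label, data in samples.items():
        print(label, is_valid_utf8(data))
```
</pre>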

<br class="m_-4967730909517647241gmail_msg">

Byte streams are as important as you say, but it's<br class="m_-4967730909517647241gmail_msg">

really hard to see the software for a radar or a<br class="m_-4967730909517647241gmail_msg">

radio telescope as processing strings...<br class="m_-4967730909517647241gmail_msg">

<br class="m_-4967730909517647241gmail_msg"></blockquote>

</div></div></div><span class="">

______________________________<wbr>_________________<br>

erlang-questions mailing list<br>

<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/<wbr>listinfo/erlang-questions</a><br></span></blockquote>

</div>

</div>



<br></blockquote></div><br></div>
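<div>The case-mapping caveats mentioned at the top of this message (German SS, Turkish I, normalization) can be sketched as follows. This is Python, used only to illustrate the behaviour; it is not a proposal for an Erlang API, and the helper name <code>upcase</code> is hypothetical. It assumes locale-independent full case mapping followed by NFC normalization, which is one possible design among several.</div>
<pre>
```python
import unicodedata


def upcase(text):
    # Locale-independent uppercase, then NFC so that precomposed and
    # decomposed spellings of the same text give the same result.
    return unicodedata.normalize("NFC", text.upper())


if __name__ == "__main__":
    print(upcase("Michał"))   # non-ASCII letters need Unicode-aware mapping
    print(upcase("straße"))   # sharp s expands to two letters
    # "e" + combining acute accent vs. the precomposed letter
    print(upcase("e\u0301") == upcase("\u00e9"))
    # Without locale information, "I" lowercases to a dotted "i",
    # which is not what Turkish text expects.
    print("I".lower())
```
</pre>
<div>Note that the uppercase result can be longer than the input, so a per-character table lookup is not enough for this approach.</div>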