[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

John Doe donpedrothird@REDACTED
Sat Jan 14 17:22:56 CET 2017


Also, as far as I remember, ICU works with UTF-16, so any call into ICU requires
converting the binary, which is usually UTF-8, to UTF-16 and then converting
it back to a UTF-8 binary.
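
A rough sketch of that round trip. It is written in Python rather than Erlang and does not call ICU at all. `via_utf16` is a hypothetical stand-in for whatever a NIF wrapper would do, and little-endian UTF-16 is an assumption. The point is only that every call pays for two full transcodings of the input:

```python
def via_utf16(utf8_bin: bytes):
    """Hypothetical wrapper: UTF-8 in, UTF-16 for the library, UTF-8 out."""
    # Step 1: decode the UTF-8 binary and re-encode it as UTF-16
    # (little-endian, no BOM; assumed to be what the library consumes).
    utf16 = utf8_bin.decode("utf-8").encode("utf-16-le")
    # ... the UTF-16 buffer would be handed to the library here ...
    # Step 2: convert the (here unchanged) UTF-16 result back to UTF-8.
    back = utf16.decode("utf-16-le").encode("utf-8")
    return utf16, back


data = "Michał".encode("utf-8")
utf16, back = via_utf16(data)
print(len(data), len(utf16), back == data)  # → 7 12 True
```

For mostly-ASCII text the UTF-16 copy is nearly twice the size of the UTF-8 input, on top of the two conversion passes.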

2017-01-14 19:06 GMT+03:00 Benoit Chesneau <bchesneau@REDACTED>:

>
>
> On Sat, Jan 14, 2017 at 4:53 PM Oliver Korpilla <Oliver.Korpilla@REDACTED>
> wrote:
>
>> Could the Unicode support in Elixir serve as a starting point?
>>
>> https://hexdocs.pm/elixir/1.3.3/String.html#content
>>
>> String.upcase/1 and String.downcase/1 seem to be Unicode-aware. And a lot
>> of effort seems to have gone into scenarios like this:
>>
>> "For example, the codepoint “é” is two bytes:
>>
>> iex> byte_size("é")
>> 2"
>>
>> Given that both Erlang and Elixir are implemented on top of BEAM, the
>> wheel might not need reinventing? I know engineers and programmers love
>> inventing stuff, and this discussion seems to point in that direction,
>> but...
>>
>> Cheers,
>> Oliver
>>
>>
>
> If I remember correctly, the Unicode support in Elixir is written in Elixir,
> and the data come from the Unicode/ICU projects. The data resources (code
> points and so on) are compiled into BEAM modules. (I do the same in my idna
> lib.)
>
> The work may be simpler if we use or write a NIF over the well-supported ICU
> library, though. I'm curious about the reasoning that led to the current
> implementation in Elixir.
>
> - benoit
>
>
>
>>
>>
>> Sent: Friday, 13 January 2017 at 23:34
>> From: "Michał Muskała" <michal@REDACTED>
>> To: "Richard A. O'Keefe" <ok@REDACTED>, "Steve Davis"
>> <steven.charles.davis@REDACTED>, g@REDACTED, "Jesper Louis Andersen"
>> <jesper.louis.andersen@REDACTED>
>> Cc: "Erlang Questions" <erlang-questions@REDACTED>
>> Subject: Re: [erlang-questions] Erlang basic doubts about String, message
>> passing and context switching overhead
>>
>> I fully agree there are no languages that deal with strings perfectly.
>> That said, there are those that are better at it and those that aren't so
>> good. A language where I need to look for a library to upcase or downcase
>> my own name falls into the second group in my book.
>>
>> Michał.
>> On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen
>> <jesper.louis.andersen@REDACTED> wrote:
>>
>> Richard is indeed right, depending on what your definition of "String" is.
>>
>> If a "String" is "An array of characters from some alphabet", then you
>> need to take into account that Strings are, in practice, sequences of
>> Unicode code points. This is also the most precise definition from a
>> technical point of view.
>>
>> When I wrote my post, I was, probably incorrectly, assuming the older
>> notion of a "String" where the representation is either ASCII or something
>> like ISO-8859-15. In that case, a string coincides with a stream of bytes.
>>
>> Data needs parsing. A lot of data comes in as some kind of stringy
>> representation: UTF-8, byte array (binary), and so on.
>>
>> And of course, that isn't the whole story, since there are examples of
>> input which are not string-like in their forms.
>>
>>
>> On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe <ok@REDACTED> wrote:
>>
>> On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
>> > Strings are really just streams of bytes.
>>
>> That was true a long time ago.  Maybe.
>> But it isn't anywhere near accurate as a description
>> of Unicode:
>>   - Unicode is made of 21-bit code points, not bytes.
>>   - Most possible code points are not defined.
>>   - Some of those that are defined are defined as
>>     "it is illegal to use this".
>>   - Unicode sequences have *structure*; it is simply
>>     not the case that every sequence of allowable
>>     Unicode code points is a legal Unicode string.
>>   - As a special case of that, if s is a non-empty
>>     valid Unicode string, it is not true that every
>>     substring of s is a valid Unicode string.
>>
>> In case you were thinking of UTF-8, not all byte
>> sequences are valid UTF-8.
>>
>> Byte streams are as important as you say, but it's
>> really hard to see the software for a radar or a
>> radio telescope as processing strings...
>>  _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
>
>
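Several points in the quoted messages can be checked quickly. The sketch below uses Python rather than Elixir or Erlang, and `is_valid_utf8` is a hypothetical helper, not part of any library. It shows that "é" is one code point but two UTF-8 bytes, and that cutting a valid UTF-8 byte string mid-character, or using a byte such as 0xFF, yields invalid UTF-8. It also shows that the largest code point needs 21 bits, and that case mapping of non-ASCII letters relies on Unicode tables (Python's `str.upper` here, analogous to but not the same code as Elixir's `String.upcase/1`):

```python
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


s = "\u00e9"                          # "é" as a single code point, U+00E9
encoded = s.encode("utf-8")
print(len(s), len(encoded))           # → 1 2  (code points vs bytes)
print(is_valid_utf8(encoded))         # → True
print(is_valid_utf8(encoded[:1]))     # → False (cut in the middle of "é")
print(is_valid_utf8(b"\xff"))         # → False (0xFF never occurs in UTF-8)
print(sys.maxunicode.bit_length())    # → 21 (U+10FFFF needs 21 bits)
print("michał".upper())               # → MICHAŁ
```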

