[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Sat Jan 14 17:14:58 CET 2017

A NIF would make the Erlang distribution far less portable than it is now. At
the moment, Erlang compiled with a statically linked OpenSSL can be moved just
by copying it between most modern Linux distros of the same bitness (32- or
64-bit). ICU is a C++ library, and anything linked against libstdc++ is not
portable: new versions of GCC break libstdc++ compatibility quite often.

2017-01-14 19:06 GMT+03:00 Benoit Chesneau <bchesneau@REDACTED>:

>
>
> On Sat, Jan 14, 2017 at 4:53 PM Oliver Korpilla <Oliver.Korpilla@REDACTED>
> wrote:
>
>> Could the Unicode support in Elixir serve as a starting point?
>>
>> https://hexdocs.pm/elixir/1.3.3/String.html#content
>>
>> String.upcase/1 and String.downcase/1 seem to be Unicode-aware, and a lot
>> of effort seems to have gone into scenarios like this:
>>
>> "For example, the codepoint “é” is two bytes:
>>
>> iex> byte_size("é")
>> 2"
>>
>> Given that both Erlang and Elixir are implemented on top of BEAM, the
>> wheel might not need reinventing? I know engineers and programmers love
>> inventing stuff, and this discussion seems to point in that direction,
>> but...
>>
>> Cheers,
>> Oliver
>>
>>
>
> If I remember correctly, Elixir's Unicode support is written in Elixir,
> and the data come from the Unicode/ICU projects. The data resources (code
> points and so on) are compiled into BEAM modules. (I do the same in my idna lib.)
>
> The work might be simpler if we used or wrote a NIF over the well-supported
> ICU library, though. I'm curious about the reasoning that led to the current
> implementation in Elixir.
>
> - benoit
>
>
>
>>
>>
>> Sent: Friday, 13 January 2017 at 23:34
>> From: "Michał Muskała" <michal@REDACTED>
>> To: "Richard A. O'Keefe" <ok@REDACTED>, "Steve Davis" <steven.charles.davis@REDACTED>,
>> g@REDACTED, "Jesper Louis Andersen" <jesper.louis.andersen@REDACTED>
>> Cc: "Erlang Questions" <erlang-questions@REDACTED>
>> Subject: Re: [erlang-questions] Erlang basic doubts about String, message
>> passing and context switching overhead
>>
>> I fully agree that no language deals with strings perfectly.
>> That said, some are better at it and some aren't so good. A language
>> where I need to look for a library to upcase or downcase my own name
>> fits into the second group in my book.
>>
>> Michał.
>> On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen <
>> jesper.louis.andersen@REDACTED>, wrote:
>>
>> Richard is indeed right, depending on what your definition of "String" is.
>>
>> If a "String" is "an array of characters from some alphabet", then you
>> need to take into account that, in practice, Strings are sequences of
>> Unicode code points. This is also the most precise definition from a
>> technical point of view.
>>
>> When I wrote my post, I was (probably incorrectly) assuming the older
>> notion of a "String" where the representation is either ASCII or something
>> like ISO-8859-15. In that case, a string coincides with a stream of bytes.
>>
>> Data needs parsing. A lot of data comes in as some kind of stringy
>> representation: UTF-8, byte array (binary), and so on.
>>
>> And of course, that isn't the whole story, since some kinds of input
>> are not string-like at all.
>>
>>
>> On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe <ok@REDACTED>
>> wrote:
>>
>> On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:
>> > Strings are really just streams of bytes.
>>
>> That was true a long time ago.  Maybe.
>> But it isn't anywhere near accurate as a description
>> of Unicode:
>>   - Unicode is made of 21-bit code points, not bytes.
>>   - Most possible code points are not defined.
>>   - Some of those that are defined are defined as
>>     "it is illegal to use this".
>>   - Unicode sequences have *structure*; it is simply
>>     not the case that every sequence of allowable
>>     Unicode code points is a legal Unicode string.
>>   - As a special case of that, if s is a non-empty
>>     valid Unicode string, it is not true that every
>>     substring of s is a valid Unicode string.
>>
>> In case you were thinking of UTF-8, not all byte
>> sequences are valid UTF-8.
>>
>> Byte streams are as important as you say, but it's
>> really hard to see the software for a radar or a
>> radio telescope as processing strings...
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
>
>
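A small sketch can make the distinctions in this thread concrete. It covers one code point versus its UTF-8 bytes (the `byte_size("é")` example quoted from the Elixir docs), byte sequences that are not valid UTF-8 (including a valid string cut mid-character), the 21-bit code point range, and Unicode-aware upcasing of a name like "Michał". It is written in Python 3 rather than Erlang purely as a neutral illustration, and it assumes a strict UTF-8 codec. The helpers `utf8_len` and `is_valid_utf8` are hypothetical names for this sketch and are not part of any Erlang or Elixir API.

```python
# Illustration in Python 3, not Erlang: code points versus UTF-8 bytes.
# utf8_len and is_valid_utf8 are hypothetical helper names for this sketch.


def utf8_len(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


precomposed = "\u00e9"   # "é" as a single code point (U+00E9)
decomposed = "e\u0301"   # "é" as "e" followed by COMBINING ACUTE ACCENT (U+0301)
encoded = precomposed.encode("utf-8")

print(len(precomposed), utf8_len(precomposed))  # 1 code point, 2 bytes
print(len(decomposed), utf8_len(decomposed))    # 2 code points, 3 bytes
print(is_valid_utf8(encoded))                   # the full encoding is valid
print(is_valid_utf8(encoded[:1]))               # a prefix cut mid-character is not
print(is_valid_utf8(b"\xff"))                   # the byte 0xFF never occurs in UTF-8
print((0x10FFFF).bit_length())                  # the largest code point needs 21 bits

# Surrogate code points (U+D800..U+DFFF) are reserved for UTF-16,
# so a strict UTF-8 encoder refuses them.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError:
    print("U+D800 cannot be encoded as UTF-8")

# str.upper() applies Unicode case mapping, so "ł" (U+0142) becomes "Ł" (U+0141).
print("Micha\u0142".upper() == "MICHA\u0141")
```

In Erlang itself, `byte_size(<<"é"/utf8>>)` likewise gives 2. `unicode:characters_to_binary/1` reports invalid or truncated UTF-8 input as `{error, ...}` or `{incomplete, ...}` tuples rather than raising.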