[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Sat Jan 14 21:33:46 CET 2017

Or better yet, a bif :)

--

  Tristan Sloughter

  "I am not a crackpot" - Abe Simpson

  t@REDACTED

On Sat, Jan 14, 2017, at 08:06 AM, Benoit Chesneau wrote:

> 

> 

> On Sat, Jan 14, 2017 at 4:53 PM Oliver Korpilla
> <Oliver.Korpilla@REDACTED> wrote:
>> Could the Unicode support in Elixir serve as a starting point?

>> 

>> https://hexdocs.pm/elixir/1.3.3/String.html#content

>> 

>>  String.upcase/1 and String.downcase/1 seem to be Unicode-aware. And
>>  a lot of effort seems to have gone into scenarios like this:
>> 

>>  "For example, the codepoint “é” is two bytes:

>> 

>>  iex> byte_size("é")

>>  2"
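
The same bytes-versus-code-points distinction can be checked from the Erlang side. What follows is a minimal sketch, not something taken from the Elixir docs quoted above: the module name bytes_vs_codepoints and the helper sizes/1 are made up for illustration, while byte_size/1 and unicode:characters_to_list/2 are standard OTP calls.

```erlang
-module(bytes_vs_codepoints).
-export([sizes/1]).

%% Hypothetical helper: returns {ok, {Bytes, CodePoints}} for a binary that
%% decodes completely as UTF-8, or invalid_utf8 otherwise.
sizes(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        CodePoints when is_list(CodePoints) ->
            {ok, {byte_size(Bin), length(CodePoints)}};
        _ErrorOrIncomplete ->
            %% {error, _, _} or {incomplete, _, _}
            invalid_utf8
    end.
```

For the precomposed U+00E9, sizes(<<16#E9/utf8>>) gives {ok, {2, 1}}. For the decomposed form, an "e" followed by the combining acute accent U+0301, sizes(<<$e, 16#301/utf8>>) gives {ok, {3, 2}}, although both render as the same character.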

>> 

>>  Given that both Erlang and Elixir are implemented on top of BEAM,
>>  the wheel might not need reinventing? I know engineers and
>>  programmers love inventing stuff, and this discussion seems to point
>>  in that direction, but...
>> 

>>  Cheers,

>>  Oliver

>>   

> 

> If I remember correctly, the Unicode support in Elixir is written in
> Elixir, and the data comes from the Unicode/ICU projects. Data resources
> (code points and so on) are compiled to BEAM. (I do the same in my
> idna lib.)
> It might be simpler, though, to use or write a NIF over the well-supported
> ICU lib. I'm curious about the reasoning that led to the
> current implementation in Elixir.
> - benoit
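
One way to picture "data compiled as beam" is a module whose function clauses are the table rows, so a lookup becomes an ordinary function call. The sketch below is only an assumption about the general technique, not a description of how Elixir or the idna library actually do it. The module name case_table_sketch is hypothetical, and the two non-ASCII rows are typed by hand here, whereas a real table would be generated from the Unicode data files.

```erlang
-module(case_table_sketch).
-export([upcase/1]).

%% Upcases a string given as a list of Unicode code points.
upcase(Str) when is_list(Str) ->
    [upper(C) || C <- Str].

%% Each clause stands for one row (or range) of a case-mapping table.
upper(C) when C >= $a, C =< $z -> C - 32;   % ASCII a..z
upper(16#E9)  -> 16#C9;                     % e with acute
upper(16#142) -> 16#141;                    % l with stroke
upper(C) -> C.                              % everything else unchanged
```

With this, upcase("Micha" ++ [16#142]) returns "MICHA" ++ [16#141]. A one-code-point-in, one-code-point-out table like this ignores mappings that change length, such as U+00DF, so it is far from complete case conversion.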

> 

>  

>>  

>> 

>>  Sent: Friday, 13 January 2017 at 23:34

>>  From: "Michał Muskała" <michal@REDACTED>

>>  To: "Richard A. O'Keefe" <ok@REDACTED>, "Steve Davis"
>>  <steven.charles.davis@REDACTED>, g@REDACTED, "Jesper Louis Andersen"
>>  <jesper.louis.andersen@REDACTED>
>>  Cc: "Erlang Questions" <erlang-questions@REDACTED>

>>  Subject: Re: [erlang-questions] Erlang basic doubts about String,
>>  message passing and context switching overhead
>> 

>>  I fully agree there are no languages that deal with strings
>>  perfectly. That said there are those that are better at it and
>>  those that aren't so good. A language where I need to look for a
>>  library to upcase or downcase my own name fits into the second
>>  group in my book.
>> 

>>  Michał.

>>  On 13 Jan 2017, 13:20 +0100, Jesper Louis Andersen
>>  <jesper.louis.andersen@REDACTED>, wrote:
>> 

>>  Richard is indeed right, depending on what your definition of
>>  "String" is.
>>   If a "String" is "An array of characters from some alphabet", then
>>   you need to take into account that strings are sequences of Unicode
>>   code points in practice. This is also the most precise definition
>>   from a technical point of view.
>>   When I wrote my post, I was--probably incorrectly--assuming the
>>   older notion of a "String" where the representation is either ASCII
>>   or something like ISO-8859-15. In this case, a string coincides
>>   with a stream of bytes.
>>   Data needs parsing. A lot of data comes in as some kind of stringy
>>   representation: UTF-8, byte array (binary), and so on.
>>   And of course, that isn't the whole story, since there are examples
>>   of input which are not string-like in their forms.
>>    

>> 

>>  On Fri, Jan 13, 2017 at 2:34 AM Richard A. O'Keefe
>>  <ok@REDACTED> wrote:
>> 

>>  On 13/01/17 8:56 AM, Jesper Louis Andersen wrote:

>>  > Strings are really just streams of bytes.

>> 

>>  That was true a long time ago.  Maybe.

>>  But it isn't anywhere near accurate as a description

>>  of Unicode:

>>    - Unicode is made of 21-bit code points, not bytes.

>>    - Most possible code points are not defined.

>>    - Some of those that are defined are defined as

>>      "it is illegal to use this".

>>    - Unicode sequences have *structure*; it is simply

>>      not the case that every sequence of allowable

>>      Unicode code points is a legal Unicode string.

>>    - As a special case of that, if s is a non-empty

>>      valid Unicode string, it is not true that every

>>      substring of s is a valid Unicode string.

>> 

>>  In case you were thinking of UTF-8, not all byte

>>  sequences are valid UTF-8.
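
Two of the points above can be tried out in Erlang: that not every byte sequence is valid UTF-8, and that some code points may not be used at all. This is a minimal sketch under stated assumptions. The module name utf8_check and both function names are made up for illustration; unicode:characters_to_list/2 and unicode:characters_to_binary/1 are standard OTP calls. The byte-level cut shown in the usage note is only an analogue of the substring point, which was made about code point sequences.

```erlang
-module(utf8_check).
-export([is_valid_utf8/1, is_encodable/1]).

%% True when Bin decodes completely as UTF-8.
is_valid_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        L when is_list(L) -> true;
        _ -> false   % {error, _, _} or {incomplete, _, _}
    end.

%% True when a list of integers can be encoded as UTF-8, that is, when
%% every element is a code point the unicode module accepts.
is_encodable(CodePoints) when is_list(CodePoints) ->
    is_binary(unicode:characters_to_binary(CodePoints)).
```

is_valid_utf8(<<16#C3, 16#A9>>) is true, but is_valid_utf8(<<16#C3>>), the first byte of that same encoding on its own, is false, and so is is_valid_utf8(<<16#FF>>). is_encodable([16#D800]) should be false, because the surrogate range is reserved and rejected by the unicode module.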

>> 

>>  Byte streams are as important as you say, but it's

>>  really hard to see the software for a radar or a

>>  radio telescope as processing strings...

>>   _______________________________________________

>>  erlang-questions mailing list

>> erlang-questions@REDACTED

>> http://erlang.org/mailman/listinfo/erlang-questions