<div dir="ltr"><div><div><div>I'd suggest looking a bit at what the Go people are doing here as well. They have this problem at a large scale at Google, so I expect them to handle some of the nastier corner cases. Some observations:<br><br></div>* They keep part of the Unicode handling outside of the standard library. This allows it to develop independently of the main code base, which in some cases is nice. The things that seem to be in the separate package are collation, normalization, and canonicalization. I have a hunch this happens because these things tend to update a bit like time zones and SSL top-level certificates: haphazardly. By keeping it outside, you avoid the problem of having to wait for a new major release, or even having to push a new minor release because some random Pacific island decided at their monthly meeting to switch their collation rules :)<br><br></div>* Keep it rather simple for starters. The Elixir set is probably what "most people need supported".<br><br></div>* Currently, I'm mostly interested in normalization and canonicalization. 
I have some data where this would be highly useful to have access to directly from Erlang.<br></div><br><div class="gmail_quote"><div dir="ltr">On Sun, Jan 15, 2017 at 11:17 AM Dan Gudmundsson <<a href="mailto:dgud@erlang.org">dgud@erlang.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><div dir="ltr" class="gmail_msg">On Sun, Jan 15, 2017 at 10:48 AM Loïc Hoguin <<a href="mailto:essen@ninenines.eu" class="gmail_msg" target="_blank">essen@ninenines.eu</a>> wrote:<br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Great work!<br class="gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">You have not seen it yet :-)</div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br class="gmail_msg">
Does this include (some) support for locales? As far as I recall they<br class="gmail_msg">
are necessary to do uppercasing and lowercasing properly.<br class="gmail_msg">
<br class="gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">No, not currently. I have taken the Elixir approach of basic Unicode support without locale handling; that is another can of worms, which I guess can be opened later if there is a need.</div><div class="gmail_msg">But we don't have locale support anywhere else, and I don't want to start by adding that first,</div><div class="gmail_msg">because then we will never get any Unicode support.</div><div class="gmail_msg">From my understanding, for uppercase and lowercase there are very few cases which need<br class="gmail_msg"></div><div class="gmail_msg">the locale to transform them correctly.</div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">/Dan</div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
If this includes support for locales, or would in the future, may I<br class="gmail_msg">
suggest 'text' for the module name? Otherwise something else. :-)<br class="gmail_msg">
<br class="gmail_msg">
On 01/15/2017 10:01 AM, Dan Gudmundsson wrote:<br class="gmail_msg">
> We have started to work on a new string module; we will make a PR when<br class="gmail_msg">
> we have decided on the direction of the API.<br class="gmail_msg">
><br class="gmail_msg">
> Basic stuff like uppercase, lowercase, to_nfc, to_nfd and gc<br class="gmail_msg">
> (grapheme_clusters) are implemented for unicode:chardata() input, I have<br class="gmail_msg">
> used elixir's module as inspiration.<br class="gmail_msg">
><br class="gmail_msg">
> That is the easy part, writing a nice api on top of that is the hard<br class="gmail_msg">
> part and naming the module<br class="gmail_msg">
> something different than string.<br class="gmail_msg">
><br class="gmail_msg">
> /Dan<br class="gmail_msg">
><br class="gmail_msg">
> On Sun, Jan 15, 2017 at 3:25 AM Michael Truog <<a href="mailto:mjtruog@gmail.com" class="gmail_msg" target="_blank">mjtruog@gmail.com</a>> wrote:<br class="gmail_msg">
><br class="gmail_msg">
> This thread started to talk about the need for unicode<br class="gmail_msg">
> functionality in Erlang and how it exists currently in Elixir but<br class="gmail_msg">
> not in Erlang. I created a repository with the Elixir functions<br class="gmail_msg">
> created as Erlang functions in an Erlang module as an example of<br class="gmail_msg">
> what I want at <a href="https://github.com/okeuday/unicode_data/" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/okeuday/unicode_data/</a><br class="gmail_msg">
><br class="gmail_msg">
> The generated module (unicode_data) only includes functions from the<br class="gmail_msg">
> first Erlang module contained in unicode.ex (Elixir.String.Unicode)<br class="gmail_msg">
> though it does bring up some important topics:<br class="gmail_msg">
> 1) Add the unicode version to the Erlang module version. The<br class="gmail_msg">
> generated unicode_data module has a timestamp suffix, so we know<br class="gmail_msg">
> both the unicode version and the timestamp when the unicode_data<br class="gmail_msg">
> module was generated.<br class="gmail_msg">
> 2) Use only lists, not binaries, to make sure all temporary data<br class="gmail_msg">
> stays on the heap of the Erlang process. That should be best for<br class="gmail_msg">
> performance, though I haven't proved that with any performance testing.<br class="gmail_msg">
><br class="gmail_msg">
> I haven't added tests, though I have compared the unicode_data<br class="gmail_msg">
> Erlang module, to the Elixir.String.Unicode module and the data<br class="gmail_msg">
> looks correct. Mainly thought this would help the discussion.<br class="gmail_msg">
><br class="gmail_msg">
> Best Regards,<br class="gmail_msg">
> Michael<br class="gmail_msg">
><br class="gmail_msg">
><br class="gmail_msg">
> On 01/10/2017 10:58 AM, Bhag Chandra wrote:<br class="gmail_msg">
>> Hello,<br class="gmail_msg">
>><br class="gmail_msg">
>> I have been coding in Erlang for 2 years. It is a wonderful language,<br class="gmail_msg">
>> but the community is not very big, so I can't discuss my questions with<br class="gmail_msg">
>> programmers around me (Java, Python guys). I found out about this<br class="gmail_msg">
>> list today.<br class="gmail_msg">
>><br class="gmail_msg">
>> I have some fundamental doubts about Erlang. It would be great<br class="gmail_msg">
>> if someone could help me clarify them.<br class="gmail_msg">
>><br class="gmail_msg">
>><br class="gmail_msg">
>> 1) "Strings in Erlang are internally treated as a list of integers<br class="gmail_msg">
>> of each character's ASCII values, this representation of string<br class="gmail_msg">
>> makes operations faster. For example, string concatenation is<br class="gmail_msg">
>> constant time operation in Erlang." Can someone explain why?<br class="gmail_msg">
>><br class="gmail_msg">
>> 2) "It makes sense to use Erlang only where system's availability<br class="gmail_msg">
>> is very high". Is it not a very general requirement of most of<br class="gmail_msg">
>> the systems? Whatsapp to Google to FB to Amazon to Paypal to<br class="gmail_msg">
>> Barclays etc they all are high availability systems, so we can use<br class="gmail_msg">
>> Erlang in all of them?<br class="gmail_msg">
>><br class="gmail_msg">
>> 3) "Every message which is sent to a process, goes to the mailbox<br class="gmail_msg">
>> of that process. When process is free, it consumes that message<br class="gmail_msg">
>> from mailbox". So how exactly does process ask from the mailbox<br class="gmail_msg">
>> for that message? Is there a mechanism in a process' memory which<br class="gmail_msg">
>> keeps polling its mailbox. I basically want to understand how<br class="gmail_msg">
>> message is sent from mailbox to my code in process.<br class="gmail_msg">
>><br class="gmail_msg">
>> 4) We say that a message is passed from process A to process B by<br class="gmail_msg">
>> simply using a bang (!) character, but what happens behind the<br class="gmail_msg">
>> scenes to pass this message? Do both processes establish a tcp<br class="gmail_msg">
>> connection first and then pass message or what?<br class="gmail_msg">
>><br class="gmail_msg">
>> 5) At 30:25 in this video ( <a href="https://youtu.be/YaUPdgtUYko?t=1825" rel="noreferrer" class="gmail_msg" target="_blank">https://youtu.be/YaUPdgtUYko?t=1825</a> )<br class="gmail_msg">
>> Mr. Armstrong is talking about the difference between the context<br class="gmail_msg">
>> switching overhead between OS threads and Erlang processes. He<br class="gmail_msg">
>> says, thread context switching is of order 700 words but Erlang<br class="gmail_msg">
>> process context switching is ... ?<br class="gmail_msg">
>> I can't understand what he said; could someone tell me?<br class="gmail_msg">
>><br class="gmail_msg">
>><br class="gmail_msg">
>> P.S. Please excuse any grammatical errors, English is not my<br class="gmail_msg">
>> first language.<br class="gmail_msg">
>><br class="gmail_msg">
>><br class="gmail_msg">
>> _______________________________________________<br class="gmail_msg">
>> erlang-questions mailing list<br class="gmail_msg">
>> <a href="mailto:erlang-questions@erlang.org" class="gmail_msg" target="_blank">erlang-questions@erlang.org</a><br class="gmail_msg">
>> <a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" class="gmail_msg" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br class="gmail_msg">
><br class="gmail_msg">
><br class="gmail_msg">
<br class="gmail_msg">
--<br class="gmail_msg">
Loïc Hoguin<br class="gmail_msg">
<a href="https://ninenines.eu" rel="noreferrer" class="gmail_msg" target="_blank">https://ninenines.eu</a><br class="gmail_msg">
</blockquote></div></div>
</blockquote></div>