[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Rustom Mody rustompmody@REDACTED
Sat Nov 3 11:18:27 CET 2012


On Tue, Oct 30, 2012 at 12:24 PM, Stephen Hansen
<me+list/erlang@REDACTED> wrote:

>
>
> On Mon, Oct 29, 2012 at 9:11 PM, Richard O'Keefe <ok@REDACTED> wrote:
>
>>
>> On 22/10/2012, at 7:44 PM, Rustom Mody wrote:
>> > 1.
>> > Python made a choice to embrace Unicode more thoroughly in going from
>> > Python 2 to Python 3.  This seems to have caused some grief in that 'ASCII'
>> > code that used to work in Python 2 now often does not in Python 3. Maybe
>> > this has nothing to do with Richard's EEP, because that is about the string
>> > data structure while this is about variable names. Still, just mentioning.
>>
>> Can you be more specific?  Each ASCII character has the same numeric value
>> in Unicode, and an ASCII string represented as UTF-8 is exactly the same
>> sequence of bytes.  I can't help wondering if "ASCII" here really means
>> some 8-bit character set rather than ASCII.
>>
>
> I'm an erlang-lurker, but long time Python user.
>
> The issues with Python 3 and "unicode vs ascii" have absolutely nothing to
> do with encoding and really, no impact at all on this discussion. Python
> 2.x had a "string" type and a "unicode" type, but the former was used both
> as a binary data type, and as a text data type. In Python 3, they have
> decided to make a firm distinction between 'binary data' and 'textual
> data', and this change in the fundamental nature of types (and what 'str'
> means) has led to some difficulties.
>
>

I was not referring to the semantic incompatibilities introduced in going
from Python 2 to Python 3.
I was referring to the claims that Python 3 is slower than Python 2,
as for example here:
http://mail.python.org/pipermail/python-list/2012-August/629317.html (and
the whole thread).

Can these problems be addressed? Of course.
Are they directly related to this EEP? Probably not...
I was just mentioning them so that Erlang can learn from Python's mistakes.

Basically Python has chosen a 'flexible string representation'
http://www.python.org/dev/peps/pep-0393/
which does the magic of using only 1 byte per character for ASCII (and
Latin-1) strings, 2 for BMP strings and 4 for the rest (Unicode 2.0 onwards).
In the process, however (of detecting the optimal char width), some inner
loops seem to have become less efficient (my guess; I don't know for sure).
So Python has traded time for space.
A command-line option to choose the string engine at start time could solve
this problem.
[Though in a world where one Erlang node talking to another is a very
normal use case, this could cause its own challenges.]
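
To make the time-for-space trade concrete, here is a minimal Python sketch of
the width selection. It is only an illustration of the idea in PEP 393, not
CPython's actual code; the function name bytes_per_char is made up for this
example.

```python
def bytes_per_char(text):
    """Return the per-character width (1, 2 or 4 bytes) that a
    PEP 393-style scheme would pick for this string."""
    # The whole string has to be scanned for its highest code point.
    # This scan is the kind of extra work guessed at above.
    highest = max(map(ord, text), default=0)
    if highest < 0x100:      # ASCII / Latin-1
        return 1
    if highest < 0x10000:    # Basic Multilingual Plane
        return 2
    return 4                 # everything above the BMP


for sample in ("erlang", "caf\u00e9", "\u20ac", "\U0001F600"):
    print(repr(sample), bytes_per_char(sample))
```

A single character outside the BMP forces 4 bytes for every character in
that string, which is why the width has to be decided per string.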

Also, 32 bits for 'wide' Unicode is wasteful, given that the number of
Unicode code points is 1114112.
1114112 = 17*2^16 < 32*2^16 = 2^21 < 2^24 < 2^32
IOW an acceptable width could be 3 bytes, and at 21 bits per code point one
could even pack 3 chars into 64 bits.
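
A rough Python sketch of that packing, just to check the arithmetic. The
names MASK21, pack3 and unpack3 are invented for this example, and the layout
(first code point in the low bits) is an arbitrary assumption.

```python
MAX_CODE_POINT = 0x10FFFF    # 17 * 2**16 - 1
MASK21 = (1 << 21) - 1       # 21 bits are enough for any code point


def pack3(a, b, c):
    """Pack three code points into one integer that fits in 64 bits."""
    for cp in (a, b, c):
        if not 0 <= cp <= MAX_CODE_POINT:
            raise ValueError("not a Unicode code point: %r" % (cp,))
    # Layout (an assumption of this sketch): a in bits 0-20,
    # b in bits 21-41, c in bits 42-62; bit 63 stays unused.
    return a | (b << 21) | (c << 42)


def unpack3(word):
    """Recover the three code points from a packed word."""
    return (word & MASK21, (word >> 21) & MASK21, (word >> 42) & MASK21)


word = pack3(ord("a"), 0x20AC, 0x1F600)
print(hex(word), unpack3(word))
```

Three code points use 63 of the 64 bits, so one bit is left over.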