<br><br><div class="gmail_quote">On Tue, Oct 30, 2012 at 12:24 PM, Stephen Hansen <span dir="ltr"><<a href="mailto:me+list/erlang@ixokai.io" target="_blank">me+list/erlang@ixokai.io</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br><br><div class="gmail_quote"><div>On Mon, Oct 29, 2012 at 9:11 PM, Richard O'Keefe <span dir="ltr"><<a href="mailto:ok@cs.otago.ac.nz" target="_blank">ok@cs.otago.ac.nz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On 22/10/2012, at 7:44 PM, Rustom Mody wrote:<br>
> 1.<br>
> Python made a choice to embrace Unicode more thoroughly in going from Python 2 to Python 3. This seems to have caused some grief, in that 'ASCII' code that used to work in Python 2 now often does not in Python 3. Maybe this has nothing to do with Richard's EEP, because that is about the string data structure, whereas this is about variable names. Still, just mentioning.<br>
<br>
Can you be more specific? Each ASCII character has the same numeric value<br>
in Unicode, and an ASCII string represented as UTF-8 is exactly the same<br>
sequence of bytes. I can't help wondering if "ASCII" here really means<br>
some 8-bit character set rather than ASCII.<br></blockquote><div><br></div></div><div>I'm an Erlang lurker, but a long-time Python user.</div><div><br></div><div>The issues with Python 3 and "unicode vs ascii" have absolutely nothing to do with encoding and really have no impact at all on this discussion. Python 2.x had a "string" type and a "unicode" type, but the former was used both as a binary data type and as a text data type. In Python 3, they have decided to make a firm distinction between 'binary data' and 'textual data', and this change in the fundamental nature of types (and what 'str' means) has led to some difficulties.</div>
<div><br></div></div></blockquote><br>
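The two technical points in the quoted messages can be checked with a short Python 3 sketch. This is an illustration added here, not code from the thread; `utf8_matches_ascii` and `mixing_bytes_and_str_fails` are hypothetical helper names. The first confirms that an ASCII-only string has the same bytes under ASCII and UTF-8, each byte equal to the character's code point. The second shows the bytes/str split: in Python 3 the two types cannot be concatenated.<br>

```python
def utf8_matches_ascii(text: str) -> bool:
    """Return True if ASCII-only `text` encodes to identical bytes under ASCII and UTF-8."""
    # Raises UnicodeEncodeError for non-ASCII text: the claim only concerns pure ASCII.
    ascii_bytes = text.encode("ascii")
    utf8_bytes = text.encode("utf-8")
    # Each byte value equals the character's Unicode code point (its ord()).
    values_match = list(ascii_bytes) == [ord(ch) for ch in text]
    return values_match and ascii_bytes == utf8_bytes


def mixing_bytes_and_str_fails() -> bool:
    """Return True if Python 3 refuses to concatenate binary data (bytes) with text (str)."""
    try:
        # In Python 2, b"abc" and "def" were both of type str, so this succeeded.
        b"abc" + "def"
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(utf8_matches_ascii("hello, erlang"))  # True
    print(mixing_bytes_and_str_fails())         # True
    print(b"abc" == "abc")                      # False: bytes never equal str
```

Calling `utf8_matches_ascii` on non-ASCII text raises `UnicodeEncodeError` rather than returning False, which keeps the check limited to the ASCII case Richard describes.<br>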
</div><br>I was not referring to the semantic incompatibilities introduced in going from Python 2 to 3.<br>I was referring to the claims that Python 3 is slower than Python 2,<br>as for example here: <a href="http://mail.python.org/pipermail/python-list/2012-August/629317.html" target="_blank">http://mail.python.org/pipermail/python-list/2012-August/629317.html</a> (and the whole thread).<br>
<br>Can these problems be addressed? Of course.<br>Are they directly related to this EEP? Probably not...<br>I was just mentioning them so that Erlang can learn from Python's mistakes.<br><br>Basically, Python has chosen a "flexible string representation"<br>
<a href="http://www.python.org/dev/peps/pep-0393/" target="_blank">http://www.python.org/dev/peps/pep-0393/</a><br>
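which does the magic of picking the width per string: only 1 byte per character for ASCII (strictly, for any string whose code points are all below U+0100), 2 for strings within the BMP and 4 for the rest (the supplementary planes, Unicode 2.0 onwards).<br>However, in the process of detecting the optimal character width, some inner loops seem to have become less efficient (my guess; I don't know for sure).<br>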
So Python has traded time for space.<br>A command-line option to choose the string engine at start time could solve this problem.<br>[Though in a world where one Erlang node talking to another is a very normal use case, this could cause its own challenges.]<br>
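The width-selection rule of PEP 393 can be sketched as follows, assuming CPython 3.3 or later. `pep393_char_width` and `cpython_bytes_per_char` are hypothetical helpers written for illustration, not CPython APIs. The second infers the storage unit from `sys.getsizeof`, which depends on CPython internals and is not meaningful on other implementations. Picking the width needs a scan over every character of the string, which is the detection step mentioned above.<br>

```python
import sys


def pep393_char_width(text: str) -> int:
    """Bytes per character that the PEP 393 rule picks for `text` (hypothetical helper)."""
    # One linear pass to find the widest code point; the empty string counts as 1 byte.
    max_cp = max(map(ord, text), default=0)
    if max_cp < 0x100:    # ASCII and the rest of Latin-1
        return 1
    if max_cp < 0x10000:  # remainder of the Basic Multilingual Plane
        return 2
    return 4              # supplementary planes, up to U+10FFFF


def cpython_bytes_per_char(ch: str) -> int:
    """Storage unit CPython actually uses for strings made of `ch` (CPython-specific).

    Both operands are freshly built by repetition, so the size difference between
    a 3-character and a 2-character string is exactly one storage unit.
    """
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)


if __name__ == "__main__":
    for sample in ("erlang", "caf\u00e9", "\u0100", "a\U0001F600"):
        print(repr(sample), pep393_char_width(sample))
```

One consequence of the rule: a single wide character anywhere in a long string forces every character of that string into the wider unit.<br>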
<br>Also, 32 bits for 'wide' Unicode is wasteful, given that the number of Unicode code points is 1114112 (U+0000 to U+10FFFF).<br>1114112 = 17*2^16 < 32*2^16 = 2^21 < 2^24 < 2^32<br>IOW an acceptable width could be 3 bytes, and at 21 bits per character one could even pack 3 chars into 64 bits (3*21 = 63).<br>
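The arithmetic above can be checked, and both storage ideas sketched, in a few lines of Python. `pack3`, `unpack3` and `encode_3byte` are hypothetical names chosen for illustration; this is not how Python or Erlang actually store strings. Every code point fits in 21 bits because 0x10FFFF < 2^21, so three of them occupy the low 63 bits of a 64-bit word.<br>

```python
MAX_CODE_POINT = 0x10FFFF           # highest Unicode code point (1114112 code points in total)
BITS_PER_CHAR = 21                  # 2**21 = 2097152 >= 1114112
MASK = (1 << BITS_PER_CHAR) - 1     # low 21 bits


def pack3(a: str, b: str, c: str) -> int:
    """Pack three single characters into one integer below 2**63 (hypothetical scheme)."""
    word = 0
    for ch in (a, b, c):
        # ord() raises TypeError unless ch is exactly one character.
        word = (word << BITS_PER_CHAR) | ord(ch)
    return word


def unpack3(word: int) -> str:
    """Inverse of pack3: the first character sits in bits 42..62, the last in bits 0..20."""
    return "".join(chr((word >> shift) & MASK) for shift in (42, 21, 0))


def encode_3byte(text: str) -> bytes:
    """Fixed-width encoding with 3 big-endian bytes per code point (hypothetical)."""
    return b"".join(ord(ch).to_bytes(3, "big") for ch in text)


if __name__ == "__main__":
    # The inequality chain from the message.
    assert 17 * 2**16 == 1114112 == MAX_CODE_POINT + 1
    assert 32 * 2**16 == 2**21 < 2**24 < 2**32
    word = pack3("a", "\u00e9", "\U0001F600")
    print(word < 2**63, unpack3(word) == "a\u00e9\U0001F600")  # True True
```

The 3-byte form keeps random access by index (character i starts at byte 3*i), while the 64-bit packing wastes only one bit per word at the cost of shifts and masks on every access.<br>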