Path: csiph.com!eternal-september.org!feeder.eternal-september.org!mx02.eternal-september.org!.POSTED!not-for-mail
From: BartC <bc@freeuk.com>
Newsgroups: comp.lang.python
Subject: Re: Pyhon 2.x or 3.x, which is faster?
Date: Wed, 9 Mar 2016 14:03:42 +0000
Organization: A noiseless patient Spider
Lines: 76
Message-ID: <nbpaaa$uic$1@dont-email.me>
References: <mailman.238.1457265255.20602.python-list@python.org> <nbjmv7$ad5$1@dont-email.me> <87d1r6iltx.fsf@elektro.pacujo.net> <nbjp1e$jhv$1@dont-email.me> <nbjrjm$m16$1@gioia.aioe.org> <nbjvas$h22$1@dont-email.me> <mailman.17.1457364684.10335.python-list@python.org> <nbkhei$dg6$1@dont-email.me> <mailman.43.1457377845.10335.python-list@python.org> <nbknir$avu$1@dont-email.me> <56de28a1$0$1604$c3e8da3$5496439d@news.astraweb.com> <nblae9$nl0$1@dont-email.me> <56de57b5$0$1590$c3e8da3$5496439d@news.astraweb.com> <nbml2q$l4n$1@dont-email.me> <56df6873$0$1588$c3e8da3$5496439d@news.astraweb.com> <nbnu1m$u1m$1@dont-email.me> <56df87f7$0$1620$c3e8da3$5496439d@news.astraweb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 9 Mar 2016 14:00:42 -0000 (UTC)
Injection-Info: mx02.eternal-september.org; posting-host="cf45b3961a050227b1103bebc3cbc15a"; logging-data="31308"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+G9Ejo+6IKeVSMtlgzHrDF"
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
In-Reply-To: <56df87f7$0$1620$c3e8da3$5496439d@news.astraweb.com>
Cancel-Lock: sha1:VVNHzv2BVbCttu8q5kf/7y5Iy48=
Xref: csiph.com comp.lang.python:104410

On 09/03/2016 02:18, Steven D'Aprano wrote:
> On Wed, 9 Mar 2016 12:28 pm, BartC wrote:
>
>> (Which wasn't as painful as I'd expected. However the next project I
>> have in mind is 20K lines rather than 0.7K. For that I'm looking at some
>> mechanical translation I think. And probably some library to wrap around
>> Python's i/o.)
>
> You almost certainly don't need another wrapper around Python's I/O, making
> it slower still. You need to understand what Python's I/O is doing.

Well, the original project will be using its file i/o library. So it'll 
use the same interface that will be reimplemented on top of Python i/o.

And input operations mainly consist of grabbing an entire file at once. 
Output is a little more mixed.
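
A minimal sketch of the "grab an entire file at once" case, assuming 
binary mode is acceptable; read_whole_file is a made-up name for 
illustration, not part of the original library's interface:

```python
import os
import tempfile


def read_whole_file(path):
    """Return the complete contents of a file as a bytes object.

    Hypothetical helper: binary mode means no decoding and no newline
    translation, so the bytes come back exactly as stored on disk.
    """
    with open(path, "rb") as f:
        return f.read()


# Demo: write a few awkward bytes to a temporary file and read them back.
sample = b"\x00\xff\r\n"
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(sample)

data = read_whole_file(path)
print(list(data))
```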

> If you open a file in binary mode, Python will give you a stream of bytes
> (ordinal values 0 through 255 inclusive). Python won't modify or change
> those bytes in any way. Whatever it reads from disk, it will give to you.
>
> If you open a file in text mode, Python 3 will give you a stream of Unicode
> code points (ordinal values 0 through 0x10FFFF). Earlier versions of Python
> 3 may behave somewhat strangely with so-called "astral characters": I
> recommend that you avoid anything below version 3.3. Unless you are
> including (e.g.) Chinese or ancient Phoenician in your text file, you
> probably won't care.

I've just tried a UTF-8 file and am getting some odd results. With a file 
containing [three euro symbols]:

€€€

(including a 3-byte UTF-8 marker at the start), and opened in text mode, 
Python 3 gives me this series of values (i.e. the ord() of each character):

239
187
191
226
8218
172
226
8218
172
226
8218
172

And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬. Although this latter 
might depend on my console's code page setting. Changing it to UTF-8 
however (CHCP 65001 in Windows) gives me this error when I run the 
program again:

----------
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual 
way.
Please contact the application's support team for more information.
----------

(That was with 3.1; 3.4 gives the same set of characters as above, and 
shows the string differently, but still wrong. PyPy 3.2.4, meanwhile, 
gives a different set of byte values, all 0..255, and a different string 
again, although it now contains some actual € characters.)
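
For what it's worth, the values listed above are consistent with the 
file's UTF-8 bytes being decoded as Windows-1252 rather than as UTF-8. 
A minimal sketch, assuming that is what happened (the temporary file 
and the variable names are made up for illustration):

```python
import os
import tempfile

# The file as described: a 3-byte UTF-8 marker followed by three euro signs.
raw = b"\xef\xbb\xbf" + ("\u20ac" * 3).encode("utf-8")

# Decoding those bytes as cp1252 gives one character per byte, which
# matches the list of ord() values above (0x82 becomes U+201A = 8218).
mis_decoded = [ord(c) for c in raw.decode("cp1252")]

# Naming the encoding explicitly avoids depending on the platform default;
# "utf-8-sig" also strips the marker at the start of the file.
path = os.path.join(tempfile.mkdtemp(), "euro.txt")
with open(path, "wb") as f:
    f.write(raw)
with open(path, encoding="utf-8-sig") as f:
    text = f.read()
code_points = [ord(c) for c in text]

# Print only integers so the demo does not depend on the console code page.
print(mis_decoded)
print(code_points)
```

With an explicit encoding the three characters come back as three code 
points (8364 each), whatever the console then makes of them.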

So I think I'll skip Unicode handling to start off with! (I've already 
had plenty of fun and games with it in the past.)

-- 
Bartc