From: BartC
Newsgroups: comp.lang.python
Subject: Re: Pyhon 2.x or 3.x, which is faster?
Date: Wed, 9 Mar 2016 14:03:42 +0000

On 09/03/2016 02:18, Steven D'Aprano wrote:
> On Wed, 9 Mar 2016 12:28 pm, BartC wrote:
>
>> (Which wasn't as painful as I'd expected. However the next project I
>> have in mind is 20K lines rather than 0.7K. For that I'm looking at some
>> mechanical translation I think. And probably some library to wrap around
>> Python's i/o.)
>
> You almost certainly don't need another wrapper around Python's I/O, making
> it slower still. You need to understand what Python's I/O is doing.

Well, the original project will be using its file i/o library. So it'll use the same interface that will be reimplemented on top of Python i/o. And input operations mainly consist of grabbing an entire file at once. Output is a little more mixed.
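
A minimal sketch of what grabbing an entire file at once might look like on the Python side, assuming binary mode so that no decoding takes place. The helper names `read_whole_file` and `write_whole_file` are hypothetical and are not taken from the original library:

```python
import os
import tempfile


def read_whole_file(path):
    """Return the entire contents of `path` as a bytes object (no decoding)."""
    with open(path, "rb") as f:
        return f.read()


def write_whole_file(path, data):
    """Write the bytes object `data` to `path`, replacing any existing file."""
    with open(path, "wb") as f:
        f.write(data)


# Round-trip a small payload through a temporary file.
payload = bytes(range(256))
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
try:
    write_whole_file(tmp_path, payload)
    roundtrip = read_whole_file(tmp_path)
finally:
    os.remove(tmp_path)

print(len(roundtrip), roundtrip == payload)
```

Thin functions like these add one call per file rather than one per byte or line, so the wrapper overhead is probably small when whole files are read in one go.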

> If you open a file in binary mode, Python will give you a stream of bytes
> (ordinal values 0 through 255 inclusive). Python won't modify or change
> those bytes in any way. Whatever it reads from disk, it will give to you.
>
> If you open a file in text mode, Python 3 will give you a stream of Unicode
> code points (ordinal values 0 through 0x10FFFF). Earlier versions of Python
> 3 may behave somewhat strangely with so-called "astral characters": I
> recommend that you avoid anything below version 3.3. Unless you are
> including (e.g.) Chinese or ancient Phoenician in your text file, you
> probably won't care.

I've just tried a UTF-8 file and am getting some odd results. With a file containing [three euro symbols]:

€€€

(including a 3-byte utf-8 marker at the start), and opened in text mode, Python 3 gives me this series of values (i.e. the ord() of each character):

239 187 191 226 8218 172 226 8218 172 226 8218 172

And prints the resulting string as: €€€. Although this latter might depend on my console's code page setting. Changing it to UTF-8 however (CHCP 65001 in Windows) gives me this error when I run the program again:

----------
Fatal Python error: Py_Initialize: can't initialize sys standard streams
LookupError: unknown encoding: cp65001

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
----------

(That was with 3.1; 3.4 gives the same set of characters as above, and shows the string differently, but still wrong. While PyPy 3.2.4 gives a different set of byte values, all 0..255, and a different string again, although it now contains some actual € characters.)

So I think I'll skip Unicode handling to start off with! (I've already had plenty of fun and games with it in the past.)

-- 
Bartc
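
A note on the ord() values reported above: the sequence 239 187 191 226 8218 172 is what results when the UTF-8 bytes EF BB BF E2 82 AC (the 3-byte marker followed by one euro sign) are decoded as Windows-1252 instead of UTF-8. The sketch below reproduces that, assuming the file was opened without an explicit encoding on a system whose default is cp1252. The post does not state the default encoding, so that part is an inference:

```python
import os
import tempfile

# UTF-8 encoding of a byte-order mark followed by three euro signs (U+20AC).
data = b"\xef\xbb\xbf" + ("\u20ac" * 3).encode("utf-8")

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

try:
    # Decoding the UTF-8 bytes as Windows-1252 yields one character per byte.
    with open(path, encoding="cp1252") as f:
        wrong = f.read()
    # "utf-8-sig" decodes as UTF-8 and drops a leading byte-order mark.
    with open(path, encoding="utf-8-sig") as f:
        right = f.read()
finally:
    os.remove(path)

wrong_ords = [ord(c) for c in wrong]
right_ords = [ord(c) for c in right]
print(wrong_ords)  # [239, 187, 191, 226, 8218, 172, 226, 8218, 172, 226, 8218, 172]
print(right_ords)  # [8364, 8364, 8364]
```

Passing the encoding explicitly to open() takes the platform default out of the picture, which should make the result the same from one machine to the next.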