
Re: [Python-ideas] Unicode stdin/stdout

Date Mon, 18 Nov 2013 21:27:41 -0800 (PST)
From Andrew Barnert <abarnert@yahoo.com>
Subject Re: [Python-ideas] Unicode stdin/stdout
To "random832@fastmail.us" <random832@fastmail.us>, Robin Becker <robin@reportlab.com>, "python-ideas@python.org" <python-ideas@python.org>
Message-ID <mailman.2882.1384839071.18130.python-list@python.org>


From: "random832@fastmail.us" <random832@fastmail.us>



> On Mon, Nov 18, 2013, at 7:33, Robin Becker wrote:
>>  UTF-8 stuff
> 
> This doesn't really solve the issue I was referring to, which is that
> windows _console_ (i.e. not redirected file or pipe) I/O can only
> support unicode via wide character (UTF-16) I/O with a special function,
> not via using byte-based I/O with the normal write function.


The problem is that Windows 16-bit I/O doesn't fit into the usual io module hierarchy. Not because it uses an encoding of UTF-16 (although anyone familiar with ReadConsoleW/WriteConsoleW from other languages may be a bit confused that Python's lowest-level wrappers around them deal in byte counts instead of WCHAR counts), but because you have to use HANDLEs instead of fds. So, there are going to be some compromises and some complexity.

One possibility is to use as much of the io hierarchy as possible, but not try to make it flexible enough to be reusable for arbitrary HANDLEs: Add WindowsFileIO and WindowsConsoleIO classes that implement RawIOBase with a native HANDLE and ReadFile/WriteFile and ReadConsoleW/WriteConsoleW respectively. Both work in terms of bytes (which means WindowsConsoleIO.read and write each have to //2 the byte count going in to get a WCHAR count, and *2 the WCHAR count coming back). You also need a create_windows_io function that wraps a HANDLE by calling GetConsoleMode (which succeeds only for console handles) and constructing a WindowsConsoleIO or WindowsFileIO as appropriate, then creates a BufferedReader/Writer around that, then constructs a TextIOWrapper with UTF-16 or the default encoding around that. At startup, you just do that for the three GetStdHandle handles, and that's your stdin, stdout, and stderr.
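A minimal sketch of the raw layer of that stack, written so it can be exercised off Windows. The class name follows the description above, but the `write_wchars` callable and `fake_write_console` are hypothetical stand-ins for a real WriteConsoleW wrapper on a HANDLE, and only the write path is shown:

```python
import io


class WindowsConsoleIO(io.RawIOBase):
    """Write-only sketch: takes UTF-16-LE bytes, hands text to a console backend."""

    def __init__(self, write_wchars):
        # write_wchars(text) -> number of UTF-16 code units written.
        # Hypothetical: on Windows this would wrap WriteConsoleW on a HANDLE.
        super().__init__()
        self._write_wchars = write_wchars

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        usable = len(data) - (len(data) % 2)  # only whole 16-bit units
        if usable == 0:
            return 0  # a lone trailing byte is left for the caller to retry
        text = data[:usable].decode("utf-16-le", errors="surrogatepass")
        units = self._write_wchars(text)
        return units * 2  # the io layers above count bytes, not WCHARs


captured = []


def fake_write_console(text):
    # Stand-in backend: records the text and reports code units "written".
    captured.append(text)
    return len(text.encode("utf-16-le", errors="surrogatepass")) // 2


raw = WindowsConsoleIO(fake_write_console)
console_out = io.TextIOWrapper(
    io.BufferedWriter(raw), encoding="utf-16-le", newline=""
)
console_out.write("h\u00e9llo \U0001F600\n")
console_out.flush()
```

The TextIOWrapper encodes to UTF-16-LE, the BufferedWriter passes bytes down, and the raw class converts between byte counts and code-unit counts at the boundary. A real create_windows_io would pick between this class and a ReadFile/WriteFile-based one depending on whether GetConsoleMode succeeds.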

Besides not being reusable enough for people who want to wrap HANDLEs from other libraries or attach to new consoles from Python, it's not clear what fileno() should return. You could fake it and return the MSVCRT fds that correspond to the same files as the HANDLEs, but it's possible to end up with one redirected and not the other (e.g., if you detach the console), and I'm not sure what happens if you mix and match the two. A more "correct" solution would be to call _open_osfhandle on the HANDLE (and then keep track of the fact that os.close closes the HANDLE, or leave it up to the user to deal with bad handle errors?), but I'm not sure that's any better in practice. Also, should a console HANDLE use _O_WTEXT for its fd (in which case the user has to know that he has a _O_WTEXT fd even though there's no way to see that from Python), or not (in which case he's mixing 8-bit and 16-bit I/O on the same file)?

It might be reasonable to just not expose fileno(); most code that wants the fileno() for stdin is just going to do something Unix-y that's not going to work anyway (select it, tcsetattr it, pass it over a socket to another process, …).
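For what "not exposing fileno()" could look like in practice, here is a small sketch. `HandleOnlyStream` and `try_fileno` are hypothetical names; the only library behaviour relied on is that io streams without a descriptor raise io.UnsupportedOperation from fileno():

```python
import io


class HandleOnlyStream(io.TextIOBase):
    """Hypothetical stream backed by a HANDLE rather than an fd."""

    def fileno(self):
        # Same exception type that io.StringIO and friends already raise.
        raise io.UnsupportedOperation("backed by a HANDLE, not an fd")


def try_fileno(stream):
    """Return the stream's fd, or None if it does not have one."""
    try:
        return stream.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return None


no_fd = try_fileno(HandleOnlyStream())
also_no_fd = try_fileno(io.StringIO())
```

Code that already copes with StringIO-style replacements for sys.stdout would cope with this too; code that insists on a descriptor would fail early instead of misbehaving later.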

A different approach would be to reuse as _little_ of io as possible, instead of as much: Windows stdin/stdout/stderr could each be custom TextIOBase implementations that work straight on HANDLEs and don't even support buffer (or detach), much less fileno. That exposes even less functionality to users, of course. It also means we need a parallel implementation of all the buffering logic. (On the other hand, it also leaves the door open to expose some Windows functionality, like async ReadFileEx/WriteFileEx, in a way that would be very hard through the normal layers…)
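A rough sketch of that second design, assuming a hypothetical `write_units` backend that takes UTF-16-LE bytes and returns how many code units it accepted (the role WriteConsoleW would play). There is no buffer attribute and detach() is left at the TextIOBase default, which raises io.UnsupportedOperation:

```python
import io


class ConsoleTextWriter(io.TextIOBase):
    """Text stream that talks to the backend directly, with no byte layer."""

    def __init__(self, write_units):
        # write_units(data: bytes) -> number of UTF-16 code units accepted.
        # Hypothetical stand-in for a WriteConsoleW call on a HANDLE.
        super().__init__()
        self._write_units = write_units

    def writable(self):
        return True

    def write(self, s):
        if not isinstance(s, str):
            raise TypeError("write() argument must be str")
        data = s.encode("utf-16-le", errors="surrogatepass")
        pos = 0
        while pos < len(data):
            units = self._write_units(data[pos:])
            if units <= 0:
                raise OSError("console backend wrote nothing")
            pos += units * 2  # advance by bytes, backend counts code units
        return len(s)


chunks = []


def fake_backend(data):
    # Accept at most 3 code units per call to force the retry loop.
    taken = data[:6]
    chunks.append(taken)
    return len(taken) // 2


out = ConsoleTextWriter(fake_backend)
out.write("console \U0001F600 text")
```

Even this tiny version has to reimplement the partial-write loop that BufferedWriter normally provides, which is the "parallel implementation" cost mentioned above.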


It shouldn't be too hard to write most of these via an extension module or ctypes to experiment with it. As long as you're careful not to mix winsys.stdout and sys.stdout (the module could even set sys.stdin, sys.stdout, sys.stderr=stdin, stdout, stderr at import time, or just del them, for a bit of protection), it should work.
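As a starting point for the ctypes route, something like the following might do. It is a sketch only: `write_console` and `utf16_units` are made-up names, the Windows branch has not been put through its paces, and on other platforms (or when the handle is not a console) it simply falls back to sys.stdout:

```python
import sys


def utf16_units(text):
    """Number of UTF-16 code units (WCHARs) needed for text."""
    return len(text.encode("utf-16-le", errors="surrogatepass")) // 2


if sys.platform == "win32":
    import ctypes
    from ctypes import wintypes

    _kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    _kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    _kernel32.GetStdHandle.restype = wintypes.HANDLE
    _kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,
        wintypes.LPCWSTR,
        wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD),
        wintypes.LPVOID,
    ]
    _kernel32.WriteConsoleW.restype = wintypes.BOOL
    _STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11

    def write_console(text):
        """Write text via WriteConsoleW; return code units written."""
        handle = _kernel32.GetStdHandle(_STD_OUTPUT_HANDLE)
        written = wintypes.DWORD(0)
        ok = _kernel32.WriteConsoleW(
            handle, text, utf16_units(text), ctypes.byref(written), None
        )
        if not ok:
            # Not a console (e.g. redirected): fall back to normal stdout.
            sys.stdout.write(text)
            return utf16_units(text)
        return written.value

else:

    def write_console(text):
        """Non-Windows fallback so the sketch can be imported anywhere."""
        sys.stdout.write(text)
        return utf16_units(text)


units_written = write_console("ok\n")
```

Note that the length passed to WriteConsoleW is in code units, so a character outside the BMP counts as two.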

It might be worth implementing a few different designs to play with, and putting them through their paces with some modules and scripts that do different things with stdio (including running the scripts with cmd.exe redirected I/O and with subprocess PIPEs) to see which ones have problems or limitations that are hard to foresee in advance.
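One cheap piece of such a test harness is a probe that runs a child interpreter with subprocess PIPEs and reports what the child sees. The `CHILD` script and `probe_with_pipes` below are illustrative only; a real test would run the candidate stdio module inside the child instead:

```python
import subprocess
import sys

# Child script: report whether stdin/stdout look like a console.
CHILD = "import sys; print(sys.stdin.isatty(), sys.stdout.isatty())"


def probe_with_pipes():
    """Run the child with all three standard streams on pipes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


piped = probe_with_pipes()
```

With pipes on both ends the child reports no console on either stream, which is the case where a console-specific class must not be selected.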

If you have a design that you think sounds good, and are willing to experiment the hell out of it, and don't know how to get started but would be willing to debug and finish a mostly-written/almost-working implementation, I could slap something together with ctypes to get you started.
