From: Andrew Barnert
Date: Mon, 18 Nov 2013 21:27:41 -0800 (PST)
Subject: Re: [Python-ideas] Unicode stdin/stdout
To: "random832@fastmail.us", Robin Becker, "python-ideas@python.org"
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

From: "random832@fastmail.us"

> On Mon, Nov 18, 2013, at 7:33, Robin Becker wrote:
>> UTF-8 stuff
>
> This doesn't really solve the issue I was referring to, which is that
> windows _console_ (i.e. not redirected file or pipe) I/O can only
> support unicode via wide character (UTF-16) I/O with a special function,
> not via using byte-based I/O with the normal write function.

The problem is that Windows 16-bit I/O doesn't fit into the usual io module hierarchy. Not because it uses an encoding of UTF-16 (although anyone familiar with ReadConsoleW/WriteConsoleW from other languages may be a bit confused that Python's lowest-level wrappers around them deal in byte counts instead of WCHAR counts), but because you have to use HANDLEs instead of fds. So, there are going to be some compromises and some complexity.

One possibility is to use as much of the io hierarchy as possible, but not try to make it flexible enough to be reusable for arbitrary HANDLEs: Add WindowsFileIO and WindowsConsoleIO classes that implement RawIOBase with a native HANDLE and ReadFile/WriteFile and ReadConsoleW/WriteConsoleW respectively. Both work in terms of bytes (which means WindowsConsoleIO.read has to //2 its argument, and write has to *2 the result). You also need a create_windows_io function that wraps a HANDLE by calling GetConsoleMode and constructing a WindowsConsoleIO or WindowsFileIO as appropriate, then creates a BufferedReader/Writer around that, then constructs a TextIOWrapper with UTF-16 or the default encoding around that.
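To make the byte-versus-WCHAR bookkeeping concrete, here's a rough, portable sketch of the console layer in that first design. It's only an illustration under stated assumptions: FakeWideConsole is a made-up in-memory stand-in for ReadConsoleW/WriteConsoleW, and ConsoleIOSketch, open_console_writer, and open_console_reader are hypothetical names, not existing APIs.

```python
import io


class FakeWideConsole:
    """Hypothetical stand-in for a console that counts in WCHARs (UTF-16 code units)."""

    def __init__(self, pending_input=""):
        self._written = bytearray()
        self._pending = pending_input.encode("utf-16-le")

    def write_wide(self, data):
        # Like WriteConsoleW: reports how many WCHARs were written, not bytes.
        self._written += data
        return len(data) // 2

    def read_wide(self, max_wchars):
        # Like ReadConsoleW: the caller asks for at most max_wchars WCHARs.
        chunk = self._pending[: max_wchars * 2]
        self._pending = self._pending[len(chunk):]
        return chunk

    @property
    def text(self):
        return self._written.decode("utf-16-le")


class ConsoleIOSketch(io.RawIOBase):
    """Raw layer that speaks bytes upward and WCHAR counts downward."""

    def __init__(self, console):
        super().__init__()
        self._console = console

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, b):
        max_wchars = len(b) // 2  # the "//2 its argument" step
        if max_wchars == 0:
            raise ValueError("buffer too small for one UTF-16 code unit")
        data = self._console.read_wide(max_wchars)
        b[: len(data)] = data
        return len(data)  # bytes read; 0 means EOF

    def write(self, b):
        data = bytes(b)
        if len(data) % 2:
            raise ValueError("expected whole UTF-16 code units")
        return self._console.write_wide(data) * 2  # the "*2 the result" step


def open_console_writer(console):
    # raw -> BufferedWriter -> TextIOWrapper, as in the design above
    return io.TextIOWrapper(io.BufferedWriter(ConsoleIOSketch(console)),
                            encoding="utf-16-le")


def open_console_reader(console):
    return io.TextIOWrapper(io.BufferedReader(ConsoleIOSketch(console)),
                            encoding="utf-16-le")
```

A real version would hold a HANDLE and call the Win32 functions where the fake console is used; the layering on top would stay the same.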
At startup, you just do that for the three GetStdHandle handles, and that's your stdin, stdout, and stderr.

Besides not being reusable enough for people who want to wrap HANDLEs from other libraries or attach to new consoles from Python, it's not clear what fileno() should return. You could fake it and return the MSVCRT fds that correspond to the same files as the HANDLEs, but it's possible to end up with one redirected and not the other (e.g., if you detach the console), and I'm not sure what happens if you mix and match the two. A more "correct" solution would be to call _open_osfhandle on the HANDLE (and then keep track of the fact that os.close closes the HANDLE, or leave it up to the user to deal with bad handle errors?), but I'm not sure that's any better in practice. Also, should a console HANDLE use _O_WTEXT for its fd (in which case the user has to know that he has a _O_WTEXT handle even though there's no way to see that from Python), or not (in which case he's mixing 8-bit and 16-bit I/O on the same file)?

It might be reasonable to just not expose fileno(); most code that wants the fileno() for stdin is just going to do something Unix-y that's not going to work anyway (select it, tcsetattr it, pass it over a socket to another file, …).

A different approach would be to reuse as _little_ of io as possible, instead of as much: Windows stdin/stdout/stderr could each be custom TextIOBase implementations that work straight on HANDLEs and don't even support buffer (or detach), much less fileno. That exposes even less functionality to users, of course. It also means we need a parallel implementation of all the buffering logic.
(On the other hand, it also leaves the door open to expose some Windows functionality, like async ReadFileEx/WriteFileEx, in a way that would be very hard through the normal layers…)

It shouldn't be too hard to write most of these via an extension module or ctypes to experiment with it. As long as you're careful not to mix winsys.stdout and sys.stdout (the module could even set sys.stdin, sys.stdout, sys.stderr = stdin, stdout, stderr at import time, or just del them, for a bit of protection), it should work.

It might be worth implementing a few different designs to play with, and putting them through their paces with some modules and scripts that do different things with stdio (including running the scripts with cmd.exe redirected I/O and with subprocess PIPEs) to see which ones have problems or limitations that are hard to foresee in advance.

If you have a design that you think sounds good, and are willing to experiment the hell out of it, and don't know how to get started but would be willing to debug and finish a mostly-written/almost-working implementation, I could slap something together with ctypes to get you started.
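
As a possible starting point, the console-versus-file check that a create_windows_io function would need is only a few lines of ctypes. This is an untested-on-every-setup sketch, assuming Python 3: is_windows_console is a made-up helper name, while GetStdHandle and GetConsoleMode are the real kernel32 calls. On anything other than Windows it just returns False.

```python
import ctypes
import sys

# Standard-handle identifiers accepted by GetStdHandle.
STD_INPUT_HANDLE = -10
STD_OUTPUT_HANDLE = -11
STD_ERROR_HANDLE = -12


def is_windows_console(std_handle_id):
    """Return True if the given standard handle is attached to a Windows console.

    Hypothetical helper for illustration. GetConsoleMode succeeds only for
    console handles, so a failure means a file, a pipe, or no handle at all.
    """
    if sys.platform != "win32":
        return False  # there is no Windows console to talk to

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE,
                                        ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetConsoleMode.restype = wintypes.BOOL

    # The identifiers are negative numbers passed as a DWORD, so mask them.
    handle = kernel32.GetStdHandle(std_handle_id & 0xFFFFFFFF)
    mode = wintypes.DWORD()
    return bool(kernel32.GetConsoleMode(handle, ctypes.byref(mode)))


if __name__ == "__main__":
    for name, ident in [("stdin", STD_INPUT_HANDLE),
                        ("stdout", STD_OUTPUT_HANDLE),
                        ("stderr", STD_ERROR_HANDLE)]:
        kind = "console" if is_windows_console(ident) else "not a console"
        print(name, "->", kind, file=sys.stderr)
```

Running it under cmd.exe with and without redirection (for example `script.py > out.txt`) should show the result flip for the redirected stream, which is the distinction the wrapper has to make before choosing between the console and file classes.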