Groups > comp.lang.python > #71389 > unrolled thread
| Started by | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| First post | 2014-05-12 16:19 +0100 |
| Last post | 2014-05-14 09:56 -0600 |
| Articles | 12 on this page of 72 — 25 participants |
Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-12 16:19 +0100
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-12 17:47 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 12:31 -0600
Re: Everything you did not want to know about Unicode in Python 3 MRAB <python@mrabarnett.plus.com> - 2014-05-12 20:42 +0100
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 16:16 -0600
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 09:42 +1000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 01:18 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 11:39 +1000
Re: Everything you did not want to know about Unicode in Python 3 alex23 <wuwei23@gmail.com> - 2014-05-13 16:25 +1000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:32 +1000
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-12 20:58 -0500
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 03:33 +0100
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 22:10 -0700
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 00:39 -0500
Re: Everything you did not want to know about Unicode in Python 3 Gene Heskett <gheskett@wdtv.com> - 2014-05-13 01:45 -0400
Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-13 16:03 +1000
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 23:09 -0700
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:18 +1000
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
Re: Everything you did not want to know about Unicode in Python 3 Roy Smith <roy@panix.com> - 2014-05-13 07:20 -0400
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 13:39 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:43 +1000
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-13 07:30 -0700
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 00:36 +1000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:51 +0000
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 14:42 +0000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 15:21 +0000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 23:53 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 10:08 +1000
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:42 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 22:52 +1000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-16 14:46 +0000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 01:07 +0000
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-17 07:19 +0300
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-17 09:35 +0100
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 10:29 +0100
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 14:15 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:01 +0100
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 09:57 +0100
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 12:07 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:07 +0100
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-17 19:18 +1000
Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-17 21:05 +1000
[OT] Copyright statements and why they can be useful (was: Everything you did not want to know about Unicode in Python 3) Ben Finney <ben@benfinney.id.au> - 2014-05-14 11:01 +1000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:07 -0600
Re: Everything you did not want to know about Unicode in Python 3 Dave Angel <davea@davea.name> - 2014-05-13 21:56 -0400
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:49 +0000
Re: Everything you did not want to know about Unicode in Python 3 gregor <gregor@ediwo.com> - 2014-05-13 09:27 +0200
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 10:08 +0200
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 11:25 +0300
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 18:38 +1000
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:06 +0300
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 19:29 +1000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve@pearwood.info> - 2014-05-13 09:44 +0000
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:38 +0200
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:46 +0200
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:59 +0300
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 14:30 +0100
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:37 +1000
Re: Everything you did not want to know about Unicode in Python 3 Skip Montanaro <skip@pobox.com> - 2014-05-13 09:02 -0500
Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-14 00:00 -0700
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 11:19 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-13 10:08 -0600
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-14 00:10 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2014-05-13 17:53 -0700
Re: Everything you did not want to know about Unicode in Python 3 Terry Reedy <tjreedy@udel.edu> - 2014-05-14 17:47 -0400
Re: Everything you did not want to know about Unicode in Python 3 Antoine Pitrou <antoine@python.org> - 2014-05-16 11:50 +0000
Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-16 06:20 -0700
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:38 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robin Becker <robin@reportlab.com> - 2014-05-14 16:30 +0100
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:56 -0600
Page 4 of 4
| From | Skip Montanaro <skip@pobox.com> |
|---|---|
| Date | 2014-05-13 09:02 -0500 |
| Message-ID | <mailman.9969.1399989757.18130.python-list@python.org> |
| In reply to | #71450 |
On Tue, May 13, 2014 at 3:38 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

BITD, when I still maintained and developed Musi-Cal (an early online concert calendar, long since gone), I faced a challenge when I first started encountering non-ASCII band names and cities. I resisted UTF-8. After all, if I printed a string containing an "é", it came out looking like

What kind of mess was that??? I tried to ignore it, or assume Latin-1 would cover all the bases (my first non-ASCII inputs tended to come from Western Europe). If nothing else, at least "é" was legible.

Needless to say, those approaches didn't work well. After perhaps six months or a year, I broke down and started converting everything coming in or going out to UTF-8 at the boundaries of my system (making educated guesses at input encodings if necessary). My life got a whole lot easier after that. The distinction between bytes and text didn't really matter much, certainly not compared to the mess I had before where strings of unknown data leaked into my system and its database.

Skip

P.S. My apologies for the mess this message probably is. Amazing as it may seem, Gmail in Chrome does a crappy job editing anything other than plain text. Also, I'm surprised in this day and age that common tools like Gnome Terminal have little or no encoding support. I wound up having to pop up urxvt to get an encodings-flexible terminal emulator...
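A minimal Python 3 sketch of the kind of garbling described above, and of decoding at the boundary. The sample string is made up for illustration, and the Latin-1 mismatch is an assumption about one common way this goes wrong, not a record of what Musi-Cal actually did.

```python
# Hypothetical sample value; any text containing "é" behaves the same way.
name = "Café"

# Text becomes bytes when it leaves the program.
utf8_bytes = name.encode("utf-8")

# A reader that wrongly assumes Latin-1 turns the two-byte "é" into two characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# Decoding with the correct codec at the system boundary recovers the text.
restored = utf8_bytes.decode("utf-8")
print(restored == name)
```

Doing the decode once at the point where data enters, and the encode once where it leaves, keeps every string inside the program in a single known form.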
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-05-14 00:00 -0700 |
| Message-ID | <09bacccb-5a82-4c3d-93ee-ec9dee1a2588@googlegroups.com> |
| In reply to | #71449 |
On Tuesday, 13 May 2014 10:08:45 UTC+2, Johannes Bauer wrote:
> On 13.05.2014 03:18, Steven D'Aprano wrote:
>
> > Armin Ronacher is an extremely experienced and knowledgeable Python
> > developer, and a Python core developer. He might be wrong, but he's not
> > *obviously* wrong.
>
> He's correct about file name encodings. Which can be fixed really easily
> without messing everything up (sys.argv binary variant, open accepting
> binary filenames). But that he suggests that Go would be superior:
>
> > Which uses an even simpler model than Python 2: everything is a byte string. The assumed encoding is UTF-8. End of the story.
>
> Is just a horrible idea. An obviously horrible idea, too.
>
> Having dealt with the UTF-8 problems on Python2 I can safely say that I
> never, never ever want to go back to that freaky hell. If I deal with
> strings, I want to be able to sanely manipulate them and I want to be
> sure that after manipulation they're still valid strings. Manipulating
> the bytes representation of unicode data just doesn't work.
>
> And I'm very very glad that some people felt the same way and
> implemented a sane, consistent way of dealing with Unicode in Python3.
> It's one of the reasons why I switched to Py3 very early and I love it.
>
> Cheers,
> Johannes
>
> --
> >> Where EXACTLY had you predicted the quake again?
> > At least not publicly!
> Ah, the latest and to this day most brilliant stroke of our great
> cosmologist: the secret prediction.
> - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>

===========

A Rob 'Commander' Pike will never put utf16 and ebcdic
in the same basket, when discussing coding of characters.

jmf
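One way to see the point about manipulating the bytes representation of text is the short Python 3 sketch below. The sample word is chosen only for illustration: slicing a str always yields a valid str, while slicing its UTF-8 bytes at the same index can cut a character in half.

```python
text = "naïve"
data = text.encode("utf-8")  # 6 bytes for 5 characters, since "ï" takes two bytes

# Slicing the str works on code points and always gives a valid string.
prefix = text[:3]

# Slicing the bytes at the same index splits the two-byte "ï".
chunk = data[:3]
try:
    chunk.decode("utf-8")
    broken = False
except UnicodeDecodeError:
    broken = True

print(prefix, len(data), broken)
```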
| From | alister <alister.nospam.ware@ntlworld.com> |
|---|---|
| Date | 2014-05-13 11:19 +0000 |
| Message-ID | <Kcncv.34412$GL7.204@fx10.am4> |
| In reply to | #71416 |
On Tue, 13 May 2014 01:18:35 +0000, Steven D'Aprano wrote:

> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
>
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the Pythonic way to do things or
>> am I missing something?
>
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
>

I am only an amateur Python coder, which is why I asked if I am missing something.

I could not see any reason to be using the shutil module if all that the program is doing is opening a file, reading it & then printing it.

Is it Python that causes the issue, the shutil module, or just the OS not liking the data it is being sent?

An explanation of why this approach is taken would be much appreciated.

--
Revenge is a form of nostalgia.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-05-13 10:08 -0600 |
| Message-ID | <mailman.9974.1399999114.18130.python-list@python.org> |
| In reply to | #71470 |
On Tue, May 13, 2014 at 5:19 AM, alister <alister.nospam.ware@ntlworld.com> wrote:
> I am only an amateur python coder which is why I asked if I am missing
> something
>
> I could not see any reason to be using the shutil module if all that the
> program is doing is opening a file, reading it & then printing it.
>
> is it python that causes the issue, the shutil module or just the OS not
> liking the data it is being sent?
>
> an explanation of why this approach is taken would be much appreciated.

No, that part is perfectly fine. This is exactly what the shutil module is meant for: providing shell-like operations. Although in this case the copyfileobj function is quite simple (have yourself a look at the source -- it just reads from one file and writes to the other in a loop), in general the Pythonic thing is to avoid reinventing the wheel.

And since it's so simple, it shouldn't be hard to see that the use of the shutil module has nothing to do with the Unicode woes here. The crux of the issue is that a general-purpose command like cat typically can't know the encoding of its input and can't assume anything about it. In fact, there may not even be an encoding; cat can be used with binary data. The only non-destructive approach then is to copy the binary data straight from the source to the destination with no decoding steps at all, and trust the user to ensure that the destination will be able to accommodate the source encoding. Because Python 3 presents stdin and stdout as text streams however, it makes them more difficult to use with binary data, which is why Armin sets up all that extra code to make sure his file objects are binary.
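For illustration, a byte-for-byte copy along these lines might look like the sketch below in Python 3. It is not Armin's code: the function name `cat` and its `out` parameter are made up for this example. It relies on `sys.stdout.buffer`, the binary stream underneath the text-mode `sys.stdout`.

```python
import shutil
import sys


def cat(paths, out=None):
    """Copy each named file to `out` as raw bytes, with no decoding step."""
    if out is None:
        # The binary layer beneath the text-mode sys.stdout.
        out = sys.stdout.buffer
    for path in paths:
        # "rb" keeps the source in binary mode, so no encoding is ever assumed.
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)
    out.flush()
```

A script would call `cat(sys.argv[1:])`. Whatever bytes are in the files come out unchanged, and it is left to the user to make sure the destination can display them.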
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-05-14 00:10 +0000 |
| Message-ID | <5372b493$0$29977$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #71502 |
On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:

> Because Python 3 presents stdin and stdout as text streams however, it
> makes them more difficult to use with binary data, which is why Armin
> sets up all that extra code to make sure his file objects are binary.

What surprises me is how hard that is. Surely there's a simpler way to open stdin and stdout in binary mode? If not, there ought to be.

--
Steven D'Aprano
http://import-that.dreamwidth.org/
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2014-05-13 17:53 -0700 |
| Message-ID | <mailman.9986.1400031716.18130.python-list@python.org> |
| In reply to | #71519 |
On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
>
>> Because Python 3 presents stdin and stdout as text streams however, it
>> makes them more difficult to use with binary data, which is why Armin
>> sets up all that extra code to make sure his file objects are binary.
>
> What surprises me is how hard that is. Surely there's a simpler way to
> open stdin and stdout in binary mode? If not, there ought to be.

Somebody already posted this:

https://docs.python.org/3/library/sys.html#sys.stdin

which talks about .detach().

--
~Ethan~
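A small sketch of what `.detach()` does, using an in-memory stand-in for `sys.stdin` so it can be run anywhere. With the real stream the call would be `sys.stdin.detach()`, or `sys.stdin.buffer` if the text wrapper should stay usable.

```python
import io

# Stand-in for sys.stdin: a text wrapper over an in-memory binary buffer.
text_stream = io.TextIOWrapper(io.BytesIO(b"\xff\x00raw bytes"), encoding="utf-8")

# detach() hands back the underlying binary stream.
binary_stream = text_stream.detach()
payload = binary_stream.read()

# After detach(), the text wrapper can no longer be used.
try:
    text_stream.read()
    wrapper_usable = True
except ValueError:
    wrapper_usable = False

print(payload, wrapper_usable)
```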
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-05-14 17:47 -0400 |
| Message-ID | <mailman.10023.1400104047.18130.python-list@python.org> |
| In reply to | #71519 |
On 5/13/2014 8:53 PM, Ethan Furman wrote:
> On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
>> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
>>
>>> Because Python 3 presents stdin and stdout as text streams however, it
>>> makes them more difficult to use with binary data, which is why Armin
>>> sets up all that extra code to make sure his file objects are binary.
>>
>> What surprises me is how hard that is. Surely there's a simpler way to
>> open stdin and stdout in binary mode? If not, there ought to be.
>
> Somebody already posted this:
>
> https://docs.python.org/3/library/sys.html#sys.stdin
>
> which talks about .detach().

I sent a message to Armin about this.

--
Terry Jan Reedy
| From | Antoine Pitrou <antoine@python.org> |
|---|---|
| Date | 2014-05-16 11:50 +0000 |
| Message-ID | <mailman.10068.1400241064.18130.python-list@python.org> |
| In reply to | #71519 |
Terry Reedy <tjreedy <at> udel.edu> writes:
>
> On 5/13/2014 8:53 PM, Ethan Furman wrote:
> > On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> >> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
> >>
> >>> Because Python 3 presents stdin and stdout as text streams however, it
> >>> makes them more difficult to use with binary data, which is why Armin
> >>> sets up all that extra code to make sure his file objects are binary.
> >>
> >> What surprises me is how hard that is. Surely there's a simpler way to
> >> open stdin and stdout in binary mode? If not, there ought to be.
> >
> > Somebody already posted this:
> >
> > https://docs.python.org/3/library/sys.html#sys.stdin
> >
> > which talks about .detach().
>
> I sent a message to Armin about this.

And the documentation has now been fixed:
http://bugs.python.org/issue21364

So something *can* come out of a python-list rantfest, it seems.

Regards

Antoine.
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-05-16 06:20 -0700 |
| Message-ID | <bbb2b11c-2a17-4eaa-8a84-f69d1750fa6e@googlegroups.com> |
| In reply to | #71658 |
On Friday, 16 May 2014 13:50:47 UTC+2, Antoine Pitrou wrote:
> Terry Reedy <tjreedy <at> udel.edu> writes:
> >
> > On 5/13/2014 8:53 PM, Ethan Furman wrote:
> > > On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> > >> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
> > >>
> > >>> Because Python 3 presents stdin and stdout as text streams however, it
> > >>> makes them more difficult to use with binary data, which is why Armin
> > >>> sets up all that extra code to make sure his file objects are binary.
> > >>
> > >> What surprises me is how hard that is. Surely there's a simpler way to
> > >> open stdin and stdout in binary mode? If not, there ought to be.
> > >
> > > Somebody already posted this:
> > >
> > > https://docs.python.org/3/library/sys.html#sys.stdin
> > >
> > > which talks about .detach().
> >
> > I sent a message to Armin about this.
>
> And the documentation has now been fixed:
> http://bugs.python.org/issue21364
>
> So something *can* come out of a python-list rantfest, it seems.
>
> Regards
>
> Antoine.

======

http://www.unicode.org/

With my best regards.
| From | alister <alister.nospam.ware@ntlworld.com> |
|---|---|
| Date | 2014-05-14 12:38 +0000 |
| Message-ID | <itJcv.77900$dT1.47871@fx12.am4> |
| In reply to | #71502 |
On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:

> On Tue, May 13, 2014 at 5:19 AM, alister
> <alister.nospam.ware@ntlworld.com> wrote:
>> I am only an amateur python coder which is why I asked if I am missing
>> something
>>
>> I could not see any reason to be using the shutil module if all that
>> the program is doing is opening a file, reading it & then printing it.
>>
>> is it python that causes the issue, the shutil module or just the OS
>> not liking the data it is being sent?
>>
>> an explanation of why this approach is taken would be much appreciated.
>
> No, that part is perfectly fine. This is exactly what the shutil module
> is meant for: providing shell-like operations. Although in this case
> the copyfileobj function is quite simple (have yourself a look at the
> source -- it just reads from one file and writes to the other in a
> loop), in general the Pythonic thing is to avoid reinventing the wheel.
>
> And since it's so simple, it shouldn't be hard to see that the use of
> the shutil module has nothing to do with the Unicode woes here. The
> crux of the issue is that a general-purpose command like cat typically
> can't know the encoding of its input and can't assume anything about it.
> In fact, there may not even be an encoding; cat can be used with binary
> data. The only non-destructive approach then is to copy the binary data
> straight from the source to the destination with no decoding steps at
> all, and trust the user to ensure that the destination will be able to
> accommodate the source encoding. Because Python 3 presents stdin and
> stdout as text streams however, it makes them more difficult to use with
> binary data, which is why Armin sets up all that extra code to make sure
> his file objects are binary.

I think I understand that, in which case I owe Armin an apology; this certainly sounds like a failing in Python's handling of stdout.

--
Get it up, keep it up... LINUX: Viagra for the PC.
-- Chris Abbey
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-05-14 16:30 +0100 |
| Message-ID | <mailman.10010.1400081443.18130.python-list@python.org> |
| In reply to | #71470 |
On 13/05/2014 17:08, Ian Kelly wrote:
.........
>
> And since it's so simple, it shouldn't be hard to see that the use of
> the shutil module has nothing to do with the Unicode woes here. The
> crux of the issue is that a general-purpose command like cat typically
> can't know the encoding of its input and can't assume anything about
> it. In fact, there may not even be an encoding; cat can be used with
> binary data. The only non-destructive approach then is to copy the
> binary data straight from the source to the destination with no
> decoding steps at all, and trust the user to ensure that the
> destination will be able to accommodate the source encoding. Because
> Python 3 presents stdin and stdout as text streams however, it makes
> them more difficult to use with binary data, which is why Armin sets
> up all that extra code to make sure his file objects are binary.
>

Doesn't this issue also come up wherever bytes are being read ie in sockets, pipe file handles etc? Some sources may have well defined encodings and so allow use of unicode strings but surely not all. I imagine all of the problems associated with a broken encoding promise for stdin can also occur with sockets & other sources ie error messages failing to be printable etc etc. Since bytes in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str) using bytes everywhere has its own problems.

--
Robin Becker
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-05-14 09:56 -0600 |
| Message-ID | <mailman.10012.1400083063.18130.python-list@python.org> |
| In reply to | #71470 |
On Wed, May 14, 2014 at 9:30 AM, Robin Becker <robin@reportlab.com> wrote:
> Doesn't this issue also come up wherever bytes are being read ie in sockets,
> pipe file handles etc? Some sources may have well defined encodings and so
> allow use of unicode strings but surely not all. I imagine all of the
> problems associated with a broken encoding promise for stdin can also occur
> with sockets & other sources ie error messages failing to be printable etc
> etc. Since bytes in Python 3 are not equivalent to the old str (Python 3
> bytes != Python 2 str) using bytes everywhere has its own problems.

Sockets send and receive bytes, and pipes created by the subprocess module are opened in binary mode. Pipes inherited as stdin are still assumed to be unicode, though.
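A quick Python 3 check of the first two claims. The child command and the payloads are made up for the example, and `subprocess.run` and `socket.socketpair` assume a reasonably recent Python 3.

```python
import socket
import subprocess
import sys

# A pipe created by subprocess yields bytes unless text mode is requested.
child_code = "import sys; sys.stdout.buffer.write(b'\\xff\\xfe')"
result = subprocess.run([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
pipe_output = result.stdout

# Sockets likewise accept and return bytes, never str.
left, right = socket.socketpair()
with left, right:
    left.sendall("é".encode("utf-8"))
    received = right.recv(16)

print(type(pipe_output).__name__, type(received).__name__)
```

In both cases any decoding is an explicit step taken by the program, so no encoding is assumed on its behalf.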