Groups > comp.lang.python > #71389 > unrolled thread

Everything you did not want to know about Unicode in Python 3

Started by Mark Lawrence <breamoreboy@yahoo.co.uk>
First post 2014-05-12 16:19 +0100
Last post 2014-05-14 09:56 -0600
Articles 12 on this page of 72 — 25 participants

Contents

  Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-12 16:19 +0100
    Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-12 17:47 +0000
      Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 12:31 -0600
      Re: Everything you did not want to know about Unicode in Python 3 MRAB <python@mrabarnett.plus.com> - 2014-05-12 20:42 +0100
      Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 16:16 -0600
      Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 09:42 +1000
      Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 01:18 +0000
        Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 11:39 +1000
          Re: Everything you did not want to know about Unicode in Python 3 alex23 <wuwei23@gmail.com> - 2014-05-13 16:25 +1000
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:32 +1000
        Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-12 20:58 -0500
        Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 03:33 +0100
        Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 22:10 -0700
          Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 00:39 -0500
            Re: Everything you did not want to know about Unicode in Python 3 Gene Heskett <gheskett@wdtv.com> - 2014-05-13 01:45 -0400
            Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-13 16:03 +1000
            Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 23:09 -0700
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:18 +1000
              Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
              Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
              Re: Everything you did not want to know about Unicode in Python 3 Roy Smith <roy@panix.com> - 2014-05-13 07:20 -0400
                Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 13:39 +0000
                  Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:43 +1000
                    Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-13 07:30 -0700
                      Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 00:36 +1000
                  Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:51 +0000
                    Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 14:42 +0000
                      Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 15:21 +0000
                      Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 23:53 +0000
                        Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 10:08 +1000
                          Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:42 +0000
                            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 22:52 +1000
                            Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-16 14:46 +0000
                              Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 01:07 +0000
                                Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-17 07:19 +0300
                                  Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-17 09:35 +0100
                                  Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 10:29 +0100
                                    Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 14:15 +0000
                                      Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:01 +0100
                                Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 09:57 +0100
                                  Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 12:07 +0000
                                    Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:07 +0100
                                Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-17 19:18 +1000
                                Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-17 21:05 +1000
                        [OT] Copyright statements and why they can be useful (was: Everything you did not want to know about Unicode in Python 3) Ben Finney <ben@benfinney.id.au> - 2014-05-14 11:01 +1000
                        Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:07 -0600
                  Re: Everything you did not want to know about Unicode in Python 3 Dave Angel <davea@davea.name> - 2014-05-13 21:56 -0400
              Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:49 +0000
        Re: Everything you did not want to know about Unicode in Python 3 gregor <gregor@ediwo.com> - 2014-05-13 09:27 +0200
        Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 10:08 +0200
          Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 11:25 +0300
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 18:38 +1000
              Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:06 +0300
                Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 19:29 +1000
                Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve@pearwood.info> - 2014-05-13 09:44 +0000
              Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:38 +0200
            Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:46 +0200
              Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:59 +0300
            Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 14:30 +0100
            Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:37 +1000
            Re: Everything you did not want to know about Unicode in Python 3 Skip Montanaro <skip@pobox.com> - 2014-05-13 09:02 -0500
          Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-14 00:00 -0700
        Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 11:19 +0000
          Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-13 10:08 -0600
            Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-14 00:10 +0000
              Re: Everything you did not want to know about Unicode in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2014-05-13 17:53 -0700
              Re: Everything you did not want to know about Unicode in Python 3 Terry Reedy <tjreedy@udel.edu> - 2014-05-14 17:47 -0400
              Re: Everything you did not want to know about Unicode in Python 3 Antoine Pitrou <antoine@python.org> - 2014-05-16 11:50 +0000
                Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-16 06:20 -0700
            Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:38 +0000
          Re: Everything you did not want to know about Unicode in Python 3 Robin Becker <robin@reportlab.com> - 2014-05-14 16:30 +0100
          Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:56 -0600


#71493

From Skip Montanaro <skip@pobox.com>
Date 2014-05-13 09:02 -0500
Message-ID <mailman.9969.1399989757.18130.python-list@python.org>
In reply to #71450

On Tue, May 13, 2014 at 3:38 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

BITD, when I still maintained and developed Musi-Cal (an early online
concert calendar, long since gone), I faced a challenge when I first
started encountering non-ASCII band names and cities. I resisted UTF-8.
After all, if I printed a string containing an "é", it came out looking like



What kind of mess was that???

I tried to ignore it, or assume Latin-1 would cover all the bases (my first
non-ASCII inputs tended to come from Western Europe). If nothing else, at
least "é" was legible.
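The garbled output described above is not preserved in this archive, but the effect is easy to reproduce. A minimal sketch, assuming the mess came from UTF-8 bytes being displayed by a Latin-1 terminal (an assumption, not something the post states):

```python
# The character from the post, written as an escape so the source file's
# own encoding cannot interfere with the demonstration.
ORIGINAL = "\u00e9"  # é

utf8_bytes = ORIGINAL.encode("utf-8")      # two bytes: b'\xc3\xa9'
latin1_bytes = ORIGINAL.encode("latin-1")  # one byte:  b'\xe9'

# What a Latin-1 display shows when it is handed the UTF-8 bytes:
# each byte is rendered as its own character.
mojibake = utf8_bytes.decode("latin-1")    # 'Ã©'


def decodes_as_utf8(data):
    """Return True if data is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The reverse mismatch fails differently: the single Latin-1 byte is not valid UTF-8 at all, so `decodes_as_utf8(latin1_bytes)` is false rather than producing a wrong but printable character.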

Needless to say, those approaches didn't work well. After perhaps six
months or a year, I broke down and started converting everything coming in
or going out to UTF-8 at the boundaries of my system (making educated
guesses at input encodings if necessary). My life got a whole lot easier
after that. The
distinction between bytes and text didn't really matter much, certainly not
compared to the mess I had before where strings of unknown data leaked into
my system and its database.
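The convert-at-the-boundaries approach described above might look roughly like this in Python 3. This is a sketch only: the function names and the order of guessed encodings are hypothetical, and the original system predates Python 3 and would have stored UTF-8 byte strings rather than `str` objects.

```python
def decode_incoming(raw, guesses=("utf-8", "latin-1")):
    """Turn bytes from outside the system into text, trying each guess in order."""
    for encoding in guesses:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every guess failed. With latin-1 in the list this
    # cannot happen, because latin-1 maps every byte to some character.
    return raw.decode("utf-8", errors="replace")


def encode_outgoing(text):
    """Everything leaving the system is written as UTF-8."""
    return text.encode("utf-8")
```

Trying UTF-8 first matters: it rejects most non-UTF-8 input, whereas Latin-1 accepts anything and so only makes sense as a last resort.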

Skip

P.S. My apologies for the mess this message probably is. Amazing as it may
seem, Gmail in Chrome does a crappy job editing anything other than plain
text. Also, I'm surprised in this day and age that common tools like Gnome
Terminal have little or no encoding support. I wound up having to pop up
urxvt to get an encodings-flexible terminal emulator...


#71536

From wxjmfauth@gmail.com
Date 2014-05-14 00:00 -0700
Message-ID <09bacccb-5a82-4c3d-93ee-ec9dee1a2588@googlegroups.com>
In reply to #71449

On Tuesday, 13 May 2014 10:08:45 UTC+2, Johannes Bauer wrote:
> On 13.05.2014 03:18, Steven D'Aprano wrote:
>
> > Armin Ronacher is an extremely experienced and knowledgeable Python
> > developer, and a Python core developer. He might be wrong, but he's not
> > *obviously* wrong.
>
> He's correct about file name encodings. Which can be fixed really easily
> without messing everything up (sys.argv binary variant, open accepting
> binary filenames). But that he suggests that Go would be superior:
>
> > Which uses an even simpler model than Python 2: everything is a byte string. The assumed encoding is UTF-8. End of the story.
>
> Is just a horrible idea. An obviously horrible idea, too.
>
> Having dealt with the UTF-8 problems on Python2 I can safely say that I
> never, never ever want to go back to that freaky hell. If I deal with
> strings, I want to be able to sanely manipulate them and I want to be
> sure that after manipulation they're still valid strings. Manipulating
> the bytes representation of unicode data just doesn't work.
>
> And I'm very very glad that some people felt the same way and
> implemented a sane, consistent way of dealing with Unicode in Python3.
> It's one of the reasons why I switched to Py3 very early and I love it.
>
> Cheers,
> Johannes
>
> --
> >> Where exactly did you predict the quake again?
> > At least not publicly!
> Ah, the latest and to this day most brilliant stunt of our great
> cosmologists: the secret prediction.
>  - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>

===========

A Rob 'Commander' Pike will never put utf16 and
ebcdic in the same basket, when discussing coding
of characters.

jmf
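Johannes Bauer's quoted point, that manipulating the bytes representation of Unicode data does not work, can be illustrated with a short sketch. The sample word and helper names are arbitrary choices for this illustration, not anything from the thread.

```python
WORD = "na\u00efve"  # "naïve": five characters, one of them non-ASCII


def truncate_bytes(text, count):
    """Cut the UTF-8 encoding of text after count bytes (the byte-string approach)."""
    return text.encode("utf-8")[:count]


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Slicing the text keeps whole characters.
text_prefix = WORD[:3]

# Slicing the bytes at the same index cuts the two-byte sequence for "ï"
# in half, leaving b'na\xc3', which is no longer valid UTF-8.
byte_prefix = truncate_bytes(WORD, 3)
```

The text slice is still a valid string after the operation; the byte slice is not, which is the guarantee Johannes says he wants from string manipulation.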


#71470

From alister <alister.nospam.ware@ntlworld.com>
Date 2014-05-13 11:19 +0000
Message-ID <Kcncv.34412$GL7.204@fx10.am4>
In reply to #71416

On Tue, 13 May 2014 01:18:35 +0000, Steven D'Aprano wrote:

> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
> 
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>> 
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>> 
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>> 
>> Surely those example programs are not the Pythonic way to do things or
>> am I missing something?
> 
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
> 
I am only an amateur Python coder, which is why I asked whether I am missing
something.

I could not see any reason to use the shutil module if all the program
does is open a file, read it and then print it.

Is it Python that causes the issue, the shutil module, or just the OS not
liking the data it is being sent?

An explanation of why this approach is taken would be much appreciated.



-- 
Revenge is a form of nostalgia.


#71502

From Ian Kelly <ian.g.kelly@gmail.com>
Date 2014-05-13 10:08 -0600
Message-ID <mailman.9974.1399999114.18130.python-list@python.org>
In reply to #71470

On Tue, May 13, 2014 at 5:19 AM, alister
<alister.nospam.ware@ntlworld.com> wrote:
> I am only an amateur Python coder, which is why I asked whether I am
> missing something.
>
> I could not see any reason to use the shutil module if all the program
> does is open a file, read it and then print it.
>
> Is it Python that causes the issue, the shutil module, or just the OS not
> liking the data it is being sent?
>
> An explanation of why this approach is taken would be much appreciated.

No, that part is perfectly fine.  This is exactly what the shutil
module is meant for: providing shell-like operations.  Although in
this case the copyfileobj function is quite simple (have yourself a
look at the source -- it just reads from one file and writes to the
other in a loop), in general the Pythonic thing is to avoid
reinventing the wheel.

And since it's so simple, it shouldn't be hard to see that the use of
the shutil module has nothing to do with the Unicode woes here.  The
crux of the issue is that a general-purpose command like cat typically
can't know the encoding of its input and can't assume anything about
it. In fact, there may not even be an encoding; cat can be used with
binary data.  The only non-destructive approach then is to copy the
binary data straight from the source to the destination with no
decoding steps at all, and trust the user to ensure that the
destination will be able to accommodate the source encoding.  Because
Python 3 presents stdin and stdout as text streams however, it makes
them more difficult to use with binary data, which is why Armin sets
up all that extra code to make sure his file objects are binary.
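The non-destructive, bytes-only copy described above can be sketched in a few lines. This is not Armin's code, only a minimal illustration of the same idea; the function names are made up, and it assumes the standard streams expose a `.buffer` attribute, which they normally do unless they have been replaced.

```python
import shutil
import sys


def cat_binary(source, destination):
    """Copy bytes from source to destination with no decoding step at all."""
    shutil.copyfileobj(source, destination)


def main():
    # sys.stdin and sys.stdout are text streams in Python 3; their .buffer
    # attributes are the underlying binary streams.
    cat_binary(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Because nothing is ever decoded, input that is not valid in any particular encoding passes through unchanged. Running `main()` from a script gives a cat-like filter over standard input.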


#71519

From Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date 2014-05-14 00:10 +0000
Message-ID <5372b493$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to #71502

On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:

> Because Python 3 presents stdin and stdout as text streams however, it
> makes them more difficult to use with binary data, which is why Armin
> sets up all that extra code to make sure his file objects are binary.

What surprises me is how hard that is. Surely there's a simpler way to 
open stdin and stdout in binary mode? If not, there ought to be.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#71524

From Ethan Furman <ethan@stoneleaf.us>
Date 2014-05-13 17:53 -0700
Message-ID <mailman.9986.1400031716.18130.python-list@python.org>
In reply to #71519

On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
>
>> Because Python 3 presents stdin and stdout as text streams however, it
>> makes them more difficult to use with binary data, which is why Armin
>> sets up all that extra code to make sure his file objects are binary.
>
> What surprises me is how hard that is. Surely there's a simpler way to
> open stdin and stdout in binary mode? If not, there ought to be.

Somebody already posted this:

https://docs.python.org/3/library/sys.html#sys.stdin

which talks about .detach().
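For reference, a small sketch of the two routes to a binary stream, shown on a throwaway `io.TextIOWrapper` rather than on the real `sys.stdin` so that it can be run safely. The helper names are hypothetical.

```python
import io


def binary_view(text_stream):
    """Return the binary stream underneath a text stream, leaving the wrapper usable."""
    return text_stream.buffer


def detach_demo(payload):
    """Wrap payload in a text stream, detach it, and read the raw bytes back."""
    wrapper = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
    raw = wrapper.detach()  # after this call the text wrapper is unusable
    return raw.read()
```

Applied to the standard streams, the same two options are `sys.stdin.buffer` (the text stream keeps working) and `sys.stdin.detach()` (it does not).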

--
~Ethan~


#71579

From Terry Reedy <tjreedy@udel.edu>
Date 2014-05-14 17:47 -0400
Message-ID <mailman.10023.1400104047.18130.python-list@python.org>
In reply to #71519

On 5/13/2014 8:53 PM, Ethan Furman wrote:
> On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
>> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
>>
>>> Because Python 3 presents stdin and stdout as text streams however, it
>>> makes them more difficult to use with binary data, which is why Armin
>>> sets up all that extra code to make sure his file objects are binary.
>>
>> What surprises me is how hard that is. Surely there's a simpler way to
>> open stdin and stdout in binary mode? If not, there ought to be.
>
> Somebody already posted this:
>
> https://docs.python.org/3/library/sys.html#sys.stdin
>
> which talks about .detach().

I sent a message to Armin about this.

-- 
Terry Jan Reedy


#71658

From Antoine Pitrou <antoine@python.org>
Date 2014-05-16 11:50 +0000
Message-ID <mailman.10068.1400241064.18130.python-list@python.org>
In reply to #71519

Terry Reedy <tjreedy <at> udel.edu> writes:
> 
> On 5/13/2014 8:53 PM, Ethan Furman wrote:
> > On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> >> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
> >>
> >>> Because Python 3 presents stdin and stdout as text streams however, it
> >>> makes them more difficult to use with binary data, which is why Armin
> >>> sets up all that extra code to make sure his file objects are binary.
> >>
> >> What surprises me is how hard that is. Surely there's a simpler way to
> >> open stdin and stdout in binary mode? If not, there ought to be.
> >
> > Somebody already posted this:
> >
> > https://docs.python.org/3/library/sys.html#sys.stdin
> >
> > which talks about .detach().
> 
> I sent a message to Armin about this.

And the documentation has now been fixed:
http://bugs.python.org/issue21364

So something *can* come out of a python-list rantfest, it seems.

Regards

Antoine.


#71661

From wxjmfauth@gmail.com
Date 2014-05-16 06:20 -0700
Message-ID <bbb2b11c-2a17-4eaa-8a84-f69d1750fa6e@googlegroups.com>
In reply to #71658

On Friday, 16 May 2014 13:50:47 UTC+2, Antoine Pitrou wrote:
> Terry Reedy <tjreedy <at> udel.edu> writes:
> >
> > On 5/13/2014 8:53 PM, Ethan Furman wrote:
> > > On 05/13/2014 05:10 PM, Steven D'Aprano wrote:
> > >> On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:
> > >>
> > >>> Because Python 3 presents stdin and stdout as text streams however, it
> > >>> makes them more difficult to use with binary data, which is why Armin
> > >>> sets up all that extra code to make sure his file objects are binary.
> > >>
> > >> What surprises me is how hard that is. Surely there's a simpler way to
> > >> open stdin and stdout in binary mode? If not, there ought to be.
> > >
> > > Somebody already posted this:
> > >
> > > https://docs.python.org/3/library/sys.html#sys.stdin
> > >
> > > which talks about .detach().
> >
> > I sent a message to Armin about this.
>
> And the documentation has now been fixed:
> http://bugs.python.org/issue21364
>
> So something *can* come out of a python-list rantfest, it seems.
>
> Regards
>
> Antoine.

======

http://www.unicode.org/

With my best regards.


#71550

From alister <alister.nospam.ware@ntlworld.com>
Date 2014-05-14 12:38 +0000
Message-ID <itJcv.77900$dT1.47871@fx12.am4>
In reply to #71502

On Tue, 13 May 2014 10:08:42 -0600, Ian Kelly wrote:

> On Tue, May 13, 2014 at 5:19 AM, alister
> <alister.nospam.ware@ntlworld.com> wrote:
>> I am only an amateur Python coder, which is why I asked whether I am
>> missing something.
>>
>> I could not see any reason to use the shutil module if all the program
>> does is open a file, read it and then print it.
>>
>> Is it Python that causes the issue, the shutil module, or just the OS
>> not liking the data it is being sent?
>>
>> An explanation of why this approach is taken would be much appreciated.
> 
> No, that part is perfectly fine.  This is exactly what the shutil module
> is meant for: providing shell-like operations.  Although in this case
> the copyfileobj function is quite simple (have yourself a look at the
> source -- it just reads from one file and writes to the other in a
> loop), in general the Pythonic thing is to avoid reinventing the wheel.
> 
> And since it's so simple, it shouldn't be hard to see that the use of
> the shutil module has nothing to do with the Unicode woes here.  The
> crux of the issue is that a general-purpose command like cat typically
> can't know the encoding of its input and can't assume anything about it.
> In fact, there may not even be an encoding; cat can be used with binary
> data.  The only non-destructive approach then is to copy the binary data
> straight from the source to the destination with no decoding steps at
> all, and trust the user to ensure that the destination will be able to
> accommodate the source encoding.  Because Python 3 presents stdin and
> stdout as text streams however, it makes them more difficult to use with
> binary data, which is why Armin sets up all that extra code to make sure
> his file objects are binary.

I think I understand that, in which case I owe Armin an apology. This
certainly sounds like a failing in Python's handling of stdout.



-- 
Get it up, keep it up... LINUX: Viagra for the PC.
   
   -- Chris Abbey


#71565

From Robin Becker <robin@reportlab.com>
Date 2014-05-14 16:30 +0100
Message-ID <mailman.10010.1400081443.18130.python-list@python.org>
In reply to #71470

On 13/05/2014 17:08, Ian Kelly wrote:
.........
>
> And since it's so simple, it shouldn't be hard to see that the use of
> the shutil module has nothing to do with the Unicode woes here.  The
> crux of the issue is that a general-purpose command like cat typically
> can't know the encoding of its input and can't assume anything about
> it. In fact, there may not even be an encoding; cat can be used with
> binary data.  The only non-destructive approach then is to copy the
> binary data straight from the source to the destination with no
> decoding steps at all, and trust the user to ensure that the
> destination will be able to accommodate the source encoding.  Because
> Python 3 presents stdin and stdout as text streams however, it makes
> them more difficult to use with binary data, which is why Armin sets
> up all that extra code to make sure his file objects are binary.
>
Doesn't this issue also come up wherever bytes are being read, i.e. in sockets,
pipe file handles etc.? Some sources may have well-defined encodings and so allow
use of unicode strings, but surely not all. I imagine all of the problems
associated with a broken encoding promise for stdin can also occur with sockets
and other sources, i.e. error messages failing to be printable etc. Since bytes
in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str),
using bytes everywhere has its own problems.
-- 
Robin Becker


#71567

From Ian Kelly <ian.g.kelly@gmail.com>
Date 2014-05-14 09:56 -0600
Message-ID <mailman.10012.1400083063.18130.python-list@python.org>
In reply to #71470

On Wed, May 14, 2014 at 9:30 AM, Robin Becker <robin@reportlab.com> wrote:
> Doesn't this issue also come up wherever bytes are being read ie in sockets,
> pipe file handles etc? Some sources may have well defined encodings and so
> allow use of unicode strings but surely not all. I imagine all of the
> problems associated with a broken encoding promise for stdin can also occur
> with sockets & other sources ie error messages failing to be printable etc
> etc. Since bytes in Python 3 are not equivalent to the old str (Python 3
> bytes != Python 2 str) using bytes everywhere has its own problems.

Sockets send and receive bytes, and pipes created by the subprocess
module are opened in binary mode.  Pipes inherited as stdin are still
assumed to be unicode, though.
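The first two claims above are easy to check. A minimal sketch, assuming a Python new enough to have `subprocess.run` (3.5 or later, so newer than this thread); the helper names are hypothetical:

```python
import socket
import subprocess
import sys


def bytes_from_child():
    """Run a child Python process and capture its stdout through a pipe."""
    # The child writes two bytes that are not valid UTF-8 text.
    child_code = "import sys; sys.stdout.buffer.write(b'\\xff\\x00')"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        check=True,
    )
    # No text mode was requested, so result.stdout is a bytes object.
    return result.stdout


def bytes_over_socket(payload):
    """Send a small payload across a connected socket pair and read it back."""
    left, right = socket.socketpair()
    try:
        left.sendall(payload)
        return right.recv(len(payload))
    finally:
        left.close()
        right.close()
```

In both cases the program receives `bytes` and has to decide for itself whether, and how, to decode them. That is the opposite of the inherited stdin case, where a decoding is chosen before the program sees any data.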
