
Re: the stupid encoding problem to stdout

References <4df02e04$0$1779$a729d347@news.telepac.pt>
Date 2011-06-08 20:00 -0700
Subject Re: the stupid encoding problem to stdout
From Benjamin Kaplan <benjamin.kaplan@case.edu>
Newsgroups comp.lang.python
Message-ID <mailman.40.1307588443.11593.python-list@python.org>



2011/6/8 Sérgio Monteiro Basto <sergiomb@sapo.pt>:
> hi,
> cat test.py
> #!/usr/bin/env python
> #-*- coding: utf-8 -*-
> u = u'moçambique'
> print u.encode("utf-8")
> print u
>
> chmod +x test.py
> ./test.py
> moçambique
> moçambique
>
> ./test.py > output.txt
> Traceback (most recent call last):
>  File "./test.py", line 5, in <module>
>    print u
> UnicodeEncodeError: 'ascii' codec can't encode character
> u'\xe7' in position 2: ordinal not in range(128)
>
> This is Python 2.7.
> How do I tell Python to send the same thing to stdout and to
> the file output.txt?
>
> It doesn't seem logical that the behaviour changes when the
> output is sent to a file.
>
> Thanks,
> Sérgio M. B.

That's not a terminal vs file thing. It's a "file that declares its
encoding" vs a "file that doesn't declare its encoding" thing. Your
terminal declares that it is UTF-8. So when you print a Unicode string
to your terminal, Python knows that it's supposed to turn it into
UTF-8. When you redirect the output to a file, that file doesn't
declare an encoding (sys.stdout.encoding is None). So rather than
guess which encoding you want, Python 2 defaults to the lowest common
denominator: ASCII. If you want something to be a particular encoding,
you have to encode it yourself.
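
To make that fallback concrete, here is a minimal sketch that runs on
both Python 2.7 and Python 3. The helper names declared_encoding and
encode_like_print are made up for illustration. This only approximates
what Python 2's print does with a unicode string; it is not the actual
implementation.

```python
def declared_encoding(stream):
    """Return the encoding a stream declares, or None if it declares none."""
    return getattr(stream, "encoding", None)


def encode_like_print(text, stream, fallback="ascii"):
    """Encode text with the stream's declared encoding, else the fallback.

    The "ascii" fallback mirrors Python 2's default encoding (an
    assumption stated here for the sketch; Python 3 behaves differently).
    """
    encoding = declared_encoding(stream) or fallback
    # Raises UnicodeEncodeError if the text does not fit the encoding,
    # which is the error shown in the traceback above.
    return text.encode(encoding)
```

With a stream that declares UTF-8, u"mo\xe7ambique" comes out as UTF-8
bytes. With a stream whose encoding is None, the same call raises
UnicodeEncodeError, just like ./test.py > output.txt does.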

You have a couple of choices on how to make it work:
1) Play dumb and always encode as UTF-8. The non-ASCII characters
would come out garbled if someone ran your program in a terminal with
a CP437 encoding (like cmd.exe on at least the US version of Windows),
but it would never crash.
2) Check sys.stdout.encoding. If it's None or ASCII, then encode your
unicode string with the unicode-escape codec, which substitutes an
escape sequence for every non-ASCII character.
3) Check to see if sys.stdout.isatty() and have different behavior for
terminals vs files. If you're on a terminal that doesn't declare its
encoding, encoding it as UTF-8 probably won't help. If you're writing
to a file, that might be what you want to do.
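
The three choices above could be sketched roughly like this. All
function names are hypothetical, and the code is written to run on both
Python 2.7 and Python 3. Because the input is a unicode string, option 2
uses the unicode-escape codec (string-escape only applies to byte
strings in Python 2). Falling back to escapes on a terminal that
declares no encoding is my own assumption for option 3, not the only
reasonable behavior.

```python
import codecs


def is_ascii_or_undeclared(stream):
    """True if the stream declares no encoding, or declares plain ASCII."""
    encoding = getattr(stream, "encoding", None)
    # codecs.lookup() normalizes aliases such as "ANSI_X3.4-1968" to "ascii".
    return encoding is None or codecs.lookup(encoding).name == "ascii"


def always_utf8(text):
    """Option 1: ignore the stream and always produce UTF-8 bytes."""
    return text.encode("utf-8")


def escape_if_ascii(text, stream):
    """Option 2: escape non-ASCII characters when the stream is ASCII-only."""
    if is_ascii_or_undeclared(stream):
        return text.encode("unicode-escape")
    return text.encode(stream.encoding)


def by_stream_kind(text, stream):
    """Option 3: treat terminals and files differently."""
    if stream.isatty():
        # Terminal: use its declared encoding, or escapes if it has none.
        return escape_if_ascii(text, stream)
    # File or pipe: assume UTF-8 is what the reader wants.
    return text.encode("utf-8")
```

Each function returns a byte string. In Python 2.7 you can print the
result directly, e.g. print by_stream_kind(u, sys.stdout); in Python 3
you would write it to sys.stdout.buffer instead.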



Thread

the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-09 03:18 +0100
  Re: the stupid encoding problem to stdout Ben Finney <ben+python@benfinney.id.au> - 2011-06-09 12:39 +1000
    Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-09 22:16 +0100
      Re: the stupid encoding problem to stdout Ben Finney <ben+python@benfinney.id.au> - 2011-06-10 09:19 +1000
  Re: the stupid encoding problem to stdout Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-06-08 20:00 -0700
    Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-09 22:14 +0100
      Re: the stupid encoding problem to stdout Nobody <nobody@nowhere.com> - 2011-06-09 22:46 +0100
        Re: the stupid encoding problem to stdout Terry Reedy <tjreedy@udel.edu> - 2011-06-09 20:14 -0400
        Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-10 02:11 +0100
          Re: the stupid encoding problem to stdout Ben Finney <ben+python@benfinney.id.au> - 2011-06-10 11:45 +1000
            Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-10 02:59 +0100
            Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-10 16:11 +0100
              Re: the stupid encoding problem to stdout Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-10 10:58 -0600
                Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-13 15:15 +0100
                Re: the stupid encoding problem to stdout Chris Angelico <rosuav@gmail.com> - 2011-06-14 00:49 +1000
              Re: the stupid encoding problem to stdout Chris Angelico <rosuav@gmail.com> - 2011-06-11 08:07 +1000
      Re: the stupid encoding problem to stdout "Mark Tolonen" <metolone+gmane@gmail.com> - 2011-06-09 17:57 -0700
        Re: the stupid encoding problem to stdout Sérgio Monteiro Basto <sergiomb@sapo.pt> - 2011-06-10 02:17 +0100
  Re: the stupid encoding problem to stdout Laurent Claessens <moky.math@gmail.com> - 2011-06-10 07:47 +0200
