
Unicode Objects in Tuples

Started by	Stephen Tucker <stephen_tucker@sil.org>
First post	2013-10-11 09:16 +0100
Last post	2013-10-11 17:06 +0000
Articles	2 — 2 participants


  Unicode Objects in Tuples Stephen Tucker <stephen_tucker@sil.org> - 2013-10-11 09:16 +0100
    Re: Unicode Objects in Tuples Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-11 17:06 +0000

#56657 — Unicode Objects in Tuples

From	Stephen Tucker <stephen_tucker@sil.org>
Date	2013-10-11 09:16 +0100
Subject	Unicode Objects in Tuples
Message-ID	<mailman.989.1381479460.18130.python-list@python.org>


I am using IDLE, Python 2.7.2 on Windows 7, 64-bit.

I have four questions:

1. Why is it that
     print unicode_object
displays non-ASCII characters in the unicode object correctly, whereas
     print (unicode_object, another_unicode_object)
displays non-ASCII characters in the unicode objects as escape sequences
(as repr() does)?

2. Given that this is actually *deliberately* the case (which I, at the
moment, am finding difficult to accept), what is the neatest (that is, the
most Pythonic) way to get non-ASCII characters in unicode objects in tuples
displayed correctly?

3. A similar thing happens when I write such objects and tuples to a file
opened by
     codecs.open ( ..., "utf-8")
I have also found that, even though I use  write  to send the text to the
file, unicode objects not in tuples get their non-ASCII characters sent to
the file correctly, whereas, unicode objects in tuples get their characters
sent to the file as escape sequences. Why is this the case?

4. As for question 2 above, I ask here also: What is the neatest way to get
round this?

Stephen Tucker.


#56696

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-10-11 17:06 +0000
Message-ID	<52582ffd$0$29984$c3e8da3$5496439d@news.astraweb.com>
In reply to	#56657

On Fri, 11 Oct 2013 09:16:36 +0100, Stephen Tucker wrote:

> I am using IDLE, Python 2.7.2 on Windows 7, 64-bit.
> 
> I have four questions:
> 
> 1. Why is it that
>      print unicode_object
> displays non-ASCII characters in the unicode object correctly, whereas
>      print (unicode_object, another_unicode_object)
> displays non-ASCII characters in the unicode objects as escape sequences
> (as repr() does)?

Because that is the design of Python. Printing compound objects like 
tuples, lists and dicts always uses the repr of the components. 
Otherwise, you couldn't tell the difference between (say) (23, 42) and 
("23", "42").
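To make that concrete, here is a small sketch (my own example values, not from the original post). It compares the built-in string form of two tuples with a hand-made join that uses str() on each item; the join can no longer tell the numbers from the strings.

```python
# Sketch: why containers show the repr() of their items.
numbers = (23, 42)
strings = ("23", "42")

# str() of a tuple uses repr() on each item, so the two stay distinguishable.
as_container_numbers = str(numbers)   # "(23, 42)"
as_container_strings = str(strings)   # "('23', '42')"

# Joining str() of each item by hand loses that distinction.
joined_numbers = ", ".join(str(item) for item in numbers)
joined_strings = ", ".join(str(item) for item in strings)

print(as_container_numbers)
print(as_container_strings)
print(joined_numbers == joined_strings)  # True
```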

If you want something different, you have to do it yourself.

However, having said that, it is true that the repr() of Unicode strings 
in Python 2 is rather lame. Python 3 is much better:

[steve@ando ~]$ python2.7 -c "print repr(u'∫ßδЛ')"
u'\xe2\x88\xab\xc3\x9f\xce\xb4\xd0\x9b'

[steve@ando ~]$ python3.3 -c "print(repr('∫ßδЛ'))"
'∫ßδЛ'

So if you have the opportunity to upgrade to Python 3.3, I recommend it.
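For comparison, a short Python 3 sketch (an illustration added here, assuming Python 3.3 or later). repr() keeps printable non-ASCII characters readable, while ascii() gives the fully escaped form. Encoding the same text as UTF-8 produces the byte values that appear in the Python 2 output above, which suggests that the Python 2 one-liner treated each UTF-8 byte of the command-line text as a separate character.

```python
# Python 3 only: ascii() is not a builtin in Python 2.
text = '∫ßδЛ'

readable = repr(text)              # quotes added, characters left as they are
escaped = ascii(text)              # everything outside ASCII is escaped
utf8_bytes = text.encode('utf-8')  # the raw UTF-8 encoding of the same text

print(escaped)
print(utf8_bytes)
```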

> 2. Given that this is actually *deliberately* the case (which I, at the
> moment, am finding difficult to accept), what is the neatest (that is,
> the most Pythonic) way to get non-ASCII characters in unicode objects in
> tuples displayed correctly?

I'd go with something like this helper function:

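def print_unicode(obj):
    # Print the items of a container using their unicode form, not their repr.
    if isinstance(obj, (tuple, list, set, frozenset)):
        print u', '.join(unicode(item) for item in obj)
    else:
        # Anything else is printed as a single unicode string.
        print unicode(obj)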

Adjust to taste :-)
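One possible adjustment, as a rough sketch: return the text instead of printing it, so the same helper can feed print or a file's write method. The name format_unicode, the sep parameter and the try/except shim are assumptions of mine, not part of the function above; the shim only exists so the code also runs on Python 3, where unicode is spelled str.

```python
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3, where str is already Unicode

def format_unicode(obj, sep=u', '):
    """Return obj as text, using the text form of container items, not the repr."""
    if isinstance(obj, (tuple, list, set, frozenset)):
        return sep.join(text_type(item) for item in obj)
    return text_type(obj)

# Hypothetical sample data, written with escapes to keep the source ASCII-only.
result = format_unicode((u'\u222b\xdf', u'\u03b4\u041b'))
```

The result is an ordinary Unicode string, so it can be printed directly or written to a file opened with an encoding.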

> 3. A similar thing happens when I write such objects and tuples to a
> file opened by
>      codecs.open ( ..., "utf-8")
> I have also found that, even though I use  write  to send the text to
> the file, unicode objects not in tuples get their non-ASCII characters
> sent to the file correctly, whereas, unicode objects in tuples get their
> characters sent to the file as escape sequences. Why is this the case?

Same reason. The default string converter for tuples uses the repr, which 
intentionally uses escape sequences. If you want something different, you 
can program it yourself.
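A minimal sketch of doing it yourself for the file case. It assumes the tuple holds only Unicode strings, and it uses io.open (available in Python 2.6+ and Python 3) rather than the codecs.open call from the question; both take an encoding. The sample data and the temporary file name are made up for the demonstration.

```python
import io
import os
import tempfile

pair = (u'\u222b\xdf', u'\u03b4\u041b')  # hypothetical sample data

# Join the items yourself; str(pair) or '%s' % (pair,) would give the repr form.
line = u', '.join(pair) + u'\n'

path = os.path.join(tempfile.mkdtemp(), 'out.txt')  # throwaway file for the demo
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(line)

# Read the file back to confirm the characters survived the round trip.
with io.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read()

print(read_back == line)  # True
```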

-- 
Steven

