
unicode to human readable format

Started by	tomasz.kaczorek@gmail.com
First post	2013-12-22 04:24 -0800
Last post	2013-12-27 06:59 -0500
Articles	8 — 7 participants


  unicode to human readable format tomasz.kaczorek@gmail.com - 2013-12-22 04:24 -0800
    Re: unicode to human readable format Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-12-22 13:30 +0100
    Re: unicode to human readable format Peter Otten <__peter__@web.de> - 2013-12-22 13:33 +0100
    Re: unicode to human readable format tomasz.kaczorek@gmail.com - 2013-12-27 02:43 -0800
      Re: unicode to human readable format Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-27 22:37 +1100
        Re: unicode to human readable format wxjmfauth@gmail.com - 2013-12-27 23:48 -0800
      Re: unicode to human readable format Ned Batchelder <ned@nedbatchelder.com> - 2013-12-27 06:47 -0500
      Re: unicode to human readable format Dave Angel <davea@davea.name> - 2013-12-27 06:59 -0500

#62528 — unicode to human readable format

From	tomasz.kaczorek@gmail.com
Date	2013-12-22 04:24 -0800
Subject	unicode to human readable format
Message-ID	<f6f3d6e9-d0f5-4c06-8f15-42aabf149473@googlegroups.com>

Hi,
I'm looking for a way to translate a unicode string into a more readable format.
I've got a string like s=[u'\u0105\u017c\u0119\u0142\u0144'] and have no idea how to change it to a human-readable format. Please help!

regards,
tomasz


#62529

From	Chris “Kwpolska” Warrick <kwpolska@gmail.com>
Date	2013-12-22 13:30 +0100
Message-ID	<mailman.4491.1387715422.18130.python-list@python.org>
In reply to	#62528

On Sun, Dec 22, 2013 at 1:24 PM,  <tomasz.kaczorek@gmail.com> wrote:
> Hi,
> I'm looking for a way to translate a unicode string into a more readable format.
> I've got a string like s=[u'\u0105\u017c\u0119\u0142\u0144'] and have no idea how to change it to a human-readable format. Please help!
>
> regards,
> tomasz

When you print the string itself, rather than the list (which shows the
list's repr), Python shows a nice human-friendly representation.

>>> s=[u'\u0105\u017c\u0119\u0142\u0144']
>>> s
[u'\u0105\u017c\u0119\u0142\u0144']
>>> s[0]
u'\u0105\u017c\u0119\u0142\u0144'
>>> print s
[u'\u0105\u017c\u0119\u0142\u0144']
>>> print s[0]
ążęłń

However, that is only the case with Python 2, as Python 3 has a
human-friendly representation in the repr, too:

>>> s=[u'\u0105\u017c\u0119\u0142\u0144']
>>> s
['ążęłń']
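
A small sketch of that difference, written for Python 3 (the name `items`
is only for illustration). As far as I know, repr() keeps printable
non-ASCII characters as they are, while the built-in ascii() gives the
escaped form that Python 2's repr showed:

```python
# Python 3 sketch: repr() keeps printable non-ASCII characters,
# ascii() escapes them the way Python 2's repr() did.
items = ['\u0105\u017c\u0119\u0142\u0144']   # hypothetical name

readable = repr(items)   # the list with the characters shown as-is
escaped = ascii(items)   # the list with \uXXXX escapes

print(escaped)           # pure ASCII, so safe on any terminal
```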

-- 
Chris “Kwpolska” Warrick <http://kwpolska.tk>
PGP: 5EAAEA16
stop html mail | always bottom-post | only UTF-8 makes sense


#62530

From	Peter Otten <__peter__@web.de>
Date	2013-12-22 13:33 +0100
Message-ID	<mailman.4492.1387715621.18130.python-list@python.org>
In reply to	#62528

tomasz.kaczorek@gmail.com wrote:

> Hi,
> I'm looking for a way to translate a unicode string into a more
> readable format. I've got a string like
> s=[u'\u0105\u017c\u0119\u0142\u0144'] and have no idea how to change it to
> a human-readable format. Please help!

No, you have a list of strings:

>>> list_of_strings = [u'\u0105\u017c\u0119\u0142\u0144']
>>> print list_of_strings
[u'\u0105\u017c\u0119\u0142\u0144']

When a list is printed, the individual items are converted to strings with
repr() to avoid ambiguous output, e.g. for strings with embedded commas.

If you want human readable strings print them individually instead of the 
whole list at once:

>>> for string in list_of_strings:
...     print string
... 
ążęłń
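
The ambiguity point can be sketched like this in Python 3 (the session
above is Python 2; the second item 'a, b' and the variable names are made
up for illustration). The list form applies repr() to each item, so the
quotes show where an item starts and ends, while joining the items gives
the plain str() text:

```python
# Python 3 sketch: the list's string form uses repr() for every item,
# joining the items uses the items' own text.
list_of_strings = ['\u0105\u017c\u0119\u0142\u0144', 'a, b']

as_list = str(list_of_strings)             # quotes kept around each item
one_per_line = "\n".join(list_of_strings)  # plain text, one item per line

# The quotes in the list form show that 'a, b' is one item, not two.
# ascii() is used here only so the demo prints on any terminal.
print(ascii(as_list))
print(ascii(one_per_line))
```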


#62786

From	tomasz.kaczorek@gmail.com
Date	2013-12-27 02:43 -0800
Message-ID	<7f0f59ef-5a9e-4c6b-832a-dfaee3ee4dcf@googlegroups.com>
In reply to	#62528

hello,
can I ask you for help? When I try to print s[0] I get the message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128).
How can I solve my problem, please?


regards,
t.


#62790

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-12-27 22:37 +1100
Message-ID	<52bd666d$0$29999$c3e8da3$5496439d@news.astraweb.com>
In reply to	#62786

tomasz.kaczorek@gmail.com wrote:

> hello,
> can I ask you for help? when I try to print s[0] I get the message:
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1:
> ordinal not in range(128). how to solve my problem, please?

What version of Python?

What operating system?

What environment are you running in? IDLE? The shell or cmd.exe? Powershell?
xterm? Something else?

Please copy and paste the complete traceback, starting from the line

    Traceback (most recent call last):

to the end.

Please print repr(s[0]) and show us the output.

-- 
Steven


#62835

From	wxjmfauth@gmail.com
Date	2013-12-27 23:48 -0800
Message-ID	<6a48f9a5-e676-4699-8dd8-ff7c43250583@googlegroups.com>
In reply to	#62790

On Friday, 27 December 2013 12:37:17 UTC+1, Steven D'Aprano wrote:
> tomasz.kaczorek@gmail.com wrote:
>
> > hello,
> > can I ask you for help? when I try to print s[0] I get the message:
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1:
> > ordinal not in range(128). how to solve my problem, please?
>
> What version of Python?
>
> What operating system?
>
> What environment are you running in? IDLE? The shell or cmd.exe? Powershell?
> xterm? Something else?
>
> Please copy and paste the complete traceback, starting from the line
>
>     Traceback (most recent call last):
>
> to the end.
>
> Please print repr(s[0]) and show us the output.


What do you expect?
The representation is - and should be -

>>> print repr(s[0])
u'\u0105\u017c\u0119\u0142\u0144'

independently of the tool one uses to process such
code.


Now, if one prints s[0], the result may, and should,
differ from tool to tool.


win console, cp850

>>> print s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>
>>>


win console, cp1252

>>> print s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>
>>>

win console, cp1250

>>> s = [u'\u0105\u017c\u0119\u0142\u0144']
>>> print s[0]
ążęłń
>>>


SciTE editor, output pane "locale", cp1252 for me.

Traceback (most recent call last):
  File "utrick.py", line 18, in <module>
    print u'\u0105\u017c\u0119\u0142\u0144'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
>Exit code: 1


SciTE editor, output pane 65001

Traceback (most recent call last):
  File "utrick.py", line 18, in <module>
    print u'\u0105\u017c\u0119\u0142\u0144'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
>Exit code: 1


Now in IDLE, on a Western European version of Windows,
one gets this

>>> print s[0]
ążęłń

Note that it prints something only by chance. It may
happen that it does not print, understand, or render
the chars at all. *This is wrong*.



An interactive interpreter I wrote for Py2.*
(full of dirty tricks).

>>> print repr(s[0])
u'\u0105\u017c\u0119\u0142\u0144'
>>> print s[0]
?????

*This is correct*, it is an expected result and it
works for all chars.



A (the) correct way to print s[0] with a console (all
platforms).

>>> import sys
>>> print s[0].encode(sys.stdout.encoding, 'replace')
?????
>>>
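
The effect of the 'replace' error handler can also be checked without a
console. This is only a sketch in Python 3 (the sessions above are
Python 2), and `encode_for` is a hypothetical helper name, not something
from the standard library. It encodes the same text for three of the
code pages tried above:

```python
# Python 3 sketch: what the 'replace' error handler does for the
# Windows code pages tried above.
text = '\u0105\u017c\u0119\u0142\u0144'

def encode_for(encoding):
    """Encode text, substituting '?' for characters the codec lacks."""
    return text.encode(encoding, 'replace')   # hypothetical helper

lossy_cp850 = encode_for('cp850')     # cp850 has none of these letters
lossy_cp1252 = encode_for('cp1252')   # neither does cp1252
exact_cp1250 = encode_for('cp1250')   # cp1250 covers all five

print(lossy_cp850, lossy_cp1252, exact_cp1250)
```

With 'replace' the output is lossy but never raises, which matches the
????? result shown above; only cp1250 round-trips the text.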


See the other thread about printing repr().


jmf


#62792

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-12-27 06:47 -0500
Message-ID	<mailman.4663.1388144838.18130.python-list@python.org>
In reply to	#62786

On 12/27/13 5:43 AM, tomasz.kaczorek@gmail.com wrote:
> hello,
> can I ask you for help? when I try to print s[0] I get the message: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128).
> how to solve my problem, please?
>
>
> regards,
> t.
>

For help with the fundamentals, you can read or watch this PyCon 
presentation:  Pragmatic Unicode, or, How Do I Stop the Pain? 
http://nedbatchelder.com/text/unipain.html

-- 
Ned Batchelder, http://nedbatchelder.com


#62793

From	Dave Angel <davea@davea.name>
Date	2013-12-27 06:59 -0500
Message-ID	<mailman.4664.1388145522.18130.python-list@python.org>
In reply to	#62786

On Fri, 27 Dec 2013 02:43:58 -0800 (PST), tomasz.kaczorek@gmail.com 
wrote:
> can I ask you for help? when I try to print s[0] I get the
> message: UnicodeEncodeError: 'ascii' codec can't encode characters in
> position 0-1: ordinal not in range(128).
> how to solve my problem, please?

First, what version of what OS, and what version of Python?

Next, what terminal are you running, or what IDE, and do you have
stdout redirected?

Finally, what does your program look like, or at least tell us the
type and repr of s[0].

Bottom line is that s[0] contains a code point that's larger than 7f
and print is convinced that your terminal can handle only ASCII.
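
A rough way to see both halves of that, sketched in Python 3 with the
string from earlier in the thread (the original poster's actual data
and environment are unknown, so treat the values as illustrative):

```python
import sys

text = '\u0105\u017c\u0119\u0142\u0144'

# Every code point here is above 0x7f, so ASCII cannot represent any of them.
above_ascii = [hex(ord(ch)) for ch in text if ord(ch) > 0x7f]

# Encoding to ASCII fails in the same way as the reported error.
try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The encoding Python believes the output stream uses; it can differ
# between a terminal, an IDE pane, and redirected output.
stream_encoding = getattr(sys.stdout, 'encoding', None)

print(above_ascii, error.reason, stream_encoding)
```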

-- 
DaveA
