Groups > comp.lang.python > #33037 > unrolled thread

Printing characters outside of the ASCII range

Started by	danielk <danielkleinad@gmail.com>
First post	2012-11-09 09:17 -0800
Last post	2012-11-11 15:40 +0100
Articles	16 — 8 participants


  Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 09:17 -0800
    Re: Printing characters outside of the ASCII range Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-09 10:34 -0700
    Re: Printing characters outside of the ASCII range Andrew Berg <bahamutzero8825@gmail.com> - 2012-11-09 11:39 -0600
    Re: Printing characters outside of the ASCII range Dave Angel <d@davea.name> - 2012-11-09 12:47 -0500
      Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 13:17 -0800
        RE: Printing characters outside of the ASCII range "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-11-09 21:34 +0000
          Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 13:46 -0800
            Re: Printing characters outside of the ASCII range Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-09 15:10 -0700
              Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-11 05:42 -0800
                Re: Printing characters outside of the ASCII range Lele Gaifax <lele@metapensiero.it> - 2012-11-11 18:09 +0100
        Re: Printing characters outside of the ASCII range Andrew Berg <bahamutzero8825@gmail.com> - 2012-11-09 15:39 -0600
    Re: Printing characters outside of the ASCII range wxjmfauth@gmail.com - 2012-11-10 02:09 -0800
    Re: Printing characters outside of the ASCII range Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-11-11 15:40 +0100

#33037 — Printing characters outside of the ASCII range

From	danielk <danielkleinad@gmail.com>
Date	2012-11-09 09:17 -0800
Subject	Printing characters outside of the ASCII range
Message-ID	<3d4644f8-ab88-41c5-9a52-2a5678dd64c0@googlegroups.com>

I'm converting an application to Python 3. The app works fine on Python 2.

Simply put, this simple one-liner:

print(chr(254))

errors out with:

Traceback (most recent call last):
  File "D:\home\python\tst.py", line 1, in <module>
    print(chr(254))
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>

I'm using this character as a delimiter in my application.

What do I have to do to convert this string so that it does not error out?


#33040

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-11-09 10:34 -0700
Message-ID	<mailman.3505.1352482524.27098.python-list@python.org>
In reply to	#33037

On Fri, Nov 9, 2012 at 10:17 AM, danielk <danielkleinad@gmail.com> wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
>   File "D:\home\python\tst.py", line 1, in <module>
>     print(chr(254))
>   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?

In Python 2, chr(254) means the byte 254.

In Python 3, chr(254) means the Unicode character with code point 254,
which is "þ".  This character does not exist in CP437, so Python fails to
encode it for output.

If what you really want is the byte, then use b'\xfe' or bytes([254]) instead.
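A minimal sketch of the distinction Ian describes, runnable on Python 3 (the variable names are illustrative, not from the original post). It contrasts the character chr(254) with the byte 254 and shows why only the character fails to encode to CP437:

```python
# Python 3: chr() returns a one-character str (a Unicode code point), not a byte.
ch = chr(254)
print(ch == "\u00fe")                    # True: LATIN SMALL LETTER THORN

# The raw byte 254 has to be written as a bytes object.
raw = bytes([254])
print(raw == b"\xfe")                    # True

# CP437 maps byte 0xFE to U+25A0 BLACK SQUARE, the glyph a CP437 console shows.
print(raw.decode("cp437") == "\u25a0")   # True

# The other direction fails: CP437 has no slot for U+00FE, hence the traceback.
try:
    ch.encode("cp437")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```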


#33042

From	Andrew Berg <bahamutzero8825@gmail.com>
Date	2012-11-09 11:39 -0600
Message-ID	<mailman.3507.1352482800.27098.python-list@python.org>
In reply to	#33037

On 2012.11.09 11:17, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
> 
> Simply put, this simple one-liner:
> 
> print(chr(254))
> 
> errors out with:
> 
> Traceback (most recent call last):
>   File "D:\home\python\tst.py", line 1, in <module>
>     print(chr(254))
>   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
> 
> I'm using this character as a delimiter in my application.
> 
> What do I have to do to convert this string so that it does not error out?
> 
That character is outside of cp437 - the default terminal encoding on
many Windows systems. You will either need to change the code page to
something that supports the character (if you're going to change it, you
might as well change it to cp65001 since you are using 3.3), catch the
error and replace the character with something that is in the current
codepage (don't assume cp437; it is not the default everywhere), or use
a different character completely. If it works on Python 2, it's probably
changing the character automatically to a replacement character or you
were using IDLE, which is graphical and is not subject to the weird
encoding system of terminals.
-- 
CPython 3.3.0 | Windows NT 6.1.7601.17835
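Andrew's second option (replace whatever the current code page cannot show) can be sketched with the standard error handlers of str.encode. to_console_safe is a hypothetical helper name, and CP437 is passed explicitly only for the demonstration; as he notes, the real code page varies, so by default the sketch asks sys.stdout for its encoding:

```python
import sys

def to_console_safe(text, encoding=None, errors="replace"):
    """Return text with characters the target encoding cannot represent
    substituted according to `errors` ('replace' gives '?',
    'backslashreplace' gives a \\xNN escape)."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors=errors).decode(encoding)

record = "name" + chr(254) + "city"
print(to_console_safe(record, "cp437"))                             # name?city
print(to_console_safe(record, "cp437", errors="backslashreplace"))  # name\xfecity
```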


#33043

From	Dave Angel <d@davea.name>
Date	2012-11-09 12:47 -0500
Message-ID	<mailman.3508.1352483285.27098.python-list@python.org>
In reply to	#33037

On 11/09/2012 12:17 PM, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
>   File "D:\home\python\tst.py", line 1, in <module>
>     print(chr(254))
>   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?

What character do you want?  What characters do your console handle
directly?  What does a "delimiter" mean for your particular console?

Or are you just printing it for the fun of it, and the real purpose is
for further processing, which will not go to the console?

What kind of things will it be separating?  (strings, bytes ?)  Clearly
you originally picked it as something unlikely to occur in those elements.

When those things are combined with a separator between, how are the
results going to be used?  Saved to a file?  Printed to console?  What?

-- 

DaveA


#33051

From	danielk <danielkleinad@gmail.com>
Date	2012-11-09 13:17 -0800
Message-ID	<99d5bd83-35ab-4801-b953-391c497c35bf@googlegroups.com>
In reply to	#33043

On Friday, November 9, 2012 12:48:05 PM UTC-5, Dave Angel wrote:
> On 11/09/2012 12:17 PM, danielk wrote:
> > I'm converting an application to Python 3. The app works fine on Python 2.
> >
> > Simply put, this simple one-liner:
> >
> > print(chr(254))
> >
> > errors out with:
> >
> > Traceback (most recent call last):
> >   File "D:\home\python\tst.py", line 1, in <module>
> >     print(chr(254))
> >   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> >     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
> >
> > I'm using this character as a delimiter in my application.
> >
> > What do I have to do to convert this string so that it does not error out?
>
> What character do you want?  What characters do your console handle
> directly?  What does a "delimiter" mean for your particular console?
>
> Or are you just printing it for the fun of it, and the real purpose is
> for further processing, which will not go to the console?
>
> What kind of things will it be separating?  (strings, bytes ?)  Clearly
> you originally picked it as something unlikely to occur in those elements.
>
> When those things are combined with a separator between, how are the
> results going to be used?  Saved to a file?  Printed to console?  What?
>
> -- 
>
> DaveA

The database I'm using stores information as a 3-dimensional array. The delimiters between elements are chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for simplicity):

name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip

The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name' then the 'address' field would look like this:

addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...

I use Python to connect to the database using subprocess.Popen to run a server process. Python requests 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some actions require that the server send back information in the form of records that contain those delimiters.

I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could convert those characters on the server before sending them to Python and that is what I'm probably going to do, so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.

I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
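The three-level record layout described above can be sketched as nested splits. The constant and function names are hypothetical; the roles of 254 (between fields) and 253 (between values in a field) follow the post, while 252 as the innermost separator is an assumption. The sketch also assumes the bytes coming back from the server process are decoded as Latin-1, which maps every byte 0-255 to the code point with the same number, so byte 254 becomes chr(254):

```python
# Hypothetical names; 252 as the third level is an assumption.
FIELD_MARK = chr(254)     # separates fields: name, address, city, ...
VALUE_MARK = chr(253)     # separates multiple values inside one field
SUBVALUE_MARK = chr(252)  # assumed: separates sub-values inside one value


def parse_record(raw):
    """Split a record string into fields -> values -> sub-values."""
    return [
        [value.split(SUBVALUE_MARK) for value in field.split(VALUE_MARK)]
        for field in raw.split(FIELD_MARK)
    ]


def build_record(fields):
    """Inverse of parse_record: join the nested lists back into one string."""
    return FIELD_MARK.join(
        VALUE_MARK.join(SUBVALUE_MARK.join(subvalues) for subvalues in field)
        for field in fields
    )


# Bytes as they might arrive from the server process (illustrative data).
wire = b"Smith\xfe1 Main St\xfd2 Oak Ave\xfeSpringfield"
raw = wire.decode("latin-1")    # byte 254 -> chr(254), byte 253 -> chr(253)
fields = parse_record(raw)
print(fields)  # [[['Smith']], [['1 Main St'], ['2 Oak Ave']], [['Springfield']]]
assert build_record(fields).encode("latin-1") == wire
```

Because only ASCII text and nested lists are printed, nothing here depends on the console's code page; the delimiters never reach print().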


#33054

From	"Prasad, Ramit" <ramit.prasad@jpmorgan.com>
Date	2012-11-09 21:34 +0000
Message-ID	<mailman.3517.1352496859.27098.python-list@python.org>
In reply to	#33051

danielk wrote:
> 
> The database I'm using stores information as a 3-dimensional array. The delimiters between elements are
> chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for
> simplicity):
> 
> name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
> 
> The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name'
> then the 'address' field would look like this:
> 
> addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
> 
> I use Python to connect to the database using subprocess.Popen to run a server process. Python requests
> 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some
> actions require that the server send back information in the form of records that contain those delimiters.
> 
> I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could
> convert those characters on the server before sending them to Python and that is what I'm probably going to do,
> so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
> 
> I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
> know there will be characters outside of the ASCII range of 0-127?

You just need to change the string to one that is not 
trying to use the ASCII codec when printing. 

print(chr(253).decode('latin1')) # change latin1 to your 
                                 # chosen encoding.
ý


~Ramit


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.


#33056

From	danielk <danielkleinad@gmail.com>
Date	2012-11-09 13:46 -0800
Message-ID	<ba72452f-fac6-4b18-9b22-d1854eecbffb@googlegroups.com>
In reply to	#33054

On Friday, November 9, 2012 4:34:19 PM UTC-5, Prasad, Ramit wrote:
> danielk wrote:
> >
> > The database I'm using stores information as a 3-dimensional array. The delimiters between elements are
> > chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for
> > simplicity):
> >
> > name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
> >
> > The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name'
> > then the 'address' field would look like this:
> >
> > addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
> >
> > I use Python to connect to the database using subprocess.Popen to run a server process. Python requests
> > 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some
> > actions require that the server send back information in the form of records that contain those delimiters.
> >
> > I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could
> > convert those characters on the server before sending them to Python and that is what I'm probably going to do,
> > so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
> >
> > I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
> > know there will be characters outside of the ASCII range of 0-127?
>
> You just need to change the string to one that is not
> trying to use the ASCII codec when printing.
>
> print(chr(253).decode('latin1')) # change latin1 to your
>                                  # chosen encoding.
> ý
>
> ~Ramit

D:\home\python>pytest.py
Traceback (most recent call last):
  File "D:\home\python\pytest.py", line 1, in <module>
    print(chr(253).decode('latin1'))
AttributeError: 'str' object has no attribute 'decode'

Do I need to import something?
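The AttributeError comes from the Python 3 split between text and binary types: str has encode() (text to bytes) and bytes has decode() (bytes to text). A short sketch of the round trip with the character from Ramit's example (variable names are illustrative):

```python
text = chr(253)                          # U+00FD, a str
print(hasattr(text, "decode"))           # False: str has no decode() in Python 3

data = text.encode("latin-1")            # str -> bytes
print(data)                              # b'\xfd'
print(data.decode("latin-1") == text)    # True: bytes -> str with the same codec

# The same byte means a different character under CP437.
print(data.decode("cp437") == "\u00b2")  # True: SUPERSCRIPT TWO
```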


#33059

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-11-09 15:10 -0700
Message-ID	<mailman.3521.1352499071.27098.python-list@python.org>
In reply to	#33056

On Fri, Nov 9, 2012 at 2:46 PM, danielk <danielkleinad@gmail.com> wrote:
> D:\home\python>pytest.py
> Traceback (most recent call last):
>   File "D:\home\python\pytest.py", line 1, in <module>
>     print(chr(253).decode('latin1'))
> AttributeError: 'str' object has no attribute 'decode'
>
> Do I need to import something?

Ramit should have written "encode", not "decode".  But the above still
would not work, because chr(253) gives you the character at *Unicode*
code point 253, not the character with CP437 ordinal 253 that your
terminal can actually print.  The Unicode equivalents of those
characters are:

>>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
[8319, 178, 9632]

So these are what you would need to encode to CP437 for printing.

>>> print(chr(8319))
ⁿ
>>> print(chr(178))
²
>>> print(chr(9632))
■

That's probably not the way you want to go about printing them,
though, unless you mean to be inserting them manually.  Is the data
you get from your database a string, or a bytes object?  If the
former, just do:

print(data.encode('cp437'))

If the latter, then it should be printable as is, unless it is in some
other encoding than CP437.
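One caveat worth checking before following the suggestion above: in Python 3, print() converts a bytes object with str(), so print(data.encode('cp437')) shows a b'...' literal rather than the glyphs. A sketch of the difference, writing the encoded bytes to a binary stream instead; write_encoded is a hypothetical helper, and io.BytesIO stands in for sys.stdout.buffer, the binary layer under a normal console stdout:

```python
import io

def write_encoded(text, binary_stream, encoding="cp437"):
    """Encode text once and hand the raw bytes to a binary stream."""
    binary_stream.write(text.encode(encoding))

data = "abc" + chr(178) + "def"          # U+00B2 exists in CP437 (byte 0xFD)
print(str(data.encode("cp437")))         # b'abc\xfddef'  (what print(bytes) shows)

fake_console = io.BytesIO()              # stand-in for sys.stdout.buffer
write_encoded(data, fake_console)
print(fake_console.getvalue() == b"abc\xfddef")  # True: the stream got byte 0xFD itself
```

On a real console the call would be write_encoded(data, sys.stdout.buffer) followed by sys.stdout.buffer.flush(); when mixing this with ordinary print() calls, calling sys.stdout.flush() first keeps the text and binary writes in order.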


#33126

From	danielk <danielkleinad@gmail.com>
Date	2012-11-11 05:42 -0800
Message-ID	<90c86fc7-a462-4a19-b883-17c64244c806@googlegroups.com>
In reply to	#33059

On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
> On Fri, Nov 9, 2012 at 2:46 PM, danielk <danielkleinad@gmail.com> wrote:
> > D:\home\python>pytest.py
> > Traceback (most recent call last):
> >   File "D:\home\python\pytest.py", line 1, in <module>
> >     print(chr(253).decode('latin1'))
> > AttributeError: 'str' object has no attribute 'decode'
> >
> > Do I need to import something?
>
> Ramit should have written "encode", not "decode".  But the above still
> would not work, because chr(253) gives you the character at *Unicode*
> code point 253, not the character with CP437 ordinal 253 that your
> terminal can actually print.  The Unicode equivalents of those
> characters are:
>
> >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
> [8319, 178, 9632]
>
> So these are what you would need to encode to CP437 for printing.
>
> >>> print(chr(8319))
> ⁿ
> >>> print(chr(178))
> ²
> >>> print(chr(9632))
> ■
>
> That's probably not the way you want to go about printing them,
> though, unless you mean to be inserting them manually.  Is the data
> you get from your database a string, or a bytes object?  If the
> former, just do:
>
> print(data.encode('cp437'))
>
> If the latter, then it should be printable as is, unless it is in some
> other encoding than CP437.

Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.

class Pytest(str):
    def __init__(self, data = None):
        if data == None: data = ""
        self.data = data

    def __repr__(self):
        return (self.data).encode('cp437')

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
abc²def
>>> print(p.data)
abc²def
>>> print(type(p.data))
<class 'str'>

If I change '__repr__' to '__str__' then I get:

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)

Why is '__str__' behaving differently than '__repr__' ? I'd like to be able to use '__str__' because the result is not executable code, it's just a string of the record contents.

The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type, it said it was <class 'str'>, which I'm taking to be 'type string', or can a 'string' also be 'a string of bytes' ?

I'm trying to get my head around all this codecs/unicode stuff. I haven't had to deal with it until now but I'm determined to not let it get the best of me :-)

My goals are:

a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. So, if they know the record format then they should be able to create a database object like it does above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
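One possible shape for the two goals above, sketched under assumptions: Record is a hypothetical class that holds the record as an ordinary str attribute instead of subclassing str, __str__ returns that str, __repr__ uses the ascii() built-in so the delimiters appear as \xfe-style escapes that any console can print, and to_bytes() assumes the server expects Latin-1 so that chr(254) becomes byte 254:

```python
class Record:
    """Hypothetical wrapper for one raw database record."""

    def __init__(self, data=None):
        self.data = "" if data is None else data   # compare with None using 'is'

    def __str__(self):
        return self.data                             # must be a str, never bytes

    def __repr__(self):
        # ascii() escapes every non-ASCII character, delimiters included.
        return "Record({})".format(ascii(self.data))

    def to_bytes(self, encoding="latin-1"):
        """Encoded form for sending to the server process (Latin-1 assumed)."""
        return self.data.encode(encoding)


r = Record("abc" + chr(254) + "def")
print(repr(r))          # Record('abc\xfedef')
print(r.to_bytes())     # b'abc\xfedef'
```

Keeping the display form (repr) separate from the wire form (to_bytes) means neither one depends on what the console's code page can show.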


#33131

From	Lele Gaifax <lele@metapensiero.it>
Date	2012-11-11 18:09 +0100
Message-ID	<mailman.3560.1352653786.27098.python-list@python.org>
In reply to	#33126

danielk <danielkleinad@gmail.com> writes:

> Ian's solution gives me what I need (thanks Ian!). But I notice a
> difference between '__str__' and '__repr__'.
>
> class Pytest(str):
>     def __init__(self, data = None):
>         if data == None: data = ""
>         self.data = data
>
>     def __repr__(self):
>         return (self.data).encode('cp437')
>

The correct way of comparing with None (and in general with
“singletons”) is with the “is” operator, not with “==”.

> If I change '__repr__' to '__str__' then I get:
>
>>>> import pytest
>>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>>> print(p)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: __str__ returned non-string (type bytes)

In Python 3.3 there is one kind of string, the one that under Python 2.x
was called “unicode”. When you encode such a string with a specific
encoding you obtain a plain “bytes” object. No surprise that the
__str__() method complains; it's called like that for a reason :)
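Why the __repr__ version only appeared to work can be checked directly. In this trimmed copy of danielk's class (with the None comparison changed to "is"), print(p) goes through str(p), and because Pytest subclasses str, str(p) simply returns the string contents; the bytes-returning __repr__ is only reached through repr(p), which then fails the same way __str__ did:

```python
class Pytest(str):
    def __init__(self, data=None):
        if data is None:
            data = ""
        self.data = data

    def __repr__(self):
        return self.data.encode("cp437")   # bytes: not allowed for __repr__ either

p = Pytest("abc" + chr(178) + "def")
print(str(p) == "abc\u00b2def")   # True: inherited str conversion, __repr__ never called

try:
    repr(p)
except TypeError as exc:
    print(exc)                    # __repr__ returned non-string (type bytes)
```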

> I'm trying to get my head around all this codecs/unicode stuff. I
> haven't had to deal with it until now but I'm determined to not let it
> get the best of me :-)

Two good readings on the subject:

- http://nedbatchelder.com/text/unipain.html
- http://www.joelonsoftware.com/articles/Unicode.html

ciao, lele.
-- 
nickname: Lele Gaifax | When I live on what I thought yesterday,
real: Emanuele Gaifas | I will start to fear those who copy me.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.

#33055

From	Andrew Berg <bahamutzero8825@gmail.com>
Date	2012-11-09 15:39 -0600
Message-ID	<mailman.3518.1352497164.27098.python-list@python.org>
In reply to	#33051

On 2012.11.09 15:17, danielk wrote:
> I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
You don't. It's raising that exception because the terminal cannot
display that character, not because it's using the wrong encoding. As
Ian mentioned, chr() on Python 2 and chr() on Python 3 return two
different things. I'm not very familiar with the oddities of Python 2,
but I suspect sending bytes to the terminal could work since that is
what chr() on Python 2 returns.
-- 
CPython 3.3.0 | Windows NT 6.1.7601.17835
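For completeness: the encoding and error handler of the text layer that print() writes to can be chosen, even though, as Andrew says, that does not make the terminal able to display characters outside its code page. A sketch with console_text_stream as a hypothetical helper and io.BytesIO standing in for sys.stdout.buffer. On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") adjusts the existing stream instead, and the PYTHONIOENCODING environment variable (for example cp437:replace) sets the same thing at interpreter startup:

```python
import io

def console_text_stream(binary_stream, encoding="cp437", errors="replace"):
    """Wrap a binary stream in a text layer with an explicit codec and error handler."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors,
                            write_through=True)

raw = io.BytesIO()                      # stand-in for sys.stdout.buffer
out = console_text_stream(raw)
print("a" + chr(254) + "\u25a0", end="", file=out)
out.flush()
print(raw.getvalue())                   # b'a?\xfe': U+00FE replaced, U+25A0 -> byte 0xFE
```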

#33083

From	wxjmfauth@gmail.com
Date	2012-11-10 02:09 -0800
Message-ID	<8a3c7c1a-5e87-4195-8fa1-c6b5a447ae7f@googlegroups.com>
In reply to	#33037

On Friday, November 9, 2012 18:17:54 UTC+1, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
>   File "D:\home\python\tst.py", line 1, in <module>
>     print(chr(254))
>   File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?

-----

There is nothing wrong with using the character at
code point 0xfe in the cp437 encoding as
a delimiter.

If it is coming from a byte string, you should
decode it properly

>>> b'=\xfe=\xfe='.decode('cp437')
'=■=■='

or you can use the Unicode equivalent directly

>>> '=\u25a0=\u25a0='
'=■=■='

That's for "input". For "output" see:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/c29f2f7f5a4962e8#


The choice of that character as a delimiter is not wrong.
It's a little unfortunate, though, because it falls high in
the Unicode table.

>>> import fourbiunicode as fu
>>> fu.UnicodeBlock('\u25a0')
'Geometric Shapes'
>>>
>>> fu.UnicodeBlock(b'\xfe'.decode('cp437'))
'Geometric Shapes'

(Another form of explanation)
jmf
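fourbiunicode does not appear to be a standard-library module; a rough equivalent with the standard unicodedata module looks up the characters that CP437 bytes 252-254 decode to (cp437_names is a hypothetical helper; U+25A0 to U+25FF is the Geometric Shapes block):

```python
import unicodedata

def cp437_names(raw):
    """Decode bytes as CP437 and name each resulting character."""
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch))
            for ch in raw.decode("cp437")]

for ch, code, name in cp437_names(bytes([252, 253, 254])):
    print(code, name)
# U+207F SUPERSCRIPT LATIN SMALL LETTER N
# U+00B2 SUPERSCRIPT TWO
# U+25A0 BLACK SQUARE

black_square = b"\xfe".decode("cp437")
print(0x25A0 <= ord(black_square) <= 0x25FF)   # True: inside Geometric Shapes
```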


#33129

From	Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de>
Date	2012-11-11 15:40 +0100
Message-ID	<k7odci$2e3$1@r03.glglgl.gl>
In reply to	#33037

On 09.11.2012 18:17, danielk wrote:

> I'm using this character as a delimiter in my application.

Then you probably use the *byte* 254 as opposed to the *character* 254.

So it might be better to either switch to byte strings, or output the
representation of the string instead of the string itself.

So do print(repr(chr(254))) or, for byte strings, print(bytes([254])).


Thomas
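A caveat on the first suggestion: in Python 3, repr() leaves printable non-ASCII characters such as U+00FE unescaped, so print(repr(chr(254))) can still raise UnicodeEncodeError on a CP437 console. The ascii() built-in escapes them, and printing a bytes object shows its b'...' literal. A small sketch:

```python
ch = chr(254)

print(repr(ch) == "'\u00fe'")   # True: repr() keeps the printable character as is
print(ascii(ch))                # '\xfe'   safe on any console
print(str(bytes([254])))        # b'\xfe'  what print(bytes([254])) displays
```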
