| Started by | danielk <danielkleinad@gmail.com> |
|---|---|
| First post | 2012-11-09 09:17 -0800 |
| Last post | 2012-11-11 15:40 +0100 |
| Articles | 16 — 8 participants |
Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 09:17 -0800
Re: Printing characters outside of the ASCII range Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-09 10:34 -0700
Re: Printing characters outside of the ASCII range Andrew Berg <bahamutzero8825@gmail.com> - 2012-11-09 11:39 -0600
Re: Printing characters outside of the ASCII range Dave Angel <d@davea.name> - 2012-11-09 12:47 -0500
Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 13:17 -0800
RE: Printing characters outside of the ASCII range "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-11-09 21:34 +0000
Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 13:46 -0800
Re: Printing characters outside of the ASCII range Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-09 15:10 -0700
Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-11 05:42 -0800
Re: Printing characters outside of the ASCII range Lele Gaifax <lele@metapensiero.it> - 2012-11-11 18:09 +0100
Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-11 05:42 -0800
Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 13:46 -0800
Re: Printing characters outside of the ASCII range Andrew Berg <bahamutzero8825@gmail.com> - 2012-11-09 15:39 -0600
Re: Printing characters outside of the ASCII range danielk <danielkleinad@gmail.com> - 2012-11-09 13:17 -0800
Re: Printing characters outside of the ASCII range wxjmfauth@gmail.com - 2012-11-10 02:09 -0800
Re: Printing characters outside of the ASCII range Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-11-11 15:40 +0100
| From | danielk <danielkleinad@gmail.com> |
|---|---|
| Date | 2012-11-09 09:17 -0800 |
| Subject | Printing characters outside of the ASCII range |
| Message-ID | <3d4644f8-ab88-41c5-9a52-2a5678dd64c0@googlegroups.com> |
I'm converting an application to Python 3. The app works fine on Python 2.
Simply put, this simple one-liner:
print(chr(254))
errors out with:
Traceback (most recent call last):
File "D:\home\python\tst.py", line 1, in <module>
print(chr(254))
File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
I'm using this character as a delimiter in my application.
What do I have to do to convert this string so that it does not error out?
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-11-09 10:34 -0700 |
| Message-ID | <mailman.3505.1352482524.27098.python-list@python.org> |
| In reply to | #33037 |
On Fri, Nov 9, 2012 at 10:17 AM, danielk <danielkleinad@gmail.com> wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
> File "D:\home\python\tst.py", line 1, in <module>
> print(chr(254))
> File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?
In Python 2, chr(254) means the byte 254. In Python 3, chr(254) means the Unicode character with code point 254, which is "þ". This character does not exist in CP 437, so it fails to encode it for output. If what you really want is the byte, then use b'\xfe' or bytes([254]) instead.
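A minimal sketch of the distinction described above, assuming Python 3. The variable names are illustrative only and do not come from the original application.

```python
# In Python 3, chr() returns a one-character str (a Unicode code point),
# while bytes([n]) returns a one-byte bytes object.
char = chr(254)        # the character U+00FE
byte = bytes([254])    # the single byte 0xFE

# U+00FE has no mapping in CP437, so encoding it for a CP437 console fails.
try:
    char.encode('cp437')
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The byte 0xFE, interpreted as CP437, is a different character: U+25A0.
as_cp437 = byte.decode('cp437')
```

The same numeric value therefore names two different things, depending on whether it is treated as a code point or as a byte in a particular code page.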
| From | Andrew Berg <bahamutzero8825@gmail.com> |
|---|---|
| Date | 2012-11-09 11:39 -0600 |
| Message-ID | <mailman.3507.1352482800.27098.python-list@python.org> |
| In reply to | #33037 |
On 2012.11.09 11:17, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
> File "D:\home\python\tst.py", line 1, in <module>
> print(chr(254))
> File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?
>
That character is outside of cp437 - the default terminal encoding on many Windows systems. You will either need to change the code page to something that supports the character (if you're going to change it, you might as well change it to cp65001 since you are using 3.3), catch the error and replace the character with something that is in the current codepage (don't assume cp437; it is not the default everywhere), or use a different character completely.
If it works on Python 2, it's probably changing the character automatically to a replacement character or you were using IDLE, which is graphical and is not subject to the weird encoding system of terminals.
--
CPython 3.3.0 | Windows NT 6.1.7601.17835
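The "replace the character with something that is in the current codepage" option might be sketched as follows. The helper name `printable` and the fallback to 'ascii' are assumptions made for illustration; they are not from the thread.

```python
import sys

def printable(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Use the console's own encoding when none is given. It can be None when
    # stdout is not a real terminal, hence the final 'ascii' fallback.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # errors='replace' substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors='replace').decode(encoding)

# cp437 is passed explicitly here only to make the result reproducible.
safe = printable('abc' + chr(254) + 'def', encoding='cp437')
print(safe)
```

Because the lookup goes through `sys.stdout.encoding` by default, the helper does not hard-code cp437, which matches the advice not to assume a particular code page.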
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-11-09 12:47 -0500 |
| Message-ID | <mailman.3508.1352483285.27098.python-list@python.org> |
| In reply to | #33037 |
On 11/09/2012 12:17 PM, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
> File "D:\home\python\tst.py", line 1, in <module>
> print(chr(254))
> File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?
What character do you want? What characters does your console handle directly? What does a "delimiter" mean for your particular console?
Or are you just printing it for the fun of it, and the real purpose is for further processing, which will not go to the console?
What kind of things will it be separating? (strings, bytes ?) Clearly you originally picked it as something unlikely to occur in those elements.
When those things are combined with a separator between, how are the results going to be used? Saved to a file? Printed to console? What?
--
DaveA
| From | danielk <danielkleinad@gmail.com> |
|---|---|
| Date | 2012-11-09 13:17 -0800 |
| Message-ID | <99d5bd83-35ab-4801-b953-391c497c35bf@googlegroups.com> |
| In reply to | #33043 |
On Friday, November 9, 2012 12:48:05 PM UTC-5, Dave Angel wrote:
> On 11/09/2012 12:17 PM, danielk wrote:
> > I'm converting an application to Python 3. The app works fine on Python 2.
> >
> > Simply put, this simple one-liner:
> >
> > print(chr(254))
> >
> > errors out with:
> >
> > Traceback (most recent call last):
> > File "D:\home\python\tst.py", line 1, in <module>
> > print(chr(254))
> > File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
> >
> > I'm using this character as a delimiter in my application.
> >
> > What do I have to do to convert this string so that it does not error out?
>
> What character do you want? What characters do your console handle
> directly? What does a "delimiter" mean for your particular console?
>
> Or are you just printing it for the fun of it, and the real purpose is
> for further processing, which will not go to the console?
>
> What kind of things will it be separating? (strings, bytes ?) Clearly
> you originally picked it as something unlikely to occur in those elements.
>
> When those things are combined with a separator between, how are the
> results going to be used? Saved to a file? Printed to console? What?
>
> --
>
> DaveA
The database I'm using stores information as a 3-dimensional array. The delimiters between elements are chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for simplicity):
name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name' then the 'address' field would look like this:
addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
I use Python to connect to the database using subprocess.Popen to run a server process. Python requests 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some actions require that the server send back information in the form of records that contain those delimiters.
I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could convert those characters on the server before sending them to Python and that is what I'm probably going to do, so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
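A record laid out as described could be taken apart while it is still a bytes object, so the delimiters never have to pass through a text encoding. This is only a sketch: the names `FIELD_SEP`, `VALUE_SEP` and `parse_record`, the sample data, and the choice of latin-1 for the field text are all assumptions, since the thread does not say how the server encodes field contents.

```python
# Delimiter byte values taken from the post; the constant names are hypothetical.
FIELD_SEP = bytes([254])
VALUE_SEP = bytes([253])

def parse_record(raw, encoding='latin-1'):
    """Split a raw record into fields, each a list of decoded values."""
    return [
        [value.decode(encoding) for value in field.split(VALUE_SEP)]
        for field in raw.split(FIELD_SEP)
    ]

# Sample record invented for illustration: name, two addresses, city.
raw = b'Smith\xfe1 Main St\xfd2 Oak Ave\xfeSpringfield'
record = parse_record(raw)
```

Splitting before decoding means only the field contents become str, and how the delimiters are displayed can be decided separately.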
| From | "Prasad, Ramit" <ramit.prasad@jpmorgan.com> |
|---|---|
| Date | 2012-11-09 21:34 +0000 |
| Message-ID | <mailman.3517.1352496859.27098.python-list@python.org> |
| In reply to | #33051 |
danielk wrote:
>
> The database I'm using stores information as a 3-dimensional array. The delimiters between elements are
> chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for
> simplicity):
>
> name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
>
> The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name'
> then the 'address' field would look like this:
>
> addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
>
> I use Python to connect to the database using subprocess.Popen to run a server process. Python requests
> 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some
> actions require that the server send back information in the form of records that contain those delimiters.
>
> I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could
> convert those characters on the server before sending them to Python and that is what I'm probably going to do,
> so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
>
> I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
> know there will be characters outside of the ASCII range of 0-127?
You just need to change the string to one that is not
trying to use the ASCII codec when printing.
print(chr(253).decode('latin1')) # change latin1 to your
# chosen encoding.
ý
~Ramit
| From | danielk <danielkleinad@gmail.com> |
|---|---|
| Date | 2012-11-09 13:46 -0800 |
| Message-ID | <ba72452f-fac6-4b18-9b22-d1854eecbffb@googlegroups.com> |
| In reply to | #33054 |
On Friday, November 9, 2012 4:34:19 PM UTC-5, Prasad, Ramit wrote:
> danielk wrote:
> >
> > The database I'm using stores information as a 3-dimensional array. The delimiters between elements are
> > chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for
> > simplicity):
> >
> > name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
> >
> > The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name'
> > then the 'address' field would look like this:
> >
> > addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
> >
> > I use Python to connect to the database using subprocess.Popen to run a server process. Python requests
> > 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some
> > actions require that the server send back information in the form of records that contain those delimiters.
> >
> > I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could
> > convert those characters on the server before sending them to Python and that is what I'm probably going to do,
> > so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
> >
> > I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
> > know there will be characters outside of the ASCII range of 0-127?
>
> You just need to change the string to one that is not
> trying to use the ASCII codec when printing.
>
> print(chr(253).decode('latin1')) # change latin1 to your
> # chosen encoding.
> ý
>
> ~Ramit
D:\home\python>pytest.py
Traceback (most recent call last):
File "D:\home\python\pytest.py", line 1, in <module>
print(chr(253).decode('latin1'))
AttributeError: 'str' object has no attribute 'decode'
Do I need to import something?
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-11-09 15:10 -0700 |
| Message-ID | <mailman.3521.1352499071.27098.python-list@python.org> |
| In reply to | #33056 |
On Fri, Nov 9, 2012 at 2:46 PM, danielk <danielkleinad@gmail.com> wrote:
> D:\home\python>pytest.py
> Traceback (most recent call last):
> File "D:\home\python\pytest.py", line 1, in <module>
> print(chr(253).decode('latin1'))
> AttributeError: 'str' object has no attribute 'decode'
>
> Do I need to import something?
Ramit should have written "encode", not "decode". But the above still
would not work, because chr(253) gives you the character at *Unicode*
code point 253, not the character with CP437 ordinal 253 that your
terminal can actually print. The Unicode equivalents of those
characters are:
>>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
[8319, 178, 9632]
So these are what you would need to encode to CP437 for printing.
>>> print(chr(8319))
ⁿ
>>> print(chr(178))
²
>>> print(chr(9632))
■
That's probably not the way you want to go about printing them,
though, unless you mean to be inserting them manually. Is the data
you get from your database a string, or a bytes object? If the
former, just do:
print(data.encode('cp437'))
If the latter, then it should be printable as is, unless it is in some
other encoding than CP437.
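The mapping above can be checked in both directions with a short sketch. The sample bytes are invented for illustration. One caveat worth hedging: on Python 3, print() of a bytes object shows its b'...' literal form rather than the characters, so for display it is usually the decoded str that matters.

```python
# Sample data invented for illustration: 'a', 'b', 'c' separated by the
# three delimiter bytes 252, 253 and 254.
raw = bytes([97, 252, 98, 253, 99, 254])

# Decoding as CP437 maps the delimiter bytes to U+207F, U+00B2 and U+25A0,
# all of which a CP437 console can display.
text = raw.decode('cp437')
code_points = [ord(ch) for ch in text]

# Encoding back to CP437 round-trips to the original bytes.
round_trip = text.encode('cp437')

# This is the text print() would show for the bytes object itself.
shown = repr(round_trip)
```

Under these assumptions, `print(text)` is the call that produces the glyphs on a CP437 console, while `print(round_trip)` shows escapes such as `\xfc`.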
| From | danielk <danielkleinad@gmail.com> |
|---|---|
| Date | 2012-11-11 05:42 -0800 |
| Message-ID | <90c86fc7-a462-4a19-b883-17c64244c806@googlegroups.com> |
| In reply to | #33059 |
On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
> On Fri, Nov 9, 2012 at 2:46 PM, danielk <danielkleinad@gmail.com> wrote:
> > D:\home\python>pytest.py
> > Traceback (most recent call last):
> > File "D:\home\python\pytest.py", line 1, in <module>
> > print(chr(253).decode('latin1'))
> > AttributeError: 'str' object has no attribute 'decode'
> >
> > Do I need to import something?
>
> Ramit should have written "encode", not "decode". But the above still
> would not work, because chr(253) gives you the character at *Unicode*
> code point 253, not the character with CP437 ordinal 253 that your
> terminal can actually print. The Unicode equivalents of those
> characters are:
>
> >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
> [8319, 178, 9632]
>
> So these are what you would need to encode to CP437 for printing.
>
> >>> print(chr(8319))
> ⁿ
> >>> print(chr(178))
> ²
> >>> print(chr(9632))
> ■
>
> That's probably not the way you want to go about printing them,
> though, unless you mean to be inserting them manually. Is the data
> you get from your database a string, or a bytes object? If the
> former, just do:
>
> print(data.encode('cp437'))
>
> If the latter, then it should be printable as is, unless it is in some
> other encoding than CP437.
Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.
class Pytest(str):
    def __init__(self, data = None):
        if data == None: data = ""
        self.data = data
    def __repr__(self):
        return (self.data).encode('cp437')
>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
abc²def
>>> print(p.data)
abc²def
>>> print(type(p.data))
<class 'str'>
If I change '__repr__' to '__str__' then I get:
>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)
Why is '__str__' behaving differently than '__repr__' ? I'd like to be able to use '__str__' because the result is not executable code, it's just a string of the record contents.
The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type, it said it was <class 'str'>, which I'm taking to be 'type string', or can a 'string' also be 'a string of bytes' ?
I'm trying to get my head around all this codecs/unicode stuff. I haven't had to deal with it until now but I'm determined to not let it get the best of me :-)
My goals are:
a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. So, if they know the record format then they should be able to create a database object like it does above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2012-11-11 18:09 +0100 |
| Message-ID | <mailman.3560.1352653786.27098.python-list@python.org> |
| In reply to | #33126 |
danielk <danielkleinad@gmail.com> writes:
> Ian's solution gives me what I need (thanks Ian!). But I notice a
> difference between '__str__' and '__repr__'.
>
> class Pytest(str):
>     def __init__(self, data = None):
>         if data == None: data = ""
>         self.data = data
>
>     def __repr__(self):
>         return (self.data).encode('cp437')
>
The correct way of comparing with None (and in general with
“singletons”) is with the “is” operator, not with “==”.
> If I change '__repr__' to '__str__' then I get:
>
> >>> import pytest
> >>> p = pytest.Pytest("abc" + chr(178) + "def")
> >>> print(p)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: __str__ returned non-string (type bytes)
In Python 3.3 there is one kind of string, the one that under Python 2.x
was called “unicode”. When you encode such a string with a specific
encoding you obtain a plain “bytes array”. No surprise that the
__str__() method complains, it's called like that for a reason :)
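One possible reading of why the __repr__ version seemed to work, sketched with hypothetical classes `Record` and `GoodRecord` standing in for the quoted Pytest class: for a str subclass that does not define its own __str__, print() uses the inherited string value and never calls __repr__ at all, so the bytes return value goes unnoticed until repr() is called explicitly.

```python
class Record(str):
    """Hypothetical stand-in for the Pytest class quoted above."""
    def __repr__(self):
        # Returning bytes is an error here too, but only once repr() runs.
        return self.encode('cp437')

p = Record('abc' + chr(178) + 'def')

# str() of a str subclass without its own __str__ does not consult
# __repr__, so print(p) never reaches the faulty method.
as_text = str(p)

try:
    repr(p)
    repr_failed = False
except TypeError:
    repr_failed = True

class GoodRecord:
    """Variant whose __str__ returns a str, as Python 3 requires."""
    def __init__(self, data=None):
        self.data = '' if data is None else data
    def __str__(self):
        return self.data   # a str, not the result of .encode()
```

In this sketch both methods have the same requirement: whatever they return must be a str, and encoding to bytes is left to the output stream.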
> I'm trying to get my head around all this codecs/unicode stuff. I
> haven't had to deal with it until now but I'm determined to not let it
> get the best of me :-)
Two good readings on the subject:
- http://nedbatchelder.com/text/unipain.html
- http://www.joelonsoftware.com/articles/Unicode.html
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it | -- Fortunato Depero, 1929.
| From | Andrew Berg <bahamutzero8825@gmail.com> |
|---|---|
| Date | 2012-11-09 15:39 -0600 |
| Message-ID | <mailman.3518.1352497164.27098.python-list@python.org> |
| In reply to | #33051 |
On 2012.11.09 15:17, danielk wrote:
> I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
You don't. It's raising that exception because the terminal cannot display that character, not because it's using the wrong encoding. As Ian mentioned, chr() on Python 2 and chr() on Python 3 return two different things. I'm not very familiar with the oddities of Python 2, but I suspect sending bytes to the terminal could work since that is what chr() on Python 2 returns.
--
CPython 3.3.0 | Windows NT 6.1.7601.17835
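For experimenting with what bytes actually reach the output stream, one option is to wrap a binary stream in a text layer with an explicit encoding and a lenient error handler. In this sketch an `io.BytesIO` object stands in for a real binary stream such as `sys.stdout.buffer`; that substitution is an assumption made so the example runs anywhere, and the variable names are illustrative.

```python
import io

# Stand-in for a binary stream such as sys.stdout.buffer.
binary_out = io.BytesIO()

# Text layer with an explicit encoding. errors='replace' writes '?' for
# characters the encoding lacks, and newline='\n' disables platform-specific
# newline translation so the result is the same on every system.
text_out = io.TextIOWrapper(binary_out, encoding='cp437',
                            errors='replace', newline='\n')

# chr(178) exists in CP437 (byte 0xFD); chr(254) does not and becomes '?'.
print('abc' + chr(178) + chr(254), file=text_out)
text_out.flush()

written = binary_out.getvalue()
```

This does not make the terminal able to display a character it lacks. It only controls which bytes are sent and what happens to characters that cannot be encoded.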
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-10 02:09 -0800 |
| Message-ID | <8a3c7c1a-5e87-4195-8fa1-c6b5a447ae7f@googlegroups.com> |
| In reply to | #33037 |
On Friday, 9 November 2012 18:17:54 UTC+1, danielk wrote:
> I'm converting an application to Python 3. The app works fine on Python 2.
>
> Simply put, this simple one-liner:
>
> print(chr(254))
>
> errors out with:
>
> Traceback (most recent call last):
> File "D:\home\python\tst.py", line 1, in <module>
> print(chr(254))
> File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
>
> I'm using this character as a delimiter in my application.
>
> What do I have to do to convert this string so that it does not error out?
-----
There is nothing wrong in having the character with
the code point 0xfe in the cp437 coding scheme as
a delimiter.
If it is coming from a byte string, you should
decode it properly
>>> b'=\xfe=\xfe='.decode('cp437')
'=■=■='
or you can use directly the unicode equivalent
>>> '=\u25a0=\u25a0='
'=■=■='
That's for "input". For "output" see:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/c29f2f7f5a4962e8#
The choice of that character as a delimiter is not wrong.
It's a little bit unfortunate, because it falls high in
the "unicode table".
>>> import fourbiunicode as fu
>>> fu.UnicodeBlock('\u25a0')
'Geometric Shapes'
>>>
>>> fu.UnicodeBlock(b'\xfe'.decode('cp437'))
'Geometric Shapes'
(Another form of explanation)
jmf
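The `fourbiunicode` module used above is not part of the standard library, so the session cannot be reproduced as shown. A similar check, assuming only the standard `unicodedata` module, looks up the character names instead of the block names:

```python
import unicodedata

# Decode each delimiter byte as CP437 and report the resulting code point
# and its official Unicode character name.
for byte_value in (252, 253, 254):
    ch = bytes([byte_value]).decode('cp437')
    print(byte_value, 'U+%04X' % ord(ch), unicodedata.name(ch))

names = [unicodedata.name(bytes([b]).decode('cp437')) for b in (252, 253, 254)]
```

The names make it easy to see that the three delimiter bytes land in quite different parts of the Unicode table once decoded.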
| From | Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> |
|---|---|
| Date | 2012-11-11 15:40 +0100 |
| Message-ID | <k7odci$2e3$1@r03.glglgl.gl> |
| In reply to | #33037 |
On 09.11.2012 18:17, danielk wrote:
> I'm using this character as a delimiter in my application.
Then you probably use the *byte* 254 as opposed to the *character* 254. So it might be better to either switch to byte strings, or output the representation of the string instead of itself.
So do print(repr(chr(254))) or, for byte strings, print(bytes([254])).
Thomas
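A small sketch comparing the representations involved, assuming Python 3. One point to be careful about: repr() of a str keeps printable non-ASCII characters as they are, so the built-in ascii() may be the safer choice when the console cannot show the character.

```python
delimiter = chr(254)

# repr() keeps the printable character U+00FE inside the quotes, so printing
# this on a CP437 console can still raise UnicodeEncodeError.
as_repr = repr(delimiter)

# ascii() escapes every non-ASCII character, giving output any console can show.
as_ascii = ascii(delimiter)

# The repr of a bytes object is always pure ASCII.
as_bytes = repr(bytes([254]))

print(as_ascii, as_bytes)
```

Both `as_ascii` and `as_bytes` contain only ASCII characters, which is why they print regardless of the terminal's code page.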