comp.lang.python, thread #32908 (unrolled)
| Started by | Anders <aschneiderman@asha.org> |
|---|---|
| First post | 2012-11-07 14:17 -0800 |
| Last post | 2012-11-08 21:30 -0600 |
| Articles | 23 (20 on this page), 9 participants |
Right solution to unicode error? Anders <aschneiderman@asha.org> - 2012-11-07 14:17 -0800
RE: Right solution to unicode error? "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-11-07 23:07 +0000
Re: Right solution to unicode error? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-11-07 23:27 +0000
Re: Right solution to unicode error? Andrew Berg <bahamutzero8825@gmail.com> - 2012-11-07 17:51 -0600
Re: Right solution to unicode error? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-07 23:53 +0000
Re: Right solution to unicode error? Hans Mulder <hansmu@xs4all.nl> - 2012-11-08 12:40 +0100
Re: Right solution to unicode error? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-11-08 00:44 +0000
Re: Right solution to unicode error? wxjmfauth@gmail.com - 2012-11-08 03:01 -0800
RE: Right solution to unicode error? Anders Schneiderman <ASchneiderman@asha.org> - 2012-11-08 09:00 -0500
Re: Right solution to unicode error? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-11-08 14:06 +0000
Re: Right solution to unicode error? wxjmfauth@gmail.com - 2012-11-08 07:05 -0800
Re: Right solution to unicode error? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-11-08 18:32 +0000
Re: Right solution to unicode error? wxjmfauth@gmail.com - 2012-11-08 11:30 -0800
Re: Right solution to unicode error? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-08 11:48 -0700
Re: Right solution to unicode error? wxjmfauth@gmail.com - 2012-11-08 11:54 -0800
Re: Right solution to unicode error? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-08 13:41 -0700
Re: Right solution to unicode error? wxjmfauth@gmail.com - 2012-11-09 02:06 -0800
RE: Right solution to unicode error? "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-11-08 20:54 +0000
Re: Right solution to unicode error? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-08 14:07 -0700
Re: Right solution to unicode error? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-11-08 21:37 +0000
Re: Right solution to unicode error? Andrew Berg <bahamutzero8825@gmail.com> - 2012-11-08 21:30 -0600
| From | Anders <aschneiderman@asha.org> |
|---|---|
| Date | 2012-11-07 14:17 -0800 |
| Subject | Right solution to unicode error? |
| Message-ID | <09a3d20b-5871-47f4-9218-df119698e405@m4g2000yqf.googlegroups.com> |
I've run into a Unicode error, and despite doing some googling, I
can't figure out the right way to fix it. I have a Python 2.6 script
that reads my Outlook 2010 task list. I'm able to read the tasks from
Outlook and store them as a list of objects without a hitch. But when
I try to print the tasks' subjects, one of the tasks is generating an
error:
Traceback (most recent call last):
  File "outlook_tasks.py", line 66, in <module>
    my_tasks.dump_today_tasks()
  File "C:\Users\Anders\code\Task List\tasks.py", line 29, in dump_today_tasks
    print task.subject
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 42: ordinal not in range(128)
(where task.subject was previously assigned the value of
task.Subject, aka the Subject property of an Outlook 2010 TaskItem)
From what I understand from reading online, the error is telling me
that the subject line contains an en dash and that Python is trying
to convert it to ASCII and failing (as it should).
Here's where I'm getting stuck. In the code above I was just printing
the subject so I can see whether the script is working properly.
Ultimately what I want to do is parse the tasks I'm interested in and
then create an HTML file containing those tasks. Given that, what's
the best way to fix this problem?
BTW, if there's a clear description of the best solution for this
particular problem – i.e., where I want to ultimately display the
results as HTML – please feel free to refer me to the link. I tried
reading a number of docs on the web but still feel pretty lost.
Thanks,
Anders
| From | "Prasad, Ramit" <ramit.prasad@jpmorgan.com> |
|---|---|
| Date | 2012-11-07 23:07 +0000 |
| Message-ID | <mailman.3400.1352329734.27098.python-list@python.org> |
| In reply to | #32908 |
Anders wrote:
> I've run into a Unicode error, and despite doing some googling, I
> can't figure out the right way to fix it. [...]
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 42: ordinal not in range(128)
> [...]
> Ultimately what I want to do is parse the tasks I'm interested in and
> then create an HTML file containing those tasks. Given that, what's
> the best way to fix this problem?
> [...]
You can always encode in a non-ASCII codec:
`print task.subject.encode(<encoding>)`
where <encoding> is something that supports the characters you want,
e.g. utf-8 or cp1252 (latin1 will not work for this particular
character, since it has no en dash).
The list of built-in codecs can be found at:
http://docs.python.org/library/codecs.html#standard-encodings
~Ramit
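Whether a given codec "supports the characters you want" is easy to check directly. The sketch below is Python 3 (in the OP's Python 2.6 the same `.encode()` calls exist on `unicode` objects); the subject string and the `try_encode` helper are made up for illustration. It shows that ASCII and latin-1 both lack U+2013, while cp1252 and UTF-8 can represent it.

```python
# Python 3 sketch; the subject string is a made-up stand-in for task.Subject.
subject = "Review draft \u2013 due Friday"

def try_encode(text, encoding):
    """Return text encoded with encoding, or None if a character is missing."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for enc in ("ascii", "latin-1", "cp1252", "utf-8"):
    result = try_encode(subject, enc)
    print(enc, "->", "cannot encode" if result is None else result)
```

Note that the bytes you get still have to be interpreted with the same encoding by whatever reads them (console, file, browser).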
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2012-11-07 23:27 +0000 |
| Message-ID | <mailman.3406.1352330840.27098.python-list@python.org> |
| In reply to | #32908 |
On 7 November 2012 22:17, Anders <aschneiderman@asha.org> wrote:
>
> Traceback (most recent call last):
> File "outlook_tasks.py", line 66, in <module>
> my_tasks.dump_today_tasks()
> File "C:\Users\Anders\code\Task List\tasks.py", line 29, in
> dump_today_tasks
> print task.subject
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 42: ordinal not in range(128)
>
> Here's where I'm getting stuck. In the code above I was just printing
> the subject so I can see whether the script is working properly.
> Ultimately what I want to do is parse the tasks I'm interested in and
> then create an HTML file containing those tasks. Given that, what's
> the best way to fix this problem?
Are you using cmd.exe (the standard Windows terminal)? If so, it does not
support unicode, and Python is telling you that it cannot encode the
string in a way that can be understood by your terminal. You can try
using chcp to set the code page to something that works with your script.
If you are only printing it for debugging purposes, you can just print
the repr() of the string, which will be ASCII and will come out fine in
your terminal. If you want to write it to an HTML file, you should encode
the string with whatever encoding (probably utf-8) you use in the HTML
file. If you really just want your script to be able to print unicode
characters, then you need to use something other than cmd.exe (such as
IDLE).
Oscar
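Both suggestions (ASCII-safe debug output, and encoding once when writing the HTML file) can be sketched in Python 3 as follows. One caveat: in Python 3, `repr()` no longer escapes printable non-ASCII characters, so `ascii()` is the closer equivalent of Python 2's `repr()` on a `unicode` object. The helper name `write_tasks_html`, the sample subjects and the output file name are hypothetical.

```python
import html
import os
import tempfile

def write_tasks_html(subjects, path):
    """Write task subjects as an HTML list in a UTF-8 file (hypothetical helper)."""
    # Escape &, < and > so a subject cannot break the markup.
    items = "\n".join("  <li>{}</li>".format(html.escape(s)) for s in subjects)
    page = (
        "<!DOCTYPE html>\n"
        '<html>\n<head><meta charset="utf-8"><title>Tasks</title></head>\n'
        "<body>\n<ul>\n" + items + "\n</ul>\n</body>\n</html>\n"
    )
    # Encode once, at the file boundary, with the charset the page declares.
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)

# Hypothetical subjects standing in for the Outlook TaskItem.Subject values.
subjects = ["Review draft \u2013 due Friday", "Call Bob & Alice"]

# Debug output: ascii() escapes non-ASCII characters, so the console
# encoding does not matter.
for s in subjects:
    print(ascii(s))

out_path = os.path.join(tempfile.gettempdir(), "tasks.html")
write_tasks_html(subjects, out_path)
```

The design choice is to keep text as Unicode inside the program and only turn it into bytes at the edge, where the target encoding is known.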
| From | Andrew Berg <bahamutzero8825@gmail.com> |
|---|---|
| Date | 2012-11-07 17:51 -0600 |
| Message-ID | <mailman.3408.1352332281.27098.python-list@python.org> |
| In reply to | #32908 |
On 2012.11.07 17:27, Oscar Benjamin wrote:
> Are you using cmd.exe (standard Windows terminal)? If so, it does not
> support unicode
Actually, it does. Code page 65001 is UTF-8. I know that doesn't help
the OP since Python versions below 3.3 don't support cp65001, but I
think it's important to point out that the Windows command line system
(it is not unique to cmd) does in fact support Unicode.
-- 
CPython 3.3.0 | Windows NT 6.1.7601.17835
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-11-07 23:53 +0000 |
| Message-ID | <509af48d$0$29980$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #32908 |
On Wed, 07 Nov 2012 14:17:42 -0800, Anders wrote:
> I've run into a Unicode error, and despite doing some googling, I can't
> figure out the right way to fix it. I have a Python 2.6 script that
> reads my Outlook 2010 task list. I'm able to read the tasks from Outlook
> and store them as a list of objects without a hitch. But when I try to
> print the tasks' subjects, one of the tasks is generating an error:
>
> Traceback (most recent call last):
> File "outlook_tasks.py", line 66, in <module>
> my_tasks.dump_today_tasks()
> File "C:\Users\Anders\code\Task List\tasks.py", line 29, in
> dump_today_tasks
> print task.subject
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 42: ordinal not in range(128)
This error confuses me. Is that an exact copy and paste of the error, or
have you edited it or reconstructed it? Because it seems to me that if
task.subject is a unicode string, as it appears to be, calling print on
it should succeed:
py> s = u'ABC\u2013DEF'
py> print s
ABC–DEF
What does type(task.subject) return?
-- 
Steven
| From | Hans Mulder <hansmu@xs4all.nl> |
|---|---|
| Date | 2012-11-08 12:40 +0100 |
| Message-ID | <509b9a1b$0$6841$e4fe514c@news2.news.xs4all.nl> |
| In reply to | #32921 |
On 8/11/12 00:53:49, Steven D'Aprano wrote:
> This error confuses me. Is that an exact copy and paste of the error, or
> have you edited it or reconstructed it? Because it seems to me that if
> task.subject is a unicode string, as it appears to be, calling print on
> it should succeed:
>
> py> s = u'ABC\u2013DEF'
> py> print s
> ABC–DEF
That would depend on whether Python thinks sys.stdout can handle UTF-8.
For example, on my Mac OS X box:
$ python2.6 -c 'print u"abc\u2013def"'
abc–def
$ python2.6 -c 'print u"abc\u2013def"' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3: ordinal not in range(128)
This is because Python knows that my terminal is capable of handling
UTF-8, but it has no idea whether the program at the other end of a
pipe has that ability, so it falls back to ASCII when sys.stdout goes
to a pipe.
Apparently the OP has a terminal that doesn't handle UTF-8, or one that
Python doesn't know about.
Hope this helps,
-- 
HansM
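Since the right encoding depends on where stdout goes, one defensive option is to look up the stream's encoding at print time and escape whatever it cannot represent. The Python 3 sketch below does that; `safe_print` is a hypothetical helper, and the ASCII-only `io.TextIOWrapper` simply simulates a pipe like the `| cat` example.

```python
import io
import sys

def safe_print(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent.

    Hypothetical helper. If the stream reports no encoding (as Python 2 did
    for pipes), it assumes ASCII.
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the target encoding: unencodable characters become
    # backslash escapes such as \u2013 instead of raising UnicodeEncodeError.
    safe = text.encode(encoding, "backslashreplace").decode(encoding)
    print(safe, file=stream)

# Simulate a pipe that only accepts ASCII.
raw = io.BytesIO()
ascii_pipe = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
safe_print("abc\u2013def", ascii_pipe)
ascii_pipe.flush()
print(raw.getvalue())  # b'abc\\u2013def\n'
```

For a whole process, the `PYTHONIOENCODING` environment variable (which accepts an `encoding:errorhandler` form) is an alternative worth checking against the documentation for your Python version.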
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2012-11-08 00:44 +0000 |
| Message-ID | <mailman.3415.1352335468.27098.python-list@python.org> |
| In reply to | #32908 |
On 7 November 2012 23:51, Andrew Berg <bahamutzero8825@gmail.com> wrote:
> On 2012.11.07 17:27, Oscar Benjamin wrote:
>> Are you using cmd.exe (standard Windows terminal)? If so, it does not
>> support unicode
> Actually, it does. Code page 65001 is UTF-8. I know that doesn't help
> the OP since Python versions below 3.3 don't support cp65001, but I
> think it's important to point out that the Windows command line system
> (it is not unique to cmd) does in fact support Unicode.
I have tried to use code page 65001 and it didn't work for me even if
I did use a version of Python (possibly 3.3 alpha) that claimed to
support it. It turned out that there were other Windows-related problems
with using the code page, so I had to do something like
chcp 65001 && python myscript.py && chcp 2521
(It was important for all those commands to be on the same line.)
I'm not on Windows right now and I can't remember all the details, but
I seem to remember that even with that awkwardness and changing the font
it still didn't actually work. If you know how to make it work, I'd be
interested to know.
Oscar
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-08 03:01 -0800 |
| Message-ID | <b2e373bd-7a62-415d-ba18-9d834bb4821b@googlegroups.com> |
| In reply to | #32908 |
On Wednesday, 7 November 2012 23:17:42 UTC+1, Anders wrote:
> I've run into a Unicode error, and despite doing some googling, I
> can't figure out the right way to fix it. I have a Python 2.6 script
> that reads my Outlook 2010 task list. [...]
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 42: ordinal not in range(128)
> [...]
> Thanks,
> Anders
----------
The problem is not on the Python side, nor is it specific
to Python. It is a matter of the "coding of
characters".
1) Unicode is an abstract entity; it has to be encoded
for the system/device that will host it.
Using Python:
<unicode>.encode(host_coding)
2) The host_coding scheme may not contain the
character (glyph/grapheme) corresponding to the
"unicode character". In that case there are 2 possible
solutions: "ignore" it or "replace" it with a
substitution character.
Using Python:
<unicode>.encode(host_coding, "ignore")
<unicode>.encode(host_coding, "replace")
3) Detecting the host_coding is the most difficult
task. Either you have to hard-code it or you
may expect Python to find it via sys.stdout.encoding.
4) Due to the nature of Unicode, this is the only
way to do it correctly.
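Besides "ignore" and "replace", the standard error handlers include "xmlcharrefreplace", which is particularly relevant to the OP's HTML goal: each unencodable character becomes a numeric character reference that a browser renders as the original character. A small Python 3 comparison on the string used in the examples (written with escapes) might look like this. Note that "xmlcharrefreplace" does not escape `&`, `<` or `>`; that still needs something like `html.escape` first.

```python
# 'éléphant–abc' from the examples, written with escapes.
text = "\u00e9l\u00e9phant\u2013abc"

for handler in ("strict", "ignore", "replace", "xmlcharrefreplace"):
    try:
        print(handler, text.encode("ascii", handler))
    except UnicodeEncodeError as exc:
        # "strict" (the default) refuses to encode unrepresentable characters.
        print(handler, "raises UnicodeEncodeError:", exc.reason)
```

With "xmlcharrefreplace", é becomes `&#233;` and the en dash becomes `&#8211;`.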
Examples, some expected to fail and some not.
Mainly Py3, but it doesn't matter. Note: in Py3, encoding
creates a byte string, which has to be
decoded to produce a native (unicode) string, here
with cp1252.
Py2
>>> u'éléphant\u2013abc'.encode('ascii')
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
u'éléphant\u2013abc'.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>>> print(u'éléphant\u2013abc'.encode('cp1252'))
éléphant–abc
>>>
Py3
>>> 'éléphant\u2013abc'.encode('ascii')
Traceback (most recent call last):
File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)
>>> 'éléphant\u2013abc'.encode('ascii', 'ignore')
b'lphantabc'
>>> 'éléphant\u2013abc'.encode('ascii', 'replace')
b'?l?phant?abc'
>>> 'éléphant\u2013abc'.encode('ascii', 'ignore').decode('cp1252')
'lphantabc'
>>> 'éléphant\u2013abc'.encode('ascii', 'replace').decode('cp1252')
'?l?phant?abc'
>>>
>>> 'éléphant\u2013abc'.encode('cp1252').decode('cp1252')
'éléphant–abc'
>>> sys.stdout.encoding
'cp1252'
>>> 'éléphant\u2013abc'.encode(sys.stdout.encoding).decode('cp1252')
'éléphant–abc'
etc
jmf
| From | Anders Schneiderman <ASchneiderman@asha.org> |
|---|---|
| Date | 2012-11-08 09:00 -0500 |
| Message-ID | <mailman.3435.1352383315.27098.python-list@python.org> |
| In reply to | #32908 |
Thanks, Oscar and Ramit! This is exactly what I was looking for.
Anders
> -----Original Message-----
> From: Oscar Benjamin [mailto:oscar.j.benjamin@gmail.com]
> Sent: Wednesday, November 07, 2012 6:27 PM
> To: Anders Schneiderman
> Cc: python-list@python.org
> Subject: Re: Right solution to unicode error?
>
> [...]
> If you are only printing it for debugging purposes you can just print the repr()
> of the string which will be ascii and will come out fine in your terminal. If you
> want to write it to a html file you should encode the string with whatever
> encoding (probably utf-8) you use in the html file. [...]
>
> Oscar
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2012-11-08 14:06 +0000 |
| Message-ID | <mailman.3436.1352383603.27098.python-list@python.org> |
| In reply to | #32908 |
On 8 November 2012 00:44, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> On 7 November 2012 23:51, Andrew Berg <bahamutzero8825@gmail.com> wrote:
>> On 2012.11.07 17:27, Oscar Benjamin wrote:
>>> Are you using cmd.exe (standard Windows terminal)? If so, it does not
>>> support unicode
>> Actually, it does. Code page 65001 is UTF-8. I know that doesn't help
>> the OP since Python versions below 3.3 don't support cp65001, but I
>> think it's important to point out that the Windows command line system
>> (it is not unique to cmd) does in fact support Unicode.
>
> I have tried to use code page 65001 and it didn't work for me even if
> I did use a version of Python (possibly 3.3 alpha) that claimed to
> support it.
I stand corrected. I've just checked and codepage 65001 does work in
cmd.exe (on this machine):
O:\>Q:\tools\Python33\python -c print('abc\u2013def')
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "Q:\tools\Python33\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 3: character maps to <undefined>
O:\>chcp 65001
Active code page: 65001
O:\>Q:\tools\Python33\python -c print('abc\u2013def')
abc-def
O:\>Q:\tools\Python33\python -c print('\u03b1')
α
It would be a lot better though if it just worked straight away
without me needing to set the code page (like the terminal in every
other OS I use).
Oscar
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-08 07:05 -0800 |
| Message-ID | <65910cea-f145-409c-a579-9f0cda499546@googlegroups.com> |
| In reply to | #32951 |
On Thursday, 8 November 2012 15:07:23 UTC+1, Oscar Benjamin wrote:
> On 8 November 2012 00:44, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> [...]
> I stand corrected. I've just checked and codepage 65001 does work in
> cmd.exe (on this machine):
> [...]
> It would be a lot better though if it just worked straight away
> without me needing to set the code page (like the terminal in every
> other OS I use).
>
> Oscar
----------
It *WORKS* straight away. The problem is that
people do not wish to use Unicode correctly
(e.g. Mulder's example).
Read points 1) and 4) in my previous post.
Unicode, and the coding of characters in general,
has nothing to do with OSes or programming languages.
jmf
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2012-11-08 18:32 +0000 |
| Message-ID | <mailman.3457.1352399533.27098.python-list@python.org> |
| In reply to | #32955 |
On 8 November 2012 15:05, <wxjmfauth@gmail.com> wrote:
> On Thursday, 8 November 2012 15:07:23 UTC+1, Oscar Benjamin wrote:
>> On 8 November 2012 00:44, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
>> > On 7 November 2012 23:51, Andrew Berg <bahamutzero8825@gmail.com> wrote:
>> >> On 2012.11.07 17:27, Oscar Benjamin wrote:
>>
>> >>> Are you using cmd.exe (standard Windows terminal)? If so, it does not
>> >>> support unicode
>>
>> >> Actually, it does. Code page 65001 is UTF-8. I know that doesn't help
>> >> the OP since Python versions below 3.3 don't support cp65001, but I
>> >> think it's important to point out that the Windows command line system
>> >> (it is not unique to cmd) does in fact support Unicode.
>>
>> > I have tried to use code page 65001 and it didn't work for me even if
>> > I did use a version of Python (possibly 3.3 alpha) that claimed to
>> > support it.
>>
>> I stand corrected. I've just checked and codepage 65001 does work in
>> cmd.exe (on this machine):
>>
>> O:\>chcp 65001
>> Active code page: 65001
>>
>> O:\>Q:\tools\Python33\python -c print('abc\u2013def')
>> abc-def
>>
>> O:\>Q:\tools\Python33\python -c print('\u03b1')
>> α
>>
>> It would be a lot better though if it just worked straight away
>> without me needing to set the code page (like the terminal in every
>> other OS I use).
>
> It *WORKS* straight away. The problem is that
> people do not wish to use unicode correctly
> (eg. Mulder's example).
> Read the point 1) and 4) in my previous post.
>
> Unicode and in general the coding of the characters
> have nothing to do with the os's or programming languages.
I don't know what you mean that it works "straight away".
The default code page on my machine is cp850.
O:\>chcp
Active code page: 850
cp850 doesn't understand utf-8. It just prints garbage:
O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
╬▒
Using the correct encoding doesn't help:
O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('cp850'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "Q:\tools\Python33\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 0: character maps to <undefined>
O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "Q:\tools\Python33\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 0: character maps to <undefined>
If I want the other characters to work I need to change the code page:
O:\>chcp 65001
Active code page: 65001
O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
α
O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
α
Oscar
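The "garbage" in the cp850 case is reproducible without a Windows console: it is exactly what you get when the two UTF-8 bytes of α are interpreted as cp850, and cp850 itself has no α at all. A short Python 3 sketch:

```python
alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA

# UTF-8 bytes written to a console that interprets them as cp850.
utf8_bytes = alpha.encode("utf-8")
misread = utf8_bytes.decode("cp850")
print(utf8_bytes, ascii(misread))  # b'\xce\xb1' '\u256c\u2592', i.e. '╬▒'

# cp850 has no alpha, so encoding to it fails; "replace" shows a '?'.
try:
    alpha.encode("cp850")
except UnicodeEncodeError as exc:
    print("cp850 cannot encode", ascii(exc.object[exc.start:exc.end]))
print(alpha.encode("cp850", "replace"))  # b'?'
```

In other words, the bytes are only meaningful together with the encoding the reader (here, the console code page) assumes.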
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-08 11:30 -0800 |
| Message-ID | <08b2c7a7-a5df-45cb-a1b8-1aebe01d46e7@googlegroups.com> |
| In reply to | #32970 |
On Thursday, 8 November 2012 19:32:14 UTC+1, Oscar Benjamin wrote:
> On 8 November 2012 15:05, <wxjmfauth@gmail.com> wrote:
> > It *WORKS* straight away. The problem is that
> > people do not wish to use unicode correctly
> > (eg. Mulder's example).
> [...]
> I don't know what you mean that it works "straight away".
>
> The default code page on my machine is cp850.
> [...]
> If I want the other characters to work I need to change the code page:
> [...]
> Oscar
You are confusing two things: the coding of the
characters, and the set of characters (glyphs/graphemes)
of a coding scheme.
It is always possible to encode a Unicode string safely, but
the target coding may not contain the character.
Take a look at the output of this "special" interactive
interpreter, where the host coding (sys.stdout.encoding)
can be changed on the fly.
>>> s = 'éléphant\u2013abcéœ€'
>>> sys.stdout.encoding
'<unicode>'
>>> s
'éléphant–abcéœ€'
>>>
>>> sys.stdout.encoding = 'cp1252'
>>> s.encode('cp1252')
'éléphant–abcéœ€'
>>> sys.stdout.encoding = 'cp850'
>>> s.encode('cp850')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
  File "C:\Python32\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 8: character maps to <undefined>
>>> # but
>>> s.encode('cp850', 'replace')
'éléphant?abcé??'
>>>
>>> sys.stdout.encoding = 'utf-8'
>>> s
'éléphant–abcéœ€'
>>> s.encode('utf-8')
'éléphant–abcéœ€'
>>>
>>> sys.stdout.encoding = 'utf-16-le' <<<<<<<<<
>>> s
' é l é p h a n t a b c é S ¬ '
>>> s.encode('utf-16-le')
'éléphant–abcéœ€'
<<<<<<<<<<< some cheating here due to the mail system; it really looks like this.
jmf
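The distinction between encoding a string and a coding scheme's repertoire can be checked character by character. The Python 3 sketch below uses a hypothetical helper, `unencodable`, on a sample string in the spirit of the session above (written with escapes): cp1252 can represent every character, while cp850 lacks the en dash, œ and €, which matches the three '?' produced by `s.encode('cp850', 'replace')`.

```python
def unencodable(text, encoding):
    """Return the characters in text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

s = "\u00e9l\u00e9phant\u2013abc\u00e9\u0153\u20ac"  # 'éléphant–abcéœ€'
for enc in ("cp1252", "cp850", "utf-8"):
    print(enc, [ascii(c) for c in unencodable(s, enc)])
```

Encoding one character at a time is slow for large texts, but it keeps the sketch simple and reports every missing character rather than only the first.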
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-08 11:30 -0800 |
| Message-ID | <mailman.3461.1352403047.27098.python-list@python.org> |
| In reply to | #32970 |
On Thursday, 8 November 2012 19:32:14 UTC+1, Oscar Benjamin wrote:
> On 8 November 2012 15:05, <wxjmfauth@gmail.com> wrote:
> > On Thursday, 8 November 2012 15:07:23 UTC+1, Oscar Benjamin wrote:
> >> On 8 November 2012 00:44, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> >> > On 7 November 2012 23:51, Andrew Berg <bahamutzero8825@gmail.com> wrote:
> >> >> On 2012.11.07 17:27, Oscar Benjamin wrote:
> >>
> >> >>> Are you using cmd.exe (standard Windows terminal)? If so, it does not
> >> >>> support unicode
> >>
> >> >> Actually, it does. Code page 65001 is UTF-8. I know that doesn't help
> >> >> the OP since Python versions below 3.3 don't support cp65001, but I
> >> >> think it's important to point out that the Windows command line system
> >> >> (it is not unique to cmd) does in fact support Unicode.
> >>
> >> > I have tried to use code page 65001 and it didn't work for me even if
> >> > I did use a version of Python (possibly 3.3 alpha) that claimed to
> >> > support it.
> >>
> >> I stand corrected. I've just checked and codepage 65001 does work in
> >> cmd.exe (on this machine):
> >>
> >> O:\>chcp 65001
> >> Active code page: 65001
> >>
> >> O:\>Q:\tools\Python33\python -c print('abc\u2013def')
> >> abc-def
> >>
> >> O:\>Q:\tools\Python33\python -c print('\u03b1')
> >> α
> >>
> >> It would be a lot better though if it just worked straight away
> >> without me needing to set the code page (like the terminal in every
> >> other OS I use).
> >
> > It *WORKS* straight away. The problem is that
> > people do not wish to use unicode correctly
> > (eg. Mulder's example).
> > Read the point 1) and 4) in my previous post.
> >
> > Unicode and in general the coding of the characters
> > have nothing to do with the os's or programming languages.
>
> I don't know what you mean that it works "straight away".
>
> The default code page on my machine is cp850.
>
> O:\>chcp
> Active code page: 850
>
> cp850 doesn't understand utf-8. It just prints garbage:
>
> O:\>Q:\tools\Python33\python -c "import sys;
> sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
> ╬▒
>
> Using the correct encoding doesn't help:
>
> O:\>Q:\tools\Python33\python -c "import sys;
>
> sys.stdout.buffer.write('\u03b1\n'.encode('cp850'))"
>
> Traceback (most recent call last):
>
> File "<string>", line 1, in <module>
>
> File "Q:\tools\Python33\lib\encodings\cp850.py", line 12, in encode
>
> return codecs.charmap_encode(input,errors,encoding_map)
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in
>
> position 0: character maps to
>
> <undefined>
>
>
>
> O:\>Q:\tools\Python33\python -c "import sys;
>
> sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.en
>
> coding))"
>
> Traceback (most recent call last):
>
> File "<string>", line 1, in <module>
>
> File "Q:\tools\Python33\lib\encodings\cp850.py", line 12, in encode
>
> return codecs.charmap_encode(input,errors,encoding_map)
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in
>
> position 0: character maps to
>
> <undefined>
>
>
>
> If I want the other characters to work I need to change the code page:
>
>
>
> O:\>chcp 65001
>
> Active code page: 65001
>
>
>
> O:\>Q:\tools\Python33\python -c "import sys;
>
> sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
>
> α
>
>
>
> O:\>Q:\tools\Python33\python -c "import sys;
>
> sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.en
>
> coding))"
>
> α
>
>
>
>
>
> Oscar
You are confusing two things. The coding of the
characters and the set of the characters (glyphes/graphemes)
of a coding scheme.
It is always possible to encode safely an unicode, but
the target coding may not contain the character.
Take a look at the output of this "special" interactive
interpreter" where the host coding (sys.stdout.encoding)
can be change on the fly.
>>> s = 'éléphant\u2013abc需'
>>> sys.stdout.encoding
'<unicode>'
>>> s
'éléphant–abc需'
>>>
>>> sys.stdout.encoding = 'cp1252'
>>> s.encode('cp1252')
'éléphant–abc需'
>>> sys.stdout.encoding = 'cp850'
>>> s.encode('cp850')
Traceback (most recent call last):
File "<eta last command>", line 1, in <module>
File "C:\Python32\lib\encodings\cp850.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013'
in position 8: character maps to <undefined>
>>> # but
>>> s.encode('cp850', 'replace')
'éléphant?abcé??'
>>>
>>> sys.stdout.encoding = 'utf-8'
>>> s
'éléphant–abc需'
>>> s.encode('utf-8')
'éléphant–abc需'
>>>
>>> sys.stdout.encoding = 'utf-16-le' <<<<<<<<<
>>> s
' é l é p h a n t a b c é S ¬ '
>>> s.encode('utf-16-le')
'éléphant–abc需'
<<<<<<<<<<< some cheating here do to the mail system, it really looks like this.
jmf
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-11-08 11:48 -0700 |
| Message-ID | <mailman.3459.1352400535.27098.python-list@python.org> |
| In reply to | #32955 |
On Thu, Nov 8, 2012 at 11:32 AM, Oscar Benjamin
<oscar.j.benjamin@gmail.com> wrote:
> If I want the other characters to work I need to change the code page:
>
> O:\>chcp 65001
> Active code page: 65001
>
> O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
> α
>
> O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
> α
I find that I also need to change the font. With the default font,
printing '\u2013' gives me:
–
The only alternative font option I have in Windows XP is Lucida
Console, which at least works correctly, although it seems to be
lacking a lot of glyphs.
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-08 11:54 -0800 |
| Message-ID | <a0073458-3b60-4c19-909d-c3d6dda7dccc@googlegroups.com> |
| In reply to | #32972 |
On Thursday, 8 November 2012 19:49:24 UTC+1, Ian wrote:
> On Thu, Nov 8, 2012 at 11:32 AM, Oscar Benjamin
> <oscar.j.benjamin@gmail.com> wrote:
> > If I want the other characters to work I need to change the code page:
> >
> > O:\>chcp 65001
> > Active code page: 65001
> >
> > O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
> > α
> >
> > O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
> > α
>
> I find that I also need to change the font. With the default font,
> printing '\u2013' gives me:
>
> –
>
> The only alternative font option I have in Windows XP is Lucida
> Console, which at least works correctly, although it seems to be
> lacking a lot of glyphs.
--------
Font has nothing to do here.
You are "simply" wrongly encoding your "unicode".
>>> '\u2013'
'–'
>>> '\u2013'.encode('utf-8')
b'\xe2\x80\x93'
>>> '\u2013'.encode('utf-8').decode('cp1252')
'–'
jmf
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-11-08 13:41 -0700 |
| Message-ID | <mailman.3465.1352407330.27098.python-list@python.org> |
| In reply to | #32976 |
On Thu, Nov 8, 2012 at 12:54 PM, <wxjmfauth@gmail.com> wrote:
> Font has nothing to do here.
> You are "simply" wrongly encoding your "unicode".
>
>>>> '\u2013'
> '–'
>>>> '\u2013'.encode('utf-8')
> b'\xe2\x80\x93'
>>>> '\u2013'.encode('utf-8').decode('cp1252')
> '–'
No, it seriously is the font. This is what I get using the default
("Raster") font:
C:\>chcp 65001
Active code page: 65001
C:\>c:\python33\python
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u2013'
'–'
>>> import sys
>>> sys.stdout.buffer.write('\u2013\n'.encode('utf-8'))
–
4
I should note here that the characters copied and pasted do not
correspond to the glyphs actually displayed in my terminal window. In
the terminal window I actually see:
ΓÇô
If I change the font to Lucida Console and run the *exact same code*,
I get this:
C:\>chcp 65001
Active code page: 65001
C:\>c:\python33\python
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u2013'
'–'
>>> import sys
>>> sys.stdout.buffer.write('\u2013\n'.encode('utf-8'))
–
4
Why is the font important? I have no idea. Blame Microsoft.
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-11-09 02:06 -0800 |
| Message-ID | <65d2286f-78dc-4eb8-945c-d15fb41a8232@googlegroups.com> |
| In reply to | #32980 |
On Thursday, 8 November 2012 21:42:58 UTC+1, Ian wrote:
> On Thu, Nov 8, 2012 at 12:54 PM, <wxjmfauth@gmail.com> wrote:
> > Font has nothing to do here.
> > You are "simply" wrongly encoding your "unicode".
> >
> >>>> '\u2013'
> > '–'
> >>>> '\u2013'.encode('utf-8')
> > b'\xe2\x80\x93'
> >>>> '\u2013'.encode('utf-8').decode('cp1252')
> > '–'
>
> No, it seriously is the font. This is what I get using the default
> ("Raster") font:
>
> C:\>chcp 65001
> Active code page: 65001
>
> C:\>c:\python33\python
> Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> '\u2013'
> '–'
> >>> import sys
> >>> sys.stdout.buffer.write('\u2013\n'.encode('utf-8'))
> –
> 4
>
> I should note here that the characters copied and pasted do not
> correspond to the glyphs actually displayed in my terminal window. In
> the terminal window I actually see:
>
> ΓÇô
>
> If I change the font to Lucida Console and run the *exact same code*,
> I get this:
>
> C:\>chcp 65001
> Active code page: 65001
>
> C:\>c:\python33\python
> Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> '\u2013'
> '–'
> >>> import sys
> >>> sys.stdout.buffer.write('\u2013\n'.encode('utf-8'))
> –
> 4
>
> Why is the font important? I have no idea. Blame Microsoft.
---------
If you see something like 'ΓÇô', look at it in
Unicode nomenclature:
>>> import unicodedata as ud
>>> for c in 'ΓÇô':
... ud.name(c)
...
'GREEK CAPITAL LETTER GAMMA'
'LATIN CAPITAL LETTER C WITH CEDILLA'
'LATIN SMALL LETTER O WITH CIRCUMFLEX'
That is a sign that "cp437" is involved somewhere:
>>> '\u2013'.encode('utf-8').decode('cp437')
'ΓÇô'
This is on Windows 7. I do not remember ever having a "coding
of the characters" issue on XP.
jmf
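The same reasoning can be turned into a small diagnostic: encode the character you meant as UTF-8, decode those bytes with each candidate code page, and see which one reproduces the garbage on screen. A sketch only; `guess_misdecoding` and the candidate list are assumptions for illustration, not part of any library:

```python
# An assumed shortlist of common DOS/Windows code pages to try.
CANDIDATE_CODEPAGES = ("cp437", "cp850", "cp1252", "latin-1")


def guess_misdecoding(observed, intended, encoding="utf-8",
                      candidates=CANDIDATE_CODEPAGES):
    """Return the code pages that turn `intended`, encoded with
    `encoding`, into the garbage string `observed`."""
    data = intended.encode(encoding)
    matches = []
    for codepage in candidates:
        try:
            decoded = data.decode(codepage)
        except UnicodeDecodeError:
            # Some code pages (e.g. cp1252) leave a few byte values undefined.
            continue
        if decoded == observed:
            matches.append(codepage)
    return matches


# 'ΓÇô' is what the default-font console showed for U+2013 EN DASH.
print(guess_misdecoding("\u0393\u00c7\u00f4", "\u2013"))  # ['cp437']
# 'â€“' is what the copied-and-pasted text looked like.
print(guess_misdecoding("\u00e2\u20ac\u201c", "\u2013"))  # ['cp1252']
```

The bytes E2 80 93 read as cp437 give 'ΓÇô', while cp850 would give 'ÔÇô', so the console output points at cp437 specifically.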
| From | "Prasad, Ramit" <ramit.prasad@jpmorgan.com> |
|---|---|
| Date | 2012-11-08 20:54 +0000 |
| Message-ID | <mailman.3466.1352408089.27098.python-list@python.org> |
| In reply to | #32976 |
wxjmfauth@gmail.com wrote:
>
> On Thursday, 8 November 2012 19:49:24 UTC+1, Ian wrote:
> > On Thu, Nov 8, 2012 at 11:32 AM, Oscar Benjamin
> > <oscar.j.benjamin@gmail.com> wrote:
> > > If I want the other characters to work I need to change the code page:
> > >
> > > O:\>chcp 65001
> > > Active code page: 65001
> > >
> > > O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
> > > α
> > >
> > > O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
> > > α
> >
> > I find that I also need to change the font. With the default font,
> > printing '\u2013' gives me:
> > –
> >
> > The only alternative font option I have in Windows XP is Lucida
> > Console, which at least works correctly, although it seems to be
> > lacking a lot of glyphs.
>
> --------
>
> Font has nothing to do here.
> You are "simply" wrongly encoding your "unicode".
>
Why would font not matter? Unicode is the abstract definition
of all characters, right? From that we map the abstract
character to a code page/set, which gives real values for an
abstract character. From that code page we then visually display
the "real value" based on the font. If that font does
not have a glyph for a specific character (or has a different
glyph), then that is a problem, and it is not related to encoding.
Unicode->code page->font
> >>> '\u2013'
> '–'
> >>> '\u2013'.encode('utf-8')
> b'\xe2\x80\x93'
> >>> '\u2013'.encode('utf-8').decode('cp1252')
> '–'
>
This is a mismatched translation between code pages; it is
not font-related but one abstraction "level" up.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-11-08 14:07 -0700 |
| Message-ID | <mailman.3467.1352408866.27098.python-list@python.org> |
| In reply to | #32976 |
On Thu, Nov 8, 2012 at 1:54 PM, Prasad, Ramit <ramit.prasad@jpmorgan.com> wrote:
> Why would font not matter? Unicode is the abstract definition
> of all characters right? From that we map the abstract
> character to a code page/set, which gives real values for an
> abstract character. From that code page we then visually display
> the "real value" based on the font. If that font does
> not have a glyph for a specific character page (or a different
> glyph) then that is a problem and not related encoding.

Usually though when the font is missing a glyph for a Unicode
character, you just get a missing glyph symbol, such as an empty
rectangle. For some reason when using the default font, cmd seemingly
ignores the active code page, skips decoding the characters, and tries
to print the individual bytes as if using code page 437.
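The distinction between a missing glyph and a code page mismatch can be illustrated with a toy model of the "Unicode->code page->font" chain. This is a sketch under stated assumptions, not a description of how cmd.exe actually works: `glyphs` is a hypothetical set standing in for a font's coverage, U+25A1 WHITE SQUARE is assumed as the missing-glyph box, and all function names are made up for illustration.

```python
MISSING_GLYPH = "\u25a1"  # assumed stand-in for the "empty rectangle"


def encode_stage(text, writer_codepage):
    # Unicode -> bytes; raises UnicodeEncodeError if the code page
    # lacks a character (an encoding problem, not a font problem).
    return text.encode(writer_codepage)


def console_decode_stage(data, console_codepage):
    # bytes -> characters, as the console interprets them.
    return data.decode(console_codepage, errors="replace")


def font_stage(chars, glyphs):
    # characters -> what gets drawn; `glyphs` is a hypothetical set of
    # characters the font can draw.
    return "".join(c if c in glyphs else MISSING_GLYPH for c in chars)


def display(text, writer_codepage, console_codepage, glyphs):
    data = encode_stage(text, writer_codepage)
    return font_stage(console_decode_stage(data, console_codepage), glyphs)


dash = "\u2013"
# Matching code pages, font lacks the glyph: one box, not mojibake.
print(ascii(display(dash, "utf-8", "utf-8", glyphs=set())))
# Matching code pages, font has the glyph: the character survives.
print(ascii(display(dash, "utf-8", "utf-8", glyphs={dash})))
# Bytes read as cp437: three wrong characters, each with a glyph.
print(ascii(display(dash, "utf-8", "cp437", glyphs=set("\u0393\u00c7\u00f4"))))
```

In this model a font gap shows up as one placeholder per character, whereas a decoding mismatch changes the number and identity of the characters; the three-character 'ΓÇô' therefore looks like the second kind of problem.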