| Newsgroups | comp.lang.python |
|---|---|
| Date | 2012-11-16 15:27 -0800 |
| References | <f063ebaf-89ee-4558-a762-0241efa39dcc@googlegroups.com> <mailman.3762.1353105272.27098.python-list@python.org> |
| Subject | Re: latin1 and cp1252 inconsistent? |
| From | buck@yelp.com |
| Message-ID | <mailman.3764.1353108483.27098.python-list@python.org> |
On Friday, November 16, 2012 2:34:32 PM UTC-8, Ian wrote:
> On Fri, Nov 16, 2012 at 2:44 PM, <buck> wrote:
>
> > Latin1 has a block of 32 undefined characters.
>
>
> These characters are not undefined. 0x80-0x9f are the C1 control
> codes in Latin-1, much as 0x00-0x1f are the C0 control codes, and
> their Unicode mappings are well defined.
They are indeed undefined: ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf
""" The shaded positions in the code table correspond
to bit combinations that do not represent graphic
characters. Their use is outside the scope of
ISO/IEC 8859; it is specified in other International
Standards, for example ISO/IEC 6429. """
However, it's reasonable for 0x81 to decode to U+0081, because the Unicode standard says: http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
""" The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992. """
> You can use a non-strict error handling scheme to prevent the error.
> >>> b'hello \x81 world'.decode('cp1252', 'replace')
> 'hello \ufffd world'
This creates a non-reversible encoding, and loss of data, which isn't acceptable for my application.
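
For reference, here is a minimal sketch of the inconsistency and of one possibly reversible alternative to 'replace'. It assumes Python 3, where the 'surrogateescape' error handler is available; the helper names decode_reversibly and encode_back are made up for illustration and come from neither the thread nor the standard library. Whether lone surrogates in the decoded text are acceptable depends on the application.

```python
# Collect the bytes that the cp1252 codec rejects under strict decoding.
undefined_in_cp1252 = []
for value in range(256):
    try:
        bytes([value]).decode('cp1252')
    except UnicodeDecodeError:
        undefined_in_cp1252.append(value)
print([hex(v) for v in undefined_in_cp1252])

# latin1 maps every byte to the code point with the same number,
# so 0x81 decodes to the C1 control character U+0081.
print(ascii(b'\x81'.decode('latin1')))


def decode_reversibly(raw, encoding='cp1252'):
    """Hypothetical helper: undecodable bytes become lone surrogates
    (U+DC80 + byte value) instead of U+FFFD, so nothing is lost."""
    return raw.decode(encoding, 'surrogateescape')


def encode_back(text, encoding='cp1252'):
    """Hypothetical helper: turn the lone surrogates back into the
    original bytes."""
    return text.encode(encoding, 'surrogateescape')


original = b'hello \x81 world'
text = decode_reversibly(original)
print(ascii(text))
print(encode_back(text) == original)
```

The decoded string here contains a lone surrogate, so it cannot be written out as strict UTF-8; it is only useful as an in-memory form that can be encoded back to the same bytes.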