
Re: latin1 and cp1252 inconsistent?

Newsgroups comp.lang.python
Date Fri, 16 Nov 2012 15:27:54 -0800 (PST)
Subject Re: latin1 and cp1252 inconsistent?
From buck@yelp.com



On Friday, November 16, 2012 2:34:32 PM UTC-8, Ian wrote:
> On Fri, Nov 16, 2012 at 2:44 PM,  <buck> wrote:
> 
> > Latin1 has a block of 32 undefined characters.
> 
> 
> These characters are not undefined.  0x80-0x9f are the C1 control
> codes in Latin-1, much as 0x00-0x1f are the C0 control codes, and
> their Unicode mappings are well defined.

They are indeed undefined: ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf

""" The shaded positions in the code table correspond
    to bit combinations that do not represent graphic
    characters. Their use is outside the scope of
    ISO/IEC 8859; it is specified in other International
    Standards, for example ISO/IEC 6429. """


However, it's reasonable for 0x81 to decode to U+0081, because the Unicode standard says: http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf

""" The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992. """
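
For reference, here is a minimal sketch of the inconsistency under discussion, assuming Python 3 and its built-in 'latin1' and 'cp1252' codecs. The variable names are only for illustration.

```python
# The same bytes as in the example quoted in this thread.
data = b'hello \x81 world'

# latin1 maps every byte 0x00-0xff to the code point with the same
# number, so 0x81 becomes U+0081 and the round trip is exact.
latin1_text = data.decode('latin1')
latin1_roundtrip = latin1_text.encode('latin1')

# Python's cp1252 codec leaves 0x81 unassigned, so strict decoding
# raises instead of mapping the byte to U+0081.
try:
    data.decode('cp1252')
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

print(repr(latin1_text), latin1_roundtrip == data, cp1252_failed)
```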


> You can use a non-strict error handling scheme to prevent the error.
> >>> b'hello \x81 world'.decode('cp1252', 'replace')
> 'hello \ufffd world'

This makes the conversion non-reversible and loses data, which isn't acceptable for my application.
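
To make the data loss concrete, a small sketch assuming Python 3.1 or later. The 'surrogateescape' error handler shown at the end is not something proposed earlier in this thread; it is included only as one possible reversible alternative, and whether the resulting lone surrogates are acceptable depends on the application.

```python
data = b'hello \x81 world'

# 'replace' turns the undecodable byte into U+FFFD; the original byte
# value is gone, so encoding back cannot reproduce the input.
lossy_text = data.decode('cp1252', 'replace')
lossy_bytes = lossy_text.encode('cp1252', 'replace')

# Assumption: 'surrogateescape' (PEP 383) is acceptable here. It maps
# each undecodable byte 0xNN to the lone surrogate U+DCNN, which the
# same handler turns back into the original byte when encoding.
escaped_text = data.decode('cp1252', 'surrogateescape')
restored = escaped_text.encode('cp1252', 'surrogateescape')

print(lossy_bytes == data, restored == data)
```

The escaped string contains a lone surrogate, so it cannot be encoded to UTF-8 with the default strict handler.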


