Re: latin1 and cp1252 inconsistent?

From Nobody <nobody@nowhere.com>
Subject Re: latin1 and cp1252 inconsistent?
Date Sat, 17 Nov 2012 19:15:15 +0000
User-Agent Pan/0.14.2 (This is not a psychotic episode. It's a cleansing moment of clarity.)
Message-Id <pan.2012.11.17.19.15.14.694000@nowhere.com>
Newsgroups comp.lang.python
References <f063ebaf-89ee-4558-a762-0241efa39dcc@googlegroups.com> <pan.2012.11.17.00.33.13.539000@nowhere.com> <32ab6e2e-e1b1-41ea-8ef4-6e4f763065bf@googlegroups.com>

On Sat, 17 Nov 2012 08:56:46 -0800, buck wrote:

>> Given that the only differences between the two are for code points
>> which are in the C1 range (0x80-0x9F), which should never occur in HTML,
>> parsing ISO-8859-1 as Windows-1252 should be harmless.
> 
> "should" is a wish. The reality is that documents (and especially URLs)
> exist that can be decoded with latin1, but will backtrace with cp1252.

In which case, they're probably neither ISO-8859-1 nor Windows-1252, but
some other (unknown) encoding which has acquired the ISO-8859-1 label
"by default".
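
For reference, a minimal sketch of where the two codecs actually disagree, assuming Python 3 and its bundled codecs (the helper name `decodes_as` is made up for this example). Every byte value decodes as latin-1, but Python's cp1252 codec leaves a handful of positions in 0x80-0x9F unmapped and raises on them:

```python
def decodes_as(data, encoding):
    """Return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# latin-1 maps all 256 byte values, so it never fails.
latin1_failures = [b for b in range(256) if not decodes_as(bytes([b]), "latin-1")]

# cp1252 has a few unassigned positions; collect the bytes it rejects.
cp1252_failures = [b for b in range(256) if not decodes_as(bytes([b]), "cp1252")]

print("latin-1 rejects:", [hex(b) for b in latin1_failures])
print("cp1252 rejects: ", [hex(b) for b in cp1252_failures])

# Where both succeed in the C1 range, they still give different characters:
print(repr(b"\x80".decode("latin-1")), repr(b"\x80".decode("cp1252")))
```

A document containing any of the rejected bytes raises `UnicodeDecodeError` under cp1252 while passing under latin-1, which is the symptom described above.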

In that situation, if you still need to know the encoding, you have to
resort to heuristics such as those employed by the chardet library.
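
A rough sketch of what that could look like, assuming Python 3. chardet is a third-party package, so the import is optional here. The function names `guess_encoding` and `decode_with_fallback`, and the order of the candidate list, are choices made for this example and not anything chardet prescribes. `chardet.detect()` returns a dict whose "encoding" entry is a guess (possibly None), not a guarantee:

```python
try:
    import chardet  # third-party; may not be installed
except ImportError:
    chardet = None

def guess_encoding(data):
    """Return chardet's best guess for `data`, or None if unavailable."""
    if chardet is None:
        return None
    return chardet.detect(data)["encoding"]  # may itself be None

def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Try each candidate in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Not reached with the default list: latin-1 accepts every byte.
    raise ValueError("no candidate encoding could decode the data")

if __name__ == "__main__":
    sample = b"\x93quoted\x94 text"
    print("chardet guess:", guess_encoding(sample))
    print("fallback:", decode_with_fallback(sample))
```

Putting latin-1 last means the function always returns something, but a latin-1 result only says the bytes were accepted, not that the label is right.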
