Re: latin1 and cp1252 inconsistent?

From Nobody <nobody@nowhere.com>
Subject Re: latin1 and cp1252 inconsistent?
Date 2012-11-17 19:15 +0000
Message-Id <pan.2012.11.17.19.15.14.694000@nowhere.com>
Newsgroups comp.lang.python
References <f063ebaf-89ee-4558-a762-0241efa39dcc@googlegroups.com> <pan.2012.11.17.00.33.13.539000@nowhere.com> <32ab6e2e-e1b1-41ea-8ef4-6e4f763065bf@googlegroups.com>
Organization Zen Internet


On Sat, 17 Nov 2012 08:56:46 -0800, buck wrote:

>> Given that the only differences between the two are for code points
>> which are in the C1 range (0x80-0x9F), which should never occur in HTML,
>> parsing ISO-8859-1 as Windows-1252 should be harmless.
> 
> "should" is a wish. The reality is that documents (and especially URLs)
> exist that can be decoded with latin1, but will backtrace with cp1252.

In that case, they're probably neither ISO-8859-1 nor Windows-1252, but
some other (unknown) encoding that was labelled ISO-8859-1 simply
"by default".
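To make the point concrete: Python's own codecs show that the two
encodings agree everywhere outside 0x80-0x9F, and that inside that range
cp1252 leaves five byte values undefined. Those five are exactly the
bytes that latin-1 accepts but cp1252 rejects. A quick check (the
function name codec_differences is just for illustration):

```python
def codec_differences():
    """Compare Python's latin-1 and cp1252 codecs byte by byte.

    Returns (differing, undefined):
      differing: bytes both codecs decode, but to different characters
      undefined: bytes latin-1 decodes but cp1252 rejects (strict mode)
    """
    differing, undefined = [], []
    for b in range(256):
        raw = bytes([b])
        latin = raw.decode("latin-1")  # latin-1 maps every byte to U+0000-U+00FF
        try:
            cp = raw.decode("cp1252")
        except UnicodeDecodeError:
            undefined.append(b)
            continue
        if cp != latin:
            differing.append(b)
    return differing, undefined


differing, undefined = codec_differences()
# Every disagreement falls inside the C1 range.
print(all(0x80 <= b <= 0x9F for b in differing + undefined))  # True
# The five bytes cp1252 leaves undefined.
print([hex(b) for b in undefined])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

So a document containing, say, a stray 0x81 byte decodes "fine" as
latin-1 but raises UnicodeDecodeError as cp1252. Such a byte is a C1
control character under ISO-8859-1, which is itself a hint that the
latin-1 label is wrong.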

In that situation, if you still need to know the encoding, you need to
resort to heuristics such as those employed by the chardet library.
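chardet itself exposes chardet.detect(data), which returns a dict with
(among others) "encoding" and "confidence" keys based on statistical
analysis of the bytes. The sketch below is a much cruder, stdlib-only
alternative, not what chardet does: it merely tries a hypothetical list
of candidate codecs in order and reports the first one that decodes
without error. Because latin-1 accepts any byte sequence, it always
"succeeds" when placed last, which is exactly why a latin-1 label tells
you so little.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first candidate
    codec that decodes `data` strictly. This only proves the bytes are
    *valid* in that encoding, not that the encoding is *correct*."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next, more permissive candidate
    # Only reachable if the caller omits a catch-all codec like latin-1.
    raise ValueError("no candidate encoding could decode the data")


print(decode_with_fallback("café".encode("utf-8")))  # ('café', 'utf-8')
print(decode_with_fallback(b"\x93quoted\x94"))        # cp1252 "smart quotes"
print(decode_with_fallback(b"caf\x81"))               # falls through to latin-1
```

The ordering matters: strict, structured encodings (UTF-8) first, then
progressively more permissive ones, so that the catch-all only applies
when nothing better fits.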


Thread

latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 13:44 -0800
  Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 15:33 -0700
    Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 15:27 -0800
      Re: latin1 and cp1252 inconsistent? Dave Angel <d@davea.name> - 2012-11-16 19:05 -0500
      Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 17:20 -0700
      Re: latin1 and cp1252 inconsistent? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-11-18 01:48 -0500
  Re: latin1 and cp1252 inconsistent? Nobody <nobody@nowhere.com> - 2012-11-17 00:33 +0000
    Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 18:08 -0700
    Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-17 08:56 -0800
      Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-17 11:08 -0700
      Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-17 11:13 -0700
      Re: latin1 and cp1252 inconsistent? Nobody <nobody@nowhere.com> - 2012-11-17 19:15 +0000

csiph-web