| From | Nobody <nobody@nowhere.com> |
|---|---|
| Subject | Re: latin1 and cp1252 inconsistent? |
| Date | 2012-11-17 19:15 +0000 |
| Message-Id | <pan.2012.11.17.19.15.14.694000@nowhere.com> |
| Newsgroups | comp.lang.python |
| References | <f063ebaf-89ee-4558-a762-0241efa39dcc@googlegroups.com> <pan.2012.11.17.00.33.13.539000@nowhere.com> <32ab6e2e-e1b1-41ea-8ef4-6e4f763065bf@googlegroups.com> |
| Organization | Zen Internet |
On Sat, 17 Nov 2012 08:56:46 -0800, buck wrote:

>> Given that the only differences between the two are for code points
>> which are in the C1 range (0x80-0x9F), which should never occur in HTML,
>> parsing ISO-8859-1 as Windows-1252 should be harmless.
>
> "should" is a wish. The reality is that documents (and especially URLs)
> exist that can be decoded with latin1, but will backtrace with cp1252.

In which case, they're probably neither ISO-8859-1 nor Windows-1252, but some other (unknown) encoding which has acquired the ISO-8859-1 label "by default".

In that situation, if you still need to know the encoding, you need to resort to heuristics such as those employed by the chardet library.
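For anyone who wants to see the difference under discussion, here is a minimal sketch, assuming Python 3 and only the standard-library codecs. It lists the byte values that decode as latin1 but raise under cp1252, and shows one possible fallback. The helper names `undefined_in_cp1252` and `decode_lenient` are made up for illustration; they are not part of any library, and the fallback does not detect the real encoding the way a heuristic tool such as chardet attempts to.

```python
def undefined_in_cp1252():
    """Return the byte values that latin1 accepts but Python's cp1252 codec rejects."""
    rejected = []
    for value in range(256):
        raw = bytes([value])
        raw.decode("latin1")  # latin1 maps every byte value, so this never raises
        try:
            raw.decode("cp1252")
        except UnicodeDecodeError:
            rejected.append(value)
    return rejected


def decode_lenient(raw):
    """Hypothetical helper: try cp1252 first, fall back to latin1.

    Returns (text, codec_name). A latin1 result only means "no error was
    raised"; it says nothing about whether the label was correct.
    """
    try:
        return raw.decode("cp1252"), "cp1252"
    except UnicodeDecodeError:
        return raw.decode("latin1"), "latin1"


if __name__ == "__main__":
    print([hex(v) for v in undefined_in_cp1252()])
    print(decode_lenient(b"caf\xe9"))
    print(decode_lenient(b"\x81"))
```

All of the rejected values fall inside the C1 range (0x80-0x9F) mentioned above, so data containing them is a hint that the ISO-8859-1 label may not be trustworthy.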
latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 13:44 -0800
Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 15:33 -0700
Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 15:27 -0800
Re: latin1 and cp1252 inconsistent? Dave Angel <d@davea.name> - 2012-11-16 19:05 -0500
Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 17:20 -0700
Re: latin1 and cp1252 inconsistent? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-11-18 01:48 -0500
Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 15:27 -0800
Re: latin1 and cp1252 inconsistent? Nobody <nobody@nowhere.com> - 2012-11-17 00:33 +0000
Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 18:08 -0700
Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-17 08:56 -0800
Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-17 11:08 -0700
Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-17 11:13 -0700
Re: latin1 and cp1252 inconsistent? Nobody <nobody@nowhere.com> - 2012-11-17 19:15 +0000