Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #33461

Re: latin1 and cp1252 inconsistent?

From Nobody <nobody@nowhere.com>
Subject Re: latin1 and cp1252 inconsistent?
Date 2012-11-17 00:33 +0000
Message-Id <pan.2012.11.17.00.33.13.539000@nowhere.com>
Newsgroups comp.lang.python
References <f063ebaf-89ee-4558-a762-0241efa39dcc@googlegroups.com>
Organization Zen Internet

Show all headers | View raw


On Fri, 16 Nov 2012 13:44:03 -0800, buck wrote:

> When a user agent [browser] would otherwise use a character encoding given
> in the first column [ISO-8859-1, aka latin1] of the following table to
> either convert content to Unicode characters or convert Unicode characters
> to bytes, it must instead use the encoding given in the cell in the second
> column of the same row [windows-1252, aka cp1252].

It goes on to say:

	The requirement to treat certain encodings as other encodings according
	to the table above is a willful violation of the W3C Character Model
	specification, motivated by a desire for compatibility with legacy
	content. [CHARMOD]

IOW: Microsoft's "embrace, extend, extinguish" strategy has been too
successful and now we have to deal with it. If HTML content is tagged as
using ISO-8859-1, it's more likely that it's actually Windows-1252 content
generated by someone who doesn't know the difference.

Given that the only differences between the two are for code points which
are in the C1 range (0x80-0x9F), which should never occur in HTML, parsing
ISO-8859-1 as Windows-1252 should be harmless.

If you need to support either, you can parse it as ISO-8859-1 then
explicitly convert C1 codes to their Windows-1252 equivalents as a
post-processing step, e.g. using the .translate() method.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 13:44 -0800
  Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 15:33 -0700
    Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 15:27 -0800
      Re: latin1 and cp1252 inconsistent? Dave Angel <d@davea.name> - 2012-11-16 19:05 -0500
      Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 17:20 -0700
      Re: latin1 and cp1252 inconsistent? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-11-18 01:48 -0500
    Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-16 15:27 -0800
  Re: latin1 and cp1252 inconsistent? Nobody <nobody@nowhere.com> - 2012-11-17 00:33 +0000
    Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-16 18:08 -0700
    Re: latin1 and cp1252 inconsistent? buck@yelp.com - 2012-11-17 08:56 -0800
      Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-17 11:08 -0700
      Re: latin1 and cp1252 inconsistent? Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-17 11:13 -0700
      Re: latin1 and cp1252 inconsistent? Nobody <nobody@nowhere.com> - 2012-11-17 19:15 +0000

csiph-web