From: Sven Köhler
Newsgroups: comp.lang.java.programmer
Subject: Re: A proposal to handle file encodings
Date: Sun, 25 Nov 2012 15:09:40 +0100

On 24.11.2012 00:11, Peter J. Holzer wrote:
> On 2012-11-23 18:21, Jan Burse wrote:
>> Roedy Green wrote:
>>> The HTML encoding is incompetent. You can't read it without knowing
>>> the encoding.
>
> Not true in practice. Almost all encodings used in the real world are
> some superset of ASCII, and you only need to recognize ASCII characters
> to find the relevant meta tag.

With the exception of UTF-16LE/BE, for example. Or is a BOM mandatory
for UTF-16?

The downside of BOMs is that they break features like includes. Many
include mechanisms just copy the byte stream, so BOMs end up in the
middle of the page.


Regards,
  Sven
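
To illustrate the include problem described above, here is a minimal Java sketch. The class and method names (BomInclude, naiveInclude, bomAwareInclude) are hypothetical and not taken from any real include mechanism, and the sketch assumes UTF-8 files only. It contrasts copying the bytes unchanged with dropping a leading BOM from the included fragment first.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomInclude {
    // UTF-8 encoding of U+FEFF, the byte order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the input without a leading UTF-8 BOM, if one is present. */
    static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    /** Naive include (hypothetical): copies the bytes of both parts unchanged. */
    static byte[] naiveInclude(byte[] page, byte[] fragment) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(page, 0, page.length);
        out.write(fragment, 0, fragment.length);
        return out.toByteArray();
    }

    /** BOM-aware include (hypothetical): drops the fragment's BOM before copying. */
    static byte[] bomAwareInclude(byte[] page, byte[] fragment) {
        return naiveInclude(page, stripUtf8Bom(fragment));
    }

    public static void main(String[] args) {
        byte[] page = "<p>Header</p>".getBytes(StandardCharsets.UTF_8);
        // The included file was saved with a BOM at its start.
        byte[] fragment = "\uFEFF<p>Included</p>".getBytes(StandardCharsets.UTF_8);

        String naive = new String(naiveInclude(page, fragment), StandardCharsets.UTF_8);
        String aware = new String(bomAwareInclude(page, fragment), StandardCharsets.UTF_8);

        // Index of a stray U+FEFF in each result; -1 means none is left.
        System.out.println(naive.indexOf('\uFEFF'));
        System.out.println(aware.indexOf('\uFEFF'));
    }
}
```

In the naive result the U+FEFF character sits between the two paragraphs, which is the mid-page BOM mentioned above. The BOM-aware variant leaves none behind. A UTF-16 file would need a check for the two-byte marks FE FF and FF FE instead.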