From: Sven Köhler
Newsgroups: comp.lang.java.programmer
Subject: Re: A proposal to handle file encodings
Date: Sun, 25 Nov 2012 15:06:36 +0100

On 23.11.2012 19:21, Jan Burse wrote:
> Roedy Green wrote:
>> The HTML encoding is incompetent. You can't read it without knowing
>> the encoding. It is just a confirmation. Thankfully the encoding comes
>> in the HTTP header -- a case where meta information is available.
>
> For example when you edit an HTML file locally, you don't
> have this HTTP header information. Also where does the HTTP
> header get the charset information in the first place?
>
> Scenario 1:
> - HTTP returns only mimetype=text/html without
>   the charset option.
> - The browser then reads the HTML doc meta tag, and
>   adjusts the charset.
>
> Scenario 2:
> - HTTP returns mimetype=text/html; charset=
>   fetched from the HTML file meta tag.
> - The browser does not read the HTML doc meta tag, and
>   follows the charset found in the mimetype.
>
> In both scenarios 1 + 2, the meta tag is used. Don't
> know whether there is a scenario 3, and where should
> this scenario take the encoding from?

Scenario 3:
The Apache configuration sets a default charset and sends
Content-Type: text/html; charset=iso-8859-1
even though the meta tag in the file specifies utf8.
Luckily, this feature can be turned off.
I'm not sure what the default config is at the moment.

Also, I don't know of any web server that actually implements
scenario 2. Mostly, the charset in the HTTP header is specified by
dynamic web pages (JSP, PHP, ASP), as they allow setting the headers.

Also, why is this discussion in the Java newsgroup? Just because Java
sometimes asks the programmer to specify the charset?

Regards,
  Sven
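
To tie this back to Java: the three scenarios boil down to an order of precedence, which could be sketched roughly as below. This is only an illustration under stated assumptions, not how any particular browser or server does it. The class name CharsetResolver, the method resolve and the two regular expressions are made up for this post; real browsers use a much more elaborate detection procedure. The only library calls are java.util.regex and java.nio.charset.Charset.forName.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for this example only.
final class CharsetResolver {

    // Finds "charset=..." in a Content-Type value such as
    // "text/html; charset=iso-8859-1".
    private static final Pattern HEADER_CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Finds the charset in <meta charset="..."> as well as in the older
    // <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
    // A simplification: a real HTML parser is more careful than a regex.
    private static final Pattern META_CHARSET =
            Pattern.compile("<meta[^>]*charset\\s*=\\s*[\"']?([^\\s\"'/>;]+)",
                    Pattern.CASE_INSENSITIVE);

    // Header charset first (scenarios 2 and 3), then the meta tag
    // (scenario 1), then a caller-supplied fallback.
    static Charset resolve(String contentType, String htmlHead, Charset fallback) {
        Charset fromHeader = find(HEADER_CHARSET, contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        Charset fromMeta = find(META_CHARSET, htmlHead);
        if (fromMeta != null) {
            return fromMeta;
        }
        return fallback;
    }

    // Returns the charset named by the first match, or null if there is
    // no match or the JVM does not know the name.
    private static Charset find(Pattern pattern, String text) {
        if (text == null) {
            return null;
        }
        Matcher m = pattern.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: treat as "not specified".
            return null;
        }
    }

    public static void main(String[] args) {
        String meta = "<meta charset=\"utf-8\">";
        // Scenario 3: the server's default charset overrides the meta tag.
        System.out.println(resolve("text/html; charset=iso-8859-1", meta,
                StandardCharsets.US_ASCII));
        // Scenario 1: no charset in the header, so the meta tag decides.
        System.out.println(resolve("text/html", meta, StandardCharsets.US_ASCII));
    }
}
```

The point of the sketch is that in scenario 3 the meta tag is never even looked at, which is exactly why a wrong server default is so annoying.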