From: Steven Simpson
Newsgroups: comp.lang.java.programmer
Subject: Re: Detect XML document encodings with SAX
Date: Thu, 22 Nov 2012 07:53:45 +0000
Message-ID: <9921o9-usm.ln1@s.simpson148.btinternet.com>
References: <0b3b04bf-24dd-4d59-a16d-14c745b66c76@googlegroups.com>

On 22/11/12 07:18, markspace wrote:
> On 11/21/2012 10:41 PM, Sebastian wrote:
>>
>> The answer cannot be that windows-1250 is non-standard. In fact, the
>> declared encoding of the XML file does not seem to matter. The code will
>> always output "UTF-8".
>>
>
> Maybe this quote from the article will help you out:
>
> "This approach works 90 percent of the time, maybe a little more. But
> SAX parsers aren't required to support the Locator interface, much
> less Locator2, and a few don't. A second option, if you know you're
> using Xerces, is to work with XNI"
>
>
> Since the output of the program is "unknown", I'd guess that this
> particular SAX parser doesn't support Locator2, like it says.

Like the OP, I'm getting "UTF-8", and tracing in the code shows that it
is getting a Locator2.

-- 
ss at comp dot lancs dot ac dot uk
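For anyone who wants to repeat the tracing, here is a minimal sketch of the Locator2 approach under discussion. It is not the OP's code or the article's: the class name EncodingProbe and its fields are made up for illustration. It asks the locator for the encoding at two points (startDocument and the first startElement) and falls back to "unknown" when the locator is not a Locator2 or reports null, so the two values can be compared for a given parser. What a parser reports at each point is implementation-dependent, so treat the results as observations about your parser, nothing more.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not taken from the thread or the article.
final class EncodingProbe {

    /** Records what the locator reports at two points in the parse. */
    static final class Handler extends DefaultHandler {
        private Locator locator;
        String atStartDocument = "unknown";
        String atFirstElement = null;

        @Override
        public void setDocumentLocator(Locator locator) {
            // The parser hands over its locator before startDocument().
            this.locator = locator;
        }

        @Override
        public void startDocument() {
            atStartDocument = query();
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Only record the value seen at the first (root) element.
            if (atFirstElement == null) {
                atFirstElement = query();
            }
        }

        /** Returns the reported encoding, or "unknown" if unavailable. */
        private String query() {
            if (locator instanceof Locator2) {
                String enc = ((Locator2) locator).getEncoding();
                if (enc != null) {
                    return enc;
                }
            }
            return "unknown";
        }
    }

    /** Parses the bytes as a byte stream so the parser picks the encoding. */
    static Handler probe(byte[] xml) {
        Handler handler = new Handler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new ByteArrayInputStream(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped as unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

Calling EncodingProbe.probe(bytes) on a file read as raw bytes and printing atStartDocument and atFirstElement shows whether the reported value changes once the parser has got past the XML declaration. Note that the input has to be a byte stream; if the document is handed over as a Reader, the parser never decodes it itself.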