| From | Stanimir Stamenkov <s7an10@netscape.net> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: Detect XML document encodings with SAX |
| Date | 2012-12-16 17:43 +0200 |
| Organization | A noiseless patient Spider |
| Message-ID | <kakq61$oiu$1@dont-email.me> |
| References | <k8ioi7$2e2$1@news.albasani.net> |
Wed, 21 Nov 2012 15:32:19 +0100, /Sebastian/:

> I discovered this post:
> http://www.ibm.com/developerworks/library/x-tipsaxxni/
>
> and implemented both approaches (SAX and Xerces XNI).
>
> Unfortunately, for the attached XML file, both methods
> output an encoding of UTF-8, while looking at the file
> makes it clear that it is not UTF-8 encoded (all characters,
> including the umlaut and the Euro-sign, take one byte, and the
> declared encoding also is not UTF-8).
>
> Does anyone have an idea why that is so? And how I could
> go about making some XML parser determine the correct encoding?

Sorry if this has already been answered elsewhere in the thread.

The XML specification has a guideline for detecting the source encoding:

http://www.w3.org/TR/xml/#sec-guessing

and this is basically what parsers do. Single-byte encodings are essentially indistinguishable from one another by looking at the bytes alone, so they can be detected reliably only when explicit encoding information, such as an encoding declaration, is present.

--
Stanimir
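To make the guideline above concrete, here is a minimal sketch of that kind of detection in plain Java. It is not what Xerces or any other parser actually runs; the class name `XmlEncodingSniffer` and the method `guess` are hypothetical, and the sketch deliberately leaves out the UTF-32 and EBCDIC cases that the specification's guideline also covers. It looks for a byte order mark first, then for a 16-bit `<?` pattern, then reads the `encoding` pseudo-attribute of the XML declaration, and otherwise falls back to UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; a simplified take on the XML spec's encoding guideline. */
public class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns a best guess for the encoding, given the first bytes of a document. */
    public static String guess(byte[] head) {
        // 1. Byte order marks (UTF-32 is intentionally not handled in this sketch).
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. No BOM, but "<?" stored as two 16-bit code units.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";

        // 3. ASCII-compatible start: the declaration itself is pure ASCII, so
        //    decoding the prefix as ISO-8859-1 is enough to read the declared name.
        int length = Math.min(head.length, 200);
        String text = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(text);
        if (m.find()) return m.group(1);

        // 4. Nothing explicit: the bytes alone cannot identify a single-byte
        //    encoding, so fall back to the XML default.
        return "UTF-8";
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] undeclared = "<a/>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(declared));
        System.out.println(guess(undeclared));
    }
}
```

The last step is where the limitation shows up: for a single-byte file without a declaration, the sketch has nothing to go on and simply answers UTF-8, whatever the bytes really are.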
Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-21 15:32 +0100
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-21 11:31 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-22 00:39 +0100
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-21 16:37 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-22 07:41 +0100
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-21 23:18 -0800
Re: Detect XML document encodings with SAX Steven Simpson <ss@domain.invalid> - 2012-11-22 07:53 +0000
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-22 08:31 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:21 -0500
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:11 -0500
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:20 -0500
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-24 02:14 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-24 22:18 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:07 -0500
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:50 +0100
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 17:12 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:17 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 18:02 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 21:10 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 18:25 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 21:37 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 21:01 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:30 -0500
Re: Detect XML document encodings with SAX Gene Wirchenko <genew@telus.net> - 2012-12-12 18:03 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-12-12 21:09 -0500
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-12-12 18:58 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-12-12 22:17 -0500
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-12-12 22:51 -0800
Re: Detect XML document encodings with SAX Gene Wirchenko <genew@telus.net> - 2012-12-12 21:52 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:45 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:23 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-25 13:24 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:58 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:13 -0500
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:19 -0500
Re: Detect XML document encodings with SAX Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 03:24 -0800
Re: Detect XML document encodings with SAX "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 00:13 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:22 -0500
Re: Detect XML document encodings with SAX Steven Simpson <ss@domain.invalid> - 2012-11-25 11:00 +0000
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 12:32 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:41 -0500
Re: Detect XML document encodings with SAX Roedy Green <see_website@mindprod.com.invalid> - 2012-12-12 20:32 -0800
Re: Detect XML document encodings with SAX Stanimir Stamenkov <s7an10@netscape.net> - 2012-12-16 17:43 +0200