Groups > comp.lang.java.programmer > #19834 > unrolled thread
| Started by | Sebastian <sebastian@undisclosed.invalid> |
|---|---|
| First post | 2012-11-21 15:32 +0100 |
| Last post | 2012-12-16 17:43 +0200 |
| Articles | 3 on this page of 43 — 9 participants |
Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-21 15:32 +0100
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-21 11:31 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-22 00:39 +0100
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-21 16:37 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-22 07:41 +0100
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-21 23:18 -0800
Re: Detect XML document encodings with SAX Steven Simpson <ss@domain.invalid> - 2012-11-22 07:53 +0000
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-22 08:31 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:21 -0500
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:11 -0500
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:20 -0500
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-24 02:14 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-24 22:18 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:07 -0500
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:50 +0100
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 17:12 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:17 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 18:02 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 21:10 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 18:25 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 21:37 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 21:01 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:30 -0500
Re: Detect XML document encodings with SAX Gene Wirchenko <genew@telus.net> - 2012-12-12 18:03 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-12-12 21:09 -0500
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-12-12 18:58 -0800
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-12-12 22:17 -0500
Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-12-12 22:51 -0800
Re: Detect XML document encodings with SAX Gene Wirchenko <genew@telus.net> - 2012-12-12 21:52 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:45 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:23 -0500
Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-25 13:24 -0800
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:58 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:13 -0500
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:19 -0500
Re: Detect XML document encodings with SAX Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 03:24 -0800
Re: Detect XML document encodings with SAX "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 00:13 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:22 -0500
Re: Detect XML document encodings with SAX Steven Simpson <ss@domain.invalid> - 2012-11-25 11:00 +0000
Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 12:32 +0100
Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:41 -0500
Re: Detect XML document encodings with SAX Roedy Green <see_website@mindprod.com.invalid> - 2012-12-12 20:32 -0800
Re: Detect XML document encodings with SAX Stanimir Stamenkov <s7an10@netscape.net> - 2012-12-16 17:43 +0200
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-11-25 14:41 -0500 |
| Message-ID | <50b27476$0$281$14726298@news.sunsite.dk> |
| In reply to | #19928 |
On 11/25/2012 6:00 AM, Steven Simpson wrote:
> On 21/11/12 14:32, Sebastian wrote:
>> Does anyone have an idea why that is so? And how I could
>> go about making some XML parser determine the correct encoding?
>
> Sussed it! (Come to think of it, I feel I've sussed this before...)
>
> The charset returned by the locator changes during parsing. At
> startDocument(), it is the assumed charset, possibly based on the first
> four-or-so bytes. At endDocument(), it is reset to null. On the first
> call to startElement, it has the correct value.

Cool.

Arne
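The behaviour Steven describes can be sketched as a SAX handler that reads the encoding from the locator on the first `startElement` call rather than in `startDocument()`. This is a minimal sketch, assuming the parser supplies a `Locator2` (the JDK's bundled parser does, but SAX does not require it); the class name `EncodingSniffer` and the method `detect` are made up for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: remembers the encoding reported by the locator
// when the first element starts, i.e. after the XML declaration was read.
class EncodingSniffer extends DefaultHandler {
    private Locator locator;
    private String encoding;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only the first element matters; Locator2 is an optional SAX2 extension.
        if (encoding == null && locator instanceof Locator2) {
            encoding = ((Locator2) locator).getEncoding();
        }
    }

    /** Returns the captured encoding, or null if the parser gave no Locator2. */
    String getEncoding() {
        return encoding;
    }

    /** Parses the whole stream and returns the encoding seen at the first element. */
    static String detect(InputStream in) throws Exception {
        EncodingSniffer handler = new EncodingSniffer();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.getEncoding();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(detect(in));
        }
    }
}
```

The sketch parses the entire document for simplicity; a handler that only needs the encoding could abort after the first element by throwing a `SAXException`.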
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-12-12 20:32 -0800 |
| Message-ID | <38mic8hnlk2uuc2irrg0rco49sf3odsgr1@4ax.com> |
| In reply to | #19834 |
On Wed, 21 Nov 2012 15:32:19 +0100, Sebastian <sebastian@undisclosed.invalid> wrote, quoted or indirectly quoted someone who said :

>Does anyone have an idea why that is so? And how I could
>go about making some XML parser determine the correct encoding?

There are not many encodings easy to recognise. See http://mindprod.com/products.html#ENCODINGRECOGNISER

I think you are better off to figure out what it is and convert it to UTF-8 with native2ascii. see http://mindprod.com/jgloss/encoding.html#NATIVE2ASCII

XML and UTF-8 are the expected pair. You are just asking for trouble using some other encoding.

-- 
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish as couch potatoes who hire others to go to the gym for them.
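Once the source encoding is known, the conversion to UTF-8 suggested above can also be done in plain Java. Note that `native2ascii` itself writes ASCII with `\uXXXX` escapes rather than UTF-8, so the sketch below decodes and re-encodes directly. The class name `Transcode` and its methods are made up for illustration, and the sketch does not touch the `encoding="..."` pseudo-attribute in the XML declaration, which would still have to be changed to match.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: re-encodes bytes from a known charset to UTF-8.
class Transcode {
    /** Decodes the input using the given source charset and encodes it as UTF-8. */
    static byte[] toUtf8(byte[] input, Charset source) {
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    /** File variant; reads the whole file into memory, so it suits small documents. */
    static void toUtf8(Path in, Path out, Charset source) throws IOException {
        Files.write(out, toUtf8(Files.readAllBytes(in), source));
    }
}
```

For example, `Transcode.toUtf8(bytes, Charset.forName("ISO-8859-15"))` turns the single byte 0xA4 (the Euro sign in ISO-8859-15) into the three-byte UTF-8 sequence E2 82 AC.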
| From | Stanimir Stamenkov <s7an10@netscape.net> |
|---|---|
| Date | 2012-12-16 17:43 +0200 |
| Message-ID | <kakq61$oiu$1@dont-email.me> |
| In reply to | #19834 |
Wed, 21 Nov 2012 15:32:19 +0100, /Sebastian/:

> I discovered this post:
> http://www.ibm.com/developerworks/library/x-tipsaxxni/
>
> and implemented both approaches (SAX and Xerces XNI).
>
> Unfortunately, for the attached XML file, both methods
> output an encoding of UTF-8, while looking at the file
> makes it clear that it is not UTF-8 encoded (all characters,
> including the umlaut and the Euro-sign, take one byte, and the
> declared encoding also is not UTF-8).
>
> Does anyone have an idea why that is so? And how I could
> go about making some XML parser determine the correct encoding?

Sorry if this has been answered already elsewhere in the thread.

The XML specification has a guideline for detecting the source encoding:

http://www.w3.org/TR/xml/#sec-guessing

and this is basically what parsers do. One-byte encodings are basically indistinguishable from each other and they could be only reliably detected in presence of an explicit encoding information/declaration.

-- 
Stanimir
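The guideline referenced above can be sketched roughly as follows: look at the first few bytes for a byte order mark or a recognisable `<?` pattern, and for ASCII-compatible input fall back on the declared encoding. This is a simplified sketch, not what any particular parser does; it leaves out the UCS-4 and EBCDIC rows of the specification's table, and the class name `XmlEncodingGuess` and the method `guess` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified take on the autodetection guideline in the XML spec.
class XmlEncodingGuess {
    // Matches the encoding pseudo-attribute of an XML declaration at the start of the input.
    private static final Pattern DECLARED = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String guess(byte[] b) {
        // Byte order marks (UCS-4 BOMs are deliberately not handled here).
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        // No BOM, but "<?" encoded in two-byte units.
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // "<?xm" in an ASCII-compatible encoding: only the declaration can tell
        // one-byte encodings apart, so read it from the first bytes.
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) {
            int n = Math.min(b.length, 200);
            // ISO-8859-1 maps every byte to a char, so decoding never fails.
            String head = new String(b, 0, n, StandardCharsets.ISO_8859_1);
            Matcher m = DECLARED.matcher(head);
            if (m.find()) return m.group(1);
        }
        // No BOM and no declaration: the specification's default is UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The ASCII-compatible branch illustrates the point made above: for input such as ISO-8859-15 or windows-1252, nothing in the leading bytes distinguishes the encodings, so the result comes entirely from the declaration.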