| Newsgroups | comp.lang.java.programmer |
|---|---|
| Date | 2012-11-24 02:14 -0800 |
| References | <k8ioi7$2e2$1@news.albasani.net> <0b3b04bf-24dd-4d59-a16d-14c745b66c76@googlegroups.com> <50b02ee6$0$283$14726298@news.sunsite.dk> |
| Message-ID | <d64baf3c-d582-4308-b6b4-714ef3049ef5@googlegroups.com> |
| Subject | Re: Detect XML document encodings with SAX |
| From | Lew <lewbloch@gmail.com> |
Arne Vajhøj wrote:
> Lew wrote:
>> Sebastian wrote: [snip]
>>> output an encoding of UTF-8, while looking at the file
>> as they should.
>
> No.
>
> If the XML prolog specifies another encoding than UTF-8,
> then it should not return UTF-8.

True, but I'm saying they should specify UTF-8 in the prolog.

>> XML should be encoded in UTF-8 nearly always.

See?

> XML allows for other encodings.

So? You should use UTF-8 nearly always, i.e., unless there's a compelling reason not to.

> And Java XML parsers support it.

For those rare times when you deviate from the usual UTF-8.

> So it should always work.

>> But SAX is a parser, so it doesn't output, it inputs. What are you telling us?
>
> Output usually mean System.out.println - that works fine with a parser.

His phrasing wasn't clear to me. That's why I asked for clarification. I could have guessed, too.

>> If your problem is with reading the file, then the encoding in the XML declaration

See? You're preaching to the choir.

>> should suffice to guide the parser. But then why do you talk about methods that
>> "output an encoding"?
>
> Because he wants to know what it is.
>
>> However, according to
>> http://xmlwriter.net/xml_guide/xml_declaration.shtml#Encoding
>> supported encodings only include UTF-8, UTF-16, ISO-10646-UCS-2,
>> ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS,
>> and EUC-JP,
>> So it looks like you must not accept XML documents with such a
>> non-standard encoding.
>
> Those that has researched would know that the XML spec do not
> limit the encodings at all. The XML processor must support UTF-8
> and UTF-16, but are free to support others.

Perhaps the OP's parser doesn't exercise that freedom, judging by the symptoms. 'sall I'm sayin'. Obviously I don't know the answer, but he's asking for suggestions to investigate, AIUI. He's having encoding problems. His XML is apparently encoded in Windows-1252, a notoriously funky encoding especially for the variety of characters with which one might wish to deal. So why not investigate obtaining material that isn't in such a notoriously funky encoding, like, oh, say, the old reliable standard UTF-8? Perhaps that isn't feasible, for reasons as yet unstated, but that's the nature of brainstorming.

--
Lew
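
For anyone who wants to check which encoding the parser actually reports, rather than guess, here is a minimal sketch. It assumes the parser hands the handler a locator that implements org.xml.sax.ext.Locator2, which is an optional SAX2 extension; a parser that lacks it will simply yield null here. The names EncodingProbe and detectEncoding are made up for illustration and do not come from this thread. The sample document declares ISO-8859-1 only because every Java runtime is required to support that charset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the thread.
class EncodingProbe {

    /** Returns the encoding the parser reports for the document, or null if it offers none. */
    static String detectEncoding(byte[] xml) throws Exception {
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Ask at the first element: by then the XML declaration has been read.
                // Locator2 is optional, so check before casting.
                if (result[0] == null && locator instanceof Locator2) {
                    result[0] = ((Locator2) locator).getEncoding();
                }
            }
        };
        // Give the parser raw bytes, not a Reader, so that it does the decoding itself.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detectEncoding(xml));
    }
}
```

The query is made in startElement rather than startDocument because a parser may only have auto-detected the encoding at startDocument and not yet have read the declaration. If the file is wrapped in a Reader before parsing, the declared encoding no longer drives the decoding, so the probe is only meaningful on byte input.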