Re: Detect XML document encodings with SAX

Newsgroups comp.lang.java.programmer
Date 2012-11-21 11:31 -0800
References <k8ioi7$2e2$1@news.albasani.net>
Message-ID <0b3b04bf-24dd-4d59-a16d-14c745b66c76@googlegroups.com>
Subject Re: Detect XML document encodings with SAX
From Lew <lewbloch@gmail.com>


Sebastian wrote:
> I discovered this post:
> http://www.ibm.com/developerworks/library/x-tipsaxxni/
> 
> and implemented both approaches (SAX and Xerces XNI).
> 
> Unfortunately, for the attached XML file, both methods

Don't do attachments on Usenet.

> output an encoding of UTF-8, while looking at the file

as they should. XML should be encoded in UTF-8 nearly always.

But SAX is a parser, so it doesn't output; it inputs. What are you telling us?

> makes it clear that it is not UTF-8 encoded (all characters,
> including the umlaut and the Euro-sign, take one byte, and the
> declared encoding also is not UTF-8).

http://sscce.org/

> Does anyone have an idea why that is so? And how I could

You used the default encoding in your Writer.

> go about making some XML parser determine the correct encoding?

Your problem is writing the file, no? That has nothing to do with parsing.
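
If the file really was written through a Writer that fell back on the platform default, a minimal sketch of the alternative might look like the following. The class name XmlWriteSketch, the method write() and the <greeting> element are made up for illustration, and the text is not XML-escaped; the only point is that the charset handed to the OutputStreamWriter is the same one named in the declaration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the original poster's code.
class XmlWriteSketch {

    /** Writes a tiny document whose bytes use the same charset the declaration names. */
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Pass the charset explicitly. new OutputStreamWriter(out) without a
        // charset argument would use the platform default instead.
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            // Assumption: text contains no '<', '&' etc., so no escaping is done here.
            writer.write("<greeting>" + text + "</greeting>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // "Gruesse" with u-umlaut and sharp s
        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + latin1.length + " bytes");
        System.out.println("UTF-8:      " + utf8.length + " bytes");
    }
}
```

Writing to a file works the same way with a FileOutputStream in place of the ByteArrayOutputStream.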

If your problem is with reading the file, then the encoding in the XML declaration 
should suffice to guide the parser. But then why do you talk about methods that 
"output an encoding"?
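
For the reading side, here is a rough sketch of what I mean, assuming the parser is handed raw bytes (an InputStream, not a Reader) so that it can act on the declaration itself. The class EncodingProbe and its probe() method are invented for this example. Locator2.getEncoding() is the standard SAX2 extension call; whether a given parser supplies a Locator2 at all, and what name it reports, is up to the implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the encoding the parser reports and the decoded text.
class EncodingProbe extends DefaultHandler {
    private Locator locator;
    String encoding;                                 // stays null if no Locator2 is supplied
    final StringBuilder text = new StringBuilder();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Ask at the first element, by which point the XML declaration has been read.
        if (encoding == null && locator instanceof Locator2) {
            encoding = ((Locator2) locator).getEncoding();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static EncodingProbe probe(byte[] xml) {
        EncodingProbe handler = new EncodingProbe();
        try {
            // Bytes in, so the parser decodes according to the declaration.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new ByteArrayInputStream(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e4</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        EncodingProbe result = probe(xml);
        System.out.println("reported encoding: " + result.encoding);
        System.out.println("decoded text: " + result.text);
    }
}
```

If you wrap the stream in a Reader yourself, the decoding has already happened before the parser sees anything, and the declaration can no longer guide it.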

However, according to 
http://xmlwriter.net/xml_guide/xml_declaration.shtml#Encoding
the supported encodings include only UTF-8, UTF-16, ISO-10646-UCS-2, 
ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, and EUC-JP, 
as you would have learned had you researched your question. (The XML spec itself 
only requires a parser to read UTF-8 and UTF-16; anything beyond that depends on 
the parser you use.)

So it looks like you must not accept XML documents with such a non-standard 
encoding.

Show us the code, or at least an SSCCE of it.

-- 
Lew
