


Re: Detect XML document encodings with SAX

Newsgroups comp.lang.java.programmer
Date 2012-11-24 02:14 -0800
References <k8ioi7$2e2$1@news.albasani.net> <0b3b04bf-24dd-4d59-a16d-14c745b66c76@googlegroups.com> <50b02ee6$0$283$14726298@news.sunsite.dk>
Message-ID <d64baf3c-d582-4308-b6b4-714ef3049ef5@googlegroups.com> (permalink)
Subject Re: Detect XML document encodings with SAX
From Lew <lewbloch@gmail.com>



Arne Vajhøj wrote:
> Lew wrote:
>> Sebastian wrote:
[snip]
>>> output an encoding of UTF-8, while looking at the file
>> as they should.
> 
> No.
> 
> If the XML prolog specifies another encoding than UTF-8,
> then it should not return UTF-8.

True, but I'm saying they should specify UTF-8 in the prolog.

>> XML should be encoded in UTF-8 nearly always.

See?
 
> XML allows for other encodings.

So? You should use UTF-8 nearly always, i.e., unless there's a compelling 
reason not to.
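On the writing side, here is a minimal sketch that assumes the document comes from the JDK's built-in StAX writer. The class name Utf8XmlWriter and the greeting element are invented for illustration. The point is that the bytes and the prolog both say UTF-8, so the reader on the other end has nothing to guess.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes a tiny document whose bytes and declaration are both UTF-8.
class Utf8XmlWriter {
    static byte[] writeGreeting(String text) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The stream encoding and the declared encoding must agree; both are UTF-8 here.
        XMLStreamWriter w = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0"); // XML declaration naming UTF-8
        w.writeStartElement("greeting");      // invented element name
        w.writeCharacters(text);              // escapes markup characters as needed
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close(); // releases the writer; the underlying stream stays open
        return out.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] bytes = writeGreeting("Gr\u00fc\u00dfe \u20ac5");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```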

> And Java XML parsers support it.

For those rare times when you deviate from the usual UTF-8.
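Here is a sketch of one of those rare cases. It assumes the JDK's default SAX parser, and the class name is invented. Give the parser raw bytes rather than a Reader, and it decodes them according to the encoding named in the XML declaration, in this case windows-1252.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses bytes and returns all character data, concatenated.
class DeclaredEncodingDemo {
    static String textOf(byte[] xmlBytes) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may arrive in several chunks
            }
        };
        // A byte stream (not a Reader) lets the parser honor the declared encoding.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new ByteArrayInputStream(xmlBytes)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><price>\u20ac5</price>";
        byte[] cp1252 = doc.getBytes(Charset.forName("windows-1252")); // euro sign is byte 0x80
        System.out.println(textOf(cp1252));
    }
}
```

If the same bytes were labeled encoding="UTF-8", the parser would normally reject the document with a fatal error, because a lone 0x80 byte is not valid UTF-8. Passing a Reader instead of bytes bypasses the declaration entirely: per the InputSource contract, the parser reads a character stream directly and ignores the encoding declaration.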

> So it should always work.

>> But SAX is a parser, so it doesn't output, it inputs. What are you telling us?
> 
> Output usually means System.out.println - that works fine with a parser.

His phrasing wasn't clear to me. That's why I asked for clarification.

I could have guessed, too.
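If the guess is that he wants to report which encoding the parser used, SAX2 has a hook for that: org.xml.sax.ext.Locator2.getEncoding(). Below is a minimal sketch that assumes the JDK's bundled parser supplies a Locator2; the code checks for this rather than relying on it, and the class name is invented. One hedged caveat: the Locator2 docs say the value is not available before startDocument(), and with some parsers the value seen at that point is still the auto-detected guess (often UTF-8) rather than the declared name. For that reason the sketch reads it at the first startElement(), after the XML declaration has been processed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.Locator2;

// Hypothetical helper: returns the encoding the parser reports, or null if no Locator2 is available.
class EncodingSniffer {
    static String encodingOf(byte[] xmlBytes) throws Exception {
        String[] found = new String[1];
        DefaultHandler2 handler = new DefaultHandler2() {
            private Locator locator;
            private boolean seenRoot;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator; // delivered before any other content event
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Read once, at the root element, after the XML declaration has been handled.
                if (!seenRoot && locator instanceof Locator2) {
                    found[0] = ((Locator2) locator).getEncoding();
                }
                seenRoot = true;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new ByteArrayInputStream(xmlBytes)), handler);
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><price>\u20ac5</price>";
        System.out.println(encodingOf(doc.getBytes(Charset.forName("windows-1252"))));
    }
}
```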

>> If your problem is with reading the file, then the encoding in the XML declaration

See? You're preaching to the choir.

>> should suffice to guide the parser. But then why do you talk about methods that
>> "output an encoding"?
> 
> Because he wants to know what it is.
> 
>> However, according to
>> http://xmlwriter.net/xml_guide/xml_declaration.shtml#Encoding
>> supported encodings only include UTF-8, UTF-16, ISO-10646-UCS-2,
>> ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, 
>> and EUC-JP.
>> So it looks like you must not accept XML documents with such a 
>> non-standard encoding.
>
> Those who have researched it would know that the XML spec does not
> limit the encodings at all. The XML processor must support UTF-8
> and UTF-16, but is free to support others.

Perhaps the OP's parser doesn't exercise that freedom, judging by the 
symptoms.

'sall I'm sayin'.
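One way to test that theory is to ask the parser directly. Below is a crude probe that assumes the JDK's default SAX parser; the class and method names are hypothetical. It feeds the parser a tiny ASCII-only document that declares the given encoding and reports whether parsing succeeds. It only makes sense for ASCII-compatible encoding names, since UTF-16 would need real UTF-16 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe: does the default SAX parser accept a document declaring this encoding?
class EncodingSupportProbe {
    static boolean parserAccepts(String encodingName) {
        String doc = "<?xml version=\"1.0\" encoding=\"" + encodingName + "\"?><probe/>";
        // ASCII-only bytes, so any ASCII-compatible encoding decodes them identically.
        byte[] bytes = doc.getBytes(StandardCharsets.US_ASCII);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new ByteArrayInputStream(bytes)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false; // e.g. a fatal error reporting the encoding name as unsupported
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "windows-1252", "x-no-such-encoding"}) {
            System.out.println(name + " -> " + parserAccepts(name));
        }
    }
}
```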

Obviously I don't know the answer, but he's asking for suggestions 
to investigate, AIUI. He's having encoding problems. His XML is apparently 
encoded in Windows-1252, a notoriously funky encoding, especially for 
the variety of characters with which one might wish to deal. So why not
investigate obtaining material that isn't in such a notoriously funky 
encoding, like, oh, say, the old reliable standard UTF-8?
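If the source can't be fixed upstream, converting on arrival is cheap. Here is a minimal sketch that assumes an identity Transformer from the JDK is acceptable; the class name ToUtf8 is invented. The parser decodes the input according to its own declaration, and the serializer writes UTF-8 along with a matching declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical converter: re-serializes any parseable XML document as UTF-8.
class ToUtf8 {
    static byte[] reencode(byte[] xmlBytes) throws TransformerException {
        // No stylesheet: the identity transform copies the input document unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // output bytes and declaration
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Raw bytes in, so the parser applies the input's declared encoding.
        identity.transform(new StreamSource(new ByteArrayInputStream(xmlBytes)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        String doc = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><price>\u20ac5</price>";
        byte[] utf8 = reencode(doc.getBytes(Charset.forName("windows-1252")));
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

An identity transform reproduces the document's content, not its exact bytes. Lexical details such as the original attribute quoting or an internal DTD subset are not guaranteed to survive.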

Perhaps that isn't feasible, for reasons as yet unstated, but that's 
the nature of brainstorming.

-- 
Lew



Thread

Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-21 15:32 +0100
  Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-21 11:31 -0800
    Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-22 00:39 +0100
      Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-21 16:37 -0800
        Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-22 07:41 +0100
          Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-21 23:18 -0800
            Re: Detect XML document encodings with SAX Steven Simpson <ss@domain.invalid> - 2012-11-22 07:53 +0000
              Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-22 08:31 -0800
            Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:21 -0500
    Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:11 -0500
    Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:20 -0500
      Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-11-24 02:14 -0800
        Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-24 22:18 +0100
          Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:07 -0500
            Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:50 +0100
          Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 17:12 -0800
            Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:17 -0500
              Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 18:02 -0800
                Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 21:10 -0500
                Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 18:25 -0800
                Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 21:37 -0500
                Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-24 21:01 -0800
                Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:30 -0500
                Re: Detect XML document encodings with SAX Gene Wirchenko <genew@telus.net> - 2012-12-12 18:03 -0800
                Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-12-12 21:09 -0500
                Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-12-12 18:58 -0800
                Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-12-12 22:17 -0500
                Re: Detect XML document encodings with SAX Lew <lewbloch@gmail.com> - 2012-12-12 22:51 -0800
                Re: Detect XML document encodings with SAX Gene Wirchenko <genew@telus.net> - 2012-12-12 21:52 -0800
                Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:45 +0100
                Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:23 -0500
                Re: Detect XML document encodings with SAX markspace <-@.> - 2012-11-25 13:24 -0800
                Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 10:58 +0100
        Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:13 -0500
        Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 17:19 -0500
  Re: Detect XML document encodings with SAX Roedy Green <see_website@mindprod.com.invalid> - 2012-11-22 03:24 -0800
    Re: Detect XML document encodings with SAX "Peter J. Holzer" <hjp-usenet2@hjp.at> - 2012-11-24 00:13 +0100
      Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 21:22 -0500
  Re: Detect XML document encodings with SAX Steven Simpson <ss@domain.invalid> - 2012-11-25 11:00 +0000
    Re: Detect XML document encodings with SAX Sebastian <sebastian@undisclosed.invalid> - 2012-11-25 12:32 +0100
    Re: Detect XML document encodings with SAX Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:41 -0500
  Re: Detect XML document encodings with SAX Roedy Green <see_website@mindprod.com.invalid> - 2012-12-12 20:32 -0800
  Re: Detect XML document encodings with SAX Stanimir Stamenkov <s7an10@netscape.net> - 2012-12-16 17:43 +0200
