


Re: Detect XML document encodings with SAX

Path csiph.com!usenet.pasdenom.info!news.albasani.net!.POSTED!not-for-mail
From Sebastian <sebastian@undisclosed.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: Detect XML document encodings with SAX
Date Sun, 25 Nov 2012 10:50:25 +0100
Organization albasani.net
Lines 42
Message-ID <k8sphg$hn4$1@news.albasani.net>
References <k8ioi7$2e2$1@news.albasani.net> <0b3b04bf-24dd-4d59-a16d-14c745b66c76@googlegroups.com> <50b02ee6$0$283$14726298@news.sunsite.dk> <d64baf3c-d582-4308-b6b4-714ef3049ef5@googlegroups.com> <k8rdfq$gbg$1@news.albasani.net> <50b14516$0$282$14726298@news.sunsite.dk>
Mime-Version 1.0
Content-Type text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding 8bit
NNTP-Posting-Date Sun, 25 Nov 2012 09:48:32 +0000 (UTC)
User-Agent Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9
In-Reply-To <50b14516$0$282$14726298@news.sunsite.dk>
Cancel-Lock sha1:meY1LLsSUI+sX5IFTqamtA/ppEs=
Xref csiph.com comp.lang.java.programmer:19926



On 24.11.2012 23:07, Arne Vajhøj wrote:
[snip]
> I would consider it tempting to rewrite that app to use a standard
> XML parser.
>
> It would solve this problem and possibly also some future problems.

Yes, I wish I could do that (or rather, have that done...). It seems
that the app also handles other types of files (such as CSV), and
regardless of file type it always does the same thing: it opens an
InputStreamReader with a given charset name.
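
To make that constraint concrete, here is a minimal sketch of what "open an InputStreamReader given a charset name" could look like when the name comes from an untrusted source such as a file's own declaration. The class and method names (CharsetReaderSketch, resolve, open) and the UTF-8 fallback are my own assumptions for illustration, not the app's actual code.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns a charset *name* into a Reader, which is
// the only hook the app described above is said to offer.
final class CharsetReaderSketch {

    // Returns the named charset, or the fallback if the name is null,
    // syntactically illegal, or not supported by this JVM.
    static Charset resolve(String name, Charset fallback) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Charset.isSupported throws IllegalCharsetNameException
            // (a subclass of IllegalArgumentException) for illegal names.
        }
        return fallback;
    }

    // Opens a Reader the same way for every file type; the UTF-8
    // fallback is an assumption made for this sketch.
    static Reader open(InputStream in, String charsetName) {
        return new InputStreamReader(in, resolve(charsetName, StandardCharsets.UTF_8));
    }
}
```

So whatever detects the encoding only has to produce a name; a name like "Unknown" would simply end up on the fallback.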

[snip]

> What about just reading the first few lines until you have the
> XML declaration.
>
> Parsing the encoding out of that should be simple.
>
> // Matches encoding="..." or encoding='...' in the XML declaration.
> private static final Pattern encpat =
>     Pattern.compile("encoding\\s*=\\s*['\"]([^'\"]+)['\"]");
>
> private static String detectSimple(String fnm) throws IOException {
>     BufferedReader br = new BufferedReader(new FileReader(fnm));
>     String firstpart = "";
>     String line;
>     // Read up to the first '>'; stop at end of file (readLine() returns null).
>     while(!firstpart.contains(">") && (line = br.readLine()) != null) firstpart += line;
>     br.close();
>     Matcher m = encpat.matcher(firstpart);
>     if(m.find()) {
>         return m.group(1);
>     } else {
>         return "Unknown";
>     }
> }
>
> I do not like the solution, but given the restrictions in the
> context, then maybe it is what you need.

Thanks for the suggestion. I'll use that idea until a better solution 
becomes feasible.
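
For the archive, this is roughly how I would sketch a variant of that idea. It is only a sketch under my own assumptions: the class name XmlEncodingSniffer, the method names, and the 1024-byte limit are made up for illustration. The quoted method reads through FileReader, which decodes with the platform default charset; this variant reads a bounded number of raw bytes and decodes them as ISO-8859-1, which maps every byte to exactly one char and cannot fail. It only helps for ASCII-compatible encodings; UTF-16 files are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class XmlEncodingSniffer {

    // Assumption: an XML declaration fits well within the first 1024 bytes.
    private static final int MAX_PREFIX = 1024;

    // Same pattern as in the quoted suggestion.
    private static final Pattern ENC =
        Pattern.compile("encoding\\s*=\\s*['\"]([^'\"]+)['\"]");

    // Returns the declared encoding name, or null if the prefix does not
    // start with an XML declaration that carries an encoding attribute.
    static String fromPrefix(byte[] buf, int len) {
        // Skip a UTF-8 byte order mark (EF BB BF) if one is present.
        int off = (len >= 3 && (buf[0] & 0xFF) == 0xEF
                && (buf[1] & 0xFF) == 0xBB && (buf[2] & 0xFF) == 0xBF) ? 3 : 0;
        String head = new String(buf, off, len - off, StandardCharsets.ISO_8859_1);
        int end = head.indexOf("?>");
        if (!head.startsWith("<?xml") || end < 0) {
            return null;
        }
        // Search only inside the declaration, not in the document body.
        Matcher m = ENC.matcher(head.substring(0, end));
        return m.find() ? m.group(1) : null;
    }

    // Reads at most MAX_PREFIX bytes from the file and inspects them.
    static String detect(String fileName) throws IOException {
        byte[] buf = new byte[MAX_PREFIX];
        int len = 0;
        try (InputStream in = Files.newInputStream(Paths.get(fileName))) {
            int n;
            while (len < buf.length
                    && (n = in.read(buf, len, buf.length - len)) > 0) {
                len += n;
            }
        }
        return fromPrefix(buf, len);
    }
}
```

The caller can then pass the returned name (or a default when it is null) on to the existing charset-name parameter.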

-- Sebastian



