
XMLEventWriter and numeric character references

From Jeff Higgins <jeff@invalid.invalid>
Newsgroups comp.lang.java.help
Subject XMLEventWriter and numeric character references
Date 2013-02-05 03:52 -0500
Organization A noiseless patient Spider
Message-ID <keqget$g1q$1@dont-email.me>


Hi
I have some text, well-formed XML, US-ASCII encoded, which contains
some numeric character references to characters outside the BMP. I
process this text with the StAX event iterator API shipped with the
latest (as of this writing) Oracle JDK. All is well with the process
except for two hiccups: an XML declaration is prepended, and each such
character reference is printed as a pair of surrogate character
references rather than the single numeric character reference I desire.

I cannot find a way to change this behavior with the StAX event
iterator API and am hoping that there is a way, and that you can help
me find it. I would not like to have to change APIs just to eliminate
the minor irritation of the prepended declaration and the annoying
hindrance of the pair of character references.

A code sample follows; my motivation comes after it, for those who
might be interested.

import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class HiccupingProcessor {

   public static void main(String[] args)
     throws XMLStreamException {

     XMLEventReader eventReader =
       XMLInputFactory
         .newInstance()
         .createXMLEventReader(
           new StringReader("<example>&#x1D43A;</example>"));

     XMLEventWriter eventWriter =
       XMLOutputFactory
         .newInstance()
         .createXMLEventWriter(
             System.out, "US-ASCII");

     while (eventReader.hasNext()) {
       // some irrelevant processing goes here
       eventWriter.add(eventReader.nextEvent());
     }
     eventWriter.close();

     // prints:
     // <?xml version="1.0" ?><example>&#xd835;&#xdc3a;</example>

     // rather than the desired:
     // <example>&#x1D43A;</example>
   }
}
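
One possible workaround, sketched here under assumptions rather than offered as a known StAX setting: skip the StartDocument event so that no declaration is written, capture the writer's output in memory, and merge each pair of surrogate references into a single reference with a regular expression. The class name NcrFixup and both helper methods are hypothetical. The sketch only handles hexadecimal references, and it would also rewrite matching text inside comments or CDATA sections.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class NcrFixup {

  // A high-surrogate reference (D800..DBFF) immediately followed by
  // a low-surrogate reference (DC00..DFFF), both in hexadecimal.
  private static final Pattern SURROGATE_PAIR_REFS = Pattern.compile(
      "&#x(d[89ab][0-9a-f]{2});&#x(d[c-f][0-9a-f]{2});",
      Pattern.CASE_INSENSITIVE);

  // Hypothetical helper: replaces each surrogate pair of references
  // with one reference to the supplementary code point.
  public static String mergeSurrogateReferences(String xml) {
    Matcher m = SURROGATE_PAIR_REFS.matcher(xml);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      char high = (char) Integer.parseInt(m.group(1), 16);
      char low = (char) Integer.parseInt(m.group(2), 16);
      int codePoint = Character.toCodePoint(high, low);
      String ref = "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
      m.appendReplacement(sb, Matcher.quoteReplacement(ref));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  // Hypothetical helper: copies events as in the sample above, but
  // drops StartDocument (assumed to be what triggers the declaration)
  // and post-processes the captured US-ASCII output.
  public static String copyWithoutDeclaration(String xml)
      throws XMLStreamException {
    XMLEventReader reader = XMLInputFactory.newInstance()
        .createXMLEventReader(new StringReader(xml));
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    XMLEventWriter writer = XMLOutputFactory.newInstance()
        .createXMLEventWriter(bytes, "US-ASCII");

    while (reader.hasNext()) {
      XMLEvent event = reader.nextEvent();
      if (event.isStartDocument()) {
        continue; // do not forward the synthesized declaration event
      }
      writer.add(event);
    }
    writer.flush();
    writer.close();
    reader.close();

    String ascii = new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
    return mergeSurrogateReferences(ascii);
  }

  public static void main(String[] args) throws XMLStreamException {
    System.out.println(copyWithoutDeclaration("<example>&#x1D43A;</example>"));
  }
}
```

Post-processing the serialized text keeps the event iterator API in place, at the cost of a second pass over the output. Whether a given StAX implementation emits the pair in exactly this form is an assumption taken from the output shown in the sample.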



I edit this text in a text editor that supports XML syntax
highlighting and folding. I then process it and may edit it again,
possibly cycling through several iterations. I want to be able to
easily see characters outside US-ASCII, and the syntax highlighting
makes that possible. If I don't remember what a particular character
is, I can easily look it up, but not so with the surrogate pair; then I
must transcode before looking it up. The prepended XML declaration is
not needed at this stage and has become a simple irritation.
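
For reference, the transcoding step mentioned above is the standard UTF-16 surrogate arithmetic. A minimal sketch follows; the class name SurrogateLookup and its method are hypothetical, and the JDK's own Character.toCodePoint performs the same calculation.

```java
public class SurrogateLookup {

  // Combines a UTF-16 high/low surrogate pair into one code point:
  // each half contributes 10 bits, offset from U+10000.
  public static int toCodePoint(int high, int low) {
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
  }

  public static void main(String[] args) {
    // The pair from the sample output: &#xd835;&#xdc3a;
    int codePoint = toCodePoint(0xD835, 0xDC3A);
    // Print the code point in U+ notation along with its Unicode name.
    System.out.printf("U+%X %s%n", codePoint, Character.getName(codePoint));
  }
}
```

For the pair in the sample, 0xD835 and 0xDC3A combine to U+1D43A, the code point of the original reference.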
