

Newsgroup: comp.lang.java.help, thread #2476

XMLEventWriter and numeric character references

Started by: Jeff Higgins <jeff@invalid.invalid>
First post: 2013-02-05 03:52 -0500
Last post: 2013-02-05 20:52 -0500
Articles: 2 (1 participant)



Contents

  XMLEventWriter and numeric character references Jeff Higgins <jeff@invalid.invalid> - 2013-02-05 03:52 -0500
    [Update] XMLEventWriter and numeric character references Jeff Higgins <jeff@invalid.invalid> - 2013-02-05 20:52 -0500

#2476 — XMLEventWriter and numeric character references

From: Jeff Higgins <jeff@invalid.invalid>
Date: 2013-02-05 03:52 -0500
Subject: XMLEventWriter and numeric character references
Message-ID: <keqget$g1q$1@dont-email.me>

Hi
I have some text, well-formed XML encoded in US-ASCII, which contains
some numeric character references to characters outside the BMP (Basic
Multilingual Plane). I process this text with the StAX event iterator
API shipped with the latest (as of this writing) Oracle JDK. All is well
with the process except for a couple of hiccups: an XML declaration is
prepended to the output, and each such character reference is printed
as a surrogate pair of character references rather than the single
numeric character reference I want.

I cannot find a way to change this behavior with the StAX event
iterator API and am hoping that there is one, and that you can help
me find it. I would rather not switch APIs just to eliminate the minor
irritation of the prepended declaration and the annoying hindrance of
the pair of character references.

A code sample follows; my motivation follows that, for those who might
be interested.

import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class HiccupingProcessor {

   public static void main(String[] args)
     throws XMLStreamException {

     XMLEventReader eventReader =
       XMLInputFactory
         .newInstance()
         .createXMLEventReader(
           new StringReader("<example>&#x1D43A;</example>"));

     XMLEventWriter eventWriter =
       XMLOutputFactory
         .newInstance()
         .createXMLEventWriter(
             System.out, "US-ASCII");

     while (eventReader.hasNext()) {
       // some irrelevant processing goes here
       eventWriter.add(eventReader.nextEvent());
     }
     eventWriter.close();

     // prints:
     // <?xml version="1.0" ?><example>&#xd835;&#xdc3a;</example>

     // rather than the desired:
     // <example>&#x1D43A;</example>
   }
}
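A minimal sketch for the declaration hiccup, assuming the JDK's default StAX implementation: the reader reports a StartDocument event first even when the input has no declaration, and the writer turns that event into the `<?xml version="1.0" ?>` seen above. Not forwarding that one event should leave the rest of the output as before. The names `NoDeclarationCopier` and `copyWithoutDeclaration` are hypothetical, and the output is collected in memory rather than sent to `System.out` so it can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class NoDeclarationCopier {

    // Copies every event except StartDocument, the event that (in the JDK
    // implementation) makes the writer emit the XML declaration.
    public static String copyWithoutDeclaration(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                .createXMLEventWriter(out, "US-ASCII");
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument()) {
                    continue; // skip it: no <?xml ...?> in the output
                }
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (XMLStreamException e) {
            // Wrapped in an unchecked exception to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyWithoutDeclaration("<example>&#x1D43A;</example>"));
    }
}
```

Filtering the event inside the loop, rather than post-processing the written text, keeps the fix next to the existing per-event processing.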



I edit this text in a text editor that supports XML syntax highlighting
and folding. I then process it and may edit it again, possibly cycling
through several iterations. I want to be able to see characters outside
US-ASCII at a glance, and the syntax highlighting makes that possible.
If I don't remember what a particular character is, I can easily look
it up; not so with a surrogate pair, which I must first transcode to a
code point before looking it up. The prepended XML declaration is not
needed at this stage and has become a simple irritation.



#2477 — [Update] XMLEventWriter and numeric character references

From: Jeff Higgins <jeff@invalid.invalid>
Date: 2013-02-05 20:52 -0500
Subject: [Update] XMLEventWriter and numeric character references
Message-ID: <kesc6c$7qv$1@dont-email.me>
In reply to: #2476

From XMLStreamWriterImpl, the StAX writer bundled with the JDK
(com.sun.xml.internal.stream.writers):

   for (int index = 0; index < end; index++) {
     char ch = content.charAt(index);

     if (fEncoder != null && !fEncoder.canEncode(ch)){
       fWriter.write(content, startWritePos, index - startWritePos );

       // Escape this char as underlying encoder cannot handle it
       fWriter.write( "&#x" );
       fWriter.write(Integer.toHexString(ch));
       fWriter.write( ';' );
       startWritePos = index + 1;
       continue;
     }
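A Java `String` stores U+1D43A as two `char`s, the surrogates U+D835 and U+DC3A, and the loop above tests each `char` on its own, so each surrogate fails `canEncode` and is escaped separately. The standalone sketch below (class and method names are hypothetical) contrasts that per-`char` escaping, as the excerpt behaves with a US-ASCII encoder, with per-code-point escaping via `String.codePoints()`.

```java
public class SurrogateEscapeDemo {

    // Formats a value as an uppercase hexadecimal character reference.
    private static String hexRef(int value) {
        return "&#x" + Integer.toHexString(value).toUpperCase() + ";";
    }

    // Like the excerpt with a US-ASCII encoder: every non-ASCII UTF-16
    // code unit (including each half of a surrogate pair) is escaped alone.
    public static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            sb.append(ch < 0x80 ? String.valueOf(ch) : hexRef(ch));
        }
        return sb.toString();
    }

    // Escapes per code point instead, so a surrogate pair becomes one reference.
    public static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
            sb.append(cp < 0x80 ? new String(Character.toChars(cp)) : hexRef(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        String g = new String(Character.toChars(0x1D43A)); // U+1D43A
        System.out.println(g.length());            // 2
        System.out.println(escapePerChar(g));      // &#xD835;&#xDC3A;
        System.out.println(escapePerCodePoint(g)); // &#x1D43A;
    }
}
```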

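The loop tests and escapes one char at a time, so each half of a
surrogate pair gets its own reference. So yeah, short of a custom
XMLStreamWriter, I can add a clause to my processor: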

// Also needs: import javax.xml.stream.XMLEventFactory;
//             import javax.xml.stream.events.XMLEvent;
XMLEventFactory eventFactory = XMLEventFactory.newInstance();

while (eventReader.hasNext()) {
   XMLEvent e = eventReader.nextEvent();
   if (e.isCharacters()) {
     String data = e.asCharacters().getData();
     // Only matches an event holding exactly one surrogate pair
     if (data.length() == 2
         && Character.isHighSurrogate(data.charAt(0))
         && Character.isLowSurrogate(data.charAt(1))) {
       int cp = Character.toCodePoint(data.charAt(0), data.charAt(1));
       // An entity reference named "#x1D43A" is written as "&#x1D43A;"
       eventWriter.add(eventFactory.createEntityReference(
           "#x" + Integer.toHexString(cp).toUpperCase(), null));
       continue;
     }
   }
   eventWriter.add(e);
}
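The clause above only catches a Characters event that consists of exactly one surrogate pair, which holds when the parser reports the reference as its own event. A sketch of a more general variant, relying on the same assumption the update does (that the JDK writer copies an entity-reference name verbatim between `&` and `;`), splits any text event into runs of ordinary characters plus one entity-reference event per supplementary code point. `CodePointRefProcessor`, `addText`, and `process` are hypothetical names; CDATA sections are passed through untouched so they stay CDATA.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class CodePointRefProcessor {

    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // Writes text, turning each supplementary code point into an entity-reference
    // event named "#x..." and everything else into ordinary Characters events.
    static void addText(XMLEventWriter writer, String data) throws XMLStreamException {
        StringBuilder plain = new StringBuilder();
        int i = 0;
        while (i < data.length()) {
            int cp = data.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                if (plain.length() > 0) { // flush the preceding ordinary run
                    writer.add(EVENTS.createCharacters(plain.toString()));
                    plain.setLength(0);
                }
                writer.add(EVENTS.createEntityReference(
                    "#x" + Integer.toHexString(cp).toUpperCase(), null));
            } else {
                plain.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        if (plain.length() > 0) {
            writer.add(EVENTS.createCharacters(plain.toString()));
        }
    }

    public static String process(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                .createXMLEventWriter(out, "US-ASCII");
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isCharacters() && !e.asCharacters().isCData()) {
                    addText(writer, e.asCharacters().getData());
                } else {
                    writer.add(e);
                }
            }
            writer.flush();
            writer.close();
            reader.close();
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (XMLStreamException ex) {
            // Wrapped in an unchecked exception to keep the sketch short
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(process("<example>a&#x1D43A;b</example>"));
    }
}
```

Because the trick depends on how the writer serializes entity-reference names, another StAX implementation could validate the name and reject `#x...`; it is tied to the writer shipped with the JDK.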




