Groups > comp.lang.java.help > #2476 > unrolled thread
| Started by | Jeff Higgins <jeff@invalid.invalid> |
|---|---|
| First post | 2013-02-05 03:52 -0500 |
| Last post | 2013-02-05 20:52 -0500 |
| Articles | 2 — 1 participant |
XMLEventWriter and numeric character references Jeff Higgins <jeff@invalid.invalid> - 2013-02-05 03:52 -0500
[Update] XMLEventWriter and numeric character references Jeff Higgins <jeff@invalid.invalid> - 2013-02-05 20:52 -0500
| From | Jeff Higgins <jeff@invalid.invalid> |
|---|---|
| Date | 2013-02-05 03:52 -0500 |
| Subject | XMLEventWriter and numeric character references |
| Message-ID | <keqget$g1q$1@dont-email.me> |
Hi
I have some text, well-formed XML, US-ASCII encoded, which contains
some numeric character references to characters outside the BMP. I
process this text with the StAX event iterator API shipped with the
latest (as of this writing) Oracle JDK. All is well with the process
except for a couple of hiccups: an XML declaration is prepended, and
each such character reference is printed as a pair of references to
its surrogates rather than as the single numeric character reference
I want.
I cannot find a way to change this behavior with the StAX event
iterator API and am hoping that there is a way, and that you can help
me find it. I would not like to have to change APIs just to eliminate
the minor irritation of the prepended declaration and the annoying
hindrance of the pair of character references.
A code sample follows. After it I describe my motivation, for those
who might be interested.
    import java.io.StringReader;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLEventWriter;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;

    public class HiccupingProcessor {
        public static void main(String[] args)
                throws XMLStreamException {
            XMLEventReader eventReader =
                XMLInputFactory
                    .newInstance()
                    .createXMLEventReader(
                        new StringReader("<example>&#x1D43A;</example>"));
            XMLEventWriter eventWriter =
                XMLOutputFactory
                    .newInstance()
                    .createXMLEventWriter(
                        System.out, "US-ASCII");
            while (eventReader.hasNext()) {
                // some irrelevant processing goes here
                eventWriter.add(eventReader.nextEvent());
            }
            eventWriter.close();
            // prints:
            // <?xml version="1.0" ?><example>&#xd835;&#xdc3a;</example>
            // rather than the desired:
            // <example>&#x1D43A;</example>
        }
    }
I edit this text in a text editor that supports XML syntax
highlighting and folding. I then process it and may edit it again,
possibly over several cycles. I want to be able to spot characters
outside US-ASCII easily, and the syntax highlighting of character
references makes that possible. If I don't remember what a particular
character is, I can easily look it up by its code point, but not so
with a surrogate pair: then I must convert the pair back to a code
point before looking it up. The prepended XML declaration is not
needed at this stage and is merely an irritation.
| From | Jeff Higgins <jeff@invalid.invalid> |
|---|---|
| Date | 2013-02-05 20:52 -0500 |
| Subject | [Update] XMLEventWriter and numeric character references |
| Message-ID | <kesc6c$7qv$1@dont-email.me> |
| In reply to | #2476 |
From XMLStreamWriterImpl:
    for (int index = 0; index < end; index++) {
        char ch = content.charAt(index);
        if (fEncoder != null && !fEncoder.canEncode(ch)){
            fWriter.write(content, startWritePos, index - startWritePos );
            // Escape this char as underlying encoder cannot handle it
            fWriter.write( "&#x" );
            fWriter.write(Integer.toHexString(ch));
            fWriter.write( ';' );
            startWritePos = index + 1;
            continue;
        }
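To make the effect of that loop concrete, here is a small standalone comparison. It is only a sketch: the class and method names are made up for illustration, and a plain `> 0x7F` test stands in for the encoder's `canEncode` check on US-ASCII. It also ignores the escaping of `<` and `&` that a real writer performs.

```java
import java.util.Locale;

public class NcrEscapeDemo {

    // Mirrors the quoted loop: every UTF-16 char is tested and escaped on its
    // own, so the two halves of a surrogate pair become two references.
    public static String escapePerChar(String content) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char ch = content.charAt(i);
            if (ch > 0x7F) { // assumption: stand-in for !fEncoder.canEncode(ch)
                out.append("&#x").append(Integer.toHexString(ch)).append(';');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    // Alternative: walk code points, so a surrogate pair yields one reference.
    public static String escapePerCodePoint(String content) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < content.length(); ) {
            int cp = content.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // 2 for a supplementary code point
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String g = new String(Character.toChars(0x1D43A));
        System.out.println(escapePerChar(g));      // &#xd835;&#xdc3a;
        System.out.println(escapePerCodePoint(g)); // &#x1D43A;
    }
}
```

The per-char version reproduces the pair of references seen in the output; the per-code-point version is what a custom writer would have to do instead.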
So yeah, the writer escapes one char at a time. Short of writing a
custom XMLStreamWriter, I can add a clause to my processor:
    // needs javax.xml.stream.XMLEventFactory and javax.xml.stream.events.XMLEvent
    XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    while (eventReader.hasNext()) {
        XMLEvent e = eventReader.nextEvent();
        String data = e.isCharacters() ? e.asCharacters().getData() : null;
        // Only a Characters event holding exactly one surrogate pair is rewritten.
        if (data != null && data.length() == 2
                && Character.isHighSurrogate(data.charAt(0))
                && Character.isLowSurrogate(data.charAt(1))) {
            int cp = Character.toCodePoint(data.charAt(0), data.charAt(1));
            // Write the code point as one reference by passing "#x..." as an entity name.
            eventWriter.add(eventFactory.createEntityReference(
                    "#x" + Integer.toHexString(cp).toUpperCase(), null));
        } else {
            eventWriter.add(e);
        }
    }
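The clause above only fires when a Characters event holds exactly one surrogate pair. A possible generalization, sketched below, scans each Characters event by code point, so a supplementary character embedded in longer text is handled too. The class and method names are hypothetical. It rests on two assumptions about the JDK's bundled StAX implementation that I have not checked against other implementations: that an entity reference event is written verbatim as `&name;`, and that not forwarding the StartDocument event leaves out the XML declaration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class SupplementaryNcrProcessor {

    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // Re-emits character data: ordinary runs as Characters events, each
    // supplementary code point as an entity reference named "#x....".
    static void addCharacters(XMLEventWriter writer, String data)
            throws XMLStreamException {
        int runStart = 0;
        for (int i = 0; i < data.length(); ) {
            int cp = data.codePointAt(i);
            int width = Character.charCount(cp);
            if (Character.isSupplementaryCodePoint(cp)) {
                if (i > runStart) {
                    writer.add(EVENTS.createCharacters(data.substring(runStart, i)));
                }
                String name = "#x" + Integer.toHexString(cp).toUpperCase(Locale.ROOT);
                writer.add(EVENTS.createEntityReference(name, null));
                runStart = i + width;
            }
            i += width;
        }
        if (runStart < data.length()) {
            writer.add(EVENTS.createCharacters(data.substring(runStart)));
        }
    }

    // Copies the document to US-ASCII output and returns it as a String.
    public static String process(String xml) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out, "US-ASCII");
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (e.isStartDocument()) {
                    continue; // assumption: not forwarding this drops the declaration
                }
                if (e.isCharacters() && !e.asCharacters().isCData()) {
                    addCharacters(writer, e.asCharacters().getData());
                } else {
                    writer.add(e); // CDATA and all other events pass through unchanged
                }
            }
            writer.flush();
            writer.close();
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(process("<example>a&#x1D43A;b</example>"));
    }
}
```

CDATA sections are passed through untouched because a character reference is not recognized inside them. BMP characters that US-ASCII cannot encode are still left to the writer's own escaping.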