
xml:id

Started by	Michael Jung <miju@golem.phantasia.org>
First post	2011-12-18 21:33 +0100
Last post	2011-12-20 22:54 +0100
Articles	7 — 3 participants


  xml:id Michael Jung <miju@golem.phantasia.org> - 2011-12-18 21:33 +0100
    Re: xml:id Lee Fesperman <firstsql@gmail.com> - 2011-12-19 00:24 -0800
      Re: xml:id Michael Jung <miju@golem.phantasia.org> - 2011-12-19 10:27 +0100
        Re: xml:id Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-12-19 11:25 -0400
          Re: xml:id Michael Jung <miju@golem.phantasia.org> - 2011-12-19 19:06 +0100
            Re: xml:id Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-12-19 17:26 -0400
              Re: xml:id Michael Jung <miju@golem.phantasia.org> - 2011-12-20 22:54 +0100

#10855 — xml:id

From	Michael Jung <miju@golem.phantasia.org>
Date	2011-12-18 21:33 +0100
Subject	xml:id
Message-ID	<87wr9t1uxp.fsf@golem.phantasia.org>

I have a problem with transferring xml:ids from one document to another;
sample code is attached. Somehow the id attribute gets lost. Am I doing
something wrong, missing something, or is this a bug in the XML libs (I
use the ones supplied with the standard JDK)? Maybe this is a "feature"?
It is rather annoying if this doesn't work, since it forces me to
traverse the tree and do id handling myself.

The code produces the same output under OpenJDK, Sun's JDK 1.6 and 1.5:
: [elem: null]
: [elem: null]
: null

=== SimpleTest.java ===
import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class SimpleTest {
    public static void main(String[] a) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        Document parsed = docBuilder.parse(new File("test.xml"));
        System.out.println(parsed.getElementById("x"));
        Document parsed2 = docBuilder.parse(new File("test.xml"));
        Element el = parsed.getElementById("x");
        el.setAttribute("id", "x2");
        System.out.println(parsed.getElementById("x2"));
        // I have tried importNode as well, that even loses the "isId"
        // property of the "id" tag.
        parsed2.adoptNode(el);
        // I definitely want to avoid the next call, since I'd need to
        // traverse in production code. But it is useless anyway.
        el.setIdAttribute("id", true);
        System.out.println(parsed2.getElementById("x2"));
    }
}
=== test.xml ===
<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./test.xsd">
   <elem id="x"/>
</test>
=== test.xsd ===
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <xs:element name="test">
      <xs:complexType>
         <xs:choice>
            <xs:element name="elem">
               <xs:complexType>
                  <xs:attribute name="id" type="xs:ID" />
               </xs:complexType>
            </xs:element>
         </xs:choice>
      </xs:complexType>
   </xs:element>
</xs:schema>


#10862

From	Lee Fesperman <firstsql@gmail.com>
Date	2011-12-19 00:24 -0800
Message-ID	<33c64ae6-d7f1-466b-8534-35f777eb3c7c@h4g2000yqk.googlegroups.com>
In reply to	#10855

On Dec 18, 12:33 pm, Michael Jung <m...@golem.phantasia.org> wrote:
> I have a problem with transferring xml:ids from one document to another;
> sample code is attached. Somehow the id attribute gets lost. Am I doing
> something wrong, missing something, or is this a bug in the XML libs (I
> use the ones supplied with the standard JDK)? Maybe this is a "feature"?
> It is rather annoying if this doesn't work, since it forces me to
> traverse the tree and do id handling myself.
>
> The code produces the same output under OpenJDK, Sun's JDK 1.6 and 1.5:
> : [elem: null]
> : [elem: null]
> : null
>
> === SimpleTest.java ===
> import java.io.File;
> import javax.xml.parsers.*;
> import org.w3c.dom.*;
>
> public class SimpleTest {
>     public static void main(String[] a) throws Exception {
>         DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
>         dbf.setNamespaceAware(true);
>         dbf.setValidating(true);
>         dbf.setIgnoringElementContentWhitespace(true);
>         dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
>         DocumentBuilder docBuilder = dbf.newDocumentBuilder();
>         Document parsed = docBuilder.parse(new File("test.xml"));
>         System.out.println(parsed.getElementById("x"));
>         Document parsed2 = docBuilder.parse(new File("test.xml"));
>         Element el = parsed.getElementById("x");
>         el.setAttribute("id", "x2");
>         System.out.println(parsed.getElementById("x2"));
>         // I have tried importNode as well, that even loses the "isId"
>         // property of the "id" tag.
>         parsed2.adoptNode(el);
>         // I definitely want to avoid the next call, since I'd need to
>         // traverse in production code. But it is useless anyway.
>         el.setIdAttribute("id", true);
>         System.out.println(parsed2.getElementById("x2"));
>     }
> }
>
> === test.xml ===
> <?xml version="1.0" encoding="UTF-8"?>
> <test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./test.xsd">
>    <elem id="x"/>
> </test>
> === test.xsd ===
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
>    <xs:element name="test">
>       <xs:complexType>
>          <xs:choice>
>             <xs:element name="elem">
>                <xs:complexType>
>                   <xs:attribute name="id" type="xs:ID" />
>                </xs:complexType>
>             </xs:element>
>          </xs:choice>
>       </xs:complexType>
>    </xs:element>
> </xs:schema>

Try removing 'elementFormDefault="qualified"' from your schema (xsd).
Some schema processors may require qualification in your xml (for
'elem'), even though qualification is not possible in your case ...
because you have no (target)Namespace.

--
Lee Fesperman, FirstSQL Software (http://www.firstsql.com)
=============================================================
* Pure Java implementation, runs on cellphones to mainframes
* FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)


#10864

From	Michael Jung <miju@golem.phantasia.org>
Date	2011-12-19 10:27 +0100
Message-ID	<87ehw0ap2i.fsf@golem.phantasia.org>
In reply to	#10862

Lee Fesperman <firstsql@gmail.com> writes:
> On Dec 18, 12:33 pm, Michael Jung <m...@golem.phantasia.org> wrote:
>> I have a problem with transferring xml:ids from one document to another;
>> sample code is attached. Somehow the id attribute gets lost. Am I doing
>> something wrong, missing something, or is this a bug in the XML libs (I
>> use the ones supplied with the standard JDK)? Maybe this is a "feature"?
>> It is rather annoying if this doesn't work, since it forces me to
>> traverse the tree and do id handling myself.
>>
>> The code produces the same output under OpenJDK, Sun's JDK 1.6 and 1.5:
>> : [elem: null]
>> : [elem: null]
>> : null
>>
>> === SimpleTest.java ===
[...]
>> === test.xml ===
[...]
>> === test.xsd ===
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
[...]

> Try removing 'elementFormDefault="qualified"' from your schema (xsd).
> Some schema processors may require qualification in your xml (for
> 'elem'), even though qualification is not possible in your case ...
> because you have no (target)Namespace.

I have (a) removed the elementFormDefault attribute, (b) set it to unqualified, and (c)
even added namespaces:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://z" elementFormDefault="qualified">

and

<test xmlns="http://z" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://z ./test.xsd">

No difference in output.  I also don't think that this has to do with
the validation process, since that has passed successfully. The ID
property is known for both parsed files. It is simply forgotten during
adopt(ion) of a node. (Though both documents even adhere to the same schema!)

Michael


#10874

From	Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date	2011-12-19 11:25 -0400
Message-ID	<zHIHq.27163$8O1.9231@newsfe07.iad>
In reply to	#10864

On 11-12-19 05:27 AM, Michael Jung wrote:
[ SNIP ]

> No difference in output.  I also don't think that this has to do with
> the validation process, since that has passed successfully. The ID
> property is known for both parsed files. It is simply forgotten during
> adopt(ion) of a node. (Though both documents even adhere to the same schema!)
> 
> Michael

Not "forgotten" exactly. I tested with JDK 1.6 and 1.7, and I find that
if the adopted node is placed _somewhere_ (appendChild or what have
you) and you then obtain the list of "elem" elements in the new
document, you'll have 2 of them, and if you inspect the isId()
value of their "id" attributes, both of them are TRUE.

Furthermore, if you retrieve by getElementById(), using "x" actually
returns an element, but using "x2" does not. So that tells me that
things are dubious overall if you've performed your scenario.

Seems to me that this is creating undefined behaviour (getElementById,
for example, calls this out). Neither importNode nor adoptNode say that
they are _replacing_ anything, for starters.

Further experimentation indicates (to me) that
Document.normalizeDocument() is either neutral or unhelpful at any point
that I've tried.

What *does* work is to (1) remove the first element, the one with value
"x", and (2) to then setIdAttribute() on the adopted/placed node with id
attribute of "x2".
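
A minimal sketch of those two steps, not the code actually tested in this thread. It assumes both documents come from the same DOM implementation, builds small in-memory documents in place of the schema-validated test.xml files, and uses a hypothetical helper name (replaceWithAdopted). The setIdAttribute() call on the in-memory elements stands in for what the validating parse did for the parsed files.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class and method names; only the org.w3c.dom calls are real API.
final class ReplaceWithAdoptedSketch {

    // Moves 'replacement' into the document that owns 'old', swaps it in for
    // 'old', and declares the named attribute as an ID in the target document.
    static Element replaceWithAdopted(Element old, Element replacement, String idAttrName) {
        Document target = old.getOwnerDocument();
        Node adopted = target.adoptNode(replacement);
        if (adopted == null) {
            // The DOM spec allows adoptNode to fail, e.g. across implementations.
            throw new IllegalStateException("adoptNode failed");
        }
        // Step (1): the conflicting element leaves the tree.
        old.getParentNode().replaceChild(adopted, old);
        // Step (2): register the ID for the element now placed in the target.
        Element placed = (Element) adopted;
        placed.setIdAttribute(idAttrName, true);
        return placed;
    }

    // Builds <test><elem id="..."/></test> in memory and marks "id" as an ID
    // (assumption: this imitates the effect of the xs:ID declaration).
    static Document sampleDocument(DocumentBuilder builder, String idValue) {
        Document doc = builder.newDocument();
        Element root = doc.createElement("test");
        doc.appendChild(root);
        Element elem = doc.createElement("elem");
        elem.setAttribute("id", idValue);
        root.appendChild(elem);
        elem.setIdAttribute("id", true);
        return doc;
    }

    // Returns true when the lookup by ID in the target yields the adopted element.
    static boolean demo() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = sampleDocument(builder, "x");
        Document target = sampleDocument(builder, "x");
        Element el = source.getElementById("x");
        Element el2 = target.getElementById("x");
        Element placed = replaceWithAdopted(el2, el, "id");
        return target.getElementById("x") == placed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The helper still needs the attribute name passed in, so it does not remove the need to know which attribute the schema declares as xs:ID.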

Maybe I'm missing something, but if you're willing to call adoptNode()
in the course of doing what you're doing, what's the problem with
removing the element that is going to conflict, and also calling
setIdAttribute()?

AHS


#10881

From	Michael Jung <miju@golem.phantasia.org>
Date	2011-12-19 19:06 +0100
Message-ID	<87vcpctozy.fsf@golem.phantasia.org>
In reply to	#10874

Arved Sandstrom <asandstrom3minus1@eastlink.ca> writes:
> On 11-12-19 05:27 AM, Michael Jung wrote:
[...]
> Not "forgotten" exactly. I tested with JDK 1.6 and 1.7, and I find that
> if the adopted node is placed _somewhere_ (appendChild or what have
> you), that if you obtain the list of "elem" elements in the new
> document, that you'll have 2 of them, and that if you inspect the isId()
> value of the "id" attributes, that both of them are TRUE.
> Furthermore, if you retrieve by getElementById(), using "x" actually
> returns an element, but using "x2" does not. So that tells me that
> things are dubious overall if you've performed your scenario.
[...]

Some of this code was apparently garbled in my attempt to tone the
working example down. Here is some changed code that still yields the
output above.

[...]
   Document parsed = docBuilder.parse(new File("src/test.xml"));
   Element el = parsed.getElementById("x");
   System.out.println(el);

   Document parsed2 = docBuilder.parse(new File("src/test2.xml"));
   Element el2 = parsed2.getElementById("x");
   System.out.println(el2);

   parsed2.adoptNode(el);
   //parsed2.getDocumentElement().removeChild(el2);
   //parsed2.getDocumentElement().appendChild(el);
   parsed2.getDocumentElement().replaceChild(el, el2);
   System.out.println(parsed2.getElementById("x"));
[...]

(Create a copy of test.xml called test2.xml.) What I am attempting to
achieve should be obvious by now: replace one node with another from
a different file.  Using the commented-out lines in place of the
replaceChild call that follows them doesn't work either.

> What *does* work is to (1) remove the first element, the one with value
> "x", and (2) to then setIdAttribute() on the adopted/placed node with id
> attribute of "x2".

That is true, i.e. putting "el.setIdAttribute("id", true);" somewhere
in the code above makes the lookup work.  This way I need to know the id
attribute's name, but I can live with that.

> Maybe I'm missing something, but if you're willing to call adoptNode()
> in the course of doing what you're doing, what's the problem with
> removing the element that is going to conflict, and also calling
> setIdAttribute()?

setIdAttribute seems out of place. It's not likely that the schema
changes in that respect but it still is odd.  And parsing the schema
separately just for that is overkill. At least this issue could be
documented somewhere in the javadoc (I couldn't find anything). Now,
if there were a getIdAttribute that would be better.
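
There is no getIdAttribute in org.w3c.dom, but something close can be approximated with Attr.isId(). The following is only a sketch with a hypothetical helper name (findIdAttributeName). It assumes the ID flag is still set on the attribute at the time of the call, so it would have to run before any step that drops the flag.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical stand-in for the wished-for getIdAttribute.
final class IdAttributeLookupSketch {

    // Returns the name of the first attribute flagged as an ID, or null if none is.
    static String findIdAttributeName(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (attr.isId()) {
                return attr.getName();
            }
        }
        return null;
    }

    // In-memory element with one plain attribute and one ID attribute.
    static String demo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element elem = doc.createElement("elem");
        doc.appendChild(elem);
        elem.setAttribute("class", "plain");
        elem.setAttribute("id", "x");
        elem.setIdAttribute("id", true); // stands in for the xs:ID declaration
        return findIdAttributeName(elem);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The name found this way could then be passed to setIdAttribute() after the node has been placed in the other document, which avoids hard-coding "id".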

Thanks.

Michael


#10883

From	Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date	2011-12-19 17:26 -0400
Message-ID	<y_NHq.26052$Q83.9160@newsfe17.iad>
In reply to	#10881

On 11-12-19 02:06 PM, Michael Jung wrote:
> Arved Sandstrom <asandstrom3minus1@eastlink.ca> writes:
>> On 11-12-19 05:27 AM, Michael Jung wrote:
> [...]
>> Not "forgotten" exactly. I tested with JDK 1.6 and 1.7, and I find that
>> if the adopted node is placed _somewhere_ (appendChild or what have
>> you), that if you obtain the list of "elem" elements in the new
>> document, that you'll have 2 of them, and that if you inspect the isId()
>> value of the "id" attributes, that both of them are TRUE.
>> Furthermore, if you retrieve by getElementById(), using "x" actually
>> returns an element, but using "x2" does not. So that tells me that
>> things are dubious overall if you've performed your scenario.
> [...]
> 
> Some of this code was apparently garbled in my attempt to tone the
> working example down. Here is some changed code that still yields the
> output above.
> 
> [...]
>    Document parsed = docBuilder.parse(new File("src/test.xml"));
>    Element el = parsed.getElementById("x");
>    System.out.println(el);
> 
>    Document parsed2 = docBuilder.parse(new File("src/test2.xml"));
>    Element el2 = parsed2.getElementById("x");
>    System.out.println(el2);
> 
>    parsed2.adoptNode(el);
>    //parsed2.getDocumentElement().removeChild(el2);
>    //parsed2.getDocumentElement().appendChild(el);
>    parsed2.getDocumentElement().replaceChild(el, el2);
>    System.out.println(parsed2.getElementById("x"));
> [...]
> 
> (Create a copy of test.xml called test2.xml.) What I am attempting to
> achieve should be obvious by now: replace one node with the other from
> a different file.  The commented out lines replacing the one following
> them also doesn't work.
> 
>> What *does* work is to (1) remove the first element, the one with value
>> "x", and (2) to then setIdAttribute() on the adopted/placed node with id
>> attribute of "x2".
> 
> That is true: i.e. putting "el.setIdAttribute("id", true);" somewhere
> in the code above.  This way I need to know the id attribute's name,
> but I can live with that.
> 
>> Maybe I'm missing something, but if you're willing to call adoptNode()
>> in the course of doing what you're doing, what's the problem with
>> removing the element that is going to conflict, and also calling
>> setIdAttribute()?
> 
> setIdAttribute seems out of place. It's not likely that the schema
> changes in that respect but it still is odd.  And parsing the schema
> separately just for that is overkill. At least this issue could be
> documented somewhere in the javadoc (I couldn't find anything). Now,
> if there were a getIdAttribute that would be better.
> 
> Thanks.
> 
> Michael

I can sort of see why we end up with these problems. Neither adoptNode
nor importNode provides a parent for the adopted/imported node in the
target document, which is why both methods return the actual
adopted/imported node (one that has the correct owner document).

Until we know where the adopted/imported node is placed in the target
document, it's not possible to determine whether attribute "id" of
element "elem" is actually of type xs:ID. As you know, we could easily
have a schema that declares two elements at different places in the
hierarchy, each with tag name "elem" and each with an attribute "id",
where none, one, or both of those attributes are declared as type xs:ID.

So we surmise that we have to take the adopted/imported "elem" element
and parent it somewhere, so that if "Something Else" (TM) then happened,
the "id" attribute would be identified as being of xs:ID type.

I might add at this juncture, I'm not convinced that your

parsed2.getDocumentElement().replaceChild(el, el2);

will work. Node el2 is in document 2, but el is still in document 1.
That's why the returned value from adoptNode() is handy. replaceChild()
does work just fine if you use that value.

I would not expect any of these methods to re-validate, and therefore I
am not surprised that, if this is all we do, the newly parented
adopted/imported node is not found with getElementById().

We also know that at this point setIdAttribute() sets things right.
I experimented with Document.normalizeDocument() and tweaked some
DOMConfiguration parameters in the hope of finding a way to at least
avoid having to specify the xs:ID attribute, but this seems not to work.
So I think we're stuck with setIdAttribute().

I'm no expert at DOM (I hate it actually :-)) but I hope the above
analysis shows why I at least am not particularly surprised that all of
this has to be done.

AHS


#10915

From	Michael Jung <miju@golem.phantasia.org>
Date	2011-12-20 22:54 +0100
Message-ID	<871uryewoj.fsf@golem.phantasia.org>
In reply to	#10883

Arved Sandstrom <asandstrom3minus1@eastlink.ca> writes:

> On 11-12-19 02:06 PM, Michael Jung wrote:
>> Arved Sandstrom <asandstrom3minus1@eastlink.ca> writes:
>>> On 11-12-19 05:27 AM, Michael Jung wrote:
>> [...]
>>> Not "forgotten" exactly. I tested with JDK 1.6 and 1.7, and I find that
>>> if the adopted node is placed _somewhere_ (appendChild or what have
>>> you), that if you obtain the list of "elem" elements in the new
>>> document, that you'll have 2 of them, and that if you inspect the isId()
>>> value of the "id" attributes, that both of them are TRUE.
>>> Furthermore, if you retrieve by getElementById(), using "x" actually
>>> returns an element, but using "x2" does not. So that tells me that
>>> things are dubious overall if you've performed your scenario.
[...]
>>    Document parsed = docBuilder.parse(new File("src/test.xml"));
>>    Element el = parsed.getElementById("x");
>>    System.out.println(el);
>> 
>>    Document parsed2 = docBuilder.parse(new File("src/test2.xml"));
>>    Element el2 = parsed2.getElementById("x");
>>    System.out.println(el2);
>> 
>>    parsed2.adoptNode(el);
>>    //parsed2.getDocumentElement().removeChild(el2);
>>    //parsed2.getDocumentElement().appendChild(el);
>>    parsed2.getDocumentElement().replaceChild(el, el2);
>>    System.out.println(parsed2.getElementById("x"));
[...]
>> setIdAttribute seems out of place. It's not likely that the schema
>> changes in that respect but it still is odd.  And parsing the schema
>> separately just for that is overkill. At least this issue could be
>> documented somewhere in the javadoc (I couldn't find anything). Now,
>> if there were a getIdAttribute that would be better.
[...]
> So we surmise that we have to take the adopted/imported "elem" element,
> and parent it somewhere, where if "Something Else" (TM) happened that
> the "id" attribute would be identified as being of xs:ID type.

Clear enough. The println's up in the code come for free.

> I might add at this juncture, I'm not convinced that your
>
> parsed2.getDocumentElement().replaceChild(el, el2);
>
> will work. Node el2 is in document 2, but el is still in document 1.

It was adopted. That should move it out of doc-1 and somewhere in the
"orphanage" of doc-2. (At least that is what the javadoc says.)  Also
parsed2.adoptNode(el).equals(el) evaluates to true.
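
The ownership change can be checked in isolation with a small sketch. It uses in-memory documents from one DocumentBuilder instead of the parsed files, and the class name is hypothetical. With the DOM bundled in the JDK all four observations should come out true. An implementation that cannot adopt the node may return null from adoptNode() instead.

```java
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name; records what adoptNode() does to a single element.
final class AdoptNodeOwnershipSketch {

    static boolean[] observe() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc1 = builder.newDocument();
        Document doc2 = builder.newDocument();

        Element root1 = doc1.createElement("test");
        doc1.appendChild(root1);
        Element el = doc1.createElement("elem");
        root1.appendChild(el);
        doc2.appendChild(doc2.createElement("test"));

        Node adopted = doc2.adoptNode(el);

        return new boolean[] {
            adopted == el,                  // same object comes back
            el.getOwnerDocument() == doc2,  // owner is now the second document
            el.getParentNode() == null,     // adopted but not yet placed anywhere
            !root1.hasChildNodes()          // removed from its old parent
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(observe()));
    }
}
```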

> That's why the returned value from adoptNode() is handy. replaceChild()
> does work just fine if you use that value.

Tried that and removed the setIdAttribute; fails.

> I would not expect any of these methods to re-validate and therefore I
> am not surprised that if this is all we do, that the newly parented
> adopted/imported node is not found with getElementById().

That would invalidate a document any time a node is added (and removed).
I understand that this is due to performance, but shouldn't there be
helper methods that verify that the document is valid after some
transformation? And some hints in the javadoc? The only way I see out of
this (if I had deep ids) is by transforming and reparsing the full
document.

> We also know that at this point that setIdAttribute() sets things
> right. [...]  So I think we're stuck with setIdAttribute().

See above. Luckily I don't have deep ids.  (Some XPath would help that,
of course. But it all seems overkill.)
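
For the deep-id case, the XPath idea could look roughly like the sketch below. It is not something tried in this thread. It assumes the ID attribute has the same plain, unprefixed name on every element, which a schema does not guarantee. The helper name markIds is hypothetical, and the documents are parsed from strings without a schema, so no IDs are known before the helper runs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for re-declaring IDs in an adopted subtree.
final class DeepIdMarkerSketch {

    // Declares 'idAttrName' as an ID on 'root' and on every descendant that
    // carries the attribute. Returns the number of elements touched.
    static int markIds(Element root, String idAttrName) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "descendant-or-self::*[@" + idAttrName + "]", root, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {
            ((Element) hits.item(i)).setIdAttribute(idAttrName, true);
        }
        return hits.getLength();
    }

    static Document parse(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Adopts a two-level subtree, places it, then marks its IDs.
    static boolean demo() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document source = parse(builder, "<test><elem id=\"x\"><sub id=\"y\"/></elem></test>");
        Document target = parse(builder, "<test/>");

        Element elem = (Element) source.getDocumentElement().getFirstChild();
        Element adopted = (Element) target.adoptNode(elem);
        target.getDocumentElement().appendChild(adopted);

        boolean unknownBefore = target.getElementById("y") == null;
        int marked = markIds(adopted, "id");
        Element sub = target.getElementById("y");
        return unknownBefore && marked == 2 && sub != null && "sub".equals(sub.getTagName());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

This marks every element that has the attribute, so it would over-register in a schema where only some "id" attributes are of type xs:ID.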

> I'm no expert at DOM (I hate it actually :-)) but I hope the above
> analysis shows why I at least am not particularly surprised that all of
> this has to be done.

I get the point :-)

Michael
