Newsgroup: comp.lang.java.programmer

what is the better/performant way to compare org.w3c.dom.DocumentFragment

Started by: Mausam <mausamhere@gmail.com>
First post: 2012-01-17 07:03 -0800
Last post: 2012-01-17 21:29 -0800
Articles: 8 (3 participants)


Contents

  what is the better/performant way to compare org.w3c.dom.DocumentFragment Mausam <mausamhere@gmail.com> - 2012-01-17 07:03 -0800
    Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Jeff Higgins <jeff@invalid.invalid> - 2012-01-17 12:32 -0500
      Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Mausam <mausamhere@gmail.com> - 2012-01-17 17:56 -0800
        Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Jeff Higgins <jeff@invalid.invalid> - 2012-01-17 21:33 -0500
          Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Arne Vajhøj <arne@vajhoej.dk> - 2012-01-17 21:56 -0500
    Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Arne Vajhøj <arne@vajhoej.dk> - 2012-01-17 18:38 -0500
      Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Arne Vajhøj <arne@vajhoej.dk> - 2012-01-17 21:55 -0500
        Re: what is the better/performant way to compare org.w3c.dom.DocumentFragment Mausam <mausamhere@gmail.com> - 2012-01-17 21:29 -0800

#11401: what is the better/performant way to compare org.w3c.dom.DocumentFragment

From: Mausam <mausamhere@gmail.com>
Date: 2012-01-17 07:03 -0800
Subject: what is the better/performant way to compare org.w3c.dom.DocumentFragment
Message-ID: <16090858.1983.1326812606093.JavaMail.geo-discussion-forums@prig11>

I have a Java class that contains a DocumentFragment.

In the equals method of my class, I convert each DocumentFragment to a String and compare the two Strings with equals.

I know this is not the best way, because the order of attributes within an Element of the DocumentFragment can change, or two documents may differ only in the sequence of unordered elements.

In such cases this equality check will fail.

Please suggest a better approach.


#11411

From: Jeff Higgins <jeff@invalid.invalid>
Date: 2012-01-17 12:32 -0500
Message-ID: <jf4asm$g9h$1@dont-email.me>
In reply to: #11401

On 01/17/2012 10:03 AM, Mausam wrote:
> I have a java class, whose contains a DocumentFragment.
>
> In the equals method of my class, I am converting the DocumentFragment to a String and comparing an equals on the String.
>
> I know this is not the best way, because "attributes" e.g can change order in Element of DocumentFragment, or e.g documents differ only in the sequence of unordered elements.
>
> So in such cases this equality will fail.
>
> Please suggest a better approach.
A my class is equal to another my class if and only if ...


#11445

From: Mausam <mausamhere@gmail.com>
Date: 2012-01-17 17:56 -0800
Message-ID: <11466894.1272.1326851809373.JavaMail.geo-discussion-forums@prdh15>
In reply to: #11411

On Tuesday, 17 January 2012 23:02:29 UTC+5:30, Jeff Higgins  wrote:
> A my class is equal to another my class if and only if ...

Thanks Jeff, I understand what you mean. 

BTW, I was checking the API http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Node.html#isEqualNode%28org.w3c.dom.Node%29

The attributes NamedNodeMaps are equal. This is: they are both null, or they have the same length and for each node that exists in one map there is a node that exists in the other map and is equal, although not necessarily at the same index.


The childNodes NodeLists are equal. This is: they are both null, or they have the same length and contain equal nodes at the same index. Note that normalization can affect equality; to avoid this, nodes should be normalized before being compared. 

For attributes, the specification allows for "not necessarily at the same index", but for childNodes it does not. So if two fragments differ only in the sequence of unordered elements (<emp/><dept/> versus <dept/><emp/>), they will be treated as NOT equal.

So my fallback is to iterate through each node and attribute and compare them myself. Before doing that, I wanted to ask the experts whether there are better options.
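
The difference between the two rules quoted above can be checked directly. This is only a minimal sketch, assuming a Java 5+ JAXP parser; the class name IsEqualNodeDemo and the fragment() helper are made up for illustration and are not part of any API:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not taken from the thread.
class IsEqualNodeDemo {

    // Wraps the content in a dummy root, parses it, and moves the
    // parsed children into a new DocumentFragment.
    static DocumentFragment fragment(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new InputSource(new StringReader("<wrapper>" + content + "</wrapper>")));
            DocumentFragment frag = doc.createDocumentFragment();
            Node root = doc.getDocumentElement();
            while (root.getFirstChild() != null) {
                // appendChild moves the node out of root and into the fragment
                frag.appendChild(root.getFirstChild());
            }
            frag.normalize(); // merge adjacent text nodes, as the Javadoc advises
            return frag;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("content is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Same attributes in a different order
        System.out.println(fragment("<b c='1' d='2'/>")
                .isEqualNode(fragment("<b d='2' c='1'/>")));
        // Same child elements in a different order
        System.out.println(fragment("<emp/><dept/>")
                .isEqualNode(fragment("<dept/><emp/>")));
    }
}
```

With a conforming DOM Level 3 implementation, the first comparison should report the fragments as equal and the second should not, which matches the wording of the two rules.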


#11447

From: Jeff Higgins <jeff@invalid.invalid>
Date: 2012-01-17 21:33 -0500
Message-ID: <jf5ajc$arn$1@dont-email.me>
In reply to: #11445

On 01/17/2012 08:56 PM, Mausam wrote:
> Thanks Jeff, I understand what you mean.
>
> BTW, I was checking the API http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Node.html#isEqualNode%28org.w3c.dom.Node%29
>
> The attributes NamedNodeMaps are equal. This is: they are both null, or they have the same length and for each node that exists in one map there is a node that exists in the other map and is equal, although not necessarily at the same index.
>
>
> The childNodes NodeLists are equal. This is: they are both null, or they have the same length and contain equal nodes at the same index. Note that normalization can affect equality; to avoid this, nodes should be normalized before being compared.
>
> Here for attributes, they take care of "NOT necessarily at the same index" but in case of childNodes its not being taken care of. So if there is a sequence of unordered elements (<emp/><dept/>  and<dept/><emp/>  ) they will be treated as NOT equal.
>
> So either I iterate through each node and attribute and do a comparison. That's the fall back. But before that, I wanted to check the experts if there are better options.

Yep. I based my hair trigger response upon the .equals(Object) of the 
"known implementing classes" of Node. Sorry. I'll be interested in 
finding out the "cost" associated with Arne Vajhøj's response.


#11449

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-01-17 21:56 -0500
Message-ID: <4f1634eb$0$287$14726298@news.sunsite.dk>
In reply to: #11447

On 1/17/2012 9:33 PM, Jeff Higgins wrote:
> Yep. I based my hair trigger response upon the .equals(Object) of the
> "known implementing classes" of Node. Sorry. I'll be interested in
> finding out the "cost" associated with Arne Vajhøj's response.

The cost is CPU time. It costs a bit of CPU time to parse, reorganize,
and serialize again.

Arne


#11435

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-01-17 18:38 -0500
Message-ID: <4f160666$0$289$14726298@news.sunsite.dk>
In reply to: #11401

On 1/17/2012 10:03 AM, Mausam wrote:
> I have a java class, whose contains a DocumentFragment.
>
> In the equals method of my class, I am converting the DocumentFragment to a String and comparing an equals on the String.
>
> I know this is not the best way, because "attributes" e.g can change order in Element of DocumentFragment, or e.g documents differ only in the sequence of unordered elements.
>
> So in such cases this equality will fail.

I think XML Canonicalization will solve the problem.

It comes at a cost, though.

Arne


#11448

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-01-17 21:55 -0500
Message-ID: <4f1634ad$0$287$14726298@news.sunsite.dk>
In reply to: #11435

On 1/17/2012 6:38 PM, Arne Vajhøj wrote:
> I think XML Canonicalization will solve the problem.
>
> It comes as a cost though.

Example:

import java.io.IOException;
import java.io.UnsupportedEncodingException;

import javax.xml.parsers.ParserConfigurationException;

import org.apache.xml.security.Init;
import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.xml.sax.SAXException;

public class XmlComp {
    static {
        Init.init();
    }

    private static String canonicalize(String s)
            throws InvalidCanonicalizerException, UnsupportedEncodingException,
                   CanonicalizationException, ParserConfigurationException,
                   IOException, SAXException {
        Canonicalizer c14n = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        String res = new String(c14n.canonicalize(s.getBytes(Canonicalizer.ENCODING)), Canonicalizer.ENCODING);
        return res;
    }

    public static void main(String[] args) throws Exception {
        String s1 = "<a><b c='1' d='2'/></a>";
        String s2 = "<a><b d='2' c='1'/></a>";
        System.out.println(s1);
        System.out.println(s2);
        System.out.println(canonicalize(s1));
        System.out.println(canonicalize(s2));
    }
}

outputs:

<a><b c='1' d='2'/></a>
<a><b d='2' c='1'/></a>
<a><b c="1" d="2"></b></a>
<a><b c="1" d="2"></b></a>

Arne


#11452

From: Mausam <mausamhere@gmail.com>
Date: 2012-01-17 21:29 -0800
Message-ID: <17325668.425.1326864585936.JavaMail.geo-discussion-forums@yqih6>
In reply to: #11448

Thanks Arne,

I can achieve that using the Node.isEqualNode(Node) API, which is available since JDK 1.5.
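
A minimal sketch of an equals method built on isEqualNode, assuming JDK 1.5 or later. The class name FragmentHolder and its parse helper are hypothetical, since the actual class is not shown in this thread:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical stand-in for "my class"; the real class is not shown in the thread.
final class FragmentHolder {
    private final DocumentFragment fragment;

    FragmentHolder(DocumentFragment fragment) {
        this.fragment = Objects.requireNonNull(fragment);
        // Merges adjacent text nodes so they cannot affect equality.
        // Note that this modifies the fragment that was passed in.
        this.fragment.normalize();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof FragmentHolder)) {
            return false;
        }
        // Attribute order is ignored, child order is not.
        return fragment.isEqualNode(((FragmentHolder) other).fragment);
    }

    @Override
    public int hashCode() {
        // Equal fragments have child lists of equal length, so this agrees
        // with equals(). It is a weak hash, and it changes if the fragment
        // is mutated after construction.
        return fragment.getChildNodes().getLength();
    }

    // Helper for experiments: wraps the content in a dummy root, parses it,
    // and moves the parsed children into a fragment.
    static FragmentHolder parse(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new InputSource(new StringReader("<wrapper>" + content + "</wrapper>")));
            DocumentFragment frag = doc.createDocumentFragment();
            Node root = doc.getDocumentElement();
            while (root.getFirstChild() != null) {
                frag.appendChild(root.getFirstChild()); // moves the node
            }
            return new FragmentHolder(frag);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("content is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<b c='1' d='2'/>").equals(parse("<b d='2' c='1'/>")));
        System.out.println(parse("<emp/><dept/>").equals(parse("<dept/><emp/>")));
    }
}
```

Because a DocumentFragment is mutable, instances of such a class are probably best not used as keys in hash-based collections unless the fragment is left untouched afterwards.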

I am worried about the following use cases (and wondering whether they are even valid use cases):

1)
Are these two Nodes equal? Note that one has an empty street element and the other has no street element. That implies that the value of street is empty in both cases, so if each were treated as an Employee object in Java, both would be equal.
<Employee company="example" xmlns="http://example.com" debug="true">
        <Employeename>mausam</Employeename>
        <email>a @example.com</email>
        <street/>
    </Employee>

<Employee  debug="true" company="example" xmlns="http://example.com">
        <Employeename>mausam</Employeename>
        <email>a @example.com</email>
    </Employee>

2)
Check the position of the street element: in node 1 it comes after email, and in node 2 it comes before.
<Employee company="example" xmlns="http://example.com" debug="true">
        <Employeename>mausam</Employeename>
        <email>a @example.com</email>
        <street>Marienplatz</street>
    </Employee>

<Employee  debug="true" company="example" xmlns="http://example.com">
        <Employeename>mausam</Employeename>
        <street>Marienplatz</street>
        <email>a @example.com</email>
    </Employee>

--

Please note that I cannot create Java objects from the XML, as these are free-form XML fragments that do not conform to a schema. But thanks a lot for your effort and the code example.
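
Neither isEqualNode nor canonicalization treats the pairs in use cases 1 and 2 as equal, so cases like these would need an application-specific comparison. The following is only a sketch of one possible approach, not an established API: it builds an order-independent signature for each subtree and compares the signatures. The class LooseXmlCompare and all of its methods are hypothetical, and it rests on two loudly stated assumptions: sibling order carries no meaning, and an empty element without attributes means the same as a missing one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; none of these names come from the thread.
class LooseXmlCompare {

    // Namespace-qualified name; falls back to getNodeName() for DOM Level 1 nodes.
    private static String qname(Node n) {
        String local = n.getLocalName() != null ? n.getLocalName() : n.getNodeName();
        return "{" + n.getNamespaceURI() + "}" + local;
    }

    // Length-prefixed so that neighbouring values cannot run together ambiguously.
    private static String quote(String s) {
        return s.length() + ":" + s;
    }

    // Order-independent signature of a subtree; "" means "treated as absent".
    static String signature(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            return text.isEmpty() ? "" : "T" + quote(text);
        }
        if (type != Node.ELEMENT_NODE && type != Node.DOCUMENT_FRAGMENT_NODE
                && type != Node.DOCUMENT_NODE) {
            return ""; // comments, processing instructions, etc. are ignored
        }
        List<String> attrs = new ArrayList<>();
        NamedNodeMap map = node.getAttributes(); // null unless node is an Element
        for (int i = 0; map != null && i < map.getLength(); i++) {
            Node a = map.item(i);
            String name = a.getNodeName();
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                continue; // namespace declarations are already covered by qname()
            }
            attrs.add(qname(a) + "=" + quote(a.getNodeValue()));
        }
        List<String> kids = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            String s = signature(c);
            if (!s.isEmpty()) {
                kids.add(s);
            }
        }
        if (type == Node.ELEMENT_NODE && attrs.isEmpty() && kids.isEmpty()) {
            return ""; // assumption: <street/> means the same as no street at all
        }
        Collections.sort(attrs); // attribute order is never significant
        Collections.sort(kids);  // assumption: sibling order is not significant
        return "E" + qname(node) + attrs + kids;
    }

    static boolean looselyEqual(Node a, Node b) {
        return signature(a).equals(signature(b));
    }

    // Convenience for trying the comparison out on strings.
    static Node parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Node a = parse("<e xmlns='http://example.com' x='1' y='2'><n>m</n><s/></e>");
        Node b = parse("<e y='2' xmlns='http://example.com' x='1'><n>m</n></e>");
        System.out.println(looselyEqual(a, b));
    }
}
```

Whether ignoring order and empty elements is correct depends entirely on what the fragments mean. For mixed content, or for data where sequence matters, this comparison would report false matches, and building and sorting the signatures costs time on every call.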
