Groups > comp.lang.java.programmer > #15652 > unrolled thread

SAX parser splits URL ...

Started by	lbrt chx _ gemale
First post	2012-06-27 03:50 +0000
Last post	2012-06-27 11:32 +0200
Articles	4 — 3 participants


  SAX parser splits URL ... lbrt chx _ gemale - 2012-06-27 03:50 +0000
    Re: SAX parser splits URL ... Robert Klemme <shortcutter@googlemail.com> - 2012-06-27 07:34 +0200
      Re: SAX parser splits URL ... Robert Klemme <shortcutter@googlemail.com> - 2012-06-26 23:21 -0700
    Re: SAX parser splits URL ... "mayeul.marguet" <mayeul.marguet@free.fr> - 2012-06-27 11:32 +0200

#15652 — SAX parser splits URL ...

From	lbrt chx _ gemale
Date	2012-06-27 03:50 +0000
Subject	SAX parser splits URL ...
Message-ID	<1340769034.526896@nntp.aceinnovative.com>

 I have a URL in an XML file that looks like this:
~ 
...
  <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
...
~ 
 http://xsdvalidation.utilities-online.info/
~ 
 is telling me the document itself is valid, but the SAX parser is splitting the value at every "&"
~ 
// __ start element iIxLvl: |3|Location
// __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
// __ start characters iIxLvl: |3|&|
// __ start characters iIxLvl: |3|flrdr=yes|
// __ start characters iIxLvl: |3|&|
// __ start characters iIxLvl: |3|nxte=zip|
// __ end element   iIxLvl: |2|Location|
~
 I found some sort of an explanation here:
~
 http://stackoverflow.com/questions/1328538/how-do-i-escape-ampersands-in-xml
~
 which I couldn't make much sense of (I tried a few things).
~
 Is this related to a setting in the parser? Is there a way to fix that problem?
~
 thanks
 lbrtchx

#15654

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-06-27 07:34 +0200
Message-ID	<a4vkb1F60fU1@mid.individual.net>
In reply to	#15652

On 27.06.2012 05:50, lbrt chx _ gemale wrote:
>   I have an URL in an XML file that looks like this:
> ~
> ...
>    <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
> ...
> ~
>   http://xsdvalidation.utilities-online.info/
> ~
> is telling me the document itself is valid, but the SAX parser is
> splitting the value at every "&"
> ~
> // __ start element iIxLvl: |3|Location
> // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
> // __ start characters iIxLvl: |3|&|
> // __ start characters iIxLvl: |3|flrdr=yes|
> // __ start characters iIxLvl: |3|&|
> // __ start characters iIxLvl: |3|nxte=zip|
> // __ end element   iIxLvl: |2|Location|
> ~
>   I found some sort of an explanation here:
> ~
>   http://stackoverflow.com/questions/1328538/how-do-i-escape-ampersands-in-xml
> ~
>   I couldn't make much sense of (I tried a few things)
> ~
>   Is this related to a setting in the parser? Is there a way to fix that problem?

That's not related to the parser - at least not to a particular one.  In
XML, "&" starts an entity or character reference, a feature which lets
you include characters that have special meaning in markup or that are
not supported by the encoding you use when writing the document.  A
literal "&" therefore has to be written as "&amp;".

The concept is known as "XML entity".  Please see
http://www.tizag.com/xmlTutorial/xmlentity.php
http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238

The standard
http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references

Bottom line, you can do

<Location>http://pagesinxt.com/?dn=www.outfo.org&amp;flrdr=yes&amp;nxte=zip</Location>

But please read up on XML more thoroughly - it pays off.
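
If the XML is generated by a program rather than typed by hand, the
escaping can be left to the writer. A minimal sketch, assuming the JDK's
StAX XMLStreamWriter is available; the class name LocationWriter and the
method toLocationElement are made up for illustration:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of the original thread.
class LocationWriter {
    // Wraps the raw URL in a <Location> element. writeCharacters()
    // escapes "&" as "&amp;", so the result is well-formed XML.
    static String toLocationElement(String url) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("Location");
        writer.writeCharacters(url);
        writer.writeEndElement();
        writer.close();
        return out.toString();
    }
}
```

Calling LocationWriter.toLocationElement with the URL from the original
post should yield the same escaped line as shown above.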

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


#15655

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-06-26 23:21 -0700
Message-ID	<bbe6a483-b3b6-4dee-bdd8-66857e8ffb9f@googlegroups.com>
In reply to	#15654

On Wednesday, June 27, 2012 7:34:18 AM UTC+2, Robert Klemme wrote:
> On 27.06.2012 05:50, lbrt chx _ gemale wrote:
> >   I have an URL in an XML file that looks like this:
> > ~
> > ...
> >    <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
> > ...
> > ~
> >   http://xsdvalidation.utilities-online.info/
> > ~
> > is telling me the document itself is valid, but the SAX parser is
> > splitting the value at every "&"
> > ~
> > // __ start element iIxLvl: |3|Location
> > // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
> > // __ start characters iIxLvl: |3|&|
> > // __ start characters iIxLvl: |3|flrdr=yes|
> > // __ start characters iIxLvl: |3|&|
> > // __ start characters iIxLvl: |3|nxte=zip|
> > // __ end element   iIxLvl: |2|Location|

I forgot to mention one thing: the SAX parser is free to hand over character data in any number of chunks, as long as it preserves the original document order and all characters in a single characters() call come from the same external entity.  Several characters() events for one element are therefore normal.  See:

http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters%28char[],%20int,%20int%29
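
Because of that, a handler usually collects the chunks in a buffer and
only reads the complete value in endElement(). A minimal sketch, assuming
a document whose text of interest sits in a <Location> element without
child elements; the class name LocationHandler and the method
parseLocation are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not the original poster's code.
class LocationHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String location;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        text.setLength(0);               // start a fresh buffer per element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);  // may be called several times
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Location".equals(qName)) {
            location = text.toString();  // all chunks joined
        }
        text.setLength(0);
    }

    String getLocation() {
        return location;
    }

    // Parses an XML string and returns the text of the last <Location>.
    static String parseLocation(String xml) throws Exception {
        LocationHandler handler = new LocationHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLocation();
    }
}
```

With the ampersands escaped as "&amp;" in the document, parseLocation()
returns the URL in one piece, with plain "&" characters, regardless of
how many chunks the parser delivered.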

Kind regards

robert


#15656

From	"mayeul.marguet" <mayeul.marguet@free.fr>
Date	2012-06-27 11:32 +0200
Message-ID	<4fead08e$0$6123$426a74cc@news.free.fr>
In reply to	#15652

On 27/06/2012 05:50, lbrt chx _ gemale wrote:
>   I have an URL in an XML file that looks like this:
> ~
> ...
>    <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
> ...
> ~
>   http://xsdvalidation.utilities-online.info/
> ~
>   is telling me the document itself is valid, but the SAX parser is splitting the value at every "&"

Really? It is telling /me/:

 > The reference to entity "flrdr" must end with the ';' delimiter.

and rightly so.
That implies, as can also be checked, that the document is not even
well-formed. It says nothing about validity, which cannot be checked
unless the document is well-formed.
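
Well-formedness can be checked locally with any non-validating parser. A
minimal sketch, assuming the JDK's default SAX parser; the class name
WellFormedCheck and the method isWellFormed are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original thread.
class WellFormedCheck {
    // Returns true if the parser reads the whole document without a
    // fatal error. DefaultHandler rethrows fatal errors as
    // SAXParseException, which is what a well-formedness violation
    // such as a bare "&" triggers.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                       new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }
}
```

The <Location> line as posted, with bare "&", should be rejected by this
check, while the variant with "&amp;" should pass.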

--
Mayeul
