| Started by | lbrt chx _ gemale |
|---|---|
| First post | 2012-06-27 03:50 +0000 |
| Last post | 2012-06-27 11:32 +0200 |
| Articles | 4 — 3 participants |
SAX parser splits URL ... lbrt chx _ gemale - 2012-06-27 03:50 +0000
Re: SAX parser splits URL ... Robert Klemme <shortcutter@googlemail.com> - 2012-06-27 07:34 +0200
Re: SAX parser splits URL ... Robert Klemme <shortcutter@googlemail.com> - 2012-06-26 23:21 -0700
Re: SAX parser splits URL ... "mayeul.marguet" <mayeul.marguet@free.fr> - 2012-06-27 11:32 +0200
| From | lbrt chx _ gemale |
|---|---|
| Date | 2012-06-27 03:50 +0000 |
| Subject | SAX parser splits URL ... |
| Message-ID | <1340769034.526896@nntp.aceinnovative.com> |
I have an URL in an XML file that looks like this:
~
 ...
 <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
 ...
~
 http://xsdvalidation.utilities-online.info/
~
 is telling me the document itself is valid, but the SAX parser is splitting the value at every "&"
~
 // __ start element iIxLvl: |3|Location
 // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
 // __ start characters iIxLvl: |3|&|
 // __ start characters iIxLvl: |3|flrdr=yes|
 // __ start characters iIxLvl: |3|&|
 // __ start characters iIxLvl: |3|nxte=zip|
 // __ end element iIxLvl: |2|Location|
~
 I found some sort of an explanation here:
~
 http://stackoverflow.com/questions/1328538/how-do-i-escape-ampersands-in-xml
~
 I couldn't make much sense of (I tried a few things)
~
 Is this related to a setting in the parser? Is there a way to fix that problem?
~
 thanks
 lbrtchx
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-06-27 07:34 +0200 |
| Message-ID | <a4vkb1F60fU1@mid.individual.net> |
| In reply to | #15652 |
On 27.06.2012 05:50, lbrt chx _ gemale wrote:
> I have an URL in an XML file that looks like this:
> ~
> ...
> <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
> ...
> ~
> http://xsdvalidation.utilities-online.info/
> ~
> is telling me the document itself is valid, but the SAX parser is
> splitting the value at every "&"
> ~
> // __ start element iIxLvl: |3|Location
> // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
> // __ start characters iIxLvl: |3|&|
> // __ start characters iIxLvl: |3|flrdr=yes|
> // __ start characters iIxLvl: |3|&|
> // __ start characters iIxLvl: |3|nxte=zip|
> // __ end element iIxLvl: |2|Location|
> ~
> I found some sort of an explanation here:
> ~
> http://stackoverflow.com/questions/1328538/how-do-i-escape-ampersands-in-xml
> ~
> I couldn't make much sense of (I tried a few things)
> ~
> Is this related to a setting in the parser? Is there a way to fix that problem?

That's not related to the parser - at least not to a particular one. It is a feature of XML which allows you to include characters in the document which are not supported by the native encoding you use when writing the document. The concept is known as "XML entity". Please see

http://www.tizag.com/xmlTutorial/xmlentity.php
http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238

The standard:
http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-references

Bottom line, you can do

<Location>http://pagesinxt.com/?dn=www.outfo.org&amp;flrdr=yes&amp;nxte=zip</Location>

But please read up on XML more thoroughly - it pays off.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
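The escaping advice in this reply can be sketched in Java. This is a minimal illustration, not code from the thread: the class name `EscapedAmpersandDemo` and the helpers `escapeText` and `readLocation` are made up for the example, and a DOM parser is used only because it makes the round-trip check short. The idea is that every literal `&` in the URL is written as `&amp;` in the file, and the parser hands back the plain `&` again.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are assumptions made for this sketch.
class EscapedAmpersandDemo {

    // Escape the characters that are not allowed literally in element text.
    // "&" must be replaced first so the other replacements are not re-escaped.
    static String escapeText(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    // Parse a small document and return the text of its root element.
    static String readLocation(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String url = "http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip";
        String xml = "<Location>" + escapeText(url) + "</Location>";
        System.out.println(xml);               // the document as it should be stored
        System.out.println(readLocation(xml)); // the value the application sees
    }
}
```

The escaped form exists only in the serialized document; application code on either side of the parser keeps working with the unescaped URL.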
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-06-26 23:21 -0700 |
| Message-ID | <bbe6a483-b3b6-4dee-bdd8-66857e8ffb9f@googlegroups.com> |
| In reply to | #15654 |
On Wednesday, June 27, 2012 7:34:18 AM UTC+2, Robert Klemme wrote:
> On 27.06.2012 05:50, lbrt chx _ gemale wrote:
> > I have an URL in an XML file that looks like this:
> > ~
> > ...
> > <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
> > ...
> > ~
> > http://xsdvalidation.utilities-online.info/
> > ~
> > is telling me the document itself is valid, but the SAX parser is
> > splitting the value at every "&"
> > ~
> > // __ start element iIxLvl: |3|Location
> > // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
> > // __ start characters iIxLvl: |3|&|
> > // __ start characters iIxLvl: |3|flrdr=yes|
> > // __ start characters iIxLvl: |3|&|
> > // __ start characters iIxLvl: |3|nxte=zip|
> > // __ end element iIxLvl: |2|Location|

I forgot to mention one thing: the SAX parser is quite free to hand over character sequences in any number of chunks as long as it maintains original order from the document and ensures all characters come from the same external entity. See:

http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters%28char[],%20int,%20int%29

Kind regards

robert
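Because `characters()` may be called several times for one element, a handler usually collects the chunks and only uses the text once the element ends. The following is a rough sketch of that pattern, assuming the element of interest is named `Location` as in the original post; the class `LocationHandler` and its `parse` helper are illustrative names, not part of SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; the names are assumptions made for this sketch.
class LocationHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> locations = new ArrayList<>();
    private boolean inLocation;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Location".equals(qName)) {
            inLocation = true;
            buffer.setLength(0); // start a fresh value for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called many times per element; append, never overwrite.
        if (inLocation) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Location".equals(qName)) {
            inLocation = false;
            locations.add(buffer.toString()); // the complete text is known only here
        }
    }

    List<String> getLocations() {
        return locations;
    }

    // Convenience helper: parse a document held in a string.
    static List<String> parse(String xml) throws Exception {
        LocationHandler handler = new LocationHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLocations();
    }
}
```

With this handler it does not matter whether the parser reports the URL in one chunk or in five, since the value is assembled in `endElement`. The sketch relies on the default, non-namespace-aware factory, where `qName` carries the element name.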
| From | "mayeul.marguet" <mayeul.marguet@free.fr> |
|---|---|
| Date | 2012-06-27 11:32 +0200 |
| Message-ID | <4fead08e$0$6123$426a74cc@news.free.fr> |
| In reply to | #15652 |
On 27/06/2012 05:50, lbrt chx _ gemale wrote:
> I have an URL in an XML file that looks like this:
> ~
> ...
> <Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>
> ...
> ~
> http://xsdvalidation.utilities-online.info/
> ~
> is telling me the document itself is valid, but the SAX parser is splitting the value at every "&"

Really? It is telling /me/:

> The reference to entity "flrdr" must end with the ';' delimiter.

and rightly so.

It's implying, as can be checked too, that the document is not even well-formed. Says nothing about validity, which cannot be checked unless the document is well-formed.

--
Mayeul
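The well-formedness point can be checked locally with the JDK's own parser. This is a small sketch under the assumption that the document is available as a string; `WellFormedCheck` and `firstError` are names invented for the example, and the exact wording of the error message depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are assumptions made for this sketch.
class WellFormedCheck {

    // Returns null if the document parses, otherwise a description of the first error.
    static String firstError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Fatal errors such as an unterminated entity reference end up here.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String raw = "<Location>http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip</Location>";
        String escaped = raw.replace("&", "&amp;");
        System.out.println("raw:     " + firstError(raw));
        System.out.println("escaped: " + firstError(escaped));
    }
}
```

A non-validating parse like this only answers whether the document is well-formed; checking validity against a schema is a separate step that presupposes it.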