From: Robert Klemme
Newsgroups: comp.lang.java.programmer
Subject: Re: SAX parser splits URL ...
Date: Tue, 26 Jun 2012 23:21:46 -0700 (PDT)

On Wednesday, June 27, 2012 7:34:18 AM UTC+2, Robert Klemme wrote:
> On 27.06.2012 05:50, lbrt chx _ gemale wrote:
> > I have an URL in an XML file that looks like this:
> > ~
> > ...
> > http://pagesinxt.com/?dn=www.outfo.org&flrdr=yes&nxte=zip
> > ...
> > ~
> > http://xsdvalidation.utilities-online.info/
> > ~
> > is telling me the document itself is valid, but the SAX parser is
> > splitting the value at every "&"
> > ~
> > // __ start element iIxLvl: |3|Location
> > // __ start characters iIxLvl: |3|http://pagesinxt.com/?dn=www.outfo.org|
> > // __ start characters iIxLvl: |3|&|
> > // __ start characters iIxLvl: |3|flrdr=yes|
> > // __ start characters iIxLvl: |3|&|
> > // __ start characters iIxLvl: |3|nxte=zip|
> > // __ end element iIxLvl: |2|Location|

I forgot to mention one thing: a SAX parser is free to hand over the character data of an element in any number of chunks, as long as it preserves the original document order and all characters in any single characters() event come from the same external entity. A handler therefore cannot treat one characters() call as the complete text of an element; it has to collect the chunks itself and use the result only when endElement() arrives.
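To illustrate the point, here is a minimal sketch of a handler that buffers the chunks and only reads the text at the end of the element. The class name LocationCollector, the parse() helper and the surrounding <Doc> element are made up for this example; only the Location element and the URL come from the original post, and I am assuming the ampersands are written as &amp;amp; in the file, since the document validates.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <Location> element.
public class LocationCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> locations = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start with an empty buffer for each element.
        // (Good enough for leaf elements; mixed content would need a stack.)
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append, never assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        // The default factory is not namespace aware, so compare qName.
        if ("Location".equals(qName)) {
            locations.add(text.toString());
        }
        text.setLength(0);
    }

    public List<String> getLocations() {
        return locations;
    }

    // Hypothetical convenience method: parse an XML string, return the collected values.
    public static List<String> parse(String xml) throws Exception {
        LocationCollector handler = new LocationCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLocations();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample document; the ampersands are escaped as &amp; in the XML source.
        String xml = "<Doc><Location>http://pagesinxt.com/?dn=www.outfo.org"
                + "&amp;flrdr=yes&amp;nxte=zip</Location></Doc>";
        System.out.println(parse(xml));
    }
}
```

However many characters() calls the parser decides to make, the buffer holds the whole URL, with plain "&" characters, by the time endElement() is called.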
See:
http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters%28char[],%20int,%20int%29

Kind regards

robert