Newsgroups: comp.lang.java.programmer

Problem w/ DocumentBuilder parse method

Started by: "John L." <johnlarew@sbcglobal.net>
First post: 2012-12-30 11:30 -0800
Last post: 2013-01-01 02:17 +0200
Articles: 4 (4 participants)



Contents

  Problem w/ DocumentBuilder parse method "John L." <johnlarew@sbcglobal.net> - 2012-12-30 11:30 -0800
    Re: Problem w/ DocumentBuilder parse method Arne Vajhøj <arne@vajhoej.dk> - 2012-12-30 14:46 -0500
    Re: Problem w/ DocumentBuilder parse method Roedy Green <see_website@mindprod.com.invalid> - 2012-12-31 14:09 -0800
    Re: Problem w/ DocumentBuilder parse method Stanimir Stamenkov <s7an10@netscape.net> - 2013-01-01 02:17 +0200

#20823 — Problem w/ DocumentBuilder parse method

From: "John L." <johnlarew@sbcglobal.net>
Date: 2012-12-30 11:30 -0800
Subject: Problem w/ DocumentBuilder parse method
Message-ID: <0a82c44b-eec4-4ea8-92e2-af61192eee1a@googlegroups.com>

I'm pre-processing a file in an attempt to use the subject method, and receive the following error:

[Fatal Error] EXTRACT.TMP:51:23: The entity "nbsp" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
	at Extract.CmdLine(Extract.java:144)
	at Extract.main(Extract.java:79)

The pertinent portion of the file being parsed follows:

[45]<div>
[46]<input type="hidden" name="cx" value="partner-pub-5436175752152469:m8vqbgi2n
21" />
[47]<input type="hidden" name="cof" value="FORID:10" />
[48]<input type="hidden" name="ie" value="ISO-8859-1" />
[49]<input type="text" name="q" size="55" />
[50]<input type="submit" name="sa" value="PCM Search" />
[51]                &nbsp; &nbsp; &nbsp; &nbsp; </div >

What is the required declaration syntax for &nbsp; to allow the file to be parsed?

Thanks in advance for your time and consideration.



#20824

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-12-30 14:46 -0500
Message-ID: <50e09a22$0$292$14726298@news.sunsite.dk>
In reply to: #20823

On 12/30/2012 2:30 PM, John L. wrote:
> I'm pre-processing a file in an attempt to use the subject method, and receive the following error:
>
> [Fatal Error] EXTRACT.TMP:51:23: The entity "nbsp" was referenced, but not declared.
> [...]
> What is the required declaration syntax for &nbsp; to allow the file to be parsed?

Entities should be defined in the DTD.

The above looks like XHTML, so maybe it will work if you add a proper
DOCTYPE at the top (I think the XHTML DTD defines nbsp).

Arne



#20835

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-12-31 14:09 -0800
Message-ID: <re04e8t6l4jql0blru3m0skbk3ct07kchf@4ax.com>
In reply to: #20823

On Sun, 30 Dec 2012 11:30:24 -0800 (PST), "John L."
<johnlarew@sbcglobal.net> wrote, quoted or indirectly quoted someone
who said :

>The entity "nbsp" was referenced, but not declared.

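XML predefines only five entities (&lt;, &gt;, &amp;, &apos; and
&quot;), and &nbsp; is not one of them.  You are expected to write the
character itself (for example in a UTF-8 encoded file), use a numeric
character reference such as &#160;, or formally declare the meaning of
your entities in a DTD.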
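
To illustrate the point, here is a minimal sketch, assuming the JDK's default DocumentBuilder. The class and method names (NumericReferenceDemo, rootText) are made up for this example. It parses three small fragments: one using the undeclared &nbsp;, one using the numeric reference &#160; for the same character, and one using the predefined &amp;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original poster's Extract program.
final class NumericReferenceDemo {

    /** Returns the root element's text, or null if the parser rejects the input. */
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // e.g. an entity that was referenced but not declared
        }
    }

    public static void main(String[] args) throws Exception {
        // No DOCTYPE anywhere, so only predefined entities and numeric references are usable.
        System.out.println("&nbsp;  parsed: " + (rootText("<div>a&nbsp;b</div>") != null));
        System.out.println("&#160; parsed: " + (rootText("<div>a&#160;b</div>") != null));
        System.out.println("&amp;   parsed: " + (rootText("<div>a&amp;b</div>") != null));
    }
}
```

Replacing &nbsp; with &#160; during pre-processing is therefore one way to avoid needing any declaration at all.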

see http://mindprod.com/jgloss/xml.html#AWKWARD
-- 
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish 
as couch potatoes who hire others to go to the gym for them. 



#20839

From: Stanimir Stamenkov <s7an10@netscape.net>
Date: 2013-01-01 02:17 +0200
Message-ID: <kbt9un$gll$1@dont-email.me>
In reply to: #20823

Sun, 30 Dec 2012 11:30:24 -0800 (PST), /John L./:

> I'm pre-processing a file in an attempt to use the subject method, and receive the following error:
>
> [Fatal Error] EXTRACT.TMP:51:23: The entity "nbsp" was referenced, but not declared.
> [...]
> What is the required declaration syntax for &nbsp; to allow the file to be parsed?

As Arne Vajhøj points out in another reply, there should be an XHTML
DOCTYPE declaration at the beginning of the document.  Browsers
usually have no problem processing XHTML that contains entity
references from the XHTML DTD, even without a DOCTYPE declaration,
because either:

1. The document is served as text/html, in which case it is not
processed as XML at all; or

2. Browsers keep a local copy of the XHTML DTD and associate it
automatically based on the content type application/xhtml+xml, or on
xmlns="http://www.w3.org/1999/xhtml" on the root html element.

If the document you're trying to parse is under your control, you could:

1. Add the XHTML DOCTYPE declaration manually:

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    or even:

<!DOCTYPE html
     SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    You may still want to supply an EntityResolver [1] to serve this
DTD from a local resource, rather than having the parser fetch it over
the network on each parse;

2. Add a DOCTYPE with a local subset containing just the necessary 
entity declarations, like:

<!DOCTYPE html [
   <!ENTITY nbsp "&#160;">
]>
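
Both options can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are invented, the documents are inline strings, and LOCAL_DTD_STAND_IN is a one-line placeholder for a locally stored copy of xhtml1-strict.dtd (a real setup would open the bundled DTD file or classpath resource instead).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class illustrating the two DOCTYPE options.
final class DoctypeNbspDemo {

    static final String XHTML_STRICT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";

    // Assumption: stands in for a local copy of the real XHTML DTD.
    static final String LOCAL_DTD_STAND_IN = "<!ENTITY nbsp \"&#160;\">";

    /** Option 1: XHTML DOCTYPE, with the DTD served locally by an EntityResolver. */
    static String parseWithLocalDtd() throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"" + XHTML_STRICT_PUBLIC_ID + "\"\n"
                + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html><body><div>a&nbsp;b</div></body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Returning null for anything else leaves the parser's default resolution in place.
        builder.setEntityResolver((publicId, systemId) ->
                XHTML_STRICT_PUBLIC_ID.equals(publicId)
                        ? new InputSource(new StringReader(LOCAL_DTD_STAND_IN))
                        : null);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    /** Option 2: the document declares nbsp itself in an internal subset. */
    static String parseWithInternalSubset() throws Exception {
        String xml = "<!DOCTYPE html [\n"
                + "  <!ENTITY nbsp \"&#160;\">\n"
                + "]>\n"
                + "<html><body><div>a&nbsp;b</div></body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Both calls should yield "a", a no-break space (U+00A0), then "b".
        System.out.println(parseWithLocalDtd().equals("a\u00A0b"));
        System.out.println(parseWithInternalSubset().equals("a\u00A0b"));
    }
}
```

In both cases the entity reference is expanded during parsing, so the resulting DOM text contains the no-break space character itself.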

If you're parsing documents which don't have a DOCTYPE declaration and
are not under your control, you may supply an EntityResolver2
implementation; that interface defines an additional method,
getExternalSubset, for just that purpose:

http://docs.oracle.com/javase/6/docs/api/org/xml/sax/ext/EntityResolver2.html#getExternalSubset%28java.lang.String,%20java.lang.String%29
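
A minimal sketch of such an implementation follows. The class name NbspExternalSubset and the one-line SUBSET are invented for illustration. Whether a particular parser actually consults getExternalSubset depends on the implementation and its use-entity-resolver2 feature, so the main method only reports what happens rather than assuming success; it is worth checking on your own JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.EntityResolver2;

// Hypothetical resolver: offers an external subset to documents lacking a DOCTYPE.
final class NbspExternalSubset implements EntityResolver2 {

    // Assumption: a minimal subset; a fuller one could declare all XHTML entities.
    private static final String SUBSET = "<!ENTITY nbsp \"&#160;\">";

    /** name is the root element name; only html documents are given the subset. */
    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return "html".equals(name) ? new InputSource(new StringReader(SUBSET)) : null;
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return null; // null: let the parser resolve external entities as usual
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><div>a&nbsp;b</div></body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new NbspExternalSubset());
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            System.out.println("Parsed: " + doc.getDocumentElement().getTextContent());
        } catch (SAXException e) {
            // Reached if this parser does not consult getExternalSubset.
            System.out.println("Parse failed: " + e.getMessage());
        }
    }
}
```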

[1] 
http://docs.oracle.com/javase/6/docs/api/javax/xml/parsers/DocumentBuilder.html#setEntityResolver%28org.xml.sax.EntityResolver%29

-- 
Stanimir




