| Newsgroups | comp.lang.java.help |
|---|---|
| Date | 2015-09-30 01:16 -0700 |
| References | <1153739499.868634.182090@i3g2000cwc.googlegroups.com> <1153744628.289088.11060@s13g2000cwa.googlegroups.com> |
| Message-ID | <cdab5f35-f93d-4bb7-8557-76b6cfaec895@googlegroups.com> |
| Subject | Re: hi i need a bit help |
| From | ofernicus@gmail.com |
On Monday, July 24, 2006 at 3:37:08 PM UTC+3, Andrew Thompson wrote:
> vk wrote:
> > I would like to be able to read (parse) an html file into my Java
> > program. Once I'm able to do this, I need to be able to analyse the
> > html code.
>
> <sscce>
> import javax.xml.parsers.*;
> import org.w3c.dom.*;
> import javax.swing.*;
> import java.net.*;
> import java.util.*;
>
> public class ParseHTML extends JApplet {
>     JTree tree;
>
>     public void init() {
>         Vector v = new Vector();
>         URL index = getDocumentBase();
>         try {
>             Document doc = DocumentBuilderFactory.
>                 newInstance().
>                 newDocumentBuilder().
>                 parse((index.toURI()).
>                     toString());
>             tree = new JTree();
>             Element root = doc.getDocumentElement();
>             NodeList children = root.getChildNodes();
>             processElements( children, v );
>         } catch(Exception e) {
>             v.add(e.getMessage());
>         }
>         tree = new JTree(v);
>         for (int ii=0; ii< tree.getRowCount(); ii++) {
>             tree.expandRow(ii);
>         }
>         getContentPane().add( new JScrollPane(tree) );
>     }
>
>     public void processElements(
>         NodeList list,
>         Vector v) {
>
>         for (int ii=0; ii< list.getLength(); ii++) {
>             v.add( list.item(ii).toString() );
>             if ( list.item(ii) instanceof Element ) {
>                 Element e = (Element)list.item(ii);
>                 NodeList children = e.getChildNodes();
>                 Vector v1 = new Vector();
>                 v.add( v1 );
>                 processElements( children, v1 );
>             }
>         }
>     }
> }
> </sscce>
>
> <**html>
> <!DOCTYPE HTML>
> <HTML>
> <HEAD>
> <title>Parse HTML</title>
> </HEAD>
> <BODY>
> <h1>Example of parsing (valid) HTML</h1>
> <p>The applet in this web page loads the web page and attempts to
> parse it into a org.w3c.dom.Document object.</p>
> <p>The documents parsed must be well formed, which is
> uncommon for most web pages.</p>
> <APPLET
> CODE="ParseHTML.class"
> CODEBASE="."
> WIDTH="600" HEIGHT="600">
> </APPLET>
> </BODY>
> </HTML>
> </**html>
>
> HTH
>
> Andrew T.
I didn't end up using this: our (big and ugly) HTML was not well-formed enough, and it was almost impossible to fix it up to work with your suggestion. Still, this is the best (and ONLY) solution I found for this issue, and it is a rather brilliant one. Well done and thanks!
Ofer
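
The well-formedness limitation discussed above can be seen without an applet. What follows is only a minimal standalone sketch, not part of either post: the class and method names (`WellFormedCheck`, `isWellFormed`, `elementNames`) and the two sample strings are made up for illustration. It uses the same `javax.xml.parsers` API as the quoted code, so it only accepts markup that is also well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class WellFormedCheck {

    // Parses the markup with the JDK's XML parser; throws SAXException
    // if the markup is not well-formed XML.
    private static Document parse(String markup) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors but does not print them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    // Returns true only if the XML parser accepts the markup.
    public static boolean isWellFormed(String markup) throws Exception {
        try {
            parse(markup);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Returns the element names in document order, root first.
    public static List<String> elementNames(String markup) throws Exception {
        List<String> names = new ArrayList<>();
        collect(parse(markup).getDocumentElement(), names);
        return names;
    }

    // Depth-first walk, the same idea as processElements in the quoted applet.
    private static void collect(Node node, List<String> out) {
        if (node instanceof Element) {
            out.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample inputs are assumptions, not taken from the thread.
        String wellFormed = "<html><body><p>one</p><p>two</p></body></html>";
        String tagSoup = "<html><body><p>one<br><p>two</body></html>";

        System.out.println(elementNames(wellFormed)); // [html, body, p, p]
        System.out.println(isWellFormed(tagSoup));    // false
    }
}
```

The second input fails because `<br>` and `<p>` are never closed, which is typical of real-world pages. Markup like that would need cleaning up first, or a lenient HTML parser instead of an XML one.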