


Re: hi i need a bit help

Newsgroups comp.lang.java.help
Date 2015-09-30 01:16 -0700
References <1153739499.868634.182090@i3g2000cwc.googlegroups.com> <1153744628.289088.11060@s13g2000cwa.googlegroups.com>
Message-ID <cdab5f35-f93d-4bb7-8557-76b6cfaec895@googlegroups.com>
Subject Re: hi i need a bit help
From ofernicus@gmail.com



On Monday, July 24, 2006 at 3:37:08 PM UTC+3, Andrew Thompson wrote:
> vk wrote:
> > I would like to be able to read (parse) an html file into my Java
> > program. Once I'm able to do this, I need to be able to analyse the
> > html code.
> 
> <sscce>
> import javax.xml.parsers.*;
> import org.w3c.dom.*;
> import javax.swing.*;
> import java.net.*;
> import java.util.*;
> 
> public class ParseHTML extends JApplet {
>    JTree tree;
> 
>    public void init() {
>       Vector v = new Vector();
>       URL index = getDocumentBase();
>       try {
>          Document doc = DocumentBuilderFactory.
>             newInstance().
>             newDocumentBuilder().
>             parse((index.toURI()).
>             toString());
>          Element root = doc.getDocumentElement();
>          NodeList children = root.getChildNodes();
>          processElements( children, v );
>       } catch(Exception e) {
>          v.add(e.getMessage());
>       }
>       tree = new JTree(v);
>       for (int ii=0; ii< tree.getRowCount(); ii++) {
>          tree.expandRow(ii);
>       }
>       getContentPane().add( new JScrollPane(tree) );
>    }
> 
>    public void processElements(
>       NodeList list,
>       Vector v) {
> 
>       for (int ii=0; ii< list.getLength(); ii++) {
>          v.add( list.item(ii).toString() );
>          if ( list.item(ii) instanceof Element ) {
>             Element e = (Element)list.item(ii);
>             NodeList children = e.getChildNodes();
>             Vector v1 = new Vector();
>             v.add( v1 );
>             processElements( children, v1 );
>          }
>       }
>    }
> }
> </sscce>
> 
> <**html>
> <!DOCTYPE HTML>
> <HTML>
> <HEAD>
> <title>Parse HTML</title>
> </HEAD>
> <BODY>
> <h1>Example of parsing (valid) HTML</h1>
> <p>The applet in this web page loads the web page and attempts to
> parse it into a org.w3c.dom.Document object.</p>
> <p>The documents parsed must be well formed, which is
> uncommon for most web pages.</p>
> <APPLET
> CODE="ParseHTML.class"
> CODEBASE="."
> WIDTH="600" HEIGHT="600">
> </APPLET>
> </BODY>
> </HTML>
> </**html>
> 
> HTH
> 
> Andrew T.

I didn't end up using this, because our (big and ugly) HTML was not well-formed and fixing it enough to work with your suggestion was almost impossible. Still, this is the best, and the ONLY, solution I found for this issue, and it is a rather brilliant one. Well done and thanks!
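
For anyone landing on this thread later, here is a minimal sketch of the same idea without the applet: it feeds a string to the standard DocumentBuilder and lists the element tree. The class name ParseMarkup, the method outline and the sample markup are my own placeholders, not part of Andrew's post. It also shows the limitation described above: markup that is not well-formed (an unclosed <br>, for instance) is rejected with a SAXException rather than repaired.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
public class ParseMarkup {

    // Parses well-formed markup and returns one indented line per element.
    public static List<String> outline(String markup) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    // Depth-first walk that records element names and skips text nodes.
    private static void walk(Node node, int depth, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        out.add(line.append(node.getNodeName()).toString());
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Well-formed sample: every tag is closed.
        String good = "<html><head><title>Parse HTML</title></head>"
            + "<body><h1>Example</h1><p>Text</p></body></html>";
        for (String line : outline(good)) {
            System.out.println(line);
        }

        // Typical real-world HTML: <br> is never closed, so the XML parser fails.
        String bad = "<html><body><p>line one<br>line two</p></body></html>";
        try {
            outline(bad);
        } catch (SAXException e) {
            System.out.println("Not well-formed: " + e.getMessage());
        }
    }
}
```

The parser is strict by design, so this only helps when you control the markup or can clean it up first.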

Ofer
