
Get html DOM tree by only basic builtin modules

Newsgroups	comp.lang.python
Date	2015-06-04 04:58 -0700
Message-ID	<099a955d-134a-46d6-bdba-61ec2b1eb44f@googlegroups.com>
Subject	Get html DOM tree by only basic builtin modules
From	Wesley <nispray@gmail.com>


Hi guys,
  I know there are many modules (built-in or not, e.g. BeautifulSoup, xml, lxml, HTMLParser, etc.) that parse HTML files and output the DOM tree. However, is there a good way to get the DOM tree without using those HTML/XML-related modules? I mean, using only general standard modules, e.g. file operations, the re module, etc.

Input file is something like this:
<html> 
  <head> 
    <title>DOM Tree test</title> 
  </head> 
  <body> 
    <h1>Header 1</h1> 
    <p>Hello world!</p> 
  </body>
</html>

I need the DOM tree, or just something like:
html -- head -- title(DOM Tree test)
html -- body -- h1(Header 1)
html -- body -- p(Hello world!)
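
For illustration only, here is a minimal sketch of the kind of approach being asked about: a regular expression splits the input into tags and text, and a list used as a stack tracks the open elements. The names TOKEN, dom_paths and SAMPLE are made up for this example. It assumes well-formed, properly nested markup with no comments, no void elements such as <br>, and no attribute values containing ">". Real-world HTML breaks all of these assumptions, which is why a proper parser is normally recommended.

```python
import re

# One token is an opening tag, a closing tag, or a run of text between tags.
# Groups: (1) "/" if it is a closing tag, (2) tag name, (3) text content.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)")


def dom_paths(html):
    """Return one 'a -- b -- c(text)' line per non-empty text node."""
    stack = []  # names of the elements that are currently open
    lines = []
    for closing, name, text in TOKEN.findall(html):
        if name and not closing:
            # Opening tag: descend one level.
            stack.append(name.lower())
        elif name:
            # Closing tag: go back up, but only if it matches the open element.
            if stack and stack[-1] == name.lower():
                stack.pop()
        else:
            # Text: report it under the path of open elements, if any.
            text = text.strip()
            if text and stack:
                leaf = "{}({})".format(stack[-1], text)
                lines.append(" -- ".join(stack[:-1] + [leaf]))
    return lines


SAMPLE = """<html>
  <head>
    <title>DOM Tree test</title>
  </head>
  <body>
    <h1>Header 1</h1>
    <p>Hello world!</p>
  </body>
</html>"""

for line in dom_paths(SAMPLE):
    print(line)
```

On the sample input above this prints the three lines listed. Mismatched closing tags are silently ignored rather than repaired, so the result on malformed input should not be trusted.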

Thanks.
Wesley


Thread

Get html DOM tree by only basic builtin modules Wesley <nispray@gmail.com> - 2015-06-04 04:58 -0700
  Re: Get html DOM tree by only basic builtin modules Laura Creighton <lac@openend.se> - 2015-06-04 15:01 +0200
    Re: Get html DOM tree by only basic builtin modules Wesley <nispray@gmail.com> - 2015-06-05 11:10 -0700
      Re: Get html DOM tree by only basic builtin modules Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-05 13:48 -0600
        Re: Get html DOM tree by only basic builtin modules Wesley <nispray@gmail.com> - 2015-06-05 16:24 -0700
