Get HTML DOM tree using only basic built-in modules

Newsgroups comp.lang.python
Date Thu, 4 Jun 2015 04:58:23 -0700 (PDT)
Message-ID <099a955d-134a-46d6-bdba-61ec2b1eb44f@googlegroups.com>
Subject Get html DOM tree by only basic builtin moudles
From Wesley <nispray@gmail.com>

Hi guys,
  I know there are many modules (built-in or not, e.g. BeautifulSoup, xml, lxml, HTMLParser, etc.) that parse HTML files and output the DOM tree. However, is there a better way to get the DOM tree without using those HTML/XML-related modules? I mean using only general standard modules, e.g. file operations, the re module, etc.

The input file looks something like this:
<html> 
  <head> 
    <title>DOM Tree test</title> 
  </head> 
  <body> 
    <h1>Header 1</h1> 
    <p>Hello world!</p> 
  </body>
</html>

I need the DOM tree, or just output like this:
html -- head -- title(DOM Tree test)
html -- body -- h1(Header 1)
html -- body -- p(Hello world!)
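A minimal sketch of the approach the question asks about, assuming reasonably well-formed HTML: tokenize with a single `re` pattern, then build the tree with a stack (push on an opening tag, pop back to the matching tag on a closing tag). `Node`, `build_tree`, `paths`, `VOID_TAGS` and the file name `input.html` are hypothetical names chosen for illustration. It deliberately does not decode entities such as `&amp;`, handle `>` inside attribute values, raw-text elements like `<script>`, or HTML's implied end tags (e.g. an unclosed `<p>`). Those are the cases the HTML-specific modules mentioned above exist to handle.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Tags that never have a closing tag in HTML, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Exactly one group participates in each match:
#   1: comment, <!DOCTYPE ...> or <?...?> (skipped)
#   2: name of a closing tag
#   3: name of an opening or self-closing tag (attributes are ignored)
#   4: text between tags
TOKEN_RE = re.compile(
    r"(<!--.*?-->|<[!?][^>]*>)"
    r"|</([A-Za-z][\w-]*)\s*>"
    r"|<([A-Za-z][\w-]*)[^>]*>"
    r"|([^<]+)",
    re.S,
)


@dataclass
class Node:
    """Hypothetical minimal element node; not part of any library."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def build_tree(html: str) -> Node:
    """Build an element tree with a stack, assuming reasonably well-formed HTML."""
    root = Node("#root")
    stack = [root]
    for m in TOKEN_RE.finditer(html):
        skipped, close_tag, open_tag, text = m.groups()
        if skipped:
            continue
        if close_tag:
            name = close_tag.lower()
            # Pop back to the innermost matching open tag; ignore stray closers.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == name:
                    del stack[i:]
                    break
        elif open_tag:
            node = Node(open_tag.lower())
            stack[-1].children.append(node)
            # Void and self-closing (<x/>) elements cannot contain children.
            if node.tag not in VOID_TAGS and not m.group(0).endswith("/>"):
                stack.append(node)
        else:
            # Collapse whitespace; whitespace-only text between tags is dropped.
            piece = " ".join(text.split())
            if piece:
                stack[-1].text = f"{stack[-1].text} {piece}".strip()
    return root


def paths(node: Node, prefix: Optional[List[str]] = None) -> List[str]:
    """Return one 'html -- body -- p(text)' line per leaf element."""
    prefix = prefix or []
    lines = []
    for child in node.children:
        path = prefix + [child.tag]
        if child.children:
            lines.extend(paths(child, path))
        else:
            suffix = f"({child.text})" if child.text else ""
            lines.append(" -- ".join(path) + suffix)
    return lines


if __name__ == "__main__":
    # For a real file: html = open("input.html", encoding="utf-8").read()
    sample = """<html>
  <head>
    <title>DOM Tree test</title>
  </head>
  <body>
    <h1>Header 1</h1>
    <p>Hello world!</p>
  </body>
</html>"""
    print("\n".join(paths(build_tree(sample))))
```

Run on the sample input above, it prints the three lines from the post. Only leaf elements get a line, so text sitting directly in an element that also has child elements is kept on its `Node` but not printed.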

Thanks.
Wesley

Thread

Get html DOM tree by only basic builtin moudles Wesley <nispray@gmail.com> - 2015-06-04 04:58 -0700
  Re: Get html DOM tree by only basic builtin moudles Laura Creighton <lac@openend.se> - 2015-06-04 15:01 +0200
    Re: Get html DOM tree by only basic builtin moudles Wesley <nispray@gmail.com> - 2015-06-05 11:10 -0700
      Re: Get html DOM tree by only basic builtin moudles Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-05 13:48 -0600
        Re: Get html DOM tree by only basic builtin moudles Wesley <nispray@gmail.com> - 2015-06-05 16:24 -0700