| Newsgroups | comp.lang.python |
|---|---|
| Date | Thu, 4 Jun 2015 04:58:23 -0700 (PDT) |
| Message-ID | <099a955d-134a-46d6-bdba-61ec2b1eb44f@googlegroups.com> |
| Subject | Get html DOM tree by only basic builtin moudles |
| From | Wesley <nispray@gmail.com> |
Hi guys,
I know there are many modules (built-in or not, e.g. BeautifulSoup, xml, lxml, HTMLParser, etc.) that parse HTML files and output the DOM tree. However, is there a better way to get the DOM tree without using those HTML/XML-related modules? I mean using only general standard modules, e.g. file operations, the re module, etc.
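This is a minimal sketch of what the `re` module alone can do here. The tokenizer below splits markup into opening tags, closing tags and text. It assumes simple, reasonably well-formed input. Comments and `<!DOCTYPE ...>` declarations are skipped. It does not handle `<script>`/`<style>` bodies, CDATA sections, or `>` characters inside quoted attribute values. The names `TOKEN_RE` and `tokenize` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical names, for illustration only. Assumes simple, reasonably
# well-formed markup: no <script>/<style> bodies, no CDATA sections and
# no '>' inside quoted attribute values.
TOKEN_RE = re.compile(
    r"<!--.*?-->"                                           # comment (ignored)
    r"|<[!?][^>]*>"                                         # <!DOCTYPE ...> or <?...?> (ignored)
    r"|</\s*(?P<end>[A-Za-z][\w-]*)\s*>"                    # closing tag
    r"|<(?P<start>[A-Za-z][\w-]*)[^>]*?(?P<selfclose>/?)>"  # opening or self-closing tag
    r"|(?P<text>[^<]+)",                                    # text between tags
    re.DOTALL,
)


def tokenize(html):
    """Yield ('start' | 'startend' | 'end', tag) or ('text', data) tuples."""
    for m in TOKEN_RE.finditer(html):
        if m.group("end"):
            yield ("end", m.group("end").lower())
        elif m.group("start"):
            kind = "startend" if m.group("selfclose") else "start"
            yield (kind, m.group("start").lower())
        elif m.group("text") is not None and m.group("text").strip():
            # Whitespace-only runs between tags are dropped.
            yield ("text", m.group("text").strip())
        # Comments and declarations match no named group and are skipped.
```

For example, `list(tokenize("<title>DOM Tree test</title>"))` returns `[('start', 'title'), ('text', 'DOM Tree test'), ('end', 'title')]`. A tree builder only has to consume this flat token stream.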
The input file looks something like this:
```html
<html>
<head>
<title>DOM Tree test</title>
</head>
<body>
<h1>Header 1</h1>
<p>Hello world!</p>
</body>
</html>
```
I need the DOM tree, or just something like:
```text
html -- head -- title(DOM Tree test)
html -- body -- h1(Header 1)
html -- body -- p(Hello world!)
```
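The sketch below produces that output with only `re` and a stack of open elements. It first builds a small nested tree from the hypothetical `Element` class. It then walks the tree and prints one root-to-leaf path per text node. The input is assumed to be simple markup. A hard-coded set of void elements such as `<br>` and `<img>` is recognised. HTML's implicit end-tag rules are not applied, so an unclosed `<li>` or `<p>` nests the next one inside it. Entities such as `&amp;` are not decoded, and attributes are discarded. All names here (`TOKEN_RE`, `VOID_ELEMENTS`, `Element`, `build_tree`, `text_paths`, `SAMPLE`) are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical names throughout, for illustration only.
# Assumes simple markup: no <script>/<style> bodies and no '>' inside attribute values.
TOKEN_RE = re.compile(
    r"<!--.*?-->"                                           # comment (ignored)
    r"|<[!?][^>]*>"                                         # <!DOCTYPE ...> (ignored)
    r"|</\s*(?P<end>[A-Za-z][\w-]*)\s*>"                    # closing tag
    r"|<(?P<start>[A-Za-z][\w-]*)[^>]*?(?P<selfclose>/?)>"  # opening or self-closing tag
    r"|(?P<text>[^<]+)",                                    # text between tags
    re.DOTALL,
)

# Elements that never take a closing tag (common subset, assumed sufficient here).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


@dataclass
class Element:
    tag: str
    children: list = field(default_factory=list)  # Element objects and text strings


def build_tree(html):
    """Build a nested Element tree, tracking open elements on a stack."""
    root = Element("#document")
    stack = [root]
    for m in TOKEN_RE.finditer(html):
        if m.group("end"):
            name = m.group("end").lower()
            # Pop back to the matching open element; ignore stray closing tags.
            if any(el.tag == name for el in stack[1:]):
                while stack.pop().tag != name:
                    pass
        elif m.group("start"):
            el = Element(m.group("start").lower())
            stack[-1].children.append(el)
            if not m.group("selfclose") and el.tag not in VOID_ELEMENTS:
                stack.append(el)
        elif m.group("text") is not None and m.group("text").strip():
            stack[-1].children.append(m.group("text").strip())
    return root


def text_paths(node, path=()):
    """Yield 'html -- body -- p(text)' style lines, one per text node."""
    for child in node.children:
        if isinstance(child, str):
            if path:  # text outside every element has no path to print
                yield " -- ".join(path[:-1] + (f"{path[-1]}({child})",))
        else:
            yield from text_paths(child, path + (child.tag,))


SAMPLE = """<html>
<head>
<title>DOM Tree test</title>
</head>
<body>
<h1>Header 1</h1>
<p>Hello world!</p>
</body>
</html>
"""

if __name__ == "__main__":
    # To read from disk instead, pass open(path, encoding="utf-8").read().
    for line in text_paths(build_tree(SAMPLE)):
        print(line)
```

Run on the sample input, this prints the three requested lines. The tree returned by `build_tree` is the "DOM tree" part. The path listing is just one way to walk it.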
Thanks.
Wesley
Get html DOM tree by only basic builtin moudles Wesley <nispray@gmail.com> - 2015-06-04 04:58 -0700
Re: Get html DOM tree by only basic builtin moudles Laura Creighton <lac@openend.se> - 2015-06-04 15:01 +0200
Re: Get html DOM tree by only basic builtin moudles Wesley <nispray@gmail.com> - 2015-06-05 11:10 -0700
Re: Get html DOM tree by only basic builtin moudles Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-05 13:48 -0600
Re: Get html DOM tree by only basic builtin moudles Wesley <nispray@gmail.com> - 2015-06-05 16:24 -0700