| Date | 2013-02-21 22:09 +0800 |
|---|---|
| From | python <mailtomanage@163.com> |
| Subject | why can not parse the web in almost same xpath expression? |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2169.1361455778.2939.python-list@python.org> |
import urllib
import lxml.html

down = 'http://v.163.com/special/visualizingdata/'
# Download the page and parse it into an lxml HTML tree.
file = urllib.urlopen(down).read()
root = lxml.html.document_fromstring(file)
# Select every link inside the download divs.
urllist = root.xpath('//div[@class="down s-fc3 f-fl"]//a')
for url in urllist:
    print url.get("href")
I get this output:
http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4
http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4
http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4
http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4
http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4
http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4
When I change
xpath('//div[@class="down s-fc3 f-fl"]//a')
into
xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
that is to say,
urllist = root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
I get nothing. Why?
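
One way to narrow this down is to reproduce the two expressions on a tiny document. The sketch below is not a diagnosis of the real page: both markup snippets, the example.com URLs, and the extra class name `f-pr` are made up for illustration. It also uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra installs; the two rules it demonstrates (`A//B` only matches a `B` that is a descendant of `A`, and `@class="..."` compares the whole attribute string) are general XPath behaviour.

```python
import xml.etree.ElementTree as ET

INNER = ".//div[@class='down s-fc3 f-fl']//a"
NESTED = ".//div[@class='col f-cb']//div[@class='down s-fc3 f-fl']//a"

# Hypothetical markup (NOT the real v.163.com page).
# Case 1: the download div is a sibling of the "col f-cb" div, not inside it.
SIBLING = """<body>
<div class="col f-cb"><p>title</p></div>
<div class="down s-fc3 f-fl"><a href="http://example.com/1.mp4">mp4</a></div>
</body>"""

# Case 2: the outer div carries an extra (made-up) class, so the exact
# string comparison @class='col f-cb' no longer matches it.
EXTRA_CLASS = """<body>
<div class="col f-cb f-pr">
<div class="down s-fc3 f-fl"><a href="http://example.com/2.mp4">mp4</a></div>
</div>
</body>"""


def hrefs(markup, path):
    """Return the href of every element that path matches in markup."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(path)]


for name, markup in [("sibling", SIBLING), ("extra class", EXTRA_CLASS)]:
    print("%s: inner=%d nested=%d" % (
        name, len(hrefs(markup, INNER)), len(hrefs(markup, NESTED))))
```

In both cases the short expression finds the link and the longer one finds nothing. On the real page, evaluating `root.xpath('//div[@class="col f-cb"]')` on its own and checking the length of the result would show whether the outer step matches at all.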