Issues a longer xpath expression

Date 2013-02-22 09:24 +0800
From python <mailtomanage@163.com>
Subject Issues a longer xpath expression
Newsgroups comp.lang.python
Message-ID <mailman.2231.1361497179.2939.python-list@python.org>

I am having issues with the urllib and lxml.html modules.

Here is my original code:

import urllib
import lxml.html
down='http://v.163.com/special/visualizingdata/'
file=urllib.urlopen(down).read()
root=lxml.html.document_fromstring(file)
xpath_str="//div[@class='down s-fc3 f-fl']/a"
urllist=root.xpath(xpath_str)
for url in urllist:
    print url.get("href")

When run, it returns this output:

http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4  

However, when I change the line

xpath_str='//div[@class="down s-fc3 f-fl"]//a'

into

xpath_str='//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a'

that is to say,

urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')

I do not receive any output. What is the flaw in this code?
It seems strange that the shorter expression works and the longer one does not, since they have the same XPath structure.
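
One way to narrow this down is to look at the ancestors of an element that the shorter expression does match. The sketch below is only an illustration under stated assumptions: the markup in SAMPLE is made up (the real page is not shown in this post, and the extra class token "f-pr" is hypothetical), and it uses the standard library's xml.etree.ElementTree on a well-formed snippet so that it runs without fetching anything. It shows one possible cause, namely that a predicate such as @class="col f-cb" compares the whole attribute string, so an outer div whose class attribute holds any additional token does not match. Another possibility is that the "col f-cb" div is simply not an ancestor of the links.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT the real page: the outer div carries an extra
# class token ("f-pr"), so its class attribute is not exactly "col f-cb".
SAMPLE = """
<html><body>
  <div class="col f-cb f-pr">
    <div class="down s-fc3 f-fl"><a href="http://example.com/a.mp4">a</a></div>
    <div class="down s-fc3 f-fl"><a href="http://example.com/b.mp4">b</a></div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)


def hrefs(path):
    """Return the href of every element matched by an ElementTree path."""
    return [a.get("href") for a in root.findall(path)]


# ElementTree paths must be relative, hence the leading "." on each one.
short_hits = hrefs('.//div[@class="down s-fc3 f-fl"]//a')
long_hits = hrefs('.//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')

# Map each child to its parent so the ancestors of a match can be inspected.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_classes(node):
    """Collect the class attribute of each ancestor, nearest first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node.get("class"))
    return found


first_link = root.findall('.//div[@class="down s-fc3 f-fl"]//a')[0]
print(short_hits)                    # links found by the shorter path
print(long_hits)                     # empty for this sample markup
print(ancestor_classes(first_link))  # class strings to compare with the predicate
```

Printing the ancestor class strings makes it possible to compare them character by character with the text used in the predicate. With lxml, the same inspection could be done on the real document by walking iterancestors() from one of the matched links.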
