


Re: why can not parse the web in almost same xpath expression?

From Piet van Oostrum <piet@vanoostrum.org>
Newsgroups comp.lang.python
Subject Re: why can not parse the web in almost same xpath expression?
Date Wed, 06 Mar 2013 22:13:17 -0400
Message-ID <m2y5e0q7nm.fsf@cochabamba.vanoostrum.org>
References <mailman.2169.1361455778.2939.python-list@python.org>



python <mailtomanage@163.com> writes:

>     import urllib
>     import lxml.html
>     down='http://v.163.com/special/visualizingdata/'
>     file=urllib.urlopen(down).read()
>     root=lxml.html.document_fromstring(file)
>     urllist=root.xpath('//div[@class="down s-fc3 f-fl"]//a') 
>     for url in urllist:
>          print url.get("href")
>   
> I get the output:
> http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4  
> http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4  
>   
> When I change
>
>     xpath('//div[@class="down s-fc3 f-fl"]//a')
>
> into
>
>     xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> that is to say,
>
>     urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> why do I get nothing?

There is only one <div class="col f-cb"> in the document, and that div
contains only a single <div class="down s-fc3 f-fl">, but the latter does
not contain any <a>. Each // step matches only descendants of the nodes
selected by the step before it, so your second expression finds nothing.
The URLs that you get with the first code are not contained in a
<div class="col f-cb">. They are, however, contained in a
<div class="m-tdli">.
So xpath('//div[@class="m-tdli"]//div[@class="down s-fc3 f-fl"]//a') works.
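
To make the difference concrete, here is a minimal sketch in Python 3. The SAMPLE markup is a hypothetical, heavily simplified stand-in for the page structure described above (the real page is not reproduced here), and the example.com URLs and the hrefs() helper are made up for illustration. It runs the three expressions side by side and then lists the <div> ancestors of each matched link, which is one way to find out which container to name in the expression.

```python
import lxml.html

# Hypothetical, simplified stand-in for the page structure described above.
# The class names come from the thread; the URLs are invented for the example.
SAMPLE = """
<html><body>
  <div class="col f-cb">
    <div class="down s-fc3 f-fl">no link in here</div>
  </div>
  <div class="m-tdli">
    <div class="down s-fc3 f-fl"><a href="http://example.com/a.mp4">a</a></div>
  </div>
  <div class="m-tdli">
    <div class="down s-fc3 f-fl"><a href="http://example.com/b.mp4">b</a></div>
  </div>
</body></html>
"""

root = lxml.html.document_fromstring(SAMPLE)


def hrefs(expr):
    """Return the href of every element matched by the XPath expression."""
    return [a.get("href") for a in root.xpath(expr)]


# The original expression: any matching "down" div, wherever it sits.
all_links = hrefs('//div[@class="down s-fc3 f-fl"]//a')

# Restricted to the "col f-cb" div: its "down" div holds no <a>, so no match.
col_links = hrefs('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')

# Restricted to the "m-tdli" divs, which do contain the links.
tdli_links = hrefs('//div[@class="m-tdli"]//div[@class="down s-fc3 f-fl"]//a')

print(all_links)
print(col_links)
print(tdli_links)

# Diagnostic: for each matched link, list the class of every enclosing <div>,
# nearest ancestor first. This shows which containers really wrap the links.
ancestor_classes = [
    [div.get("class") for div in a.iterancestors("div")]
    for a in root.xpath('//div[@class="down s-fc3 f-fl"]//a')
]
for chain in ancestor_classes:
    print(chain)
```

With this sample, the "col f-cb" variant yields an empty list while the other two yield the same two URLs. Running the ancestor listing against the real page should show "m-tdli" rather than "col f-cb" among the enclosing divs.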

-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]



Thread

why can not parse the web in almost same xpath expression? python <mailtomanage@163.com> - 2013-02-21 22:09 +0800
  Re: why can not parse the web in almost same xpath expression? Piet van Oostrum <piet@vanoostrum.org> - 2013-03-06 22:13 -0400
