
Issues a longer xpath expression

Date Fri, 22 Feb 2013 09:24:09 +0800 (CST)
From python <mailtomanage@163.com>
To "python-list@python.org" <python-list@python.org>
Subject Issues a longer xpath expression
Newsgroups comp.lang.python
Message-ID <mailman.2231.1361497179.2939.python-list@python.org> (permalink)


I am having issues with the urllib and lxml.html modules.

Here is my original code:

import urllib
import lxml.html
down='http://v.163.com/special/visualizingdata/'
file=urllib.urlopen(down).read()
root=lxml.html.document_fromstring(file)
xpath_str="//div[@class='down s-fc3 f-fl']/a"
urllist=root.xpath(xpath_str)
for url in urllist:
    print url.get("href")

When run, it returns this output:

http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4  
http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4  

But when I change the line

xpath_str='//div[@class="down s-fc3 f-fl"]//a'

into

xpath_str='//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a'

that is to say,

urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')

I do not receive any output. What is the flaw in this code?
It is strange that the shorter expression works while the longer one does not, even though they have the same XPath structure.
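
One way to narrow this down, sketched under assumptions: in XPath, @class="col f-cb" compares the whole attribute string, so it matches only when the class attribute is exactly that text. The markup below is hypothetical (the real page is not reproduced here), and the extra token "m-t10" is invented for illustration. The sketch uses Python 3 and the standard-library xml.etree.ElementTree as a stand-in for lxml, since both compare the attribute value exactly in this kind of predicate. It shows the short expression matching, the long one returning nothing, and a way to list the class attributes of the ancestors that are actually present.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT the real page: the outer div carries an extra
# class token ("m-t10"), so its class attribute is not exactly "col f-cb".
SAMPLE = """
<html><body>
  <div class="col f-cb m-t10">
    <div class="down s-fc3 f-fl"><a href="http://example.com/a.mp4">a</a></div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)


def hrefs(path):
    """Return the href of every element matched by path."""
    return [a.get("href") for a in root.findall(path)]


# Short expression: only the inner div is constrained.
short_hits = hrefs(".//div[@class='down s-fc3 f-fl']//a")

# Long expression: also requires an ancestor whose class is exactly "col f-cb".
long_hits = hrefs(".//div[@class='col f-cb']//div[@class='down s-fc3 f-fl']//a")

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_classes(elem):
    """Class attributes of elem's ancestors, nearest ancestor first."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem.get("class"))
    return found


inner = root.find(".//div[@class='down s-fc3 f-fl']")
print(short_hits)
print(long_hits)
print(ancestor_classes(inner))
```

If the real page turns out to behave like this sample, the same inspection can be done in lxml by walking up from a matched element with getparent() and printing each ancestor's class attribute. A token-based test such as contains(concat(' ', normalize-space(@class), ' '), ' col ') is a common XPath 1.0 idiom for matching one class among several; whether it applies here depends on what the ancestors' class attributes really contain.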
