From: Philipp Hagemeister
Date: Fri, 22 Feb 2013 14:16:54 +0100
To: python
Cc: python-list@python.org
Subject: Re: Issues a longer xpath expression
Newsgroups: comp.lang.python

Hi anonymous,

your code is working perfectly fine. It's just that the only element matching //div[@class="col f-cb"] is this one:

名称 (Name)
视频下载 (Video download)
课程简介 (Course introduction)

And obviously, there's no <a> element in there, so the xpath won't match.

Cheers,

Philipp

On 02/22/2013 02:24 AM, python wrote:
> I am having issues with the urllib and lxml.html modules.
>
> Here is my original code:
>
> import urllib
> import lxml.html
> down='http://v.163.com/special/visualizingdata/'
> file=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(file)
> xpath_str="//div[@class='down s-fc3 f-fl']/a"
> urllist=root.xpath(xpath_str)
> for url in urllist:print url.get("href")
>
> When run, it returns this output:
>
> http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4
> http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4
>
> But, when I change the line
>
> xpath_str='//div[@class="down s-fc3 f-fl"]//a'
>
> into
>
> xpath_str='//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a'
>
> that is to say,
>
> urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
>
> I do not receive any output. What is the flaw in this code?
> it is so strange that the shorter one can work,the longer one can not,they have the same xpath structure!
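
The point made in the reply above (the longer expression only searches inside the single div whose class is "col f-cb", and that div holds no links) can be reproduced on a small scale. The following is a minimal sketch, not the real page: the markup in PAGE, the class name "row" and the example.com URLs are all made up for illustration, and the standard library's xml.etree.ElementTree is used in place of lxml.html so the sketch runs without extra packages. ElementTree only supports a subset of XPath and needs the leading "." on each expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the page (NOT the real markup).
# Only the header block has class "col f-cb"; the download links live elsewhere.
PAGE = """
<html><body>
  <div class="col f-cb">
    <div>Name</div>
    <div>Video download</div>
    <div>Course introduction</div>
  </div>
  <div class="row">
    <div class="down s-fc3 f-fl"><a href="http://example.com/1.mp4">1</a></div>
  </div>
  <div class="row">
    <div class="down s-fc3 f-fl"><a href="http://example.com/2.mp4">2</a></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Shorter expression: any matching "down" div anywhere in the document.
short_expr = ".//div[@class='down s-fc3 f-fl']/a"
# Longer expression: the "down" div must be a descendant of a "col f-cb" div.
long_expr = ".//div[@class='col f-cb']//div[@class='down s-fc3 f-fl']//a"

short_hrefs = [a.get("href") for a in root.findall(short_expr)]
long_hrefs = [a.get("href") for a in root.findall(long_expr)]

# Diagnostic step: look at what the outer part of the longer expression matches.
outer = root.findall(".//div[@class='col f-cb']")
outer_texts = [child.text for div in outer for child in div]

print(short_hrefs)   # both links
print(long_hrefs)    # empty list: no link below the only "col f-cb" div
print(outer_texts)   # the three header labels
```

When a longer expression unexpectedly returns nothing, evaluating its leading part on its own (as done with outer here) shows which elements the remaining steps are actually searched in.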