| Return-Path | <mailtomanage@163.com> |
|---|---|
| Date | Fri, 22 Feb 2013 09:24:09 +0800 (CST) |
| From | python <mailtomanage@163.com> |
| To | "python-list@python.org" <python-list@python.org> |
| Subject | Issues a longer xpath expression |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2231.1361497179.2939.python-list@python.org> |
I am having issues with the urllib and lxml.html modules.
Here is my original code:
import urllib
import lxml.html
down='http://v.163.com/special/visualizingdata/'
file=urllib.urlopen(down).read()
root=lxml.html.document_fromstring(file)
xpath_str="//div[@class='down s-fc3 f-fl']/a"
urllist=root.xpath(xpath_str)
for url in urllist:
    print url.get("href")
When run, it returns this output:
http://mov.bn.netease.com/movieMP4/2012/12/A/7/S8H1TH9A7.mp4
http://mov.bn.netease.com/movieMP4/2012/12/D/9/S8H1ULCD9.mp4
http://mov.bn.netease.com/movieMP4/2012/12/4/P/S8H1UUH4P.mp4
http://mov.bn.netease.com/movieMP4/2012/12/B/V/S8H1V8RBV.mp4
http://mov.bn.netease.com/movieMP4/2012/12/6/E/S8H1VIF6E.mp4
http://mov.bn.netease.com/movieMP4/2012/12/B/G/S8H1VQ2BG.mp4
But when I change the line
xpath_str='//div[@class="down s-fc3 f-fl"]//a'
into
xpath_str='//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a'
that is to say,
urllist=root.xpath('//div[@class="col f-cb"]//div[@class="down s-fc3 f-fl"]//a')
I do not receive any output. What is the flaw in this code?
It is strange that the shorter expression works and the longer one does not, since they have the same XPath structure.