Newsgroups: comp.lang.python
From: Grant Rettke
To: Always Learning
Cc: python-list@python.org
Date: Fri, 21 Dec 2012 15:34:11 -0600
Subject: Re: Scrapy/XPath help

You might have better luck if you share the python make, version, os, error message, and some unit tests demonstrating what you expect.

On Fri, Dec 21, 2012 at 3:21 PM, Always Learning wrote:
> Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for some help with the XPaths, however, I still am receiving errors when I try to run this script. If you could help, it would be greatly appreciated!
>
> from scrapy.spider import BaseSpider
> from scrapy.selector import HtmlXPathSelector
> from cbb_info.items import CbbInfoItem, Field
>
> class GameInfoSpider(BaseSpider):
>     name = "game_info"
>     allowed_domains = ["www.sbrforum.com"]
>     start_urls = [
>         'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
>     ]
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         toplevels = hxs.select("//div[@class='eventLine-value']")
>         items = []
>         for toplevels in toplevels:
>             item = CbbInfoItem()
>             item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
>             item ["lines"] = toplevels.select("/div[@rel='19']").extract()
>             item.append(item)
>         return items
> --
> http://mail.python.org/mailman/listinfo/python-list

-- 
Grant Rettke | ACM, AMA, COG, IEEE
grettke@acm.org | http://www.wisdomandwonder.com/
Wisdom begins in wonder.
((λ (x) (x x)) (λ (x) (x x)))
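
Independent of the environment details requested above, a few things in the quoted spider look suspect from the code alone: the first XPath is missing the closing "]" after 'team-name'; both inner paths start with "/", which makes them absolute rather than relative to the selected div; the loop variable reuses the name of the list it iterates; and item.append(item) was probably meant to be items.append(item). The sketch below shows the same loop shape with those points addressed. It is only an illustration: it uses the standard library's xml.etree.ElementTree instead of Scrapy, and the SAMPLE markup and team names are made up, since the real page structure is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page's structure is not shown in the thread,
# so this fragment and its values are invented for illustration only.
SAMPLE = """
<html><body>
  <div class="eventLine-value">
    <span class="team-name">Duke</span>
    <div rel="19">-4.5</div>
  </div>
  <div class="eventLine-value">
    <span class="team-name">Kansas</span>
    <div rel="19">+2</div>
  </div>
</body></html>
"""


def parse_games(markup):
    """Collect team names and lines from each 'eventLine-value' block."""
    root = ET.fromstring(markup.strip())
    items = []
    # Use a distinct loop variable instead of rebinding the list's own name.
    for toplevel in root.findall(".//div[@class='eventLine-value']"):
        item = {
            # Relative paths (no leading "/") search inside the current node;
            # note the closing "]" on each predicate.
            "teams": [s.text for s in toplevel.findall("span[@class='team-name']")],
            "lines": [d.text for d in toplevel.findall("div[@rel='19']")],
        }
        # Append the item to the result list, not to the item itself.
        items.append(item)
    return items


games = parse_games(SAMPLE)
```

In the Scrapy version the equivalent changes would be confined to the loop body: relative XPath strings with balanced brackets, a separate loop variable, and items.append(item).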