Scrapy/XPath help

Started by	Always Learning <cbrowning@ou.edu>
First post	2012-12-21 13:21 -0800
Last post	2012-12-21 13:58 -0800
Articles	7 — 4 participants

  Scrapy/XPath help Always Learning <cbrowning@ou.edu> - 2012-12-21 13:21 -0800
    Re: Scrapy/XPath help Grant Rettke <grettke@acm.org> - 2012-12-21 15:34 -0600
      Re: Scrapy/XPath help Always Learning <cbrowning@ou.edu> - 2012-12-21 13:58 -0800
        Re: Scrapy/XPath help Dave Angel <d@davea.name> - 2012-12-21 22:18 -0500
        Re: Scrapy/XPath help donarb <donarb@nwlink.com> - 2012-12-25 11:15 -0800

#35320 — Scrapy/XPath help

From	Always Learning <cbrowning@ou.edu>
Date	2012-12-21 13:21 -0800
Subject	Scrapy/XPath help
Message-ID	<e180db33-272f-4a9d-bc1e-231f3c3580bf@googlegroups.com>

Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for help with the XPaths; however, I am still receiving errors when I try to run this script. If you could help, it would be greatly appreciated!

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from cbb_info.items import CbbInfoItem, Field

class GameInfoSpider(BaseSpider):
    name = "game_info"
    allowed_domains = ["www.sbrforum.com"]
    start_urls = [
        'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
        ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        toplevels = hxs.select("//div[@class='eventLine-value']")
        items = []
        for toplevels in toplevels:
            item = CbbInfoItem()
            item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
            item ["lines"] = toplevels.select("/div[@rel='19']").extract()
            item.append(item)
        return items

#35321

From	Grant Rettke <grettke@acm.org>
Date	2012-12-21 15:34 -0600
Message-ID	<mailman.1169.1356125659.29569.python-list@python.org>
In reply to	#35320

You might have better luck if you share the Python implementation and version,
your OS, the error message, and some unit tests demonstrating what you expect.

On Fri, Dec 21, 2012 at 3:21 PM, Always Learning <cbrowning@ou.edu> wrote:
> Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for help with the XPaths; however, I am still receiving errors when I try to run this script. If you could help, it would be greatly appreciated!
>
> from scrapy.spider import BaseSpider
> from scrapy.selector import HtmlXPathSelector
> from cbb_info.items import CbbInfoItem, Field
>
> class GameInfoSpider(BaseSpider):
>     name = "game_info"
>     allowed_domains = ["www.sbrforum.com"]
>     start_urls = [
>         'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
>         ]
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         toplevels = hxs.select("//div[@class='eventLine-value']")
>         items = []
>         for toplevels in toplevels:
>             item = CbbInfoItem()
>             item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
>             item ["lines"] = toplevels.select("/div[@rel='19']").extract()
>             item.append(item)
>         return items
> --
> http://mail.python.org/mailman/listinfo/python-list



-- 
Grant Rettke | ACM, AMA, COG, IEEE
grettke@acm.org | http://www.wisdomandwonder.com/
Wisdom begins in wonder.
((λ (x) (x x)) (λ (x) (x x)))

#35322

From	Always Learning <cbrowning@ou.edu>
Date	2012-12-21 13:58 -0800
Message-ID	<b36b317d-01e6-4a88-9f2c-7506d41bc480@googlegroups.com>
In reply to	#35321

Sorry about that. I'm using Python 2.7.3, 32-bit, on Windows 7.

The errors I get are

>>File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>>raise ValueError("Invalid XPath: %s" % xpath)
>>exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()

Ultimately, I expect it to gather the team name as text, and then the odds in one of the columns as text as well, so I can then put it all into a .csv file.

#35335

From	Dave Angel <d@davea.name>
Date	2012-12-21 22:18 -0500
Message-ID	<mailman.1182.1356146357.29569.python-list@python.org>
In reply to	#35322

On 12/21/2012 04:58 PM, Always Learning wrote:
> Sorry about that. I'm using Python 2.7.3, 32-bit, on Windows 7.
>
> The errors I get are
>
>>> File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>>> raise ValueError("Invalid XPath: %s" % xpath)
>>> exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()
> Ultimately, I expect it to gather the team name as text, and then the odds in one of the columns as text as well, so I can then put it all into a .csv file.

Why are you displaying only the last 3 lines of the error message?
Unless your source code is lxmlsel.py, there are other stack levels
above this one.

(I can't help, but I'm trying to save some time for someone who can)

-- 

DaveA

#35493

From	donarb <donarb@nwlink.com>
Date	2012-12-25 11:15 -0800
Message-ID	<b449136c-8a4b-41f5-a0b4-5404a69380e7@googlegroups.com>
In reply to	#35322

On Friday, December 21, 2012 1:58:47 PM UTC-8, Always Learning wrote:
> The errors I get are
> >>File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
> 
> >>raise ValueError("Invalid XPath: %s" % xpath)
> 
> >>exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()
> 


You're missing the closing right bracket of the predicate in the XPath expression:

/span[@class='team-name']/text()
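
A few other things in the original script look like they would bite next once the bracket is fixed: the loop reuses the name toplevels for both the list and the loop variable, item.append(item) was presumably meant to be items.append(item), and an XPath that starts with / is evaluated from the document root rather than from the current node, so the inner queries probably want a leading dot. Below is a minimal sketch of that loop shape. Note the assumptions: it uses the standard library's xml.etree.ElementTree instead of Scrapy so that it runs on its own, the SAMPLE markup and the extract_rows helper are made up for illustration (the real page structure is not shown in this thread), and ElementTree has no text() step, so the text is read from the .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page; its structure is assumed.
SAMPLE = """
<html><body>
  <div class="eventLine-value">
    <span class="team-name">Team A</span>
    <div rel="19">-3.5</div>
  </div>
  <div class="eventLine-value">
    <span class="team-name">Team B</span>
    <div rel="19">+3.5</div>
  </div>
</body></html>
"""


def extract_rows(markup):
    """Return one dict per eventLine-value block (illustrative helper)."""
    root = ET.fromstring(markup.strip())
    rows = []
    # A distinct loop variable, so the list being iterated is not shadowed.
    for toplevel in root.findall(".//div[@class='eventLine-value']"):
        # The leading "." keeps each query relative to the current block.
        team = toplevel.find("./span[@class='team-name']")
        line = toplevel.find("./div[@rel='19']")
        rows.append({
            "teams": team.text if team is not None else None,
            "lines": line.text if line is not None else None,
        })
    # Append to the list (rows), not to the item itself.
    return rows


rows = extract_rows(SAMPLE)
```

In the spider itself the same shape should carry over with hxs.select(...) and .extract(), using a relative expression such as ./span[@class='team-name']/text() for the inner queries. I have not run that against the actual site, so treat it as a starting point.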
