Scrapy/XPath help

Started by: Always Learning <cbrowning@ou.edu>
First post: 2012-12-21 13:21 -0800
Last post: 2012-12-21 13:58 -0800
Articles: 7 (4 participants)

#35320 — Scrapy/XPath help

From: Always Learning <cbrowning@ou.edu>
Date: 2012-12-21 13:21 -0800
Subject: Scrapy/XPath help
Message-ID: <e180db33-272f-4a9d-bc1e-231f3c3580bf@googlegroups.com>

Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for some help with the XPaths; however, I am still receiving errors when I try to run this script. If you could help, it would be greatly appreciated!

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from cbb_info.items import CbbInfoItem, Field

class GameInfoSpider(BaseSpider):
    name = "game_info"
    allowed_domains = ["www.sbrforum.com"]
    start_urls = [
        'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
        ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        toplevels = hxs.select("//div[@class='eventLine-value']")
        items = []
        for toplevels in toplevels:
            item = CbbInfoItem()
            item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
            item ["lines"] = toplevels.select("/div[@rel='19']").extract()
            item.append(item)
        return items



#35321

From: Grant Rettke <grettke@acm.org>
Date: 2012-12-21 15:34 -0600
Message-ID: <mailman.1169.1356125659.29569.python-list@python.org>
In reply to: #35320

You might have better luck if you share the Python implementation and version, your OS,
the error message, and some unit tests demonstrating what you expect.

On Fri, Dec 21, 2012 at 3:21 PM, Always Learning <cbrowning@ou.edu> wrote:
> Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for some help with the XPaths; however, I am still receiving errors when I try to run this script. If you could help, it would be greatly appreciated!
>
> from scrapy.spider import BaseSpider
> from scrapy.selector import HtmlXPathSelector
> from cbb_info.items import CbbInfoItem, Field
>
> class GameInfoSpider(BaseSpider):
>     name = "game_info"
>     allowed_domains = ["www.sbrforum.com"]
>     start_urls = [
>         'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
>         ]
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         toplevels = hxs.select("//div[@class='eventLine-value']")
>         items = []
>         for toplevels in toplevels:
>             item = CbbInfoItem()
>             item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
>             item ["lines"] = toplevels.select("/div[@rel='19']").extract()
>             item.append(item)
>         return items



-- 
Grant Rettke | ACM, AMA, COG, IEEE
grettke@acm.org | http://www.wisdomandwonder.com/
Wisdom begins in wonder.
((λ (x) (x x)) (λ (x) (x x)))



#35322

From: Always Learning <cbrowning@ou.edu>
Date: 2012-12-21 13:58 -0800
Message-ID: <b36b317d-01e6-4a88-9f2c-7506d41bc480@googlegroups.com>
In reply to: #35321

Sorry about that. I'm using 32-bit Python 2.7.3 on Windows 7.

The errors I get are

>>File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>>raise ValueError("Invalid XPath: %s" % xpath)
>>exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()

Ultimately, I expect it to gather the team names as text, and then the odds from one of the columns as text as well, so I can put them into a .csv file.



#35335

From: Dave Angel <d@davea.name>
Date: 2012-12-21 22:18 -0500
Message-ID: <mailman.1182.1356146357.29569.python-list@python.org>
In reply to: #35322

On 12/21/2012 04:58 PM, Always Learning wrote:
> Sorry about that. I'm using 32-bit Python 2.7.3 on Windows 7.
>
> The errors I get are
>
>>> File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>>> raise ValueError("Invalid XPath: %s" % xpath)
>>> exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()
>
> Ultimately, I expect it to gather the team names as text, and then the odds from one of the columns as text as well, so I can put them into a .csv file.

Why are you displaying only the last 3 lines of the error message?
Unless your source code is lxmlsel.py, there are other stack levels
above this one.

(I can't help, but I'm trying to save some time for someone who can)

-- 

DaveA
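
For anyone reading this later: one way to post the whole traceback rather than only its tail is to capture it with the standard-library traceback module. The sketch below is only an illustration of that idea. The functions select and parse are made-up stand-ins for the library call and the spider callback, not Scrapy's real code.

```python
import traceback


def select(xpath):
    # Hypothetical stand-in for a library call that rejects a bad expression.
    raise ValueError("Invalid XPath: %s" % xpath)


def parse():
    # Hypothetical stand-in for the caller's own code, one stack level up.
    return select("/span[@class='team-name'/text()")


try:
    parse()
except ValueError:
    # format_exc() returns the complete traceback as a string,
    # including the frames in the caller's own code.
    full_report = traceback.format_exc()
    print(full_report)
```

The report then contains the frame for parse as well as the frame for select, which is usually what a reader needs in order to see which line of your own script triggered the error.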



#35493

From: donarb <donarb@nwlink.com>
Date: 2012-12-25 11:15 -0800
Message-ID: <b449136c-8a4b-41f5-a0b4-5404a69380e7@googlegroups.com>
In reply to: #35322

On Friday, December 21, 2012 1:58:47 PM UTC-8, Always Learning wrote:
> The errors I get are
> >>File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
> >>raise ValueError("Invalid XPath: %s" % xpath)
> >>exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()

You're missing the right bracket that closes the predicate in the XPath expression:

/span[@class='team-name']/text()
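
A rough, self-contained sketch of the corrected lookups, in case it helps. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector, so it is not the Scrapy API. The sample markup and team names are invented, and only the class and rel values are taken from the original post. ElementTree supports just a subset of XPath, so the text is read with .text instead of /text(). The per-row paths are written without a leading slash because in full XPath a leading / starts from the document root rather than from the current row. The loop variable also gets its own name, and each result is appended to the items list.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; only the class/rel values come from the question.
SAMPLE = """
<html><body>
  <div class="eventLine-value">
    <span class="team-name">Oklahoma</span>
    <div rel="19">-3.5</div>
  </div>
  <div class="eventLine-value">
    <span class="team-name">Texas</span>
    <div rel="19">+3.5</div>
  </div>
</body></html>
"""


def extract_rows(markup):
    root = ET.fromstring(markup.strip())
    items = []
    # Distinct loop variable, so the list being iterated is not shadowed.
    for row in root.findall(".//div[@class='eventLine-value']"):
        # Predicate closed with ']' and path relative to the current row.
        teams = [s.text for s in row.findall("span[@class='team-name']")]
        lines = [d.text for d in row.findall("div[@rel='19']")]
        # Append to the result list, not to the item itself.
        items.append({"teams": teams, "lines": lines})
    return items


rows = extract_rows(SAMPLE)
print(rows)
```

Whether the real page nests the span and div directly under each eventLine-value div is an assumption here; on the live site the relative paths may need adjusting.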
