| Started by | Always Learning <cbrowning@ou.edu> |
|---|---|
| First post | 2012-12-21 13:21 -0800 |
| Last post | 2012-12-21 13:58 -0800 |
| Articles | 7 — 4 participants |
Scrapy/XPath help Always Learning <cbrowning@ou.edu> - 2012-12-21 13:21 -0800
Re: Scrapy/XPath help Grant Rettke <grettke@acm.org> - 2012-12-21 15:34 -0600
Re: Scrapy/XPath help Always Learning <cbrowning@ou.edu> - 2012-12-21 13:58 -0800
Re: Scrapy/XPath help Dave Angel <d@davea.name> - 2012-12-21 22:18 -0500
Re: Scrapy/XPath help donarb <donarb@nwlink.com> - 2012-12-25 11:15 -0800
| From | Always Learning <cbrowning@ou.edu> |
|---|---|
| Date | 2012-12-21 13:21 -0800 |
| Subject | Scrapy/XPath help |
| Message-ID | <e180db33-272f-4a9d-bc1e-231f3c3580bf@googlegroups.com> |
Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for some help with the XPaths; however, I am still receiving errors when I try to run this script. If you could help, it would be greatly appreciated!
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from cbb_info.items import CbbInfoItem, Field

class GameInfoSpider(BaseSpider):
    name = "game_info"
    allowed_domains = ["www.sbrforum.com"]
    start_urls = [
        'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        toplevels = hxs.select("//div[@class='eventLine-value']")
        items = []
        for toplevels in toplevels:
            item = CbbInfoItem()
            item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
            item ["lines"] = toplevels.select("/div[@rel='19']").extract()
            item.append(item)
        return items
| From | Grant Rettke <grettke@acm.org> |
|---|---|
| Date | 2012-12-21 15:34 -0600 |
| Message-ID | <mailman.1169.1356125659.29569.python-list@python.org> |
| In reply to | #35320 |
You might have better luck if you share the python make, version, os,
error message, and some unit tests demonstrating what you expect.
On Fri, Dec 21, 2012 at 3:21 PM, Always Learning <cbrowning@ou.edu> wrote:
> Hello all. I'm new to Python, but have been playing around with it for a few weeks now, following tutorials, etc. I've spun off on my own and am trying to do some basic web scraping. I've used Firebug/View XPath in Firefox for some help with the XPaths; however, I am still receiving errors when I try to run this script. If you could help, it would be greatly appreciated!
>
> from scrapy.spider import BaseSpider
> from scrapy.selector import HtmlXPathSelector
> from cbb_info.items import CbbInfoItem, Field
>
> class GameInfoSpider(BaseSpider):
>     name = "game_info"
>     allowed_domains = ["www.sbrforum.com"]
>     start_urls = [
>         'http://www.sbrforum.com/betting-odds/ncaa-basketball/',
>     ]
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         toplevels = hxs.select("//div[@class='eventLine-value']")
>         items = []
>         for toplevels in toplevels:
>             item = CbbInfoItem()
>             item ["teams"] = toplevels.select("/span[@class='team-name'/text()").extract()
>             item ["lines"] = toplevels.select("/div[@rel='19']").extract()
>             item.append(item)
>         return items
> --
> http://mail.python.org/mailman/listinfo/python-list
--
Grant Rettke | ACM, AMA, COG, IEEE
grettke@acm.org | http://www.wisdomandwonder.com/
Wisdom begins in wonder.
((λ (x) (x x)) (λ (x) (x x)))
| From | Always Learning <cbrowning@ou.edu> |
|---|---|
| Date | 2012-12-21 13:58 -0800 |
| Message-ID | <b36b317d-01e6-4a88-9f2c-7506d41bc480@googlegroups.com> |
| In reply to | #35321 |
Sorry about that. I'm using Python 2.7.3, 32-bit, on Windows 7.
The errors I get are
>>File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>>raise ValueError("Invalid XPath: %s" % xpath)
>>exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()
Ultimately, I expect it to gather the team name as text, and then the odds from one of the columns as text as well, so I can then put them into a .csv file.
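For the last step described above, here is a minimal sketch of writing scraped rows to CSV with the standard library. It uses Python 3 syntax rather than the 2.7 mentioned in the post, the helper name write_rows is hypothetical, and the team names and odds are made-up placeholders, not values from the site.

```python
import csv
import io

# Hypothetical rows: the team names and odds below are made-up placeholders,
# not values scraped from the site.
rows = [
    {"teams": "Team A", "lines": "-3.5"},
    {"teams": "Team B", "lines": "+3.5"},
]

def write_rows(rows, out):
    """Write a header row followed by one CSV line per item."""
    writer = csv.DictWriter(out, fieldnames=["teams", "lines"])
    writer.writeheader()
    writer.writerows(rows)

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same helper could be passed a file opened with open("games.csv", "w", newline=""), where the file name is again only an example.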
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-12-21 22:18 -0500 |
| Message-ID | <mailman.1182.1356146357.29569.python-list@python.org> |
| In reply to | #35322 |
On 12/21/2012 04:58 PM, Always Learning wrote:
> Sorry about that. I'm using Python 2.7.3, 32-bit, on Windows 7.
>
> The errors I get are
>
>>> File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>>> raise ValueError("Invalid XPath: %s" % xpath)
>>> exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()
> Ultimately, I expect it to gather the team name as text, and then the odds from one of the columns as text as well, so I can then put them into a .csv file.
Why are you displaying only the last 3 lines of the error message?
Unless your source code is lxmlsel.py, there are other stack levels
above this one.
(I can't help, but I'm trying to save some time for someone who can)
--
DaveA
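To illustrate the point about stack levels, here is a small sketch of capturing a complete traceback with the standard library. The functions select and parse are hypothetical stand-ins for the library call and the user's own code; they are not Scrapy's implementation.

```python
import traceback

def select(xpath):
    # Stand-in for the library call that rejects a malformed expression.
    raise ValueError("Invalid XPath: %s" % xpath)

def parse():
    # Stand-in for the user's own code, one stack level above select().
    return select("/span[@class='team-name'/text()")

try:
    parse()
except ValueError:
    # format_exc() returns every stack level, not just the last few lines.
    full_report = traceback.format_exc()

print(full_report)
```

The printed report lists the frame for parse above the frame for select, which is the part of the error output that shows where in the user's own script the bad expression came from.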
| From | donarb <donarb@nwlink.com> |
|---|---|
| Date | 2012-12-25 11:15 -0800 |
| Message-ID | <b449136c-8a4b-41f5-a0b4-5404a69380e7@googlegroups.com> |
| In reply to | #35322 |
On Friday, December 21, 2012 1:58:47 PM UTC-8, Always Learning wrote:
> The errors I get are
> >>File "C:\python27\lib\site-packages\scrapy-0.16.3-py2.7.egg\scrapy\selector\lxmlsel.py", line 47, in select
>
> >>raise ValueError("Invalid XPath: %s" % xpath)
>
> >>exceptions.ValueError: Invalid XPath: /span[@class='team-name'/text()
>
You're missing a right bracket in the xpath expression:
/span[@class='team-name']/text()
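Beyond the missing bracket, the posted loop has three other likely problems: it reuses the name toplevels as the loop variable, it calls item.append(item) where items.append(item) was probably intended, and its inner expressions begin with "/", which in XPath makes a path absolute from the document root instead of relative to the current div. The sketch below shows the corrected logic using the standard library's xml.etree.ElementTree as a stand-in for Scrapy, so it runs without the site or the framework. ElementTree supports only a subset of XPath (there is no text() step, so .text is used instead), and the sample markup is an assumption, since the real page structure is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; the real markup is not
# shown in the thread, so this structure is an assumption.
SAMPLE = """
<html><body>
  <div class="eventLine-value">
    <span class="team-name">Team A</span>
    <div rel="19">-3.5</div>
  </div>
  <div class="eventLine-value">
    <span class="team-name">Team B</span>
    <div rel="19">+3.5</div>
  </div>
</body></html>
""".strip()

def extract_items(markup):
    """Collect one dict per eventLine-value div in the markup."""
    root = ET.fromstring(markup)
    items = []
    # Distinct loop variable: the original reused "toplevels" for both.
    for toplevel in root.findall(".//div[@class='eventLine-value']"):
        # Closed predicate bracket, and a relative path (no leading "/").
        team = toplevel.find("span[@class='team-name']")
        line = toplevel.find("div[@rel='19']")
        item = {"teams": team.text, "lines": line.text}
        # Append to the list, not to the item itself.
        items.append(item)
    return items

items = extract_items(SAMPLE)
print(items)
```

In the Scrapy version from the original post, the equivalent relative expressions would presumably be "span[@class='team-name']/text()" and "div[@rel='19']", or ".//" forms if the elements are nested more deeply on the real page.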