| Started by | Dmitry Arsentiev <dmarsentev@gmail.com> |
|---|---|
| First post | 2012-08-15 05:49 -0700 |
| Last post | 2012-08-18 19:56 +0200 |
| Articles | 4 — 4 participants |
python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' Dmitry Arsentiev <dmarsentev@gmail.com> - 2012-08-15 05:49 -0700
Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' Dieter Maurer <dieter@handshake.de> - 2012-08-16 07:19 +0200
Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' personificator@gmail.com - 2012-08-16 18:57 -0700
Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' Stefan Behnel <stefan_ml@behnel.de> - 2012-08-18 19:56 +0200
| From | Dmitry Arsentiev <dmarsentev@gmail.com> |
|---|---|
| Date | 2012-08-15 05:49 -0700 |
| Subject | python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' |
| Message-ID | <cb6f13e3-f189-44e6-8aac-f11d3e7fa7ba@googlegroups.com> |
Hello.
Has anybody already met a problem like this?
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
When I run scrapy, I get
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
When I run
python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
I get
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
How can I cure it?
Python 2.7
libxml2-python 2.6.9
2.6.11-gentoo-r6
I will be grateful for any help.
DETAILS:
scrapy crawl lgz -o items.json -t json
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
| From | Dieter Maurer <dieter@handshake.de> |
|---|---|
| Date | 2012-08-16 07:19 +0200 |
| Subject | Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' |
| Message-ID | <mailman.3342.1345094372.4697.python-list@python.org> |
| In reply to | #27093 |
Dmitry Arsentiev <dmarsentev@gmail.com> writes:
> Has anybody already meet the problem like this? -
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
> File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py",
> line 14, in <module>
> libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Apparently, your versions of "scrapy" and "libxml2" are incompatible. Check which "libxml2" versions your "scrapy" version works with, and then install one of them.
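One way to confirm such a mismatch before running scrapy is to ask the installed bindings which parser flags they define. The following is only a minimal sketch: the helper `missing_attributes` and the list `REQUIRED_FLAGS` are hypothetical names, and the list contains only the two flags that appear in the traceback, not necessarily everything scrapy uses.

```python
import types

# Only the two flags visible in the traceback; scrapy may need others (assumption).
REQUIRED_FLAGS = ["HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR"]


def missing_attributes(module, names):
    """Return the names from `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


def describe(module, names=REQUIRED_FLAGS):
    """Build a one-line report about the flags a module lacks."""
    missing = missing_attributes(module, names)
    if missing:
        return "missing flags: " + ", ".join(missing)
    return "all required flags present"


if __name__ == "__main__":
    try:
        import libxml2  # the real bindings, if they are installed
    except ImportError:
        # Fall back to a stand-in module so the sketch still runs anywhere.
        libxml2 = types.ModuleType("libxml2_stand_in")
        libxml2.HTML_PARSE_NOERROR = 32  # placeholder value for the demo
    print(describe(libxml2))
```

If the report names `HTML_PARSE_RECOVER`, the bindings predate that flag and upgrading libxml2 is the likely fix.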
| From | personificator@gmail.com |
|---|---|
| Date | 2012-08-16 18:57 -0700 |
| Message-ID | <317f8818-0b72-48b1-a111-3cb96b498085@googlegroups.com> |
| In reply to | #27093 |
I believe ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz is what you're looking for. Submit a ticket to get the docs updated if you're feeling generous.
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2012-08-18 19:56 +0200 |
| Message-ID | <mailman.3465.1345312619.4697.python-list@python.org> |
| In reply to | #27093 |
Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody already meet the problem like this? -
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
> File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py",
> line 14, in <module>
> libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
>
> When I run
> python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
>
> I get
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> How can I cure it?
>
> Python 2.7
> libxml2-python 2.6.9
> 2.6.11-gentoo-r6

That version of libxml2 is way too old and doesn't support parsing real-world HTML. IIRC, that started with 2.6.21 and got improved a bit after that. Get a 2.8.0 installation, as someone pointed out already.

Stefan
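For anyone scripting this check, version strings have to be compared numerically, because as plain strings "2.6.9" sorts after "2.6.21". A minimal sketch follows; `parse_version` and `is_new_enough` are hypothetical helper names, and the 2.6.21 threshold is simply the figure recalled from memory in the reply above, not a verified minimum.

```python
def parse_version(text):
    """Turn a dotted version such as '2.6.21' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Threshold taken from the reply above ("IIRC"); treat it as an assumption.
MINIMUM = parse_version("2.6.21")


def is_new_enough(installed, minimum=MINIMUM):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    for version in ("2.6.9", "2.8.0"):
        verdict = "ok" if is_new_enough(version) else "too old"
        print("%s: %s" % (version, verdict))
```

Tuples of integers compare element by element, so 9 is correctly treated as smaller than 21, which a string comparison would get wrong.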