python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Started by	Dmitry Arsentiev <dmarsentev@gmail.com>
First post	2012-08-15 05:49 -0700
Last post	2012-08-18 19:56 +0200
Articles	4 — 4 participants

  python+libxml2+scrapy  AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' Dmitry Arsentiev <dmarsentev@gmail.com> - 2012-08-15 05:49 -0700
    Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' Dieter Maurer <dieter@handshake.de> - 2012-08-16 07:19 +0200
    Re: python+libxml2+scrapy  AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' personificator@gmail.com - 2012-08-16 18:57 -0700
    Re: python+libxml2+scrapy  AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER' Stefan Behnel <stefan_ml@behnel.de> - 2012-08-18 19:56 +0200

#27093 — python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

From	Dmitry Arsentiev <dmarsentev@gmail.com>
Date	2012-08-15 05:49 -0700
Subject	python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
Message-ID	<cb6f13e3-f189-44e6-8aac-f11d3e7fa7ba@googlegroups.com>

Hello.

Has anybody already met a problem like this?
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

When I run scrapy, I get

  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'


When I run
 python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'

I get
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

How can I cure it?

Python 2.7
libxml2-python 2.6.9
2.6.11-gentoo-r6


I will be grateful for any help.

DETAILS:

scrapy crawl lgz -o items.json -t json
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

#27140 — Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

From	Dieter Maurer <dieter@handshake.de>
Date	2012-08-16 07:19 +0200
Subject	Re: python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
Message-ID	<mailman.3342.1345094372.4697.python-list@python.org>
In reply to	#27093

Dmitry Arsentiev <dmarsentev@gmail.com> writes:

> Has anybody already meet the problem like this? -
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
>   File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
>     libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Apparently, your versions of "scrapy" and "libxml2" are not compatible.

Check which "libxml2" versions your "scrapy" version works with,
and then install one of them.
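
A quick way to confirm the mismatch before reinstalling anything is to ask the installed bindings which constants they define. The following is only a minimal sketch: the helper name missing_attributes is made up for illustration, and the list of required names contains just the two constants visible in the traceback above, so the real scrapy module may need more.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the names from `names` that the module does not define.

    Returns None if the module cannot be imported at all.
    (Hypothetical helper, not part of scrapy or libxml2.)
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Only the constants that appear in the traceback; an assumption,
    # since the full list used by scrapy is not shown in this thread.
    required = ["HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR"]
    result = missing_attributes("libxml2", required)
    if result is None:
        print("libxml2 bindings are not importable")
    elif result:
        print("libxml2 bindings lack: " + ", ".join(result))
    else:
        print("libxml2 bindings define all required constants")
```

If this reports missing names, the bindings predate what the installed scrapy expects, which matches the AttributeError in the original post.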

#27217

From	personificator@gmail.com
Date	2012-08-16 18:57 -0700
Message-ID	<317f8818-0b72-48b1-a111-3cb96b498085@googlegroups.com>
In reply to	#27093

I believe ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz is what you're looking for. Submit a ticket to get the docs updated if you're feeling generous.


#27318

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2012-08-18 19:56 +0200
Message-ID	<mailman.3465.1345312619.4697.python-list@python.org>
In reply to	#27093

Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody already meet the problem like this? -
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> 
> When I run scrapy, I get
> 
>   File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
>     libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> 
> 
> When I run
>  python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
> 
> I get
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> 
> How can I cure it?
> 
> Python 2.7
> libxml2-python 2.6.9
> 2.6.11-gentoo-r6

That version of libxml2 is way too old and doesn't support parsing
real-world HTML. IIRC, that started with 2.6.21 and got improved a bit
after that.

Get a 2.8.0 installation, as someone pointed out already.

Stefan
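
When comparing the version numbers mentioned in this thread, note that dotted versions have to be compared numerically, not as strings. The sketch below illustrates this with hypothetical helpers (parse_version and is_at_least are made up here); the 2.6.21 threshold is only the figure recalled "IIRC" above, not a confirmed minimum.

```python
def parse_version(text):
    """Turn a dotted version such as '2.6.9' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# 2.6.21 is the version recalled in the post above (an assumption).
print(is_at_least("2.6.9", "2.6.21"))   # False: 9 < 21 numerically
print(is_at_least("2.8.0", "2.6.21"))   # True
print("2.6.9" >= "2.6.21")              # True: string comparison misleads
```

The last line shows why a plain string comparison would wrongly suggest that 2.6.9 is the newer release.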
