Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?

From Peter Otten <__peter__@web.de>
Subject Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Date 2015-07-12 11:51 +0200
Organization None
References <cbcc6d3f-1fc7-4caf-b6d9-3a7ff9d8f1d5@googlegroups.com> <f0b23331-69f6-4503-b9f5-52024fb78609@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.446.1436694733.3674.python-list@python.org> (permalink)



Simon Evans wrote:

> Dear Mark Lawrence, thank you for your advice.
> I take it that I use the input you suggest for the line :
> 
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")
> 
> seeing as I have to give the file's full address I therefore have to
> modify your :
> 
> soup = BeautifulSoup(ecological_pyramid,"lxml")
> 
> to :
> 
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")
> 
> otherwise I get :
> 
> 
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup(ecological_pyramid,"lxml")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'ecological_pyramid' is not defined
> 
> 
> so anyway with the input therefore as:
> 
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful
>>>> Soup\ecological_pyramid,","lxml") producer_entries = soup.find("ul")
>>>> print(producer_entries.li.div.string)

No. If you pass the filename, Beautiful Soup will mistake it for the HTML. You
can verify that in the interactive interpreter:

>>> soup = BeautifulSoup(r"C:\Beautiful Soup\ecologicalpyramid.html", "lxml")
>>> soup
<html><body><p>C:\Beautiful Soup\ecologicalpyramid.html</p></body></html>

You have to pass an open file to BeautifulSoup, not a filename:

>>> with open(r"C:\Beautiful Soup\ecologicalpyramid.html", "r") as f:
...     soup = BeautifulSoup(f, "lxml")
... 
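
The difference between the two calls can be reproduced without the original file. The following is a minimal sketch, assuming bs4 is installed; it uses the built-in "html.parser" instead of lxml so that nothing else is needed, and the temporary file name and its contents are made up for illustration.

```python
import os
import tempfile
import warnings

from bs4 import BeautifulSoup

# Hypothetical stand-in for ecologicalpyramid.html.
html = (
    '<ul id="producers"><li class="producerlist">'
    '<div class="name">plants</div></li></ul>'
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "pyramid.html")
    with open(path, "w") as f:
        f.write(html)

    # Wrong: the string itself is parsed as markup. Recent bs4 versions
    # warn about filename-like input, so warnings are silenced here.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        from_name = BeautifulSoup(path, "html.parser")

    # Right: pass the open file object.
    with open(path, "r") as f:
        from_file = BeautifulSoup(f, "html.parser")

print(from_name.find("ul"))                 # None: a filename contains no <ul>
print(from_file.find("ul").li.div.string)   # plants
```

With "html.parser" the filename is kept as bare text rather than wrapped in <html><body><p> as lxml does, but the outcome is the same: there is no <ul> to find.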

However, if you look at the data returned by soup.find("ul"), you'll see:

>>> producer_entries = soup.find("ul")
>>> producer_entries
<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>

The first <li>...</li> node does not contain a <div>:

>>> producer_entries.li
<li class="producers">
</li>

and thus

>>> producer_entries.li.div is None
True

and the error is expected with the given data: accessing .string on
None raises an AttributeError.
Returning None is Beautiful Soup's way of indicating that the
<li> node contains no <div> at all. If you want to
process the first <li> that does contain a <div>, a straightforward
way is to iterate over the <li> nodes:

>>> for li in producer_entries.find_all("li"):
...     if li.div is not None:
...         print(li.div.string)
...         break # remove if you want all, not just the first
... 
plants
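
The guard in the loop above can also be wrapped in a small helper. This is only a sketch: first_div_text is a hypothetical name, not part of bs4, the markup is copied from the parsed output shown earlier, and "html.parser" is used so that only bs4 is assumed to be installed.

```python
from bs4 import BeautifulSoup

# Markup as shown in the parsed output above: the first <li> is empty.
html = """<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
producer_entries = soup.find("ul")

# Unguarded access fails because .div is None for the first <li>.
try:
    producer_entries.li.div.string
except AttributeError as exc:
    print("AttributeError:", exc)


def first_div_text(node):
    """Return the text of the first <div> in any <li> below node, or None."""
    return next(
        (li.div.string for li in node.find_all("li") if li.div is not None),
        None,
    )


print(first_div_text(producer_entries))  # plants
```

Returning None when no <li> has a <div> mirrors Beautiful Soup's own convention, so the caller still has to check the result before using it.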

Taking a second look at the data, you probably want the <li> nodes with
class="producerlist":

>>> for li in soup.find_all("li", attrs={"class": "producerlist"}):
...     print(li.div.string)
... 
plants
algae
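
If both the name and the number of each producer are of interest, the same filter can collect them into a dict. A minimal sketch, assuming bs4 is installed; the markup is again copied from the output above, the producers dict is an illustrative choice, and class_ is bs4's keyword shortcut for attrs={"class": ...}.

```python
from bs4 import BeautifulSoup

html = """<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# Only <li class="producerlist"> nodes are visited, so the empty
# <li class="producers"> never reaches the .string lookups.
producers = {}
for li in soup.find_all("li", class_="producerlist"):
    name = li.find("div", class_="name").string
    number = li.find("div", class_="number").string
    producers[str(name)] = int(number)

print(producers)  # {'plants': 100000, 'algae': 100000}
```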
