From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Date: Sun, 12 Jul 2015 11:51:58 +0200

Simon Evans wrote:

> Dear Mark Lawrence, thank you for your advice.
> I take it that I use the input you suggest for the line :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")
>
> seeing as I have to give the file's full address I therefore have to
> modify your :
>
> soup = BeautifulSoup(ecological_pyramid,"lxml")
>
> to :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")
>
> otherwise I get :
>
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup(ecological_pyramid,"lxml")
> Traceback (most recent call last):
>   File "", line 1, in
> NameError: name 'ecological_pyramid' is not defined
>
>
> so anyway with the input therefore as:
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful
>>>> Soup\ecological_pyramid,","lxml") producer_entries = soup.find("ul")
>>>> print(producer_entries.li.div.string)

No. If you pass the filename, Beautiful Soup will mistake it for the HTML.
You can verify that in the interactive interpreter:

>>> soup = BeautifulSoup("C:\Beautiful Soup\ecologicalpyramid.html","lxml")
>>> soup

<html><body><p>C:\Beautiful Soup\ecologicalpyramid.html</p></body></html>
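
To see the difference between the two calls side by side, here is a minimal sketch. It is not the original data: the markup and the file name pyramid.html are made up for illustration, and it uses the stdlib-backed "html.parser" instead of "lxml" so that it runs without lxml installed. Beautiful Soup may print a warning that the string looks like a filename; that warning is exactly the hint you want.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical stand-in for the contents of ecologicalpyramid.html.
html = '<ul><li class="producerlist"><div>plants</div></li></ul>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pyramid.html")  # made-up file name
    with open(path, "w") as f:
        f.write(html)

    # A str argument is treated as markup, so the path itself is "parsed".
    from_name = BeautifulSoup(path, "html.parser")

    # An open file is read, and its contents are parsed.
    with open(path) as f:
        from_file = BeautifulSoup(f, "html.parser")

name_has_ul = from_name.find("ul") is not None
file_has_ul = from_file.find("ul") is not None
print(name_has_ul, file_has_ul)
```

Only the soup built from the open file contains a ul element.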

You have to pass an open file to BeautifulSoup, not a filename:

>>> with open("C:\Beautiful Soup\ecologicalpyramid.html","r") as f:
...     soup = BeautifulSoup(f, "lxml")
...

However, if you look at the data returned by soup.find("ul") you'll see

>>> producer_entries = soup.find("ul")
>>> producer_entries

[<ul> markup lost in extraction; surviving text nodes: plants, 100000, algae, 100000]

The first <li>...</li> node does not contain a div

>>> producer_entries.li

and thus

>>> producer_entries.li.div is None
True

and the following error is expected with the given data. Returning None is
Beautiful Soup's way of indicating that the <li> node has no <div> child at
all.

If you want to process the first li that does have a <div> child, a
straightforward way is to iterate over the children:

>>> for li in producer_entries.find_all("li"):
...     if li.div is not None:
...         print(li.div.string)
...         break  # remove if you want all, not just the first
...
plants

Taking a second look at the data, you probably want the li nodes with
class="producerlist":

>>> for li in soup.find_all("li", attrs={"class": "producerlist"}):
...     print(li.div.string)
...
plants
algae
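
If you want to try both approaches without the original file, the following is a rough, self-contained sketch. The markup is an assumption that only mimics the shape of the data discussed above (a first li without a div, followed by two li nodes with class="producerlist"); the class name "producers", the helper first_div_text and the use of "html.parser" instead of "lxml" are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <li> has no <div> child.
html = """
<ul>
<li class="producers"></li>
<li class="producerlist"><div>plants</div><div>100000</div></li>
<li class="producerlist"><div>algae</div><div>100000</div></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
producer_entries = soup.find("ul")


def first_div_text(ul):
    """Return the text of the first <div> found in any <li>, or None."""
    for li in ul.find_all("li"):
        # li.div is None when the <li> has no <div> descendant.
        if li.div is not None:
            return li.div.string
    return None


first = first_div_text(producer_entries)

# Select by class instead of skipping empty nodes by hand.
names = [li.div.string for li in soup.find_all("li", class_="producerlist")]

print(producer_entries.li.div is None)
print(first)
print(names)
```

The class_ keyword is shorthand for attrs={"class": ...}; either spelling should select the same nodes here.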