From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Date: Sun, 12 Jul 2015 11:51:58 +0200

Simon Evans wrote:

> Dear Mark Lawrence, thank you for your advice.
> I take it that I use the input you suggest for the line :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")
>
> seeing as I have to give the file's full address I therefore have to
> modify your :
>
> soup = BeautifulSoup(ecological_pyramid,"lxml")
>
> to :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")
>
> otherwise I get :
>
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup(ecological_pyramid,"lxml")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'ecological_pyramid' is not defined
>
>
> so anyway with the input therefore as:
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful
>>>> Soup\ecological_pyramid,","lxml") producer_entries = soup.find("ul")
>>>> print(producer_entries.li.div.string)

No. If you pass the filename, Beautiful Soup will mistake it for the HTML.
You can verify that in the interactive interpreter:

>>> soup = BeautifulSoup("C:\Beautiful Soup\ecologicalpyramid.html","lxml")
>>> soup

<html><body><p>C:\Beautiful Soup\ecologicalpyramid.html</p></body></html>
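
The same check can be scripted. What follows is only a minimal sketch, not the book's code: it assumes bs4 is installed and uses the standard library's "html.parser" instead of "lxml" so that nothing else is needed. The names parsed_text and tag_count are made up for the illustration, and the warnings filter is only a precaution because recent bs4 releases may warn when the markup looks like a filename.

```python
import warnings

from bs4 import BeautifulSoup

# Raw string so the backslashes are taken literally.
path = r"C:\Beautiful Soup\ecologicalpyramid.html"

with warnings.catch_warnings():
    # Precaution (assumption): newer bs4 versions may warn that the
    # markup resembles a filename. The warning is not an error.
    warnings.simplefilter("ignore")
    # The string is parsed as markup; no file is opened here.
    soup = BeautifulSoup(path, "html.parser")

# The "document" consists of nothing but the path text itself.
parsed_text = soup.get_text()
# find_all(True) matches every tag; a bare path contains none.
tag_count = len(soup.find_all(True))

print(repr(parsed_text))
print(tag_count)
```

If the parsed text equals the path and no tags are found, the file was never read.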

You have to pass an open file to BeautifulSoup, not a filename:

>>> with open("C:\Beautiful Soup\ecologicalpyramid.html","r") as f:
...     soup = BeautifulSoup(f, "lxml")
...

However, if you look at the data returned by soup.find("ul") you'll see

>>> producer_entries = soup.find("ul")
>>> producer_entries
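<ul ...>
<li ...>
</li>
<li class="producerlist">
<div ...>plants</div>
<div ...>100000</div>
</li>
<li class="producerlist">
<div ...>algae</div>
<div ...>100000</div>
</li>
</ul>
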
The first <li>...</li> node does not contain a div

>>> producer_entries.li
<li ...>
</li>

and thus

>>> producer_entries.li.div is None
True

and the AttributeError raised by producer_entries.li.div.string is expected
with the given data. Returning None is Beautiful Soup's way of indicating that
the <li> node has no <div> child at all.

If you want to process the first li that does have a <div> child, a
straightforward way is to iterate over the children:

>>> for li in producer_entries.find_all("li"):
...     if li.div is not None:
...         print(li.div.string)
...         break  # remove if you want all, not just the first
...
plants

Taking a second look at the data, you probably want the li nodes with
class="producerlist":

>>> for li in soup.find_all("li", attrs={"class": "producerlist"}):
...     print(li.div.string)
...
plants
algae
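
Putting the pieces together, here is a minimal sketch that runs without the book's file. The markup in HTML is a hypothetical stand-in (the id and the div class names are invented), io.StringIO plays the role of the open file, and "html.parser" replaces "lxml" so that only bs4 is required. The helper producer_names is likewise made up for this illustration.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the book's file: the first <li> has no <div>.
HTML = """\
<ul id="producers">
<li class="producers">
</li>
<li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>
"""


def producer_names(file_obj):
    """Return the text of the first <div> in each li.producerlist."""
    # BeautifulSoup reads from any object with a read() method,
    # for example the result of open(...).
    soup = BeautifulSoup(file_obj, "html.parser")
    names = []
    for li in soup.find_all("li", attrs={"class": "producerlist"}):
        div = li.div  # None when the <li> has no <div> descendant
        if div is not None:
            names.append(div.get_text(strip=True))
    return names


names = producer_names(io.StringIO(HTML))
print(names)

# The first <li> of the <ul> has no <div>, so .div is None here and
# .div.string would raise AttributeError.
first_li = BeautifulSoup(io.StringIO(HTML), "html.parser").find("ul").li
print(first_li.div is None)
```

With a real file, replace io.StringIO(HTML) with the object returned by open(). The None check is kept inside the loop so that the function also tolerates a matching li that happens to lack a div.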