| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ? |
| Date | Sun, 12 Jul 2015 11:51:58 +0200 |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.446.1436694733.3674.python-list@python.org> |
Simon Evans wrote:
> Dear Mark Lawrence, thank you for your advice.
> I take it that I use the input you suggest for the line :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")
>
> seeing as I have to give the file's full address I therefore have to
> modify your :
>
> soup = BeautifulSoup(ecological_pyramid,"lxml")
>
> to :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")
>
> otherwise I get :
>
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup(ecological_pyramid,"lxml")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'ecological_pyramid' is not defined
>
>
> so anyway with the input therefore as:
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful
>>>> Soup\ecological_pyramid,","lxml") producer_entries = soup.find("ul")
>>>> print(producer_entries.li.div.string)
No. If you pass the filename, Beautiful Soup will mistake it for the HTML.
You can verify that in the interactive interpreter:
>>> soup = BeautifulSoup(r"C:\Beautiful Soup\ecologicalpyramid.html","lxml")
>>> soup
<html><body><p>C:\Beautiful Soup\ecologicalpyramid.html</p></body></html>
You have to pass an open file to BeautifulSoup, not a filename:
>>> with open(r"C:\Beautiful Soup\ecologicalpyramid.html","r") as f:
...     soup = BeautifulSoup(f, "lxml")
...
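The difference between the two calls can be checked without the book's file. This is only a sketch under assumptions: the sample markup and the temporary file name sample.html are made up for illustration, and the stdlib "html.parser" backend is used instead of lxml so that nothing beyond bs4 itself is needed.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup, not the book's ecological_pyramid.html.
html = "<ul id='producers'><li><div class='name'>plants</div></li></ul>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

    # A string argument is always parsed as markup, even if it is a valid path.
    from_name = BeautifulSoup(path, "html.parser")

    # An open file is read and its contents are parsed.
    with open(path, encoding="utf-8") as f:
        from_file = BeautifulSoup(f, "html.parser")

print(from_name.find("ul"))                 # None: the "document" is just the path text
print(from_file.find("ul").li.div.string)   # plants
```

Depending on the installed version, Beautiful Soup may also print a warning that the markup looks like a filename when it is handed the path string.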
However, if you look at the data returned by soup.find("ul"), you'll see:
>>> producer_entries = soup.find("ul")
>>> producer_entries
<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>
The first <li>...</li> node does not contain a <div>:
>>> producer_entries.li
<li class="producers">
</li>
and thus
>>> producer_entries.li.div is None
True
and the AttributeError raised by producer_entries.li.div.string is expected
with the given data.
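The failure can be reproduced in isolation. A minimal sketch, assuming the markup printed above is pasted into a string (shortened to one producer) and parsed with the stdlib "html.parser" backend rather than lxml; the variable names first_li and error are mine:

```python
from bs4 import BeautifulSoup

# The <ul> as shown in the interpreter output above, with one entry kept.
html = """<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
first_li = soup.find("ul").li   # the empty <li class="producers">

print(first_li.div)             # None, because this <li> has no <div>

try:
    first_li.div.string         # attribute lookup on None
except AttributeError as exc:
    error = str(exc)
    print(error)                # 'NoneType' object has no attribute 'string'
```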
Returning None is Beautiful Soup's way of indicating that the
<li> node has no <div> child at all. If you want to
process the first <li> that does have a <div> child, a straightforward
way is to iterate over the <li> nodes:
>>> for li in producer_entries.find_all("li"):
...     if li.div is not None:
...         print(li.div.string)
...         break  # remove if you want all, not just the first
...
plants
Taking a second look at the data, you probably want the <li> nodes with
class="producerlist":
>>> for li in soup.find_all("li", attrs={"class": "producerlist"}):
...     print(li.div.string)
...
plants
algae
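If you need the numbers as well as the names, the same selection can be extended. The following is a sketch under assumptions, not something from the book: it parses the markup shown above with the stdlib "html.parser" backend, uses the class_ keyword (equivalent to the attrs dict above), and collects the pairs into a dict I have called counts.

```python
from bs4 import BeautifulSoup

# The <ul> as shown in the interpreter output above.
html = """<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

counts = {}
for li in soup.find_all("li", class_="producerlist"):
    name = li.find("div", class_="name")
    number = li.find("div", class_="number")
    if name is None or number is None:
        continue  # skip entries that lack either <div>
    counts[name.get_text(strip=True)] = int(number.get_text(strip=True))

print(counts)  # {'plants': 100000, 'algae': 100000}
```

Looking up each <div> by its class instead of relying on li.div keeps the code working if the order of the <div> elements changes.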