Groups > comp.lang.python > #68767 > unrolled thread
| Started by | teddybubu@gmail.com |
|---|---|
| First post | 2014-03-22 04:21 -0700 |
| Last post | 2014-03-24 09:39 +1100 |
| Articles | 11 — 5 participants |
help with for loop----python 2.7.2 teddybubu@gmail.com - 2014-03-22 04:21 -0700
Re: help with for loop----python 2.7.2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-22 06:00 -0600
Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 10:29 -0700
Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 10:30 -0700
Re: help with for loop----python 2.7.2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-23 17:40 +0000
Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 10:49 -0700
Re: help with for loop----python 2.7.2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-23 18:43 +0000
Re: help with for loop----python 2.7.2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-23 11:49 -0600
Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 14:51 -0700
Re: help with for loop----python 2.7.2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-23 16:44 -0600
Re: help with for loop----python 2.7.2 Ben Finney <ben+python@benfinney.id.au> - 2014-03-24 09:39 +1100
| From | teddybubu@gmail.com |
|---|---|
| Date | 2014-03-22 04:21 -0700 |
| Subject | help with for loop----python 2.7.2 |
| Message-ID | <84eb4c69-d43d-4777-8a99-34eed9be73d6@googlegroups.com> |
I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)

for item in soup.find_all('item'):
#for item in soup:
    title = soup.find('title').text
    link = soup.find('link').text
    item = soup.find('item').text
    print item
    print title
    print link
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-22 06:00 -0600 |
| Message-ID | <mailman.8394.1395489716.18130.python-list@python.org> |
| In reply to | #68767 |
On Sat, Mar 22, 2014 at 5:21 AM, <teddybubu@gmail.com> wrote:
> I am trying to get all the element data from the rss below.
> The only thing I am pulling is the first element.
> I don't understand why the for loop does not go through the entire rss.
> Here is my code....
[SNIP]
> for item in soup.find_all('item'):
> #for item in soup:
>     title = soup.find('title').text
>     link = soup.find('link').text
>     item = soup.find('item').text
The three find method calls in the for loop are searching from the
document root (the "soup" variable), not from the item you're
currently iterating at. Try changing these to calls of item.find. And
note that calling one of the results "item" will replace the loop
variable. That won't affect the iteration, but it's bad practice to
refer to two different things by the same local name.
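
A minimal sketch of the scoping issue described above, assuming Python 3. It uses the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it runs without network access or third-party packages; the feed contents and the names RSS, from_root and from_item are invented for illustration. The same reasoning should carry over: a find called on the document root returns the same first match on every pass, while a find called on the loop variable searches only inside the current item.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed, inlined so the sketch needs no network access.
RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<link>http://example.com/</link>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(RSS)

# Buggy pattern: the lookup starts at the root on every pass,
# so each iteration finds the same first <title> in the document.
from_root = [root.find('.//title').text for _ in root.iter('item')]

# Fixed pattern: search relative to the current item, and do not
# reuse the loop variable's name for one of the results.
from_item = []
for item in root.iter('item'):
    title = item.find('title').text
    link = item.find('link').text
    from_item.append((title, link))

print(from_root)   # ['Example feed', 'Example feed']
print(from_item)   # [('First', 'http://example.com/1'), ('Second', 'http://example.com/2')]
```

The first list repeats the channel title once per item, which matches the "only the first element" symptom in the original post.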
| From | tad na <teddybubu@gmail.com> |
|---|---|
| Date | 2014-03-23 10:29 -0700 |
| Message-ID | <03c8b5d0-363e-4287-80d0-a43b0266f2a3@googlegroups.com> |
| In reply to | #68767 |
On Saturday, March 22, 2014 6:21:30 AM UTC-5, tad na wrote:
> I am trying to get all the element data from the rss below.
>
> The only thing I am pulling is the first element.
> I don't understand why the for loop does not go through the entire rss.
> Here is my code....
> try:
>     from urllib2 import urlopen
> except ImportError:
>     from urllib.request import urlopen
> from bs4 import BeautifulSoup
> soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
> #print soup.find_all('item')
> #print (soup)
> for item in soup.find_all('item'):
> #for item in soup:
>     title = soup.find('title').text
>     link = soup.find('link').text
>     item = soup.find('item').text
>     print item
>     print title
>     print link
OK . second problem :)
I can print the date. not sure how to do this one..
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")

x=0
for item in soup.find_all('item'):
    title = item.find('title').text
    link = item.find('link').text
    date = item.find('pubDate')
    # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1
| From | tad na <teddybubu@gmail.com> |
|---|---|
| Date | 2014-03-23 10:30 -0700 |
| Message-ID | <32858d81-7dcf-45de-af10-6157068f15af@googlegroups.com> |
| In reply to | #68813 |
On Sunday, March 23, 2014 12:29:40 PM UTC-5, tad na wrote:
> On Saturday, March 22, 2014 6:21:30 AM UTC-5, tad na wrote:
>
> > I am trying to get all the element data from the rss below.
>
> >
>
> > The only thing I am pulling is the first element.
>
>
>
> > I don't understand why the for loop does not go through the entire rss.
>
>
>
> > Here is my code....
>
> > try:
>
> > from urllib2 import urlopen
>
> > except ImportError:
>
> > from urllib.request import urlopen
>
> > from bs4 import BeautifulSoup
>
> > soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
>
> > #print soup.find_all('item')
>
> > #print (soup)
>
> > for item in soup.find_all('item'):
>
> > #for item in soup:
>
> > title = soup.find('title').text
>
> > link = soup.find('link').text
>
> > item = soup.find('item').text
>
> > print item
>
> > print title
>
> > print link
>
> OK . second problem :)
>
> I can print the date. not sure how to do this one..
>
> try:
>
> from urllib2 import urlopen
>
> except ImportError:
>
> from urllib.request import urlopen
>
> import urllib2
>
> from bs4 import BeautifulSoup
>
>
>
> soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
>
> #print soup.find_all('item')
>
> #print (soup)
>
> data = soup.find_all("item")
>
>
>
> x=0
>
> for item in soup.find_all('item'):
>
> title = item.find('title').text
>
> link = item.find('link').text
>
> date = item.find('pubDate')
>
> # print date
>
> print('+++++++++++++++++')
>
> print data[x].title.text
>
> print data[x].link.text
>
> print data[x].guid.text
>
> print data[x].pubDate
>
> x = x + 1
meant to say CANNOT print the date
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-03-23 17:40 +0000 |
| Message-ID | <mailman.8417.1395596423.18130.python-list@python.org> |
| In reply to | #68814 |
On 23/03/2014 17:30, tad na wrote:

Would you please use the mailing list
https://mail.python.org/mailman/listinfo/python-list or read and action
this https://wiki.python.org/moin/GoogleGroupsPython to prevent us
seeing double line spacing and single line paragraphs, thanks.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com
| From | tad na <teddybubu@gmail.com> |
|---|---|
| Date | 2014-03-23 10:49 -0700 |
| Message-ID | <1bd3b586-b3ed-4d18-8d68-26a2144f6163@googlegroups.com> |
| In reply to | #68816 |
On Sunday, March 23, 2014 12:40:04 PM UTC-5, Mark Lawrence wrote:
> On 23/03/2014 17:30, tad na wrote:
> Would you please use the mailing list
> https://mail.python.org/mailman/listinfo/python-list or read and action
> this https://wiki.python.org/moin/GoogleGroupsPython to prevent us
> seeing double line spacing and single line paragraphs, thanks.
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
> Mark Lawrence
> ---
> This email is free from viruses and malware because avast! Antivirus protection is active.
> http://www.avast.com
Mark, not sure what I did wrong. The double line in the code is mine. It helps me keep things separate.
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-03-23 18:43 +0000 |
| Message-ID | <mailman.8419.1395600257.18130.python-list@python.org> |
| In reply to | #68814 |
On 23/03/2014 17:30, tad na wrote:
> On Sunday, March 23, 2014 12:29:40 PM UTC-5, tad na wrote:
>> On Saturday, March 22, 2014 6:21:30 AM UTC-5, tad na wrote:
>>
>>> I am trying to get all the element data from the rss below.
>>
>>>
>>
>>> The only thing I am pulling is the first element.
>>
>>
>>
>>> I don't understand why the for loop does not go through the entire rss.
>>
>>
>>
>>> Here is my code....

I've snipped the bulk of the message, but imagine what the above looks
like when it's been back and forth through gg a few times; it's
effectively unreadable.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-23 11:49 -0600 |
| Message-ID | <mailman.8418.1395596955.18130.python-list@python.org> |
| In reply to | #68813 |
On Mar 23, 2014 11:31 AM, "tad na" <teddybubu@gmail.com> wrote:
> OK . second problem :)
> I can print the date. not sure how to do this one..
Why not? What happens when you try?
> try:
>     from urllib2 import urlopen
> except ImportError:
>     from urllib.request import urlopen
> import urllib2
> from bs4 import BeautifulSoup
>
> soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
> #print soup.find_all('item')
> #print (soup)
> data = soup.find_all("item")
>
> x=0
> for item in soup.find_all('item'):
>     title = item.find('title').text
>     link = item.find('link').text
>     date = item.find('pubDate')
>     # print date
>     print('+++++++++++++++++')
>     print data[x].title.text
>     print data[x].link.text
>     print data[x].guid.text
>     print data[x].pubDate
>     x = x + 1
data[x] should be the same object as item, no? If you want to keep track of
the current iteration index, a cleaner way to do that is by using enumerate:
for x, item in enumerate(soup.find_all('item')):
As far as printing the pubDate goes, why not start by getting its text
property as you do with the other tags? From there you can either print the
string out directly or parse it into a datetime object.
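
A rough sketch of both suggestions, assuming Python 3.3 or later. It again substitutes the standard library's xml.etree.ElementTree for BeautifulSoup so it can run offline, and the inline feed and the names RSS, rows, date_text and published are made up for illustration. RSS 2.0 dates follow the RFC 822 style used in email headers, so email.utils.parsedate_to_datetime is one way to turn the text into a datetime object.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed with RFC 822 style dates, as RSS 2.0 uses for pubDate.
RSS = """<rss version="2.0"><channel>
<item><title>First</title><pubDate>Sat, 22 Mar 2014 04:21:00 -0700</pubDate></item>
<item><title>Second</title><pubDate>Sun, 23 Mar 2014 10:29:00 -0700</pubDate></item>
</channel></rss>"""

rows = []
# enumerate supplies the running index, so no separate counter is needed.
for x, item in enumerate(ET.fromstring(RSS).iter('item')):
    date_text = item.find('pubDate').text         # the raw string
    published = parsedate_to_datetime(date_text)  # timezone-aware datetime
    rows.append((x, item.find('title').text, published))
    print(x, item.find('title').text, published.isoformat())
```

Because the loop variable already refers to the current item, indexing back into a separate list with data[x] becomes unnecessary.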
| From | tad na <teddybubu@gmail.com> |
|---|---|
| Date | 2014-03-23 14:51 -0700 |
| Message-ID | <5f8a6cae-4e84-4748-beb1-fd931b187e4e@googlegroups.com> |
| In reply to | #68818 |
On Sunday, March 23, 2014 12:49:11 PM UTC-5, Ian wrote:
> On Mar 23, 2014 11:31 AM, "tad na" <tedd...@gmail.com> wrote:
> > OK . second problem :)
> > I can print the date. not sure how to do this one..
> Why not? What happens when you try?
> > try:
> >     from urllib2 import urlopen
> > except ImportError:
> >     from urllib.request import urlopen
> > import urllib2
> > from bs4 import BeautifulSoup
> > soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
> > #print soup.find_all('item')
> > #print (soup)
> > data = soup.find_all("item")
> > x=0
> > for item in soup.find_all('item'):
> >     title = item.find('title').text
> >     link = item.find('link').text
> >     date = item.find('pubDate')
> >     # print date
> >     print('+++++++++++++++++')
> >     print data[x].title.text
> >     print data[x].link.text
> >     print data[x].guid.text
> >     print data[x].pubDate
> >     x = x + 1
> data[x] should be the same object as item, no? If you want to keep track of the current iteration index, a cleaner way to do that is by using enumerate:
> for x, item in enumerate(soup.find_all('item')):
> As far as printing the pubDate goes, why not start by getting its text property as you do with the other tags? From there you can either print the string out directly or parse it into a datetime object.
This is the error I get with
1. print data[x].pubDate.text
AttributeError: 'NoneType' object has no attribute 'text'
2. print data[x].pubDate
It results in "None"
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-23 16:44 -0600 |
| Message-ID | <mailman.8421.1395614684.18130.python-list@python.org> |
| In reply to | #68821 |
On Mar 23, 2014 3:56 PM, "tad na" <teddybubu@gmail.com> wrote:
>
> This is the error I get with
> 1. print data[x].pubDate.text
> AttributeError: 'NoneType' object has no attribute 'text'
> 2. print data[x].pubDate
> It results in "None"
So the problem is that it's not even finding the pubDate tag in the
first place. Some sites on the Web suggest that beautiful soup
normalizes all tags to lowercase; try looking for the pubdate tag
instead.
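
The AttributeError above comes from asking None for .text after a lookup that matched nothing. One defensive pattern, sketched here under assumptions, is to try each candidate spelling and test the result before touching .text. The helper first_text is hypothetical, and the sketch uses the standard library's xml.etree.ElementTree (Python 3), which keeps tag case exactly as written; that is the opposite of the lowercasing suggested for Beautiful Soup, but a mismatched tag name shows up the same way, as None.

```python
import xml.etree.ElementTree as ET

def first_text(element, *names):
    """Return the text of the first child matching any of the names, else None.

    Hypothetical helper for illustration; not part of any library.
    """
    for name in names:
        child = element.find(name)
        if child is not None:   # explicit None test; an Element's truth value depends on its children
            return child.text
    return None

item = ET.fromstring(
    "<item><title>First</title>"
    "<pubDate>Sat, 22 Mar 2014 04:21:00 -0700</pubDate></item>"
)

print(item.find('pubdate'))                     # None: no tag with this exact spelling
print(first_text(item, 'pubdate', 'pubDate'))   # Sat, 22 Mar 2014 04:21:00 -0700
print(first_text(item, 'guid'))                 # None instead of an AttributeError
```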
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2014-03-24 09:39 +1100 |
| Message-ID | <mailman.8420.1395614361.18130.python-list@python.org> |
| In reply to | #68767 |
teddybubu@gmail.com writes:
> I am trying to get all the element data from the rss below.
[…]
> from bs4 import BeautifulSoup
>
> soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
RSS is not HTML; so BeautifulSoup is not a good tool to use for parsing
RSS.
Instead, you will do better if you use one of the libraries already
written for RSS parsing <URL:https://wiki.python.org/moin/RssLibraries>.
--
\ “To be is to do” —Plato |
`\ “To do is to be” —Aristotle |
_o__) “Do be do be do” —Sinatra |
Ben Finney