Newsgroup: comp.lang.python

help with for loop----python 2.7.2

Started by: teddybubu@gmail.com
First post: 2014-03-22 04:21 -0700
Last post: 2014-03-24 09:39 +1100
Articles: 11 (5 participants)



Contents

  help with for loop----python 2.7.2 teddybubu@gmail.com - 2014-03-22 04:21 -0700
    Re: help with for loop----python 2.7.2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-22 06:00 -0600
    Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 10:29 -0700
      Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 10:30 -0700
        Re: help with for loop----python 2.7.2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-23 17:40 +0000
          Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 10:49 -0700
        Re: help with for loop----python 2.7.2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-23 18:43 +0000
      Re: help with for loop----python 2.7.2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-23 11:49 -0600
        Re: help with for loop----python 2.7.2 tad na <teddybubu@gmail.com> - 2014-03-23 14:51 -0700
          Re: help with for loop----python 2.7.2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-23 16:44 -0600
    Re: help with for loop----python 2.7.2 Ben Finney <ben+python@benfinney.id.au> - 2014-03-24 09:39 +1100

#68767 — help with for loop----python 2.7.2

From: teddybubu@gmail.com
Date: 2014-03-22 04:21 -0700
Subject: help with for loop----python 2.7.2
Message-ID: <84eb4c69-d43d-4777-8a99-34eed9be73d6@googlegroups.com>

I am trying to get all the element data from the rss below.
The only thing I am pulling is the first element.
I don't understand why the for loop does not go through the entire rss.
Here is my code....


try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen 

from bs4 import BeautifulSoup 

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)

for item in soup.find_all('item'):
#for item in soup:    
    title = soup.find('title').text    
    link = soup.find('link').text
    item = soup.find('item').text
    print item
    print title
    print link



#68769

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-03-22 06:00 -0600
Message-ID: <mailman.8394.1395489716.18130.python-list@python.org>
In reply to: #68767

On Sat, Mar 22, 2014 at 5:21 AM,  <teddybubu@gmail.com> wrote:
> I am trying to get all the element data from the rss below.
> The only thing I am pulling is the first element.
> I don't understand why the for loop does not go through the entire rss.
> Here is my code....

[SNIP]

> for item in soup.find_all('item'):
> #for item in soup:
>     title = soup.find('title').text
>     link = soup.find('link').text
>     item = soup.find('item').text

The three find method calls in the for loop are searching from the
document root (the "soup" variable), not from the item you're
currently iterating at.  Try changing these to calls of item.find. And
note that calling one of the results "item" will replace the loop
variable.  That won't affect the iteration, but it's bad practice to
refer to two different things by the same local name.
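
The scoping difference described above can be sketched without network access. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree in place of BeautifulSoup, Python 3 print syntax, and a made-up two-item feed (the SAMPLE text and the example.com links are not from the original post).

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, standing in for the real RSS document.
SAMPLE = """<rss version="2.0"><channel>
<title>Feed title</title>
<link>http://example.com/</link>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE)

# Searching from the document root on every pass: the loop runs once per
# item, but each lookup returns the first <title> in the whole document.
from_root = [root.find(".//title").text for _ in root.iter("item")]

# Searching from the current item: each lookup returns that item's own child.
from_item = [item.find("title").text for item in root.iter("item")]

print(from_root)
print(from_item)
```

The loop itself visits every item in both cases; only the starting point of the inner search differs.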



#68813

From: tad na <teddybubu@gmail.com>
Date: 2014-03-23 10:29 -0700
Message-ID: <03c8b5d0-363e-4287-80d0-a43b0266f2a3@googlegroups.com>
In reply to: #68767

OK . second problem :)
I can print the date.  not sure how to do this one..
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen 
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
#print soup.find_all('item')
#print (soup)
data = soup.find_all("item")

x=0
for item in soup.find_all('item'):
    title = item.find('title').text    
    link = item.find('link').text
    date = item.find('pubDate')
   # print date
    print('+++++++++++++++++')
    print data[x].title.text
    print data[x].link.text
    print data[x].guid.text
    print data[x].pubDate
    x = x + 1



#68814

From: tad na <teddybubu@gmail.com>
Date: 2014-03-23 10:30 -0700
Message-ID: <32858d81-7dcf-45de-af10-6157068f15af@googlegroups.com>
In reply to: #68813

I meant to say I CANNOT print the date.



#68816

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-03-23 17:40 +0000
Message-ID: <mailman.8417.1395596423.18130.python-list@python.org>
In reply to: #68814

On 23/03/2014 17:30, tad na wrote:

Would you please use the mailing list 
https://mail.python.org/mailman/listinfo/python-list or read and action 
this https://wiki.python.org/moin/GoogleGroupsPython to prevent us 
seeing double line spacing and single line paragraphs, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com



#68817

From: tad na <teddybubu@gmail.com>
Date: 2014-03-23 10:49 -0700
Message-ID: <1bd3b586-b3ed-4d18-8d68-26a2144f6163@googlegroups.com>
In reply to: #68816

On Sunday, March 23, 2014 12:40:04 PM UTC-5, Mark Lawrence wrote:
> On 23/03/2014 17:30, tad na wrote:
> Would you please use the mailing list 
> https://mail.python.org/mailman/listinfo/python-list or read and action 
> this https://wiki.python.org/moin/GoogleGroupsPython to prevent us 
> seeing double line spacing and single line paragraphs, thanks.
> -- 
> My fellow Pythonistas, ask not what our language can do for you, ask 
> what you can do for our language.
> Mark Lawrence
> ---
> This email is free from viruses and malware because avast! Antivirus protection is active.
> http://www.avast.com

Mark, I'm not sure what I did wrong. The double line spacing in the code is mine;
it helps me keep things separate.



#68819

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-03-23 18:43 +0000
Message-ID: <mailman.8419.1395600257.18130.python-list@python.org>
In reply to: #68814

On 23/03/2014 17:30, tad na wrote:
> On Sunday, March 23, 2014 12:29:40 PM UTC-5, tad na wrote:
>> On Saturday, March 22, 2014 6:21:30 AM UTC-5, tad na wrote:
>>
>>> I am trying to get all the element data from the rss below.
>>
>>>
>>
>>> The only thing I am pulling is the first element.
>>
>>
>>
>>> I don't understand why the for loop does not go through the entire rss.
>>
>>
>>
>>> Here is my code....

I've snipped the bulk of the message, but imagine what the above looks
like when it's been back and forth through Google Groups a few times:
it's effectively unreadable.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence




#68818

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-03-23 11:49 -0600
Message-ID: <mailman.8418.1395596955.18130.python-list@python.org>
In reply to: #68813

On Mar 23, 2014 11:31 AM, "tad na" <teddybubu@gmail.com> wrote:
> OK . second problem :)
> I can print the date.  not sure how to do this one..

Why not? What happens when you try?

> try:
>     from urllib2 import urlopen
> except ImportError:
>     from urllib.request import urlopen
> import urllib2
> from bs4 import BeautifulSoup
>
> soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
> #print soup.find_all('item')
> #print (soup)
> data = soup.find_all("item")
>
> x=0
> for item in soup.find_all('item'):
>     title = item.find('title').text
>     link = item.find('link').text
>     date = item.find('pubDate')
>    # print date
>     print('+++++++++++++++++')
>     print data[x].title.text
>     print data[x].link.text
>     print data[x].guid.text
>     print data[x].pubDate
>     x = x + 1

data[x] should be the same object as item, no? If you want to keep track of
the current iteration index, a cleaner way to do that is by using enumerate:

    for x, item in enumerate(soup.find_all('item')):

As far as printing the pubDate goes, why not start by getting its text
property as you do with the other tags? From there you can either print the
string out directly or parse it into a datetime object.
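
Both suggestions above can be sketched together. This is a minimal illustration, not the original poster's code: it assumes Python 3.3 or later (email.utils.parsedate_to_datetime does not exist in Python 2.7), uses xml.etree.ElementTree in place of BeautifulSoup so it runs offline, and the SAMPLE feed with its dates is invented for the example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed; RSS 2.0 pubDate values use the RFC 822 date format.
SAMPLE = """<rss version="2.0"><channel>
<item><title>First</title><pubDate>Sat, 22 Mar 2014 04:21:00 -0700</pubDate></item>
<item><title>Second</title><pubDate>Sun, 23 Mar 2014 10:29:00 -0700</pubDate></item>
</channel></rss>"""

rows = []
# enumerate() supplies the index, so no separate counter is needed.
for x, item in enumerate(ET.fromstring(SAMPLE).iter("item")):
    date_text = item.find("pubDate").text        # the raw string
    published = parsedate_to_datetime(date_text)  # a timezone-aware datetime
    rows.append((x, item.find("title").text, published))
    print(x, item.find("title").text, published.isoformat())
```

Keeping the raw text and the parsed datetime separate makes it easy to print one and sort or compare by the other.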



#68821

From: tad na <teddybubu@gmail.com>
Date: 2014-03-23 14:51 -0700
Message-ID: <5f8a6cae-4e84-4748-beb1-fd931b187e4e@googlegroups.com>
In reply to: #68818

On Sunday, March 23, 2014 12:49:11 PM UTC-5, Ian wrote:
> On Mar 23, 2014 11:31 AM, "tad na" <tedd...@gmail.com> wrote:
 
> > OK . second problem :)

> > I can print the date.  not sure how to do this one..
> Why not? What happens when you try?
> > try:
> >     from urllib2 import urlopen

> > except ImportError:
> >     from urllib.request import urlopen
> > import urllib2
> > from bs4 import BeautifulSoup

> > soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))
> > #print soup.find_all('item')
> > #print (soup)
> > data = soup.find_all("item")
> > x=0
> > for item in soup.find_all('item'):
> >     title = item.find('title').text
> >     link = item.find('link').text
> >     date = item.find('pubDate')
> >    # print date
> >     print('+++++++++++++++++')
> >     print data[x].title.text
> >     print data[x].link.text
> >     print data[x].guid.text
> >     print data[x].pubDate
> >     x = x + 1
> data[x] should be the same object as item, no? If you want to keep track of the current iteration index, a cleaner way to do that is by using enumerate:

>     for x, item in enumerate(soup.find_all('item')):
 
> As far as printing the pubDate goes, why not start by getting its text property as you do with the other tags? From there you can either print the string out directly or parse it into a datetime object.

This is the error I get with 
1. print data[x].pubDate.text
    AttributeError: 'NoneType' object has no attribute 'text'
2. print data[x].pubDate
    It results in "None"



#68823

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2014-03-23 16:44 -0600
Message-ID: <mailman.8421.1395614684.18130.python-list@python.org>
In reply to: #68821

On Mar 23, 2014 3:56 PM, "tad na" <teddybubu@gmail.com> wrote:
>
> This is the error I get with
> 1. print data[x].pubDate.text
>     AttributeError: 'NoneType' object has no attribute 'text'
> 2. print data[x].pubDate
>     It results in "None"

So the problem is that it's not even finding the pubDate tag in the first
place. Some sites on the Web suggest that Beautiful Soup normalizes all
tag names to lowercase; try looking for the pubdate tag instead.
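
The lowercasing can be observed with the standard library's HTML parser alone. This is a sketch assuming Python 3 (the module is named HTMLParser in Python 2.7); the TagCollector class and the sample markup are made up for illustration, and whether a given BeautifulSoup setup behaves the same way depends on which parser it was asked to use.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper that records every start tag name it is handed."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already converted to lower case.
        self.tags.append(tag)


collector = TagCollector()
collector.feed("<item><pubDate>Sat, 22 Mar 2014 04:21:00 -0700</pubDate></item>")
collector.close()

print(collector.tags)
```

Because HTML tag names are case-insensitive, an HTML parser reports pubDate as pubdate, which is why a lookup using the original spelling can come back empty.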



#68822

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2014-03-24 09:39 +1100
Message-ID: <mailman.8420.1395614361.18130.python-list@python.org>
In reply to: #68767

teddybubu@gmail.com writes:

> I am trying to get all the element data from the rss below.

[…]

> from bs4 import BeautifulSoup 
>
> soup = BeautifulSoup(urlopen('http://bl.ocks.org/mbostock.rss'))

RSS is not HTML; so BeautifulSoup is not a good tool to use for parsing
RSS.

Instead, you will do better if you use one of the libraries already
written for RSS parsing <URL:https://wiki.python.org/moin/RssLibraries>.

-- 
 \                                             “To be is to do” —Plato |
  `\                                       “To do is to be” —Aristotle |
_o__)                                        “Do be do be do” —Sinatra |
Ben Finney
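
The point that RSS is XML rather than HTML can be illustrated with a rough sketch. It assumes Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in for the dedicated RSS libraries listed on the wiki page above; the FEED text and the parse_items helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed text; a real program would read it from the network.
FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item>
  <title>First</title>
  <link>http://example.com/1</link>
  <guid>http://example.com/1</guid>
  <pubDate>Sat, 22 Mar 2014 04:21:00 -0700</pubDate>
</item>
</channel></rss>"""


def parse_items(xml_text):
    """Return one dict per <item>, keyed by the element names as written."""
    root = ET.fromstring(xml_text)
    fields = ("title", "link", "guid", "pubDate")
    return [
        {name: item.findtext(name) for name in fields}
        for item in root.iter("item")
    ]


items = parse_items(FEED)
print(items)
```

An XML parser treats element names as case-sensitive and gives link no special meaning, so pubDate and link are both found under their original spellings.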


