Groups > comp.lang.python > #67950 > unrolled thread
| Started by | teddybubu@gmail.com |
|---|---|
| First post | 2014-03-06 12:22 -0800 |
| Last post | 2014-03-12 08:36 +0100 |
| Articles | 8 — 5 participants |
beautiful soup get class info teddybubu@gmail.com - 2014-03-06 12:22 -0800
Re: beautiful soup get class info John Gordon <gordon@panix.com> - 2014-03-06 20:58 +0000
Re: beautiful soup get class info teddybubu@gmail.com - 2014-03-06 13:38 -0800
Re: beautiful soup get class info John Gordon <gordon@panix.com> - 2014-03-06 22:28 +0000
Re: beautiful soup get class info teddybubu@gmail.com - 2014-03-06 17:37 -0800
Re: beautiful soup get class info Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-07 01:48 +0000
Re: beautiful soup get class info Christopher Welborn <cjwelborn@live.com> - 2014-03-11 21:04 -0500
Re: beautiful soup get class info Peter Otten <__peter__@web.de> - 2014-03-12 08:36 +0100
| From | teddybubu@gmail.com |
|---|---|
| Date | 2014-03-06 12:22 -0800 |
| Subject | beautiful soup get class info |
| Message-ID | <e73d29eb-17bb-472e-bdc4-c38ca904c60f@googlegroups.com> |
I am using beautifulsoup to get the title and date of the website.
title is working fine but I am not able to pull the date. Here is the code in the url:

<span class="date">October 22, 2011</span>

In Python, I am using the following code:
date1 = soup.span.text
data=soup.find_all(date="value")

Results in:

[]
March 5, 2014

What is the proper way to get this info?
Thanks.
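A minimal sketch of what the two calls in the question do, assuming Beautiful Soup 4 and a made-up page in which a different `<span>` comes first. The real page was never posted, so the HTML below is an assumption. `soup.span` returns only the first `<span>` in the document, and `find_all(date="value")` looks for tags that carry an attribute named `date` whose value is `value`, which would explain the empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the first <span> is not the one we want.
html = """
<span class="date">March 5, 2014</span>
<h1>Some title</h1>
<span class="date">October 22, 2011</span>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.span is shorthand for "the first <span> tag in the document".
first_span_text = soup.span.text

# Unrecognised keyword arguments are treated as attribute filters, so this
# searches for tags with date="value". No tag has such an attribute.
by_date_attr = soup.find_all(date="value")

# Searching by tag name returns every <span>, in document order.
all_spans = soup.find_all("span")

print(first_span_text)  # March 5, 2014
print(by_date_attr)     # []
print(len(all_spans))   # 2
```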
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2014-03-06 20:58 +0000 |
| Message-ID | <lfanh4$mna$1@reader1.panix.com> |
| In reply to | #67950 |
In <e73d29eb-17bb-472e-bdc4-c38ca904c60f@googlegroups.com> teddybubu@gmail.com writes:

> <span class="date">October 22, 2011</span>

> date1 = soup.span.text
> data=soup.find_all(date="value")

Try this:

  soup.find_all(name="span", class="date")

--
John Gordon         Imagine what it must be like for a real medical doctor to
gordon@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.
| From | teddybubu@gmail.com |
|---|---|
| Date | 2014-03-06 13:38 -0800 |
| Message-ID | <ae5b837c-501d-498e-bd3a-3b2c709c42b0@googlegroups.com> |
| In reply to | #67951 |
On Thursday, March 6, 2014 2:58:12 PM UTC-6, John Gordon wrote:
> In <e73d29eb-17bb-472e-bdc4-c38ca904c60f@googlegroups.com> teddy writes:
>
> > <span class="date">October 22, 2011</span>
>
> > date1 = soup.span.text
>
> > data=soup.find_all(date="value")
>
> Try this:
>
>   soup.find_all(name="span", class="date")
>
> --
>
> John Gordon         Imagine what it must be like for a real medical doctor to
>
>                     watch 'House', or a real serial killer to watch 'Dexter'.

I have python 2.7.2 and it does not like class in the code you provided.

Now when I take out [ class="date"], this is returned:
[<span class="date">March 5, 2014</span>, <span class="date">March 5, 2014</span>]

This is the code I am using: "data = soup.find_all(name="span")
print (data)"
1. it returns today's date instead of the actual date
2. returns it twice
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2014-03-06 22:28 +0000 |
| Message-ID | <lfaspm$998$1@reader1.panix.com> |
| In reply to | #67952 |
In <ae5b837c-501d-498e-bd3a-3b2c709c42b0@googlegroups.com> teddybubu@gmail.com writes:

> >   soup.find_all(name="span", class="date")

> I have python 2.7.2 and it does not like class in the code you provided.

Oh right, 'class' is a reserved word.  I imagine beautifulsoup has
a workaround for that.

> Now when I take out [ class="date"], this is returned:
> [<span class="date">March 5, 2014</span>, <span class="date">March 5, 2014</span>]
>
> This is the code I am using: "data = soup.find_all(name="span")
> print (data)"
> 1. it returns today's date instead of the actual date
> 2. returns it twice

Are there two occurrences of '<span class="date">March 5, 2014</span>'
in the HTML?  If so, then beautifulsoup is doing its job correctly.

It might help if you posted the sample HTML data you're working with.

--
John Gordon         Imagine what it must be like for a real medical doctor to
gordon@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.
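The reserved-word point can be checked with the standard library alone. The sketch below is only an illustration: `accepts_any_keyword` is a hypothetical stand-in for any function that takes arbitrary keyword arguments, not part of Beautiful Soup. It shows that `class="date"` is rejected by the parser before any library code runs, while a differently spelled name or dictionary unpacking gets through.

```python
import keyword

# 'class' is a Python keyword; a trailing underscore makes it an ordinary name.
print(keyword.iskeyword("class"))   # True
print(keyword.iskeyword("class_"))  # False


def accepts_any_keyword(**kwargs):
    """Hypothetical stand-in for a function taking arbitrary keyword filters."""
    return kwargs


# Writing class="date" literally fails at compile time with a SyntaxError,
# so the called function never gets a chance to see the argument.
try:
    compile('accepts_any_keyword(class="date")', "<example>", "eval")
    literal_class_ok = True
except SyntaxError:
    literal_class_ok = False

# Unpacking a dict sidesteps the parser, because the key is just a string.
via_unpacking = accepts_any_keyword(**{"class": "date"})

# A name that is not a keyword is accepted as usual.
via_underscore = accepts_any_keyword(class_="date")

print(literal_class_ok)  # False
print(via_unpacking)     # {'class': 'date'}
print(via_underscore)    # {'class_': 'date'}
```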
| From | teddybubu@gmail.com |
|---|---|
| Date | 2014-03-06 17:37 -0800 |
| Message-ID | <c303cbad-d790-43ce-a88d-2068ec8e371c@googlegroups.com> |
| In reply to | #67958 |
On Thursday, March 6, 2014 4:28:06 PM UTC-6, John Gordon wrote:
> In <ae5b837c-501d-498e-bd3a-3b2c709c42b0@googlegroups.com> writes:
>
> > >   soup.find_all(name="span", class="date")
>
> > I have python 2.7.2 and it does not like class in the code you provided.
>
> Oh right, 'class' is a reserved word.  I imagine beautifulsoup has
>
> a workaround for that.
>
> > Now when I take out [ class="date"], this is returned:
>
> > [<span class="date">March 5, 2014</span>, <span class="date">March 5, 2014</span>]
>
> >
>
> > This is the code I am using: "data = soup.find_all(name="span")
>
> > print (data)"
>
> > 1. it returns today's date instead of the actual date
>
> > 2. returns it twice
>
> Are there two occurrences of '<span class="date">March 5, 2014</span>'
>
> in the HTML?  If so, then beautifulsoup is doing its job correctly.
>
> It might help if you posted the sample HTML data you're working with.
>
> --
>
> John Gordon         Imagine what it must be like for a real medical doctor to
>
>                     watch 'House', or a real serial killer to watch 'Dexter'.

ok I got this working. now to the next problem.... thanks.
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-03-07 01:48 +0000 |
| Message-ID | <mailman.7886.1394156960.18130.python-list@python.org> |
| In reply to | #67971 |
On 07/03/2014 01:37, teddybubu@gmail.com wrote:
> On Thursday, March 6, 2014 4:28:06 PM UTC-6, John Gordon wrote:
>> In <ae5b837c-501d-498e-bd3a-3b2c709c42b0@googlegroups.com> writes:
>>
>>>>   soup.find_all(name="span", class="date")
>>
>>> I have python 2.7.2 and it does not like class in the code you provided.
>>
>> Oh right, 'class' is a reserved word.  I imagine beautifulsoup has
>>
>> a workaround for that.
>>
>>> Now when I take out [ class="date"], this is returned:
>>
>>> [<span class="date">March 5, 2014</span>, <span class="date">March 5, 2014</span>]
>>
>>>
>>
>>> This is the code I am using: "data = soup.find_all(name="span")
>>
>>> print (data)"
>>
>>> 1. it returns today's date instead of the actual date
>>
>>> 2. returns it twice
>>
>> Are there two occurrences of '<span class="date">March 5, 2014</span>'
>>
>> in the HTML?  If so, then beautifulsoup is doing its job correctly.
>>
>> It might help if you posted the sample HTML data you're working with.
>>
>> --
>>
>> John Gordon         Imagine what it must be like for a real medical doctor to
>>
>>                     watch 'House', or a real serial killer to watch 'Dexter'.
>
> ok I got this working. now to the next problem.... thanks.
>

I'm pleased to see that you have a solution. Now, should you wish to ask
further questions, would you please read and action this first
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the
double line spacing above, thanks.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
| From | Christopher Welborn <cjwelborn@live.com> |
|---|---|
| Date | 2014-03-11 21:04 -0500 |
| Message-ID | <mailman.8069.1394589869.18130.python-list@python.org> |
| In reply to | #67950 |
On 03/06/2014 02:22 PM, teddybubu@gmail.com wrote:
> I am using beautifulsoup to get the title and date of the website.
> title is working fine but I am not able to pull the date. Here is the code in the url:
>
> <span class="date">October 22, 2011</span>
>
> In Python, I am using the following code:
> date1 = soup.span.text
> data=soup.find_all(date="value")
>
> Results in:
>
> []
> March 5, 2014
>
> What is the proper way to get this info?
> Thanks.
>
I believe it's the 'attrs' argument.
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
# Workaround the 'class' problem:
data = soup.find_all(attrs={'class': 'date'})
I haven't tested it, but it's worth looking into.
--
\¯\ /¯/\
\ \/¯¯\/ / / Christopher Welborn (cj)
\__/\__/ / cjwelborn at live·com
\__/\__/ http://welbornprod.com
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-03-12 08:36 +0100 |
| Message-ID | <mailman.8074.1394609816.18130.python-list@python.org> |
| In reply to | #67950 |
Christopher Welborn wrote:
> On 03/06/2014 02:22 PM, teddybubu@gmail.com wrote:
>> I am using beautifulsoup to get the title and date of the website.
>> title is working fine but I am not able to pull the date. Here is the
>> code in the url:
>>
>> <span class="date">October 22, 2011</span>
>>
>> In Python, I am using the following code:
>> date1 = soup.span.text
>> data=soup.find_all(date="value")
>>
>> Results in:
>>
>> []
>> March 5, 2014
>>
>> What is the proper way to get this info?
>> Thanks.
>>
>
> I believe it's the 'attrs' argument.
> http://www.crummy.com/software/BeautifulSoup/bs4/doc/
>
> # Workaround the 'class' problem:
> data = soup.find_all(attrs={'class': 'date'})
>
> I haven't tested it, but it's worth looking into.
Yes, there are two ways to filter by class:
>>> import bs4
>>> soup = bs4.BeautifulSoup("""
... <span class="one">alpha</span>
... <span class="two">beta</span>""")
Use attrs:
>>> soup.find_all(attrs={"class": "one"})
[<span class="one">alpha</span>]
Append an underscore:
>>> soup.find_all(class_="two")
[<span class="two">beta</span>]
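Applied to the markup from the original question, the `class_` form might look like the sketch below. It assumes Beautiful Soup 4 with the built-in `html.parser`. The one-line HTML string and the `"%B %d, %Y"` format are assumptions chosen to match `October 22, 2011`, and the `strptime` step additionally assumes English month names in the current locale.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Assumed sample markup, modelled on the <span> quoted in the question.
html = '<h1>Some title</h1><span class="date">October 22, 2011</span>'
soup = BeautifulSoup(html, "html.parser")

# Restrict the search to <span> tags whose class is "date",
# then pull out the text of each match.
date_texts = [
    tag.get_text(strip=True)
    for tag in soup.find_all("span", class_="date")
]

# Optionally turn the text into date objects (format string is an assumption).
parsed_dates = [
    datetime.strptime(text, "%B %d, %Y").date()
    for text in date_texts
]

print(date_texts)
print(parsed_dates)
```

If the page holds several `<span class="date">` elements, as the earlier output in this thread suggests, the list will contain one entry per element, so picking the right one still depends on the page's structure.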