
How to pick content from HTML using BeautifulSoup

From Sheetal Singh <sheetalsingh@shopzilla.com>
Subject How to pick content from HTML using BeautifulSoup
Date 2012-07-10 04:02 +0000
Newsgroups comp.lang.python
Message-ID <mailman.1975.1341893033.4697.python-list@python.org>


Hi,

I am new to Python. I need to fetch the names of the side filters from a page and save them to a CSV file (screenshot attached).

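Following is a snippet from my code:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

# Parse the page as HTML; BeautifulStoneSoup is the XML parser.
soup = BeautifulSoup(html)

# Earlier attempts, left commented out:
# for e in soup.findAll('div'):
#     for c in e.findAll('h3'):
#         for d in c.findAll('li'):
#             print '@@@@@@@', d.extract()
#
# select_pod = soup.findAll('div', {"class": "win aboutUs"})
# promeg = select_pod[0].findAll("p")[0]
#
# for dv in soup.findAll('div', {"class": "attribution"}):
#     ds = dv.findAll("<h3>")
#     print ds

select_pod = soup.findAll('div')
print select_pod
for j in select_pod:
    if j is not None:
        print j.findAll('a')  # the method is findAll, not findall

# findAll takes the bare tag name ('h3', not '<h3>') and must be called on
# the soup or on a tag, not on the list that findAll returns.
promeg = soup.findAll('h3')
# print '--', promeg

# hreflist = [each.get('value') for each in soup.findAll('<h3>')]

for m in promeg:
    if m:
        print 'Data values', m
        # fd1, x and i are defined in the surrounding code, which is not shown.
        fd1.writerow([x[2], m.string, i[0], "Data Found"])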


Structure of the HTML:

<div class="attribution">
<div>
<h3>By Brand</h3>
<ul>
<li>
<a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a>
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<h3>By Seller</h3>
<ul>
<li>
<a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
<input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<div>
</div>
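
A structure like the one above can be walked by remembering the most recent <h3> heading and attaching each following link text to it. The sketch below is only one possible approach, not a tested solution for the real page. It uses Python 3's standard-library html.parser in place of BeautifulSoup so that it runs without extra packages. The names FilterParser, extract_filters and SAMPLE_HTML are made up for illustration, and the Samsung link target is a placeholder. It also does not restrict itself to the "attribution" div, which a real script probably should.

```python
from html.parser import HTMLParser

# Trimmed-down sample modelled on the structure in the post.
# The "#" link target for Samsung is a placeholder, not a real URL.
SAMPLE_HTML = """
<div class="attribution">
<div>
<h3>By Brand</h3>
<ul>
<li><a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a></li>
<li><a href="#">Samsung</a></li>
</ul>
</div>
<div>
<h3>By Seller</h3>
<ul>
<li>
<a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
<input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
</li>
</ul>
</div>
</div>
"""


class FilterParser(HTMLParser):
    """Collect the link texts that follow each <h3> heading (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.groups = []      # list of (heading, [names]) in document order
        self._in_h3 = False
        self._in_a = False
        self._text = []       # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3, self._text = True, []
        elif tag == "a" and self.groups:
            # Links that appear before the first heading are ignored.
            self._in_a, self._text = True, []

    def handle_data(self, data):
        if self._in_h3 or self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._text).strip()
        if tag == "h3" and self._in_h3:
            self.groups.append((text, []))
            self._in_h3 = False
        elif tag == "a" and self._in_a:
            if text:
                self.groups[-1][1].append(text)
            self._in_a = False


def extract_filters(html):
    """Return [(heading, [link texts])] for the given HTML string."""
    parser = FilterParser()
    parser.feed(html)
    parser.close()
    return parser.groups


if __name__ == "__main__":
    for heading, names in extract_filters(SAMPLE_HTML):
        print(heading, names)
```

extract_filters(SAMPLE_HTML) returns one (heading, names) pair per <h3>, in page order, which is a convenient shape for writing out one group at a time.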


Output required in the CSV:

By Brand
Nokia
Samsung
.
.

By Seller
Amazon
Buy.com
.
.
.
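
A layout like the one above (a heading, then one name per row, then a blank row before the next group) can be produced with the standard csv module. This is a minimal sketch in Python 3, assuming the names have already been grouped by heading. The function name write_filters and the sample data are illustrative only, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_filters(groups, out):
    """Write each heading, then its names, one value per row (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")
    for index, (heading, names) in enumerate(groups):
        if index:
            writer.writerow([])   # blank row between groups
        writer.writerow([heading])
        for name in names:
            writer.writerow([name])


# Sample data taken from the headings and names mentioned in the post.
groups = [
    ("By Brand", ["Nokia", "Samsung"]),
    ("By Seller", ["Amazon", "Buy.com"]),
]

buffer = io.StringIO()            # stands in for an open CSV file
write_filters(groups, buffer)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be replaced by a file opened with open(path, "w", newline=""), where path is whatever output file the script uses.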



Please suggest how to fetch these details.

Sheetal Singh
