| Return-Path | <sheetalsingh@shopzilla.com> |
|---|---|
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| From | Sheetal Singh <sheetalsingh@shopzilla.com> |
| To | "python-list@python.org" <python-list@python.org> |
| Subject | How to pick content from HTML using BeautifulSoup |
| Thread-Topic | How to pick content from HTML using BeautifulSoup |
| Thread-Index | Ac1eTqrj10GHP+ZKTGqzLeaUDOepZg== |
| Date | Tue, 10 Jul 2012 04:02:28 +0000 |
| Accept-Language | en-IN, en-US |
| Content-Language | en-US |
| X-MS-Has-Attach | yes |
| X-MS-TNEF-Correlator | |
| x-originating-ip | [10.40.4.17] |
| Content-Type | multipart/mixed; boundary="_004_7EC567BF36771942AE9F9279F86F13703B455ASZHQMSXNODE1Bshop_" |
| MIME-Version | 1.0 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.12 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1975.1341893033.4697.python-list@python.org> (permalink) |
| Lines | 519 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1341893033 news.xs4all.nl 6863 [2001:888:2000:d::a6]:54121 |
| X-Complaints-To | abuse@xs4all.nl |
| Path | csiph.com!usenet.pasdenom.info!news.stben.net!border3.nntp.ams.giganews.com!border1.nntp.ams.giganews.com!nntp.giganews.com!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail |
| Xref | csiph.com comp.lang.python:25119 |
Hi,
I am new to Python. I need to fetch the names of the side filters from a page and save them to a CSV file (screenshot attached).
The following is a snippet from my code:
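from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

# html, fd1 (a csv writer), x and i are defined elsewhere in the script.
soup = BeautifulSoup(html)  # BeautifulStoneSoup is meant for XML, not HTML

# Earlier attempts, kept for reference:
# for e in soup.findAll('div'):
#     for c in e.findAll('h3'):
#         for d in c.findAll('li'):
#             print '@@@@@@@', d.extract()
#
# select_pod = soup.findAll('div', {"class": "win aboutUs"})
# promeg = select_pod[0].findAll("p")[0]
#
# for dv in soup.findAll('div', {"class": "attribution"}):
#     ds = dv.findAll("h3")
#     print ds

select_pod = soup.findAll('div')
print select_pod
for j in select_pod:
    print j.findAll('a')       # the method is findAll, not findall
    # Search each tag, not the ResultSet, and pass the bare tag name.
    promeg = j.findAll('h3')
    # print '--', promeg
    # hreflist = [each.get('value') for each in soup.findAll('h3')]
    for m in promeg:
        print 'Data values', m
        # m.string is the heading text; writing m itself would store the markup.
        fd1.writerow([x[2], m.string, i[0], "Data Found"])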
Structure of HTML:
<div class="attribution">
  <div>
    <h3>By Brand</h3>
    <ul>
      <li>
        <a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a>
      </li>
      <li>
      <li>
      <li>
      <li>
      <li>
      <li>
      <li>
      <li class="more">
    </ul>
  </div>
  <div>
    <h3>By Seller</h3>
    <ul>
      <li>
        <a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
        <input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
      </li>
      <li>
      <li>
      <li>
      <li>
      <li>
      <li>
      <li>
      <li class="more">
    </ul>
  </div>
  <div>
  <div>
</div>
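One possible way to walk a structure like this is sketched below. It uses the `bs4` package (BeautifulSoup 4) on Python 3 rather than the BeautifulSoup 3 API in the snippet. The sample markup, the function name `extract_filters`, and the choice of the built-in `html.parser` are assumptions made for illustration; the real page may nest things differently and need adjustments.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the structure above (not the real page).
SAMPLE_HTML = """
<div class="attribution">
  <div>
    <h3>By Brand</h3>
    <ul>
      <li><a href="#">Nokia</a></li>
      <li><a href="#">Samsung</a></li>
      <li class="more"></li>
    </ul>
  </div>
  <div>
    <h3>By Seller</h3>
    <ul>
      <li>
        <a href="#">Amazon Marketplace</a>
        <input type="hidden" value="#">
      </li>
    </ul>
  </div>
</div>
"""


def extract_filters(html):
    """Return a list of (heading, [link texts]) pairs, one per filter group."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    container = soup.find("div", class_="attribution")
    if container is None:
        return groups
    for heading in container.find_all("h3"):
        # Assumption: the links for a group live in the <ul> that follows its <h3>.
        ul = heading.find_next_sibling("ul")
        names = []
        if ul is not None:
            names = [a.get_text(strip=True) for a in ul.find_all("a")]
        groups.append((heading.get_text(strip=True), names))
    return groups


filters = extract_filters(SAMPLE_HTML)
for heading, names in filters:
    print(heading, names)
```

Pairing each `<h3>` with the `<ul>` that follows it keeps the brand names and seller names in separate groups, which a flat search for every `<a>` on the page would not do.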
Required output in the CSV file:
By Brand
Nokia
Samsung
.
.
By Seller
Amazon
Buy.com
.
.
.
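A minimal sketch of producing a single-column layout like this with the standard `csv` module on Python 3. The function name `write_filters` and the sample `groups` list are hypothetical, and the data is typed in by hand here instead of being scraped.

```python
import csv
import io


def write_filters(groups, out):
    """Write each heading on its own row, followed by one row per name."""
    writer = csv.writer(out, lineterminator="\n")
    for heading, names in groups:
        writer.writerow([heading])
        for name in names:
            writer.writerow([name])


# Hypothetical data in the shape (heading, [names]).
groups = [
    ("By Brand", ["Nokia", "Samsung"]),
    ("By Seller", ["Amazon", "Buy.com"]),
]

buffer = io.StringIO()
write_filters(groups, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("filters.csv", "w", newline="")`; the file name is only an example.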
Please suggest how to fetch these details.
Sheetal Singh