| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Writing Multiple files at a times |
| Date | 2014-06-29 19:21 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <lopp03$91i$1@dont-email.me> (permalink) |
| References | <b8951113-5171-4441-b490-4d731eb56cec@googlegroups.com> <mailman.11325.1404048700.18130.python-list@python.org> <roy-AFAF93.10013729062014@news.panix.com> <6bf28329-9983-4024-ac90-374fb11ac854@googlegroups.com> |
On Sun, 29 Jun 2014 10:32:00 -0700, subhabangalore wrote:
> I am opening multiple URLs with urllib.open, now one Url has huge html
> source files, like that each one has. As these files are read I am
> trying to concatenate them and put in one txt file as string.
> From this big txt file I am trying to take out each html file body of
> each URL and trying to write and store them
OK, let me clarify what I think you said.
First you concatenate all the web pages into a single file.
Then you extract all the page bodies from the single file and save them
as separate files.
This seems a silly way to do things. Why don't you just save each HTML
body section as you receive it?
This sounds like it should be something as simple as:
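    # BeautifulSoup 3 import; with the newer bs4 package this would be
    # "from bs4 import BeautifulSoup"
    from BeautifulSoup import BeautifulSoup
    import requests

    urlList = [
        "http://something/",
        "http://something/",
        "http://something/",
        # ....... (rest of the URLs)
    ]

    n = 0
    for url in urlList:
        # fetch the page and parse it
        r = requests.get( url )
        soup = BeautifulSoup( r.content )
        # keep only the <body> element
        body = soup.find( "body" )
        # write it to a zero-padded, numbered file
        fp = open( "scraped/body{:0>5d}.htm".format( n ), "w" )
        fp.write( body.prettify() )
        fp.close()
        n += 1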
will give you:
scraped/body00000.htm
scraped/body00001.htm
scraped/body00002.htm
........
for as many URLs as you have in your URL list. (Make sure the target
directory exists!)
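If you would rather not create the directory by hand, something along these lines might do it. This is only a sketch using the standard library: the function name save_bodies and its parameters are made up for illustration, and it assumes Python 3 and that you already have each body as a string.

```python
import os


def save_bodies(bodies, target_dir="scraped"):
    """Write each string in bodies to target_dir/bodyNNNNN.htm; return the paths.

    save_bodies and its parameters are hypothetical names for this sketch.
    """
    # Create the target directory if it is missing; do nothing if it exists.
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for n, body in enumerate(bodies):
        # Same zero-padded naming scheme as the loop above.
        path = os.path.join(target_dir, "body{:0>5d}.htm".format(n))
        # The with-block closes the file even if the write raises.
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(body)
        paths.append(path)
    return paths
```

In the loop above you would collect each body.prettify() result into a list and pass that list in, or just borrow the os.makedirs line and put it before the loop.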
--
Denis McMahon, denismfmcmahon@gmail.com