Re: Writing Multiple files at a times

From Denis McMahon <denismfmcmahon@gmail.com>
Newsgroups comp.lang.python
Subject Re: Writing Multiple files at a times
Date 2014-06-29 19:21 +0000
Organization A noiseless patient Spider
Message-ID <lopp03$91i$1@dont-email.me>
References <b8951113-5171-4441-b490-4d731eb56cec@googlegroups.com> <mailman.11325.1404048700.18130.python-list@python.org> <roy-AFAF93.10013729062014@news.panix.com> <6bf28329-9983-4024-ac90-374fb11ac854@googlegroups.com>


On Sun, 29 Jun 2014 10:32:00 -0700, subhabangalore wrote:

> I am opening multiple URLs with urllib.open, now one Url has huge html
> source files, like that each one has. As these files are read I am
> trying to concatenate them and put in one txt file as string.
> From this big txt file I am trying to take out each html file body of
> each URL and trying to write and store them

OK, let me clarify what I think you said.

First you concatenate all the web pages into a single file.
Then you extract all the page bodies from the single file and save them 
as separate files.

This seems a silly way to do things. Why not just save each HTML body
section as you receive it?

This sounds like it should be something as simple as:

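# BeautifulSoup 3 import; with the bs4 package use: from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup
import requests

urlList = [
    "http://something/",
    "http://something/",
    "http://something/",
    # .......
]

# enumerate() supplies the running file number n for each URL.
for n, url in enumerate( urlList ):
    r = requests.get( url )
    soup = BeautifulSoup( r.content )
    body = soup.find( "body" )
    # The with block closes each file once it has been written.
    with open( "scraped/body{:0>5d}.htm".format( n ), "w" ) as fp:
        fp.write( body.prettify() )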

will give you:

scraped/body00000.htm
scraped/body00001.htm
scraped/body00002.htm
........

for as many URLs as you have in your URL list. (Make sure the target
directory exists first, or open() will fail!)
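
If you would rather not create the directory by hand, the file-writing half of the loop could be sketched roughly as below. This is only an illustration using the standard library: save_pages and its parameters are hypothetical names that do not appear in the post, and the page text is assumed to have been extracted already (for example by the BeautifulSoup code above).

```python
from pathlib import Path


def save_pages(pages, out_dir):
    """Write each page to its own zero-padded, numbered file.

    save_pages, pages and out_dir are illustrative names only.
    Returns the list of file names written, in order.
    """
    out_path = Path(out_dir)
    # Create the target directory (and any missing parents);
    # exist_ok=True makes this a no-op if it is already there.
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for n, text in enumerate(pages):
        # "{:0>5d}" right-aligns n in a 5-character field padded with zeros.
        target = out_path / "body{:0>5d}.htm".format(n)
        target.write_text(text, encoding="utf-8")
        written.append(target.name)
    return written
```

Called as save_pages(["<body>a</body>", "<body>b</body>"], "scraped"), this returns ["body00000.htm", "body00001.htm"] and leaves those two files in scraped/. Because pages can be any iterable, a generator that fetches one URL at a time would still write each body as it is received.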

-- 
Denis McMahon, denismfmcmahon@gmail.com


