From: Dave Angel
Date: Thu, 21 Feb 2013 10:56:15 -0500
To: python-list@python.org
Subject: Re: Urllib's urlopen and urlretrieve
On 02/21/2013 07:12 AM, qoresucks@gmail.com wrote:
> I only just started Python and given that I know nothing about network programming or internet programming of any kind really, I thought it would be interesting to try to write something that could create an archive of a website for myself.

Please send your emails as plain text, not HTML; this is a text-based mailing list.

To archive your own website, use the rsync command. There's no need to write any code: rsync descends into all the directories as needed, and it gets the actual website data (the files on the server), not the generated pages that the web server feeds to browsers.

If for some reason you don't have rsync, you could use scp (with -r to recurse into directories). By default it doesn't preserve file attributes; -p keeps only modification times, access times, and modes. It's also not smart enough to copy only what has changed, so every update copies everything again.

-- 
DaveA
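For someone who wants to drive the rsync approach from a Python script anyway, here is a minimal sketch that is not part of the original reply. It assumes rsync (or, failing that, scp) is installed and that you have SSH access to the server; the function names `build_rsync_command`, `build_scp_command` and `mirror` are hypothetical.

```python
import shutil
import subprocess


def build_rsync_command(source, destination, compress=True):
    """Build an rsync argument list for mirroring a directory tree.

    -a (archive) recurses into directories and preserves permissions,
    modification times and symlinks (plus owner/group where permitted);
    -z compresses data in transit. A trailing slash on `source` copies
    the directory's contents rather than the directory itself.
    """
    cmd = ["rsync", "-a"]
    if compress:
        cmd.append("-z")
    cmd.extend([source, destination])
    return cmd


def build_scp_command(source, destination):
    """Fallback when rsync is unavailable.

    -r recurses into directories; -p keeps modification times, access
    times and modes. scp has no incremental mode: it copies everything.
    """
    return ["scp", "-r", "-p", source, destination]


def mirror(source, destination):
    """Run rsync if it is found on the local PATH, otherwise fall back to scp."""
    if shutil.which("rsync"):
        cmd = build_rsync_command(source, destination)
    else:
        cmd = build_scp_command(source, destination)
    # check=True raises CalledProcessError if the copy fails.
    subprocess.run(cmd, check=True)
    return cmd
```

A call might look like `mirror("user@example.com:/var/www/mysite/", "mysite-backup/")`, where the host and paths are placeholders. Note that `shutil.which` only checks the local machine; rsync must also be installed on the server for the rsync branch to work.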