Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #6169 > unrolled thread

Code Review

Started byad <adsquaired@gmail.com>
First post2011-05-24 13:10 -0700
Last post2011-05-25 07:26 -0700
Articles 6 — 5 participants

Back to article view | Back to comp.lang.python


Contents

  Code Review ad <adsquaired@gmail.com> - 2011-05-24 13:10 -0700
    Re: Code Review Peter Otten <__peter__@web.de> - 2011-05-25 09:22 +0200
    Re: Code Review Chris Torek <nospam@torek.net> - 2011-05-25 07:37 +0000
    Re: Code Review Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2011-05-25 10:06 +0200
      Re: Code Review ad <adsquaired@gmail.com> - 2011-05-25 06:44 -0700
        Re: Code Review Iain King <iainking@gmail.com> - 2011-05-25 07:26 -0700

#6169 — Code Review

Fromad <adsquaired@gmail.com>
Date2011-05-24 13:10 -0700
SubjectCode Review
Message-ID<37ba7b40-3663-4094-b507-696fc598bf48@l26g2000yqm.googlegroups.com>
Hello all,

Please review the code pasted below. I am wondering what other ways
there are of performing the same tasks. This was typed using version
3.2. The script is designed to clean up a directory (FTP, Logs, etc.)
Basically you pass two arguments. The first argument is an number of
days old to delete. The second argument is the directory where the
files and folders should be deleted. I imagine one enhancement would
be to create a function out of some of this.

### BEGIN ###

import os

import time

import shutil

import argparse



CurrentTime = time.time()

epocDay = 86400   # seconds





parser = argparse.ArgumentParser(description = "Delete files and
folders in a directory N days old", add_help=False,
prog='directorycleaner', usage='%(prog)s 7 c:\\temp')

parser.add_argument('days', type=int, help="Numeric value: delete
files and folders older then N days")

parser.add_argument('directory', help="delete files and folders in
this directory")

parser.print_help()

args = parser.parse_args()



dictKeys = (vars(args))



HowManyDays = dictKeys['days']

WhatDirectory = dictKeys['directory']

print (HowManyDays)

print (WhatDirectory)



DaysToDelete = HowManyDays * epocDay





dirExists = os.path.exists(WhatDirectory)



if dirExists == False: print ("The directory is missing")



DirListing = os.listdir(WhatDirectory)



for files in DirListing:

    # Get the absolute path of the file name

    abspath = (os.path.join(WhatDirectory, files))

    # Get the current creation time of the file in epoc format
(midnight 1/1/1970)

    FileCreationTime = (os.path.getctime(abspath))

    # time.ctime converts epoch to a normal date

    #print (time.ctime(CurrentTime))

    # Get the date from seven days ago

    WeekOldFileDate = CurrentTime - DaysToDelete

    #print (CurrentTime)

    #print (FileCreationTime)

    #print (WeekOldFileDate)



    #If the file is older than seve days doe something

    if FileCreationTime < WeekOldFileDate:

        #check if the object is a file

        if os.path.isfile(abspath): os.remove(abspath)

        # It is not a file it is a directory

        elif os.path.isdir(abspath): shutil.rmtree(abspath)



##### END ####

[toc] | [next] | [standalone]


#6198

FromPeter Otten <__peter__@web.de>
Date2011-05-25 09:22 +0200
Message-ID<mailman.2054.1306308167.9059.python-list@python.org>
In reply to#6169
ad wrote:

> Please review the code pasted below. I am wondering what other ways
> there are of performing the same tasks. This was typed using version
> 3.2. The script is designed to clean up a directory (FTP, Logs, etc.)
> Basically you pass two arguments. The first argument is an number of
> days old to delete. The second argument is the directory where the
> files and folders should be deleted. I imagine one enhancement would
> be to create a function out of some of this.

> CurrentTime = time.time()

Read PEP 8 on naming conventions etc., see 
http://python.org/dev/peps/pep-0008/
> 
> epocDay = 86400   # seconds
> 
> 
> 
> 
> 
> parser = argparse.ArgumentParser(description = "Delete files and

What's the purpose of those many empty lines?

> folders in a directory N days old", add_help=False,
> prog='directorycleaner', usage='%(prog)s 7 c:\\temp')
> 
> parser.add_argument('days', type=int, help="Numeric value: delete
> files and folders older then N days")
> 
> parser.add_argument('directory', help="delete files and folders in
> this directory")
> 
> parser.print_help()

What's the idea behind add_help=False and the explicit print_help()?

> dictKeys = (vars(args))
> HowManyDays = dictKeys['days']

This can be simplified to

HowManyDays = args.days

> if dirExists == False: print ("The directory is missing")

if x == False: ...

is normally written as 

if not x: ...

> DirListing = os.listdir(WhatDirectory)
> for files in DirListing:

You might write this as

for filename in os.listdir(WhatDirectory): ...

or even

for filename in os.listdir(args.directory): ...

Personally I would put this stuff in a separate function like

def remove_old_files_and_folders(parent_directory, age_in_seconds):
    ...

>     # time.ctime converts epoch to a normal date
> 
>     #print (time.ctime(CurrentTime))
> 
>     # Get the date from seven days ago
> 
>     WeekOldFileDate = CurrentTime - DaysToDelete
> 
>     #print (CurrentTime)
> 
>     #print (FileCreationTime)
> 
>     #print (WeekOldFileDate)

Don't let out-commented code eclipse the actual code; remove it. If you want 
to preserve it for eternity, have a look at version control systems, e. g. 
Mercurial.

[toc] | [prev] | [next] | [standalone]


#6202

FromChris Torek <nospam@torek.net>
Date2011-05-25 07:37 +0000
Message-ID<iribkf02fhq@news5.newsguy.com>
In reply to#6169
In article
<37ba7b40-3663-4094-b507-696fc598bf48@l26g2000yqm.googlegroups.com>
ad  <adsquaired@gmail.com> wrote:
>Please review the code pasted below. I am wondering what other ways
>there are of performing the same tasks. ...  I imagine one enhancement
>would be to create a function out of some of this.

Indeed -- "recommended styles" include defining a main()
function and calling main at the bottom, e.g., with:

    if __name__ == '__main__':
        main()

or:

    if __name__ == '__main__':
        main(sys.argv)

Something has also double-spaced your code; I will undo that below.

>import os
>import time
>import shutil
>import argparse

So far so good.

Let me start by putting the next parts into main().  I will use
the no-argument format since we can simply let parser.parse_args()
fetch sys.argv the same way you did (we could def main(argv) and
pass argv; this is sometimes useful for testing purposes, but here
it makes little difference one way or another).

def main():
>    CurrentTime = time.time()
>    epocDay = 86400   # seconds

(You could write 24 * 60 * 60, but again this is sort of a six-of-one
half-a-dozen-of-the-other situation.)  What is not clear is why you
are setting these now, rather than later, closer to where they are
going to be used.

Many style guides, including Guido's and pylint's, dislike
UppercaseVariableName and camelCase style names within functions.
(UppercaseNames are reserved to classes.)  I am leaving these alone
for now though.

>    parser = argparse.ArgumentParser(description = "Delete files and
>    folders in a directory N days old", add_help=False,
>    prog='directorycleaner', usage='%(prog)s 7 c:\\temp')

Line wrap has done bad things to this.  Still, there is something
odd about it other than the line wrap:

  - sometimes you write:
        param_name = value
    but sometimes you use:
        param_name=value
    and in one case you even use both:
        usage= [string]

  - you switch back and forth between single and double quoted
    strings, without much reason.

Consistency is not *required* but it is generally a good idea.
(This is also minor, like deciding between 86400 or 24*60*60.
Eventually, though, lots of minor things add up.)

(I have to admit here that I tend to switch back and forth on
quotes too. :-) )

Another "sixes" case occurs here with the backslashes: to get
a literal backslash in a string, you can use "raw strings".  Here
it makes no difference but I will go ahead and do it just for
illustration:

    parser = argparse.ArgumentParser(
        description='Delete files and folders in a directory N days old',
        add_help=False,
        prog='directorycleaner',
        usage=r'%(prog)s 7 c:\temp')

Finally, I am not sure why you do not want to allow a -h / --help 
option (but if you take out the add_help=False, it's probably best
to take out the call to parser.print_help() too), and there is no
need to supply a usage= argument at all -- argparse() will build
one for you.

>    parser.add_argument('days', type=int, help="Numeric value: delete
>    files and folders older then N days")
>    parser.add_argument('directory', help="delete files and folders in
>    this directory")

(Again, line wrap has broken these; the fixes are obvious so I skip
over them here.)

>    parser.print_help()
>    args = parser.parse_args()

(So far so good, although again, you probably want to remove the call
to print_help() if you allow argparse to add a -h / --help option,
at least.)

>    dictKeys = (vars(args))

There is no *need* to do this, although it is documented as allowed.
I prefer to just use args.<field> myself:

>    HowManyDays = dictKeys['days']
>    WhatDirectory = dictKeys['directory']

so this would become:

    HowManyDays = args.days
    WhatDirectory = args.directory

>    print (HowManyDays)
>    print (WhatDirectory)

These are presumably debug statements and should be removed, but
until then, it might be good to prefix the output with what is
being printed (i.e., a debug message).  (I have taken them out
of my copy, for output shown below.)

(In a fancier program, you could use the logging module and
logging.debug().)

>    DaysToDelete = HowManyDays * epocDay

Right before this would be a good place to create epocDay.

>    dirExists = os.path.exists(WhatDirectory)
>    if dirExists == False: print ("The directory is missing")

An explicit "if expr == False" is generally a bad idea -- if an
expression can be "considered boolean" (and the return value of
os.path.exists certainly can), just write "if not expr".

Most style guides suggest putting subsequent statements on new
lines, rather than right after the ":".

Checking that the directory exists seems reasonable enough.  However,
after printing that it does not, you continue on with code that is
going to immediately raise an OSError exception:

>    DirListing = os.listdir(WhatDirectory)

In general, it is better to try to do the operation, and catch the
failure and do something about it at that point, than to test to
see if the operation is going to succeed.  (Among other things,
this avoids a "race" between program 1 that says "does some directory
exist" and program 2 that says "delete the directory".  If program
1 "wins" this race, the directory does exist at the point of the
test, then program 2 deletes it, then program 1 goes on to access
the now-deleted directory ... and crashes.)

I am using a Unix-like system so what I get may not be quite the
same as what you would get on a Windows-like system, but:

    % cd /tmp
    % python foo.py 1 /tmp/nosuchdir
    The directory is missing
    Traceback (most recent call last):
     ...
    OSError: [Errno 2] No such file or directory: '/tmp/nosuchdir'
    %

More significantly, consider this:

    % python foo.py 1 /tmp/foo.py
    Traceback (most recent call last):
     ...
    OSError: [Errno 20] Not a directory: '/tmp/foo.py'
    %

So instead of the three previous lines, consider:

    try:
        DirListing = os.listdir(WhatDirectory)
    except OSError as err:
        sys.exit("can't read %s: %s" % (WhatDirectory, err))

(you will need to "import sys", and I am using an older version of
Python and the "except OSError, err" syntax, but the effect is the
same).  Now the second example results in:

    % python foo.py 1 /tmp/foo.py
    can't read /tmp/foo.py: [Errno 20] Not a directory: '/tmp/foo.py'
    %

>    for files in DirListing:
>        # Get the absolute path of the file name
>        abspath = (os.path.join(WhatDirectory, files))

This is not necessarily an absolute path -- for instance, if the
program is run as:

    python foo.py 7 rel\ative\path

the joined file names (on a Windows-like system) will be things
like "rel\ative\path\file.txt" and so on.  I would suggest shortening
the variable name to just "path".

The outermost set of parentheses are unnecessary, too.

>        # Get the current creation time of the file in epoc format
>        # (midnight 1/1/1970)
>        FileCreationTime = (os.path.getctime(abspath))

Beware: on Unix-like systems, this gets a "time of last change"
rather than a "time of create".  Even if you are sure you will use
Windows you may wish to use the file's mtime (time of last
"modification"; on Unix-like systems, the distinction between a
"modification" and a "change" is that "change" covers alterations
to the file's meta-data as well as the data, e.g., "chmod a-w file",
making it read-only, changes its ctime but not its mtime, while
"echo foo >> file" changes both its ctime and its mtime -- provided
it is not read-only, of course :-) ).

Again, the outermost parentheses here are unnecessary.

>        # time.ctime converts epoch to a normal date
>        #print (time.ctime(CurrentTime))
>        # Get the date from seven days ago
>        WeekOldFileDate = CurrentTime - DaysToDelete
>        #print (CurrentTime)
>        #print (FileCreationTime)
>        #print (WeekOldFileDate)
>
>        #If the file is older than seve days doe something

Apparently, the program has evolved somewhat: originally you had
"seven days" hardcoded, now you have a variable number.  The
comments, however, have not changed -- and the final variable name
is no longer appropriate.

It is probably also time to ditch the commented-out debug print
statements (and fix the comments, including the typo on the last
one above).

>        if FileCreationTime < WeekOldFileDate:
>            #check if the object is a file
>            if os.path.isfile(abspath): os.remove(abspath)
>            # It is not a file it is a directory
>            elif os.path.isdir(abspath): shutil.rmtree(abspath)

Again, the comment and code do not quite agree: the comment says
"if it is not a file it *is* a directory" but the code says "if it
is not a file, check to see if it is a directory", which leaves
open the possibility that it is some other kind of entity (this is
definitely possible on a Unix-like system, where it could be a
socket, symbolic link, or device node, for instance).

In this particular program, written the way it is, there is no
actual benefit (yet) to doing this, but I suggest moving the guts
of the "clean out a directory" process into a function.  What
this allows is the ability to list more than one directory,
provided of course you also change the argument parser a bit.

Having put this all together (and neutered the actual file-or-directory
removing code) gives me the code below.  There are still plenty of
things you could do with it -- for instance, exiting partway through
processing a list of directories if one in the middle does not exist
is perhaps not optimal:

    % python foo.py 7 /tmp/dir1 /tmp/oops /tmp/dir2

(where /tmp/dir1 and /tmp/dir2 do exist, but /tmp/oops does not)
will clean out /tmp/dir1 but then exit without ever processing
/tmp/dir2.  (There are lots of ways to handle this; you would
have to choose one and implement it.)  Or, instead of the kind of
processing done here, you could generalize it into a Unix-like
"find" command.  (Perhaps with a less-ugly syntax. :-) )  The
"find" command can do what this script does:

    find DIRLIST -ctime +N ( -type d -o -type f ) -exec rm -rf {} \;

but can also a great deal more since (a) it has many other options
than just -ctime, and (b) -exec will execute any arbitrary command.

                ---------------------------

import os
import time
import shutil
import argparse
import sys

def main():
    """
    main program: parse arguments, and clean out directories.
    """
    parser = argparse.ArgumentParser(
        description="Delete files and folders in a directory N days old",
        prog="directorycleaner")
    parser.add_argument("days", type=int,
        help="Numeric value: delete files and folders older than N days")
    parser.add_argument("directory", nargs="+",
        help="delete files and folders in this directory")

    args = parser.parse_args()

    for dirname in args.directory:
        clean_dir(dirname, args.days)

def clean_dir(dirname, n_days):
    """
    Clean one directory of files / subdirectories older than
    the given number of days.
    """
    time_to_live = n_days * 86400 # 86400 = seconds-per-day
    current_time = time.time()

    try:
        contents = os.listdir(dirname)
    except OSError, err:
        sys.exit("can't read %s: %s" % (dirname, err))
    for filename in contents:
        # Get the path of the file name
        path = os.path.join(dirname, filename)
        # Get the creation time of the file
        # NOTE: this only works on Windows-like systems
        when_created = os.path.getctime(path)
        # If the file/directory has expired, remove it
        if when_created + time_to_live < current_time:
            if os.path.isfile(path):
                print "os.remove(%s)" % path
            # It is not a file it is a directory
            elif os.path.isdir(path):
                print "shutil.rmtree(%s)" % path

if __name__ == "__main__":
    main()
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html

[toc] | [prev] | [next] | [standalone]


#6207

FromUlrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date2011-05-25 10:06 +0200
Message-ID<umpua8-s08.ln1@satorlaser.homedns.org>
In reply to#6169
ad wrote:
> Please review the code pasted below. I am wondering what other ways
> there are of performing the same tasks.

On a unix system, you would call "find" with according arguments and then 
handle the found files with "-exec rm ..." or something like that, but I see 
you are on MS Windows.


> args = parser.parse_args()
> 
> dictKeys = (vars(args))

The first of these looks okay, while I don't get the additional brackets in 
the second one. Another habit I observe here is the Hungarian notation of 
prefixing the type to the name and using camelCaps. PEP 8 (IIRC) has 
something to say on the preferred naming. I'm not 100% against encoding the 
type in the variable name in Python, since it lacks static type checking, I 
would have chosen "key_dict" here though, or, due to the small size of the 
overall program just "keys".

> print (HowManyDays)

This puzzled me at first, again the useless additional brackets I thought. 
However, in Python 3, "print" is a function, so that is correct. Still, it 
should be "print(foo)" not "print (foo)".

> for files in DirListing:
> 
>     # Get the absolute path of the file name
>     abspath = (os.path.join(WhatDirectory, files))

"files" is just the name of a single file, right? In that case the name is a 
bit confusing.

>     # Get the date from seven days ago
>     WeekOldFileDate = CurrentTime - DaysToDelete

You are repeating this calculation for every file in the loop.

>     if FileCreationTime < WeekOldFileDate:
>         #check if the object is a file
>         if os.path.isfile(abspath): os.remove(abspath)
>         # It is not a file it is a directory
>         elif os.path.isdir(abspath): shutil.rmtree(abspath)

I'm not sure, but I believe you could use shutil.rmtree() for both files and 
directories. In any case, be prepared for the file still being open or 
otherwise read-only, i.e. for having to handle errors.

Also, what if a directory is old but the content is new? Would this cause 
the non-old content to be deleted?


Cheers!

Uli


-- 
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

[toc] | [prev] | [next] | [standalone]


#6242

Fromad <adsquaired@gmail.com>
Date2011-05-25 06:44 -0700
Message-ID<072d3db3-82fe-4520-a520-db7ac5312edd@l14g2000pro.googlegroups.com>
In reply to#6207
On May 25, 4:06 am, Ulrich Eckhardt <ulrich.eckha...@dominolaser.com>
wrote:
> ad wrote:
> > Please review the code pasted below. I am wondering what other ways
> > there are of performing the same tasks.
>
> On a unix system, you would call "find" with according arguments and then
> handle the found files with "-exec rm ..." or something like that, but I see
> you are on MS Windows.
>
> > args = parser.parse_args()
>
> > dictKeys = (vars(args))
>
> The first of these looks okay, while I don't get the additional brackets in
> the second one. Another habit I observe here is the Hungarian notation of
> prefixing the type to the name and using camelCaps. PEP 8 (IIRC) has
> something to say on the preferred naming. I'm not 100% against encoding the
> type in the variable name in Python, since it lacks static type checking, I
> would have chosen "key_dict" here though, or, due to the small size of the
> overall program just "keys".
>
> > print (HowManyDays)
>
> This puzzled me at first, again the useless additional brackets I thought.
> However, in Python 3, "print" is a function, so that is correct. Still, it
> should be "print(foo)" not "print (foo)".
>
> > for files in DirListing:
>
> >     # Get the absolute path of the file name
> >     abspath = (os.path.join(WhatDirectory, files))
>
> "files" is just the name of a single file, right? In that case the name is a
> bit confusing.
>
> >     # Get the date from seven days ago
> >     WeekOldFileDate = CurrentTime - DaysToDelete
>
> You are repeating this calculation for every file in the loop.
>
> >     if FileCreationTime < WeekOldFileDate:
> >         #check if the object is a file
> >         if os.path.isfile(abspath): os.remove(abspath)
> >         # It is not a file it is a directory
> >         elif os.path.isdir(abspath): shutil.rmtree(abspath)
>
> I'm not sure, but I believe you could use shutil.rmtree() for both files and
> directories. In any case, be prepared for the file still being open or
> otherwise read-only, i.e. for having to handle errors.
>
> Also, what if a directory is old but the content is new? Would this cause
> the non-old content to be deleted?
>
> Cheers!
>
> Uli
>
> --
> Domino Laser GmbH
> Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

Thank you guys very much for the excellent points. I will use this
information as a reference as I write more code and fix up the
existing script.

Chris, thank you for putting so much time into your post!

Until we type again...........

[toc] | [prev] | [next] | [standalone]


#6245

FromIain King <iainking@gmail.com>
Date2011-05-25 07:26 -0700
Message-ID<d6a0608b-64de-4b5c-a741-b0a740d56ba5@h36g2000pro.googlegroups.com>
In reply to#6242
On May 25, 2:44 pm, ad <adsquai...@gmail.com> wrote:
> On May 25, 4:06 am, Ulrich Eckhardt <ulrich.eckha...@dominolaser.com>
> wrote:
>
>
>
> > ad wrote:
> > > Please review the code pasted below. I am wondering what other ways
> > > there are of performing the same tasks.
>
> > On a unix system, you would call "find" with according arguments and then
> > handle the found files with "-exec rm ..." or something like that, but I see
> > you are on MS Windows.
>
> > > args = parser.parse_args()
>
> > > dictKeys = (vars(args))
>
> > The first of these looks okay, while I don't get the additional brackets in
> > the second one. Another habit I observe here is the Hungarian notation of
> > prefixing the type to the name and using camelCaps. PEP 8 (IIRC) has
> > something to say on the preferred naming. I'm not 100% against encoding the
> > type in the variable name in Python, since it lacks static type checking, I
> > would have chosen "key_dict" here though, or, due to the small size of the
> > overall program just "keys".
>
> > > print (HowManyDays)
>
> > This puzzled me at first, again the useless additional brackets I thought.
> > However, in Python 3, "print" is a function, so that is correct. Still, it
> > should be "print(foo)" not "print (foo)".
>
> > > for files in DirListing:
>
> > >     # Get the absolute path of the file name
> > >     abspath = (os.path.join(WhatDirectory, files))
>
> > "files" is just the name of a single file, right? In that case the name is a
> > bit confusing.
>
> > >     # Get the date from seven days ago
> > >     WeekOldFileDate = CurrentTime - DaysToDelete
>
> > You are repeating this calculation for every file in the loop.
>
> > >     if FileCreationTime < WeekOldFileDate:
> > >         #check if the object is a file
> > >         if os.path.isfile(abspath): os.remove(abspath)
> > >         # It is not a file it is a directory
> > >         elif os.path.isdir(abspath): shutil.rmtree(abspath)
>
> > I'm not sure, but I believe you could use shutil.rmtree() for both files and
> > directories. In any case, be prepared for the file still being open or
> > otherwise read-only, i.e. for having to handle errors.
>
> > Also, what if a directory is old but the content is new? Would this cause
> > the non-old content to be deleted?
>
> > Cheers!
>
> > Uli
>
> > --
> > Domino Laser GmbH
> > Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
>
> Thank you guys very much for the excellent points. I will use this
> information as a reference as I write more code and fix up the
> existing script.
>
> Chris, thank you for putting so much time into your post!
>
> Until we type again...........


Wrote something to do the same basic thing a little while ago.  Less
verbose than yours, and it only does files, not folders.  If I was
going to do folders though, I'd do them by recursing into them and
pruning files, and not go by the folder modify date, which I don't
think changes the way you think it changes (for example, if a file
inside a folder got updated such that it shouln't be deleted it still
will be with your code if the folder modify date is old [this is on
Windows])

import os, glob, time, sys

if len(sys.argv) < 3:
    print "USAGE: %s <PATTERN> <DAYS>" % sys.argv[0]
    sys.exit(1)

pattern = sys.argv[1]
days = int(sys.argv[2])
threshold = days * 24 * 60 * 60
t = time.time()

for f in glob.glob(pattern):
    if t - os.stat(f)[9] > threshold:
        print f
        os.remove(f)

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web