Groups > comp.lang.python > #88262 > unrolled thread

Strategy/ Advice for How to Best Attack this Problem?

Started by	Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
First post	2015-03-29 04:32 -0700
Last post	2015-04-03 19:16 -0700
Articles	4 on this page of 24 — 9 participants

Back to article view | Back to comp.lang.python

  Strategy/ Advice for How to Best Attack this Problem? Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> - 2015-03-29 04:32 -0700
    Addendum to Strategy/ Advice for How to Best Attack this Problem? Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> - 2015-03-29 04:37 -0700
      Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Peter Otten <__peter__@web.de> - 2015-03-29 14:33 +0200
        Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-03-29 05:41 -0700
        Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-01 07:17 -0700
      Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-03-29 09:27 -0400
    Re: Strategy/ Advice for How to Best Attack this Problem? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-03-29 19:00 -0400
    Re: Strategy/ Advice for How to Best Attack this Problem? Paul Rubin <no.email@nospam.invalid> - 2015-03-29 18:08 -0700
      Re: Strategy/ Advice for How to Best Attack this Problem? Chris Angelico <rosuav@gmail.com> - 2015-03-30 13:04 +1100
        Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-03-30 09:45 -0700
          Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-03-30 14:35 -0400
            Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-03-31 04:00 -0700
              Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-03-31 09:19 -0400
                Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-01 06:43 -0700
                  Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-04-01 19:51 -0400
                    Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 06:06 -0700
                      Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-04-02 17:10 -0400
                        Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 16:43 -0700
                          Re: Strategy/ Advice for How to Best Attack this Problem? Peter Otten <__peter__@web.de> - 2015-04-03 12:45 +0200
                            Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-03 06:40 -0700
                          Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <d@davea.name> - 2015-04-03 07:59 -0400
                            Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-03 05:50 -0700
                              Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-04-03 16:21 -0400
                                Re: Strategy/ Advice for How to Best Attack this Problem? Rustom Mody <rustompmody@gmail.com> - 2015-04-03 19:16 -0700

Page 2 of 2 — ← Prev page 1 [2]

#88478

From	Dave Angel <d@davea.name>
Date	2015-04-03 07:59 -0400
Message-ID	<mailman.30.1428062701.12925.python-list@python.org>
In reply to	#88463

On 04/02/2015 07:43 PM, Saran A wrote:
    <snip>
>
> I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
>
> What would you advise in regards to renaming the inaptly named dirlist?

Asked and answered.  You called it path at one point.  dir_name would 
also be good.  The point is that if you have a good name, you're less 
likely to have unreasonable code processing that name.  For example,

>
> # # # Without data to examine here, I can only guess based on this requirement's language that
> # # fixed records are in the input.  If so, here's a slight revision to the helper functions that I wrote earlier which
> # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> # I tested this little sample on a small set of files created with MD5 checksums.  I wrote the Python in such a way as it
> # would work with Python 2.x or 3.x (note the __future__ at the top).
>
> # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
>
> # # # The only other comments I would make are about safe-file handling.
>
> # # #   #1:  Question: After a user has created a file that has failed (in
> # # #        processing),can the user create a file with the same name?
> # # #        If so, then you will probably want to look at some sort
> # # #        of file-naming strategy to avoid overwriting evidence of
> # # #        earlier failures.
>
> # # # File naming is a tricky thing.  I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> # # types of solutions to the problem of choosing a unique filename.
>> ## I am assuming that all of my files are going to be specified in unicode
>
> ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
>
> from __future__ import print_function
>
> import os.path
> import time
> import logging
>
>
> def initialize_logger(output_dir):
            <snip>

I didn't ever bother to read the body of this function, since a simple

       print(mydata, file=mylog_file)

will suffice to add data to the chosen text file.  Why is it constantly 
getting more complex, to solve a problem that was simple in the beginning?

>
>
> #Returns filename, rootdir and filesize
>
> def fileinfo(f):
>      filename = os.path.basename(f)
>      rootdir = os.path.dirname(f)
>      filesize = os.path.getsize(f)
>      return filename, rootdir, filesize
>
> #returns length of file
> def file_len(f):
>      with open(f) as f:
>          for i, l in enumerate(f):
>              pass
>              return i + 1

Always returns 1 or None.  Check the indentation, and test to see what 
it does for empty file, for a file with one line, and for a file with 
more than one line.

> #attempts to copy file and move file to it's directory
> def copy_and_move_file(src, dest):

Which is it, are you trying to copy it, or move it?  Pick one and make a 
good function name that shows your choice.

>      try:
>          os.rename(src, dest)

Why are you using rename, when you're trying to move the file?  Take a 
closer look at shutil, and see if it has a function that does it safer 
than rename.  The function you need uses rename, when it'll work, and 
does it other ways when rename will not.

>          # eg. src and dest are the same file
>      except IOError as e:
>          print('Error: %s' % e.strerror)

A try/except that doesn't try to correct the problem is not generally 
useful.  Figure out what could be triggering the exception, and how 
you're going to handle it.  If it cannot be handled, terminate the program.

For example, what if you don't have permissions to modify one of the 
specified directories?  You can't get any useful work done, so you 
should notify the user and exit.  An alternative is to produce 50 
thousand reports of each file you've got, telling how it succeeded or 
failed, over and over.

>
> path = "."
> dirlist = os.listdir(path)
>
> def main(dirlist):
>      before = dict([(f, 0) for f in dirlist])

Since dirlist is a path, it's a string.  So you're looping through the 
characters of the name of the path.  I still don't have a clue what the 
dict is supposed to mean here.

>      while True:
>          time.sleep(1) #time between update check

That loop goes forever, so the following code will never run.

>      after = dict([(f, None) for f in dirlist])

Once again, you're looping through the letters of the directory name. 
Or if dirlist is really a list, and you're deciding that's what it 
should be, then of course after will be identical to before.

>      added = [f for f in after if not f in before]
>      if added:
>          f = ''.join(added)
>          print('Sucessfully added %s file - ready to validate') %()
>          return validate_files(f)
>      else:
>          return move_to_failure_folder_and_return_error_file(f)
>
>
>
> def validate_files(f):

You completely changed the scope of this function.  Why does it still 
have an s in its name?

>      creation = time.ctime(os.path.getctime(f))
>      lastmod = time.ctime(os.path.getmtime(f))

What do timestamps have to do with anything?

>      if creation == lastmod and file_len(f) > 0:
>          return move_to_success_folder_and_read(f)

How have you decided that the records are all the same length?

>      if file_len < 0 and creation != lastmod:

file_len cannot be safely compared to 0, it's a function.  If this were 
Python 3, it would give you a runtime error for trying.  Even if you had 
successfully called the function here, how is it possible for it to 
return a negative value?

>          return move_to_success_folder_and_read(f)
>      else:
>          return move_to_failure_folder_and_return_error_file(f)
>
>
> #Potential Additions/Substitutions
>
> def move_to_failure_folder_and_return_error_file():
>      filename, rootdir, lastmod, creation, filesize = fileinfo(file)

fileinfo() returns 3 things, and you're stuffing them in 5 variables.

>      os.mkdir('Failure')

What do you do the second time through this function, when that 
directory is already existing?

>      copy_and_move_file( 'Failure')

The function takes two arguments, neither of which is likely to be that 
string.

>      initialize_logger('rootdir/Failure')
>      logging.error("Either this file is empty or there are no lines")

>
> def move_to_success_folder_and_read():
>      filename, rootdir, lastmod, creation, filesize = fileinfo(file)
>      os.mkdir('Success')
>      copy_and_move_file(rootdir, 'Success') #file name
>      print("Success", file)
>      return file_len(file)
>
>
>
> if __name__ == '__main__':
>     main(dirlist)

So now you're passing a list of filenames to main()?  How then is it 
going to know when new files arrive?  You were originally going to tell 
it a directory name.  That's why I suggested that you call the parameter 
'path' or 'dirname'.

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#88480

From	Saran A <ahlusar.ahluwalia@gmail.com>
Date	2015-04-03 05:50 -0700
Message-ID	<cb95baf5-7c53-4d3d-b388-e525b2d645cf@googlegroups.com>
In reply to	#88478

On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
> On 04/02/2015 07:43 PM, Saran A wrote:
>     <snip>
> >
> > I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
> >
> > What would you advise in regards to renaming the inaptly named dirlist?
> 
> Asked and answered.  You called it path at one point.  dir_name would 
> also be good.  The point is that if you have a good name, you're less 
> likely to have unreasonable code processing that name.  For example,
> 
> >
> > # # # Without data to examine here, I can only guess based on this requirement's language that
> > # # fixed records are in the input.  If so, here's a slight revision to the helper functions that I wrote earlier which
> > # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> > # I tested this little sample on a small set of files created with MD5 checksums.  I wrote the Python in such a way as it
> > # would work with Python 2.x or 3.x (note the __future__ at the top).
> >
> > # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> > # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
> >
> > # # # The only other comments I would make are about safe-file handling.
> >
> > # # #   #1:  Question: After a user has created a file that has failed (in
> > # # #        processing),can the user create a file with the same name?
> > # # #        If so, then you will probably want to look at some sort
> > # # #        of file-naming strategy to avoid overwriting evidence of
> > # # #        earlier failures.
> >
> > # # # File naming is a tricky thing.  I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> > # # types of solutions to the problem of choosing a unique filename.
> >> ## I am assuming that all of my files are going to be specified in unicode
> >
> > ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
> >
> > from __future__ import print_function
> >
> > import os.path
> > import time
> > import logging
> >
> >
> > def initialize_logger(output_dir):
>             <snip>
> 
> I didn't ever bother to read the body of this function, since a simple
> 
>        print(mydata, file=mylog_file)
> 
> will suffice to add data to the chosen text file.  Why is it constantly 
> getting more complex, to solve a problem that was simple in the beginning?
> 
> >
> >
> > #Returns filename, rootdir and filesize
> >
> > def fileinfo(f):
> >      filename = os.path.basename(f)
> >      rootdir = os.path.dirname(f)
> >      filesize = os.path.getsize(f)
> >      return filename, rootdir, filesize
> >
> > #returns length of file
> > def file_len(f):
> >      with open(f) as f:
> >          for i, l in enumerate(f):
> >              pass
> >              return i + 1
> 
> Always returns 1 or None.  Check the indentation, and test to see what 
> it does for empty file, for a file with one line, and for a file with 
> more than one line.
> 
> 
> > #attempts to copy file and move file to it's directory
> > def copy_and_move_file(src, dest):
> 
> Which is it, are you trying to copy it, or move it?  Pick one and make a 
> good function name that shows your choice.
> 
> >      try:
> >          os.rename(src, dest)
> 
> Why are you using rename, when you're trying to move the file?  Take a 
> closer look at shutil, and see if it has a function that does it safer 
> than rename.  The function you need uses rename, when it'll work, and 
> does it other ways when rename will not.
> 
> >          # eg. src and dest are the same file
> >      except IOError as e:
> >          print('Error: %s' % e.strerror)
> 
> A try/except that doesn't try to correct the problem is not generally 
> useful.  Figure out what could be triggering the exception, and how 
> you're going to handle it.  If it cannot be handled, terminate the program.
> 
> For example, what if you don't have permissions to modify one of the 
> specified directories?  You can't get any useful work done, so you 
> should notify the user and exit.  An alternative is to produce 50 
> thousand reports of each file you've got, telling how it succeeded or 
> failed, over and over.
> 
> >
> > path = "."
> > dirlist = os.listdir(path)
> >
> > def main(dirlist):
> >      before = dict([(f, 0) for f in dirlist])
> 
> Since dirlist is a path, it's a string.  So you're looping through the 
> characters of the name of the path.  I still don't have a clue what the 
> dict is supposed to mean here.
> 
> >      while True:
> >          time.sleep(1) #time between update check
> 
> That loop goes forever, so the following code will never run.
> 
> >      after = dict([(f, None) for f in dirlist])
> 
> Once again, you're looping through the letters of the directory name. 
> Or if dirlist is really a list, and you're deciding that's what it 
> should be, then of course after will be identical to before.
> 
> >      added = [f for f in after if not f in before]
> >      if added:
> >          f = ''.join(added)
> >          print('Sucessfully added %s file - ready to validate') %()
> >          return validate_files(f)
> >      else:
> >          return move_to_failure_folder_and_return_error_file(f)
> >
> >
> >
> > def validate_files(f):
> 
> You completely changed the scope of this function.  Why does it still 
> have an s in its name?
> 
> >      creation = time.ctime(os.path.getctime(f))
> >      lastmod = time.ctime(os.path.getmtime(f))
> 
> What do timestamps have to do with anything?
> 
> >      if creation == lastmod and file_len(f) > 0:
> >          return move_to_success_folder_and_read(f)
> 
> How have you decided that the records are all the same length?
> 
> >      if file_len < 0 and creation != lastmod:
> 
> file_len cannot be safely compared to 0, it's a function.  If this were 
> Python 3, it would give you a runtime error for trying.  Even if you had 
> successfully called the function here, how is it possible for it to 
> return a negative value?
> 
> >          return move_to_success_folder_and_read(f)
> >      else:
> >          return move_to_failure_folder_and_return_error_file(f)
> >
> >
> > #Potential Additions/Substitutions
> >
> > def move_to_failure_folder_and_return_error_file():
> >      filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> 
> fileinfo() returns 3 things, and you're stuffing them in 5 variables.
> 
> >      os.mkdir('Failure')
> 
> What do you do the second time through this function, when that 
> directory is already existing?
> 
> >      copy_and_move_file( 'Failure')
> 
> The function takes two arguments, neither of which is likely to be that 
> string.
> 
> >      initialize_logger('rootdir/Failure')
> >      logging.error("Either this file is empty or there are no lines")
> 
> 
> >
> > def move_to_success_folder_and_read():
> >      filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> >      os.mkdir('Success')
> >      copy_and_move_file(rootdir, 'Success') #file name
> >      print("Success", file)
> >      return file_len(file)
> >
> >
> >
> > if __name__ == '__main__':
> >     main(dirlist)
> 
> So now you're passing a list of filenames to main()?  How then is it 
> going to know when new files arrive?  You were originally going to tell 
> it a directory name.  That's why I suggested that you call the parameter 
> 'path' or 'dirname'.
> 
> 
> 
> -- 
> DaveA

@DaveA

I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.

As you correctly stated:

"
> What do you do the second time through this function, when that 
> directory is already existing?
> 
> >      copy_and_move_file( 'Failure')
> 
> The function takes two arguments, neither of which is likely to be that 
> string.
> 
> >      initialize_logger('rootdir/Failure')
> >      logging.error("Either this file is empty or there are no lines")"


How would I ensure that this s directory is made only once and every file that is passeed goes only to 'success' or 'failure'?

[toc] | [prev] | [next] | [standalone]

#88492

From	Dave Angel <davea@davea.name>
Date	2015-04-03 16:21 -0400
Message-ID	<mailman.36.1428092476.12925.python-list@python.org>
In reply to	#88480

On 04/03/2015 08:50 AM, Saran A wrote:
> On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
>> On 04/02/2015 07:43 PM, Saran A wrote:
>
> I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.
>

Should be quite the opposite.  As a novice, you ought to be testing the 
heck out of your functions, worrying about whether they are properly 
named, properly commented, and properly tested.

 >>>       os.mkdir('Success')

> As you correctly stated:
>
> "
>> What do you do the second time through this function, when that
>> directory is already existing?
>>
>>>       copy_and_move_file( 'Failure')
>>
>> The function takes two arguments, neither of which is likely to be that
>> string.
>>
>>>       initialize_logger('rootdir/Failure')
>>>       logging.error("Either this file is empty or there are no lines")"
>
>
> How would I ensure that this s directory is made only once and every file that is passeed goes only to 'success' or 'failure'?
>

Well, you could use an if clause checking with os.exist().  If the 
directory already exists, don't call the mkdir function.  That may not 
be perfect, but it should suffice for an assignment at your level.

Alternatively, you could set a global variable equal to 'Failure' or 
whatever the full path to the directory is going to be, and do a mkdir 
at the beginning of main().   Likewise for success directory, and the 
output text file.  In that case, of course, instead of creating the 
directory, you open the file (for append, of course, so the next run of 
the program doesn't trash the file), and keep the file handle handy.

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#88497

From	Rustom Mody <rustompmody@gmail.com>
Date	2015-04-03 19:16 -0700
Message-ID	<421dcb67-5d2d-4666-a3af-b0b5c61f3a74@googlegroups.com>
In reply to	#88492

On Saturday, April 4, 2015 at 1:52:20 AM UTC+5:30, Dave Angel wrote:
> On 04/03/2015 08:50 AM, Saran A wrote:
> > On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
> >> On 04/02/2015 07:43 PM, Saran A wrote:
> >
> > I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.
> >
> 
> Should be quite the opposite.  As a novice, you ought to be testing the 
> heck out of your functions, worrying about whether they are properly 
> named, properly commented, and properly tested.

I think this is a culture problem.

We dont (typically) take python so seriously that we are 'beholden' (!!!) to 
python/computer/python's creator/each other out here/etc.

In other cultures (sometimes called 'bondage-and-discipline' languages
http://c2.com/cgi/wiki?BondageAndDisciplineLanguage
you are expected to write mountains of documentation before writing code.

The best we can make out, you are writing too much code, trying out (also
called hacking) too little.

Remember python is named after Monty Python.

If you hunt youtube for Monty python or other productions of John Cleese
like Fawlty Towers you'll get a picture of the spirit

Not directly related but recently posted on this list
https://www.youtube.com/watch?v=kAG39jKi0lI

In short, if you are not having fun you are not getting python.
If you are having fun, mucking around is not taboo...
Also called using the interpreter interactively

[toc] | [prev] | [standalone]

Page 2 of 2 — ← Prev page 1 [2]

csiph-web

Strategy/ Advice for How to Best Attack this Problem?

Contents

#88478

#88480

#88492

#88497