Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #88262 > unrolled thread
| Started by | Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> |
|---|---|
| First post | 2015-03-29 04:32 -0700 |
| Last post | 2015-04-03 19:16 -0700 |
| Articles | 4 on this page of 24 — 9 participants |
Back to article view | Back to comp.lang.python
Strategy/ Advice for How to Best Attack this Problem? Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> - 2015-03-29 04:32 -0700
Addendum to Strategy/ Advice for How to Best Attack this Problem? Saran Ahluwalia <ahlusar.ahluwalia@gmail.com> - 2015-03-29 04:37 -0700
Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Peter Otten <__peter__@web.de> - 2015-03-29 14:33 +0200
Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-03-29 05:41 -0700
Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-01 07:17 -0700
Re: Addendum to Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-03-29 09:27 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-03-29 19:00 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Paul Rubin <no.email@nospam.invalid> - 2015-03-29 18:08 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Chris Angelico <rosuav@gmail.com> - 2015-03-30 13:04 +1100
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-03-30 09:45 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-03-30 14:35 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-03-31 04:00 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-03-31 09:19 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-01 06:43 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-04-01 19:51 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 06:06 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-04-02 17:10 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 16:43 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Peter Otten <__peter__@web.de> - 2015-04-03 12:45 +0200
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-03 06:40 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <d@davea.name> - 2015-04-03 07:59 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-03 05:50 -0700
Re: Strategy/ Advice for How to Best Attack this Problem? Dave Angel <davea@davea.name> - 2015-04-03 16:21 -0400
Re: Strategy/ Advice for How to Best Attack this Problem? Rustom Mody <rustompmody@gmail.com> - 2015-04-03 19:16 -0700
Page 2 of 2 — ← Prev page 1 [2]
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2015-04-03 07:59 -0400 |
| Message-ID | <mailman.30.1428062701.12925.python-list@python.org> |
| In reply to | #88463 |
On 04/02/2015 07:43 PM, Saran A wrote:
<snip>
>
> I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
>
> What would you advise in regards to renaming the inaptly named dirlist?
Asked and answered. You called it path at one point. dir_name would
also be good. The point is that if you have a good name, you're less
likely to have unreasonable code processing that name. For example,
>
> # # # Without data to examine here, I can only guess based on this requirement's language that
> # # fixed records are in the input. If so, here's a slight revision to the helper functions that I wrote earlier which
> # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> # I tested this little sample on a small set of files created with MD5 checksums. I wrote the Python in such a way as it
> # would work with Python 2.x or 3.x (note the __future__ at the top).
>
> # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
>
> # # # The only other comments I would make are about safe-file handling.
>
> # # # #1: Question: After a user has created a file that has failed (in
> # # # processing),can the user create a file with the same name?
> # # # If so, then you will probably want to look at some sort
> # # # of file-naming strategy to avoid overwriting evidence of
> # # # earlier failures.
>
> # # # File naming is a tricky thing. I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> # # types of solutions to the problem of choosing a unique filename.
>> ## I am assuming that all of my files are going to be specified in unicode
>
> ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
>
> from __future__ import print_function
>
> import os.path
> import time
> import logging
>
>
> def initialize_logger(output_dir):
<snip>
I didn't ever bother to read the body of this function, since a simple
print(mydata, file=mylog_file)
will suffice to add data to the chosen text file. Why is it constantly
getting more complex, to solve a problem that was simple in the beginning?
>
>
> #Returns filename, rootdir and filesize
>
> def fileinfo(f):
> filename = os.path.basename(f)
> rootdir = os.path.dirname(f)
> filesize = os.path.getsize(f)
> return filename, rootdir, filesize
>
> #returns length of file
> def file_len(f):
> with open(f) as f:
> for i, l in enumerate(f):
> pass
> return i + 1
Always returns 1 or None. Check the indentation, and test to see what
it does for empty file, for a file with one line, and for a file with
more than one line.
> #attempts to copy file and move file to it's directory
> def copy_and_move_file(src, dest):
Which is it, are you trying to copy it, or move it? Pick one and make a
good function name that shows your choice.
> try:
> os.rename(src, dest)
Why are you using rename, when you're trying to move the file? Take a
closer look at shutil, and see if it has a function that does it safer
than rename. The function you need uses rename, when it'll work, and
does it other ways when rename will not.
> # eg. src and dest are the same file
> except IOError as e:
> print('Error: %s' % e.strerror)
A try/except that doesn't try to correct the problem is not generally
useful. Figure out what could be triggering the exception, and how
you're going to handle it. If it cannot be handled, terminate the program.
For example, what if you don't have permissions to modify one of the
specified directories? You can't get any useful work done, so you
should notify the user and exit. An alternative is to produce 50
thousand reports of each file you've got, telling how it succeeded or
failed, over and over.
>
> path = "."
> dirlist = os.listdir(path)
>
> def main(dirlist):
> before = dict([(f, 0) for f in dirlist])
Since dirlist is a path, it's a string. So you're looping through the
characters of the name of the path. I still don't have a clue what the
dict is supposed to mean here.
> while True:
> time.sleep(1) #time between update check
That loop goes forever, so the following code will never run.
> after = dict([(f, None) for f in dirlist])
Once again, you're looping through the letters of the directory name.
Or if dirlist is really a list, and you're deciding that's what it
should be, then of course after will be identical to before.
> added = [f for f in after if not f in before]
> if added:
> f = ''.join(added)
> print('Sucessfully added %s file - ready to validate') %()
> return validate_files(f)
> else:
> return move_to_failure_folder_and_return_error_file(f)
>
>
>
> def validate_files(f):
You completely changed the scope of this function. Why does it still
have an s in its name?
> creation = time.ctime(os.path.getctime(f))
> lastmod = time.ctime(os.path.getmtime(f))
What do timestamps have to do with anything?
> if creation == lastmod and file_len(f) > 0:
> return move_to_success_folder_and_read(f)
How have you decided that the records are all the same length?
> if file_len < 0 and creation != lastmod:
file_len cannot be safely compared to 0, it's a function. If this were
Python 3, it would give you a runtime error for trying. Even if you had
successfully called the function here, how is it possible for it to
return a negative value?
> return move_to_success_folder_and_read(f)
> else:
> return move_to_failure_folder_and_return_error_file(f)
>
>
> #Potential Additions/Substitutions
>
> def move_to_failure_folder_and_return_error_file():
> filename, rootdir, lastmod, creation, filesize = fileinfo(file)
fileinfo() returns 3 things, and you're stuffing them in 5 variables.
> os.mkdir('Failure')
What do you do the second time through this function, when that
directory is already existing?
> copy_and_move_file( 'Failure')
The function takes two arguments, neither of which is likely to be that
string.
> initialize_logger('rootdir/Failure')
> logging.error("Either this file is empty or there are no lines")
>
> def move_to_success_folder_and_read():
> filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> os.mkdir('Success')
> copy_and_move_file(rootdir, 'Success') #file name
> print("Success", file)
> return file_len(file)
>
>
>
> if __name__ == '__main__':
> main(dirlist)
So now you're passing a list of filenames to main()? How then is it
going to know when new files arrive? You were originally going to tell
it a directory name. That's why I suggested that you call the parameter
'path' or 'dirname'.
--
DaveA
[toc] | [prev] | [next] | [standalone]
| From | Saran A <ahlusar.ahluwalia@gmail.com> |
|---|---|
| Date | 2015-04-03 05:50 -0700 |
| Message-ID | <cb95baf5-7c53-4d3d-b388-e525b2d645cf@googlegroups.com> |
| In reply to | #88478 |
On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
> On 04/02/2015 07:43 PM, Saran A wrote:
> <snip>
> >
> > I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
> >
> > What would you advise in regards to renaming the inaptly named dirlist?
>
> Asked and answered. You called it path at one point. dir_name would
> also be good. The point is that if you have a good name, you're less
> likely to have unreasonable code processing that name. For example,
>
> >
> > # # # Without data to examine here, I can only guess based on this requirement's language that
> > # # fixed records are in the input. If so, here's a slight revision to the helper functions that I wrote earlier which
> > # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> > # I tested this little sample on a small set of files created with MD5 checksums. I wrote the Python in such a way as it
> > # would work with Python 2.x or 3.x (note the __future__ at the top).
> >
> > # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> > # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
> >
> > # # # The only other comments I would make are about safe-file handling.
> >
> > # # # #1: Question: After a user has created a file that has failed (in
> > # # # processing),can the user create a file with the same name?
> > # # # If so, then you will probably want to look at some sort
> > # # # of file-naming strategy to avoid overwriting evidence of
> > # # # earlier failures.
> >
> > # # # File naming is a tricky thing. I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> > # # types of solutions to the problem of choosing a unique filename.
> >> ## I am assuming that all of my files are going to be specified in unicode
> >
> > ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
> >
> > from __future__ import print_function
> >
> > import os.path
> > import time
> > import logging
> >
> >
> > def initialize_logger(output_dir):
> <snip>
>
> I didn't ever bother to read the body of this function, since a simple
>
> print(mydata, file=mylog_file)
>
> will suffice to add data to the chosen text file. Why is it constantly
> getting more complex, to solve a problem that was simple in the beginning?
>
> >
> >
> > #Returns filename, rootdir and filesize
> >
> > def fileinfo(f):
> > filename = os.path.basename(f)
> > rootdir = os.path.dirname(f)
> > filesize = os.path.getsize(f)
> > return filename, rootdir, filesize
> >
> > #returns length of file
> > def file_len(f):
> > with open(f) as f:
> > for i, l in enumerate(f):
> > pass
> > return i + 1
>
> Always returns 1 or None. Check the indentation, and test to see what
> it does for empty file, for a file with one line, and for a file with
> more than one line.
>
>
> > #attempts to copy file and move file to it's directory
> > def copy_and_move_file(src, dest):
>
> Which is it, are you trying to copy it, or move it? Pick one and make a
> good function name that shows your choice.
>
> > try:
> > os.rename(src, dest)
>
> Why are you using rename, when you're trying to move the file? Take a
> closer look at shutil, and see if it has a function that does it safer
> than rename. The function you need uses rename, when it'll work, and
> does it other ways when rename will not.
>
> > # eg. src and dest are the same file
> > except IOError as e:
> > print('Error: %s' % e.strerror)
>
> A try/except that doesn't try to correct the problem is not generally
> useful. Figure out what could be triggering the exception, and how
> you're going to handle it. If it cannot be handled, terminate the program.
>
> For example, what if you don't have permissions to modify one of the
> specified directories? You can't get any useful work done, so you
> should notify the user and exit. An alternative is to produce 50
> thousand reports of each file you've got, telling how it succeeded or
> failed, over and over.
>
> >
> > path = "."
> > dirlist = os.listdir(path)
> >
> > def main(dirlist):
> > before = dict([(f, 0) for f in dirlist])
>
> Since dirlist is a path, it's a string. So you're looping through the
> characters of the name of the path. I still don't have a clue what the
> dict is supposed to mean here.
>
> > while True:
> > time.sleep(1) #time between update check
>
> That loop goes forever, so the following code will never run.
>
> > after = dict([(f, None) for f in dirlist])
>
> Once again, you're looping through the letters of the directory name.
> Or if dirlist is really a list, and you're deciding that's what it
> should be, then of course after will be identical to before.
>
> > added = [f for f in after if not f in before]
> > if added:
> > f = ''.join(added)
> > print('Sucessfully added %s file - ready to validate') %()
> > return validate_files(f)
> > else:
> > return move_to_failure_folder_and_return_error_file(f)
> >
> >
> >
> > def validate_files(f):
>
> You completely changed the scope of this function. Why does it still
> have an s in its name?
>
> > creation = time.ctime(os.path.getctime(f))
> > lastmod = time.ctime(os.path.getmtime(f))
>
> What do timestamps have to do with anything?
>
> > if creation == lastmod and file_len(f) > 0:
> > return move_to_success_folder_and_read(f)
>
> How have you decided that the records are all the same length?
>
> > if file_len < 0 and creation != lastmod:
>
> file_len cannot be safely compared to 0, it's a function. If this were
> Python 3, it would give you a runtime error for trying. Even if you had
> successfully called the function here, how is it possible for it to
> return a negative value?
>
> > return move_to_success_folder_and_read(f)
> > else:
> > return move_to_failure_folder_and_return_error_file(f)
> >
> >
> > #Potential Additions/Substitutions
> >
> > def move_to_failure_folder_and_return_error_file():
> > filename, rootdir, lastmod, creation, filesize = fileinfo(file)
>
> fileinfo() returns 3 things, and you're stuffing them in 5 variables.
>
> > os.mkdir('Failure')
>
> What do you do the second time through this function, when that
> directory is already existing?
>
> > copy_and_move_file( 'Failure')
>
> The function takes two arguments, neither of which is likely to be that
> string.
>
> > initialize_logger('rootdir/Failure')
> > logging.error("Either this file is empty or there are no lines")
>
>
> >
> > def move_to_success_folder_and_read():
> > filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> > os.mkdir('Success')
> > copy_and_move_file(rootdir, 'Success') #file name
> > print("Success", file)
> > return file_len(file)
> >
> >
> >
> > if __name__ == '__main__':
> > main(dirlist)
>
> So now you're passing a list of filenames to main()? How then is it
> going to know when new files arrive? You were originally going to tell
> it a directory name. That's why I suggested that you call the parameter
> 'path' or 'dirname'.
>
>
>
> --
> DaveA
@DaveA
I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.
As you correctly stated:
"
> What do you do the second time through this function, when that
> directory is already existing?
>
> > copy_and_move_file( 'Failure')
>
> The function takes two arguments, neither of which is likely to be that
> string.
>
> > initialize_logger('rootdir/Failure')
> > logging.error("Either this file is empty or there are no lines")"
How would I ensure that this s directory is made only once and every file that is passeed goes only to 'success' or 'failure'?
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-04-03 16:21 -0400 |
| Message-ID | <mailman.36.1428092476.12925.python-list@python.org> |
| In reply to | #88480 |
On 04/03/2015 08:50 AM, Saran A wrote:
> On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
>> On 04/02/2015 07:43 PM, Saran A wrote:
>
> I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.
>
Should be quite the opposite. As a novice, you ought to be testing the
heck out of your functions, worrying about whether they are properly
named, properly commented, and properly tested.
>>> os.mkdir('Success')
> As you correctly stated:
>
> "
>> What do you do the second time through this function, when that
>> directory is already existing?
>>
>>> copy_and_move_file( 'Failure')
>>
>> The function takes two arguments, neither of which is likely to be that
>> string.
>>
>>> initialize_logger('rootdir/Failure')
>>> logging.error("Either this file is empty or there are no lines")"
>
>
> How would I ensure that this s directory is made only once and every file that is passeed goes only to 'success' or 'failure'?
>
Well, you could use an if clause checking with os.exist(). If the
directory already exists, don't call the mkdir function. That may not
be perfect, but it should suffice for an assignment at your level.
Alternatively, you could set a global variable equal to 'Failure' or
whatever the full path to the directory is going to be, and do a mkdir
at the beginning of main(). Likewise for success directory, and the
output text file. In that case, of course, instead of creating the
directory, you open the file (for append, of course, so the next run of
the program doesn't trash the file), and keep the file handle handy.
--
DaveA
[toc] | [prev] | [next] | [standalone]
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2015-04-03 19:16 -0700 |
| Message-ID | <421dcb67-5d2d-4666-a3af-b0b5c61f3a74@googlegroups.com> |
| In reply to | #88492 |
On Saturday, April 4, 2015 at 1:52:20 AM UTC+5:30, Dave Angel wrote: > On 04/03/2015 08:50 AM, Saran A wrote: > > On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote: > >> On 04/02/2015 07:43 PM, Saran A wrote: > > > > I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering. > > > > Should be quite the opposite. As a novice, you ought to be testing the > heck out of your functions, worrying about whether they are properly > named, properly commented, and properly tested. I think this is a culture problem. We dont (typically) take python so seriously that we are 'beholden' (!!!) to python/computer/python's creator/each other out here/etc. In other cultures (sometimes called 'bondage-and-discipline' languages http://c2.com/cgi/wiki?BondageAndDisciplineLanguage you are expected to write mountains of documentation before writing code. The best we can make out, you are writing too much code, trying out (also called hacking) too little. Remember python is named after Monty Python. If you hunt youtube for Monty python or other productions of John Cleese like Fawlty Towers you'll get a picture of the spirit Not directly related but recently posted on this list https://www.youtube.com/watch?v=kAG39jKi0lI In short, if you are not having fun you are not getting python. If you are having fun, mucking around is not taboo... Also called using the interpreter interactively
[toc] | [prev] | [standalone]
Page 2 of 2 — ← Prev page 1 [2]
Back to top | Article view | comp.lang.python
csiph-web