New to Programming: TypeError: coercing to Unicode: need string or buffer, list found

Started by: Saran A <ahlusar.ahluwalia@gmail.com>
First post: 2015-04-02 05:02 -0700
Last post: 2015-04-02 05:51 -0700
Articles 12 — 4 participants

Contents

  New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 05:02 -0700
    Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Chris Angelico <rosuav@gmail.com> - 2015-04-02 23:24 +1100
      Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 05:46 -0700
        Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Chris Angelico <rosuav@gmail.com> - 2015-04-03 00:06 +1100
          Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 06:28 -0700
            Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Chris Angelico <rosuav@gmail.com> - 2015-04-03 00:57 +1100
        Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-04-02 20:03 -0400
          Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 17:14 -0700
            Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-04-03 11:33 -0400
        Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Chris Angelico <rosuav@gmail.com> - 2015-04-03 11:12 +1100
    Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Peter Otten <__peter__@web.de> - 2015-04-02 14:26 +0200
      Re: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found Saran A <ahlusar.ahluwalia@gmail.com> - 2015-04-02 05:51 -0700

#88441 — New to Programming: TypeError: coercing to Unicode: need string or buffer, list found

From: Saran A <ahlusar.ahluwalia@gmail.com>
Date: 2015-04-02 05:02 -0700
Subject: New to Programming: TypeError: coercing to Unicode: need string or buffer, list found
Message-ID: <6203299c-f9b2-4169-9d68-4c92e0f7b32f@googlegroups.com>
Good Morning:

I understand this error message when I run this code. However, I am curious to know what the most pythonic way is to convert the list to a string. I use Python 2.7.

"Traceback (most recent call last):
before = dict([(f, None) for f in os.listdir(dirlist)])
TypeError: coercing to Unicode: need string or buffer, list found"


The sample code that I am trying to run is:

path = "/Users/Desktop/Projects/"
dirlist = os.listdir(path)
before = dict([(f, None) for f in os.listdir(dirlist)])

def main(dirlist):
    while True:
        time.sleep(10) #time between update check
    after = dict([(f, None) for f in os.listdir(dirlist)])
    added = [f for f in after if not f in before]
    if added:
        print('Successfully added new file - ready to validate')
if __name__ == "__main__": 
    main() 





#88442

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-04-02 23:24 +1100
Message-ID: <mailman.18.1427977502.12925.python-list@python.org>
In reply to: #88441
On Thu, Apr 2, 2015 at 11:02 PM, Saran A <ahlusar.ahluwalia@gmail.com> wrote:
> I understand this error message when I run this code. However, I am curious to know what the most pythonic way is to convert  the list to a string? I use Python 2.7.
>

I don't think you actually want to convert a list into a string, here.
Tell me if I'm understanding your code's intention correctly:

> The sample code that I am trying to run is:
>
> path = "/Users/Desktop/Projects/"
> dirlist = os.listdir(path)
> before = dict([(f, None) for f in os.listdir(dirlist)])

Start up and get a full list of pre-existing files.

> def main(dirlist):
>     while True:
>         time.sleep(10) #time between update check

Then, every ten seconds...

>     after = dict([(f, None) for f in os.listdir(dirlist)])
>     added = [f for f in after if not f in before]

... get a list of files, and if there are new ones...

>     if added:
>         print('Successfully added new file - ready to validate')
> if __name__ == "__main__":
>     main()

... print out a message.

If that's what you're trying to do, I would suggest using a directory
notification system instead. Here's one that I use on Linux:

https://github.com/Rosuav/shed/blob/master/dirwatch.py

Here's another one, this time built for Windows:

https://github.com/Rosuav/shed/blob/master/senddir.py

But even if you absolutely have to poll, like that, you'll need to
make a few code changes. The exception you're getting is symptomatic
of just one problem with the code as published. My suspicion is that
you just want to use listdir(path) rather than listdir(dirlist) - but
if you want subdirectories, then you'll need to do things a bit
differently (probably using os.walk instead).
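
A minimal sketch of the fix suggested above, assuming the goal is a snapshot of the names in one directory. os.listdir() takes a path string, so handing it the list it returned earlier is what raises the TypeError. The helper names snapshot, snapshot_recursive and added_since are hypothetical, and the os.walk variant is only one way to include subdirectories.

```python
import os


def snapshot(path):
    """Return the set of entry names directly inside `path`."""
    # os.listdir() wants a path string, not the list it returned earlier.
    return set(os.listdir(path))


def snapshot_recursive(path):
    """Return the set of file paths under `path`, including subdirectories."""
    found = set()
    for root, _dirs, files in os.walk(path):
        for name in files:
            found.add(os.path.join(root, name))
    return found


def added_since(before, after):
    """Names present in `after` but not in `before`."""
    return after - before
```

Taking one snapshot at startup and another after each sleep, then calling added_since(before, after), gives the "new files" test without any list-to-string conversion.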

Also: You say you're using Python 2.7. If you have no particular
reason to use 2.7, you'll do better to jump to Python 3. Your code
will probably run identically, when it's this simple.

ChrisA



#88444

From: Saran A <ahlusar.ahluwalia@gmail.com>
Date: 2015-04-02 05:46 -0700
Message-ID: <57693d65-e683-4972-ac8d-97b2feace3bb@googlegroups.com>
In reply to: #88442

@ChrisA - this is a smaller function that will take the most updated file. My intention is the following:

* Monitor a folder for files that are dropped throughout the day

* When a file is dropped in the folder the program should scan the file

    o IF all the contents in the file have the same length (let's assume line length)
    o THEN the file should be moved to a "success" folder and a text file written indicating the total number of records/lines/words processed
    o IF the file is empty OR the contents are not all of the same length
    o THEN the file should be moved to a "failure" folder and a text file written indicating the cause for failure (for example: Empty file or line 100 was not the same length as the rest).

Here is the code I have written:

import os
import time
import glob
import sys

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
     
    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)
 
    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

#Helper Functions for the Success and Failure Folder Outcomes, respectively

#checks the length of the file
    def file_len(filename
        with open(filename) as f:
            for i, l in enumerate(f):
                pass
            return i + 1

#copies file to new destination

    def copyFile(src, dest):
        try:
            shutil.copy(src, dest)
        # eg. src and dest are the same file
        except shutil.Error as e:
            print('Error: %s' % e)
        # eg. source or destination doesn't exist
        except IOError as e:
            print('Error: %s' % e.strerror)

#Failure Folder

def move_to_failure_folder_and_return_error_file():
    os.mkdir('Failure')
    copyFile(filename, 'Failure')
    initialize_logger('rootdir/Failure')
    logging.error("Either this file is empty or the lines")
     
# Success Folder Requirement
             
def move_to_success_folder_and_read(file):
    os.mkdir('Success')
    copyFile(filename, 'Success')
    print("Success", file)
    return file_len()


#This simply checks the file information by name

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)
    return filename, rootdir, lastmod, creation, filesize

if __name__ == '__main__':
   import sys
   validate_files(sys.argv[1:])



#88447

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-04-03 00:06 +1100
Message-ID: <mailman.21.1427979994.12925.python-list@python.org>
In reply to: #88444
On Thu, Apr 2, 2015 at 11:46 PM, Saran A <ahlusar.ahluwalia@gmail.com> wrote:
> @ChrisA - this is a smaller function that will take the most updated file. My intention is the following:
>
> * Monitor a folder for files that are dropped throughout the day
>
> * When a file is dropped in the folder the program should scan the file
>
> o IF all the contents in the file have the same length (let's assume line length)
>
> o THEN the file should be moved to a "success" folder and a text file written indicating the total number of records/lines/words processed
>
> o IF the file is empty OR the contents are not all of the same length
>
> o THEN the file should be moved to a "failure" folder and a text file written indicating the cause for failure (for example: Empty file or line 100 was not the same length as the rest).
>

Sounds like a perfect job for inotify, then. Your function will be
called whenever there's a new file.

> Here is the code I have written:
>
> def initialize_logger(output_dir):
>     logger = logging.getLogger()
>     ...
>     def file_len(filename
>         with open(filename) as f:
>             for i, l in enumerate(f):
>                 pass
>             return i + 1

These functions are all getting defined inside your
initialize_logger() function. I suspect you want them to be flush left
instead.
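
To illustrate the indentation point, here is a small sketch with hypothetical names (initialize, nested_helper, module_helper). A def indented under another function creates a local function that the rest of the module cannot call by name, while a flush-left def is visible everywhere in the module.

```python
def initialize():
    # Defined inside initialize(): a fresh local function on every call,
    # not reachable by name from the rest of the module.
    def nested_helper():
        return "nested"
    return nested_helper()


def module_helper():
    # Flush left: defined once at import time and callable from anywhere.
    return "module level"
```

Calling nested_helper() from module level would raise NameError, which is what would happen to file_len and copyFile in the code as posted.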

>     def copyFile(src, dest):
>         try:
>             shutil.copy(src, dest)
>         # eg. src and dest are the same file
>         except shutil.Error as e:
>             print('Error: %s' % e)
>         # eg. source or destination doesn't exist
>         except IOError as e:
>             print('Error: %s' % e.strerror)

Recommendation: Skip the try/except, and just let exceptions bubble
up. Don't just print out messages and keep going.
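
A rough sketch of what "letting exceptions bubble up" could look like, assuming a plain copy is all that is needed. The name copy_file is hypothetical. The helper does no error handling of its own, so a failure stops the program with a full traceback instead of printing a message and carrying on with a file that was never copied.

```python
import shutil


def copy_file(src, dest):
    """Copy src to dest; any error propagates to the caller."""
    shutil.copy(src, dest)
```

If some caller genuinely knows how to recover (for example by routing the file to a failure folder), that is the level where a try/except belongs.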

> def move_to_failure_folder_and_return_error_file():
>     os.mkdir('Failure')
>     copyFile(filename, 'Failure')
>     initialize_logger('rootdir/Failure')
>     logging.error("Either this file is empty or the lines")

This doesn't move the file, it copies it. Is that your intention?

Moving a file is pretty easy. Just use os.rename().
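
A minimal sketch of a move built on os.rename(), assuming source and destination are on the same filesystem. The helper name move_into is hypothetical. Note that os.rename() needs the full target path, not just the name of the destination folder.

```python
import os


def move_into(src, dest_dir):
    """Move src into dest_dir, keeping its base name; return the new path."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    target = os.path.join(dest_dir, os.path.basename(src))
    # os.rename needs the full target path and only works within one filesystem.
    os.rename(src, target)
    return target
```

If the folders might live on different filesystems, shutil.move(src, target) is the usual fallback, since it copies and then deletes when a plain rename is not possible.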

> if __name__ == '__main__':
>    import sys
>    validate_files(sys.argv[1:])

I've no idea what validate_files() does, as you haven't included that.

I think you could code this fairly efficiently as a simple callback
off pyinotify, or if you're not on Linux, with one of the equivalent
services. What you're doing here (watching for files, looking inside
them, and moving them when done) is pretty common around the world.

ChrisA



#88456

From: Saran A <ahlusar.ahluwalia@gmail.com>
Date: 2015-04-02 06:28 -0700
Message-ID: <33906269-3688-48fc-8315-e88eac644ace@googlegroups.com>
In reply to: #88447

@ChrisA

validate_files will:

#double check for record time and record length - logic to be written to either pass to Failure or Success folder respectively. I welcome your thoughts on this.  

def validate_files(): 
    creation = time.ctime(os.path.getctime(added)) 
    lastmod = time.ctime(os.path.getmtime(added)) 


Does this address the issue? I particularly like writing my own exception handling, as it gives me more information about what could be the root cause. I know that in other circumstances try/except is not the best practice. I appreciate the reminder, though.

Does this modification to copyFile do the job of moving the file? I haven't written a test yet. 

Thanks for catching the indentation for the helper functions. 
   
def copyFile(src, dest):
    try:
        shutil.rename(src, dest)
    # eg. src and dest are the same file
    except shutil.Error as e:
        print('Error: %s' % e)
    # eg. source or destination doesn't exist
    except IOError as e:
        print('Error: %s' % e.strerror)



#88457

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-04-03 00:57 +1100
Message-ID: <mailman.22.1427983064.12925.python-list@python.org>
In reply to: #88456
On Fri, Apr 3, 2015 at 12:28 AM, Saran A <ahlusar.ahluwalia@gmail.com> wrote:
> Does this modification to copyFile do the job of moving the file? I haven't written a test yet.
>
> Thanks for catching the indentation for the helper functions.
>
> def copyFile(src, dest):
>> >         try:
>> >             shutil.rename(src, dest)
>> >         # eg. src and dest are the same file
>> >         except shutil.Error as e:
>> >             print('Error: %s' % e)
>> >         # eg. source or destination doesn't exist
>> >         except IOError as e:
>> >             print('Error: %s' % e.strerror)

You shouldn't need shutil here; just os.rename(src, dest) should do
the trick. But be careful! Now you have a function which moves a file,
and it's called "copyFile". If its purpose changes, so should its
name.

Have fun!

ChrisA



#88464

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2015-04-02 20:03 -0400
Message-ID: <mailman.24.1428019417.12925.python-list@python.org>
In reply to: #88444
On Thu, 2 Apr 2015 05:46:57 -0700 (PDT), Saran A
<ahlusar.ahluwalia@gmail.com> declaimed the following:

>
>@ChrisA - this is a smaller function that will take the most updated file. My intention is the following:
>
>* Monitor a folder for files that are dropped throughout the day
>
	I would suggest that your first prototype be a program that
contains a function whose only purpose is to report on the files it finds
-- forget about all the processing/moving of the files until you can
successfully loop around the work of fetching the directory and handling
the file names found (perhaps by printing the names of the ones determined
to be new since the last fetch).
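
One way such a report-only prototype might look, as a sketch under assumptions: Python 3 print(), a single directory with no subdirectories, and the hypothetical names poll_once and report_new_files. The rounds parameter exists only so the loop can be stopped when trying it out.

```python
import os
import time


def poll_once(directory, seen):
    """Return (new_names, updated_seen) for one look at `directory`."""
    current = set(os.listdir(directory))
    new_names = sorted(current - seen)
    return new_names, current


def report_new_files(directory, interval=10, rounds=None):
    """Print names that appear in `directory`; rounds=None loops forever."""
    seen = set(os.listdir(directory))
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        # The directory is re-read on every pass, inside the loop.
        new_names, seen = poll_once(directory, seen)
        for name in new_names:
            print("new file:", name)
        done += 1
```

Keeping the directory read inside the loop is the important part: a listing taken once before the loop can never show a file that arrives later.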

>* When a file is dropped in the folder the program should scan the file
>
>o IF all the contents in the file have the same length (let's assume line length)
>
>o THEN the file should be moved to a "success" folder and a text file written indicating the total number of records/lines/words processed
>
>o IF the file is empty OR the contents are not all of the same length
>
>o THEN the file should be moved to a "failure" folder and a text file written indicating the cause for failure (for example: Empty file or line 100 was not the same length as the rest).
>
	You still haven't defined how you determine the "correct length" of the
record. What if the first line is 79 characters, and all the others are 80
characters? Do you report ALL lines EXCEPT the first as being the wrong
length, when really it is the first line that is wrong?

	Also, if the files are Unicode (UTF-8, in particular) -- the byte
length of a line could differ but the character length could be the same.
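
A hedged sketch of a single-pass check that touches both questions above. It assumes the "correct" length is simply the most common one in the file, which is only one possible answer and not something the original poster specified, and it measures characters rather than bytes by decoding as UTF-8. The name check_line_lengths and the returned tuple layout are hypothetical.

```python
import io
from collections import Counter


def check_line_lengths(path, encoding="utf-8"):
    """Return (ok, line_count, reason) after one pass over the file."""
    lengths = []
    with io.open(path, encoding=encoding) as f:
        for line in f:
            # Character length without the line terminator.
            lengths.append(len(line.rstrip("\r\n")))
    if not lengths:
        return False, 0, "empty file"
    # Assumption: the most common length is the expected record length.
    expected = Counter(lengths).most_common(1)[0][0]
    for number, length in enumerate(lengths, 1):
        if length != expected:
            reason = "line %d has length %d, expected %d" % (number, length, expected)
            return False, len(lengths), reason
    return True, len(lengths), ""
```

With this rule, a file whose first line is short and whose remaining lines agree gets the first line reported, rather than every other line. The line count falls out of the same pass, so no separate file_len read is needed.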

>Here is the code I have written:
>
>import os
>import time
>import glob
>import sys
>
>def initialize_logger(output_dir):
>    logger = logging.getLogger()
>    logger.setLevel(logging.DEBUG)
>     
>    # create console handler and set level to info
>    handler = logging.StreamHandler()
>    handler.setLevel(logging.INFO)
>    formatter = logging.Formatter("%(levelname)s - %(message)s")
>    handler.setFormatter(formatter)
>    logger.addHandler(handler)
> 
>    # create error file handler and set level to error
>    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
>    handler.setLevel(logging.ERROR)
>    formatter = logging.Formatter("%(levelname)s - %(message)s")
>    handler.setFormatter(formatter)
>    logger.addHandler(handler)
>
>    # create debug file handler and set level to debug
>    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
>    handler.setLevel(logging.DEBUG)
>    formatter = logging.Formatter("%(levelname)s - %(message)s")
>    handler.setFormatter(formatter)
>    logger.addHandler(handler)
>
>#Helper Functions for the Success and Failure Folder Outcomes, respectively
>
>#checks the length of the file
>    def file_len(filename
>        with open(filename) as f:
>            for i, l in enumerate(f):
>                pass
>            return i + 1
>
>#copies file to new destination
>
>    def copyFile(src, dest):
>        try:
>            shutil.copy(src, dest)
>        # eg. src and dest are the same file
>        except shutil.Error as e:
>            print('Error: %s' % e)
>        # eg. source or destination doesn't exist
>        except IOError as e:
>            print('Error: %s' % e.strerror)
>
>#Failure Folder
>
>def move_to_failure_folder_and_return_error_file():
>    os.mkdir('Failure')
>    copyFile(filename, 'Failure')
>    initialize_logger('rootdir/Failure')
>    logging.error("Either this file is empty or the lines")
>     
># Success Folder Requirement
>             
>def move_to_success_folder_and_read(file):
>    os.mkdir('Success')
>    copyFile(filename, 'Success')
>    print("Success", file)
>    return file_len()
>
>
>#This simply checks the file information by name
>
>def fileinfo(file):
>    filename = os.path.basename(file)
>    rootdir = os.path.dirname(file)
>    lastmod = time.ctime(os.path.getmtime(file))
>    creation = time.ctime(os.path.getctime(file))
>    filesize = os.path.getsize(file)
>    return filename, rootdir, lastmod, creation, filesize
>
>if __name__ == '__main__':
>   import sys
>   validate_files(sys.argv[1:])

	Yeesh... Did you even try running that?

	validate_files    is not defined
	file_len          is at the wrong indentation
	                  is syntactically garbage
	                  is a big time-waste (you read the file just to
	                  enumerate the number of lines? Why didn't you count
	                  the lines while checking the line lengths?)
	copyFile          is at the wrong indentation
	                  (after a bunch of word_word, why camelCase here?)

	Correct all the edit errors and copy/paste the actual file that at
least attempts to run.

	You might also want to look at os.stat, rather than using three os.path
calls.
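
A small sketch of the os.stat suggestion, assuming the same fields the posted fileinfo() collects. Returning a dict is a choice made here for readability and is not from the original code.

```python
import os
import time


def fileinfo(path):
    """Collect name, directory, size and timestamps with a single os.stat call."""
    st = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "rootdir": os.path.dirname(path),
        "filesize": st.st_size,
        "lastmod": time.ctime(st.st_mtime),
        # st_ctime is creation time on Windows, metadata-change time on Unix.
        "creation": time.ctime(st.st_ctime),
    }
```

One os.stat() call replaces the separate getsize, getmtime and getctime calls, each of which stats the file again.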
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#88466

From: Saran A <ahlusar.ahluwalia@gmail.com>
Date: 2015-04-02 17:14 -0700
Message-ID: <81391d9c-74a0-429a-82e6-d874057b8f9c@googlegroups.com>
In reply to: #88464

@Dennis:

Below is my full program (so far). Please feel free to tear it apart and provide me with constructive criticism. I have been programming for 8 months now and this is a huge learning experience for me. Feedback and modifications are very welcome.

What would be a better name for dirlist?

# Without data to examine here, I can only guess based on this requirement's language that
# fixed records are in the input.

# I made the assumption that the directories are in the same filesystem.

# Takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
# I tested this little sample on a small set of files created with MD5 checksums. I wrote the Python in such a way that it
# would work with Python 2.x or 3.x (note the __future__ at the top).

# There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
# more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions).

# The only other comments I would make are about safe file handling.

#   #1:  Question: After a user has created a file that has failed (in
#        processing), can the user create a file with the same name?
#        If so, then you will probably want to look at some sort
#        of file-naming strategy to avoid overwriting evidence of
#        earlier failures.

# File naming is a tricky thing. I referenced the tempfile module [1] and the Maildir naming scheme to see two different
# types of solutions to the problem of choosing a unique filename.

# I am assuming that all of my files are going to be specified in unicode.

# Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite.

from __future__ import print_function

import os.path
import time
import difflib
import logging

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
     
    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)
 
    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)


#This function's purpose is to obtain the filename, rootdir and filesize 

def fileinfo(f):
    filename = os.path.basename(f)
    rootdir = os.path.dirname(f)  
    filesize = os.path.getsize(f)
    return filename, rootdir, filesize

#This helper function returns the length of the file
def file_len(f):
    with open(f) as f:
        for i, l in enumerate(f):
            pass
            return i + 1

#This helper function attempts to copy file and move file to the respective directory
#I am assuming that the directories are in the same filesystem

# If directories ARE in different file systems, I would use the following helper function:

# def move(src, dest): 
#     shutil.move(src, dest)

def copy_and_move_file(src, dest):
    try:
        os.rename(src, dest)
        # eg. src and dest are the same file
    except IOError as e:
        print('Error: %s' % e.strerror)


path = "."
dirlist = os.listdir(path)


# A caveat of the "main" function is that it does not scale well
# (although it is appropriate if one assumes that there will be few changes)

# It does not account for updated files existing in the directory - only new files "dropped" in
# (If this was included in the requirements, os.stat would be appropriate here)

 
def main(dirlist):   
    before = dict([(f, 0) for f in dirlist])
    while True:
        time.sleep(1) #time between update check
    after = dict([(f, None) for f in dirlist])
    added = [f for f in after if not f in before]
    if added:
        f = ''.join(added)
        print('Sucessfully added %s file - ready to validate') %(f)
        return validate_files(f)
    else:
        return move_to_failure_folder_and_return_error_file(f)


    
def validate_files(f):
    creation = time.ctime(os.path.getctime(f))
    lastmod = time.ctime(os.path.getmtime(f))
    if creation == lastmod and file_len(f) > 0:
        return move_to_success_folder_and_read(f)
    if file_len < 0 and creation != lastmod:
        return move_to_success_folder_and_read(f)
    else:
        return move_to_failure_folder_and_return_error_file(f)


# Failure/Success Folder Functions

def move_to_failure_folder_and_return_error_file():
    filename, rootdir, lastmod, creation, filesize = fileinfo(file)  
    os.mkdir('Failure')
    copy_and_move_file( 'Failure')
    initialize_logger('rootdir/Failure')
    logging.error("Either this file is empty or there are no lines")
     
             
def move_to_success_folder_and_read():
    filename, rootdir, lastmod, creation, filesize = fileinfo(file)  
    os.mkdir('Success')
    copy_and_move_file(rootdir, 'Success') #file name
    print("Success", file)
    return file_len(file)



if __name__ == '__main__':
   main(dirlist) 
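
The comments in the listing above mention os.stat as the way to notice updated files. A minimal sketch of that idea follows, assuming a single os.stat() call is acceptable in place of the separate os.path calls. The helper name has_changed and the four-element return value are illustrative choices, not part of the posted program.

```python
import os


def fileinfo(path):
    """Return (filename, rootdir, size, mtime) from one os.stat() call."""
    st = os.stat(path)  # one system call supplies both size and timestamp
    filename = os.path.basename(path)
    rootdir = os.path.dirname(path)
    return filename, rootdir, st.st_size, st.st_mtime


def has_changed(path, known_mtime):
    """True if the file was modified after the recorded timestamp."""
    # Hypothetical helper: compares the current mtime with a stored one.
    return os.stat(path).st_mtime > known_mtime
```

Storing the mtime returned by fileinfo() for each name and calling has_changed() on a later pass would flag files that were rewritten in place, which a name-only comparison cannot see.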



#88487

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2015-04-03 11:33 -0400
Message-ID: <mailman.34.1428075217.12925.python-list@python.org>
In reply to: #88466

On Thu, 2 Apr 2015 17:14:30 -0700 (PDT), Saran A
<ahlusar.ahluwalia@gmail.com> declaimed the following:

>On Thursday, April 2, 2015 at 8:03:53 PM UTC-4, Dennis Lee Bieber wrote:

>> 
>> 	Yeesh... Did you even try running that?
>> 
>> 	validate_files		is not defined
>> 	file_len				is at the wrong indentation
>> 						is syntactically garbage
>> 						is a big time-waste (you read the file just to
>> enumerate the number of lines? Why didn't you count the lines while
>> checking the line lengths)
>> 	copyFile			is at the wrong indentation
>> 						(after a bunch of word_word, why camelCase here)
>> 
>> 	Correct all the edit errors and copy/paste the actual file that at
>> least attempts to run.
>> 
>> 	You might also want to look at os.stat, rather than using three os.path
>> calls.
>> -- 
>> 	Wulfraed                 Dennis Lee Bieber         AF6VN
>>     wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
>
>@Dennis:
>
>Below is my full program (so far). Please feel free to tear it apart and provide me with constructive criticism. I have been programming for 8 months now and this is a huge learning experience for me. Feedback and modifications is very welcome. 
>

	You still seem to be adding more and more to this program without ever
showing that you've tried to run it OR fixing the bugs that have been
pointed out to you in multiple responses.

	The following is more help than should be given for a homework
assignment:

-=-=-=-=-=-=-
"""
    PSEUDO-CODE -- DO NOT ATTEMPT TO RUN THIS AS-IS
"""

import sys


LOOPTIME = 1234     #TBD optimize between too fast and too slow

def validateFile(fid):
    """
        Validates the contents of the provided file

        Currently a simplistic test for line length
        differing from that of the first line in the file

        Returns good (True) and number of lines encountered, or
        not good (False) and an error string reporting the
        first line at which the length differed
    """
    good = True
    count = 0
    message = None
    with open(fid, "r") as fin:    #assuming a text file
        for count, l in enumerate(fin, 1):  #count lines starting from 1
            if count == 1:
                nominal = len(l)    #or whatever defines "good"
            if len(l) != nominal:
                good = False
                message = "Mismatch length at line %s" % count
                break
    if count == 0:
        good = False
        message = "Empty file"
    #second element is the error message OR the number of lines
    return good, (count if good else message)

def mainLoop(spoolDir, successDir, failureDir):
    """
        Main processing loop for the monitor task

        Obtains a pending list of the current contents of the
        spool directory, then enters processing loop.

        In the loop, it delays for <looptime>, to allow
        for any active files in the pending list to
        be completely written. It then:
            Obtains the current list of the directory contents
            Removes from the current list any name found in the
                pending list
            Loops over the names in the pending list and,
                for each that is NOT a directory, passes it
                to the validation function.
            Based on the result of the validation function it
                then moves the file from the spool directory to
                the appropriate success or failure directory
            Finally, it moves the current list into the pending list
                to prepare for the next cycle of the loop
    """
    pending = listdir(spoolDir)
    while True:
        sleep(LOOPTIME)
        current = [ fid for fid in listdir(spoolDir)
                    if fid not in pending ]
        for fid in pending:
            fpath = join(spoolDir, fid)
            if isfile(fpath):
                v, s = validateFile(fpath)
                if v:
                    move(fpath, join(successDir, fid))
                    sys.stdout.write("Processed %s with %s records\n"
                                     % (fpath, s))
                else:
                    move(fpath, join(failureDir, fid))
                    sys.stdout.write("Rejected %s for %s\n"
                                     % (fpath, s))
        pending = current

def checkDir(dir, create=False):
    """
        Given a purported directory path this function
        will:
            normalize the path
            confirm the path exists and is a directory or;
                if the optional create argument is true,
                create the directory
        It returns the normalized path if a valid directory
        is found/created, otherwise it returns None as an error
        signal
    """
    dir = normalize(dir)
    if not exists(dir):
        if create:
            makedir(dir)
        else:
            dir = None
    else:
        if not isdir(dir):
            dir = None
    return dir

if __name__ == "__main__":
    if len(sys.argv) == 4:  #argv[0] is the script name itself
        spoolDir = checkDir(sys.argv[1], False)
        successDir = checkDir(sys.argv[2], True)
        failureDir = checkDir(sys.argv[3], True)
        if (spoolDir is None
            or successDir is None
            or failureDir is None):
            sys.stderr.write("Unable to access one or more directories\n")
        else:
            mainLoop(spoolDir, successDir, failureDir)
    else:
        sys.stderr.write("USAGE: monitor.py spool-directory "
                         "success-directory failure-directory\n")
-=-=-=-=-=-=-

	Any function that is NOT DEFINED in that listing exists in one or more
modules of the standard library -- recommend you read that document to
locate the correct one.

NOTE: above is Python 2 syntax
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
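
For readers following along, one possible mapping of the undefined names in the listing above to the standard library is sketched here. The choice of os.path.normpath for normalize and os.makedirs for makedir is an assumption; the post does not say which functions were intended. The sketch runs under Python 2.7 and 3.x.

```python
# Where the names used by the pseudo-code live (shown for orientation).
from os import listdir, makedirs
from os.path import exists, isdir, isfile, join, normpath
from shutil import move
from time import sleep


def checkDir(path, create=False):
    """Return the normalized directory path, or None if it is unusable."""
    path = normpath(path)   # assumed stand-in for normalize()
    if not exists(path):
        if not create:
            return None
        makedirs(path)      # assumed stand-in for makedir()
    elif not isdir(path):
        return None         # the path exists but is a plain file
    return path
```

shutil.move is used rather than os.rename because it also works when the spool and target directories are on different filesystems.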



#88465

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-04-03 11:12 +1100
Message-ID: <mailman.25.1428019934.12925.python-list@python.org>
In reply to: #88444

On Fri, Apr 3, 2015 at 11:03 AM, Dennis Lee Bieber
<wlfraed@ix.netcom.com> wrote:
>>o IF all the contents in the file have the same length (let's assume line length)
>>
>>o THEN the file should be moved to a "success" folder and a text file written indicating the total number of records/lines/words processed
>>
>>o IF the file is empty OR the contents are not all of the same length
>>
>>o THEN the file should be moved to a "failure" folder and a text file written indicating the cause for failure (for example: Empty file or line 100 was not the same length as the rest).
>>
>         You still haven't defined how you determine the "correct length" of the
> record. What if the first line is 79 characters, and all the others are 80
> characters? Do you report ALL lines EXCEPT the first as being the wrong
> length, when really it is the first line that is wrong?

Relatively immaterial here; in the first place, line length is just a
placeholder (my guess is it's more likely to be something like "CSV
files with the same number of cells on each row", or something), and
in the second place, the lines aren't the failures - if there's a
mismatch, the entire file is deemed wrong. It doesn't matter whether
it's the first line or the other lines, the file is dead.

ChrisA
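
The whole-file rule described above can be sketched as follows, taking the guess of "same number of cells on each row" as the validity test. That test is an assumption borrowed from the post, and the name validate_rows is hypothetical. The function stops at the first mismatch because one bad row is enough to reject the file.

```python
import csv


def validate_rows(lines):
    """Return (True, row_count) or (False, reason) for an iterable of CSV lines."""
    expected = None
    count = 0
    for count, row in enumerate(csv.reader(lines), 1):
        if expected is None:
            expected = len(row)      # the first row sets the reference width
        elif len(row) != expected:
            # One mismatch condemns the whole file, so stop reading here.
            return False, "row %d has %d cells, expected %d" % (
                count, len(row), expected)
    if count == 0:
        return False, "empty file"
    return True, count
```

An open text file can be passed directly as lines, since csv.reader accepts any iterable that yields strings.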



#88443

From: Peter Otten <__peter__@web.de>
Date: 2015-04-02 14:26 +0200
Message-ID: <mailman.19.1427977593.12925.python-list@python.org>
In reply to: #88441

Saran A wrote:

> Good Morning:
> 
> I understand this error message when I run this code. However, I am
> curious to know what the most pythonic way is to convert  the list to a
> string? I use Python 2.7.
> 
> "Traceback (most recent call last):
> before = dict([(f, None) for f in os.listdir(dirlist)])
> TypeError: coercing to Unicode: need string or buffer, list found"
> 
> 
> The sample code that I am trying to run is:
> 
> path = "/Users/Desktop/Projects/"
> dirlist = os.listdir(path)

At this point dirlist is a list of names of the files and directories in 

"/Users/Desktop/Projects/"

Assuming that the Projects folder contains the subfolders or files
/Users/Desktop/Projects/foo, /Users/Desktop/Projects/bar and 
/Users/Desktop/Projects/baz dirlist looks like this:

["foo", "bar", "baz"]

It makes no sense to pass this list to os.listdir() as you do below:

> before = dict([(f, None) for f in os.listdir(dirlist)])

Forget about the other details in the error message; the actual problem is 
the "list found" part.

Now what would be a possible fix? Sorry, I have no idea what your intention 
is. Again, you don't need to convert your list to string, you need to decide 
what directory you want to pass to listdir(). If you have multiple such 
directories you need to invoke listdir() multiple times with a single 
directory, typically in a loop.
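
A minimal sketch of that loop, assuming the caller already has a list of directory paths. The function name list_each is hypothetical.

```python
import os


def list_each(directories):
    """Map each directory path to the sorted names it contains."""
    contents = {}
    for directory in directories:
        # os.listdir() takes ONE path per call, never a list of paths.
        contents[directory] = sorted(os.listdir(directory))
    return contents
```

The names returned by os.listdir() are relative to the directory that was listed, so they need os.path.join(directory, name) before being passed to functions such as os.path.getsize().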

Bonus info:

>     while True:
>         time.sleep(10) #time between update check
 
This loop will never terminate.
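
Because the quoted loop body contains only the sleep, no statement placed after the loop is ever reached. A monitor that is meant to run forever still has to re-read the directory inside the loop. The following is a rough sketch of that shape; poll_once, monitor, handle, and max_cycles are hypothetical names, and max_cycles exists only so that a demonstration can stop.

```python
import os
import time


def poll_once(path, seen):
    """Return (added, current): names new since `seen`, plus the fresh snapshot."""
    current = set(os.listdir(path))
    return sorted(current - seen), current


def monitor(path, handle, interval=1.0, max_cycles=None):
    """Call handle(name) for each entry that appears in `path`.

    max_cycles=None polls forever, which is what a monitor normally wants.
    """
    seen = set(os.listdir(path))
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        time.sleep(interval)                   # wait between checks
        added, seen = poll_once(path, seen)    # re-read INSIDE the loop
        for name in added:
            handle(name)
        cycles += 1
```

Keeping the snapshot comparison in its own function makes it possible to exercise the logic without waiting on the sleep.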



#88446

From: Saran A <ahlusar.ahluwalia@gmail.com>
Date: 2015-04-02 05:51 -0700
Message-ID: <1f248f13-92f6-40d3-8d9e-a94677d4d003@googlegroups.com>
In reply to: #88443

On Thursday, April 2, 2015 at 8:26:51 AM UTC-4, Peter Otten wrote:
> Saran A wrote:
> 
> > Good Morning:
> > 
> > I understand this error message when I run this code. However, I am
> > curious to know what the most pythonic way is to convert  the list to a
> > string? I use Python 2.7.
> > 
> > "Traceback (most recent call last):
> > before = dict([(f, None) for f in os.listdir(dirlist)])
> > TypeError: coercing to Unicode: need string or buffer, list found"
> > 
> > 
> > The sample code that I am trying to run is:
> > 
> > path = "/Users/Desktop/Projects/"
> > dirlist = os.listdir(path)
> 
> At this point dirlist is a list of names of the files and directories in 
> 
> "/Users/Desktop/Projects/"
> 
> Assuming that the Projects folder contains the subfolders or files
> /Users/Desktop/Projects/foo, /Users/Desktop/Projects/bar and 
> /Users/Desktop/Projects/baz dirlist looks like this:
> 
> ["foo", "bar", "baz"]
> 
> It makes no sense to pass this list to os.listdir() as you do below:
> 
> > before = dict([(f, None) for f in os.listdir(dirlist)])
> 
> Forget about the other details in the error message; the actual problem is 
> the "list found" part.
> 
> Now what would be a possible fix? Sorry, I have no idea what your intention 
> is. Again, you don't need to convert your list to string, you need to decide 
> what directory you want to pass to listdir(). If you have multiple such 
> directories you need to invoke listdir() multiple times with a single 
> directory, typically in a loop.
> 
> Bonus info:
> 
> >     while True:
> >         time.sleep(10) #time between update check
>  
> This loop will never terminate.

@Peter I understand that the intention of this program is to not terminate. Here is what I have written so far:

I thought I would run this by you, since you have offered such valuable feedback in the past. Here is a quick rundown of what I want my program to do:

* Monitor a folder for files that are dropped throughout the day

* When a file is dropped in the folder the program should scan the file

o IF all the contents in the file have the same length (let's assume line length)

o THEN the file should be moved to a "success" folder and a text file written indicating the total number of records/lines/words processed

o IF the file is empty OR the contents are not all of the same length

o THEN the file should be moved to a "failure" folder and a text file written indicating the cause for failure (for example: Empty file or line 100 was not the same length as the rest).

Here is the code I have written:

import os
import time
import glob
import sys

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
     
    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)
 
    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)


def main(dirslist):     
    while True:
        for file in os.listdir(dirslist) :
        	return validate_files(file)
        	time.sleep(5)

if __name__ == "__main__": 
    main() 


#Helper Functions for the Success and Failure Folder Outcomes, respectively

#checks the length of the file
    def file_len(filename
        with open(filename) as f:
            for i, l in enumerate(f):
                pass
            return i + 1

#copies file to new destination

    def copyFile(src, dest):
        try:
            shutil.copy(src, dest)
        # eg. src and dest are the same file
        except shutil.Error as e:
            print('Error: %s' % e)
        # eg. source or destination doesn't exist
        except IOError as e:
            print('Error: %s' % e.strerror)

#Failure Folder

def move_to_failure_folder_and_return_error_file():
    os.mkdir('Failure')
    copyFile(filename, 'Failure')
    initialize_logger('rootdir/Failure')
    logging.error("Either this file is empty or the lines")
     
# Success Folder Requirement
             
def move_to_success_folder_and_read(file):
    os.mkdir('Success')
    copyFile(filename, 'Success')
    print("Success", file)
    return file_len()

#This simply checks the file information by name

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)
    return filename, rootdir, lastmod, creation, filesize

if __name__ == '__main__':
   import sys
   validate_files(sys.argv[1:])



I am specifically trying to address the following shortcomings:

* The present code does not move any files to success or failure directories (I have added functions at the end that could serve to address this requirement)

* The present code doesn't calculate or write to a text file.

* The present code runs once through the names and terminates. It doesn't "monitor" anything. I think that I have added the correct while loop to address this

* The present code doesn't check for zero-length files

-Saran-
