Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #19971 > unrolled thread

Reading files in from the proper directory

Started bySMac2347@comcast.net
First post2012-02-07 10:14 -0800
Last post2012-02-07 19:46 +0000
Articles 12 — 5 participants

Back to article view | Back to comp.lang.python


Contents

  Reading files in from the proper directory SMac2347@comcast.net - 2012-02-07 10:14 -0800
    Re: Reading files in from the proper directory Dave Angel <d@davea.name> - 2012-02-07 13:40 -0500
      Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-07 11:03 -0800
    Re: Reading files in from the proper directory Peter Otten <__peter__@web.de> - 2012-02-07 19:59 +0100
      Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-07 11:13 -0800
        Re: Reading files in from the proper directory John Gordon <gordon@panix.com> - 2012-02-07 20:00 +0000
        Re: Reading files in from the proper directory Peter Otten <__peter__@web.de> - 2012-02-07 21:16 +0100
          Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-09 12:32 -0800
            Re: Reading files in from the proper directory Peter Otten <__peter__@web.de> - 2012-02-09 22:07 +0100
            Re: Reading files in from the proper directory Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-09 23:05 -0500
          Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-09 12:36 -0800
    Re: Reading files in from the proper directory John Gordon <gordon@panix.com> - 2012-02-07 19:46 +0000

#19971 — Reading files in from the proper directory

FromSMac2347@comcast.net
Date2012-02-07 10:14 -0800
SubjectReading files in from the proper directory
Message-ID<f3f576e8-e608-4348-b6ee-fe775d9d1021@x19g2000yqh.googlegroups.com>
Hello. I am admittedly a Python novice, and ran into some trouble
trying to write a program that will pull multiple excel files all into
one file, with each file on a different sheet.

I am confident most of the code is correct, as the program runs
without any errors and I found the base of it online, making changes
as necessary for my own purposes. However, I am having trouble
specifying the exact directory where my code should be pulling the
files from. All the files are in the same folder, and I have put the
folder on my desktop. Am I correct in thinking that I need to change
the current working directory to this folder in order for Python to
read in these files, then generate my output? Or should I be doing
something else?

Any and all help is appreciated, thanks!

[toc] | [next] | [standalone]


#19972

FromDave Angel <d@davea.name>
Date2012-02-07 13:40 -0500
Message-ID<mailman.5508.1328640027.27778.python-list@python.org>
In reply to#19971
On 02/07/2012 01:14 PM, SMac2347@comcast.net wrote:
> Hello. I am admittedly a Python novice, and ran into some trouble
> trying to write a program that will pull multiple excel files all into
> one file, with each file on a different sheet.
>
> I am confident most of the code is correct, as the program runs
> without any errors and I found the base of it online, making changes
> as necessary for my own purposes. However, I am having trouble
> specifying the exact directory where my code should be pulling the
> files from. All the files are in the same folder, and I have put the
> folder on my desktop. Am I correct in thinking that I need to change
> the current working directory to this folder in order for Python to
> read in these files,
No, Python certainly does not constrain you to working with files only 
in the current working directory.  My rule of thumb is never to change 
the cwd in a Python program.  You can use relative paths to open files, 
or you can use absolute paths.  There is even a library function 
os.path.abspath() for converting a relative path to an absolute one.

If you do change cwd during the running of a program, then relative 
filenames that worked earlier might no longer work.  You could convert 
them all to absolute paths, but that's more work.

You can piece together path strings using os.path.join().  It's smart 
enough to know the path separator for your particular platform.

Check out this page: http://docs.python.org/library/os.path.html

> then generate my output? Or should I be doing
> something else?
>
> Any and all help is appreciated, thanks!
>


-- 

DaveA

[toc] | [prev] | [next] | [standalone]


#19976

FromSMac2347@comcast.net
Date2012-02-07 11:03 -0800
Message-ID<6fc4ebfa-6607-4af7-83d7-65c61408a6ac@m24g2000yqb.googlegroups.com>
In reply to#19972
On Feb 7, 1:40 pm, Dave Angel <d...@davea.name> wrote:
> On 02/07/2012 01:14 PM, SMac2...@comcast.net wrote:> Hello. I am admittedly a Python novice, and ran into some trouble
> > trying to write a program that will pull multiple excel files all into
> > one file, with each file on a different sheet.
>
> > I am confident most of the code is correct, as the program runs
> > without any errors and I found the base of it online, making changes
> > as necessary for my own purposes. However, I am having trouble
> > specifying the exact directory where my code should be pulling the
> > files from. All the files are in the same folder, and I have put the
> > folder on my desktop. Am I correct in thinking that I need to change
> > the current working directory to this folder in order for Python to
> > read in these files,
>
> No, Python certainly does not constrain you to working with files only
> in the current working directory.  My rule of thumb is never to change
> the cwd in a Python program.  You can use relative paths to open files,
> or you can use absolute paths.  There is even a library function
> os.path.abspath() for converting a relative path to an absolute one.
>
> If you do change cwd during the running of a program, then relative
> filenames that worked earlier might no longer work.  You could convert
> them all to absolute paths, but that's more work.
>
> You can piece together path strings using os.path.join().  It's smart
> enough to know the path separator for your particular platform.
>
> Check out this page:http://docs.python.org/library/os.path.html
>
> > then generate my output? Or should I be doing
> > something else?
>
> > Any and all help is appreciated, thanks!
>
> --
>
> DaveA

Thanks Dave. I am a bit lost as to what the problem is then - the
program runs glitch free, but then only prints: "NOTE *** No xls files
in C:/Documents and Settings/smacdon/." as specified below by my
program. Any idea what the issue might be (my code is below):


import xlrd, xlwt
import glob, os.path
def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon
\Desktop\09 Aggregate JWS\09_merged_data.xls"):
    xls_files   = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl  = [os.path.basename(v)[:-4] for v in xls_files if len
(os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()
    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
        for k, xls_file in enumerate(xls_files):
            print "---> Processing file %s" % (xls_file)
            if len (sheet_names[k]) <= 29:
                book = xlrd.open_workbook(xls_file)
                if book.nsheets == 1:
                    ws  = merged_book.add_sheet(sheet_names[k])
                    sheet = book.sheet_by_index(0)
                    for rx in range(sheet.nrows):
                        for cx in range(sheet.ncols):
                            ws.write(rx, cx, sheet.cell_value(rx,
cx))
                elif book.nsheets in range(2, 100):
                    for sheetx in range(book.nsheets):
                        sheet0n = sheet_names[k]+str(sheetx
+1).zfill(2)
                        ws = merged_book.add_sheet(sheet0n)
                        sheet = book.sheet_by_index(sheetx)
                        for rx in range(sheet.nrows):
                            for cx in range(sheet.ncols):
                                ws.write(rx, cx, sheet.cell_value(rx,
cx))
                else:
                    print "ERROR *** File %s has %s sheets (maximum is
99)" % (xls_file, book.nsheets)
                    raise
            else:
                print "WARNING *** File name too long: <%s.xls>
(maximum is 29 chars) " % (sheet_names[k])
                print "WARNING *** File <%s.xls> was skipped." %
(sheet_names[k])
        merged_book.save(out_file)
        print
        print "---> Merged xls file written to %s using the following
source files: " % (out_file)
        for k, v in enumerate(sheet_names):
            if len(v) <= 29:
                print "\t", str(k+1).zfill(3), "%s.xls" % (v)
        print
        if sheet_excl:
            print "--> The following files were skipped because the
file name exceeds 29 characters: "
            for k, v in enumerate(sheet_excl):
                print "\t", str(k+1).zfill(3), v
    else:
        print "NOTE *** No xls files in %s." % (in_dir)
merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS"

[toc] | [prev] | [next] | [standalone]


#19974

FromPeter Otten <__peter__@web.de>
Date2012-02-07 19:59 +0100
Message-ID<mailman.5510.1328641121.27778.python-list@python.org>
In reply to#19971
SMac2347@comcast.net wrote:

> Hello. I am admittedly a Python novice, and ran into some trouble
> trying to write a program that will pull multiple excel files all into
> one file, with each file on a different sheet.
> 
> I am confident most of the code is correct, as the program runs
> without any errors and I found the base of it online, making changes
> as necessary for my own purposes. 

That confidence usually evaporates once you write the first unit test ;)

> However, I am having trouble
> specifying the exact directory where my code should be pulling the
> files from. All the files are in the same folder, and I have put the
> folder on my desktop. Am I correct in thinking that I need to change
> the current working directory to this folder in order for Python to
> read in these files, then generate my output? Or should I be doing
> something else?

Do it properly, allow specifying the files on the commandline:

import argparse

def process_files(files, destfile):
    # put your real code here
    print "merge " + "\n      ".join(files)
    print "into  " + destfile

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("files", metavar="file", nargs="+")
    parser.add_argument("destfile")
    args = parser.parse_args()
    process_files(args.files, args.destfile)

If you have standard locations for sources and destination you can wrap your 
python script into a little batch file containing something like

python \source\path\*.xls \dest\path\merged.xls

and invoke that to get both flexibility and convenience.

[toc] | [prev] | [next] | [standalone]


#19975

FromSMac2347@comcast.net
Date2012-02-07 11:13 -0800
Message-ID<9bfb3e39-2bc6-4399-90cc-1c53aa06265e@h6g2000yqk.googlegroups.com>
In reply to#19974
Thanks for the responses.


Below is the code I have thus far. while the program runs glitch-free,
it only results in the printing of the message: "NOTE *** No xls files
in C:/Documents and Settings/smacdon/." as specified by my code. Any
idea as to why it might be unable to find the .xls documents (yes they
are .xls documents and not .xlsx). Thanks!


import xlrd, xlwt
import glob, os.path
def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon
\Desktop\09 Aggregate JWS\09_merged_data.xls"):
    xls_files   = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl  = [os.path.basename(v)[:-4] for v in xls_files if len
(os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()
    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
        for k, xls_file in enumerate(xls_files):
            print "---> Processing file %s" % (xls_file)
            if len (sheet_names[k]) <= 29:
                book = xlrd.open_workbook(xls_file)
                if book.nsheets == 1:
                    ws  = merged_book.add_sheet(sheet_names[k])
                    sheet = book.sheet_by_index(0)
                    for rx in range(sheet.nrows):
                        for cx in range(sheet.ncols):
                            ws.write(rx, cx, sheet.cell_value(rx,
cx))
                elif book.nsheets in range(2, 100):
                    for sheetx in range(book.nsheets):
                        sheet0n = sheet_names[k]+str(sheetx
+1).zfill(2)
                        ws = merged_book.add_sheet(sheet0n)
                        sheet = book.sheet_by_index(sheetx)
                        for rx in range(sheet.nrows):
                            for cx in range(sheet.ncols):
                                ws.write(rx, cx, sheet.cell_value(rx,
cx))
                else:
                    print "ERROR *** File %s has %s sheets (maximum is
99)" % (xls_file, book.nsheets)
                    raise
            else:
                print "WARNING *** File name too long: <%s.xls>
(maximum is 29 chars) " % (sheet_names[k])
                print "WARNING *** File <%s.xls> was skipped." %
(sheet_names[k])
        merged_book.save(out_file)
        print
        print "---> Merged xls file written to %s using the following
source files: " % (out_file)
        for k, v in enumerate(sheet_names):
            if len(v) <= 29:
                print "\t", str(k+1).zfill(3), "%s.xls" % (v)
        print
        if sheet_excl:
            print "--> The following files were skipped because the
file name exceeds 29 characters: "
            for k, v in enumerate(sheet_excl):
                print "\t", str(k+1).zfill(3), v
    else:
        print "NOTE *** No xls files in %s." % (in_dir)
merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS"

[toc] | [prev] | [next] | [standalone]


#19980

FromJohn Gordon <gordon@panix.com>
Date2012-02-07 20:00 +0000
Message-ID<jgrvt2$8qm$1@reader1.panix.com>
In reply to#19975
In <9bfb3e39-2bc6-4399-90cc-1c53aa06265e@h6g2000yqk.googlegroups.com> SMac2347@comcast.net writes:

>     xls_files   = glob.glob(in_dir + "*.xls")

You may want to put a directory separator character in between the
directory name and the filename glob pattern.

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"

[toc] | [prev] | [next] | [standalone]


#19981

FromPeter Otten <__peter__@web.de>
Date2012-02-07 21:16 +0100
Message-ID<mailman.5511.1328645799.27778.python-list@python.org>
In reply to#19975
SMac2347@comcast.net wrote:

> xls_files   = glob.glob(in_dir + "*.xls")

Try changing that to

pattern = os.path.join(in_dir, "*.xls")
xls_files = glob.glob(pattern) 

os.path.join() inserts a (back)slash between directory and filename if 
necessary.

> merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")

If you paste the directory name literal into the interactive interpreter 
you'll be surprised:

>>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'

"\09" is intrpreted as chr(9). Use a raw string to prevent Python from 
interpreting a backslash as the start of an escape sequence

>>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'

or use forward slashes as directory separators.

[toc] | [prev] | [next] | [standalone]


#20106

FromSMac2347@comcast.net
Date2012-02-09 12:32 -0800
Message-ID<10d15214-1fd8-4b80-8384-c0b291c6976d@h3g2000yqe.googlegroups.com>
In reply to#19981
On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
> SMac2...@comcast.net wrote:
> > xls_files   = glob.glob(in_dir + "*.xls")
>
> Try changing that to
>
> pattern = os.path.join(in_dir, "*.xls")
> xls_files = glob.glob(pattern)
>
> os.path.join() inserts a (back)slash between directory and filename if
> necessary.
>
> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
>
> If you paste the directory name literal into the interactive interpreter
> you'll be surprised:
>
> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
>
> "\09" is intrpreted as chr(9). Use a raw string to prevent Python from
> interpreting a backslash as the start of an escape sequence
>
> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
>
> or use forward slashes as directory separators.

Peter, thanks so much for your help, your suggestions were spot on. So
now my program runs and is able to find and process the files
correctly, but I end up getting the following message:

Traceback (most recent call last):
  File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 49, in <module>
    merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
Aggregate JWS")
  File "C:/Documents and Settings/smacdon/My Documents/
excel_merge_files_indirectory v2.py", line 36, in merge_xls
    merged_book.save(out_file)
  File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
save
    doc.save(filename, self.get_biff_data())
  File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
in save
    f = open(file_name_or_filelike_obj, 'wb')
TypeError: file() argument 1 must be encoded string without NULL
bytes, not str


If I am interpreting correctly, am I to understand that it would
appear the issue is tracing back to functions in the xlwt module? If
so, what can I do to fix this? Again, any and all help is appreciated!

[toc] | [prev] | [next] | [standalone]


#20110

FromPeter Otten <__peter__@web.de>
Date2012-02-09 22:07 +0100
Message-ID<mailman.5612.1328821689.27778.python-list@python.org>
In reply to#20106
SMac2347@comcast.net wrote:

> On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
>> SMac2...@comcast.net wrote:
>> > xls_files   = glob.glob(in_dir + "*.xls")
>>
>> Try changing that to
>>
>> pattern = os.path.join(in_dir, "*.xls")
>> xls_files = glob.glob(pattern)
>>
>> os.path.join() inserts a (back)slash between directory and filename if
>> necessary.
>>
>> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
>> > Aggregate JWS")
>>
>> If you paste the directory name literal into the interactive interpreter
>> you'll be surprised:
>>
>> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>>
>> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
>>
>> "\09" is intrpreted as chr(9). Use a raw string to prevent Python from

Sorry, I was wrong here. "\09" is actually "\0" (i. e. chr(0))
followed by "9". Escape sequences starting with 0 are octal numbers
in Python 2 and thus may never contain digits > 7.

>> interpreting a backslash as the start of an escape sequence
>>
>> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>>
>> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
>>
>> or use forward slashes as directory separators.
> 
> Peter, thanks so much for your help, your suggestions were spot on. So
> now my program runs and is able to find and process the files
> correctly, but I end up getting the following message:
> 
> Traceback (most recent call last):
>   File "C:/Documents and Settings/smacdon/My Documents/
> excel_merge_files_indirectory v2.py", line 49, in <module>
>     merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
> Aggregate JWS")
>   File "C:/Documents and Settings/smacdon/My Documents/
> excel_merge_files_indirectory v2.py", line 36, in merge_xls
>     merged_book.save(out_file)
>   File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
> save
>     doc.save(filename, self.get_biff_data())
>   File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
> in save
>     f = open(file_name_or_filelike_obj, 'wb')
> TypeError: file() argument 1 must be encoded string without NULL
> bytes, not str
> 
> 
> If I am interpreting correctly, am I to understand that it would
> appear the issue is tracing back to functions in the xlwt module? If
> so, what can I do to fix this? Again, any and all help is appreciated!

You probably forgot to convert the default value for out_file into a 
raw string:

def merge_xls(in_dir, out_file=
r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS\09_merged_data.xls"):

"\0" is therefore interpreted as chr(0) which marks the end of a C string and 
may not occur in a file name. chr(0) is called "NULL byte" in the error message 
you get.

[toc] | [prev] | [next] | [standalone]


#20139

FromDennis Lee Bieber <wlfraed@ix.netcom.com>
Date2012-02-09 23:05 -0500
Message-ID<mailman.5638.1328847009.27778.python-list@python.org>
In reply to#20106
On Thu, 9 Feb 2012 12:32:35 -0800 (PST), SMac2347@comcast.net wrote:

>On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:

>> or use forward slashes as directory separators.
>
	Or use 
		os.path.join([list, of, path, parts]) 
and have the correct separator for whatever operating system (though it
does have a problem with a drive letter component -- that does need a
"\" to indicate the root of the drive rather than what the OS thinks of
as the drive's "current directory"

>  File "C:/Documents and Settings/smacdon/My Documents/
>excel_merge_files_indirectory v2.py", line 36, in merge_xls
>    merged_book.save(out_file)
	<snip>
>
>If I am interpreting correctly, am I to understand that it would
>appear the issue is tracing back to functions in the xlwt module? If
>so, what can I do to fix this? Again, any and all help is appreciated!

	Where does out_file come from? I suspect THAT is what is corrupt.
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

[toc] | [prev] | [next] | [standalone]


#20107

FromSMac2347@comcast.net
Date2012-02-09 12:36 -0800
Message-ID<ff9c3b3c-3c47-4ed8-b4cb-60cf76da6bd5@h3g2000yqe.googlegroups.com>
In reply to#19981
On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
> SMac2...@comcast.net wrote:
> > xls_files   = glob.glob(in_dir + "*.xls")
>
> Try changing that to
>
> pattern = os.path.join(in_dir, "*.xls")
> xls_files = glob.glob(pattern)
>
> os.path.join() inserts a (back)slash between directory and filename if
> necessary.
>
> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
>
> If you paste the directory name literal into the interactive interpreter
> you'll be surprised:
>
> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
>
> "\09" is intrpreted as chr(9). Use a raw string to prevent Python from
> interpreting a backslash as the start of an escape sequence
>
> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
>
> or use forward slashes as directory separators.

Disregard my last post, I was able to figure it out, I also had to
cover the out_file file name into a raw string as well. Thanks again
for all the help!!!

[toc] | [prev] | [next] | [standalone]


#19978

FromJohn Gordon <gordon@panix.com>
Date2012-02-07 19:46 +0000
Message-ID<jgrv32$ih7$1@reader1.panix.com>
In reply to#19971
In <f3f576e8-e608-4348-b6ee-fe775d9d1021@x19g2000yqh.googlegroups.com> SMac2347@comcast.net writes:

> Am I correct in thinking that I need to change the current working
> directory to this folder in order for Python to read in these files,
> then generate my output?

You don't have to do it that way, no.

In general, when opening a file, you can do it two ways: Either provide
a full pathname, or provide a relative pathname.

If you provide a full pathname (for example "/usr/home/smith/myfile.txt"),
that file will be opened and it does not matter what the current working
directory is.

If you provide a relative pathname (for example "myfile.txt"), python
will attempt to open that file starting from the current working dir.

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web