Groups > comp.lang.python > #19971 > unrolled thread
| Started by | SMac2347@comcast.net |
|---|---|
| First post | 2012-02-07 10:14 -0800 |
| Last post | 2012-02-07 19:46 +0000 |
| Articles | 12 — 5 participants |
Reading files in from the proper directory SMac2347@comcast.net - 2012-02-07 10:14 -0800
Re: Reading files in from the proper directory Dave Angel <d@davea.name> - 2012-02-07 13:40 -0500
Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-07 11:03 -0800
Re: Reading files in from the proper directory Peter Otten <__peter__@web.de> - 2012-02-07 19:59 +0100
Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-07 11:13 -0800
Re: Reading files in from the proper directory John Gordon <gordon@panix.com> - 2012-02-07 20:00 +0000
Re: Reading files in from the proper directory Peter Otten <__peter__@web.de> - 2012-02-07 21:16 +0100
Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-09 12:32 -0800
Re: Reading files in from the proper directory Peter Otten <__peter__@web.de> - 2012-02-09 22:07 +0100
Re: Reading files in from the proper directory Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-09 23:05 -0500
Re: Reading files in from the proper directory SMac2347@comcast.net - 2012-02-09 12:36 -0800
Re: Reading files in from the proper directory John Gordon <gordon@panix.com> - 2012-02-07 19:46 +0000
| From | SMac2347@comcast.net |
|---|---|
| Date | 2012-02-07 10:14 -0800 |
| Subject | Reading files in from the proper directory |
| Message-ID | <f3f576e8-e608-4348-b6ee-fe775d9d1021@x19g2000yqh.googlegroups.com> |
Hello. I am admittedly a Python novice, and ran into some trouble trying to write a program that will pull multiple excel files all into one file, with each file on a different sheet. I am confident most of the code is correct, as the program runs without any errors and I found the base of it online, making changes as necessary for my own purposes. However, I am having trouble specifying the exact directory where my code should be pulling the files from. All the files are in the same folder, and I have put the folder on my desktop. Am I correct in thinking that I need to change the current working directory to this folder in order for Python to read in these files, then generate my output? Or should I be doing something else? Any and all help is appreciated, thanks!
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-02-07 13:40 -0500 |
| Message-ID | <mailman.5508.1328640027.27778.python-list@python.org> |
| In reply to | #19971 |
On 02/07/2012 01:14 PM, SMac2347@comcast.net wrote:
> Hello. I am admittedly a Python novice, and ran into some trouble
> trying to write a program that will pull multiple excel files all into
> one file, with each file on a different sheet.
>
> I am confident most of the code is correct, as the program runs
> without any errors and I found the base of it online, making changes
> as necessary for my own purposes. However, I am having trouble
> specifying the exact directory where my code should be pulling the
> files from. All the files are in the same folder, and I have put the
> folder on my desktop. Am I correct in thinking that I need to change
> the current working directory to this folder in order for Python to
> read in these files,

No, Python certainly does not constrain you to working with files only
in the current working directory. My rule of thumb is never to change
the cwd in a Python program. You can use relative paths to open files,
or you can use absolute paths. There is even a library function
os.path.abspath() for converting a relative path to an absolute one.

If you do change cwd during the running of a program, then relative
filenames that worked earlier might no longer work. You could convert
them all to absolute paths, but that's more work.

You can piece together path strings using os.path.join(). It's smart
enough to know the path separator for your particular platform.

Check out this page: http://docs.python.org/library/os.path.html

> then generate my output? Or should I be doing
> something else?
>
> Any and all help is appreciated, thanks!

--
DaveA
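The point about relative paths depending on the current working directory can be sketched as follows. This is a minimal illustration written for Python 3 (the thread itself uses Python 2); the file name `report.xls`, the helper `resolve`, and the temporary directories are made up for the example and do not come from the thread.

```python
import os
import tempfile

def resolve(relative_name):
    """Return the absolute path that a relative name refers to right now."""
    return os.path.abspath(relative_name)

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    try:
        os.chdir(first)
        before = resolve("report.xls")   # resolved against the first directory
        os.chdir(second)
        after = resolve("report.xls")    # same name, different directory
    finally:
        os.chdir(original_cwd)           # restore the cwd before cleanup

# The same relative name pointed at two different files.
same_target = (before == after)

# os.path.join() picks the right separator for the current platform.
joined = os.path.join("data", "2012", "report.xls")
```

Once a path has been converted with os.path.abspath(), later changes to the working directory no longer affect it, which is why computing absolute paths up front is usually safer than calling os.chdir().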
| From | SMac2347@comcast.net |
|---|---|
| Date | 2012-02-07 11:03 -0800 |
| Message-ID | <6fc4ebfa-6607-4af7-83d7-65c61408a6ac@m24g2000yqb.googlegroups.com> |
| In reply to | #19972 |
On Feb 7, 1:40 pm, Dave Angel <d...@davea.name> wrote:
> On 02/07/2012 01:14 PM, SMac2...@comcast.net wrote:
> > Hello. I am admittedly a Python novice, and ran into some trouble
> > trying to write a program that will pull multiple excel files all into
> > one file, with each file on a different sheet.
>
> > I am confident most of the code is correct, as the program runs
> > without any errors and I found the base of it online, making changes
> > as necessary for my own purposes. However, I am having trouble
> > specifying the exact directory where my code should be pulling the
> > files from. All the files are in the same folder, and I have put the
> > folder on my desktop. Am I correct in thinking that I need to change
> > the current working directory to this folder in order for Python to
> > read in these files,
>
> No, Python certainly does not constrain you to working with files only
> in the current working directory. My rule of thumb is never to change
> the cwd in a Python program. You can use relative paths to open files,
> or you can use absolute paths. There is even a library function
> os.path.abspath() for converting a relative path to an absolute one.
>
> If you do change cwd during the running of a program, then relative
> filenames that worked earlier might no longer work. You could convert
> them all to absolute paths, but that's more work.
>
> You can piece together path strings using os.path.join(). It's smart
> enough to know the path separator for your particular platform.
>
> Check out this page:http://docs.python.org/library/os.path.html
>
> > then generate my output? Or should I be doing
> > something else?
>
> > Any and all help is appreciated, thanks!
>
> --
>
> DaveA
Thanks Dave. I am a bit lost as to what the problem is then - the
program runs glitch free, but then only prints: "NOTE *** No xls files
in C:/Documents and Settings/smacdon/." as specified below by my
program. Any idea what the issue might be (my code is below):
import xlrd, xlwt
import glob, os.path

def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS\09_merged_data.xls"):

    xls_files = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl = [os.path.basename(v)[:-4] for v in xls_files if len(os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()
    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
        for k, xls_file in enumerate(xls_files):
            print "---> Processing file %s" % (xls_file)
            if len (sheet_names[k]) <= 29:
                book = xlrd.open_workbook(xls_file)
                if book.nsheets == 1:
                    ws = merged_book.add_sheet(sheet_names[k])
                    sheet = book.sheet_by_index(0)
                    for rx in range(sheet.nrows):
                        for cx in range(sheet.ncols):
                            ws.write(rx, cx, sheet.cell_value(rx, cx))
                elif book.nsheets in range(2, 100):
                    for sheetx in range(book.nsheets):
                        sheet0n = sheet_names[k]+str(sheetx+1).zfill(2)
                        ws = merged_book.add_sheet(sheet0n)
                        sheet = book.sheet_by_index(sheetx)
                        for rx in range(sheet.nrows):
                            for cx in range(sheet.ncols):
                                ws.write(rx, cx, sheet.cell_value(rx, cx))
                else:
                    print "ERROR *** File %s has %s sheets (maximum is 99)" % (xls_file, book.nsheets)
                    raise
            else:
                print "WARNING *** File name too long: <%s.xls> (maximum is 29 chars) " % (sheet_names[k])
                print "WARNING *** File <%s.xls> was skipped." % (sheet_names[k])
        merged_book.save(out_file)
        print
        print "---> Merged xls file written to %s using the following source files: " % (out_file)
        for k, v in enumerate(sheet_names):
            if len(v) <= 29:
                print "\t", str(k+1).zfill(3), "%s.xls" % (v)
        print
        if sheet_excl:
            print "--> The following files were skipped because the file name exceeds 29 characters: "
            for k, v in enumerate(sheet_excl):
                print "\t", str(k+1).zfill(3), v
    else:
        print "NOTE *** No xls files in %s." % (in_dir)

merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-02-07 19:59 +0100 |
| Message-ID | <mailman.5510.1328641121.27778.python-list@python.org> |
| In reply to | #19971 |
SMac2347@comcast.net wrote:
> Hello. I am admittedly a Python novice, and ran into some trouble
> trying to write a program that will pull multiple excel files all into
> one file, with each file on a different sheet.
>
> I am confident most of the code is correct, as the program runs
> without any errors and I found the base of it online, making changes
> as necessary for my own purposes.
That confidence usually evaporates once you write the first unit test ;)
> However, I am having trouble
> specifying the exact directory where my code should be pulling the
> files from. All the files are in the same folder, and I have put the
> folder on my desktop. Am I correct in thinking that I need to change
> the current working directory to this folder in order for Python to
> read in these files, then generate my output? Or should I be doing
> something else?
Do it properly, allow specifying the files on the commandline:
import argparse

def process_files(files, destfile):
    # put your real code here
    print "merge " + "\n ".join(files)
    print "into " + destfile

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("files", metavar="file", nargs="+")
    parser.add_argument("destfile")
    args = parser.parse_args()
    process_files(args.files, args.destfile)
If you have standard locations for sources and destination you can wrap your
python script into a little batch file containing something like
python \source\path\*.xls \dest\path\merged.xls
and invoke that to get both flexibility and convenience.
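One caveat with the batch-file approach: the Windows command prompt (cmd.exe) passes wildcards such as `*.xls` to the program unexpanded, so the script would receive the literal pattern rather than a list of files. A minimal sketch of expanding the patterns inside the script is shown here, written for Python 3; the helper name `expand_patterns` and the demo file names are hypothetical and not part of the thread.

```python
import glob
import os
import tempfile

def expand_patterns(patterns):
    """Expand shell-style wildcards; keep arguments that match nothing as-is."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # An argument without matches is passed through unchanged so that
        # the caller can report a sensible "file not found" error later.
        files.extend(matches if matches else [pattern])
    return files

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.xls", "b.xls", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = expand_patterns([os.path.join(folder, "*.xls"), "missing.xls"])
    found_names = [os.path.basename(path) for path in found]
```

In the argparse example, such a helper could be applied to `args.files` before the list is handed to `process_files()`.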
| From | SMac2347@comcast.net |
|---|---|
| Date | 2012-02-07 11:13 -0800 |
| Message-ID | <9bfb3e39-2bc6-4399-90cc-1c53aa06265e@h6g2000yqk.googlegroups.com> |
| In reply to | #19974 |
Thanks for the responses.
Below is the code I have thus far. While the program runs glitch-free,
it only results in the printing of the message: "NOTE *** No xls files
in C:/Documents and Settings/smacdon/." as specified by my code. Any
idea as to why it might be unable to find the .xls documents (yes they
are .xls documents and not .xlsx). Thanks!
import xlrd, xlwt
import glob, os.path

def merge_xls (in_dir, out_file="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS\09_merged_data.xls"):

    xls_files = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl = [os.path.basename(v)[:-4] for v in xls_files if len(os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()
    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
        for k, xls_file in enumerate(xls_files):
            print "---> Processing file %s" % (xls_file)
            if len (sheet_names[k]) <= 29:
                book = xlrd.open_workbook(xls_file)
                if book.nsheets == 1:
                    ws = merged_book.add_sheet(sheet_names[k])
                    sheet = book.sheet_by_index(0)
                    for rx in range(sheet.nrows):
                        for cx in range(sheet.ncols):
                            ws.write(rx, cx, sheet.cell_value(rx, cx))
                elif book.nsheets in range(2, 100):
                    for sheetx in range(book.nsheets):
                        sheet0n = sheet_names[k]+str(sheetx+1).zfill(2)
                        ws = merged_book.add_sheet(sheet0n)
                        sheet = book.sheet_by_index(sheetx)
                        for rx in range(sheet.nrows):
                            for cx in range(sheet.ncols):
                                ws.write(rx, cx, sheet.cell_value(rx, cx))
                else:
                    print "ERROR *** File %s has %s sheets (maximum is 99)" % (xls_file, book.nsheets)
                    raise
            else:
                print "WARNING *** File name too long: <%s.xls> (maximum is 29 chars) " % (sheet_names[k])
                print "WARNING *** File <%s.xls> was skipped." % (sheet_names[k])
        merged_book.save(out_file)
        print
        print "---> Merged xls file written to %s using the following source files: " % (out_file)
        for k, v in enumerate(sheet_names):
            if len(v) <= 29:
                print "\t", str(k+1).zfill(3), "%s.xls" % (v)
        print
        if sheet_excl:
            print "--> The following files were skipped because the file name exceeds 29 characters: "
            for k, v in enumerate(sheet_excl):
                print "\t", str(k+1).zfill(3), v
    else:
        print "NOTE *** No xls files in %s." % (in_dir)

merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2012-02-07 20:00 +0000 |
| Message-ID | <jgrvt2$8qm$1@reader1.panix.com> |
| In reply to | #19975 |
In <9bfb3e39-2bc6-4399-90cc-1c53aa06265e@h6g2000yqk.googlegroups.com> SMac2347@comcast.net writes:
> xls_files = glob.glob(in_dir + "*.xls")
You may want to put a directory separator character in between the
directory name and the filename glob pattern.
--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"
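The missing separator can be seen by comparing the two ways of building the glob pattern. This is a small sketch for Python 3; the relative folder name is shortened for illustration and the variable names are made up for the example.

```python
import os

# A directory name without a trailing separator, as in the original call.
in_dir = os.path.join("Desktop", "09 Aggregate JWS")

# Plain concatenation glues the wildcard onto the folder name itself, so the
# pattern looks for files NEXT TO the folder whose names start with
# "09 Aggregate JWS" instead of files inside it.
naive_pattern = in_dir + "*.xls"

# os.path.join() inserts exactly one separator where it is needed.
joined_pattern = os.path.join(in_dir, "*.xls")

# A trailing separator on the directory does not get doubled.
also_joined = os.path.join(in_dir + os.sep, "*.xls")
```

Passing `joined_pattern` to glob.glob() searches inside the folder, which is what the merge script intends.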
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-02-07 21:16 +0100 |
| Message-ID | <mailman.5511.1328645799.27778.python-list@python.org> |
| In reply to | #19975 |
SMac2347@comcast.net wrote:

> xls_files = glob.glob(in_dir + "*.xls")

Try changing that to

pattern = os.path.join(in_dir, "*.xls")
xls_files = glob.glob(pattern)

os.path.join() inserts a (back)slash between directory and filename if
necessary.

> merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")

If you paste the directory name literal into the interactive interpreter
you'll be surprised:

>>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'

"\09" is interpreted as chr(9). Use a raw string to prevent Python from
interpreting a backslash as the start of an escape sequence

>>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'

or use forward slashes as directory separators.
| From | SMac2347@comcast.net |
|---|---|
| Date | 2012-02-09 12:32 -0800 |
| Message-ID | <10d15214-1fd8-4b80-8384-c0b291c6976d@h3g2000yqe.googlegroups.com> |
| In reply to | #19981 |
On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
> SMac2...@comcast.net wrote:
> > xls_files = glob.glob(in_dir + "*.xls")
>
> Try changing that to
>
> pattern = os.path.join(in_dir, "*.xls")
> xls_files = glob.glob(pattern)
>
> os.path.join() inserts a (back)slash between directory and filename if
> necessary.
>
> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
>
> If you paste the directory name literal into the interactive interpreter
> you'll be surprised:
>
> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
>
> "\09" is intrpreted as chr(9). Use a raw string to prevent Python from
> interpreting a backslash as the start of an escape sequence
>
> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
>
> or use forward slashes as directory separators.
Peter, thanks so much for your help, your suggestions were spot on. So
now my program runs and is able to find and process the files
correctly, but I end up getting the following message:
Traceback (most recent call last):
  File "C:/Documents and Settings/smacdon/My Documents/excel_merge_files_indirectory v2.py", line 49, in <module>
    merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
  File "C:/Documents and Settings/smacdon/My Documents/excel_merge_files_indirectory v2.py", line 36, in merge_xls
    merged_book.save(out_file)
  File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in save
    doc.save(filename, self.get_biff_data())
  File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507, in save
    f = open(file_name_or_filelike_obj, 'wb')
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
If I am interpreting correctly, am I to understand that it would
appear the issue is tracing back to functions in the xlwt module? If
so, what can I do to fix this? Again, any and all help is appreciated!
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-02-09 22:07 +0100 |
| Message-ID | <mailman.5612.1328821689.27778.python-list@python.org> |
| In reply to | #20106 |
SMac2347@comcast.net wrote:

> On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
>> SMac2...@comcast.net wrote:
>> > xls_files = glob.glob(in_dir + "*.xls")
>>
>> Try changing that to
>>
>> pattern = os.path.join(in_dir, "*.xls")
>> xls_files = glob.glob(pattern)
>>
>> os.path.join() inserts a (back)slash between directory and filename if
>> necessary.
>>
>> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09
>> > Aggregate JWS")
>>
>> If you paste the directory name literal into the interactive interpreter
>> you'll be surprised:
>>
>> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>>
>> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
>>
>> "\09" is interpreted as chr(9). Use a raw string to prevent Python from

Sorry, I was wrong here. "\09" is actually "\0" (i.e. chr(0)) followed by
"9". Escape sequences starting with 0 are octal numbers in Python 2 and thus
may never contain digits > 7.

>> interpreting a backslash as the start of an escape sequence
>>
>> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>>
>> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
>>
>> or use forward slashes as directory separators.
>
> Peter, thanks so much for your help, your suggestions were spot on. So
> now my program runs and is able to find and process the files
> correctly, but I end up getting the following message:
>
> Traceback (most recent call last):
> File "C:/Documents and Settings/smacdon/My Documents/
> excel_merge_files_indirectory v2.py", line 49, in <module>
> merge_xls(in_dir=r"C:\Documents and Settings\smacdon\Desktop\09
> Aggregate JWS")
> File "C:/Documents and Settings/smacdon/My Documents/
> excel_merge_files_indirectory v2.py", line 36, in merge_xls
> merged_book.save(out_file)
> File "C:\Python27\lib\site-packages\xlwt\Workbook.py", line 634, in
> save
> doc.save(filename, self.get_biff_data())
> File "C:\Python27\lib\site-packages\xlwt\CompoundDoc.py", line 507,
> in save
> f = open(file_name_or_filelike_obj, 'wb')
> TypeError: file() argument 1 must be encoded string without NULL
> bytes, not str
>
> If I am interpreting correctly, am I to understand that it would
> appear the issue is tracing back to functions in the xlwt module? If
> so, what can I do to fix this? Again, any and all help is appreciated!

You probably forgot to convert the default value for out_file into a raw
string:

def merge_xls(in_dir, out_file=
    r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS\09_merged_data.xls"):

"\0" is therefore interpreted as chr(0) which marks the end of a C string
and may not occur in a file name. chr(0) is called "NULL byte" in the error
message you get.
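The effect of the "\0" escape can be checked directly. The sketch below is written for Python 3, where the behaviour of this particular escape matches what is described for Python 2 above; the shortened path and the variable names are made up for the example, and the exact exception type raised by open() is an observation about recent Python 3 releases rather than something stated in the thread.

```python
# In a normal string literal "\0" is an octal escape for chr(0); the "9"
# that follows is an ordinary character because 9 is not an octal digit.
plain = "Desktop\09 Aggregate JWS"

# A raw string keeps the backslash as a literal character.
raw = r"Desktop\09 Aggregate JWS"

# Forward slashes avoid the problem altogether and work on Windows too.
forward = "Desktop/09 Aggregate JWS"

has_nul = "\x00" in plain

# Passing a name with an embedded NUL to open() is rejected before any
# file is touched.
try:
    open(plain, "wb")
    rejected = False
except ValueError:
    rejected = True
```

The raw string is one character longer than the plain one, because the backslash survives instead of being folded into the escape sequence.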
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2012-02-09 23:05 -0500 |
| Message-ID | <mailman.5638.1328847009.27778.python-list@python.org> |
| In reply to | #20106 |
On Thu, 9 Feb 2012 12:32:35 -0800 (PST), SMac2347@comcast.net wrote:
>On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
>> or use forward slashes as directory separators.
>
Or use
os.path.join(list, of, path, parts)
and have the correct separator for whatever operating system (though it
does have a problem with a drive letter component -- that does need a
"\" to indicate the root of the drive rather than what the OS thinks of
as the drive's "current directory").
> File "C:/Documents and Settings/smacdon/My Documents/
>excel_merge_files_indirectory v2.py", line 36, in merge_xls
> merged_book.save(out_file)
<snip>
>
>If I am interpreting correctly, am I to understand that it would
>appear the issue is tracing back to functions in the xlwt module? If
>so, what can I do to fix this? Again, any and all help is appreciated!
Where does out_file come from? I suspect THAT is what is corrupt.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
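The drive-letter caveat can be reproduced on any platform with ntpath, the Windows implementation behind os.path. This is a minimal sketch for Python 3; the variable names are made up for the example. Note that join() takes the path components as separate arguments, so a list has to be unpacked with `*`.

```python
import ntpath  # Windows flavour of os.path, importable on every platform

parts = ["Documents and Settings", "smacdon", "Desktop"]

# A bare drive letter gives a path relative to that drive's current directory.
drive_relative = ntpath.join("C:", *parts)

# Adding the backslash anchors the path at the root of the drive.
rooted = ntpath.join("C:\\", *parts)
```

Here `drive_relative` comes out as `C:Documents and Settings\smacdon\Desktop`, while `rooted` is `C:\Documents and Settings\smacdon\Desktop`.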
| From | SMac2347@comcast.net |
|---|---|
| Date | 2012-02-09 12:36 -0800 |
| Message-ID | <ff9c3b3c-3c47-4ed8-b4cb-60cf76da6bd5@h3g2000yqe.googlegroups.com> |
| In reply to | #19981 |
On Feb 7, 3:16 pm, Peter Otten <__pete...@web.de> wrote:
> SMac2...@comcast.net wrote:
> > xls_files = glob.glob(in_dir + "*.xls")
>
> Try changing that to
>
> pattern = os.path.join(in_dir, "*.xls")
> xls_files = glob.glob(pattern)
>
> os.path.join() inserts a (back)slash between directory and filename if
> necessary.
>
> > merge_xls(in_dir="C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS")
>
> If you paste the directory name literal into the interactive interpreter
> you'll be surprised:
>
> >>> "C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\x009 Aggregate JWS'
>
> "\09" is interpreted as chr(9). Use a raw string to prevent Python from
> interpreting a backslash as the start of an escape sequence
>
> >>> r"C:\Documents and Settings\smacdon\Desktop\09 Aggregate JWS"
>
> 'C:\\Documents and Settings\\smacdon\\Desktop\\09 Aggregate JWS'
>
> or use forward slashes as directory separators.

Disregard my last post, I was able to figure it out. I also had to
convert the out_file file name into a raw string. Thanks again for
all the help!!!
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2012-02-07 19:46 +0000 |
| Message-ID | <jgrv32$ih7$1@reader1.panix.com> |
| In reply to | #19971 |
In <f3f576e8-e608-4348-b6ee-fe775d9d1021@x19g2000yqh.googlegroups.com> SMac2347@comcast.net writes:
> Am I correct in thinking that I need to change the current working
> directory to this folder in order for Python to read in these files,
> then generate my output?
You don't have to do it that way, no.
In general, when opening a file, you can do it two ways: Either provide
a full pathname, or provide a relative pathname.
If you provide a full pathname (for example "/usr/home/smith/myfile.txt"),
that file will be opened and it does not matter what the current working
directory is.
If you provide a relative pathname (for example "myfile.txt"), python
will attempt to open that file starting from the current working dir.
--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"