Groups > comp.lang.python > #107283 > unrolled thread
| Started by | Sayth Renshaw <flebber.crue@gmail.com> |
|---|---|
| First post | 2016-04-18 16:44 -0700 |
| Last post | 2016-04-19 02:17 +0100 |
| Articles | 8 — 4 participants |
Why are my files in in my list - os module used with sys argv Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-18 16:44 -0700
Re: Why are my files in in my list - os module used with sys argv Steven D'Aprano <steve@pearwood.info> - 2016-04-19 10:19 +1000
Re: Why are my files in in my list - os module used with sys argv Peter Otten <__peter__@web.de> - 2016-04-19 10:16 +0200
Re: Why are my files in in my list - os module used with sys argv Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-19 06:21 -0700
Re: Why are my files in in my list - os module used with sys argv Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-19 06:31 -0700
Re: Why are my files in in my list - os module used with sys argv Peter Otten <__peter__@web.de> - 2016-04-19 15:45 +0200
Re: Why are my files in in my list - os module used with sys argv Sayth Renshaw <flebber.crue@gmail.com> - 2016-04-19 07:18 -0700
Re: Why are my files in in my list - os module used with sys argv MRAB <python@mrabarnett.plus.com> - 2016-04-19 02:17 +0100
| From | Sayth Renshaw <flebber.crue@gmail.com> |
|---|---|
| Date | 2016-04-18 16:44 -0700 |
| Subject | Why are my files in in my list - os module used with sys argv |
| Message-ID | <0bce26b1-ad3d-42c4-8f99-08e82d1bce88@googlegroups.com> |
Hi
Why would it be that my files are not being found in this script?
from pyquery import PyQuery as pq
import pandas as pd
import os
import sys

if len(sys.argv) == 2:
    print("no params")
    sys.exit(1)

dir = sys.argv[1]
mask = sys.argv[2]

files = os.listdir(dir)

fileResult = filter(lambda x: x.endswith(mask), files)

# d = pq(filename='20160319RHIL0_edit.xml')
data = []

for file in fileResult:
    print(file)

for items in fileResult:
    d = pq(filename=items)
    res = d('nomination')
    attrs = ('id', 'horse')
    data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]

# from nominations
# res = d('nomination')
# nomID = [res.eq(i).attr('id') for i in range(len(res))]
# horseName = [res.eq(i).attr('horse') for i in range(len(res))]

# attrs = ('id', 'horse')

frames = pd.DataFrame(data)
print(frames)
I am running this from the bash prompt as
(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ python jqxml.py samples *.xml
my directory structure
(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ ls -a
. .. environment.yml .git .gitignore #jqxml.py# jqxml.py samples
and samples contains
(pyquery)sayth@sayth-E6410:~/Projects/pyquery/samples$ ls -a
. 20160319RHIL0_edit.xml 20160409RAND0.xml
.. 20160402RAND0.xml 20160416RAND0.xml
yet I get no files out of the print statement.
Ideas?
Sayth
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-04-19 10:19 +1000 |
| Message-ID | <5715798b$0$1612$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #107283 |
On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> Hi
>
> Why would it be that my files are not being found in this script?

You are calling the script with:

    python jqxml.py samples *.xml

This does not do what you think it does: under Linux shells, the glob
*.xml will be expanded by the shell. Fortunately, in your case, you have
no files in the current directory matching the glob *.xml, so it is not
expanded and the arguments your script receives are:

    "python jqxml.py"  # not used
    "samples"  # dir
    "*.xml"  # mask

You then call:

    fileResult = filter(lambda x: x.endswith(mask), files)

which looks for file names which end with a literal string (asterisk, dot,
x, m, l) in that order. You have no files that match that string.

At the shell prompt, enter this:

    touch samples/junk\*.xml

and run the script again, and you should see that it now matches one file.

Instead, what you should do is:

(1) Use the glob module:

    https://docs.python.org/2/library/glob.html
    https://docs.python.org/3/library/glob.html

    https://pymotw.com/2/glob/
    https://pymotw.com/3/glob/

(2) When calling the script, avoid the shell expanding wildcards by
escaping them or quoting them:

    python jqxml.py samples "*.xml"

--
Steven
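A minimal sketch of suggestion (1), assuming the script keeps its two arguments (a directory and a quoted mask). The helper name `find_files` is hypothetical and not from the thread; `glob.glob` and `os.path.join` are standard library calls.

```python
import glob
import os


def find_files(directory, mask):
    """Return the sorted paths in `directory` whose names match `mask`."""
    # glob treats "*" as a wildcard, whereas str.endswith compares
    # against the literal characters of the mask.
    return sorted(glob.glob(os.path.join(directory, mask)))
```

With the directory layout from the original post, `find_files("samples", "*.xml")` would be expected to return the four XML paths, each prefixed with `samples/`, so they can be passed straight to `pq(filename=...)`.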
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-04-19 10:16 +0200 |
| Message-ID | <mailman.3.1461053808.30862.python-list@python.org> |
| In reply to | #107284 |
Steven D'Aprano wrote:
> (2) When calling the script, avoid the shell expanding wildcards by
> escaping them or quoting them:
>
> python jqxml.py samples "*.xml"
(3) *Use* the expansion mechanism provided by the shell instead of fighting
it:
$ python jqxml.py samples/*.xml
This requires that you change your script
from pyquery import PyQuery as pq
import pandas as pd
import sys

fileResult = sys.argv[1:]

if not fileResult:
    print("no files specified")
    sys.exit(1)

for file in fileResult:
    print(file)

for items in fileResult:
    try:
        d = pq(filename=items)
    except FileNotFoundError as e:
        print(e)
        continue
    res = d('nomination')
    # you could move the attrs definition before the loop
    attrs = ('id', 'horse')
    # probably a bug: you are overwriting data on every iteration
    data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
I think this is the most natural approach if you are willing to accept the
quirk that the script tries to process the file 'samples/*.xml' if the
samples directory doesn't contain any files with the .xml suffix. Common
shell tools work that way:
$ ls samples/*.xml
samples/1.xml samples/2.xml samples/3.xml
$ ls samples/*.XML
ls: cannot access samples/*.XML: No such file or directory
Unrelated: instead of working with sys.argv directly you could use argparse
which is part of the standard library. The code to get at least one file is
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("files", nargs="+")
args = parser.parse_args()
print(args.files)
Note that this doesn't fix the shell expansion oddity.
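A sketch of one way to deal with that oddity while keeping argparse, assuming it is acceptable to stop with a usage error when an argument does not name an existing file. The helper name `parse_files` is hypothetical; `ArgumentParser.error` is the standard argparse method that prints a usage message and exits.

```python
import argparse
import os


def parse_files(argv=None):
    """Parse at least one file name and reject names that do not exist."""
    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="+")
    args = parser.parse_args(argv)

    # An unexpanded pattern such as 'samples/*.XML' arrives as a literal
    # string, so it shows up here as a path that does not exist.
    missing = [name for name in args.files if not os.path.exists(name)]
    if missing:
        # parser.error prints the message to stderr and exits with status 2.
        parser.error("no such file: " + ", ".join(missing))
    return args.files
```

Called without arguments, `parse_files()` reads `sys.argv[1:]`; passing an explicit list is mainly useful for trying it out interactively.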
| From | Sayth Renshaw <flebber.crue@gmail.com> |
|---|---|
| Date | 2016-04-19 06:21 -0700 |
| Message-ID | <719abd0a-d3b4-483d-ab77-1989b06f062e@googlegroups.com> |
| In reply to | #107302 |
On Tuesday, 19 April 2016 18:17:02 UTC+10, Peter Otten wrote:
> Note that this doesn't fix the shell expansion oddity.
Hi
Thanks for the insight. After doing a little reading I found this post, which uses both argparse and glob and attempts to cover both Windows and bash expansion of wildcards: http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
import argparse
from glob import glob

def main(file_names):
    print(file_names)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file_names", nargs='*')
    # nargs='*' tells it to combine all positional arguments into a single list
    args = parser.parse_args()
    file_names = list()

    # go through all of the arguments and replace ones with wildcards with the expansion
    # note: glob returns a name without wildcards only if that path exists;
    # otherwise it returns an empty list.
    for arg in args.file_names:
        file_names += glob(arg)

    main(file_names)
This is way beyond my needs for such a tiny script, but I think this is the Flask developers' Python CLI creation package, Click, which is based on optparse: http://click.pocoo.org/5/why/#why-not-argparse
> # probably a bug: you are overwriting data on every iteration
> data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
Thanks for picking this up; I will have to append to it on each iteration for each attribute.
Thank You
Sayth
| From | Sayth Renshaw <flebber.crue@gmail.com> |
|---|---|
| Date | 2016-04-19 06:31 -0700 |
| Message-ID | <19e288ee-fc9d-4ed3-bf7a-e6d76c379b60@googlegroups.com> |
| In reply to | #107319 |
On Tuesday, 19 April 2016 23:21:42 UTC+10, Sayth Renshaw wrote:
> Thanks for the insight, after doing a little reading I found this post which uses both argparse and glob and attempts to cover the windows and bash expansion of wildcards, http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
Scratch that bit about the code from http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html: I can't get it to work. Good general direction, though.
Sayth
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-04-19 15:45 +0200 |
| Message-ID | <mailman.14.1461073547.30862.python-list@python.org> |
| In reply to | #107319 |
Sayth Renshaw wrote:
> Thanks for the insight, after doing a little reading I found this post
> which uses both argparse and glob and attempts to cover the windows and
> bash expansion of wildcards,
> http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html

I hope you read the comment section of that page carefully.
On Linux your script's behaviour will be surprising.
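Two ways in which globbing every argument a second time can surprise, sketched with a temporary directory. Whether these are the cases discussed in that page's comments is an assumption; the helper names `expand` and `demo` and the file names are hypothetical. `glob.escape` requires Python 3.4 or later.

```python
import glob
import os
import tempfile


def expand(args):
    """Glob every argument again, as the loop in the blog post's code does."""
    names = []
    for arg in args:
        names += glob.glob(arg)
    return names


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        odd = os.path.join(tmp, "race[1].xml")
        open(odd, "w").close()
        missing = os.path.join(tmp, "missing.xml")

        # A name without wildcards survives only if the file exists,
        # so a mistyped file name is dropped without any error.
        dropped = expand([missing])
        # "[1]" is read as a character class, so the pattern would match
        # "race1.xml" but not the file actually named "race[1].xml".
        reglobbed = expand([odd])
        # Escaping the special characters makes the name match itself.
        escaped = expand([glob.escape(odd)])
        return dropped, reglobbed, [os.path.basename(p) for p in escaped]
```

Calling `demo()` returns the three results as a tuple, which makes it easy to inspect each case in an interactive session.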
| From | Sayth Renshaw <flebber.crue@gmail.com> |
|---|---|
| Date | 2016-04-19 07:18 -0700 |
| Message-ID | <96dfb79a-697c-47cb-b9d9-690f4f937062@googlegroups.com> |
| In reply to | #107324 |
On Tuesday, 19 April 2016 23:46:01 UTC+10, Peter Otten wrote:
> Sayth Renshaw wrote:
>
> > Thanks for the insight, after doing a little reading I found this post
> > which uses both argparse and glob and attempts to cover the windows and
> > bash expansion of wildcards,
> > http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
>
> I hope you read the comment section of that page carefully.
> On Linux your script's behaviour will be surprising.
Yes, I have gone your way now and am parsing the files; working out where my data is going will have to wait until after I sleep.
Thanks for the advice.
from pyquery import PyQuery as pq
import pandas as pd
import argparse
# from glob import glob

parser = argparse.ArgumentParser(description=None)


def GetArgs(parser):
    """Parser function using argparse"""
    # parser.add_argument('directory', help='directory use',
    #                     action='store', nargs='*')
    parser.add_argument("files", nargs="+")
    return parser.parse_args()

fileList = GetArgs(parser)
print(fileList.files)

# d = pq(filename='20160319RHIL0_edit.xml')
data = []
attrs = ('id', 'horse')

for items in fileList.files:
    d = pq(filename=items)
    res = d('nomination')
    dataSets = [[res.eq(i).attr(x)
                 for x in attrs] for i in range(len(res))]
    resultList = data.append(dataSets)

frames = pd.DataFrame(resultList)
print(frames)
--------------------------------------------------------------------------
(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ python jqxml.py samples/*.xml
['samples/20160319RHIL0_edit.xml', 'samples/20160402RAND0.xml', 'samples/20160409RAND0.xml', 'samples/20160416RAND0.xml']
Empty DataFrame
Columns: []
Index: []
(pyquery)sayth@sayth-E6410:~/Projects/pyquery$
Thanks
Sayth
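The empty DataFrame in the output above is consistent with how `list.append` behaves: it changes the list in place and returns `None`, so `resultList` is `None` by the time it reaches `pd.DataFrame`. A minimal sketch of accumulating the rows instead, using plain lists so that it runs without pyquery or pandas; the function name `collect` is hypothetical.

```python
def collect(per_file_rows):
    """Flatten the row lists extracted from each file into one list of rows."""
    data = []
    for rows in per_file_rows:
        # extend adds each [id, horse] row individually; append would add
        # the whole per-file list as a single nested element. Both methods
        # return None, so their result must not be assigned.
        data.extend(rows)
    return data
```

In the script, `data.extend(dataSets)` inside the loop followed by `pd.DataFrame(data, columns=list(attrs))` after the loop would probably give one row per nomination, assuming each file yields a list of `[id, horse]` pairs.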
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2016-04-19 02:17 +0100 |
| Message-ID | <mailman.159.1461028646.6324.python-list@python.org> |
| In reply to | #107283 |
On 2016-04-19 00:44, Sayth Renshaw wrote:
> yet I get no files out of the print statement.
>
> Ideas?
>
I don't use Linux, but I think it might be a problem with what you have
on the command line. I believe that Linux expands wildcarded names, so
what you might be getting is "samples" followed by all the names in the
current directory that match "*.xml".
Even if that isn't the case, and mask is "*.xml", the filtering that
you're doing is asking for those names that end with "*.xml"; you might
find a name that ends with ".xml", but I doubt you'll ever find one that
ends with "*.xml"!
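A related detail that was not raised in the thread, offered as a hedged observation: under Python 3, `filter` returns a one-shot iterator, so even with a mask that matches, the first `for` loop in the original script would consume it and the second loop would see nothing. A small sketch with hypothetical file names:

```python
names = ["a.xml", "b.txt", "c.xml"]  # hypothetical directory listing

matches = filter(lambda name: name.endswith(".xml"), names)
first_pass = [name for name in matches]   # consumes the iterator
second_pass = [name for name in matches]  # nothing is left in Python 3

# A list can be iterated as many times as needed.
matches_list = [name for name in names if name.endswith(".xml")]
```

Building a list, as in the last line, avoids the problem.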