| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Peter Otten <__peter__@web.de> |
| Newsgroups | comp.lang.python |
| Subject | Re: Why are my files in in my list - os module used with sys argv |
| Date | Tue, 19 Apr 2016 10:16:28 +0200 |
| Message-ID | <mailman.3.1461053808.30862.python-list@python.org> (permalink) |
| References | <0bce26b1-ad3d-42c4-8f99-08e82d1bce88@googlegroups.com> <5715798b$0$1612$c3e8da3$5496439d@news.astraweb.com> <nf4ph0$6gh$1@ger.gmane.org> |
Steven D'Aprano wrote:
> On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
>
>> Hi
>>
>> Why would it be that my files are not being found in this script?
>
> You are calling the script with:
>
> python jqxml.py samples *.xml
>
> This does not do what you think it does: under Linux shells, the glob
> *.xml will be expanded by the shell. Fortunately, in your case, you have
> no files in the current directory matching the glob *.xml, so it is not
> expanded and the arguments your script receives are:
>
>
> "python jqxml.py" # not used
>
> "samples" # dir
>
> "*.xml" # mask
>
>
> You then call:
>
> fileResult = filter(lambda x: x.endswith(mask), files)
>
> which looks for file names which end with a literal string (asterisk, dot,
> x, m, l) in that order. You have no files that match that string.
>
> At the shell prompt, enter this:
>
> touch samples/junk\*.xml
>
> and run the script again, and you should see that it now matches one file.
>
> Instead, what you should do is:
>
>
> (1) Use the glob module:
>
> https://docs.python.org/2/library/glob.html
> https://docs.python.org/3/library/glob.html
>
> https://pymotw.com/2/glob/
> https://pymotw.com/3/glob/
>
>
> (2) When calling the script, avoid the shell expanding wildcards by
> escaping them or quoting them:
>
> python jqxml.py samples "*.xml"

(3) *Use* the expansion mechanism provided by the shell instead of fighting
it:

$ python jqxml.py samples/*.xml

This requires that you change your script:

from pyquery import PyQuery as pq
import pandas as pd
import sys

fileResult = sys.argv[1:]

if not fileResult:
    print("no files specified")
    sys.exit(1)

for file in fileResult:
    print(file)

for items in fileResult:
    try:
        d = pq(filename=items)
    except FileNotFoundError as e:
        print(e)
        continue
    res = d('nomination')
    # you could move the attrs definition before the loop
    attrs = ('id', 'horse')
    # probably a bug: you are overwriting data on every iteration
    data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
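
The two comments in the loop above hint at a possible cleanup: define attrs once and append rows instead of rebinding data on every pass. A minimal sketch of that idea follows. To keep it runnable with the standard library only, it swaps pyquery for xml.etree.ElementTree, which is my substitution and not what the original script uses. The function name collect_nominations is hypothetical, and the sketch assumes the files contain <nomination> elements carrying id and horse attributes.

```python
import xml.etree.ElementTree as ET

# Defined once, outside the loop (see the first comment in the script above).
ATTRS = ("id", "horse")


def collect_nominations(paths):
    """Return one [id, horse] row per <nomination> element across all files.

    Hypothetical helper: ElementTree stands in for pyquery here.
    """
    rows = []  # accumulated across files instead of overwritten per file
    for path in paths:
        try:
            tree = ET.parse(path)
        except (OSError, ET.ParseError) as e:
            # Covers a missing file (FileNotFoundError is an OSError)
            # as well as a file that is not well-formed XML.
            print(e)
            continue
        for node in tree.iter("nomination"):
            # node.get() returns None when the attribute is absent
            rows.append([node.get(name) for name in ATTRS])
    return rows
```

Calling collect_nominations(sys.argv[1:]) would then give a single list of rows that could be handed to pandas in one go, if that is what the data variable was meant for.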

I think this is the most natural approach if you are willing to accept the
quirk that the script tries to process the file 'samples/*.xml' if the
samples directory doesn't contain any files with the .xml suffix. Common
shell tools work that way:

$ ls samples/*.xml
samples/1.xml samples/2.xml samples/3.xml
$ ls samples/*.XML
ls: cannot access samples/*.XML: No such file or directory

Unrelated: instead of working with sys.argv directly, you could use argparse,
which is part of the standard library. The code to get at least one file is:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("files", nargs="+")
args = parser.parse_args()
print(args.files)

Note that this doesn't fix the shell expansion oddity.
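
If the oddity bothers you, one possible workaround is to run every argument through the glob module inside the script as well. The sketch below is only an illustration under that assumption. The helper names expand_args and parse_files are made up for this example, and dropping arguments that match nothing is a design choice of the sketch, not something argparse does for you.

```python
import argparse
import glob
import sys


def expand_args(patterns):
    """Expand each argument with glob.glob().

    Arguments that match no existing path (for example a pattern the shell
    passed through unexpanded) are reported on stderr and dropped.
    """
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            print("no match for {!r}".format(pattern), file=sys.stderr)
        files.extend(matches)
    return files


def parse_files(argv=None):
    """Parse the command line (sys.argv[1:] when argv is None)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="+")
    args = parser.parse_args(argv)
    return expand_args(args.files)
```

With this, parse_files(["samples/*.xml"]) returns the matching paths whether or not the shell has already expanded the pattern, and an empty list when nothing matches. A name the shell already expanded still passes through, because glob.glob() on a plain existing path returns just that path.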