Groups > comp.lang.python > #91860 > unrolled thread
| Started by | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| First post | 2015-06-02 18:13 +0200 |
| Last post | 2015-06-07 22:13 +1000 |
| Articles | 20 — 10 participants |
Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-02 18:13 +0200
Re: Find in ipython3 Cameron Simpson <cs@zip.com.au> - 2015-06-04 12:54 +1000
Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-04 07:09 +0200
Re: Find in ipython3 Cameron Simpson <cs@zip.com.au> - 2015-06-04 15:43 +1000
Re: Find in ipython3 Grant Edwards <invalid@invalid.invalid> - 2015-06-04 14:27 +0000
Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-04 17:12 +0200
Re: Find in ipython3 Michael Torrie <torriem@gmail.com> - 2015-06-04 13:11 -0600
Re: Find in ipython3 Michael Torrie <torriem@gmail.com> - 2015-06-04 13:09 -0600
Re: Find in ipython3 Tim Chase <python.list@tim.thechases.com> - 2015-06-04 14:17 -0500
Re: Find in ipython3 random832@fastmail.us - 2015-06-04 16:13 -0400
Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-05 09:17 +0200
Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-06 11:57 +0200
Re: Find in ipython3 Laura Creighton <lac@openend.se> - 2015-06-06 13:07 +0200
Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-07 08:20 +0200
Re: Find in ipython3 Cameron Simpson <cs@zip.com.au> - 2015-06-07 17:38 +1000
Re: Find in ipython3 Laura Creighton <lac@openend.se> - 2015-06-07 11:33 +0200
Re: Find in ipython3 Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:16 +1000
Re: Find in ipython3 Peter Otten <__peter__@web.de> - 2015-06-07 12:27 +0200
Re: Find in ipython3 Laura Creighton <lac@openend.se> - 2015-06-07 15:01 +0200
Re: Find in ipython3 Chris Angelico <rosuav@gmail.com> - 2015-06-07 22:13 +1000
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-02 18:13 +0200 |
| Subject | Find in ipython3 |
| Message-ID | <87y4k2hyvf.fsf@Equus.decebal.nl> |
I am thinking about using ipython3 instead of bash. When I want to
find a file I can do the following:
!find ~ -iname '*python*.pdf'
but is there a python way?
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2015-06-04 12:54 +1000 |
| Message-ID | <mailman.142.1433388453.13271.python-list@python.org> |
| In reply to | #91860 |
On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>I am thinking about using ipython3 instead of bash. When I want to
>find a file I can do the following:
> !find ~ -iname '*python*.pdf'
>but is there a python way?
That succinct? Not out of the box, but something can easily be built on top of
the os.walk() function.
Cheers,
Cameron Simpson <cs@zip.com.au>
Perl combines all of the worst aspects of BASIC, C and line noise.
- Keith Packard <keithp@ncd.com>
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-04 07:09 +0200 |
| Message-ID | <87vbf4giu6.fsf@Equus.decebal.nl> |
| In reply to | #92022 |
On Thursday 4 Jun 2015 04:54 CEST, Cameron Simpson wrote:
> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> I am thinking about using ipython3 instead of bash. When I want to
>> find a file I can do the following:
>> !find ~ -iname '*python*.pdf'
>> but is there a python way?
>
> That succinct? Not out of the box, but something can easily be built
> on top of the os.walk() function.
OK, I will write it. And something to implement du.
Thanks.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2015-06-04 15:43 +1000 |
| Message-ID | <mailman.145.1433396629.13271.python-list@python.org> |
| In reply to | #92030 |
On 04Jun2015 07:09, Cecil Westerhof <Cecil@decebal.nl> wrote:
>On Thursday 4 Jun 2015 04:54 CEST, Cameron Simpson wrote:
>
>> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>>> I am thinking about using ipython3 instead of bash. When I want to
>>> find a file I can do the following:
>>> !find ~ -iname '*python*.pdf'
>>> but is there a python way?
>>
>> That succinct? Not out of the box, but something can easily be built
>> on top of the os.walk() function.
>
>OK, I will write it. And something to implement du.
>Thanks.
Have fun. For added points, do not forget that du knows about hardlinks
and only counts a file once. And that it counts the blocks field, not the
byte size (think: sparse files).
Cheers,
Cameron Simpson <cs@zip.com.au>
Thermochromic ink is the Pet Rock ink of the New Millennium.
- overheard by WIRED at the Intelligent Printing conference Oct2006
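A rough sketch of the two du details mentioned above, assuming a POSIX system where os.lstat() exposes st_blocks (it does not on Windows). The name disk_usage is made up for illustration; nobody in the thread posted this. Hard links are recognised by their (device, inode) pair, and the size comes from the allocated 512-byte blocks rather than st_size, so sparse files are not overcounted.

```python
import os

def disk_usage(root):
    """Return the disk space used below `root` in bytes, roughly like du."""
    seen = set()        # (device, inode) pairs that were already counted
    total_blocks = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Count the directory itself plus everything os.walk reports as a file.
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            try:
                st = os.lstat(path)      # lstat: do not follow symlinks
            except OSError:
                continue                 # vanished or unreadable, skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue                 # another hard link to a counted file
            seen.add(key)
            total_blocks += st.st_blocks # allocated blocks, 512 bytes each
    return total_blocks * 512
```

Something like disk_usage(os.path.expanduser('~')) would then give a figure comparable to what du reports for the same tree, though this sketch ignores options such as staying on one filesystem.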
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2015-06-04 14:27 +0000 |
| Message-ID | <mkpn9f$ghc$1@reader1.panix.com> |
| In reply to | #92030 |
On 2015-06-04, Cecil Westerhof <Cecil@decebal.nl> wrote:
> On Thursday 4 Jun 2015 04:54 CEST, Cameron Simpson wrote:
>
>> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>>> I am thinking about using ipython3 instead of bash. When I want to
>>> find a file I can do the following:
>>> !find ~ -iname '*python*.pdf'
>>> but is there a python way?
>>
>> That succinct? Not out of the box, but something can easily be built
>> on top of the os.walk() function.
>
> OK, I will write it. And something to implement du.
Can't IPython just call the find and du utilities?
--
Grant Edwards (grant.b.edwards at gmail.com)
Yow! Are we THERE yet?
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-04 17:12 +0200 |
| Message-ID | <87iob3h5id.fsf@Equus.decebal.nl> |
| In reply to | #92061 |
On Thursday 4 Jun 2015 16:27 CEST, Grant Edwards wrote:
> On 2015-06-04, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> On Thursday 4 Jun 2015 04:54 CEST, Cameron Simpson wrote:
>>
>>> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>>>> I am thinking about using ipython3 instead of bash. When I want
>>>> to find a file I can do the following: !find ~ -iname
>>>> '*python*.pdf' but is there a python way?
>>>
>>> That succinct? Not out of the box, but something can easily be
>>> built on top of the os.walk() function.
>>
>> OK, I will write it. And something to implement du.
>
> Can't IPython just call the find and du utilities?
That is what
!find ~ -iname '*python*.pdf'
does. But I do not find that aesthetically pleasing.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2015-06-04 13:11 -0600 |
| Message-ID | <mailman.174.1433445065.13271.python-list@python.org> |
| In reply to | #92065 |
On 06/04/2015 09:12 AM, Cecil Westerhof wrote:
>> Can't IPython just call the find and du utilities?
>
> That is what
> !find ~ -iname '*python*.pdf'
> does. But I do not find that aesthetically pleasing.
Like I said, I find ipython to be hackish, but invoking find this way is
no more hackish than writing 100 lines of python just to emulate it.
Using these existing external commands is more portable and more robust.
You could always write an ipython function that wraps the call to the
find command so you don't always have to see it.
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2015-06-04 13:09 -0600 |
| Message-ID | <mailman.173.1433444951.13271.python-list@python.org> |
| In reply to | #91860 |
On 06/02/2015 10:13 AM, Cecil Westerhof wrote:
> I am thinking about using ipython3 instead of bash. When I want to
> find a file I can do the following:
> !find ~ -iname '*python*.pdf'
> but is there a python way?
No more than there is a bash-native way of doing find. Bash scripts use a
myriad of external utility programs like find, cut, awk, sed, etc. to do
the heavy lifting. The same things work just fine in IPython. Why do you
need a "python way"?
Besides, Python isn't really a shell scripting language, and ipython seems
more like a hack to me. Bash, Zsh, etc. are all better suited to actual
shell scripting because their syntaxes integrate spawning external
processes and building pipes in a nice way (if you can call bash syntax
nice).
That's not to say Python can't be used for system programming. It
certainly can. And it can easily replace perl for fast text processing.
Generator expressions are the single most powerful tool in this.
Why not use Python for what it's good for and, say, pipe the results of
find into your python script? Reinventing find poorly isn't going to buy
you anything.
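One way the "pipe the results of find into your python script" idea could look, as a minimal sketch. It assumes find is run with -print0 (supported by GNU and BSD find) so that odd filenames survive the pipe; read_nul_paths is a hypothetical helper name, not something from the thread.

```python
import os

def read_nul_paths(stream):
    """Yield paths from a binary stream of NUL-terminated names,
    which is the format produced by find ... -print0."""
    data = stream.read()
    # Every name ends in a NUL byte, so the last piece after split() is empty.
    return (os.fsdecode(raw) for raw in data.split(b"\0") if raw)
```

A script using it would pass sys.stdin.buffer as the stream and be called as find ~ -iname '*python*.pdf' -print0 | python3 script.py; the generator expression keeps the processing lazy once the input has been read.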
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-06-04 14:17 -0500 |
| Message-ID | <mailman.176.1433445966.13271.python-list@python.org> |
| In reply to | #91860 |
On 2015-06-04 13:09, Michael Torrie wrote:
> Why not use Python for what it's good for and say pipe the results
> of find into your python script? Reinventing find poorly isn't
> going to buy you anything.
Until you port your app to Windows where find(1) is unavailable
natively ;-)
-tkc
| From | random832@fastmail.us |
|---|---|
| Date | 2015-06-04 16:13 -0400 |
| Message-ID | <mailman.177.1433448836.13271.python-list@python.org> |
| In reply to | #91860 |
On Tue, Jun 2, 2015, at 12:13, Cecil Westerhof wrote:
> I am thinking about using ipython3 instead of bash. When I want to
> find a file I can do the following:
> !find ~ -iname '*python*.pdf'
> but is there a python way?
Python really isn't a good substitute for a shell, but the normal python
way to do this task is:
import os, os.path, fnmatch
home = os.path.expanduser('~') # only needed since you used ~
for dirpath, dirnames, filenames in os.walk(home):
    print(dirpath)
    for filename in filenames:
        if(fnmatch.fnmatch(filename.lower(), '*python*.pdf')):
            print(os.path.join(dirpath, filename))
Note that if you have filenames with invalid unicode characters (or any
non-ASCII characters at all on Windows) you may have to do additional
processing to the filename before printing it. And of course instead of
printing it you may want to store the filenames in a list for further
processing. But these are the basic building blocks.
I don't use ipython, so I don't know what it provides if anything to
make any of this easier.
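For the "additional processing" before printing, one possible approach is sketched below. It assumes a POSIX system where undecodable bytes in filenames arrive as surrogate escapes, that the filesystem encoding is UTF-8, and Python 3.5 or later (for backslashreplace when decoding). The name printable is made up for this example.

```python
def printable(path):
    """Return `path` with undecodable bytes shown as \\xNN escapes,
    so that print() cannot fail on surrogate-escaped filenames."""
    raw = path.encode("utf-8", "surrogateescape")   # back to the original bytes
    return raw.decode("utf-8", "backslashreplace")  # invalid bytes become \xNN
```

In the loop above one would then write print(printable(os.path.join(dirpath, filename))). Valid non-ASCII names pass through unchanged; whether the terminal can display them is a separate matter.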
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-05 09:17 +0200 |
| Message-ID | <87bnguhbec.fsf@Equus.decebal.nl> |
| In reply to | #92090 |
On Thursday 4 Jun 2015 22:13 CEST, random wrote:
> On Tue, Jun 2, 2015, at 12:13, Cecil Westerhof wrote:
>> I am thinking about using ipython3 instead of bash. When I want to
>> find a file I can do the following:
>> !find ~ -iname '*python*.pdf'
>> but is there a python way?
>
> Python really isn't a good substitute for a shell, but the normal
> python way to do this task is:
>
> import os, os.path, fnmatch
>
> home = os.path.expanduser('~') # only needed since you used ~
> for dirpath, dirnames, filenames in os.walk(home):
>     print(dirpath)
>     for filename in filenames:
>         if(fnmatch.fnmatch(filename.lower(), '*python*.pdf')):
>             print(os.path.join(dirpath, filename))
I was already thinking along those lines. I made it:
import os
from fnmatch import fnmatch
from os.path import expanduser

def find(directory, to_match):
    to_match = to_match.lower()
    results = []
    for dirpath, dirnames, filenames in os.walk(expanduser(directory)):
        for filename in filenames:
            if(fnmatch(filename.lower(), to_match)):
                results.append(os.path.join(dirpath, filename))
    return results
> Note that if you have filenames with invalid unicode characters (or
> any non-ASCII characters at all on Windows) you may have to do
> additional processing to the filename before printing it. And of
> course instead of printing it you may want to store the filenames in
> a list for further processing. But these are the basic building
> blocks.
I have to look into it further. For one thing, by default the match
should be case sensitive, with an option to make it case insensitive.
> I don't use ipython, so I don't know what it provides if anything to
> make any of this easier.
I think it is useful to have it in Python also, so I should not use
ipython specific things.
In ‘~/.ipython/profile_default/startup/00-init.ipy’ I have:
from utilDecebal import find
and now ‘find('~', '*Python*.pdf')’ gives what I want.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-06 11:57 +0200 |
| Message-ID | <874mmlqhul.fsf@Equus.decebal.nl> |
| In reply to | #92127 |
On Friday 5 Jun 2015 09:17 CEST, Cecil Westerhof wrote:
> I was already thinking along those lines. I made it:
> def find(directory, to_match):
>     to_match = to_match.lower()
>     results = []
>     for dirpath, dirnames, filenames in os.walk(expanduser(directory)):
>         for filename in filenames:
>             if(fnmatch(filename.lower(), to_match)):
>                 results.append(os.path.join(dirpath, filename))
>     return results
I have a slightly better variant:
def find(directory, to_match, ignore_case = False):
    to_match = to_match + r'$'
    if ignore_case:
        p = re.compile(to_match, re.IGNORECASE)
    else:
        p = re.compile(to_match)
    results = []
    for dirpath, dirnames, filenames in os.walk(expanduser(directory)):
        for filename in filenames:
            if p.match(filename):
                results.append(os.path.join(dirpath, filename))
    return results
By default it is now case sensitive. But I now use regular expressions
instead of shell-style patterns. That is a lot more efficient. The old
version took 4.4 seconds and this version takes 2.4 seconds. But the
‘!find’ version takes about half a second. Why is this version so much
less efficient?
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-06-06 13:07 +0200 |
| Message-ID | <mailman.215.1433588892.13271.python-list@python.org> |
| In reply to | #92178 |
The !find version is C code optimised to do one thing, find files in
your directory structure, which happens to be what you want to do.
General regular expression matching is harder.
Carl Friedrich Bolz investigated regular expression algorithms and
their implementation to see if this is the sort of task that a JIT
can improve. He blogged about it in 2 posts (part1 and part2). There
are benchmarks for part2. Benchmarks in part2.
see:
http://morepypy.blogspot.se/2010/05/efficient-and-elegant-regular.html
http://morepypy.blogspot.se/2010/06/jit-for-regular-expression-matching.html
You may get faster results if you use Matthew Barnett's replacement
for re here: https://pypi.python.org/pypi/regex
You will get faster results if you build your IPython shell to use
PyPy, but I would still be very surprised if it beat the C program
find.
Laura
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-06-07 08:20 +0200 |
| Message-ID | <87sia4ox8h.fsf@Equus.decebal.nl> |
| In reply to | #92180 |
On Saturday 6 Jun 2015 13:07 CEST, Laura Creighton wrote:
> The !find version is C code optimised to do one thing, find files in
> your directory structure, which happens to be what you want to do.
> General regular expression matching is harder.
>
> Carl Friedrich Bolz investigated regular expression algorithms and
> their implementation to see if this is the sort of task that a JIT
> can improve. He blogged about it in 2 posts (part1 and part2). There
> are benchmarks for part2. Benchmarks in part2.
>
> see:
> http://morepypy.blogspot.se/2010/05/efficient-and-elegant-regular.html
> http://morepypy.blogspot.se/2010/06/jit-for-regular-expression-matching.html
>
> You may get faster results if you use Matthew Barnett's replacement
> for re here: https://pypi.python.org/pypi/regex
>
> You will get faster results if you build your IPython shell to use
> PyPy, but I would still be very surprised if it beat the C program
> find.
I have to look into that. But I prefer to write a version that can be
used by ‘everyone’.
It is of course not a very big program. The difference is significant,
but I do not use find that much. And if it is significant I still can
use the shell version.
There is no gain to get in standard Python? By switching from fnmatch
to re I got almost a speed gain of two. So I was wondering if I could
do more.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2015-06-07 17:38 +1000 |
| Message-ID | <mailman.232.1433664199.13271.python-list@python.org> |
| In reply to | #92206 |
On 07Jun2015 08:20, Cecil Westerhof <Cecil@decebal.nl> wrote:
>There is no gain to get in standard Python? By switching from fnmatch
>to re I got almost a speed gain of two. So I was wondering if I could
>do more.
Maybe write a few versions: one really dumb using filename == matchstring
(like -name foo), one using filename.endswith(matchstring) (like -name
'*foo'), one using matchstring in filename (like -name '*foo*') and your
current one. See how much these affect the runtime.
You _will_ need to make multiple identical runs (OSes do lots of caching,
and other processes competing for I/O or the CPU will also perturb
things).
On the topic of multiple runs, have a look at the timeit module,
specifically designed for testing code with multiple runs.
Cheers,
Cameron Simpson <cs@zip.com.au>
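A small sketch of that experiment using timeit. To keep it self-contained it times only the matching step on a made-up list of names, not the directory walk, so it says nothing about disk or cache effects; the names list and the matchers dictionary are assumptions for illustration.

```python
import re
import timeit

# Made-up sample data standing in for the filenames os.walk() would yield.
names = ["Python-intro.pdf", "notes.txt", "python.pdf", "foo", "barfoo"] * 200

regex = re.compile(r".*python.*\.pdf\Z", re.IGNORECASE)

matchers = {
    "equal":      lambda n: n == "foo",                 # like -name foo
    "endswith":   lambda n: n.endswith("foo"),          # like -name '*foo'
    "startswith": lambda n: n.startswith("foo"),        # like -name 'foo*'
    "contains":   lambda n: "foo" in n,                 # like -name '*foo*'
    "regex":      lambda n: regex.match(n) is not None, # like -iname '*python*.pdf'
}

def count(matcher):
    """Count how many sample names the matcher accepts."""
    return sum(1 for n in names if matcher(n))

for label, matcher in matchers.items():
    # Several repeats; the minimum is the run least disturbed by other load.
    best = min(timeit.repeat(lambda: count(matcher), number=20, repeat=3))
    print("{:10s} {:.4f}s".format(label, best))
```

If the simple string tests and the regular expression come out close together here, the remaining time in the real find() is probably spent walking the directory tree rather than matching names.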
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-06-07 11:33 +0200 |
| Message-ID | <mailman.233.1433669631.13271.python-list@python.org> |
| In reply to | #92206 |
In a message of Sun, 07 Jun 2015 08:20:46 +0200, Cecil Westerhof writes:
>> You may get faster results if you use Matthew Barnett's replacement
>> for re here: https://pypi.python.org/pypi/regex
>>
>> You will get faster results if you build your IPython shell to use
>> PyPy, but I would still be very surprised if it beat the C program
>> find.
>
>I have to look into that. But I prefer to write a version that can be
>used by ‘everyone’.
Well, everybody can download Matthew Barnett's regex, and get pypy,
and the claim is that ipython just works with pypy, and if it ever doesn't
the ipython team wants a bug report, so I am not sure what you mean by
'used by "everyone"' here that these don't have.
>There is no gain to get in standard Python? By switching from fnmatch
>to re I got almost a speed gain of two. So I was wondering if I could
>do more.
That's because speeding up regular expression matching is something
that people have put a significant amount of effort in, but under the
hood, so to speak. Other modules are likely to be slower, as people
haven't gone to so much effort to make them fast.
You have reached the point where, if your python code is too slow, and
you don't want to use PyPy, people go grab Cython (or Boost or Swig, but
Cython is a whole lot easier and more fun to use) and make themselves
a C extension.
Which you can, of course, do as part of your voyage of discovery. But I
am still betting that it won't perform as well as the linux utility
find. :)
Laura
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-06-07 23:16 +1000 |
| Message-ID | <55744426$0$12986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #92218 |
On Sun, 7 Jun 2015 07:33 pm, Laura Creighton wrote:
> In a message of Sun, 07 Jun 2015 08:20:46 +0200, Cecil Westerhof writes:
>>> You may get faster results if you use Matthew Barnett's replacement
>>> for re here: https://pypi.python.org/pypi/regex
>>>
>>> You will get faster results if you build your IPython shell to use
>>> PyPy, but I would still be very surprised if it beat the C program
>>> find.
>>
>>I have to look into that. But I prefer to write a version that can be
>>used by ‘everyone’.
>
> Well, everybody can download Matthew Barnett's regex, and get pypy,
> and the claim is that ipython just works with pypy, and if it ever doesn't
> the ipython team wants a bug report, so I am not sure what you mean by
> 'used by "everyone"' here that these don't have.
Pedantically speaking, not everyone has Internet access, or even
electricity, but you know that :-)
More to the point, I think what Cecil might be trying to say is that he
wants to rely only on the standard library. Many people in schools, or
corporate environments, do not control what is on their computer and cannot
just download regex and pypy (not if they want to keep their job). There
may be processes to follow, forms to fill out, paperwork to file, approval
to be gained, and sometimes in a culture which is perplexed by, if not
actively hostile to, FOSS culture. ("What do you mean it's free? You mean
its pirated?")
For many of those people, they've already gone through the process to get
Python and the standard library approved. It might be part of the school's
SOE, or already installed on the corporate server, for example, so they can
use the standard library easily enough, but nothing else.
--
Steven
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-06-07 12:27 +0200 |
| Message-ID | <mailman.236.1433672873.13271.python-list@python.org> |
| In reply to | #92206 |
Cecil Westerhof wrote:
> On Saturday 6 Jun 2015 13:07 CEST, Laura Creighton wrote:
>
>> The !find version is C code optimised to do one thing, find files in
>> your directory structure, which happens to be what you want to do.
>> General regular expression matching is harder.
>>
>> Carl Friedrich Bolz investigated regular expression algorithms and
>> their implementation to see if this is the sort of task that a JIT
>> can improve. He blogged about it in 2 posts (part1 and part2). There
>> are benchmarks for part2. Benchmarks in part2.
>>
>> see:
>> http://morepypy.blogspot.se/2010/05/efficient-and-elegant-regular.html
>> http://morepypy.blogspot.se/2010/06/jit-for-regular-expression-matching.html
>>
>> You may get faster results if you use Matthew Barnett's replacement
>> for re here: https://pypi.python.org/pypi/regex
>>
>> You will get faster results if you build your IPython shell to use
>> PyPy, but I would still be very surprised if it beat the C program
>> find.
>
> I have to look into that. But I prefer to write a version that can be
> used by ‘everyone’.
>
> It is of-course not a very big program. The difference is significant,
> but I do not use find that much. And if it is significant I still can
> use the shell version.
>
> There is no gain to get in standard Python? By switching from fnmatch
> to re I got almost a speed gain of two. So I was wondering if I could
> do more.
Just wait for Python 3.5. The switch from os.listdir() to the (new)
os.scandir() in the implementation of os.walk() is likely to improve the
situation:
$ cat findfiles.py
import fnmatch
import os
import re
import subprocess
import time

def find_re(root, pattern, ignore_case=False):
    match = re.compile(
        fnmatch.translate(pattern),
        re.IGNORECASE if ignore_case else 0).match
    results = []
    for path, _folders, files in os.walk(root):
        for filename in files:
            if match(filename):
                results.append(os.path.join(path, filename))
    return results

def find_sp(
        root, pattern, ignore_case=False,
        encoding="utf-8", errors="surrogateescape"):
    name_opt = "-iname" if ignore_case else "-name"
    matches = subprocess.Popen(
        ["find", root, name_opt, pattern, "-print0"], stdout=subprocess.PIPE
    ).communicate()[0].decode(encoding, errors=errors).split("\0")
    assert len(matches[-1]) == 0
    del matches[-1]
    return matches

def measure(f, *args):
    start = time.time()
    try:
        return f(*args)
    finally:
        end = time.time()
        print("{}{}".format(f.__name__, args), end - start)

def main():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("root")
    parser.add_argument("pattern")
    parser.add_argument("-i", "--ignore-case", action="store_true")
    args = parser.parse_args()
    a = measure(find_re, args.root, args.pattern, args.ignore_case)
    b = measure(find_sp, args.root, args.pattern, args.ignore_case)
    measure(find_re, args.root, args.pattern, args.ignore_case)
    assert sorted(a) == sorted(b)
    print(len(a), "matches")

if __name__ == "__main__":
    main()
$ python3.4 findfiles.py . '*a*.PY' -i
find_re('.', '*a*.PY', True) 0.14614605903625488
find_sp('.', '*a*.PY', True) 0.043445587158203125
find_re('.', '*a*.PY', True) 0.16485309600830078
1454 matches
$ python3.5 findfiles.py . '*a*.PY' -i
find_re('.', '*a*.PY', True) 0.07263660430908203
find_sp('.', '*a*.PY', True) 0.04418468475341797
find_re('.', '*a*.PY', True) 0.07320952415466309
1454 matches
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-06-07 15:01 +0200 |
| Message-ID | <mailman.244.1433682115.13271.python-list@python.org> |
| In reply to | #92206 |
In a message of Sun, 07 Jun 2015 12:27:05 +0200, Peter Otten writes:
>> There is no gain to get in standard Python? By switching from fnmatch
>> to re I got almost a speed gain of two. So I was wondering if I could
>> do more.
>
>Just wait for Python 3.5. The switch from os.listdir() to the (new)
>os.scandir() in the implementation of os.walk() is likely to improve the
>situation:
Oh cool. I had no idea. Thank you for showing us this.
Laura
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-06-07 22:13 +1000 |
| Message-ID | <mailman.248.1433684927.13271.python-list@python.org> |
| In reply to | #92206 |
On Sun, Jun 7, 2015 at 8:27 PM, Peter Otten <__peter__@web.de> wrote:
> Just wait for Python 3.5. The switch from os.listdir() to the (new)
> os.scandir() in the implementation of os.walk() is likely to improve the
> situation
Why wait? I've been using 3.5 for ages (and actually, my
/usr/local/bin/python3 now announces itself as 3.6), and the betas
have been available with all the regular installers. Python betas are
pretty stable, and apart from a few glitches with the installers on
Windows, I haven't heard of any showstopper bugs. Aside from not depending
on them for your nuclear power plant safety systems, there's not a lot
that the betas can't be used for.
Mind you, a lot of the benefit of os.scandir() comes from its
behaviour across network mounts and such, which is why you're seeing
no more than about a 2:1 difference here. From what I gather, the
improvement across network can be simply amazing (because code built on
os.listdir has to stat every entry separately to tell files from
directories).
ChrisA
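For anyone who wants to see where the saving comes from, here is a minimal sketch that uses os.scandir() directly instead of going through os.walk(). It assumes Python 3.6 or later (os.scandir() as a context manager); find_scandir is a made-up name, and unlike os.walk() this version also reports matching symlinks that point to directories.

```python
import fnmatch
import os
import re

def find_scandir(root, pattern, ignore_case=False):
    """Return paths below `root` whose names match the shell pattern."""
    match = re.compile(fnmatch.translate(pattern),
                       re.IGNORECASE if ignore_case else 0).match
    results = []
    stack = [root]                   # directories still to be scanned
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # The entry type usually comes with the directory
                    # listing itself, so no extra stat() call is needed.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif match(entry.name):
                        results.append(entry.path)
        except OSError:
            continue                 # unreadable directory, skip it
    return results
```

Calling find_scandir(os.path.expanduser('~'), '*python*.pdf', ignore_case=True) should give the same files as the earlier versions, in a different order.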