

Groups > comp.lang.python > #91860 > unrolled thread

Find in ipython3

Started by: Cecil Westerhof <Cecil@decebal.nl>
First post: 2015-06-02 18:13 +0200
Last post: 2015-06-07 22:13 +1000
Articles: 20 (10 participants)



Contents

  Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-02 18:13 +0200
    Re: Find in ipython3 Cameron Simpson <cs@zip.com.au> - 2015-06-04 12:54 +1000
      Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-04 07:09 +0200
        Re: Find in ipython3 Cameron Simpson <cs@zip.com.au> - 2015-06-04 15:43 +1000
        Re: Find in ipython3 Grant Edwards <invalid@invalid.invalid> - 2015-06-04 14:27 +0000
          Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-04 17:12 +0200
            Re: Find in ipython3 Michael Torrie <torriem@gmail.com> - 2015-06-04 13:11 -0600
    Re: Find in ipython3 Michael Torrie <torriem@gmail.com> - 2015-06-04 13:09 -0600
    Re: Find in ipython3 Tim Chase <python.list@tim.thechases.com> - 2015-06-04 14:17 -0500
    Re: Find in ipython3 random832@fastmail.us - 2015-06-04 16:13 -0400
      Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-05 09:17 +0200
        Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-06 11:57 +0200
          Re: Find in ipython3 Laura Creighton <lac@openend.se> - 2015-06-06 13:07 +0200
            Re: Find in ipython3 Cecil Westerhof <Cecil@decebal.nl> - 2015-06-07 08:20 +0200
              Re: Find in ipython3 Cameron Simpson <cs@zip.com.au> - 2015-06-07 17:38 +1000
              Re: Find in ipython3 Laura Creighton <lac@openend.se> - 2015-06-07 11:33 +0200
                Re: Find in ipython3 Steven D'Aprano <steve@pearwood.info> - 2015-06-07 23:16 +1000
              Re: Find in ipython3 Peter Otten <__peter__@web.de> - 2015-06-07 12:27 +0200
              Re: Find in ipython3 Laura Creighton <lac@openend.se> - 2015-06-07 15:01 +0200
              Re: Find in ipython3 Chris Angelico <rosuav@gmail.com> - 2015-06-07 22:13 +1000

#91860 — Find in ipython3

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-06-02 18:13 +0200
Subject: Find in ipython3
Message-ID: <87y4k2hyvf.fsf@Equus.decebal.nl>
I am thinking about using ipython3 instead of bash. When I want to
find a file I can do the following:
    !find ~ -iname '*python*.pdf'
but is there a python way?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



#92022

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-06-04 12:54 +1000
Message-ID: <mailman.142.1433388453.13271.python-list@python.org>
In reply to: #91860
On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>I am thinking about using ipython3 instead of bash. When I want to
>find a file I can do the following:
>    !find ~ -iname '*python*.pdf'
>but is there a python way?

That succinct? Not out of the box, but something can easily be built on top of 
the os.walk() function.

Cheers,
Cameron Simpson <cs@zip.com.au>

Perl combines all of the worst aspects of BASIC, C and line noise.
        - Keith Packard <keithp@ncd.com>



#92030

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-06-04 07:09 +0200
Message-ID: <87vbf4giu6.fsf@Equus.decebal.nl>
In reply to: #92022
Op Thursday 4 Jun 2015 04:54 CEST schreef Cameron Simpson:

> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> I am thinking about using ipython3 instead of bash. When I want to
>> find a file I can do the following:
>> !find ~ -iname '*python*.pdf'
>> but is there a python way?
>
> That succinct? Not out of the box, but something can easily be built
> on top of the os.walk() function.

OK, I will write it. And something to implement du.

Thanks.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



#92032

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-06-04 15:43 +1000
Message-ID: <mailman.145.1433396629.13271.python-list@python.org>
In reply to: #92030
On 04Jun2015 07:09, Cecil Westerhof <Cecil@decebal.nl> wrote:
>Op Thursday 4 Jun 2015 04:54 CEST schreef Cameron Simpson:
>
>> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>>> I am thinking about using ipython3 instead of bash. When I want to
>>> find a file I can do the following:
>>> !find ~ -iname '*python*.pdf'
>>> but is there a python way?
>>
>> That succinct? Not out of the box, but something can easily be built
>> on top of the os.walk() function.
>
>OK, I will write it. And something to implement du.
>Thanks.

Have fun.

For added points, do not forget that du knows about hardlinks and only counts a 
file once.  And that it counts the blocks field, not the byte size (think: 
sparse files).
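
The two points above can be sketched as follows. This is a minimal illustration, not a full du: the name disk_usage is hypothetical, it assumes a POSIX system where st_blocks is reported in 512-byte units, and unlike the real du it does not count the blocks used by the directories themselves.

```python
import os


def disk_usage(root):
    """Return the bytes allocated to non-directory entries under root."""
    seen = set()  # (device, inode) pairs that were already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            # Allocated blocks, not st_size, so sparse files are not
            # over-counted. Assumption: 512-byte units (POSIX).
            total += st.st_blocks * 512
    return total
```

Calling disk_usage(os.path.expanduser('~')) would then give a rough counterpart of `du -s ~`.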

Cheers,
Cameron Simpson <cs@zip.com.au>

Thermochromic ink is the Pet Rock ink of the New Millennium.
- overheard by WIRED at the Intelligent Printing conference Oct2006



#92061

From: Grant Edwards <invalid@invalid.invalid>
Date: 2015-06-04 14:27 +0000
Message-ID: <mkpn9f$ghc$1@reader1.panix.com>
In reply to: #92030
On 2015-06-04, Cecil Westerhof <Cecil@decebal.nl> wrote:
> Op Thursday 4 Jun 2015 04:54 CEST schreef Cameron Simpson:
>
>> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>>> I am thinking about using ipython3 instead of bash. When I want to
>>> find a file I can do the following:
>>> !find ~ -iname '*python*.pdf'
>>> but is there a python way?
>>
>> That succinct? Not out of the box, but something can easily be built
>> on top of the os.walk() function.
>
> OK, I will write it. And something to implement du.

Can't IPython just call the find and du utilities?

-- 
Grant Edwards               grant.b.edwards        Yow! Are we THERE yet?
                                  at               
                              gmail.com            



#92065

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-06-04 17:12 +0200
Message-ID: <87iob3h5id.fsf@Equus.decebal.nl>
In reply to: #92061
Op Thursday 4 Jun 2015 16:27 CEST schreef Grant Edwards:

> On 2015-06-04, Cecil Westerhof <Cecil@decebal.nl> wrote:
>> Op Thursday 4 Jun 2015 04:54 CEST schreef Cameron Simpson:
>>
>>> On 02Jun2015 18:13, Cecil Westerhof <Cecil@decebal.nl> wrote:
>>>> I am thinking about using ipython3 instead of bash. When I want to
>>>> find a file I can do the following:
>>>>     !find ~ -iname '*python*.pdf'
>>>> but is there a python way?
>>>
>>> That succinct? Not out of the box, but something can easily be
>>> built on top of the os.walk() function.
>>
>> OK, I will write it. And something to implement du.
>
> Can't IPython just call the find and du utilities?

That is what
    !find ~ -iname '*python*.pdf'
does. But I do not find that aesthetically pleasing.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



#92087

From: Michael Torrie <torriem@gmail.com>
Date: 2015-06-04 13:11 -0600
Message-ID: <mailman.174.1433445065.13271.python-list@python.org>
In reply to: #92065
On 06/04/2015 09:12 AM, Cecil Westerhof wrote:
>> Can't IPython just call the find and du utilities?
> 
> That is what
>     !find ~ -iname '*python*.pdf'
> does. But I do not find that aesthetically pleasing.

Like I said, I find ipython to be hackish, but invoking find this way is
no more hackish than writing 100 lines of Python just to emulate it.
Using these existing external commands is more portable and more robust.

You could always write an ipython function that wraps the call to the
find command so you don't always have to see it.
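
Such a wrapper might look like the sketch below. It assumes a find on the PATH that supports -iname and -print0 (GNU and BSD find do); the function name and its parameters are only illustrative.

```python
import os
import subprocess


def find(directory, pattern, ignore_case=False):
    """Run the external find(1) and return the matching paths as a list."""
    name_opt = "-iname" if ignore_case else "-name"
    completed = subprocess.run(
        ["find", os.path.expanduser(directory), name_opt, pattern, "-print0"],
        stdout=subprocess.PIPE,
        # find exits non-zero when it hits unreadable directories,
        # but still prints the matches it found, so do not raise.
        check=False,
    )
    # NUL-separated output survives file names containing newlines.
    return [os.fsdecode(p) for p in completed.stdout.split(b"\0") if p]
```

With this in a startup file, find('~', '*python*.pdf', ignore_case=True) hides the shell escape while still letting find do the work.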



#92086

From: Michael Torrie <torriem@gmail.com>
Date: 2015-06-04 13:09 -0600
Message-ID: <mailman.173.1433444951.13271.python-list@python.org>
In reply to: #91860
On 06/02/2015 10:13 AM, Cecil Westerhof wrote:
> I am thinking about using ipython3 instead of bash. When I want to
> find a file I can do the following:
>     !find ~ -iname '*python*.pdf'
> but is there a python way?

No more than there is a bash-native way of doing find.  Bash scripts use
a myriad of external utility programs like find, cut, awk, sed, etc. to
do the heavy lifting.  The same things work just fine in IPython.  Why do
you need a "python way"?  Besides, Python isn't really a shell scripting
language, and ipython seems more like a hack to me.  Bash, Zsh, etc. are
all better suited to actual shell scripting because their syntaxes
integrate spawning external processes and building pipes in a nice
way (if you can call bash syntax nice).  That's not to say Python can't
be used for system programming.  It certainly can.  And it can easily
replace perl for fast text processing.  Generator expressions are the
single most powerful tool for this.

Why not use Python for what it's good for and say pipe the results of
find into your python script?  Reinventing find poorly isn't going to
buy you anything.
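
As a rough sketch of that division of labour, the functions below read NUL-separated paths (as produced by `find ... -print0`) and filter them lazily with generator expressions. The function names are made up for illustration.

```python
import os
import sys


def read_paths(stream, sep="\0"):
    """Yield the non-empty paths from a separator-delimited stream."""
    return (p for p in stream.read().split(sep) if p)


def pdfs_mentioning(paths, word):
    """Lazily keep the PDFs whose file name contains word (any case)."""
    word = word.lower()
    return (
        p for p in paths
        if p.lower().endswith(".pdf")
        and word in os.path.basename(p).lower()
    )

# Intended use, e.g.:  find ~ -type f -print0 | python3 filter_pdfs.py
#   for path in pdfs_mentioning(read_paths(sys.stdin), "python"):
#       print(path)
```

Nothing is matched until the final loop pulls items through the chain, which is what makes generator expressions pleasant for this kind of pipeline.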



#92089

From: Tim Chase <python.list@tim.thechases.com>
Date: 2015-06-04 14:17 -0500
Message-ID: <mailman.176.1433445966.13271.python-list@python.org>
In reply to: #91860
On 2015-06-04 13:09, Michael Torrie wrote:
> Why not use Python for what it's good for and say pipe the results
> of find into your python script?  Reinventing find poorly isn't
> going to buy you anything.

Until you port your app to Windows where find(1) is unavailable
natively ;-)

-tkc




#92090

From: random832@fastmail.us
Date: 2015-06-04 16:13 -0400
Message-ID: <mailman.177.1433448836.13271.python-list@python.org>
In reply to: #91860
On Tue, Jun 2, 2015, at 12:13, Cecil Westerhof wrote:
> I am thinking about using ipython3 instead of bash. When I want to
> find a file I can do the following:
>     !find ~ -iname '*python*.pdf'
> but is there a python way?

Python really isn't a good substitute for a shell, but the normal python
way to do this task is:

import os, os.path, fnmatch

home = os.path.expanduser('~')  # only needed since you used ~
for dirpath, dirnames, filenames in os.walk(home):
    print(dirpath)
    for filename in filenames:
        if(fnmatch.fnmatch(filename.lower(), '*python*.pdf')):
            print(os.path.join(dirpath, filename))

Note that if you have filenames with invalid unicode characters (or any
non-ASCII characters at all on Windows) you may have to do additional
processing to the filename before printing it. And of course instead of
printing it you may want to store the filenames in a list for further
processing. But these are the basic building blocks.
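
One possible form of that additional processing, as a sketch: on POSIX, os.walk() returns undecodable bytes as lone surrogates (the "surrogateescape" error handler), and printing those can raise UnicodeEncodeError. The helper below, whose name is hypothetical, assumes a UTF-8 filesystem encoding and replaces such bytes with U+FFFD for display only.

```python
def printable(name):
    """Return name with undecodable bytes shown as U+FFFD."""
    # Turn surrogate-escaped characters back into the original bytes...
    raw = name.encode("utf-8", "surrogateescape")
    # ...then decode again, substituting a placeholder for bad bytes.
    return raw.decode("utf-8", "replace")
```

The original string should still be used for opening the file; only the printed form is sanitised.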

I don't use ipython, so I don't know what it provides if anything to
make any of this easier.



#92127

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-06-05 09:17 +0200
Message-ID: <87bnguhbec.fsf@Equus.decebal.nl>
In reply to: #92090
Op Thursday 4 Jun 2015 22:13 CEST schreef random:

> On Tue, Jun 2, 2015, at 12:13, Cecil Westerhof wrote:
>> I am thinking about using ipython3 instead of bash. When I want to
>> find a file I can do the following:
>> !find ~ -iname '*python*.pdf'
>> but is there a python way?
>
> Python really isn't a good substitute for a shell, but the normal
> python way to do this task is:
>
> import os, os.path, fnmatch
>
> home = os.path.expanduser('~')  # only needed since you used ~
> for dirpath, dirnames, filenames in os.walk(home):
>     print(dirpath)
>     for filename in filenames:
>         if(fnmatch.fnmatch(filename.lower(), '*python*.pdf')):
>             print(os.path.join(dirpath, filename))

I was already thinking along those lines. I made it:
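    import os
    from fnmatch import fnmatch
    from os.path import expanduser

    def find(directory, to_match):
        """Return paths under directory whose file name matches the
        glob to_match, ignoring case."""
        to_match = to_match.lower()
        results = []
        for dirpath, dirnames, filenames in os.walk(expanduser(directory)):
            for filename in filenames:
                if fnmatch(filename.lower(), to_match):
                    results.append(os.path.join(dirpath, filename))
        return results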

> Note that if you have filenames with invalid unicode characters (or
> any non-ASCII characters at all on Windows) you may have to do
> additional processing to the filename before printing it. And of
> course instead of printing it you may want to store the filenames in
> a list for further processing. But these are the basic building
> blocks.

I have to look into it further. For one thing, the match should be
case-sensitive by default, with an option to make it case-insensitive.
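
A glob-based version with that option could look like the sketch below. It uses fnmatch.fnmatchcase(), which never normalises case, whereas fnmatch.fnmatch() applies os.path.normcase() and is therefore case-insensitive on Windows. The ignore_case parameter name is an assumption.

```python
import fnmatch
import os


def find(directory, pattern, ignore_case=False):
    """Return paths under directory whose file name matches the glob."""
    if ignore_case:
        pattern = pattern.lower()
    results = []
    for dirpath, _dirnames, filenames in os.walk(os.path.expanduser(directory)):
        for filename in filenames:
            candidate = filename.lower() if ignore_case else filename
            # fnmatchcase compares exactly as given on every platform
            if fnmatch.fnmatchcase(candidate, pattern):
                results.append(os.path.join(dirpath, filename))
    return results
```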


> I don't use ipython, so I don't know what it provides if anything to
> make any of this easier.

I think it is useful to have it in Python also, so I should not use
ipython specific things.

In ‘~/.ipython/profile_default/startup/00-init.ipy’ I have:
    from utilDecebal import find

and now ‘find('~', '*Python*.pdf')’ gives what I want.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



#92178

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-06-06 11:57 +0200
Message-ID: <874mmlqhul.fsf@Equus.decebal.nl>
In reply to: #92127
On Friday  5 Jun 2015 09:17 CEST, Cecil Westerhof wrote:

> I was already thinking along those lines. I made it:
>     def find(directory, to_match):
>         to_match = to_match.lower()
>         results = []
>         for dirpath, dirnames, filenames in os.walk(expanduser(directory)):
>             for filename in filenames:
>                 if(fnmatch(filename.lower(), to_match)):
>                     results.append(os.path.join(dirpath, filename))
>         return results

I have a slightly better variant:
    import os
    import re
    from os.path import expanduser

    def find(directory, to_match, ignore_case = False):
        # to_match is now a regular expression, not a shell-style glob
        flags = re.IGNORECASE if ignore_case else 0
        p = re.compile(to_match + r'$', flags)
        results = []
        for dirpath, dirnames, filenames in os.walk(expanduser(directory)):
            for filename in filenames:
                if p.match(filename):
                    results.append(os.path.join(dirpath, filename))
        return results

By default it is now case-sensitive, and it uses a regular expression
instead of fnmatch. That is a lot more efficient: the old version took
4.4 seconds and this version takes 2.4 seconds. But the ‘!find’ version
takes about half a second. Why is this version so much slower?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



#92180

From: Laura Creighton <lac@openend.se>
Date: 2015-06-06 13:07 +0200
Message-ID: <mailman.215.1433588892.13271.python-list@python.org>
In reply to: #92178
The !find version is C code optimised to do one thing, find files in
your directory structure, which happens to be what you want to do.
General regular expression matching is harder.

Carl Friedrich Bolz investigated regular expression algorithms and their
implementation to see if this is the sort of task that a JIT can improve.
He blogged about it in 2 posts (part1 and part2).  The benchmarks are
in part2.

see:
http://morepypy.blogspot.se/2010/05/efficient-and-elegant-regular.html
http://morepypy.blogspot.se/2010/06/jit-for-regular-expression-matching.html

You may get faster results if you use Matthew Barnett's replacement
for re here: https://pypi.python.org/pypi/regex

You will get faster results if you build your IPython shell to use PyPy,
but I would still be very surprised if it beat the C program find.

Laura



#92206

From: Cecil Westerhof <Cecil@decebal.nl>
Date: 2015-06-07 08:20 +0200
Message-ID: <87sia4ox8h.fsf@Equus.decebal.nl>
In reply to: #92180
On Saturday  6 Jun 2015 13:07 CEST, Laura Creighton wrote:

> The !find version is C code optimised to do one thing, find files in
> your directory structure, which happens to be what you want to do.
> General regular expression matching is harder.
>
> Carl Friedrich Bolz investigated regular expression algorithms and
> their implementation to see if this is the sort of task that a JIT
> can improve. He blogged about it in 2 posts (part1 and part2). The
> benchmarks are in part2.
>
> see:
> http://morepypy.blogspot.se/2010/05/efficient-and-elegant-regular.html
> http://morepypy.blogspot.se/2010/06/jit-for-regular-expression-matching.html
>
> You may get faster results if you use Matthew Barnett's replacement
> for re here: https://pypi.python.org/pypi/regex
>
> You will get faster results if you build your IPython shell to use
> PyPy, but I would still be very surprised if it beat the C program
> find.

I have to look into that. But I prefer to write a version that can be
used by ‘everyone’.

It is of course not a very big program. The difference is significant,
but I do not use find that much. And if it is significant I still can
use the shell version.

There is no gain to get in standard Python? By switching from fnmatch
to re I got almost a speed gain of two. So I was wondering if I could
do more.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof



#92210

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-06-07 17:38 +1000
Message-ID: <mailman.232.1433664199.13271.python-list@python.org>
In reply to: #92206
On 07Jun2015 08:20, Cecil Westerhof <Cecil@decebal.nl> wrote:
>There is no gain to get in standard Python? By switching from fnmatch
>to re I got almost a speed gain of two. So I was wondering if I could
>do more.

Maybe write a few versions: one really dumb using filename == matchstring (like
-name foo), one using filename.startswith(matchstring) (like -name 'foo*'), one
using matchstring in filename (like -name '*foo*') and your current one. See
how much these affect the runtime. You _will_ need to make multiple identical
runs (OSes do lots of caching, and other processes competing for I/O or the CPU
will also perturb things).

On the topic of multiple runs, have a look at the timeit module, specifically
designed for testing code with multiple runs.
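
A sketch of such a comparison with timeit follows. To keep it self-contained it times only the matching step on a made-up list of names (NAMES is hypothetical sample data), not the directory traversal, so the numbers say nothing about I/O cost.

```python
import re
import timeit

# Hypothetical sample data standing in for the names os.walk() yields.
NAMES = ["report%d.txt" % i for i in range(1000)] + ["python_guide.pdf"]


def count_equal(names, s):
    return sum(1 for n in names if n == s)


def count_prefix(names, s):
    return sum(1 for n in names if n.startswith(s))


def count_substring(names, s):
    return sum(1 for n in names if s in n)


def count_regex(names, pattern):
    match = re.compile(pattern).match
    return sum(1 for n in names if match(n))


def best_time(func, *args, repeat=5, number=100):
    """Best-of-repeat wall time for `number` calls of func(*args)."""
    return min(timeit.repeat(lambda: func(*args), repeat=repeat, number=number))

# Example comparison:
#   print(best_time(count_substring, NAMES, "python"))
#   print(best_time(count_regex, NAMES, r".*python.*\.pdf$"))
```

Taking the minimum of several repeats is the usual way to reduce the influence of other processes on the measurement.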

Cheers,
Cameron Simpson <cs@zip.com.au>



#92218

From: Laura Creighton <lac@openend.se>
Date: 2015-06-07 11:33 +0200
Message-ID: <mailman.233.1433669631.13271.python-list@python.org>
In reply to: #92206
In a message of Sun, 07 Jun 2015 08:20:46 +0200, Cecil Westerhof writes:
>> You may get faster results if you use Matthew Barnett's replacement
>> for re here: https://pypi.python.org/pypi/regex
>>
>> You will get faster results if you build your IPython shell to use
>> PyPy, but I would still be very surprised if it beat the C program
>> find.
>
>I have to look into that. But I prefer to write a version that can be
>used by ‘everyone’.

Well, everybody can download Matthew Barnett's regex, and get pypy,
and the claim is that ipython just works with pypy, and if it ever doesn't
the ipython team wants a bug report, so I am not sure what you mean by
'used by "everyone"' here that these don't have.

>There is no gain to get in standard Python? By switching from fnmatch
>to re I got almost a speed gain of two. So I was wondering if I could
>do more.

That's because speeding up regular expression matching is something
that people have put a significant amount of effort in, but under the
hood, so to speak.  Other modules are likely to be slower, as people
haven't gone to so much effort to make them fast.

You have reached the point where, if your python code is too slow, and
you don't want to use PyPy, people go grab Cython (or Boost or Swig, but
Cython is a whole lot easier and more fun to use) and make themselves
a C extension.  Which you can, of course, do as part of your voyage
of discovery.

But I am still betting that it won't perform as well as the linux utility
find. :)

Laura



#92248

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-06-07 23:16 +1000
Message-ID: <55744426$0$12986$c3e8da3$5496439d@news.astraweb.com>
In reply to: #92218
On Sun, 7 Jun 2015 07:33 pm, Laura Creighton wrote:

> In a message of Sun, 07 Jun 2015 08:20:46 +0200, Cecil Westerhof writes:
>>> You may get faster results if you use Matthew Barnett's replacement
>>> for re here: https://pypi.python.org/pypi/regex
>>>
>>> You will get faster results if you build your IPython shell to use
>>> PyPy, but I would still be very surprised if it beat the C program
>>> find.
>>
>>I have to look into that. But I prefer to write a version that can be
>>used by ‘everyone’.
> 
> Well, everybody can download Matthew Barnett's regex, and get pypy,
> and the claim is that ipython just works with pypy, and if it ever doesn't
> the ipython team wants a bug report, so I am not sure what you mean by
> 'used by "everyone"' here that these don't have.


Pedantically speaking, not everyone has Internet access, or even
electricity, but you know that :-)

More to the point, I think what Cecil might be trying to say is that he
wants to rely only on the standard library. Many people in schools, or
corporate environments, do not control what is on their computer and cannot
just download regex and pypy (not if they want to keep their job). There
may be processes to follow, forms to fill out, paperwork to file, approval
to be gained, and sometimes in a culture which is perplexed by, if not
actively hostile to, FOSS culture. ("What do you mean it's free? You mean
it's pirated?")

For many of those people, they've already gone through the process to get
Python and the standard library approved. It might be part of the school's
SOE, or already installed on the corporate server, for example, so they can
use the standard library easily enough, but nothing else.



-- 
Steven



#92225

From: Peter Otten <__peter__@web.de>
Date: 2015-06-07 12:27 +0200
Message-ID: <mailman.236.1433672873.13271.python-list@python.org>
In reply to: #92206
Cecil Westerhof wrote:

> On Saturday  6 Jun 2015 13:07 CEST, Laura Creighton wrote:
> 
>> The !find version is C code optimised to do one thing, find files in
>> your directory structure, which happens to be what you want to do.
>> General regular expression matching is harder.
>>
>> Carl Friedrich Bolz investigated regular expression algorithms and
>> their implementation to see if this is the sort of task that a JIT
>> can improve. He blogged about it in 2 posts (part1 and part2). The
>> benchmarks are in part2.
>>
>> see:
>> http://morepypy.blogspot.se/2010/05/efficient-and-elegant-regular.html
>> http://morepypy.blogspot.se/2010/06/jit-for-regular-expression-matching.html
>>
>> You may get faster results if you use Matthew Barnett's replacement
>> for re here: https://pypi.python.org/pypi/regex
>>
>> You will get faster results if you build your IPython shell to use
>> PyPy, but I would still be very surprised if it beat the C program
>> find.
> 
> I have to look into that. But I prefer to write a version that can be
> used by ‘everyone’.
> 
> It is of course not a very big program. The difference is significant,
> but I do not use find that much. And if it is significant I still can
> use the shell version.
> 
> There is no gain to get in standard Python? By switching from fnmatch
> to re I got almost a speed gain of two. So I was wondering if I could
> do more.

Just wait for Python 3.5. The switch from os.listdir() to the (new) 
os.scandir() in the implementation of os.walk() is likely to improve the 
situation: 

$ cat findfiles.py                    
import fnmatch
import os
import re
import subprocess
import time


def find_re(root, pattern, ignore_case=False):
    match = re.compile(
        fnmatch.translate(pattern),
        re.IGNORECASE if ignore_case else 0).match

    results = []
    for path, _folders, files in os.walk(root):
        for filename in files:
            if match(filename):
                results.append(os.path.join(path, filename))
    return results


def find_sp(
        root, pattern, ignore_case=False,
        encoding="utf-8", errors="surrogateescape"):

    name_opt = "-iname" if ignore_case else "-name"
    matches = subprocess.Popen(
        ["find", root, name_opt, pattern, "-print0"], stdout=subprocess.PIPE
    ).communicate()[0].decode(encoding, errors=errors).split("\0")
    assert len(matches[-1]) == 0
    del matches[-1]
    return matches


def measure(f, *args):
    start = time.time()
    try:
        return f(*args)
    finally:
        end = time.time()
        print("{}{}".format(f.__name__, args), end - start)


def main():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("root")
    parser.add_argument("pattern")
    parser.add_argument("-i", "--ignore-case", action="store_true")
    args = parser.parse_args()

    a = measure(find_re, args.root, args.pattern, args.ignore_case)
    b = measure(find_sp, args.root, args.pattern, args.ignore_case)
    measure(find_re, args.root, args.pattern, args.ignore_case)
    assert sorted(a) == sorted(b)
    print(len(a), "matches")


if __name__ == "__main__":
    main()

$ python3.4 findfiles.py . '*a*.PY' -i
find_re('.', '*a*.PY', True) 0.14614605903625488
find_sp('.', '*a*.PY', True) 0.043445587158203125
find_re('.', '*a*.PY', True) 0.16485309600830078
1454 matches

$ python3.5 findfiles.py . '*a*.PY' -i
find_re('.', '*a*.PY', True) 0.07263660430908203
find_sp('.', '*a*.PY', True) 0.04418468475341797
find_re('.', '*a*.PY', True) 0.07320952415466309
1454 matches
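
For comparison, os.scandir() can also be called directly instead of going through os.walk(). The following is only a sketch: find_scandir is a hypothetical name, it needs Python 3.6 or later for the with-statement form, it does not catch PermissionError the way os.walk() silently does, and it returns results in a different order from find_re.

```python
import fnmatch
import os
import re


def find_scandir(root, pattern, ignore_case=False):
    """Return paths under root whose name matches the glob pattern."""
    match = re.compile(
        fnmatch.translate(pattern),
        re.IGNORECASE if ignore_case else 0).match

    results = []
    stack = [root]  # directories still to be visited
    while stack:
        folder = stack.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                # Usually answered from the directory listing itself,
                # without a separate stat() call per entry.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif match(entry.name):
                    results.append(entry.path)
    return results
```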



#92245

From: Laura Creighton <lac@openend.se>
Date: 2015-06-07 15:01 +0200
Message-ID: <mailman.244.1433682115.13271.python-list@python.org>
In reply to: #92206
In a message of Sun, 07 Jun 2015 12:27:05 +0200, Peter Otten writes:
>> There is no gain to get in standard Python? By switching from fnmatch
>> to re I got almost a speed gain of two. So I was wondering if I could
>> do more.
>
>Just wait for Python 3.5. The switch from os.listdir() to the (new) 
>os.scandir() in the implementation of os.walk() is likely to improve the 
>situation:

Oh cool.  I had no idea.  Thank you for showing us this.
Laura



#92253

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-06-07 22:13 +1000
Message-ID: <mailman.248.1433684927.13271.python-list@python.org>
In reply to: #92206
On Sun, Jun 7, 2015 at 8:27 PM, Peter Otten <__peter__@web.de> wrote:
> Just wait for Python 3.5. The switch from os.listdir() to the (new)
> os.scandir() in the implementation of os.walk() is likely to improve the
> situation

Why wait? I've been using 3.5 for ages (and actually, my
/usr/local/bin/python3 now announces itself as 3.6), and the betas
have been available with all the regular installers. Python betas are
pretty stable, and apart from a few glitches with the installers on
Windows, I haven't heard any showstopper bugs. Aside from not
depending on them for your nuclear power plant safety systems, there's
not a lot that the betas can't be used for.

Mind you, a lot of the benefit of os.scandir() comes from its
behaviour across network mounts and such, which is why you're seeing
no more than about a 2:1 difference here. From what I gather, the
improvement across network can be simply amazing (because of the way
os.listdir has to stat everything separately).
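
The difference can be illustrated with two ways of listing subdirectories; both function names are made up, and the sketch assumes Python 3.6 or later for the with-statement form of os.scandir().

```python
import os


def subdirs_listdir(path):
    # os.listdir() returns bare names, so os.path.isdir() has to make
    # one extra stat() system call for every entry.
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name)))


def subdirs_scandir(path):
    # DirEntry.is_dir() can usually answer from data that the directory
    # listing already returned; symlinks still need a stat() call.
    with os.scandir(path) as entries:
        return sorted(e.name for e in entries if e.is_dir())
```

Both return the same names; only the number of system calls differs, which matters most when each call is a network round trip.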

ChrisA




