Re: Iterating over files of a huge directory

Date Mon, 17 Dec 2012 15:48:19 +0000
From Tim Golden <mail@timgolden.me.uk>
Subject Re: Iterating over files of a huge directory
Newsgroups comp.lang.python
Message-ID <mailman.962.1355759309.29569.python-list@python.org>


On 17/12/2012 15:41, Chris Angelico wrote:
> On Tue, Dec 18, 2012 at 2:28 AM, Gilles Lenfant 
> <gilles.lenfant@gmail.com> wrote:
>> Hi,
>> 
>> I have googled but did not find an efficient solution to my
>> problem. My customer provides a directory with a huuuuge list of
>> files (flat, potentially 100000+) and I cannot reasonably use
>> os.listdir(this_path) unless creating a big memory footprint.
>> 
>> So I'm looking for an iterator that yields the file names of a
>> directory and does not make a giant list of what's in.
> 
> Sounds like you want os.walk. But... a hundred thousand files? I
> know the Zen of Python says that flat is better than nested, but
> surely there's some kind of directory structure that would make this 
> marginally manageable?
> 
> http://docs.python.org/3.3/library/os.html#os.walk

Unfortunately, all of the built-in functions (os.walk, glob.glob,
os.listdir) rely on the os.listdir functionality, which builds the whole
list first even if (as in glob.iglob) the result is later iterated over.
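
For readers on a newer Python: os.scandir was added to the standard
library in Python 3.5, after this message was written, and it yields
directory entries lazily instead of building a list. The following is a
minimal sketch under that assumption, not something available at the time
of this thread. The function name iter_file_names is hypothetical, and the
"with" form needs Python 3.6 or later.

```python
import os


def iter_file_names(directory):
    """Yield the names of regular files in `directory`, one at a time.

    Hypothetical helper: relies on os.scandir (Python 3.5+), which reads
    directory entries incrementally rather than returning a full list.
    """
    # The context manager (Python 3.6+) closes the underlying directory
    # handle even if the caller stops iterating early.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name
```

A caller would loop with "for name in iter_file_names(this_path):" and
handle each name as it arrives, so memory use stays roughly constant
regardless of how many files the directory holds.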

There are external functions to iterate over large directories in both
Windows & Linux. I *think* the OP is on *nix from his previous posts, in
which case someone else will have to produce the Linux-speak for this.
If it's Windows, you can use the FindFilesIterator in the pywin32 package.
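
A rough sketch of the pywin32 route, for illustration only. It assumes
pywin32 is installed, that win32file.FindFilesIterator takes a wildcard
pattern, and that each item it yields is a WIN32_FIND_DATA tuple with the
file name at index 8; check those details against the pywin32
documentation before relying on them. The function name iter_names_win32
is hypothetical.

```python
import os


def iter_names_win32(directory):
    """Yield entry names in `directory` lazily via pywin32 (Windows only).

    Hypothetical helper; the tuple layout used below is an assumption
    based on the WIN32_FIND_DATA structure as exposed by pywin32.
    """
    # Imported here so that merely defining the function does not
    # require pywin32 to be present.
    import win32file

    pattern = os.path.join(directory, "*")
    for find_data in win32file.FindFilesIterator(pattern):
        name = find_data[8]  # assumed position of the file name
        # The Win32 find API also reports the "." and ".." entries.
        if name not in (".", ".."):
            yield name
```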

TJG


Thread

Iterating over files of a huge directory Gilles Lenfant <gilles.lenfant@gmail.com> - 2012-12-17 07:28 -0800
  Re: Iterating over files of a huge directory Chris Angelico <rosuav@gmail.com> - 2012-12-18 02:41 +1100
    Re: Iterating over files of a huge directory Paul Rudin <paul.nospam@rudin.co.uk> - 2012-12-17 17:27 +0000
      Re: Iterating over files of a huge directory MRAB <python@mrabarnett.plus.com> - 2012-12-17 18:29 +0000
      Re: Iterating over files of a huge directory Chris Angelico <rosuav@gmail.com> - 2012-12-18 08:10 +1100
  Re: Iterating over files of a huge directory Tim Golden <mail@timgolden.me.uk> - 2012-12-17 15:48 +0000
  Re: Iterating over files of a huge directory Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-12-17 15:52 +0000
    Re: Iterating over files of a huge directory Gilles Lenfant <gilles.lenfant@gmail.com> - 2012-12-17 08:06 -0800
  Re: Iterating over files of a huge directory marduk <marduk@python.net> - 2012-12-17 10:50 -0500
  Re: Re: Iterating over files of a huge directory Evan Driscoll <driscoll@cs.wisc.edu> - 2012-12-17 12:40 -0600
  Re: Re: Iterating over files of a huge directory Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-12-17 19:50 +0000
  Re: Iterating over files of a huge directory Evan Driscoll <driscoll@cs.wisc.edu> - 2012-12-17 14:09 -0600
  Re: Iterating over files of a huge directory Terry Reedy <tjreedy@udel.edu> - 2012-12-17 16:27 -0500
