| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <mail@timgolden.me.uk> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| Date | Mon, 17 Dec 2012 15:48:19 +0000 |
| From | Tim Golden <mail@timgolden.me.uk> |
| User-Agent | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1 |
| MIME-Version | 1.0 |
| CC | python-list@python.org |
| Subject | Re: Iterating over files of a huge directory |
| References | <c2b15410-12e0-4645-a77f-9944bfd674a8@googlegroups.com> <CAPTjJmrE+XOs4y9t8oyG4E_OcTAk21NnwCntr1TjwH7abh1EZA@mail.gmail.com> |
| In-Reply-To | <CAPTjJmrE+XOs4y9t8oyG4E_OcTAk21NnwCntr1TjwH7abh1EZA@mail.gmail.com> |
| Content-Type | text/plain; charset=ISO-8859-1 |
| Content-Transfer-Encoding | 7bit |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.962.1355759309.29569.python-list@python.org> |
| Lines | 30 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1355759309 news.xs4all.nl 6881 [2001:888:2000:d::a6]:60441 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:34981 |
On 17/12/2012 15:41, Chris Angelico wrote:
> On Tue, Dec 18, 2012 at 2:28 AM, Gilles Lenfant
> <gilles.lenfant@gmail.com> wrote:
>> Hi,
>>
>> I have googled but did not find an efficient solution to my
>> problem. My customer provides a directory with a huuuuge list of
>> files (flat, potentially 100000+) and I cannot reasonably use
>> os.listdir(this_path) unless creating a big memory footprint.
>>
>> So I'm looking for an iterator that yields the file names of a
>> directory and does not make a giant list of what's in.
>
> Sounds like you want os.walk. But... a hundred thousand files? I
> know the Zen of Python says that flat is better than nested, but
> surely there's some kind of directory structure that would make this
> marginally manageable?
>
> http://docs.python.org/3.3/library/os.html#os.walk

Unfortunately all of the built-in functions (os.walk, glob.glob,
os.listdir) rely on the os.listdir functionality which produces a list
first even if (as in glob.iglob) it later iterates over it.

There are external functions to iterate over large directories in both
Windows & Linux. I *think* the OP is on *nix from his previous posts, in
which case someone else will have to produce the Linux-speak for this.
If it's Windows, you can use the FindFilesIterator in the pywin32 package.

TJG
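
A note for readers on a Python newer than the 3.3 discussed here: os.scandir, added to the standard library in Python 3.5 (and usable as a context manager from 3.6), returns an iterator of directory entries rather than building the full list that os.listdir does. What follows is only a minimal sketch under that assumption. The helper name iter_file_names and the temporary directory used to exercise it are illustrative and do not come from the thread.

```python
import os
import tempfile


def iter_file_names(path):
    """Yield the names of regular files in `path` one at a time.

    Assumes Python 3.6+ (os.scandir used as a context manager).
    The entries are consumed lazily, so no complete list of the
    directory is held in memory by this function.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and other non-file entries.
            if entry.is_file():
                yield entry.name


# Small demonstration on a throwaway directory (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"file{i}.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "subdir"))

    # sorted() is used only to make the demo output stable;
    # real code would simply loop over the generator.
    names = sorted(iter_file_names(tmp))
    print(names)
```

The order in which os.scandir yields entries is arbitrary, so code that needs a sorted listing still has to collect the names first, which gives up the memory advantage.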
Iterating over files of a huge directory Gilles Lenfant <gilles.lenfant@gmail.com> - 2012-12-17 07:28 -0800
Re: Iterating over files of a huge directory Chris Angelico <rosuav@gmail.com> - 2012-12-18 02:41 +1100
Re: Iterating over files of a huge directory Paul Rudin <paul.nospam@rudin.co.uk> - 2012-12-17 17:27 +0000
Re: Iterating over files of a huge directory MRAB <python@mrabarnett.plus.com> - 2012-12-17 18:29 +0000
Re: Iterating over files of a huge directory Chris Angelico <rosuav@gmail.com> - 2012-12-18 08:10 +1100
Re: Iterating over files of a huge directory Tim Golden <mail@timgolden.me.uk> - 2012-12-17 15:48 +0000
Re: Iterating over files of a huge directory Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-12-17 15:52 +0000
Re: Iterating over files of a huge directory Gilles Lenfant <gilles.lenfant@gmail.com> - 2012-12-17 08:06 -0800
Re: Iterating over files of a huge directory marduk <marduk@python.net> - 2012-12-17 10:50 -0500
Re: Re: Iterating over files of a huge directory Evan Driscoll <driscoll@cs.wisc.edu> - 2012-12-17 12:40 -0600
Re: Re: Iterating over files of a huge directory Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-12-17 19:50 +0000
Re: Iterating over files of a huge directory Evan Driscoll <driscoll@cs.wisc.edu> - 2012-12-17 14:09 -0600
Re: Iterating over files of a huge directory Terry Reedy <tjreedy@udel.edu> - 2012-12-17 16:27 -0500