

Groups > comp.lang.python > #75813 > unrolled thread

Test for an empty directory that could be very large if it is not empty?

Started by: Virgil Stokes <vs@it.uu.se>
First post: 2014-08-06 23:46 +0200
Last post: 2014-08-07 20:15 +0000
Articles: 10 (7 participants)



Contents

  Test for an empty directory that could be very large if it is not empty? Virgil Stokes <vs@it.uu.se> - 2014-08-06 23:46 +0200
    Re: Test for an empty directory that could be very large if it is not empty? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-08-07 18:14 +1200
      Re: Test for an empty directory that could be very large if it is not empty? Cameron Simpson <cs@zip.com.au> - 2014-08-07 17:08 +1000
      Re: Test for an empty directory that could be very large if it is not empty? Roy Smith <roy@panix.com> - 2014-08-07 07:54 -0400
        Re: Test for an empty directory that could be very large if it is not empty? Peter Otten <__peter__@web.de> - 2014-08-07 14:06 +0200
        Re: Test for an empty directory that could be very large if it is not empty? Tim Chase <python.list@tim.thechases.com> - 2014-08-07 07:05 -0500
          Re: Test for an empty directory that could be very large if it is not empty? Roy Smith <roy@panix.com> - 2014-08-07 08:19 -0400
            Re: Test for an empty directory that could be very large if it is not empty? Tim Chase <python.list@tim.thechases.com> - 2014-08-07 12:37 -0500
              Re: Test for an empty directory that could be very large if it is not empty? Roy Smith <roy@panix.com> - 2014-08-07 21:10 -0400
    Re: Test for an empty directory that could be very large if it is not empty? John Gordon <gordon@panix.com> - 2014-08-07 20:15 +0000

#75813 — Test for an empty directory that could be very large if it is not empty?

From: Virgil Stokes <vs@it.uu.se>
Date: 2014-08-06 23:46 +0200
Subject: Test for an empty directory that could be very large if it is not empty?
Message-ID: <mailman.12711.1407363468.18130.python-list@python.org>

Suppose I have a directory C:/Test that is either empty or contains more 
than 2000000 files, all with the same extension (e.g. *.txt). How can I 
determine if the directory is empty WITHOUT the generation of a list of 
the file names in it (e.g. using os.listdir('C:/Test')) when it is not 
empty?



#75835 — Re: Test for an empty directory that could be very large if it is not empty?

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2014-08-07 18:14 +1200
Subject: Re: Test for an empty directory that could be very large if it is not empty?
Message-ID: <c4gjqvF8cmiU1@mid.individual.net>
In reply to: #75813

Virgil Stokes wrote:
> How can I 
> determine if the directory is empty WITHOUT the generation of a list of 
> the file names

Which platform?

On Windows, I have no idea.

On Unix you can't really do this properly without access
to opendir() and readdir(), which Python doesn't currently
wrap.

Will the empty directories be newly created, or could they
be ones that *used* to contain 200000 files that have since
been deleted?

If they're new or nearly new, you could probably tell from
looking at the size reported by stat() on the directory.
The difference between a fresh empty directory and one with
200000 files in it should be fairly obvious.

A viable strategy might be: If the directory is very large,
assume it's not empty. If it's smallish, list its contents
to find out for sure.
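
A minimal sketch of that strategy in Python, assuming a hypothetical 64 KiB threshold (SIZE_THRESHOLD and probably_empty are illustrative names, not from any library). It only helps on filesystems where the size reported for a directory grows with its entry count; where that size is uninformative, the function simply falls through to listing.

```python
import os

# Hypothetical threshold: a directory whose own entry is larger than this
# is assumed to be non-empty without listing it. Tune per filesystem.
SIZE_THRESHOLD = 64 * 1024


def probably_empty(path, threshold=SIZE_THRESHOLD):
    """Heuristic emptiness test: trust a large stat() size, else list."""
    if os.stat(path).st_size > threshold:
        # Large directory entry: assume it holds files and skip the listing.
        return False
    # Small directory entry: listing should be cheap, so check for certain.
    return not os.listdir(path)
```

Note that a directory which once held many files and was then emptied may still report a large size, in which case this sketch would wrongly answer "not empty".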

-- 
Greg



#75838 — Re: Test for an empty directory that could be very large if it is not empty?

From: Cameron Simpson <cs@zip.com.au>
Date: 2014-08-07 17:08 +1000
Subject: Re: Test for an empty directory that could be very large if it is not empty?
Message-ID: <mailman.12721.1407395347.18130.python-list@python.org>
In reply to: #75835

On 07Aug2014 18:14, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
>Virgil Stokes wrote:
>>How can I determine if the directory is empty WITHOUT the generation 
>>of a list of the file names
>
>Which platform?
>
>On Windows, I have no idea.
>
>On Unix you can't really do this properly without access
>to opendir() and readdir(), which Python doesn't currently
>wrap. [...]

On UNIX (the OP seemed to be using Windows, alas), if you are prepared to be 
destructive you can just do an rmdir. It will fail if the directory is not 
empty, and performs ok on the hypothesised remote once-ginormous directory.

The commonest reason for wanting to know if a directory is empty that I can 
imagine is when you want to remove it, and if that applies here it is better to 
just try to remove it and ask for forgiveness later.

Disclaimer: Windows may not offer this handy safety net.
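
As a sketch of this ask-forgiveness approach, assuming that removing the empty directory really is the goal (remove_if_empty is a hypothetical helper name): os.rmdir() refuses to delete a non-empty directory, so the failure itself answers the question without listing anything. The errno values checked are the ones POSIX allows for this case; behaviour on Windows is not verified here.

```python
import errno
import os


def remove_if_empty(path):
    """Try to remove path; return True if removed, False if it was not empty."""
    try:
        os.rmdir(path)
    except OSError as exc:
        # ENOTEMPTY is the usual code; POSIX also permits EEXIST here.
        if exc.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        # Permission problems, a missing directory, etc. are real errors.
        raise
    return True
```

If the directory has to survive the test, it would need to be recreated afterwards, which may lose its original permissions and ownership.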

Cheers,
Cameron Simpson <cs@zip.com.au>

Of course no description of a Ducati engine is complete without mentioning
the sound.  That deep bass exhaust rumble along with the mechanical music of
the desmodromic valves...too bad it isn't available on compact disc.
- Sport Rider, evidently before the release of the "Ducati Passions" CD



#75844 — Re: Test for an empty directory that could be very large if it is not empty?

From: Roy Smith <roy@panix.com>
Date: 2014-08-07 07:54 -0400
Subject: Re: Test for an empty directory that could be very large if it is not empty?
Message-ID: <roy-B1A7CD.07544807082014@news.panix.com>
In reply to: #75835

In article <c4gjqvF8cmiU1@mid.individual.net>,
 Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:

> Virgil Stokes wrote:
> > How can I 
> > determine if the directory is empty WITHOUT the generation of a list of 
> > the file names
> 
> Which platform?
> 
> On Windows, I have no idea.
> 
> On Unix you can't really do this properly without access
> to opendir() and readdir(), which Python doesn't currently
> wrap.
> 
> Will the empty directories be newly created, or could they
> be ones that *used* to contain 200000 files that have since
> been deleted?
> 
> If they're new or nearly new, you could probably tell from
> looking at the size reported by stat() on the directory.
> The difference between a fresh empty directory and one with
> 200000 files in it should be fairly obvious.
> 
> A viable strategy might be: If the directory is very large,
> assume it's not empty. If it's smallish, list its contents
> to find out for sure.

I wonder if glob.iglob('*') might help here?



#75845

From: Peter Otten <__peter__@web.de>
Date: 2014-08-07 14:06 +0200
Message-ID: <mailman.12724.1407413193.18130.python-list@python.org>
In reply to: #75844

Roy Smith wrote:

> In article <c4gjqvF8cmiU1@mid.individual.net>,
>  Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> 
>> Virgil Stokes wrote:
>> > How can I
>> > determine if the directory is empty WITHOUT the generation of a list of
>> > the file names
>> 
>> Which platform?
>> 
>> On Windows, I have no idea.
>> 
>> On Unix you can't really do this properly without access
>> to opendir() and readdir(), which Python doesn't currently
>> wrap.
>> 
>> Will the empty directories be newly created, or could they
>> be ones that *used* to contain 200000 files that have since
>> been deleted?
>> 
>> If they're new or nearly new, you could probably tell from
>> looking at the size reported by stat() on the directory.
>> The difference between a fresh empty directory and one with
>> 200000 files in it should be fairly obvious.
>> 
>> A viable strategy might be: If the directory is very large,
>> assume it's not empty. If it's smallish, list its contents
>> to find out for sure.
> 
> I wonder if glob.iglob('*') might help here?

No, the glob module uses os.listdir() under the hood. Therefore iglob() is 
lazy for multiple directories only.



#75846

From: Tim Chase <python.list@tim.thechases.com>
Date: 2014-08-07 07:05 -0500
Message-ID: <mailman.12725.1407413212.18130.python-list@python.org>
In reply to: #75844

On 2014-08-07 07:54, Roy Smith wrote:
> I wonder if glob.iglob('*') might help here?

My glob.iglob() uses os.listdir() behind the scenes (see glob1() in
glob.py)

-tkc



#75847

From: Roy Smith <roy@panix.com>
Date: 2014-08-07 08:19 -0400
Message-ID: <roy-AF3AC4.08190707082014@news.panix.com>
In reply to: #75846

In article <mailman.12725.1407413212.18130.python-list@python.org>,
 Tim Chase <python.list@tim.thechases.com> wrote:

> On 2014-08-07 07:54, Roy Smith wrote:
> > I wonder if glob.iglob('*') might help here?
> 
> My glob.iglob() uses os.listdir() behind the scenes (see glob1() in
> glob.py)
> 
> -tkc

In which case, the documentation for iglob() is broken.  It says:

"Return an iterator which yields the same values as glob() without 
actually storing them all simultaneously."

If it's calling something which does store them all simultaneously, 
that's like contracting with somebody to commit a crime, and then trying 
to claim you're innocent because you didn't commit the crime yourself.



#75855

From: Tim Chase <python.list@tim.thechases.com>
Date: 2014-08-07 12:37 -0500
Message-ID: <mailman.12729.1407433146.18130.python-list@python.org>
In reply to: #75847

On 2014-08-07 08:19, Roy Smith wrote:
> > My glob.iglob() uses os.listdir() behind the scenes (see glob1()
> > in glob.py)
> > 
> > -tkc  
> 
> In which case, the documentation for iglob() is broken.  It says:
> 
> "Return an iterator which yields the same values as glob() without 
> actually storing them all simultaneously."

I'd tend to agree that iglob() is broken and should use the
proposed .scandir() instead for exactly those reasons.
Unfortunately, it seems that it might not get back-ported
until .scandir() hits.
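
For reference, a sketch of the lazy test that scandir makes possible, assuming Python 3.6 or later, where os.scandir() is available and usable as a context manager (it was still only a proposal when this thread was written). is_empty is an illustrative name.

```python
import os


def is_empty(path):
    """Return True if the directory has no entries, fetching at most one."""
    with os.scandir(path) as entries:
        # scandir() yields entries lazily and never includes '.' or '..',
        # so asking for a single entry is enough to decide.
        return next(entries, None) is None
```

Because the iterator is closed as soon as the first entry (or the lack of one) is seen, no list of the file names is ever built in Python.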

-tkc



#75859

From: Roy Smith <roy@panix.com>
Date: 2014-08-07 21:10 -0400
Message-ID: <roy-5A9393.21105207082014@news.panix.com>
In reply to: #75855

In article <mailman.12729.1407433146.18130.python-list@python.org>,
 Tim Chase <python.list@tim.thechases.com> wrote:

> On 2014-08-07 08:19, Roy Smith wrote:
> > > My glob.iglob() uses os.listdir() behind the scenes (see glob1()
> > > in glob.py)
> > > 
> > > -tkc  
> > 
> > In which case, the documentation for iglob() is broken.  It says:
> > 
> > "Return an iterator which yields the same values as glob() without 
> > actually storing them all simultaneously."
> 
> I'd tend to agree that iglob() is broken and should use the
> proposed .scandir() instead for exactly those reasons.
> Unfortunately, it seems that it might not get back-ported
> until .scandir() hits.
> 
> -tkc

I opened a bug against the 2.7 docs:

http://bugs.python.org/issue22167



#75856

From: John Gordon <gordon@panix.com>
Date: 2014-08-07 20:15 +0000
Message-ID: <ls0mp5$djf$1@reader1.panix.com>
In reply to: #75813

In <mailman.12711.1407363468.18130.python-list@python.org> Virgil Stokes <vs@it.uu.se> writes:

> Suppose I have a directory C:/Test that is either empty or contains more 
> than 2000000 files, all with the same extension (e.g. *.txt). How can I 
> determine if the directory is empty WITHOUT the generation of a list of 
> the file names in it (e.g. using os.listdir('C:/Test')) when it is not 
> empty?

Is it one directory that is sometimes empty and other times teeming with
files, or is it a series of directories which are created afresh and then
await arrival of the files?

If the latter, you could try looking at the size of the directory entry
itself.  On the system I'm writing from, a freshly-created directory is
4K in size, and will grow in 4K chunks as more and more files are created
within the directory.  However, the directory entry does not shrink when
files are removed.

--
John Gordon         Imagine what it must be like for a real medical doctor to
gordon@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.




