Re: Make a unique filesystem path, without creating the file

Started by: Ben Finney <ben+python@benfinney.id.au>
First post: 2016-02-15 11:08 +1100
Last post: 2016-02-14 20:48 -0800
Articles: 14 (9 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Make a unique filesystem path, without creating the file Ben Finney <ben+python@benfinney.id.au> - 2016-02-15 11:08 +1100
    Re: Make a unique filesystem path, without creating the file Dan Sommers <dan@tombstonezero.net> - 2016-02-15 01:07 +0000
      Re: Make a unique filesystem path, without creating the file Ben Finney <ben+python@benfinney.id.au> - 2016-02-15 12:19 +1100
        Re: Make a unique filesystem path, without creating the file Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-02-15 15:54 +1100
          Re: Make a unique filesystem path, without creating the file Ben Finney <ben+python@benfinney.id.au> - 2016-02-15 16:25 +1100
          Re: Make a unique filesystem path, without creating the file Rick Johnson <rantingrickjohnson@gmail.com> - 2016-02-15 18:26 -0800
        Re: Make a unique filesystem path, without creating the file Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-02-15 21:00 +1300
          Re: Make a unique filesystem path, without creating the file Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2016-02-16 01:18 +0100
        Re: Make a unique filesystem path, without creating the file Grant Edwards <invalid@invalid.invalid> - 2016-02-15 15:49 +0000
    Re: Make a unique filesystem path, without creating the file Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-02-15 15:06 +1100
      Re: Make a unique filesystem path, without creating the file Ben Finney <ben+python@benfinney.id.au> - 2016-02-15 15:28 +1100
        Re: Make a unique filesystem path, without creating the file Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2016-02-15 21:11 +1300
        Re: Make a unique filesystem path, without creating the file Nobody <nobody@nowhere.invalid> - 2016-02-16 02:14 +0000
      Re: Make a unique filesystem path, without creating the file "Martin A. Brown" <martin@linux-ip.net> - 2016-02-14 20:48 -0800

#102925 — Re: Make a unique filesystem path, without creating the file

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2016-02-15 11:08 +1100
Subject: Re: Make a unique filesystem path, without creating the file
Message-ID: <mailman.121.1455494940.22075.python-list@python.org>

Matt Wheeler <m@funkyhat.org> writes:

> On 14 Feb 2016 21:46, "Ben Finney" <ben+python@benfinney.id.au> wrote:
> > What standard library function should I be using to generate
> > ‘tempfile.mktemp’-like unique paths, and *not* ever create a real
> > file by that path?
>
> Could you use tempfile.TemporaryDirectory and then just use a
> consistent name within that directory.

That fails because it touches the filesystem. I want to avoid using a
real file or a real directory.

> It's guaranteed not to exist

I am unconcerned with whether there is a real filesystem entry of that
name; the goal entails having no filesystem activity for this. I want a
valid unique filesystem path, without touching the filesystem.

-- 
 \             “I believe our future depends powerfully on how well we |
  `\     understand this cosmos, in which we float like a mote of dust |
_o__)                 in the morning sky.” —Carl Sagan, _Cosmos_, 1980 |
Ben Finney



#102931

From: Dan Sommers <dan@tombstonezero.net>
Date: 2016-02-15 01:07 +0000
Message-ID: <n9r8bk$evf$1@dont-email.me>
In reply to: #102925

On Mon, 15 Feb 2016 11:08:52 +1100, Ben Finney wrote:

> I am unconcerned with whether there is a real filesystem entry of that
> name; the goal entails having no filesystem activity for this. I want
> a valid unique filesystem path, without touching the filesystem.

That's an odd use case.

If it's really just one valid filesystem path (your original post said
*paths*, plural), then how about __file__? or os.__file__?



#102932

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2016-02-15 12:19 +1100
Message-ID: <mailman.126.1455499198.22075.python-list@python.org>
In reply to: #102931

Dan Sommers <dan@tombstonezero.net> writes:

> On Mon, 15 Feb 2016 11:08:52 +1100, Ben Finney wrote:
>
> > I am unconcerned with whether there is a real filesystem entry of
> > that name; the goal entails having no filesystem activity for this.
> > I want a valid unique filesystem path, without touching the
> > filesystem.
>
> That's an odd use case.

It's very common to want filesystem paths divorced from accessing a
filesystem entry.

For example: test paths in a unit test. Filesystem access is orders of
magnitude slower than accessing fake files in memory only, it is more
complex and prone to irrelevant failures. So in such a test case
filesystem access should be avoided as unnecessary.
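
A minimal sketch of the kind of test described above, assuming a hypothetical `read_config` function under test; the function name, the path, and the file content are illustrative only. The built-in `open` is replaced with `unittest.mock.mock_open`, so the path is just a string and the test never reaches the disk.

```python
import unittest
from unittest import mock


def read_config(path):
    """Hypothetical function under test: return the text stored at path."""
    with open(path) as config_file:
        return config_file.read()


class ReadConfigTest(unittest.TestCase):
    def test_reads_content_from_fake_path(self):
        # The path is only a string; no file with this name is ever created.
        fake_path = "/nonexistent/dir/config-k3x9.ini"
        fake_open = mock.mock_open(read_data="answer = 42\n")
        # Replace the built-in open() for the duration of the call.
        with mock.patch("builtins.open", fake_open):
            self.assertEqual(read_config(fake_path), "answer = 42\n")
        # The code under test asked for exactly the fake path.
        fake_open.assert_called_once_with(fake_path)
```

In a test like this the only requirement on the path is that it is a plausible, distinct string, which is why a generator of such strings is useful on its own.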

> If it's really just one valid filesystem path (your original post said
> *paths*, plural), then how about __file__? or os.__file__?

One valid filesystem path each time it's accessed. That is, behaviour
equivalent to ‘tempfile.mktemp’.

My question is because the standard library clearly has this useful
functionality implemented, but simultaneously warns strongly against its
use.

I'm looking for how to get at that functionality in a non-deprecated
way, without re-implementing it myself.

-- 
 \      “The most common way people give up their power is by thinking |
  `\                               they don't have any.” —Alice Walker |
_o__)                                                                  |
Ben Finney



#102938

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2016-02-15 15:54 +1100
Message-ID: <56c15a25$0$1622$c3e8da3$5496439d@news.astraweb.com>
In reply to: #102932

On Monday 15 February 2016 12:19, Ben Finney wrote:

> One valid filesystem path each time it's accessed. That is, behaviour
> equivalent to ‘tempfile.mktemp’.
> 
> My question is because the standard library clearly has this useful
> functionality implemented, but simultaneously warns strongly against its
> use.

If you can absolutely guarantee that this string will never actually be used 
on a real filesystem, then go right ahead and use it. There's nothing wrong 
with (for instance) calling mktemp to generate *strings* that merely *look* 
like pathnames.

If you want to guarantee that these faux pathnames can't leak out of your 
test suite and touch the file system, prepend an ASCII NUL to them. That 
will make it an illegal path on all file systems that I'm aware of.
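
A small sketch of the NUL trick, assuming CPython 3, where `open()` is expected to raise `ValueError` for a path containing an embedded NUL (older versions raised `TypeError`). The helper name and the example path are made up for illustration.

```python
def is_rejected_by_open(path):
    """Return True if open() refuses the path because of an embedded NUL."""
    try:
        open(path).close()
    except (ValueError, TypeError):
        # Embedded NUL: the path is rejected while it is being converted.
        return True
    except OSError:
        # A syntactically acceptable path that simply could not be opened.
        return False
    return False


# Prepend an ASCII NUL so the string can never name a real file.
fake_path = "\0" + "/tmp/fake-test-path"
print(is_rejected_by_open(fake_path))
```

If such a string ever leaks into real file-handling code, the failure is immediate and loud instead of silently creating a file.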


> I'm looking for how to get at that functionality in a non-deprecated
> way, without re-implementing it myself.

You probably can't, not if you want to future-proof your code against the 
day when tempfile.mktemp is removed.

But you can simply fork that module, delete all the irrelevant bits, and 
make the mktemp function a private utility in your test suite.


-- 
Steve



#102939

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2016-02-15 16:25 +1100
Message-ID: <mailman.129.1455513940.22075.python-list@python.org>
In reply to: #102938

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> If you can absolutely guarantee that this string will never actually
> be used on a real filesystem, then go right ahead and use it.

I'm giving advice in examples in documentation. It's not enough to have
some private usage that I know is good, I am looking for a standard API
that when the reader looks it up will not be laden with big scary
warnings.

Currently I can write about the public API ‘tempfile.mktemp’ in
documentation, but the conscientious reader will be correct to have
concerns when the examples I give are sternly deprecated in the standard
library documentation.

Or I can write about the private API ‘tempfile._RandomNameSequence’ in
the documentation, and the conscientious reader will be correct to have
concerns about use of an undocumented private-use API.
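
For illustration only, this is roughly what relying on the private class looks like. `tempfile._RandomNameSequence` is an undocumented CPython implementation detail and may change or disappear; the `fake_path` helper, its parameters, and the `/nonexistent` directory are assumptions made for this sketch.

```python
import os
import tempfile

# Private, undocumented CPython detail: may change or vanish without notice.
names = tempfile._RandomNameSequence()


def fake_path(directory="/nonexistent", prefix="tmp", suffix=""):
    """Build a path string from the next random name.

    Only string operations are involved; no file is created or looked up.
    """
    return os.path.join(directory, prefix + next(names) + suffix)


print(fake_path(suffix=".txt"))
```

The sketch works today, but it has exactly the drawback named above: a reader who looks up the name finds no documentation at all.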

I'm looking for a way to give examples that use that standard library
functionality, with an API that is both public and not discouraged.

> > I'm looking for how to get at that functionality in a non-deprecated
> > way, without re-implementing it myself.
>
> You probably can't, not if you want to future-proof your code against
> the day when tempfile.mktemp is removed.

That's disappointing. It is already implemented and well-tested, it is
useful as is. Forking and duplicating it is poor practice if it can
simply be used in a standard place.

I have reported <URL:https://bugs.python.org/issue26362> for this
request.

-- 
 \     “Nothing worth saying is inoffensive to everyone. Nothing worth |
  `\    saying will fail to make you enemies. And nothing worth saying |
_o__)            will not produce a confrontation.” —Johann Hari, 2011 |
Ben Finney



#102981

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2016-02-15 18:26 -0800
Message-ID: <e60f2e38-8ab2-45ce-ab36-d76f11cb5a80@googlegroups.com>
In reply to: #102938

On Sunday, February 14, 2016 at 10:55:11 PM UTC-6, Steven D'Aprano wrote:
> If you want to guarantee that these faux pathnames can't
> leak out of your test suite and touch the file system,
> prepend an ASCII NUL to them. That will make it an illegal
> path on all file systems that I'm aware of.

Hmm, the unfounded fears in this thread are beginning to
remind me of a famous Black Sabbath song.

  Finished with "py tempfile",
  'cause it,
  couldn't help to,
  ease my mind.
  
  People think i'm insane,
  because,
  i want "faux paths",
  all the time.
  
  All day long i think of ways, 
  but nothing seems to,
  satisfy.
  
  Think i'll lose my mind,
  if i don't,
  find a py-module to,
  pacify.
  
  CAN YOU HELP ME?

  MAKE "FAUX PATHS" TODAAAAY,

  OH YEAH...
  



#102947

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2016-02-15 21:00 +1300
Message-ID: <didet7Ft04gU1@mid.individual.net>
In reply to: #102932

Ben Finney wrote:
> One valid filesystem path each time it's accessed. That is, behaviour
> equivalent to ‘tempfile.mktemp’.
> 
> My question is because the standard library clearly has this useful
> functionality implemented, but simultaneously warns strongly against its
> use.

But it *doesn't*, if your requirement is truly to not touch
the filesystem at all, because tempfile.mktemp() *reads* the
file system to make sure the name it's returning isn't
in use.

What's more, because you're *not* creating the file, mktemp()
would be within its rights to return the same file name the
second time you call it.

If you want something that really doesn't go near the file
system and/or is guaranteed to produce multiple different
non-existing file names, you'll have to write it yourself.
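
One way such a hand-written generator might look, as a sketch rather than a recommendation. The `fake_paths` name, the `fake-` prefix, and the `/nonexistent` default directory are invented here. A random token keeps names hard to guess, and a serial number guarantees that no two names from the same generator are equal.

```python
import itertools
import os
import secrets


def fake_paths(directory="/nonexistent", suffix=""):
    """Yield path strings that never repeat within this generator."""
    for serial in itertools.count():
        # The random token makes names hard to guess; the serial number
        # guarantees that no two names from this generator are equal.
        token = secrets.token_hex(4)
        name = "fake-{}-{}{}".format(token, serial, suffix)
        # os.path.join only manipulates strings; nothing is read or written.
        yield os.path.join(directory, name)


paths = fake_paths(suffix=".dat")
first, second = next(paths), next(paths)
print(first)
print(second)
```

Uniqueness here holds only within one generator in one process, which is usually all a test suite needs.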

-- 
Greg



#102977

From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date: 2016-02-16 01:18 +0100
Message-ID: <2015485.VjBY4A5gp9@PointedEars.de>
In reply to: #102947

Gregory Ewing wrote:

> Ben Finney wrote:
>> One valid filesystem path each time it's accessed. That is, behaviour
>> equivalent to ‘tempfile.mktemp’.
>> 
>> My question is because the standard library clearly has this useful
>> functionality implemented, but simultaneously warns strongly against its
>> use.
> 
> But it *doesn't*,

Yes, it does.

> if your requirement is truly to not touch the filesystem at all, because
> tempfile.mktemp() *reads* the file system to make sure the name it's
> returning isn't in use.

But there is a race condition occurring between the moment that the 
filesystem has been read and is being written to by another user.  Hence the 
deprecation in favor of tempfile.mkstemp() which also *creates* the file 
instead, and the warning about the security hole if tempfile.mktemp() is 
used anyway.

You can use tempfile.mktemp() only as long as it is irrelevant if a file 
with that name already exists, or exists later but was not created by you.
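
For contrast, a short sketch of the recommended `tempfile.mkstemp()` pattern when a real file is wanted. The prefix, suffix, and file content are arbitrary choices for the example.

```python
import os
import tempfile

# mkstemp() creates the file atomically and returns an open descriptor,
# so no other process can claim the name in between.
fd, path = tempfile.mkstemp(prefix="demo-", suffix=".txt")
try:
    with os.fdopen(fd, "w") as handle:
        handle.write("scratch data\n")
    with open(path) as handle:
        content = handle.read()
finally:
    os.remove(path)  # the caller is responsible for cleanup
```

This closes the race, but it necessarily touches the filesystem, which is what the original request wanted to avoid.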

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.



#102970

From: Grant Edwards <invalid@invalid.invalid>
Date: 2016-02-15 15:49 +0000
Message-ID: <n9ss2g$51v$2@reader1.panix.com>
In reply to: #102932

On 2016-02-15, Ben Finney <ben+python@benfinney.id.au> wrote:
> Dan Sommers <dan@tombstonezero.net> writes:
>
>> On Mon, 15 Feb 2016 11:08:52 +1100, Ben Finney wrote:
>>
>> > I am unconcerned with whether there is a real filesystem entry of
>> > that name; the goal entails having no filesystem activity for this.
>> > I want a valid unique filesystem path, without touching the
>> > filesystem.
>>
>> That's an odd use case.
>
> It's very common to want filesystem paths divorced from accessing a
> filesystem entry.

If the filesystem paths are not associated with a filesystem, what do
you mean by "unique"?  You want to make sure that path <whatever>
which doesn't exist in some filesystem is different from all other
paths that don't exist in some filesystem?

> For example: test paths in a unit test. Filesystem access is orders
> of magnitude slower than accessing fake files in memory only,

How is "fake files in memory" not a filesystem?

-- 
Grant Edwards               grant.b.edwards        Yow! The Korean War must
                                  at               have been fun.
                              gmail.com            



#102935

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2016-02-15 15:06 +1100
Message-ID: <56c14ed7$0$11089$c3e8da3@news.astraweb.com>
In reply to: #102925

On Monday 15 February 2016 11:08, Ben Finney wrote:

> I am unconcerned with whether there is a real filesystem entry of that
> name; the goal entails having no filesystem activity for this. I want a
> valid unique filesystem path, without touching the filesystem.

Your phrasing is ambiguous.

If you are unconcerned whether or not a file of that name exists, then just 
pick a name and use that:


    unique_path = "/tmp/foo"

is guaranteed to be valid on POSIX systems and unique, and it may or may not 
exist.

If you actually do care that /tmp/foo *doesn't* exist, then you have a 
problem: whatever name you pick *now* may no longer "not exist" a 
millisecond later. In general there's no way to create a valid pathname 
which doesn't exist *now* and is guaranteed to continue to not exist unless 
you touch the file system.
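
The difference between checking for a name and claiming it can be sketched as follows. The example works inside a throwaway directory so it cleans up after itself; the file name `foo` and the variable names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as sandbox:
    path = os.path.join(sandbox, "foo")

    # Racy: the answer can be stale by the time it is acted upon.
    looked_free = not os.path.exists(path)

    # Atomic: mode "x" creates the file or fails, in a single step.
    with open(path, "x") as handle:
        handle.write("claimed\n")

    # A second exclusive create of the same name must fail.
    try:
        open(path, "x")
    except FileExistsError:
        second_attempt_failed = True
    else:
        second_attempt_failed = False
```

Only the exclusive create gives a guarantee, and it does so precisely by touching the file system.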

But if you explain in more detail why you want this filename, perhaps we can 
come up with some ideas that will help.


-- 
Steve



#102936

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2016-02-15 15:28 +1100
Message-ID: <mailman.127.1455510515.22075.python-list@python.org>
In reply to: #102935

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> On Monday 15 February 2016 11:08, Ben Finney wrote:
>
> > I am unconcerned with whether there is a real filesystem entry of
> > that name; the goal entails having no filesystem activity for this.
> > I want a valid unique filesystem path, without touching the
> > filesystem.
>
> Your phrasing is ambiguous.

The existing behaviour of ‘tempfile.mktemp’ – actually of its internal
class ‘tempfile._RandomNameSequence’ – is to generate unpredictable,
unique, valid filesystem paths that are different each time.

That's the behaviour I want, in a public API that exposes what
‘tempfile’ already has implemented, documented in a way that doesn't
create a scare about security.

> But if you explain in more detail why you want this filename, perhaps
> we can come up with some ideas that will help.

The behaviour is already implemented in the standard library. What I'm
looking for is a way to use it (not re-implement it) that is public API
and isn't scolded by the library documentation.

-- 
 \     “Try adding “as long as you don't breach the terms of service – |
  `\          according to our sole judgement” to the end of any cloud |
_o__)                      computing pitch.” —Simon Phipps, 2010-12-11 |
Ben Finney



#102949

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2016-02-15 21:11 +1300
Message-ID: <didfgnFt6ivU1@mid.individual.net>
In reply to: #102936

Ben Finney wrote:

> The existing behaviour of ‘tempfile.mktemp’ – actually of its internal
> class ‘tempfile._RandomNameSequence’ – is to generate unpredictable,
> unique, valid filesystem paths that are different each time.

But that's not documented behaviour, so even if mktemp()
weren't marked as deprecated, you'd still be relying on
undocumented and potentially changeable behaviour.

> What I'm
> looking for is a way to use it (not re-implement it) that is public API
> and isn't scolded by the library documentation.

Then you're looking for something that doesn't exist,
I'm sorry to say, and it's unlikely you'll persuade
anyone to make it exist.

If you want to leverage stdlib functionality for this,
I'd suggest something along the lines of:

   import os
   import uuid
   def fakefilename(dir, ext):
       return os.path.join(dir, str(uuid.uuid4())) + ext

-- 
Greg



#102980

From: Nobody <nobody@nowhere.invalid>
Date: 2016-02-16 02:14 +0000
Message-ID: <pan.2016.02.16.02.14.08.635000@nowhere.invalid>
In reply to: #102936

On Mon, 15 Feb 2016 15:28:27 +1100, Ben Finney wrote:

> The behaviour is already implemented in the standard library. What I'm
> looking for is a way to use it (not re-implement it) that is public API
> and isn't scolded by the library documentation.

So, basically you want (essentially) the exact behaviour of
tempfile.mktemp(), except without any mention of the (genuine) risks that
such a function presents?

I suspect that you'll have to settle for either a) using that function and
simply documenting the reasons why it isn't an issue in this particular
case, or b) re-implementing it (so that you can choose to avoid mentioning
the issue in its documentation).

At the outside, you *might* have a third option: c) persuade the
maintainers to tweak the documentation to further clarify that the risk
arises from creating a file with the returned name, not from simply
calling the function. But actually it's already fairly clear if you
actually read it.

If it's the bold-face "Warning:" and the red background that you don't
like, I wouldn't expect those to go away either for mktemp() or for any
other function with similar behaviour (i.e. something which someone
*might* try to use to actually create temporary files). The simple fact
that it might get used that way is enough to warrant a prominent warning.



#102937

From: "Martin A. Brown" <martin@linux-ip.net>
Date: 2016-02-14 20:48 -0800
Message-ID: <mailman.128.1455511750.22075.python-list@python.org>
In reply to: #102935

Good evening/morning Ben,

>> > I am unconcerned with whether there is a real filesystem entry of
>> > that name; the goal entails having no filesystem activity for this.
>> > I want a valid unique filesystem path, without touching the
>> > filesystem.
>>
>> Your phrasing is ambiguous.
>
>The existing behaviour of ‘tempfile.mktemp’ – actually of its 
>internal class ‘tempfile._RandomNameSequence’ – is to generate 
>unpredictable, unique, valid filesystem paths that are different 
>each time.
>
>That's the behaviour I want, in a public API that exposes what 
>‘tempfile’ already has implemented, documented in a way that 
>doesn't create a scare about security.

If your code is not actually touching the filesystem, then it will 
not be affected by the race condition identified in the 
tempfile.mktemp() warning anyway.  So, I'm unsure of your worry.

>> But if you explain in more detail why you want this filename, perhaps
>> we can come up with some ideas that will help.
>
>The behaviour is already implemented in the standard library. What 
>I'm looking for is a way to use it (not re-implement it) that is 
>public API and isn't scolded by the library documentation.

I might also suggest the (bound) method _create_tmp() on class 
mailbox.Maildir, which achieves roughly the same goals, but for a 
permanent file.

Of course, that particular method also touches the filesystem.  The 
Maildir naming approach is based on the assumptions* that time is 
monotonically increasing, that system nodes never share the same 
name and that you don't need more than 1 uniquely named file per 
directory per millisecond.

If so, then you can use the 9 or 10 lines of that method.
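
A sketch loosely modelled on that naming scheme, not a copy of the library method: it combines the time, the process ID, a per-process counter, and the host name, and it does not touch the filesystem. The function name, its parameters, and the module-level counter are assumptions made for this example.

```python
import itertools
import os
import socket
import time

# Per-process counter, so two calls in the same instant still differ.
_counter = itertools.count(1)


def maildir_style_name(now=None, hostname=None):
    """Build a Maildir-like unique name: time, pid, counter and host name."""
    if now is None:
        now = time.time()
    if hostname is None:
        hostname = socket.gethostname()
    # Maildir file names must not contain "/" or ":", so escape them.
    hostname = hostname.replace("/", r"\057").replace(":", r"\072")
    return "{}.M{}P{}Q{}.{}".format(
        int(now), int(now % 1 * 1e6), os.getpid(), next(_counter), hostname
    )


print(maildir_style_name())
```

As the footnote says, the result is only as trustworthy as the clock and the host naming it depends on.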

Good luck,

-Martin

  * I was tempted to joke about these two guarantees, but I think 
    that undermines my basic message.  To wit, you can probably rely 
    on this naming technique about as much as you can rely on your 
    system clock.  I'll assume that you aren't naming all of your 
    nodes 'franklin.p.gundersnip'.

-- 
Martin A. Brown
http://linux-ip.net/
