Groups > comp.lang.python > #102989 > unrolled thread

Re: Make a unique filesystem path, without creating the file

Started by	Ben Finney <ben+python@benfinney.id.au>
First post	2016-02-16 16:56 +1100
Last post	2016-02-29 17:38 +1100
Articles	9 — 4 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Make a unique filesystem path, without creating the file Ben Finney <ben+python@benfinney.id.au> - 2016-02-16 16:56 +1100
    Re: Make a unique filesystem path, without creating the file Steven D'Aprano <steve@pearwood.info> - 2016-02-17 04:36 +1100
      Re: Make a unique filesystem path, without creating the file Ben Finney <ben+python@benfinney.id.au> - 2016-02-17 06:48 +1100
    Re: Make a unique filesystem path, without creating the file Alan Bawden <alan@csail.mit.edu> - 2016-02-16 19:24 -0500
      Re: Make a unique filesystem path, without creating the file Cameron Simpson <cs@zip.com.au> - 2016-02-22 07:35 +1100
        Re: Make a unique filesystem path, without creating the file Alan Bawden <alan@csail.mit.edu> - 2016-02-22 12:34 -0500
          Re: Make a unique filesystem path, without creating the file Cameron Simpson <cs@zip.com.au> - 2016-02-23 10:02 +1100
            Re: Make a unique filesystem path, without creating the file Alan Bawden <alan@csail.mit.edu> - 2016-02-29 00:47 -0500
              Re: Make a unique filesystem path, without creating the file Cameron Simpson <cs@zip.com.au> - 2016-02-29 17:38 +1100

#102989 — Re: Make a unique filesystem path, without creating the file

From	Ben Finney <ben+python@benfinney.id.au>
Date	2016-02-16 16:56 +1100
Subject	Re: Make a unique filesystem path, without creating the file
Message-ID	<mailman.155.1455602182.22075.python-list@python.org>

Cameron Simpson <cs@zip.com.au> writes:

> I've been watching this for a few days, and am struggling to
> understand your use case.

Yes, you're not alone. This surprises me, which is why I'm persisting.

> Can you elaborate with a concrete example and its purpose which would
> work with a mktemp-ish official function?

An example::

    import io
    import tempfile
    names = tempfile._get_candidate_names()

    def test_frobnicates_configured_spungfile():
        """ ‘foo’ should frobnicate the configured spungfile. """

        fake_file_path = os.path.join(tempfile.gettempdir(), names.next())
        fake_file = io.BytesIO("Lorem ipsum, dolor sit amet".encode("utf-8"))

        patch_builtins_open(
                when_accessing_path=fake_file_path,
                provide_file=fake_file)

        system_under_test.config.spungfile_path = fake_file_path
        system_under_test.foo()
        assert_correctly_frobnicated(fake_file)

So the test case creates a fake file, makes a valid filesystem path to
associate with it, then patches the ‘open’ function so that it will
return the fake file when that specific path is requested.

Then the test case alters the system under test's configuration, giving
it the generated filesystem path for an important file. The test case
then calls the function about which the unit test is asserting
behaviour, ‘system_under_test.foo’. When that call returns, the test
case asserts some properties of the fake file to ensure the system under
test actually accessed that file.

With a supported standard library API for this – ‘tempfile.makepath’ for
example – the generation of the filesystem path would change from four
separate function calls, one of which is a private API::

    names = tempfile._get_candidate_names()
    fake_file_path = os.path.join(tempfile.gettempdir(), names.next())

to a simple public function call::

    fake_file_path = tempfile.makepath()

This whole thread began because I expected such an API would exist.

> I don't see how it is useful to have a notion of a filepath at all
> in this case, and therefore I don't see why you would want a
> mktemp-like function available.

Because the system under test expects to be dealing with a filesystem,
including normal restrictions on filesystem paths.

The filesystem path needs to be valid because the test case isn't making
assertions about what the system does with invalid paths. A test case
should be very narrow in what it asserts so that the failure's cause is
as obvious as possible.

The filesystem path needs to be unpredictable to make sure we're not
using some hard-coded value; the test case asserts that the system under
test will access whatever file is named in the configuration.

The file object needs to be fake because the test case should not be
prone to irrelevant failures when the real filesystem isn't behaving as
expected; this test case makes assertions only about what
‘system_under_test.foo’ does internally, not what the filesystem does.

The system library functionality should be providing this because it's
*already implemented there* and well tested and maintained. It should be
in a public non-deprecated API because merely generating filesystem
paths is not a security risk.

> But.. then why a filesystem path at all in that case?

Because the system under test is expecting valid filesystem paths, and I
have no good reason to violate that constraint.

> Why use a filesystem as a reference at all?

An actual running filesystem is irrelevant to this inquiry.

I'm only wanting to use functionality, with the constraints I enumerated
earlier (already implemented in the standard library), to generate
filesystem paths.

> The only modes I can imagine for such a thing (a generated but unused
> filename) are:
>
>  checking that the name is syntactly valid, for whatever constrains
> you may have (but if you're calling an opaque mktemp-like function, is
> this feasible or remediable?)

Almost. I want the filesystem paths to be valid because the system under
test expects them, it may perform its own validation, and I have no good
reason to complicate the unit test by possibly supplying an invalid path
when that's not relevant to the test case.

>  generating test paths without using a real filesystem as a reference,
> but then you can't even use mktemp

I hadn't realised the filesystem was accessed by ‘tempfile.mktemp’, and
I apologise for the complication that entails.

I would prefer to access some standard public documented non-deprecated
function that internally uses ‘tempfile._get_candidate_names’ and
returns a new path each time.

> I think "the standard library clearly has this useful functionality
> implemented, but simultaneously warns strongly against its use" pretty
> much precludes this.

I hope to get that addressed with <URL:https://bugs.python.org/issue26362>.

-- 
 \       “Timid men prefer the calm of despotism to the boisterous sea |
  `\                                    of liberty.” —Thomas Jefferson |
_o__)                                                                  |
Ben Finney

[toc] | [next] | [standalone]

#103024

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-02-17 04:36 +1100
Message-ID	<56c35e34$0$1600$c3e8da3$5496439d@news.astraweb.com>
In reply to	#102989

On Tue, 16 Feb 2016 04:56 pm, Ben Finney wrote:

> An example::
> 
>     import io
>     import tempfile
>     names = tempfile._get_candidate_names()

I'm not sure that calling a private function of the tempfile module is
better than calling a deprecated function.

>     def test_frobnicates_configured_spungfile():
>         """ ‘foo’ should frobnicate the configured spungfile. """
> 
>         fake_file_path = os.path.join(tempfile.gettempdir(), names.next())

At this point, you have a valid pathname, but no guarantee whether it refers
to a real file on the file system or not. That's the whole problem with
tempfile.makepath -- it can return a file name which is not in use, but by
the time it returns to you, you cannot guarantee that it still doesn't
exist.

Now, since this is a test which doesn't actually open that file, it doesn't
matter. There's no actual security vulnerability here. So your test doesn't
actually require that the file is unique, or that it doesn't actually
exist. (Which is good, because you can't guarantee that it doesn't exist.)

So why not just pick a random bunch of characters?

    chars = list(string.ascii_letters)
    random.shuffle(chars)
    fake_file_path = ''.join(chars[:10])

>         fake_file = io.BytesIO("Lorem ipsum, dolor sit
>         amet".encode("utf-8"))
>
>         patch_builtins_open(
>                 when_accessing_path=fake_file_path,
>                 provide_file=fake_file)

There's nothing apparent in this that requires that fake_file_path not
actually exist, which is good since (as I've pointed out before) you cannot
guarantee that it doesn't exist. One could just as easily, and just as
correctly, write:

        patch_builtins_open(
                when_accessing_path='/foo/bar/baz',
                provide_file=fake_file)

and regardless of whether /foo/bar/baz actually exists or not, you are
guaranteed to get the fake file rather than the real file. So I question
whether you actually need this tempfile.makepath function at all.

*But* having questioned it, for the sake of the argument I'll assume you do
need it, and continue accordingly.

>         system_under_test.config.spungfile_path = fake_file_path
>         system_under_test.foo()
>         assert_correctly_frobnicated(fake_file)
> 
> So the test case creates a fake file, makes a valid filesystem path to
> associate with it, then patches the ‘open’ function so that it will
> return the fake file when that specific path is requested.
> 
> Then the test case alters the system under test's configuration, giving
> it the generated filesystem path for an important file. The test case
> then calls the function about which the unit test is asserting
> behaviour, ‘system_under_test.foo’. When that call returns, the test
> case asserts some properties of the fake file to ensure the system under
> test actually accessed that file.

Personally, I think it would be simpler and easier to understand if, instead
of patching open, you allowed the test to read and write real files:

    file_path = '/tmp/spam'
    system_under_test.config.spungfile_path = file_path
    system_under_test.foo()
    assert_correctly_frobnicated(file_path)
    os.unlink(file_path)

In practice, I'd want to only unlike the file if the test passes. If it
fails, I'd want to look at the file to see why it wasn't frobnicated.

I think that a correctly-working filesystem is a perfectly reasonable
prerequisite for the test, just like a working CPU, memory, power supply,
operating system and Python interpreter. You don't have to guard against
every imaginable failure ("fixme: test may return invalid results if the
speed of light changes by more than 0.0001%"), and you might as well take
advantage of real files for debugging. But that's my opinion, and if you
have another, that's your personal choice.

> With a supported standard library API for this – ‘tempfile.makepath’ for
> example – the generation of the filesystem path would change from four
> separate function calls, one of which is a private API::
> 
>     names = tempfile._get_candidate_names()
>     fake_file_path = os.path.join(tempfile.gettempdir(), names.next())
> 
> to a simple public function call::
> 
>     fake_file_path = tempfile.makepath()

Nobody doubts that your use of tempfile.makepath is legitimate for your
use-case. But it is *not* legitimate for the tempfile module, and it is a
mistake that it was added in the first place, hence the deprecation.
Assuming that your test suite needs this function, your test library, or
test suite, should provide that function, not tempfile. I believe it is
unreasonable to expect the tempfile module to keep a function which is a
security risk in the context of "temp files" just because it is useful for
some completely unrelated use-cases.

After all, your use of this doesn't actually have anything to do with
temporary files. It is a mocked *permanent* file, not a real temporary one.

> This whole thread began because I expected such an API would exist.
> 
> 
>> I don't see how it is useful to have a notion of a filepath at all
>> in this case, and therefore I don't see why you would want a
>> mktemp-like function available.
> 
> Because the system under test expects to be dealing with a filesystem,
> including normal restrictions on filesystem paths.

Yes, but the system doesn't try to enforce the filesystem's rules, does it?
Apart from simple restrictions like "path must be a string", I wouldn't
expect your system to make rules like:

- file names must be 8 characters followed by a dot followed by 3
characters;

- paths must not contain ASCII nulls;

etc. That's for the file system to enforce. So you don't really know ahead
of time what "normal restrictions" exist. (Are newlines allowed in
filenames? ASCII null bytes? limited to 2**16 path components?)

So your test can completely ignore any and all restrictions, and your
monkeypatched open() will be perfectly happy to deal with them:

    fake_file_path = ""  # Empty string.
    patch_builtins_open(
            when_accessing_path=fake_file_path,
            provide_file=fake_file)

and your system shouldn't care.

(Well, perhaps it will reject the empty string as a path. But it shouldn't
reject anything else.)

> The file object needs to be fake because the test case should not be
> prone to irrelevant failures when the real filesystem isn't behaving as
> expected; this test case makes assertions only about what
> ‘system_under_test.foo’ does internally, not what the filesystem does.

Since you're monkeypatching open(), the patched open() can define any path
it likes as "fake".

Admittedly it might be a bit scary to see things like:

    open("/dev/sda", "w").write("pwned!!!")

in the test suite, but it's perfectly safe (assuming your test harness is
working correctly to patch open).

>> But.. then why a filesystem path at all in that case?
> 
> Because the system under test is expecting valid filesystem paths, and I
> have no good reason to violate that constraint.

Since your test doesn't know what filesystem your code will be running on,
you can't make any assumptions about what paths are valid or not valid.

> Almost. I want the filesystem paths to be valid because the system under
> test expects them, it may perform its own validation, 

If the system tries to validate paths, it is broken. That's how you get
broken applications that insist that all file names must be located in 
C:\\My Documents. The application should allow the file system to validate
paths.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#103028

From	Ben Finney <ben+python@benfinney.id.au>
Date	2016-02-17 06:48 +1100
Message-ID	<mailman.182.1455652104.22075.python-list@python.org>
In reply to	#103024

Steven D'Aprano <steve@pearwood.info> writes:

> On Tue, 16 Feb 2016 04:56 pm, Ben Finney wrote:
>
> >     names = tempfile._get_candidate_names()
>
> I'm not sure that calling a private function of the tempfile module is
> better than calling a deprecated function.

Agreed, which is why I'm seeking a public API that is not deprecated.

> So why not just pick a random bunch of characters?
>
>     chars = list(string.ascii_letters)
>     random.shuffle(chars)
>     fake_file_path = ''.join(chars[:10])

This (an equivalent) is already implemented, internally to ‘tempfile’
and tested and maintained and more robust than me re-inventing the wheel.

> Yes, but the system doesn't try to enforce the filesystem's rules,
> does it?

The test case I'm writing should not be prone to failure if the system
happens to perform some arbitrary validation of filesystem paths.

‘tempfile’ already knows how to generate filesystem paths, I want to use
that and not have to get it right myself.

> and your system shouldn't care.

If it does, this test case should not fail.

> Since your test doesn't know what filesystem your code will be running
> on, you can't make any assumptions about what paths are valid or not
> valid.

That implies that ‘tempfile._get_candidate_names’ would generate paths
that would potentially be invalid. Is that what you intend to imply?

> > Almost. I want the filesystem paths to be valid because the system
> > under test expects them, it may perform its own validation,
>
> If the system tries to validate paths, it is broken.

This is “you don't want what you say you want”, and seeing the
justifications presented I don't agree.

-- 
 \     “I must say that I find television very educational. The minute |
  `\       somebody turns it on, I go to the library and read a book.” |
_o__)                                                    —Groucho Marx |
Ben Finney

[toc] | [prev] | [next] | [standalone]

#103040

From	Alan Bawden <alan@csail.mit.edu>
Date	2016-02-16 19:24 -0500
Message-ID	<w2dlh6knpzf.fsf@daphne.csail.mit.edu>
In reply to	#102989

Ben Finney <ben+python@benfinney.id.au> writes:

> Cameron Simpson <cs@zip.com.au> writes:
>
>> I've been watching this for a few days, and am struggling to
>> understand your use case.
>
> Yes, you're not alone. This surprises me, which is why I'm persisting.
>
>> Can you elaborate with a concrete example and its purpose which would
>> work with a mktemp-ish official function?
>
> An example::

Let me present another example that might strike some as more
straightforward.

If I want to create a temporary file, I can call mkstemp().  
If I want to create a temporary directory, I can call mkdtemp().

Suppose that instead of a file or a directory, I want a FIFO or a
socket.

A FIFO is created by passing a pathname to os.mkfifo().  A socket is
created by passing a pathname to an AF_UNIX socket's bind() method.  In
both cases, the pathname must not name anything yet (not even a symbolic
link), otherwise the call will fail.

So in the FIFO case, I might write something like the following:

    def make_temp_fifo(mode=0o600):
        while True:
            path = tempfile.mktemp()
            try:
                os.mkfifo(path, mode=mode)
            except FileExistsError:
                pass
            else:
                return path

mktemp() is convenient here, because I don't have to worry about whether
I should be using "/tmp" or "/var/tmp" or "c:\temp", or whether the
TMPDIR environment variable is set, or whether I have permission to
create entries in those directories.  It just gives me a pathname
without making me think about the rest of that stuff.  Yes, I have to
defend against the possibility that somebody else creates something with
the same name first, but as you can see, I did that, and it wasn't
rocket science.

So is there something wrong with the above code?  Other than the fact
that the documentation says something scary about mktemp()?

It looks to me like mktemp() provides some real utility, packaged up in
a way that is orthogonal to the type of file system entry I want to
create, the permissions I want to give to that entry, and the mode I
want use to open it.  It looks like a useful, albeit low-level,
primitive that it is perfectly reasonable for the tempfile module to
supply.

And yet the documentation condemns it as "deprecated", and tells me I
should use mkstemp() instead.  (As if that would be of any use in the
situation above!)  It looks like anxiety that some people might use
mktemp() in a stupid way has caused an over-reaction.  Let the
documentation warn about the problem and point to prepackaged solutions
in the common cases of making files and directories, but I see no good
reason to deprecate this useful utility.

-- 
Alan Bawden

[toc] | [prev] | [next] | [standalone]

#103317

From	Cameron Simpson <cs@zip.com.au>
Date	2016-02-22 07:35 +1100
Message-ID	<mailman.24.1456086947.20994.python-list@python.org>
In reply to	#103040

On 16Feb2016 19:24, Alan Bawden <alan@csail.mit.edu> wrote:
>Ben Finney <ben+python@benfinney.id.au> writes:
>> Cameron Simpson <cs@zip.com.au> writes:
>>> I've been watching this for a few days, and am struggling to
>>> understand your use case.
>>
>> Yes, you're not alone. This surprises me, which is why I'm persisting.
>>
>>> Can you elaborate with a concrete example and its purpose which would
>>> work with a mktemp-ish official function?
>>
>> An example::
>
>Let me present another example that might strike some as more
>straightforward.
>
>If I want to create a temporary file, I can call mkstemp().
>If I want to create a temporary directory, I can call mkdtemp().
>
>Suppose that instead of a file or a directory, I want a FIFO or a
>socket.
>
>A FIFO is created by passing a pathname to os.mkfifo().  A socket is
>created by passing a pathname to an AF_UNIX socket's bind() method.  In
>both cases, the pathname must not name anything yet (not even a symbolic
>link), otherwise the call will fail.
>
>So in the FIFO case, I might write something like the following:
>
>    def make_temp_fifo(mode=0o600):
>        while True:
>            path = tempfile.mktemp()
>            try:
>                os.mkfifo(path, mode=mode)
>            except FileExistsError:
>                pass
>            else:
>                return path
>
>mktemp() is convenient here, because I don't have to worry about whether
>I should be using "/tmp" or "/var/tmp" or "c:\temp", or whether the
>TMPDIR environment variable is set, or whether I have permission to
>create entries in those directories.  It just gives me a pathname
>without making me think about the rest of that stuff.

Yes, that is highly desirable.

>Yes, I have to
>defend against the possibility that somebody else creates something with
>the same name first, but as you can see, I did that, and it wasn't
>rocket science.
>
>So is there something wrong with the above code?  Other than the fact
>that the documentation says something scary about mktemp()?

Well, it has a few shortcomings.

It relies on mkfifo reliably failing if the name exists. It shounds like mkfifo 
is reliable this way, but I can imagine analogous use cases without such a 
convenient core action, and your code only avoids mktemp's security issue 
_because_ mkfifo has that fortuitous aspect.

>It looks to me like mktemp() provides some real utility, packaged up in
>a way that is orthogonal to the type of file system entry I want to
>create, the permissions I want to give to that entry, and the mode I
>want use to open it.  It looks like a useful, albeit low-level,
>primitive that it is perfectly reasonable for the tempfile module to
>supply.

Secondly, why is your example better than::

  os.mkfifo(os.path.join(mkdtemp(), 'myfifo'))

On that basis, this example doesn't present a use case what can't be addressed 
by mkstemp or mkdtemp.

By contrast, Ben's example does look like it needs something like mktemp.

>And yet the documentation condemns it as "deprecated", and tells me I
>should use mkstemp() instead.

You _do_ understand the security issue, yes? I sure looked like you did, until 
here. 

>(As if that would be of any use in the
>situation above!)  It looks like anxiety that some people might use
>mktemp() in a stupid way has caused an over-reaction.

No, it is anxiety that mktemp's _normal_ use is inherently unsafe.

>Let the
>documentation warn about the problem and point to prepackaged solutions
>in the common cases of making files and directories, but I see no good
>reason to deprecate this useful utility.

I think it is like C's gets() function (albeit not as dangerous). It really 
shouldn't be used.

One of the things about mktemp() is its raciness, which is the core of the 
security issue. People look at the term "security issue" and think "Ah, it can 
be attacked." But the flipside is that it is simply unreliable. Its normal use 
was to make an ordinary temp file. Consider the case where two instances of the 
same task are running at the same time, doing that. They can easily, by 
accident, end us using the same scratch file!

This is by no means unlikely; any shell script running tasks in parallel can 
arrange it, any procmail script filing a message with a "copy" rule (which 
causes procmail simply to fork and proceed), etc.

This is neither weird nor even unlikely which is why kmtemp is strongly 
discouraged - naive (and standard) use is not safe.

That you have contrived a use case where you can _carefully_ use mktemp in 
safety in no way makes mktemp recommendable. In fact your use case isn't safe, 
because _another_ task using mktemp in conflict as a plain old temporary file 
may grab your fifo.

Cheers,
Cameron Simpson <cs@zip.com.au>

[toc] | [prev] | [next] | [standalone]

#103348

From	Alan Bawden <alan@csail.mit.edu>
Date	2016-02-22 12:34 -0500
Message-ID	<w2dd1ro644o.fsf@daphne.csail.mit.edu>
In reply to	#103317

Cameron Simpson <cs@zip.com.au> writes:

> On 16Feb2016 19:24, Alan Bawden <alan@csail.mit.edu> wrote:
>>So in the FIFO case, I might write something like the following:
>>
>>    def make_temp_fifo(mode=0o600):
>>        while True:
>>            path = tempfile.mktemp()
>>            try:
>>                os.mkfifo(path, mode=mode)
>>            except FileExistsError:
>>                pass
>>            else:
>>                return path
>>
>>So is there something wrong with the above code?  Other than the fact
>>that the documentation says something scary about mktemp()?
>
> Well, it has a few shortcomings.
>
> It relies on mkfifo reliably failing if the name exists. It shounds like
> mkfifo is reliable this way, but I can imagine analogous use cases without
> such a convenient core action, and your code only avoids mktemp's security
> issue _because_ mkfifo has that fortuitous aspect.

I don't understand your use of the word "fortuitous" here.  mkfifo is
defined to act that way according to POSIX.  I wrote the code that way
precisely because of that property.  I sometimes write code knowing that
adding two even numbers together results in an even answer.  I suppose
you might describe that as "fortuitous", but it's just things behaving
as they are defined to behave!

> Secondly, why is your example better than::
>
>  os.mkfifo(os.path.join(mkdtemp(), 'myfifo'))

My way is not much better, but I think it is a little better because
your way I have to worry about deleting both the file and the directory
when I am done, and I have to get the permissions right on two
filesystem objects.  (If I can use a TemporaryDirectory() context
manager, the cleaning up part does get easier.)

And it also seems wasteful to me, given that the way mkdtemp() is
implemented is to generate a possible name, try creating it, and loop if
the mkdir() call fails.  (POSIX makes the same guarantee for mkdir() as
it does for mkfifo().)  Why not just let me do an equivalent loop
myself?

> On that basis, this example doesn't present a use case what can't be
> addressed by mkstemp or mkdtemp.

Yes, if mktemp() were taken away from me, I could work around it.  I'm
just saying that in order to justify taking something like this away, it
has to be both below some threshold of utility and above some threshold
of dangerousness.  In the canonical case of gets() in C, not only is
fgets() almost a perfectly exact replacement for gets(), gets() is
insanely dangerous.  But the case of mktemp() doesn't seem to me to come
close to this combination of redundancy and danger.

> You _do_ understand the security issue, yes? I sure looked like you did,
> until here.

Well, it's always dangerous to say that you understand all the security
issues of anything.  In part that is why I wrote the code quoted above.
I am open to the possibility that there is a security problem here that
I haven't thought of.  But so far the only problem anybody has with it
is that you think there is something "fortuitous" about the way that it
works.

>>(As if that would be of any use in the
>>situation above!)  It looks like anxiety that some people might use
>>mktemp() in a stupid way has caused an over-reaction.
>
> No, it is anxiety that mktemp's _normal_ use is inherently unsafe.

So are you saying that the way I used mktemp() above is _abnormal_?

> [ Here I have removed some perfectly reasonable text describing the
>   race condition in question -- yes I really do understand that. ]
>
> This is neither weird nor even unlikely which is why kmtemp is strongly
> discouraged - naive (and standard) use is not safe.
>
> That you have contrived a use case where you can _carefully_ use mktemp in
> safety in no way makes mktemp recommendable.

OK, so you _do_ seem to be saying that I have used mktemp() in a
"contrived" and "non-standard" (and "non-naive"!) way.  I'm genuinely
surprised.  I though I was just writing straightforward correct code and
demonstrating that this was a useful utility that it was not hard to use
safely.  You seem to think what I did is something that ordinary
programmers can not be expected to do.  Your judgement is definitely
different from mine!

And ultimately this does all boil down to making judgements.  It does
make sense to remove things from libraries that are safety hazards (like
gets() in C), I'm just trying to argue that mktemp() isn't nearly
dangerous enough to deserve more than a warning in its documentation.
You don't agree.  Oh well...

Up until this point, you haven't said anything that I actually think is
flat out wrong, we just disagree about what tools it is reasonable to
take away from _all_ programmers just because _some_ programmers might
use them to make a mess.

> In fact your use case isn't safe, because _another_ task using mktemp
> in conflict as a plain old temporary file may grab your fifo.

But here in very last sentence I really must disagree.  If the code I
wrote above is "unsafe" because some _other_ process might be using
mktemp() badly and stumble over the same path, then the current
implementation of tempfile.mkdtemp() is also "unsafe" for exactly the
same reason: some other process using mktemp() badly to create its own
directory might accidentally grab the same directory.

Heck, that other process doesn't even need to be using mktemp().  It
might just be using the hardwired pathname "/tmp/tmp_xyzzy" -- that has
_exactly_ the same chance of collision.  Surely you don't think that the
standard tempfile.mkdtemp() is "unsafe" (in whatever sense of the word
you intend) as long as other programmers are allowed to be stupid and
use constant pathnames?

-- 
Alan Bawden

[toc] | [prev] | [next] | [standalone]

#103370

From	Cameron Simpson <cs@zip.com.au>
Date	2016-02-23 10:02 +1100
Message-ID	<mailman.53.1456182173.20994.python-list@python.org>
In reply to	#103348

On 22Feb2016 12:34, Alan Bawden <alan@csail.mit.edu> wrote:
>Cameron Simpson <cs@zip.com.au> writes:
>
>> On 16Feb2016 19:24, Alan Bawden <alan@csail.mit.edu> wrote:
>>>So in the FIFO case, I might write something like the following:
>>>
>>>    def make_temp_fifo(mode=0o600):
>>>        while True:
>>>            path = tempfile.mktemp()
>>>            try:
>>>                os.mkfifo(path, mode=mode)
>>>            except FileExistsError:
>>>                pass
>>>            else:
>>>                return path
>>>
>>>So is there something wrong with the above code?  Other than the fact
>>>that the documentation says something scary about mktemp()?
>>
>> Well, it has a few shortcomings.
>>
>> It relies on mkfifo reliably failing if the name exists. It shounds like
>> mkfifo is reliable this way, but I can imagine analogous use cases without
>> such a convenient core action, and your code only avoids mktemp's security
>> issue _because_ mkfifo has that fortuitous aspect.
>
>I don't understand your use of the word "fortuitous" here.  mkfifo is
>defined to act that way according to POSIX.  I wrote the code that way
>precisely because of that property.  I sometimes write code knowing that
>adding two even numbers together results in an even answer.  I suppose
>you might describe that as "fortuitous", but it's just things behaving
>as they are defined to behave!

I mean here that your scheme isn't adaptable to a system call which will reuse 
an existing name. Of course, mkfifo, mkdir and open(.., O_EXCL) all have this 
nice feature.

>> Secondly, why is your example better than::
>>  os.mkfifo(os.path.join(mkdtemp(), 'myfifo'))
>
>My way is not much better, but I think it is a little better because
>your way I have to worry about deleting both the file and the directory
>when I am done, and I have to get the permissions right on two
>filesystem objects.  (If I can use a TemporaryDirectory() context
>manager, the cleaning up part does get easier.)
>
>And it also seems wasteful to me, given that the way mkdtemp() is
>implemented is to generate a possible name, try creating it, and loop if
>the mkdir() call fails.  (POSIX makes the same guarantee for mkdir() as
>it does for mkfifo().)  Why not just let me do an equivalent loop
>myself?

Go ahead. But I think Ben's specificly trying to avoid writing his own loop.

>> On that basis, this example doesn't present a use case what can't be
>> addressed by mkstemp or mkdtemp.
>
>Yes, if mktemp() were taken away from me, I could work around it.  I'm
>just saying that in order to justify taking something like this away, it
>has to be both below some threshold of utility and above some threshold
>of dangerousness.  In the canonical case of gets() in C, not only is
>fgets() almost a perfectly exact replacement for gets(), gets() is
>insanely dangerous.  But the case of mktemp() doesn't seem to me to come
>close to this combination of redundancy and danger.
>
>> You _do_ understand the security issue, yes? I sure looked like you did,
>> until here.
>
>Well, it's always dangerous to say that you understand all the security
>issues of anything.  In part that is why I wrote the code quoted above.
>I am open to the possibility that there is a security problem here that
>I haven't thought of.  But so far the only problem anybody has with it
>is that you think there is something "fortuitous" about the way that it
>works.
>
>>>(As if that would be of any use in the
>>>situation above!)  It looks like anxiety that some people might use
>>>mktemp() in a stupid way has caused an over-reaction.
>>
>> No, it is anxiety that mktemp's _normal_ use is inherently unsafe.
>
>So are you saying that the way I used mktemp() above is _abnormal_?

In that you're not making a file. I mean "abnormal" in a statistical sense, and 
also in the "anticipated use case for mktemp's design". I'm not suggestioning 
you're wrong to use it like this.

>> [ Here I have removed some perfectly reasonable text describing the
>>   race condition in question -- yes I really do understand that. ]
>>
>> This is neither weird nor even unlikely which is why kmtemp is strongly
>> discouraged - naive (and standard) use is not safe.
>>
>> That you have contrived a use case where you can _carefully_ use mktemp in
>> safety in no way makes mktemp recommendable.
>
>OK, so you _do_ seem to be saying that I have used mktemp() in a
>"contrived" and "non-standard" (and "non-naive"!) way.  I'm genuinely
>surprised.  I though I was just writing straightforward correct code and
>demonstrating that this was a useful utility that it was not hard to use
>safely.  You seem to think what I did is something that ordinary
>programmers can not be expected to do.  Your judgement is definitely
>different from mine!

No, I meant only that (a) mktemp is normally used for regular files and (b) 
that mkdtemp()/mkfifo() present equivalent results without hand making a 
pick-a-name loop. Of course any programmer should be able to read the mktemp() 
spec and built from it.

>And ultimately this does all boil down to making judgements.  It does
>make sense to remove things from libraries that are safety hazards (like
>gets() in C), I'm just trying to argue that mktemp() isn't nearly
>dangerous enough to deserve more than a warning in its documentation.
>You don't agree.  Oh well...

I think mktemp() is like system() (or popen with a shell command string instead 
of a list of arguments suitable for exec()): it is easy and obvious and there 
is a much safe tool sitting right beside it that will frequently be overlooked, 
and that is why I'm on the "discourage its use" side of the discussion, even to 
the point of deprecating its use.

It isn't that it cannot be carefully used ever, it is that it is too easy to 
use when there is a more reliable way to do it.

>Up until this point, you haven't said anything that I actually think is
>flat out wrong, we just disagree about what tools it is reasonable to
>take away from _all_ programmers just because _some_ programmers might
>use them to make a mess.
>
>> In fact your use case isn't safe, because _another_ task using mktemp
>> in conflict as a plain old temporary file may grab your fifo.
>
>But here in very last sentence I really must disagree.  If the code I
>wrote above is "unsafe" because some _other_ process might be using
>mktemp() badly and stumble over the same path, then the current
>implementation of tempfile.mkdtemp() is also "unsafe" for exactly the
>same reason: some other process using mktemp() badly to create its own
>directory might accidentally grab the same directory.

When the other taks goes mkdir with the generated name it will fail, so no.

In your mkfifo case, another task using mktemp thus:

  path = mktemp()
  tmpfp = open(path, "w")

can succeed in the circumstance where your fifo gets made first.

>Heck, that other process doesn't even need to be using mktemp().  It
>might just be using the hardwired pathname "/tmp/tmp_xyzzy" -- that has
>_exactly_ the same chance of collision.  Surely you don't think that the
>standard tempfile.mkdtemp() is "unsafe" (in whatever sense of the word
>you intend) as long as other programmers are allowed to be stupid and
>use constant pathnames?

Of course not.

Cheers,
Cameron Simpson <cs@zip.com.au>

[toc] | [prev] | [next] | [standalone]

#103683

From	Alan Bawden <alan@csail.mit.edu>
Date	2016-02-29 00:47 -0500
Message-ID	<w2d1t7wukxh.fsf@daphne.csail.mit.edu>
In reply to	#103370

Cameron Simpson <cs@zip.com.au> writes:
> On 22Feb2016 12:34, Alan Bawden <alan@csail.mit.edu> wrote:

I have deleted the part of discussion where it seems that we must simply
agree to disagree.  You think mktemp() is _way_ more dangerous that I
do.

>>> In fact your use case isn't safe, because _another_ task using mktemp
>>> in conflict as a plain old temporary file may grab your fifo.
>>
>>But here in very last sentence I really must disagree.  If the code I
>>wrote above is "unsafe" because some _other_ process might be using
>>mktemp() badly and stumble over the same path, then the current
>>implementation of tempfile.mkdtemp() is also "unsafe" for exactly the
>>same reason: some other process using mktemp() badly to create its own
>>directory might accidentally grab the same directory.
>
> When the other taks goes mkdir with the generated name it will fail, so no.

Quite right.  I sabotaged my own argument by picking mkdtemp() instead
of mkstemp().  I was trying to shorten my text by taking advantage of
the fact that I had _already_ mentioned that mkdtemp() performs exactly
the same generate-and-open loop than the code I had written.  I
apologize for the confusion.

In fact, mkstemp() also performs that same generate-and-open loop, and of
course it is careful to use os.O_EXCL along with os.O_CREAT when it
opens the file.  So let me re-state my argument using mkstemp() instead:

If the code I wrote in my original message is "unsafe" because some
_other_ process might be using mktemp() badly and stumble over the same
path, then the current implementation of tempfile.mkstemp() is also
"unsafe" for exactly the same reason: some other process badly using
mktemp() to create its own file might accidentally grab the same file.

In other words, if that other process does:

  path = mktemp()
  tmpfp = open(path, "w")

Then yes indeed, it might accidentally grab my fifo when I used my
original code for making a temporary fifo.  But it might _also_ succeed
in grabbing any temporary files I make using tempfile.mkstemp()!  So if
you think what I wrote is "unsafe", it seems that you must conclude that
the standard tempfile.mkstemp() is exactly as "unsafe".  So is that what
you think?

-- 
Alan Bawden

[toc] | [prev] | [next] | [standalone]

#103685

From	Cameron Simpson <cs@zip.com.au>
Date	2016-02-29 17:38 +1100
Message-ID	<mailman.7.1456727926.2321.python-list@python.org>
In reply to	#103683

On 29Feb2016 00:47, Alan Bawden <alan@csail.mit.edu> wrote:
>Cameron Simpson <cs@zip.com.au> writes:
>> On 22Feb2016 12:34, Alan Bawden <alan@csail.mit.edu> wrote:
>
>I have deleted the part of discussion where it seems that we must simply
>agree to disagree.  You think mktemp() is _way_ more dangerous that I
>do.

I certainly think the habit of using it is. And thus we're off into the realm 
of risk assessment I suppose, where one's value sets greatly affect the 
outcome. But there are concrete arguments to be made about risks.

To your other question...

[...]
>In fact, mkstemp() also performs that same generate-and-open loop, and of
>course it is careful to use os.O_EXCL along with os.O_CREAT when it
>opens the file.  So let me re-state my argument using mkstemp() instead:
>
>If the code I wrote in my original message is "unsafe" because some
>_other_ process might be using mktemp() badly and stumble over the same
>path, then the current implementation of tempfile.mkstemp() is also
>"unsafe" for exactly the same reason: some other process badly using
>mktemp() to create its own file might accidentally grab the same file.
>
>In other words, if that other process does:
>
>  path = mktemp()
>  tmpfp = open(path, "w")
>
>Then yes indeed, it might accidentally grab my fifo when I used my
>original code for making a temporary fifo.  But it might _also_ succeed
>in grabbing any temporary files I make using tempfile.mkstemp()!  So if
>you think what I wrote is "unsafe", it seems that you must conclude that
>the standard tempfile.mkstemp() is exactly as "unsafe".
>
>So is that what you think?

Yes and no?

You're quite right that a task using mkstemp is not safe against a task 
misusing mktemp.

_However_:

In a space where everyone uses mktemp, everyone is unsafe from collision.

In a space where everyone uses mkstemp, everyone is safe from collision.

So provided everyone "upgrades", safety is reliable without any added burden in 
program complexity.

Of course, that sidesteps the scenario where someone is using mktemp to obtain 
a pathname for a non-file, but I am of the opinion that in almost all such 
cases the programmer is better off using mkdtemp and making their non-file 
inside the temporary directory. Again, provided everyone "upgrades" to such a 
practice, safety is arranged.

Because of this, I think that _any_ use of mktemp invites risk of collision, 
and needs to be justified with a robust argument establishing that the problem 
cannot be solved with mkstemp or mkdtemp.

Your example was not such a case. Ben's is, in that (a) he needs a "valid" name 
and (b) he isn't going to make an actual filesystem object using the name 
obtained. As it happens it looks like the uuid generation functions from the 
stdlib may meet his needs, addressing his desire to do it simply with the 
stdlib instead of making his own wheel.

So I remain against mktemp without an outsandingly special use case.

Cheers,
Cameron Simpson <cs@zip.com.au>

[toc] | [prev] | [standalone]

csiph-web

Re: Make a unique filesystem path, without creating the file

Contents

#102989 — Re: Make a unique filesystem path, without creating the file

#103024

#103028

#103040

#103317

#103348

#103370

#103683

#103685