Groups > comp.lang.python > #68276 > unrolled thread

Save to a file, but avoid overwriting an existing file

Started by	zoom <zoom@yahoo.com>
First post	2014-03-12 13:29 +0100
Last post	2014-03-13 11:22 +1100
Articles	8 — 8 participants

Back to article view | Back to comp.lang.python

  Save to a file, but avoid overwriting an existing file zoom <zoom@yahoo.com> - 2014-03-12 13:29 +0100
    Re: Save to a file, but avoid overwriting an existing file Skip Montanaro <skip@pobox.com> - 2014-03-12 07:37 -0500
    Re: Save to a file, but avoid overwriting an existing file Tim Chase <python.list@tim.thechases.com> - 2014-03-12 08:33 -0500
    Re:Save to a file, but avoid overwriting an existing file Dave Angel <davea@davea.name> - 2014-03-12 14:22 -0400
    Re: Save to a file, but avoid overwriting an existing file Emile van Sebille <emile@fenx.com> - 2014-03-12 12:38 -0700
    Re: Save to a file, but avoid overwriting an existing file Cameron Simpson <cs@zip.com.au> - 2014-03-13 09:19 +1100
    Re: Save to a file, but avoid overwriting an existing file Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-12 23:04 +0000
    Re: Save to a file, but avoid overwriting an existing file Ben Finney <ben+python@benfinney.id.au> - 2014-03-13 11:22 +1100

#68276 — Save to a file, but avoid overwriting an existing file

From	zoom <zoom@yahoo.com>
Date	2014-03-12 13:29 +0100
Subject	Save to a file, but avoid overwriting an existing file
Message-ID	<lfpjv9$tki$1@news1.carnet.hr>

Hi!

I would like to assure that when writing to a file I do not overwrite an 
existing file, but I'm unsure which is the best way to approach to this 
problem. As I can see, there are at least two possibilities:

1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL)
which will fail - if the file exists. However, I would prefer if the 
program would try to save under different name in this case, instead of 
discarding all the calculation done until now - but I' not too well with 
catching exceptions.

2. Alternatively, a unique string could be generated to assure that no 
same file exists. I can see one approach to this is to include date and 
time in the file name. But this seems to me a bit clumsy, and is not 
unique, i.e. it could happen (at least in theory) that two processes 
finish in the same second.

Any suggestions, please?

[toc] | [next] | [standalone]

#68277

From	Skip Montanaro <skip@pobox.com>
Date	2014-03-12 07:37 -0500
Message-ID	<mailman.8086.1394627874.18130.python-list@python.org>
In reply to	#68276

This seems to be an application-level decision. If so, in your
application, why not just check to see if the file exists, and
implement whatever workaround you deem correct for your needs? For
example (to choose a simple, but rather silly, file naming strategy):

fname = "x"
while os.path.exists(fname):
    fname = "%s.%f" % (fname, random.random())
fd = open(fname, "w")

It's clearly not going to be safe from race conditions, but I leave
solving that problem as an exercise for the reader.

Skip

[toc] | [prev] | [next] | [standalone]

#68279

From	Tim Chase <python.list@tim.thechases.com>
Date	2014-03-12 08:33 -0500
Message-ID	<mailman.8089.1394631173.18130.python-list@python.org>
In reply to	#68276

On 2014-03-12 13:29, zoom wrote:
> 2. Alternatively, a unique string could be generated to assure that
> no same file exists. I can see one approach to this is to include
> date and time in the file name. But this seems to me a bit clumsy,
> and is not unique, i.e. it could happen (at least in theory) that
> two processes finish in the same second.

Python offers a "tempfile" module that gives this (and a whole lot
more) to you out of the box.

-tkc

[toc] | [prev] | [next] | [standalone]

#68290

From	Dave Angel <davea@davea.name>
Date	2014-03-12 14:22 -0400
Message-ID	<mailman.8097.1394648316.18130.python-list@python.org>
In reply to	#68276

 zoom <zoom@yahoo.com> Wrote in message:
> Hi!
> 
> I would like to assure that when writing to a file I do not overwrite an 
> existing file, but I'm unsure which is the best way to approach to this 
> problem. As I can see, there are at least two possibilities:
> 
> 1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL)
> which will fail - if the file exists. However, I would prefer if the 
> program would try to save under different name in this case, instead of 
> discarding all the calculation done until now - but I' not too well with 
> catching exceptions.
> 

The tempfile module is your best answer,  but if you really need
 to keep the file afterwards,  you'll have the same problem when
 you rename it later. 

I suggest you learn about try/except.  For simple cases it's not
 that tough, though if you want to ask about it, you'll need to
 specify your Python version. 

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#68293

From	Emile van Sebille <emile@fenx.com>
Date	2014-03-12 12:38 -0700
Message-ID	<mailman.8099.1394653162.18130.python-list@python.org>
In reply to	#68276

On 3/12/2014 5:29 AM, zoom wrote:

> 2. Alternatively, a unique string could be generated to assure that no
> same file exists. I can see one approach to this is to include date and
> time in the file name. But this seems to me a bit clumsy, and is not
> unique, i.e. it could happen (at least in theory) that two processes
> finish in the same second.

I tend to use this method -- prepending the job name or targeting 
different directories per job precludes duplication.  Unless you're 
running the same job at the same time, in which case tempfile is the way 
to go (which I use for archiving spooled print files which can occur 
simultaneously.)

Emile

[toc] | [prev] | [next] | [standalone]

#68304

From	Cameron Simpson <cs@zip.com.au>
Date	2014-03-13 09:19 +1100
Message-ID	<mailman.8109.1394664190.18130.python-list@python.org>
In reply to	#68276

On 12Mar2014 13:29, zoom <zoom@yahoo.com> wrote:
> I would like to assure that when writing to a file I do not
> overwrite an existing file, but I'm unsure which is the best way to
> approach to this problem. As I can see, there are at least two
> possibilities:
> 
> 1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL)
> which will fail - if the file exists. However, I would prefer if the
> program would try to save under different name in this case, instead
> of discarding all the calculation done until now - but I' not too
> well with catching exceptions.

Others have menthions tempfile, though of course you have the same collision
issue when you come to rename the temp file if you are keeping it.

I would run with option 1 for your task.

Just iterate until os.open succeeds.

However, you need to distinuish _why_ an open fails. For example,
if you were trying to make files in a directory to which you do not
have write permission, or just a directory that did not exist,
os.open would fail not matter what name you used, so your loop would
run forever.

Therefore you need to continue _only_ if you get EEXIST. Otherwise abort.

So you'd have some code like this (totally untested):

  # at top of script
  import errno

  # where you make the file
  def open_new(primary_name):
    try:
      fd = os.open(primary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError as e:
      if e.errno != errno.EEXIST:
        raise
    else:
      return primary_name, fd
    n = 1
    while True:
      secondary_name = "%s.%d" % (primary_name, n)
      try:
        fd = os.open(secondary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
      except OSError as e:
        if e.errno != errno.EEXIST:
          raise
      else:
        return secondary_name, fd
      n += 1

  # where you need the file
  path, fd = open_new("x")

That gets you a function your can reuse which returns the file's
name and the file descriptor.

Cheers,
-- 
Cameron Simpson <cs@zip.com.au>

Reason #173 to fear technology:

    o       o       o       o       o       o      <o      <o>
   ^|\     ^|^     v|^     v|v     |/v     |X|      \|      |
    /\      >\     /<       >\     /<       >\     /<       >\

    o>      o       o       o       o       o       o       o
    \       x      </      <|>     </>     <\>     <)>      |\
   /<       >\     /<       >\     /<       >\      >>      L

             Mr. email does the Macarena.

[toc] | [prev] | [next] | [standalone]

#68306

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2014-03-12 23:04 +0000
Message-ID	<mailman.8111.1394665486.18130.python-list@python.org>
In reply to	#68276

On 12/03/2014 22:19, Cameron Simpson wrote:
> On 12Mar2014 13:29, zoom <zoom@yahoo.com> wrote:
>> I would like to assure that when writing to a file I do not
>> overwrite an existing file, but I'm unsure which is the best way to
>> approach to this problem. As I can see, there are at least two
>> possibilities:
>>
>> 1. I could use fd = os.open("x", os.O_WRONLY | os.O_CREAT | os.O_EXCL)
>> which will fail - if the file exists. However, I would prefer if the
>> program would try to save under different name in this case, instead
>> of discarding all the calculation done until now - but I' not too
>> well with catching exceptions.
>
> Others have menthions tempfile, though of course you have the same collision
> issue when you come to rename the temp file if you are keeping it.
>
> I would run with option 1 for your task.
>
> Just iterate until os.open succeeds.
>
> However, you need to distinuish _why_ an open fails. For example,
> if you were trying to make files in a directory to which you do not
> have write permission, or just a directory that did not exist,
> os.open would fail not matter what name you used, so your loop would
> run forever.
>
> Therefore you need to continue _only_ if you get EEXIST. Otherwise abort.
>
> So you'd have some code like this (totally untested):
>
>    # at top of script
>    import errno
>
>    # where you make the file
>    def open_new(primary_name):
>      try:
>        fd = os.open(primary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
>      except OSError as e:
>        if e.errno != errno.EEXIST:
>          raise
>      else:
>        return primary_name, fd
>      n = 1
>      while True:
>        secondary_name = "%s.%d" % (primary_name, n)
>        try:
>          fd = os.open(secondary_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
>        except OSError as e:
>          if e.errno != errno.EEXIST:
>            raise
>        else:
>          return secondary_name, fd
>        n += 1
>
>    # where you need the file
>    path, fd = open_new("x")
>
> That gets you a function your can reuse which returns the file's
> name and the file descriptor.
>
> Cheers,
>

I haven't looked but would things be easier if the new exception 
hierarchy were used 
http://docs.python.org/3.3/whatsnew/3.3.html#pep-3151-reworking-the-os-and-io-exception-hierarchy 
?

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com

[toc] | [prev] | [next] | [standalone]

#68315

From	Ben Finney <ben+python@benfinney.id.au>
Date	2014-03-13 11:22 +1100
Message-ID	<mailman.8115.1394670188.18130.python-list@python.org>
In reply to	#68276

Cameron Simpson <cs@zip.com.au> writes:

> Therefore you need to continue _only_ if you get EEXIST. Otherwise
> abort.

If you target Python 3.3 or later, you can catch “FileExistsError”
<URL:http://docs.python.org/3/library/exceptions.html#FileExistsError>
which is far simpler than messing around with ‘errno’ values.

-- 
 \     “I know you believe you understood what you think I said, but I |
  `\         am not sure you realize that what you heard is not what I |
_o__)                                     meant.” —Robert J. McCloskey |
Ben Finney

[toc] | [prev] | [standalone]

csiph-web

Save to a file, but avoid overwriting an existing file

Contents

#68276 — Save to a file, but avoid overwriting an existing file

#68277

#68279

#68290

#68293

#68304

#68306

#68315