

Groups > comp.lang.python > #25055 > unrolled thread

Re: How to safely maintain a status file

Started by: Christian Heimes <lists@cheimes.de>
First post: 2012-07-08 23:52 +0200
Last post: 2012-07-12 14:31 +0200
Articles: 5 (4 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: How to safely maintain a status file Christian Heimes <lists@cheimes.de> - 2012-07-08 23:52 +0200
    Re: How to safely maintain a status file John Nagle <nagle@animats.com> - 2012-07-09 13:24 -0700
      Re: How to safely maintain a status file Christian Heimes <lists@cheimes.de> - 2012-07-10 01:41 +0200
      Re: How to safely maintain a status file alex23 <wuwei23@gmail.com> - 2012-07-09 19:04 -0700
      Re: How to safely maintain a status file Laszlo Nagy <gandalf@shopzeus.com> - 2012-07-12 14:31 +0200

#25055 — Re: How to safely maintain a status file

From: Christian Heimes <lists@cheimes.de>
Date: 2012-07-08 23:52 +0200
Subject: Re: How to safely maintain a status file
Message-ID: <mailman.1929.1341784379.4697.python-list@python.org>
On 08.07.2012 22:57, Laszlo Nagy wrote:
> But even if the rename operation is atomic, there is still a race
> condition. Your program can be terminated after the original status file
> has been deleted, and before the temp file was renamed. In this case,
> you will be missing the status file (although your program already did
> something, it just could not write out the new status).

You are contradicting yourself. Either the OS provides a fully
atomic rename or it doesn't. All POSIX-compatible OSes provide an atomic
rename that either replaces the target atomically or fails without
losing the target. On a POSIX OS it doesn't matter if the target exists.

You don't need locks or any other fancy stuff. You just need to make
sure that you flush the data and metadata correctly to the disk and
force a re-write of the directory inode, too. It's a standard pattern on
POSIX platforms and well documented in e.g. the maildir specification.
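
The pattern described above can be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the function name `write_status_atomically` and the temp-file prefix are hypothetical, and `os.replace` assumes Python 3.3 or later. On Windows it overwrites an existing target, but without the atomicity guarantee POSIX gives.

```python
import os
import tempfile


def write_status_atomically(path, data):
    """Replace the file at `path` with `data` (bytes) via temp file + rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename never
    # crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to write the file data to disk
        os.replace(tmp_path, path)  # atomic on POSIX, target may already exist
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the renamed entry itself is durable.
    # Assumption: skipped on platforms without O_DIRECTORY (e.g. Windows).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A reader that opens the status file sees either the complete old content or the complete new content, never a partial write. Note that `mkstemp` creates the file with owner-only permissions, so the mode of the status file may need adjusting before the rename.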

You can use the same pattern on Windows but it doesn't work as well and
doesn't guarantee file integrity, for two reasons:

1) Windows's rename isn't atomic if the right side exists.

2) Windows locks a file when a program opens it. Other programs can't
rename or overwrite the file. (You can get around the issue with some
extra work, though.)
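
One possible form of the "extra work" in point 2 is to retry the rename for a short time. The sketch below is an assumption about what such a workaround might look like, not something spelled out in the post: the helper name `replace_with_retry` and its parameters are hypothetical, and it assumes the locked-target case surfaces as `PermissionError` from `os.replace` (Python 3.3+).

```python
import os
import time


def replace_with_retry(src, dst, attempts=5, delay=0.1):
    """Rename src over dst, retrying briefly if the target is locked."""
    for attempt in range(attempts):
        try:
            os.replace(src, dst)
            return
        except PermissionError:
            # Assumed failure mode on Windows while another process
            # holds dst open; give up after the last attempt.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

This only narrows the window; it does not make the Windows rename atomic.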

Christian



#25094

From: John Nagle <nagle@animats.com>
Date: 2012-07-09 13:24 -0700
Message-ID: <jtfelb$q0g$1@dont-email.me>
In reply to: #25055
On 7/8/2012 2:52 PM, Christian Heimes wrote:
> You are contradicting yourself. Either the OS provides a fully
> atomic rename or it doesn't. All POSIX-compatible OSes provide an atomic
> rename that either replaces the target atomically or fails without
> losing the target. On a POSIX OS it doesn't matter if the target exists.

     Rename on some file system types (particularly NFS) may not be atomic.
>
> You don't need locks or any other fancy stuff. You just need to make
> sure that you flush the data and metadata correctly to the disk and
> force a re-write of the directory inode, too. It's a standard pattern on
> POSIX platforms and well documented in e.g. the maildir specification.
>
> You can use the same pattern on Windows but it doesn't work as well.

   That's because you're using the wrong approach. See how to use
ReplaceFile under Win32:

http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

     Renaming files is the wrong way to synchronize a
crawler.  Use a database that has ACID properties, such as
SQLite.  Far fewer I/O operations are required for small updates.
It's not the 1980s any more.
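
A minimal sketch of the SQLite suggestion, using Python's standard `sqlite3` module. The table layout and the helper names (`open_status_db`, `set_status`, `get_status`) are hypothetical, chosen only for illustration.

```python
import sqlite3


def open_status_db(path):
    """Open (or create) the status database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status "
        "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def set_status(conn, key, value):
    """Store one status value inside a transaction."""
    # `with conn` commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO status (key, value) VALUES (?, ?)",
            (key, value),
        )


def get_status(conn, key, default=None):
    """Return the stored value for `key`, or `default` if there is none."""
    row = conn.execute(
        "SELECT value FROM status WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

If the process dies in the middle of `set_status`, the transaction is rolled back on the next open and the previous value is still readable.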

     I use a MySQL database to synchronize multiple processes
which crawl web sites.  The tables of past activity are InnoDB
tables, which support transactions.  The table of what's going
on right now is a MEMORY table.  If the database crashes, the
past activity is recovered cleanly, the MEMORY table comes back
empty, and all the crawler processes lose their database
connections, abort, and are restarted.  This allows multiple
servers to coordinate through one database.

				John Nagle





#25109

From: Christian Heimes <lists@cheimes.de>
Date: 2012-07-10 01:41 +0200
Message-ID: <mailman.1968.1341877336.4697.python-list@python.org>
In reply to: #25094
On 09.07.2012 22:24, John Nagle wrote:
>     Rename on some file system types (particularly NFS) may not be atomic.

The actual operation is always atomic but the NFS server may not notify
you about success or failure atomically.

See http://linux.die.net/man/2/rename, section BUGS.

>   That's because you're using the wrong approach. See how to use
> ReplaceFile under Win32:
> 
> http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

The page doesn't say that ReplaceFile is an atomic op.

Christian



#25118

From: alex23 <wuwei23@gmail.com>
Date: 2012-07-09 19:04 -0700
Message-ID: <83b4889f-8201-4369-9fe4-631888a78a16@oo8g2000pbc.googlegroups.com>
In reply to: #25094
On Jul 10, 6:24 am, John Nagle <na...@animats.com> wrote:
> That's because you're using the wrong approach. See how to use
> ReplaceFile under Win32:
>
> http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

I'm not convinced ReplaceFile is atomic:

"The ReplaceFile function combines several steps within a single
function. An application can call ReplaceFile instead of calling
separate functions to save the data to a new file, rename the original
file using a temporary name, rename the new file to have the same name
as the original file, and delete the original file."

About the best you can get in Windows, I think, is MoveFileTransacted,
but you need to be running Vista or later:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365241(v=vs.85).aspx

I agree with your suggestion of using something transactional that
isn't bound to later Windows versions, though.



#25211

From: Laszlo Nagy <gandalf@shopzeus.com>
Date: 2012-07-12 14:31 +0200
Message-ID: <mailman.2035.1342096316.4697.python-list@python.org>
In reply to: #25094
>     Renaming files is the wrong way to synchronize a
> crawler.  Use a database that has ACID properties, such as
> SQLite.  Far fewer I/O operations are required for small updates.
> It's not the 1980s any more.
I agree with this approach. However, the OP specifically asked about 
"how to update status file".




