| From | John Nagle <nagle@animats.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: How to safely maintain a status file |
| Date | 2012-07-09 13:24 -0700 |
| Organization | A noiseless patient Spider |
| Message-ID | <jtfelb$q0g$1@dont-email.me> |
| References | <CAOV1wRVtm27yWez1HZuN8=ia-TyM2aXp9QCUbSZ5aZExP_ZChA@mail.gmail.com> <sanjv7lo0vb3rlhip4ov1gpgp4gs51bvfr@invalid.netcom.com> <4FF9F454.40207@shopzeus.com> <mailman.1929.1341784379.4697.python-list@python.org> |
On 7/8/2012 2:52 PM, Christian Heimes wrote:
> You are contradicting yourself. Either the OS provides a fully
> atomic rename or it doesn't. All POSIX-compatible OSes provide an atomic
> rename that either replaces the file atomically or fails without
> losing the target. On a POSIX OS it doesn't matter whether the target exists.
Rename on some file system types (particularly NFS) may not be atomic.
>
> You don't need locks or any other fancy stuff. You just need to make
> sure that you flush the data and metadata correctly to the disk and
> force a re-write of the directory inode, too. It's a standard pattern on
> POSIX platforms and well documented in e.g. the maildir RFC.
>
> You can use the same pattern on Windows, but it doesn't work as well.
That's because you're using the wrong approach. See how to use
ReplaceFile under Win32:
http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx
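
For readers following along, here is a minimal Python sketch of the write, flush, and rename pattern described in the quoted text. It assumes Python 3.3 or later for os.replace; the function name write_status_atomically and the temp-file prefix are illustrative and do not come from the thread. Note that os.replace is not the same call as ReplaceFile, and, as noted above, rename may not be atomic on some file systems such as NFS.

```python
import os
import tempfile


def write_status_atomically(path, data):
    """Replace the file at `path` with `data` (bytes) via a temp file and a rename.

    Hypothetical helper for illustration; not taken from the thread.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to disk
        os.replace(tmp_path, path)  # rename over the target
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if os.name == "posix":
        # Sync the directory entry as well; opening a directory this way
        # is only attempted on POSIX systems.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A caller would use it as write_status_atomically("status.txt", b"crawl 42 done"). A reader opening the path should see either the old contents or the new contents, subject to the file system caveats above.
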
Renaming files is the wrong way to synchronize a
crawler. Use a database that has ACID properties, such as
SQLite. Far fewer I/O operations are required for small updates.
It's not the 1980s any more.
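
A minimal sketch of the SQLite suggestion, using Python's standard sqlite3 module. The table layout (a key/value table named status) and the helper names are assumptions made for illustration; the thread does not specify a schema.

```python
import sqlite3


def open_status_db(path):
    """Open (or create) the status database. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        " key TEXT PRIMARY KEY,"
        " value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def set_status(conn, key, value):
    """Store one status value inside a transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back if the block raises.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO status (key, value) VALUES (?, ?)",
            (key, value),
        )


def get_status(conn, key, default=None):
    """Return the stored value for `key`, or `default` if absent."""
    row = conn.execute(
        "SELECT value FROM status WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row is not None else default
```

For example, conn = open_status_db("crawler.db") followed by set_status(conn, "last_url", "http://example.com/") records the value, and get_status(conn, "last_url") reads it back. The update either commits in full or not at all, which is the property the rename pattern is trying to approximate.
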
I use a MySQL database to synchronize multiple processes
which crawl web sites. The tables of past activity are InnoDB
tables, which support transactions. The table of what's going
on right now is a MEMORY table. If the database crashes, the
past activity is recovered cleanly, the MEMORY table comes back
empty, and all the crawler processes lose their database
connections, abort, and are restarted. This allows multiple
servers to coordinate through one database.
John Nagle