
Flushing buffer on file copy on linux

From J <dreadpiratejeff@gmail.com>
Date Tue, 14 Aug 2012 22:55:27 -0400
Subject Flushing buffer on file copy on linux
To Python List <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.3292.1344999350.4697.python-list@python.org>



I was hoping someone could give me some ideas for a particular problem.

I've got a Python program that is used for basic testing of removable
storage devices (USB, MMC, FireWire, etc.).

Essentially, it looks for a mounted device (either user-specified or,
if not, the program loops through all removable disks), generates a
file of random data of a predetermined size, takes an md5sum of the
parent, uses shutil.copy() to copy the file to the device, then takes
an md5sum of the copy on the removable media, compares the two, cleans
up and Bob's your uncle.
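
A minimal sketch of the workflow described above, for illustration only.
The names md5sum() and run_copy_test(), the "parent.bin" file name and the
target_dir argument (standing in for the device's mount point) are
assumptions, not taken from the actual program.

```python
import hashlib
import os
import shutil
import tempfile


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_copy_test(target_dir, size_bytes=10 * 1024 * 1024):
    """Generate random data, copy it into target_dir, and compare hashes."""
    # Hypothetical layout: the parent file lives in a temporary directory.
    with tempfile.TemporaryDirectory() as work_dir:
        parent = os.path.join(work_dir, "parent.bin")
        with open(parent, "wb") as handle:
            handle.write(os.urandom(size_bytes))
        parent_hash = md5sum(parent)

        copy_path = os.path.join(target_dir, os.path.basename(parent))
        shutil.copy(parent, copy_path)
        try:
            # True when the copy on the device hashes the same as the parent.
            return md5sum(copy_path) == parent_hash
        finally:
            # Clean up the copy whether or not the comparison succeeded.
            os.remove(copy_path)
```

Note that nothing here forces the copy onto the device before it is
hashed, which is the behaviour discussed further down.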

Now, I'm working on enhancements and one enhancement is a threaded
"stress test" that will perform the exact same operations, but in
individual threads.

So if you tell the script to do 10 iterations using 10MB files, instead
of doing 10 loops, it runs 10 concurrent threads (I'm new to threads,
so this is a learning experience there as well).
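
A rough sketch of what the threaded variant could look like, using the
standard threading module. The function names, the per-thread file naming
and the shared results list are all assumptions made for this example. As
a simplification it hashes the random data in memory rather than hashing
the parent file on disk.

```python
import hashlib
import os
import shutil
import tempfile
import threading


def copy_and_verify(index, target_dir, size_bytes, results):
    """Worker: create one random file, copy it, record whether hashes match."""
    data = os.urandom(size_bytes)
    # Simplification: hash the data in memory instead of the parent file.
    parent_hash = hashlib.md5(data).hexdigest()
    with tempfile.TemporaryDirectory() as work_dir:
        # A per-thread file name keeps concurrent copies from colliding.
        parent = os.path.join(work_dir, "stress_%d.bin" % index)
        with open(parent, "wb") as handle:
            handle.write(data)
        copy_path = os.path.join(target_dir, os.path.basename(parent))
        shutil.copy(parent, copy_path)
    with open(copy_path, "rb") as handle:
        copy_hash = hashlib.md5(handle.read()).hexdigest()
    os.remove(copy_path)
    # Each thread writes only to its own slot in the list.
    results[index] = (copy_hash == parent_hash)


def stress_test(target_dir, iterations=10, size_bytes=10 * 1024 * 1024):
    """Run all iterations concurrently, one thread per iteration."""
    results = [None] * iterations
    threads = [
        threading.Thread(
            target=copy_and_verify,
            args=(i, target_dir, size_bytes, results),
        )
        for i in range(iterations)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A slot left as None after join() would mean that worker raised an
exception before recording a result.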

Now, the problem I have is that Linux tends to buffer data writes to a
device, and I want to work around that.  When run in normal non-stress
mode, the program is slow enough that the Linux buffers flush and put
the file on disk before the hash occurs.  However, when run in stress
mode, it appears that the files are possibly being hashed while still
in the buffer, before being flushed to disk.

So ultimately, I wanted to see if there were some ideas for working
around this issue.

One idea I had was to do the following:

1. Generate the parent data file.
2. Hash the parent.
3. Instead of copy(), open the parent and write it to a new file object
   on disk, either with a 0-size buffer or with a flush() before close().
4. Hash the copy.
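
The manual-copy idea above might be sketched as follows. The name
copy_with_sync() and the chunk size are assumptions. One point worth
noting: flush() (or a 0-size buffer) only empties Python's own buffer into
the operating system, so this sketch also calls os.fsync(), which asks the
kernel to write the file's data out to the device.

```python
import os


def copy_with_sync(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, then push the data out to the device."""
    with open(src, "rb") as source, open(dst, "wb") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            target.write(chunk)
        # Move anything still held in Python's buffer to the OS.
        target.flush()
        # Ask the kernel to write the file's data to the storage device.
        os.fsync(target.fileno())
```

Even after this, a later read of the copy may be served from the kernel's
page cache rather than from the device itself.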

Does that seem reasonable? Or is there a cleaner way to copy a file
from one place to another and ensure the buffers are properly flushed
(maybe something in os or sys that forces file buffers to be flushed)?
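
For reference, the os module does offer calls in this area: os.fsync()
flushes one file's data to the device, and on Python 3.3+ under Unix,
os.posix_fadvise() can hint that cached pages are no longer needed. The
sketch below combines them before hashing; md5_after_sync() is a
hypothetical name, the fadvise call is advisory only (the kernel may
ignore it), and calling fsync on a read-only descriptor is assumed to be
acceptable, as it is on Linux.

```python
import hashlib
import os


def md5_after_sync(path, chunk_size=1024 * 1024):
    """Sync a file to its device, hint the cache away, then hash it."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        fd = handle.fileno()
        # Make sure pending writes for this file have reached the device.
        os.fsync(fd)
        # Advisory hint: drop cached pages so reads are more likely to
        # come from the device. Not available on every platform.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Unmounting and remounting the device between the copy and the hash is
another way to make sure the data is read back from the media.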

Anyway, I'd appreciate any suggestions on that.  I'll try
implementing the idea mentioned above tomorrow, but if there's a
cleaner way I'd be interested in learning it.

Cheers,

jeff
