From: J
Date: Tue, 14 Aug 2012 22:55:27 -0400
Subject: Flushing buffer on file copy on linux
To: Python List
Newsgroups: comp.lang.python

I was hoping someone could give me some ideas for a particular problem.

I've got a Python program that is used for basic testing of removable storage devices (USB, MMC, FireWire, etc.). Essentially, it looks for a mounted device (either user-specified or, if not, the program loops through all removable disks), generates a file of random data of a predetermined size, takes an md5sum of the parent, uses shutil.copy() to copy the file to the device, then takes an md5sum of the copy on the removable media, compares the two, cleans up, and Bob's your uncle.

Now, I'm working on enhancements, and one enhancement is a threaded "stress test" that will perform the exact same operations, but in individual threads.
So if you tell the script to do 10 iterations using 10MB files, instead of doing 10 loops, it runs 10 concurrent threads (I'm new to threads, so this is a learning experience there as well).

Now, the problem I have is that Linux tends to buffer data writes to a device, and I want to work around that. When run in normal non-stress mode, the program is slow enough that the Linux buffers flush and put the file on disk before the hash occurs. However, when run in stress mode, it appears that the files are possibly being hashed while still in the buffer, before being flushed to disk.

So ultimately, I wanted to see if there were some ideas for working around this issue. One idea I had was to do the following:

- generate the parent data file
- hash the parent
- instead of copying, open the parent and write to a new file object on disk, with a 0-size buffer or a flush() before close()
- hash the copy

Does that seem reasonable? Or is there a cleaner way to copy a file from one place to another and ensure the buffers are properly flushed (maybe something in os or sys that forces file buffers to be flushed)?

Anyway, I'd appreciate any suggestions. I'll try implementing the idea mentioned above tomorrow, but if there's a cleaner way, I'd be interested in learning it.

Cheers,
jeff
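
For reference, here is a rough sketch of the idea above. It assumes that flush() followed by os.fsync() is an acceptable way to push the data out; the helper names md5_of and copy_and_sync are made up for illustration and are not from the actual test script.

```python
import hashlib
import os


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_sync(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst by hand, then force the written data out to the device.

    Hypothetical helper standing in for the shutil copy step.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
        fout.flush()             # empty Python's own userspace buffer
        os.fsync(fout.fileno())  # ask the kernel to write its buffers to the device
```

Each thread would then call copy_and_sync(parent, copy) and compare md5_of(parent) with md5_of(copy). One caveat: os.fsync() only makes sure the data has been handed to the device. Reading the copy back right away may still be served from the kernel's page cache rather than from the media itself.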