From: Carlos Nepomuceno
To: "python-list@python.org"
Subject: RE: How to write fast into a file in python?
Date: Fri, 17 May 2013 21:18:15 +0300
Newsgroups: comp.lang.python

You've hit the bullseye! ;)

Thanks a lot!!!

> Oh, I forgot to mention: you have a bug in this function. You're already
> including the newline in the len(line), so there is no need to add one.
> The result is that you only generate 44MB instead of 50MB.

That's because I'm running on Windows.
What's the fastest way to check if '\n' translates to 2 bytes on file?

> Here are the results of profiling the above on my computer.
> Including the overhead of the profiler, it takes just over 50 seconds to
> run your file on my computer.
>
> [steve@ando ~]$ python -m cProfile fastwrite5.py
>          17846645 function calls in 53.575 seconds
>

Didn't know the cProfile module. Thanks a lot!

>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1   30.561   30.561   53.575   53.575 fastwrite5.py:1(<module>)
>         1    0.000    0.000    0.000    0.000 {cStringIO.StringIO}
>   5948879    5.582    0.000    5.582    0.000 {len}
>         1    0.004    0.004    0.004    0.004 {method 'close' of 'cStringIO.StringO' objects}
>         1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>   5948879    9.979    0.000    9.979    0.000 {method 'format' of 'str' objects}
>         1    0.103    0.103    0.103    0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
>   5948879    7.135    0.000    7.135    0.000 {method 'write' of 'cStringIO.StringO' objects}
>         1    0.211    0.211    0.211    0.211 {method 'write' of 'file' objects}
>         1    0.000    0.000    0.000    0.000 {open}
>
>
> As you can see, the time is dominated by repeatedly calling len(),
> str.format() and StringIO.write() methods. Actually writing the data to
> the file is quite a small percentage of the cumulative time.
>
> So, here's another version, this time using a pre-calculated limit. I
> cheated and just copied the result from the fastwrite5 output :-)
>
> # fasterwrite.py
> filename = 'fasterwrite.dat'
> with open(filename, 'w') as f:
>     for i in xrange(5948879):  # Actually only 44MB, not 50MB.
>         f.write('%d\n' % i)
>

I had the same idea but kept the original method because I didn't want to waste time creating a function for calculating the actual number of iterations needed to deliver 50MB of data.
;)

> And the profile results are about twice as fast as fastwrite5 above, with
> only 8 seconds in total writing to my HDD.
>
> [steve@ando ~]$ python -m cProfile fasterwrite.py
>          5948882 function calls in 28.840 seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1   20.592   20.592   28.840   28.840 fasterwrite.py:1(<module>)
>         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>   5948879    8.229    0.000    8.229    0.000 {method 'write' of 'file' objects}
>         1    0.019    0.019    0.019    0.019 {open}
>

I thought there would be a call to the format method for "'%d\n' % i". It seems the % operator is a lot faster than format.
I just stopped using it because I read it was going to be deprecated. :(
Why replace such a great and fast operator with a slow method? I mean, why is format being preferred over %?

> Without the overhead of the profiler, it is a little faster:
>
> [steve@ando ~]$ time python fasterwrite.py
>
> real    0m16.187s
> user    0m13.553s
> sys     0m0.508s
>
>
> Although it is still slower than the heavily optimized dd command,
> but not unreasonably slow for a high-level language:
>
> [steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
> 90781+1 records in
> 90781+1 records out
> 46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s
>
> real    0m0.786s
> user    0m0.071s
> sys     0m0.595s
>
>
> --
> Steven
> --
> http://mail.python.org/mailman/listinfo/python-list
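
On the two open points above (does '\n' take 2 bytes on disk, and how many iterations are needed for 50MB), here is a minimal sketch that is not from the thread. The helper names newline_size_on_disk and lines_needed are hypothetical. It assumes the file holds the decimal numbers 0, 1, 2, ... one per line, as in fasterwrite.py, and that "50MB" means 50 * 1024 * 1024 bytes.

```python
import os
import tempfile


def newline_size_on_disk():
    """Return the number of bytes one newline occupies when written in text mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w') as f:  # text mode: the platform decides the translation
            f.write('\n')
        return os.path.getsize(path)  # typically 1 on Linux, 2 on Windows
    finally:
        os.remove(path)


def lines_needed(limit, newline_bytes):
    """Return (lines, size): how many lines '0', '1', '2', ... are needed
    for the file to reach at least `limit` bytes, and the resulting size."""
    lines = 0
    size = 0
    digits = 1
    first = 0  # first number that has `digits` digits (0 counts as one digit)
    while size < limit:
        last = 10 ** digits               # one past the last number with this digit count
        per_line = digits + newline_bytes
        block = last - first              # how many numbers share this digit count
        remaining = limit - size
        need = -(-remaining // per_line)  # ceiling division
        take = min(block, need)
        lines += take
        size += take * per_line
        first = last
        digits += 1
    return lines, size
```

Under these assumptions, lines_needed(50 * 1024 * 1024, 2) gives 5948879 lines, the same count used in fasterwrite.py, while with 1-byte newlines those same lines total 46479922 bytes, which matches the dd output quoted above. Passing newline_size_on_disk() as the second argument adapts the count to whatever platform the script runs on.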
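
The % versus format question can also be checked in isolation with timeit rather than by profiling a whole script. This is only a sketch: the function names are hypothetical, and '{}\n'.format(i) is an assumed stand-in for whatever format call fastwrite5.py used, since that script is not shown here. Timings depend on the machine and the Python version, so no expected numbers are given.

```python
import timeit


def with_percent(i):
    # Formatting with the % operator, as in fasterwrite.py.
    return '%d\n' % i


def with_format(i):
    # Assumed equivalent using str.format (the original call is not shown in the thread).
    return '{}\n'.format(i)


def compare(number=200000):
    """Return (seconds_percent, seconds_format) for `number` calls of each variant."""
    t_percent = timeit.timeit(lambda: with_percent(12345), number=number)
    t_format = timeit.timeit(lambda: with_format(12345), number=number)
    return t_percent, t_format
```

Calling compare() and printing the two values gives a rough per-call comparison; both timings include the same lambda call overhead, so only the difference between them is meaningful.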