From: andrea.gavana@gmail.com
Newsgroups: comp.lang.python
Subject: Re: cPickle.load vs. file.read+cPickle.loads on large binary files
Date: Wed, 18 Nov 2015 02:31:34 -0800 (PST)

Hi,

On Wednesday, November 18, 2015 at 10:00:43 AM UTC+1, Nagy László Zsolt wrote:
> > Perhaps there is a size threshold? You could experiment with different block
> > sizes in the following f.read() replacement:
> >
> > def read_chunked(f, size=2**20):
> >     read = functools.partial(f.read, size)
> >     return "".join(iter(read, ""))
> >
> Under win32 platform, my experience is that the fastest way to read
> binary file from disk is the mmap module. You should try that too.

Thank you for your suggestion.
I have tried that now, and with my naive approach I have done this:

    start = time.time()
    fid = open(filename, 'r+b')
    strs = mmap.mmap(fid.fileno(), 0, access=mmap.ACCESS_READ)[:]
    end = time.time()
    print 'mmap.read time:', end-start

And it takes about 2.7 seconds. Not a bad improvement :-). Unfortunately, when the file is on a network drive, all the other approaches ran at around 25-30 seconds loading time, while the mmap one clocks at 110 seconds :-(

Andrea.
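
For anyone who wants to repeat the comparison on their own files, here is a minimal sketch of the three approaches discussed in this thread. It assumes Python 3 (so pickle rather than cPickle, and bytes rather than str), which is not what the timings above were taken with. The names load_via_read, load_via_chunks, load_via_mmap and timed are made up for illustration, and the tiny temporary file only shows that the code runs; it will not reproduce the 2.7 s or 110 s figures.

```python
import functools
import mmap
import os
import pickle
import tempfile
import time


def load_via_read(path):
    """Read the whole file with one f.read() call, then unpickle."""
    with open(path, "rb") as f:
        return pickle.loads(f.read())


def load_via_chunks(path, size=2**20):
    """Read the file in fixed-size chunks, join them, then unpickle."""
    with open(path, "rb") as f:
        read = functools.partial(f.read, size)
        # iter() keeps calling read() until it returns b"" (end of file).
        return pickle.loads(b"".join(iter(read, b"")))


def load_via_mmap(path):
    """Map the file read-only, copy it out with [:], then unpickle."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            data = m[:]  # same whole-file slice as in the post above
    return pickle.loads(data)


def timed(loader, path):
    """Return (loaded object, elapsed seconds) for one loader call."""
    start = time.perf_counter()
    obj = loader(path)
    return obj, time.perf_counter() - start


if __name__ == "__main__":
    # Small stand-in for the large binary file; its size is an assumption.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            pickle.dump(list(range(100000)), f, pickle.HIGHEST_PROTOCOL)
        for loader in (load_via_read, load_via_chunks, load_via_mmap):
            _, elapsed = timed(loader, path)
            print(loader.__name__, "time:", elapsed)
    finally:
        os.remove(path)
```

Running each loader several times and ignoring the first pass should give fairer numbers, because the operating system's file cache can make later reads of the same file much faster than the first one.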