


Re: performance of script to write very long lines of random chars

Date 2013-04-11 04:09 +0100
From MRAB <python@mrabarnett.plus.com>
Subject Re: performance of script to write very long lines of random chars
References <24dc619b-7abd-4be3-aa92-f858eb4ab85f@n4g2000yqj.googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.436.1365649974.3114.python-list@python.org>



On 11/04/2013 02:21, gry wrote:
> Dear pythonistas,
>     I am writing a tiny utility to produce a file consisting of a
> specified number of lines of a given length of random ascii
> characters.  I am hoping to find a more time- and memory-efficient way
> that is still fairly simple, clear, and _pythonic_.
>
> I would like to have something that I can use at both extremes of
> data:
>
>     32M chars per line * 100 lines
> or
>     5 chars per line * 1e8 lines.
>
> E.g., the output of bigrand.py for 10 characters, 2 lines might be:
>
> gw2+M/5t&.
> S[[db/l?Vx
>
> I'm using python 2.7.0 on linux.  I need to use only out-of-the box
> modules, since this has to work on a bunch of different computers.
> At this point I'm especially concerned with the case of a few very
> long lines, since that seems to use a lot of memory, and take a long
> time.
> Characters are a slight subset of the printable ascii's, specified in
> the examples below.  My first naive try was:
>
> from sys import stdout
> import random
> nchars = 32000000
> rows = 10
> avail_chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}'
>
> def make_varchar(nchars):
>      return (''.join([random.choice(avail_chrs) for i in range(nchars)]))
>
> for l in range(rows):
>      stdout.write(make_varchar(nchars))
>      stdout.write('\n')
>
> This version used around 1.2GB resident/1.2GB virtual of memory for
> 3min 38sec.
>
>
> My second try uses much less RAM, but more CPU time, and seems rather,
> umm, un-pythonic (the array module always seems a little
> un-pythonic...)
>
> from sys import stdout
> from array import array
> import random
> nchars = 32000000
> rows = 10
> avail_chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}'
> a = array('c', 'X' * nchars)
>
> for l in range(rows):
>      for i in xrange(nchars):
>          a[i] = random.choice(avail_chrs)
>      a.tofile(stdout)
>      stdout.write('\n')
>
> This version using array took 4 min, 29 sec, using 34MB resident/110
> virtual. So, much smaller than the first attempt, but a bit slower.
> Can someone suggest a better code?  And help me understand the
> performance issues here?
>
Names in the global scope are stored in a dict, but names local to a
function are stored in slots in the frame and can be accessed more quickly.
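To make the difference concrete: CPython compiles a read of a global name to a LOAD_GLOBAL instruction (a dict lookup at run time) and a read of a local name to LOAD_FAST (an index into the frame's array of locals). The sketch below is written for Python 3, because `dis.get_instructions` does not exist in the 2.7 used in this thread. The function names are hypothetical, and the exact opcode list varies between CPython releases (newer ones add fused variants such as LOAD_FAST_LOAD_FAST).

```python
import dis
import random

AVAIL = 'abc'  # a module-level (global) name

def uses_globals():
    # 'random' and 'AVAIL' are looked up in the module's global dict
    # every time this line runs.
    return random.choice(AVAIL)

def uses_locals():
    # 'avail' and 'rnd' are locals, stored in the frame's fixed slots.
    avail = 'abc'
    rnd = random.choice   # the single global lookup happens here
    return rnd(avail)

def load_ops(func):
    """Return the names of the LOAD_* instructions compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith('LOAD_')]

print(load_ops(uses_globals))
print(load_ops(uses_locals))
```

The local version still performs one LOAD_GLOBAL to fetch `random`, but inside a loop that runs 32 million times the saving comes from the loop body itself using only the fast local loads.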

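'avail_chrs' and 'random.choice' are referred to many times, so making
'avail_chrs' local and binding 'random.choice' to a local name will
help: each access then skips the global dict lookup (and, for
'random.choice', the attribute lookup on the module).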
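A rough way to check that this helps on a given machine is to time the two styles side by side with `timeit`. This is a sketch, not a benchmark result: the function names and the shortened alphabet are hypothetical, the code runs on both 2.7 and 3.x, and the size of the gain depends on the interpreter version. In Python 3 before 3.12 a list comprehension has its own scope, so `rnd` and `avail_chrs` are closure variables there rather than plain locals, but the loop still avoids the global-dict and module-attribute lookups.

```python
import random
import timeit

AVAIL_CHRS = '0123456789abcdefghijklmnopqrstuvwxyz'  # shortened, hypothetical alphabet

def global_version(n):
    # random.choice and AVAIL_CHRS are resolved through global/attribute
    # lookups on every pass through the loop.
    return ''.join([random.choice(AVAIL_CHRS) for i in range(n)])

def local_version(n):
    # Bind both names once; the loop body then avoids the global dict
    # and the attribute lookup on the random module.
    avail_chrs = AVAIL_CHRS
    rnd = random.choice
    return ''.join([rnd(avail_chrs) for i in range(n)])

n = 100000
for func in (global_version, local_version):
    seconds = timeit.timeit(lambda: func(n), number=3)
    print('%s: %.3f s' % (func.__name__, seconds))
```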


from sys import stdout
from array import array
import random

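def generate():
    # Locals are accessed by slot rather than through a dict, so bind the
    # frequently used names once, outside the loop.
    avail_chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}'
    rnd = random.choice

    for l in range(rows):
        # Build one whole line in memory, then write it with a single call.
        stdout.write(''.join([rnd(avail_chrs) for i in xrange(nchars)]))
        stdout.write('\n')

nchars = 32000000
rows = 10
generate()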
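The version above still builds each 32M-character line as a list of one-character strings before joining it, so peak memory still grows with the line length. One way to bound memory is to build and write each line in fixed-size chunks. This is a sketch, not something posted in the thread: `write_random_lines` and `chunk_size` are hypothetical names, and the code runs on both Python 2.7 and 3.

```python
import random
import sys

AVAIL_CHRS = ('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
              '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}')

def write_random_lines(out, nchars, rows, chunk_size=1 << 16):
    """Write `rows` lines of `nchars` random characters each to `out`.

    At most `chunk_size` characters are built in memory at a time, so peak
    memory is bounded by the chunk size rather than by the line length.
    """
    rnd = random.choice      # local alias, as in the version above
    avail = AVAIL_CHRS       # local reference to the alphabet
    write = out.write        # local alias for the bound write method
    for _row in range(rows):
        remaining = nchars
        while remaining > 0:
            n = min(chunk_size, remaining)
            write(''.join([rnd(avail) for _i in range(n)]))
            remaining -= n
        write('\n')

if __name__ == '__main__':
    write_random_lines(sys.stdout, nchars=10, rows=2)
```

This still calls `random.choice` once per character, so CPU time should be roughly unchanged; only peak memory is capped, at about `chunk_size` characters plus the temporary list used to build each chunk.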



Thread

performance of script to write very long lines of random chars gry <georgeryoung@gmail.com> - 2013-04-10 18:21 -0700
  Re: performance of script to write very long lines of random chars Chris Angelico <rosuav@gmail.com> - 2013-04-11 11:45 +1000
    Re: performance of script to write very long lines of random chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-11 05:33 +0000
      Re: performance of script to write very long lines of random chars Chris Angelico <rosuav@gmail.com> - 2013-04-11 15:53 +1000
  Re: performance of script to write very long lines of random chars Michael Torrie <torriem@gmail.com> - 2013-04-10 19:52 -0600
    Re: performance of script to write very long lines of random chars gry <georgeryoung@gmail.com> - 2013-04-10 19:40 -0700
      Re: performance of script to write very long lines of random chars Chris Angelico <rosuav@gmail.com> - 2013-04-11 13:14 +1000
  Re: performance of script to write very long lines of random chars MRAB <python@mrabarnett.plus.com> - 2013-04-11 04:09 +0100
  Re: performance of script to write very long lines of random chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-11 07:47 +0000
    Re: performance of script to write very long lines of random chars Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-11 10:47 +0100
      Re: performance of script to write very long lines of random chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-11 10:50 +0000
        Re: performance of script to write very long lines of random chars Robert Kern <robert.kern@gmail.com> - 2013-04-11 16:49 +0530
        Re: performance of script to write very long lines of random chars Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-11 13:05 +0100
        Re: performance of script to write very long lines of random chars Robert Kern <robert.kern@gmail.com> - 2013-04-11 19:06 +0530
        Re: performance of script to write very long lines of random chars Chris Angelico <rosuav@gmail.com> - 2013-04-11 23:56 +1000
  Re: performance of script to write very long lines of random chars Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-11 10:47 +0100
