| From | Rodrick Brown <rodrick.brown@gmail.com> |
|---|---|
| Date | 2013-04-22 21:19 -0400 |
| Subject | optomizations |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.944.1366680414.3114.python-list@python.org> (permalink) |
I would like some feedback on possible solutions to make this script run
faster.
The system is pegged at 100% CPU and it takes a long time to complete.
#!/usr/bin/env python
import gzip
import re
import os
import sys
from datetime import datetime
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', dest='inputfile', type=str,
                        help='data file to parse')
    parser.add_argument('-o', dest='outputdir', type=str,
                        default=os.getcwd(), help='Output directory')
    args = parser.parse_args()

    if len(sys.argv[1:]) < 1:
        parser.print_usage()
        sys.exit(-1)

    print(args)
    if args.inputfile and os.path.exists(args.inputfile):
        try:
            with gzip.open(args.inputfile) as datafile:
                for line in datafile:
                    line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
                    line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
                    line = line.replace('cdn.xxx', 'www.xxx')
                    line = line.replace('cdn.xxx', 'www.xxx')
                    line = line.replace('cdn.xx', 'www.xx')
                    siteurl = line.split()[6].split('/')[2]
                    line = re.sub(r'\bhttps?://%s\b' % siteurl, "", line, 1)

                    (day, month, year, hour, minute, second) = (line.split()[3]).replace('[','').replace(':','/').split('/')
                    datelog = '{} {} {}'.format(month, day, year)
                    dateobj = datetime.strptime(datelog, '%b %d %Y')

                    outfile = '{}{}{}_combined.log'.format(dateobj.year,
                                                           dateobj.month, dateobj.day)
                    outdir = (args.outputdir + os.sep + siteurl)

                    if not os.path.exists(outdir):
                        os.makedirs(outdir)

                    with open(outdir + os.sep + outfile, 'w+') as outf:
                        outf.write(line)
        except IOError, err:
            sys.stderr.write("Error unable to read or extract inputfile: {} {}\n".format(args.inputfile, err))
            sys.exit(-1)
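
For discussion, here is a rough sketch of one direction that might reduce the per-line work. It is not a measured fix: it splits each line once, looks the month up in a table instead of calling strptime, strips the scheme and host with plain string operations instead of a regex, and keeps output files open between lines. The names route_line, split_log and MONTHS are made up for this sketch. It assumes English month abbreviations, a full URL in field 6 as the script above expects, and that appending to the output file is what is wanted (the script above opens with 'w+', which truncates the file on every line). The cdn-to-www replacements are left out for brevity.

```python
import os

# Assumption: log timestamps use English month abbreviations (what %b matches in the C locale).
MONTHS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
          'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}


def route_line(line):
    """Return (site, filename, rewritten_line) for one access-log line."""
    fields = line.split()                 # split once and reuse the fields
    url = fields[6]                       # e.g. http://media.xxx.com/a.png
    site = url.split('/')[2]
    # Drop the first "scheme://site" occurrence without a regex.
    prefix = url[:url.index('//') + 2] + site
    line = line.replace(prefix, '', 1)
    # fields[3] looks like "[22/Apr/2013:21:19:00"
    day, month, year = fields[3].lstrip('[').split(':', 1)[0].split('/')
    filename = '{}{}{}_combined.log'.format(int(year), MONTHS[month], int(day))
    return site, filename, line


def split_log(lines, outputdir):
    """Write each line to <outputdir>/<site>/<date>_combined.log, reusing open files."""
    handles = {}                          # path -> open file object
    try:
        for line in lines:
            site, filename, line = route_line(line)
            path = os.path.join(outputdir, site, filename)
            outf = handles.get(path)
            if outf is None:
                outdir = os.path.dirname(path)
                if not os.path.isdir(outdir):
                    os.makedirs(outdir)
                # Assumption: append mode, so earlier lines are kept.
                outf = handles[path] = open(path, 'a')
            outf.write(line)
    finally:
        for outf in handles.values():
            outf.close()
```

With many distinct sites and dates the handle cache could run into the open-file limit, so it may need a cap. Profiling the original first would show whether the time really goes where this sketch assumes.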
optomizations Rodrick Brown <rodrick.brown@gmail.com> - 2013-04-22 21:19 -0400
Re: optomizations Roy Smith <roy@panix.com> - 2013-04-22 21:53 -0400
Re: optomizations Dan Stromberg <drsalists@gmail.com> - 2013-04-22 20:15 -0700
Re: optomizations Rodrick Brown <rodrick.brown@gmail.com> - 2013-04-23 00:20 -0400
Re: optomizations Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 04:38 +0000
Re: optomizations Chris Angelico <rosuav@gmail.com> - 2013-04-23 12:03 +1000
Re: optomizations Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 04:00 +0000
Re: optomizations Chris Angelico <rosuav@gmail.com> - 2013-04-23 14:08 +1000
percent faster than format()? (was: Re: optomizations) Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-04-23 09:46 +0200
Re: percent faster than format()? (was: Re: optomizations) Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-04-23 10:26 +0200
Re: percent faster than format()? Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-04-23 16:57 +0200
Re: percent faster than format()? Lele Gaifax <lele@metapensiero.it> - 2013-04-23 17:44 +0200
Re: percent faster than format()? (was: Re: optomizations) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 14:36 +0000
Re: percent faster than format()? (was: Re: optomizations) Chris Angelico <rosuav@gmail.com> - 2013-04-24 00:52 +1000