| Header | Value |
|---|---|
| From | Rodrick Brown <rodrick.brown@gmail.com> |
| Date | Mon, 22 Apr 2013 21:19:23 -0400 |
| Subject | optomizations |
| To | "python-list@python.org" <python-list@python.org> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.944.1366680414.3114.python-list@python.org> |
I would like some feedback on possible solutions to make this script run
faster.
The system is pegged at 100% CPU and it takes a long time to complete.
#!/usr/bin/env python
import gzip
import re
import os
import sys
from datetime import datetime
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', dest='inputfile', type=str, help='data file to parse')
    parser.add_argument('-o', dest='outputdir', type=str,
                        default=os.getcwd(), help='Output directory')
    args = parser.parse_args()
    if len(sys.argv[1:]) < 1:
        parser.print_usage()
        sys.exit(-1)
    print(args)
    if args.inputfile and os.path.exists(args.inputfile):
        try:
            with gzip.open(args.inputfile) as datafile:
                for line in datafile:
                    line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
                    line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
                    line = line.replace('cdn.xxx', 'www.xxx')
                    line = line.replace('cdn.xxx', 'www.xxx')
                    line = line.replace('cdn.xx', 'www.xx')
                    siteurl = line.split()[6].split('/')[2]
                    line = re.sub(r'\bhttps?://%s\b' % siteurl, "", line, 1)
                    (day, month, year, hour, minute, second) = (line.split()[3]).replace('[','').replace(':','/').split('/')
                    datelog = '{} {} {}'.format(month, day, year)
                    dateobj = datetime.strptime(datelog, '%b %d %Y')
                    outfile = '{}{}{}_combined.log'.format(dateobj.year,
                                                           dateobj.month, dateobj.day)
                    outdir = (args.outputdir + os.sep + siteurl)
                    if not os.path.exists(outdir):
                        os.makedirs(outdir)
                    with open(outdir + os.sep + outfile, 'w+') as outf:
                        outf.write(line)
        except IOError, err:
            sys.stderr.write("Error unable to read or extract inputfile: {} {}\n".format(args.inputfile, err))
            sys.exit(-1)
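
For readers following along, here is a minimal sketch of what the loop body above computes for a single line, pulled out into a function so it can be tried in isolation. The sample line, the name `route_line`, and the use of `re.escape` are assumptions for illustration and are not part of the posted script; the real log format is not shown in the post. The sketch also splits the line once and reuses the fields, where the script calls `line.split()` twice per line.

```python
import re
from datetime import datetime

# Hypothetical sample line: the post does not show the real log format.
# Assumed here: field 3 is the timestamp and field 6 is an absolute URL.
SAMPLE = ('1.2.3.4 - - [22/Apr/2013:21:19:23 -0400] '
          '"GET http://cdn.xxx.com/img/a.png HTTP/1.1" 200 512\n')


def route_line(line):
    """Return (siteurl, outfile, rewritten_line) for one log line.

    Hypothetical helper that mirrors the loop body of the posted script.
    """
    # Same host rewrites as the script, in the same order.
    line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
    line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
    line = line.replace('cdn.xxx', 'www.xxx')
    line = line.replace('cdn.xx', 'www.xx')

    # Split once and reuse the fields for both the host and the timestamp.
    fields = line.split()
    siteurl = fields[6].split('/')[2]

    # Strip the scheme and host from the first matching URL.
    # re.escape is an addition so that dots in the host match literally.
    line = re.sub(r'\bhttps?://%s\b' % re.escape(siteurl), '', line, count=1)

    # '[22/Apr/2013:21:19:23' -> ['22', 'Apr', '2013', '21', '19', '23']
    day, month, year, hour, minute, second = (
        fields[3].replace('[', '').replace(':', '/').split('/'))
    dateobj = datetime.strptime('{} {} {}'.format(month, day, year), '%b %d %Y')
    outfile = '{}{}{}_combined.log'.format(
        dateobj.year, dateobj.month, dateobj.day)
    return siteurl, outfile, line


if __name__ == '__main__':
    print(route_line(SAMPLE))
```

For the sample line this yields the site `www.xxx.com` and the file name `2013422_combined.log`; month and day are not zero-padded, which matches the script's `'{}{}{}'` format string.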
optomizations Rodrick Brown <rodrick.brown@gmail.com> - 2013-04-22 21:19 -0400
Re: optomizations Roy Smith <roy@panix.com> - 2013-04-22 21:53 -0400
Re: optomizations Dan Stromberg <drsalists@gmail.com> - 2013-04-22 20:15 -0700
Re: optomizations Rodrick Brown <rodrick.brown@gmail.com> - 2013-04-23 00:20 -0400
Re: optomizations Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 04:38 +0000
Re: optomizations Chris Angelico <rosuav@gmail.com> - 2013-04-23 12:03 +1000
Re: optomizations Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 04:00 +0000
Re: optomizations Chris Angelico <rosuav@gmail.com> - 2013-04-23 14:08 +1000
percent faster than format()? (was: Re: optomizations) Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-04-23 09:46 +0200
Re: percent faster than format()? (was: Re: optomizations) Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-04-23 10:26 +0200
Re: percent faster than format()? Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-04-23 16:57 +0200
Re: percent faster than format()? Lele Gaifax <lele@metapensiero.it> - 2013-04-23 17:44 +0200
Re: percent faster than format()? (was: Re: optomizations) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 14:36 +0000
Re: percent faster than format()? (was: Re: optomizations) Chris Angelico <rosuav@gmail.com> - 2013-04-24 00:52 +1000