Re: sobering observation, python vs. perl

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: sobering observation, python vs. perl
Date Fri, 18 Mar 2016 10:26:34 +0100
Message-ID <mailman.307.1458293216.12893.python-list@python.org>
References <nceihb$vpg$1@dont-email.me> <mailman.274.1458230151.12893.python-list@python.org> <ncekqc$vpg$6@dont-email.me>


Charles T. Smith wrote:

> On Thu, 17 Mar 2016 10:52:30 -0500, Tim Chase wrote:
> 
>> Not saying this will make a great deal of difference, but these two
>> items jumped out at me.  I'd even be tempted to just use string
>> manipulations for the isready aspect as well.  Something like
>> (untested)
> 
> well, I don't want to forgo REs in order to have python's numbers be
> better....

As has been said, string methods are the preferred approach in Python for 
simple text processing tasks. I think this is more for clarity than for 
performance.

If you need regular expressions, a simple way to boost performance may be to 
use the external regex module.
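
A minimal sketch of that swap, assuming the third-party regex package from PyPI may or may not be installed. The fallback to the standard re module and the helper name names_released are my additions for illustration, not part of the timed scripts:

```python
try:
    import regex as re  # third-party module; same compile/match calls as re here
except ImportError:
    import re  # standard library fallback (assumption: acceptable if regex is missing)

# Same patterns as find.py; .match anchors at the start of the line.
isready = re.compile(r"(.*) is ready").match
relreq = re.compile(r".*release_req").match


def names_released(lines):
    """Yield the most recently seen 'X is ready' name for each release_req line."""
    tn = ""
    for line in lines:
        match = isready(line)
        if match:
            tn = match.group(1)
        elif relreq(line):
            yield tn
```

Because only the import line changes, the rest of the script does not need to know which module it got.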

(By the way, if you are looking for a simple way to iterate over multiple 
files, use
for line in fileinput.input():
    ...
)
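
Spelled out a bit more, here is a sketch of the same search written with fileinput instead of the explicit loop over sys.argv. The function name released_names_from_files and the optional files argument are hypothetical and only there to make it easy to try:

```python
import fileinput


def released_names_from_files(files=None):
    """Return the names printed by the search, reading all files as one stream.

    With files=None, fileinput falls back to sys.argv[1:], or stdin if that
    is empty.
    """
    result = []
    tn = ""
    # fileinput.input() chains the files; usable as a context manager in Python 3.
    with fileinput.input(files=files) as lines:
        for line in lines:
            if " is ready" in line:
                tn = line.partition(" is ready")[0]
            elif "release_req" in line:
                result.append(tn)
    return result
```

Note that tn carries over from one file to the next, just as it does in the scripts below.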

Some numbers:

$ time perl find.pl data/sample*.txt > r1.txt

real    0m0.504s
user    0m0.466s
sys     0m0.036s
$ time python find.py data/sample*.txt > r2.txt

real    0m2.403s
user    0m2.339s
sys     0m0.059s
$ time python find_regex.py data/sample*.txt > r3.txt

real    0m0.693s
user    0m0.631s
sys     0m0.060s
$ time python find_no_re.py data/sample*.txt > r4.txt

real    0m0.319s
user    0m0.267s
sys     0m0.048s

Python 3 slows things down:

$ time python3 find_no_re.py data/sample*.txt > r5.txt

real    0m0.497s
user    0m0.444s
sys     0m0.051s
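
To put the numbers in relation, here is a small sketch that divides each wall-clock time by the Perl time. The figures are copied from the runs above; each is a single run, so the ratios are only a rough guide:

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
real_seconds = {
    "perl find.pl": 0.504,
    "python find.py": 2.403,
    "python find_regex.py": 0.693,
    "python find_no_re.py": 0.319,
    "python3 find_no_re.py": 0.497,
}

baseline = real_seconds["perl find.pl"]
# Ratio > 1 means slower than Perl, < 1 means faster.
ratios = {name: t / baseline for name, t in real_seconds.items()}

for name, ratio in ratios.items():
    print("{:<24}{:.2f}x".format(name, ratio))
```

That works out to roughly 4.8 times the Perl time for re, about 1.4 times for regex, and about 0.63 times for the string-method version under the default python.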

The scripts:
$ cat find.pl
#!/usr/bin/env perl

while (<>) {
    if (/(.*) is ready/) {
        $tn = $1;
    }
    elsif (/release_req/) {
        print "$tn\n";
    }
}
$ cat find.py
#!/usr/bin/env python
import sys
import re

def main():
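    # Bound .match methods; match() anchors at the start of the line,
    # hence the leading ".*" in the second pattern.
    isready = re.compile(r"(.*) is ready").match
    relreq = re.compile(r".*release_req").match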

    tn = ""
    for fn in sys.argv[1:]:
        with open(fn) as fd:
            for line in fd:
                match = isready(line)
                if match:
                    tn = match.group(1)
                elif relreq(line):
                    print(tn)

main()

$ cat find_regex.py
#!/usr/bin/env python
import sys
import regex as re
[rest the same as find.py]

$ cat find_no_re.py
#!/usr/bin/env python
import sys

def main():
    tn = ""
    for fn in sys.argv[1:]:
        with open(fn) as fd:
            for line in fd:
                if " is ready" in line:
                    tn = line.partition(" is ready")[0]
                elif "release_req" in line:
                    print(tn)

main()
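
One caveat about find_no_re.py, as a sketch with a made-up input line: the greedy (.*) in the regular expression captures everything up to the last " is ready" on a line, while partition() splits at the first one. rpartition() reproduces the greedy behaviour. In the generated test data at most one "is ready" pair is inserted per line, so the scripts should agree there.

```python
import re

# Hypothetical line with two occurrences of " is ready".
line = "alpha is ready and beta is ready\n"

greedy = re.match(r"(.*) is ready", line).group(1)
first = line.partition(" is ready")[0]
last = line.rpartition(" is ready")[0]

print(repr(greedy))  # 'alpha is ready and beta'
print(repr(first))   # 'alpha'
print(repr(last))    # 'alpha is ready and beta'
```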

The test data was generated with

$ cat make_test_data.py
#!/usr/bin/env python3
import os
import random
import shutil

from itertools import islice


def make_line_factory(words, line_length, isready):
    choice = random.choice

    def make_line():
        while True:
            line = [choice(words)]
            length = len(line[0])
            while length < line_length:
                word = choice(words)
                line.append(word)
                length += len(word) + 1
            if random.randrange(100) < isready:
                pos = random.randrange(len(line))
                line[pos:pos+1] = ["is", "ready"]
            elif random.randrange(100) < isready:
                pos = random.randrange(len(line))
                line[pos:pos] = ["release_req"]
            yield " ".join(line)

    return make_line


def main():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--words", default="/usr/share/dict/words")
    parser.add_argument("--line-length", type=int, default=80)
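    # type=eval accepts expressions such as 10**5 on the command line;
    # only use this with trusted input.
    parser.add_argument("--num-lines", type=eval, default=10**5)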
    parser.add_argument("--num-files", type=int, default=4)
    parser.add_argument("--name-template", default="sample{:0{}}.txt")
    parser.add_argument("--data-folder", default="data")
    parser.add_argument("--remove-data-folder", action="store_true")
    parser.add_argument("--first-match-percent", type=int, default=10)
    try:
        import argcomplete
    except ImportError:
        pass
    else:
        argcomplete.autocomplete(parser)

    args = parser.parse_args()

    if args.remove_data_folder:
        shutil.rmtree(args.data_folder)
    os.mkdir(args.data_folder)

    with open(args.words) as f:
        words = [line.strip() for line in f]

    make_line = make_line_factory(
        words, args.line_length, args.first_match_percent)()

    width = len(str(args.num_files))
    for index in range(1, args.num_files+1):
        filename = os.path.join(
            args.data_folder,
            args.name_template.format(index, width))
        print(filename)
        with open(filename, "w") as f:
            for line in islice(make_line, args.num_lines):
                print(line, file=f)


if __name__ == "__main__":
    main()
