Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #28475
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: Comparing strings from the back? |
| Date | 2012-09-05 11:48 +0200 |
| Organization | None |
| References | <roy-B37C28.21540103092012@news.panix.com> <k272gn$8ls$1@ger.gmane.org> <CAPTjJmoQxsbK3q6U01mTyUsROMay=yicz+g5dxnsEHDCdfNC4Q@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.228.1346838532.27098.python-list@python.org> (permalink) |
Chris Angelico wrote:
> On Wed, Sep 5, 2012 at 6:29 PM, Peter Otten <__peter__@web.de> wrote:
>> comparing every pair in a sample of 1000 8-char words
>> taken from '/usr/share/dict/words'
>>
>> head
>> 1: 477222 ************************************************************
>> 2: 18870 **
>> ...
tail
1: 386633 ************************************************************
2: 74966 ***********
> Not understanding this. What are the statistics,
I measured how many chars they have in common for all combinations of 1000
words taken from /usr/share/dict/words.
477222 pairs have one char in common, 18870 pairs have two chars in common
when compared from the *beginning* of the string.
386633 pairs have one char in common, 74966 pairs have two chars in common
when compared from the *end* of the string.
and what (if it's not obvious from the previous answer) do they prove?
They demonstrate that for two words from that particular corpus it is likely
that a[::-1] == b[::-1] has to take more (1.34 versus 1.05 on average)
characters into account than a == b, i. e. comparing from the back should be
slower rather than faster.
If that doesn't help, here's the code ;)
import random
import itertools
from contextlib import contextmanager
from collections import defaultdict
@contextmanager
def open_corpus(args):
with open(args.name) as instream:
words = (line.strip() for line in instream)
#words = (word for word in words if not word.endswith("'s"))
yield words
def pick(items, count):
items = list(items)
return random.sample(items, count)
def count_common(a, b):
i = 0
for i, (x, y) in enumerate(zip(a, b), 1):
if x != y:
break
return i
def corpus(args):
with open_corpus(args) as words:
wordlens = (len(line.strip()) for line in words)
freq = defaultdict(int)
for wordlen in wordlens:
freq[wordlen] += 1
show_histogram(freq)
def show_histogram(freq):
peak = max(freq.values())
freq_width = len(str(peak))
max_freq = max(freq)
len_width = len(str(max_freq))
for i in range(min(freq), max_freq+1):
value = freq[i]
print("{:{}}: {:{}} {}".format(
i, len_width, value,
freq_width, "*" * (60 * value // peak)))
def compare(args):
with open_corpus(args) as words:
words = pick(
(word for word in words if len(word) == args.word_len),
args.sample_size)
n = 0
head_common = defaultdict(int)
tail_common = defaultdict(int)
n_tail = n_head = 0
for n, (a, b) in enumerate(itertools.combinations(words, 2), 1):
cc = count_common(a, b)
n_head += cc
head_common[cc] += 1
ccr = count_common(a[::-1], b[::-1])
n_tail += ccr
tail_common[ccr] += 1
print(n, "combinations")
print("average common chars (head) {:.3}".format(n_head/n))
print("average common chars (tail) {:.3}".format(n_tail/n))
print("comparing every pair in a "
"sample of {sample} {len}-char words\n"
"taken from {name!r}".format(
sample=args.sample_size,
len=args.word_len,
name=args.name))
print()
print("head")
show_histogram(head_common)
print()
print("tail")
show_histogram(tail_common)
def main():
import argparse
parser = argparse.ArgumentParser()
parsers = parser.add_subparsers()
pcorpus = parsers.add_parser("corpus")
pcorpus.add_argument("name")
pcorpus.set_defaults(func=corpus)
pcompare = parsers.add_parser("compare")
pcompare.add_argument("name")
pcompare.add_argument("--sample-size", type=int, default=1000)
pcompare.add_argument("--word-len", type=int, default=8)
pcompare.set_defaults(func=compare)
args = parser.parse_args()
args.func(args)
if __name__ == "__main__":
main()
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Comparing strings from the back? Roy Smith <roy@panix.com> - 2012-09-03 21:54 -0400
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-04 12:07 +1000
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 02:17 +0000
Re: Comparing strings from the back? Dan Sommers <dan@tombstonezero.net> - 2012-09-03 21:56 -0700
Re: Comparing strings from the back? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-04 08:50 +0100
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-04 18:32 +0200
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 18:07 +0000
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-05 11:17 +0200
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-05 07:59 +1000
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-05 11:24 +0200
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-05 11:43 +0200
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-05 14:30 +0000
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-05 16:51 +0200
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-05 16:24 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-05 22:47 +0000
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-06 08:33 +0000
Re: Comparing strings from the back? Dave Angel <d@davea.name> - 2012-09-06 06:07 -0400
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-07 04:06 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-07 19:10 +0000
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-08 00:55 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-08 11:53 +0000
RE: Comparing strings from the back? "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-09-13 18:39 +0000
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-13 15:37 -0400
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-13 20:48 -0700
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 00:46 -0400
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-13 21:54 -0700
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 01:38 -0400
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-13 23:06 -0700
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 04:03 -0400
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-14 01:20 -0700
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 04:53 -0400
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-14 03:26 -0700
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 07:36 -0400
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-14 10:16 +0000
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 07:43 -0400
RE: Comparing strings from the back? "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-09-14 22:43 +0000
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-14 23:10 -0400
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-16 18:11 -0700
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-17 14:05 +1000
Re: Comparing strings from the back? alex23 <wuwei23@gmail.com> - 2012-09-16 23:06 -0700
Re: Comparing strings from the back? Ethan Furman <ethan@stoneleaf.us> - 2012-09-17 13:35 -0700
Re: Comparing strings from the back? Neil Hodgson <nhodgson@iinet.net.au> - 2012-09-18 09:14 +1000
Re: Comparing strings from the back? Ethan Furman <ethan@stoneleaf.us> - 2012-09-18 08:12 -0700
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-18 11:55 -0400
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-18 11:59 -0400
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-18 12:17 -0400
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-19 00:47 +0000
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-19 02:20 +1000
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-18 16:40 -0400
Re: Comparing strings from the back? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-19 00:48 +0100
Re: Comparing strings from the back? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-13 20:53 +0100
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-13 17:06 -0400
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-14 03:39 +0000
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-14 14:15 +1000
Re: Comparing strings from the back? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-13 22:17 +0100
Re: Comparing strings from the back? Dwight Hutto <dwightdhutto@gmail.com> - 2012-09-13 17:35 -0400
RE: Comparing strings from the back? "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-09-14 21:32 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-07 19:40 +0000
Re: Comparing strings from the back? Gelonida N <gelonida@gmail.com> - 2012-09-08 17:52 +0200
Re: Comparing strings from the back? Duncan Booth <duncan.booth@invalid.invalid> - 2012-09-10 08:59 +0000
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-10 13:45 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-10 14:06 +0000
Re: Comparing strings from the back? Duncan Booth <duncan.booth@invalid.invalid> - 2012-09-11 09:51 +0000
Re: Comparing strings from the back? Terry Reedy <tjreedy@udel.edu> - 2012-09-11 11:55 -0400
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-11 00:26 +1000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-10 14:32 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-10 14:43 +0000
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-11 00:56 +1000
Re: Comparing strings from the back? Duncan Booth <duncan.booth@invalid.invalid> - 2012-09-11 09:41 +0000
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-06 12:04 +0100
Re: Comparing strings from the back? Steve Howell <showell30@yahoo.com> - 2012-09-14 17:51 -0700
Re: Comparing strings from the back? Roy Smith <roy@panix.com> - 2012-09-06 08:13 -0400
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-06 22:29 +1000
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 15:43 +0200
Re: Comparing strings from the back? Dave Angel <d@davea.name> - 2012-09-06 10:23 -0400
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 16:33 +0200
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 16:42 +0200
Re: Comparing strings from the back? Dave Angel <d@davea.name> - 2012-09-06 11:54 -0400
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 16:34 +0200
Re: Comparing strings from the back? Gelonida N <gelonida@gmail.com> - 2012-09-08 17:50 +0200
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 15:37 +0200
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-07 00:39 +1000
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 17:36 +0200
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-06 17:44 +0200
Re: Comparing strings from the back? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-07 04:07 +0000
Re: Comparing strings from the back? Terry Reedy <tjreedy@udel.edu> - 2012-09-04 01:13 -0400
Re: Comparing strings from the back? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-09-04 08:56 +0100
Re: Comparing strings from the back? Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2012-09-04 11:58 +0200
Re: Comparing strings from the back? Neil Hodgson <nhodgson@iinet.net.au> - 2012-09-05 12:18 +1000
Re: Comparing strings from the back? MRAB <python@mrabarnett.plus.com> - 2012-09-05 03:39 +0100
Re: Comparing strings from the back? Roy Smith <roy@panix.com> - 2012-09-04 22:48 -0400
Re: Comparing strings from the back? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-09-05 16:33 +0200
Re: Comparing strings from the back? Peter Otten <__peter__@web.de> - 2012-09-05 10:29 +0200
Re: Comparing strings from the back? Chris Angelico <rosuav@gmail.com> - 2012-09-05 18:33 +1000
Re: Comparing strings from the back? Peter Otten <__peter__@web.de> - 2012-09-05 11:48 +0200
Re: Comparing strings from the back? Peter Otten <__peter__@web.de> - 2012-09-05 17:45 +0200
Re: Comparing strings from the back? Dan Goodman <dg.gmane@thesamovar.net> - 2012-09-10 18:07 +0200
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-10 16:33 +0000
Re: Comparing strings from the back? Dan Goodman <dg.gmane@thesamovar.net> - 2012-09-10 19:32 +0200
Re: Comparing strings from the back? Dan Goodman <dg.gmane@thesamovar.net> - 2012-09-10 19:44 +0200
Re: Comparing strings from the back? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-09-10 21:52 +0000
csiph-web