Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.tele.dk!feed118.news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.021 X-Spam-Evidence: '*H*': 0.96; '*S*': 0.00; 'algorithm': 0.04; 'string': 0.09; 'dan': 0.09; 'exist,': 0.09; 'input,': 0.09; 'repeated': 0.09; 'sentence': 0.09; 'def': 0.12; '(),': 0.16; 'already,': 0.16; 'for,': 0.16; 'generators.': 0.16; 'merely': 0.16; 'redundant': 0.16; 'remainder': 0.16; 'subject:combinations': 0.16; 'subject:generator': 0.16; 'to:name:python list': 0.16; 'sender:addr:gmail.com': 0.17; 'fix': 0.17; 'wrote:': 0.18; 'wed,': 0.18; 'all,': 0.19; 'thu,': 0.19; 'fit': 0.20; 'input': 0.22; 'filtering': 0.24; 'header:In-Reply- To:1': 0.27; 'words': 0.29; 'characters': 0.30; "i'm": 0.30; 'reply.': 0.31; "skip:' 10": 0.31; 'away.': 0.31; 'lot.': 0.31; 'this.': 0.32; 'probably': 0.32; 'says': 0.33; 'problem': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'opposed': 0.36; 'words,': 0.36; 'yield': 0.36; 'charset:us- ascii': 0.36; 'thanks': 0.36; 'subject:?': 0.36; 'should': 0.36; 'two': 0.37; 'problems': 0.38; 'nov': 0.38; 'to:addr:python-list': 0.38; 'pm,': 0.38; 'rather': 0.38; 'short': 0.38; 'anything': 0.39; 'to:addr:python.org': 0.39; 'even': 0.60; 'most': 0.60; 'john': 0.61; "you'll": 0.62; 'story': 0.63; 'interest': 0.64; 'more': 0.64; 'fire': 0.65; 'here': 0.66; '20,': 0.68; 'combining': 0.68; 'subject': 0.69; 'results': 0.69; 'containing': 0.69; 'characters,': 0.84; 'recursive.': 0.84; 'story:': 0.84; 'incredibly': 0.96; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:subject:message-id:in-reply-to:references :mime-version:content-type:content-transfer-encoding; bh=mfQ+rSbKAL/+CkkvukfIrgT+p+rzm4Na2qwEuD1WXjY=; b=sXOwBjvKq6m9o31ZzEOLrnpJg5Z6elia0NhCbNJgYw/2iCNXCAd47CY1VQm5WtFrOZ D1QfA+T7AbfX8FBq6zDUi/KGcExVd7AZjEXlpdWHCWTFEQb6UFekoEg3+IDnXZCW/Uly OfpXyVAjXThgkwWy44L282AGmj6eeoAvi8aiz2LL+FQCzGyQQYna4NJfSh29q074r+FX wwwXl9KCaSUqRW0+5mu8wseEqOOPOGyY+6feg/4Qn4xg8uq0w42EoPROrG5RGCL6SI25 GWECf3F9QIVEqYcWq65pirRNEaDLExAzYzd0zT07yqtNNq0OYy6UqCBKJpmRDrT8P7m6 VoDw== X-Received: by 10.66.197.135 with SMTP id iu7mr14985274pac.149.1385168326118; Fri, 22 Nov 2013 16:58:46 -0800 (PST) Sender: "John O'Hagan" Date: Sat, 23 Nov 2013 11:58:38 +1100 From: John O'Hagan To: Python List Subject: Re: Recursive generator for combinations of a multiset? In-Reply-To: References: <20131121174614.53450d51@mini.home> X-Mailer: Claws Mail 3.9.2 (GTK+ 2.24.22; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 56 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1385168330 news.xs4all.nl 15879 [2001:888:2000:d::a6]:36029 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:60257 On Thu, 21 Nov 2013 12:59:26 -0800 Dan Stromberg wrote: > On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan > wrote: > > > > > Short story: the subject says it all, so if you have an answer > > already, fire away. Below is the long story of what I'm using it > > for, and why I think it needs to be recursive. It may even be of > > more general interest in terms of filtering the results of > > generators. > > > > I think you probably need permutations rather than combinations. > > Also, I think you'll need to form a word (partitioned off by spaces), > and then check it against a set containing /usr/share/dict/words > before recursing for the remainder of the sentence - this should > speed things up a LOT. Thanks for the reply. If I understand you correctly, you are suggesting permuting the input _characters_ to form words and then seeing if they exist, as opposed to my approach of combining known words and seeing if they are anagrams. (Permutations of words would not help find anagrams as they merely change the word order). Here is an attempt at that: def anagrams(partition, input_string): """Find anagrams which fit given partition of input string length""" if not partition: yield (), input_string return for words, checkstring in anagrams(partition[:-1], input_string): for word in itertools.permutations(checkstring, partition[-1]): word = ''.join(word) if word in WORDS: #WORDS is collection of dictionary words newstring = checkstring for l in word: newstring = newstring.replace(l, '' , 1) yield words + (word,), newstring There are two problems with this. If there are repeated characters in the input, redundant results are produced; a multiset-permutation algorithm would fix this. But the main problem is it is incredibly slow: on my run-of-the-mill laptop, it chokes on anything longer than about 10 characters, spending most of its time rejecting non-words. Or have I misunderstood your suggestion? Regards, -- John