Date: Sun, 24 Nov 2013 13:57:34 +1100
From: John O'Hagan
To: Python List
Subject: Re: Recursive generator for combinations of a multiset?
References: <20131121174614.53450d51@mini.home> <20131123115838.4016c671@mini.home>
Newsgroups: comp.lang.python

On Fri, 22 Nov 2013 22:33:29 -0800
Dan Stromberg wrote:

> On Fri, Nov 22, 2013 at 4:58 PM, John O'Hagan wrote:
>
> > On Thu, 21 Nov 2013 12:59:26 -0800
> > Dan Stromberg wrote:
> >
> > > On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan wrote:
> > >
> > > > Short story: the subject says it all, so if you have an answer
> > > > already, fire away. Below is the long story of what I'm using it
> > > > for, and why I think it needs to be recursive. It may even be of
> > > > more general interest in terms of filtering the results of
> > > > generators.
> > >
> > > I think you probably need permutations rather than combinations.
> > >
> > > Also, I think you'll need to form a word (partitioned off by
> > > spaces), and then check it against a set
> > > containing /usr/share/dict/words before recursing for the
> > > remainder of the sentence - this should speed things up a LOT.
> >
> > Thanks for the reply. If I understand you correctly, you are
> > suggesting permuting the input _characters_ to form words and then
> > seeing if they exist, as opposed to my approach of combining known
> > words and seeing if they are anagrams. (Permutations of words would
> > not help find anagrams as they merely change the word order). Here
> > is an attempt at that:
>
> You've interpreted me correctly.
>
> However, I was thinking about this in the back of my mind, and
> decided it would probably be best to inhale /usr/share/dict/words (if
> on Linux), and pull out words of the correct lengths (as separated
> by the blanks) over the correct (possible) alphabet, and permute
> those, afterward checking if they form good anagrams of the original
> sentence. This would probably be much faster, since English isn't
> that dense of a space.

If you look back at my original question, you'll see that's pretty much
what I've done. I didn't spell out all the details, but I made a
dictionary keyed by word length, where each value is a list of all the
dictionary words of that length that can be made from the correct
sub-alphabet.

But to recap the problem: to produce non-redundant anagram phrases, I
need the Cartesian product (not permutations) of these lists where each
list is used only once, but with a subroutine producing multiset
combinations wherever a list is repeated, i.e., where a particular word
length is called for more than once.

The version I have so far is correct but substantially slower than the
product-only one, which just goes ahead and produces all the redundant
results. This seems counter-intuitive, and my theory is that it is
because I am unable to "prune" the non-recursive combination algorithm
I currently have.

Regards,

--

John
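
For illustration only, here is a minimal sketch of the requirement recapped above: a recursive generator of multiset combinations that can be pruned, combined with a Cartesian product across the distinct word lengths. It is not the code from this thread. The names `multiset_combinations`, `phrases`, `keep`, `words_by_length` and `lengths` are all hypothetical, and the sketch assumes Python 3.3 or later for `yield from`.

```python
from collections import Counter
from itertools import product


def multiset_combinations(items, k, keep=lambda partial: True):
    """Yield each distinct k-item combination of the multiset `items` once.

    `keep` is a hypothetical pruning hook: it receives a partial
    combination (a tuple) and returns False to abandon that branch.
    """
    counts = sorted(Counter(items).items())  # [(value, multiplicity), ...]

    def extend(index, remaining, partial):
        if remaining == 0:
            yield partial
            return
        if index == len(counts):
            return  # no distinct values left, but slots still unfilled
        value, available = counts[index]
        # Try the largest allowed number of copies first, then fewer.
        for take in range(min(available, remaining), -1, -1):
            candidate = partial + (value,) * take
            # take == 0 leaves `partial` unchanged, so it needs no re-check.
            if take == 0 or keep(candidate):
                yield from extend(index + 1, remaining - take, candidate)

    yield from extend(0, k, ())


def phrases(words_by_length, lengths):
    """Combine per-length multiset combinations with a Cartesian product.

    words_by_length: dict mapping a word length to a list of candidate words.
    lengths: the word lengths the phrase calls for, repeats included.
    """
    demand = sorted(Counter(lengths).items())  # [(length, times needed), ...]
    pools = [multiset_combinations(words_by_length[n], times)
             for n, times in demand]
    for groups in product(*pools):
        yield tuple(word for group in groups for word in group)


if __name__ == "__main__":
    print(list(multiset_combinations("aab", 2)))
    print(list(phrases({1: ["x"], 2: ["ab", "cd", "ab"]}, [2, 1, 2])))
```

A length that is called for only once reduces to k = 1, so the result degenerates to the plain product when no length repeats. Note that `keep` only prunes inside a single length group; the product across groups is still exhaustive, so pruning across the whole phrase would need the recursion to span the groups as well.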