Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'elif': 0.05; 'matches': 0.07; 'remaining': 0.07; 'result,': 0.07; 'logic': 0.09; 'subset': 0.09; 'target,': 0.09; 'cc:addr:python- list': 0.11; 'def': 0.12; "'':": 0.16; 'benjamin': 0.16; 'cc:name:python list': 0.16; 'champion': 0.16; 'grouped': 0.16; 'iterates': 0.16; 'length,': 0.16; 'redundancy': 0.16; 'redundant': 0.16; 'simplified': 0.16; 'subject:combinations': 0.16; 'subject:generator': 0.16; 'wrote:': 0.18; 'bit': 0.19; 'version.': 0.19; 'seems': 0.21; '+0000': 0.22; 'cc:addr:python.org': 0.22; 'helper': 0.24; 'cc:2**0': 0.24; 'this:': 0.26; 'header:In-Reply-To:1': 0.27; 'function': 0.29; 'words': 0.29; 'involving': 0.30; 'message-id:@mail.gmail.com': 0.30; 'lists': 0.32; 'skip:m 30': 0.32; 'guess': 0.33; 'totally': 0.33; 'skip:s 30': 0.35; 'something': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'version': 0.36; 'yield': 0.36; "i'll": 0.36; 'subject:?': 0.36; 'list': 0.37; 'nov': 0.38; 'length': 0.61; 'new': 0.61; 'john': 0.61; 'more': 0.64; 'different': 0.65; 'oscar': 0.84; 'recursion,': 0.84; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=ooOKaVtn3EYqMprGObXFLbyrs1qMgzdb7zG+3aiPSC4=; b=e5mYzsnlmOvRi11Q59V95ulUzSLKrpwgyOrGJfZ/xiHEEcIHoztxCKyv2WqbPsuFis YqPoXIQXoAMLcWemrk79kVumjs5Z8aCf9wUyPDp2Kxgi5hKNqIxAI3XkwU0b/L2h0QcW JeWMy404yufNg0nrnt4CbuQ3ByFaiEY6Cf50/3Uzkh+f326VEn1cTEuKLebYkXNQTsuy oSLh/vCRnhYbgoQpjmbVOuyPB58KwgPKsBzABz5kxvaYDjJOP6EiAcqSXPRUhw6+d3tO MVNwU0mS1HEb8SiaM6Anfh5VJVy4qE1Y4UE1venB8dAI3twqRllzHjW8Fou0rYnwioKW IisA== X-Received: by 10.52.171.79 with SMTP id as15mr30436528vdc.1.1385592624671; Wed, 27 Nov 2013 14:50:24 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20131127212512.21dbd3a3@mini.home> References: <20131121174614.53450d51@mini.home> <20131122000115.3eb9e560@mini.home> <20131126171841.521bf1c8@mini.home> <20131127212512.21dbd3a3@mini.home> From: Oscar Benjamin Date: Wed, 27 Nov 2013 22:50:04 +0000 Subject: Re: Recursive generator for combinations of a multiset? To: "John O'Hagan" Content-Type: text/plain; charset=ISO-8859-1 Cc: Python List X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 56 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1385592633 news.xs4all.nl 15959 [2001:888:2000:d::a6]:51455 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:60645 On 27 November 2013 10:25, John O'Hagan wrote: > On Tue, 26 Nov 2013 10:33:06 +0000 > Oscar Benjamin wrote: > > I simplified it a bit more to this: > > def word_sequences(prepend, target, subwords): > """subwords is a list of lists of subwords grouped by length, > in order of length""" > for words in subwords: > for word in words: > recurse_target = subset_subtract(target, word) > if recurse_target: > yield from word_sequences(prepend + ' ' + word, > recurse_target, subwords[:len(recurse_target)]) > elif recurse_target == '': > yield prepend + ' ' + word > > with just one function to do the subset testing: > > def subset_subtract(target, word): > for i in word: > if i in target: > target = target.replace(i, '' ,1) > else: > return > return target > > Speed-wise it is somewhat faster than any of my non-duplicate-producing > attempts, but still way slower than the current champion, the redundant > cartesian product-only version. > > However, because it iterates through all the remaining words on each > recursion, it seems to produce n! of each unique result, where n in the > number of words in the result, so this is the new champion as far as > redundancy is concerned. I'll keep working on it, the totally > different approach is interesting. Whoops, I guess this is what happens when you send untested (pseudo-)code out. It needs an outer helper function that can do something like: def word_sequences_top(target, subwords): for word in copy(subwords): recurse_target = multiset_subtrace(target,word) yield from word_sequences(words, recurse_target, subwords) remove_word_from_list(word, subwords) This way we yield all matches involving the word once and then go on to all matches that don't include the word. Also the partition length logic from your original version can be used in word_sequences to prune recursion branches. Oscar