Date: Sun, 24 Nov 2013 13:28:58 +1100
From: John O'Hagan
To: python-list@python.org
Subject: Re: Recursive generator for combinations of a multiset?
Newsgroups: comp.lang.python

On Sat, 23 Nov 2013 04:23:42 +0000
MRAB wrote:

> On 23/11/2013 00:58, John O'Hagan wrote:
> > On Thu, 21 Nov 2013 12:59:26 -0800
> > Dan Stromberg wrote:
> >
> >> On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan
> >> wrote:
> >>
> >> >
> >> > Short story: the subject says it all, so if you have an answer
> >> > already, fire away. Below is the long story of what I'm using it
> >> > for, and why I think it needs to be recursive. It may even be of
> >> > more general interest in terms of filtering the results of
> >> > generators.
> >> >
> >>
> >> I think you probably need permutations rather than combinations.
> >>
> >> Also, I think you'll need to form a word (partitioned off by
> >> spaces), and then check it against a set
> >> containing /usr/share/dict/words before recursing for the
> >> remainder of the sentence - this should speed things up a LOT.
> >
> > Thanks for the reply. If I understand you correctly, you are
> > suggesting permuting the input _characters_ to form words and then
> > seeing if they exist, as opposed to my approach of combining known
> > words and seeing if they are anagrams. (Permutations of words would
> > not help find anagrams as they merely change the word order).
> > Here is an attempt at that:
> >
> > import itertools
> >
> > def anagrams(partition, input_string):
> >     """Find anagrams which fit given partition of input string length."""
> >     if not partition:
> >         yield (), input_string
> >         return
> >     for words, checkstring in anagrams(partition[:-1], input_string):
> >         for word in itertools.permutations(checkstring, partition[-1]):
> >             word = ''.join(word)
> >             if word in WORDS:  # WORDS: collection of dictionary words, defined elsewhere
> >                 # Remove the letters of this word from the remaining pool.
> >                 newstring = checkstring
> >                 for l in word:
> >                     newstring = newstring.replace(l, '', 1)
> >                 yield words + (word,), newstring
> >
> > There are two problems with this. If there are repeated characters
> > in the input, redundant results are produced; a multiset-permutation
> > algorithm would fix this. But the main problem is it is incredibly
> > slow: on my run-of-the-mill laptop, it chokes on anything longer
> > than about 10 characters, spending most of its time rejecting
> > non-words.
> >
> > Or have I misunderstood your suggestion?
> >
> If you want to know how to get unique permutations, have a look here:
>
> http://mail.python.org/pipermail/python-ideas/2013-October/023610.html
>

For this particular problem I don't need multiset permutations but
multiset combinations (see original post). But that thread was a good
read.

This is a little OT, but I did need multiset permutations a couple of
years back for a project involving the generation of musical
structures. There was zero interest here at the time (fair enough!)
and I ended up implementing the C++ function "next_permutation" in
Python.

So it was good to read in that thread that there seems to be some
interest in incorporating multiset combinatorics into itertools
(including your excellent contribution). IMHO the scepticism there
about non-exotic use-cases was unjustified. Leaving aside my probably
atypical uses, it crops up in many ordinary situations dealing with
arrangements of multiple items of several types where each instance of
a type is equivalent.
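The post mentions porting the C++ function next_permutation to Python but does not show that code. As an illustration only, here is a minimal sketch of the standard lexicographic successor step that C++'s std::next_permutation performs; the names next_permutation and multiset_permutations are hypothetical. Because the step compares with < and steps over equal elements, starting from sorted input it visits each distinct arrangement exactly once, which is the "each instance of a type is equivalent" case described above.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def next_permutation(seq: List[T]) -> bool:
    """Rearrange seq in place into its next lexicographic permutation.

    Returns False (and leaves seq sorted ascending) if seq was already
    the last permutation, mirroring C++ std::next_permutation.
    """
    # Find the rightmost i with seq[i] < seq[i + 1]; seq[i+1:] is non-increasing.
    i = len(seq) - 2
    while i >= 0 and not seq[i] < seq[i + 1]:
        i -= 1
    if i < 0:
        seq.reverse()  # wrap around to the first (sorted) permutation
        return False
    # Find the rightmost j > i with seq[i] < seq[j], and swap the two.
    j = len(seq) - 1
    while not seq[i] < seq[j]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    # Reverse the non-increasing tail so it becomes the smallest possible.
    seq[i + 1:] = reversed(seq[i + 1:])
    return True


def multiset_permutations(items: Iterable[T]) -> Iterator[Tuple[T, ...]]:
    """Yield each distinct arrangement of items once, in lexicographic order."""
    seq = sorted(items)
    while True:
        yield tuple(seq)
        if not next_permutation(seq):
            return
```

For the repeated-character case in the anagram code, list(multiset_permutations("aab")) gives the 3 tuples ('a', 'a', 'b'), ('a', 'b', 'a') and ('b', 'a', 'a'), where itertools.permutations("aab") produces 6 tuples, half of them duplicates. No set of seen results is kept, so memory use stays constant however many arrangements there are.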
Take stock control: when stacking a warehouse shelf it doesn't matter
which particular box goes where, only how many of each size. Or
timetabling: if Mrs Smith teaches the same students maths on Tuesdays
and Thursdays, swapping the two classes makes no difference. The same
goes for cyclic and cyclic-multiset ("necklaces") combinatorics, where
the beginning and end of an arrangement are not significant, e.g.
24-hour rostering, laying tiles, etc. And musical scales.

Disclaimer: I am far from expert on combinatorics but seem to end up
using it a lot.

-- 
John
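What the anagram problem actually needs, per the subject line, is combinations of a multiset rather than permutations. The recursive generator from the original post is not in this message, so the following is only a sketch under stated assumptions: the input is any iterable of hashable, mutually orderable items (letters, say), k is the size of sub-multiset wanted, and multiset_combinations is a hypothetical name. The recursion chooses how many copies of the first distinct item to take, from as many as possible down to zero, then recurses on the remaining distinct items, so each distinct combination is produced once and nothing has to be filtered out afterwards.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, List, Tuple


def multiset_combinations(items: Iterable[Hashable], k: int) -> Iterator[Tuple[Hashable, ...]]:
    """Yield each distinct k-element sub-multiset of items exactly once.

    Each tuple is sorted, and tuples come out in lexicographic order
    (assumes the items are mutually orderable, e.g. all strings).
    """
    if k < 0:
        return
    # (item, count) pairs for each distinct item, sorted by item.
    distinct: List[Tuple[Hashable, int]] = sorted(Counter(items).items())

    # capacity[i] = total copies available in distinct[i:]; used to prune
    # branches that can no longer reach size k.
    capacity = [0] * (len(distinct) + 1)
    for i in range(len(distinct) - 1, -1, -1):
        capacity[i] = capacity[i + 1] + distinct[i][1]

    def rec(start: int, remaining: int) -> Iterator[Tuple[Hashable, ...]]:
        if remaining == 0:
            yield ()
            return
        if remaining > capacity[start]:
            return  # not enough copies left among distinct[start:]
        item, available = distinct[start]
        # Take the most copies of this item first, down to none.
        for take in range(min(available, remaining), -1, -1):
            for rest in rec(start + 1, remaining - take):
                yield (item,) * take + rest

    yield from rec(0, k)
```

For example, list(multiset_combinations("banana", 3)) yields the six sub-multisets ('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'n'), ('a', 'b', 'n'), ('a', 'n', 'n') and ('b', 'n', 'n'). A brute-force alternative is set(itertools.combinations(sorted(items), k)), which gives the same result but first generates every duplicate; the recursive version never produces one, which matters when, as Dan suggests for words, you want to test and prune each partial result before recursing.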