MIME-Version: 1.0
In-Reply-To: <20131127212512.21dbd3a3@mini.home>
References: <20131121174614.53450d51@mini.home> <CAHVvXxQ0Kdd91nCmmz6fw0fMiA=1nrPT=DLZAxhrFNhsuS89DA@mail.gmail.com> <20131122000115.3eb9e560@mini.home> <CAHVvXxT=ZSrLiuRGZP_UFZbgQatzfKUpGCHyzA=vVDtDc0BDrg@mail.gmail.com> <20131126171841.521bf1c8@mini.home> <CAHVvXxS9NCzD_DEOT_t9Hg=SKC_FZ4AyeXfOV18SygCT=KmsPQ@mail.gmail.com> <20131127212512.21dbd3a3@mini.home>
From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: Wed, 27 Nov 2013 22:50:04 +0000
Subject: Re: Recursive generator for combinations of a multiset?
To: "John O'Hagan" <research@johnohagan.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Python List <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3326.1385592633.18130.python-list@python.org>

On 27 November 2013 10:25, John O'Hagan <research@johnohagan.com> wrote:
> On Tue, 26 Nov 2013 10:33:06 +0000
> Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
>
> I simplified it a bit more to this:
>
> def word_sequences(prepend, target, subwords):
>     """subwords is a list of lists of subwords grouped by length,
>         in order of length"""
>     for words in subwords:
>         for word in words:
>             recurse_target = subset_subtract(target, word)
>             if recurse_target:
>                 yield from word_sequences(prepend + ' ' + word,
>                         recurse_target, subwords[:len(recurse_target)])
>             elif recurse_target == '':
>                 yield prepend + ' ' + word
>
> with just one function to do the subset testing:
>
> def subset_subtract(target, word):
>     for i in word:
>         if i in target:
>             target = target.replace(i, '' ,1)
>         else:
>             return
>     return target
>
> Speed-wise it is somewhat faster than any of my non-duplicate-producing
> attempts, but still way slower than the current champion, the redundant
> cartesian product-only version.
>
> However, because it iterates through all the remaining words on each
> recursion, it seems to produce n! of each unique result, where n is the
> number of words in the result, so this is the new champion as far as
> redundancy is concerned. I'll keep working on it, the totally
> different approach is interesting.

Whoops, I guess this is what happens when you send untested
(pseudo-)code out. It needs an outer helper function that can do
something like:

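def word_sequences_top(target, subwords):
    subwords = [list(words) for words in subwords]  # copy: words are removed below
    for words in subwords:
        for word in list(words):  # iterate over a snapshot, since words shrinks
            rest = subset_subtract(target, word)
            if rest:
                yield from word_sequences(word, rest, subwords[:len(rest)])
            elif rest == '':
                yield word
            words.remove(word)  # every match using word is done; skip it from now on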

This way we yield all matches involving the word once and then go on
to all matches that don't include the word.
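
Below is a minimal self-contained sketch of the same idea applied at every level of the recursion, not just the top one. The names are hypothetical, the word list is assumed to be flat rather than grouped by length, and collections.Counter stands in for the string-based subset_subtract. Instead of physically removing a word once all matches involving it are exhausted, each call only looks at words from index start onward. Recursing from i rather than i + 1 still lets a word be reused.

```python
from collections import Counter


def multiset_subtract(target, word):
    """Return target minus the letters of word as a Counter, or None if
    word uses some letter more often than target contains it."""
    remaining = Counter(target)
    remaining.subtract(word)
    if any(count < 0 for count in remaining.values()):
        return None
    return +remaining  # unary plus drops letters whose count fell to zero


def word_combinations(target, words, start=0):
    """Yield each multiset of words (as a tuple, in list order) whose letters
    exactly use up target. Words before index start have already been tried
    at an outer level, so skipping them means each combination is produced
    once rather than once per ordering."""
    for i in range(start, len(words)):
        word = words[i]
        if not word:
            continue  # an empty word would recurse forever
        rest = multiset_subtract(target, word)
        if rest is None:
            continue  # word does not fit inside target
        if not rest:
            yield (word,)
        else:
            # Recurse from i, not i + 1, so the same word may be used again
            for tail in word_combinations(rest, words, i):
                yield (word,) + tail
```

For example, list(word_combinations("abc", ["a", "b", "c", "ab", "bc"])) gives [('a', 'b', 'c'), ('a', 'bc'), ('c', 'ab')]. Each combination comes out once, in list order, whereas looping over every word at every level would also yield ('b', 'a', 'c'), ('bc', 'a') and the other orderings.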

Also the partition length logic from your original version can be used
in word_sequences to prune recursion branches.
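
One possible form of length-based pruning, as a hedged sketch: it is not necessarily the partition logic from your original version, and the names are hypothetical. If the flat word list is sorted shortest first, the loop can stop at the first word longer than the number of letters still left, because every later word is at least as long. That is the same cut subwords[:len(recurse_target)] makes in the grouped version, combined here with a start index so each combination is produced once.

```python
from collections import Counter


def pruned_word_combinations(target, words, start=0):
    """Yield each multiset of words (as a tuple) whose letters exactly use up
    target. Assumes words is sorted by length, shortest first, and only
    considers words from index start onward."""
    target = Counter(target)
    letters_left = sum(target.values())
    for i in range(start, len(words)):
        word = words[i]
        if not word:
            continue  # an empty word would recurse forever
        if len(word) > letters_left:
            break  # all later words are at least as long: prune this branch
        rest = Counter(target)
        rest.subtract(word)
        if min(rest.values()) < 0:
            continue  # word needs a letter that target lacks
        rest = +rest  # drop letters whose count fell to zero
        if not rest:
            yield (word,)
        else:
            for tail in pruned_word_combinations(rest, words, i):
                yield (word,) + tail
```

Calling words.sort(key=len) before the first call is enough to satisfy the ordering assumption. With target "ab" and words ["a", "b", "abc"], the word "abc" is never subtracted because the loop stops as soon as it reaches it.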


Oscar