From: Michael Selik <michael.selik@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Detecting repeated subsequences of identical items
Date: Thu, 21 Apr 2016 07:05:29 +0000
Message-ID: <mailman.6.1461222341.23626.python-list@python.org>
References: <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> <CAHzaPEPGvxoMAO7f0=DLBDf=dS1AhTP55Pzm_5ANbkOQ-sZAUA@mail.gmail.com> <CAGgTfkN9T6m51gMuOHAGDEvHi26MAyRVoUC_YX__RV8S7+SsEw@mail.gmail.com>
In-Reply-To: <CAHzaPEPGvxoMAO7f0=DLBDf=dS1AhTP55Pzm_5ANbkOQ-sZAUA@mail.gmail.com>

On Thu, Apr 21, 2016 at 2:55 AM Vlastimil Brom <vlastimil.brom@gmail.com>
wrote:

> 2016-04-21 5:07 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
> > I want to group subsequences.
> > "ABCABCABCDEABCDEFABCABCABCB"
> > ABC ABC ABCDE ABCDE F ABC ABC ABC B
> > or:
> > ABC ABC ABC D E A B C D E F ABC ABC ABC B
>
> if I am not missing something, the latter form of grouping might be
> achieved with the following regex: [snip]
> The former one seems to be more tricky...
>

Right. If the problem is constrained so that a repeated subsequence cannot
itself contain nested repeated subsequences, it's much easier to solve.
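For the constrained case, here is a minimal sketch of one way to get the
second grouping (ABC ABC ABC D E A B C D E F ABC ABC ABC B). It is not the
regex Vlastimil posted (that was snipped above); the names RUN and group_runs
are hypothetical. It assumes a single greedy left-to-right scan is acceptable:
at each position it takes the shortest unit that is immediately repeated, and
otherwise emits one character. It never revisits a choice it has made.

```python
import re

# A run is the shortest unit followed immediately by one or more copies of
# itself; anything else is consumed one character at a time.
RUN = re.compile(r"(.+?)\1+|.", re.DOTALL)

def group_runs(s):
    """Scan s left to right and return a list of (unit, count) pairs."""
    groups = []
    for m in RUN.finditer(s):
        unit = m.group(1)
        if unit is None:
            # Second alternative matched: a lone, unrepeated character.
            groups.append((m.group(0), 1))
        else:
            # First alternative matched: unit repeated len(match)/len(unit) times.
            groups.append((unit, len(m.group(0)) // len(unit)))
    return groups

print(group_runs("ABCABCABCDEABCDEFABCABCABCB"))
# [('ABC', 3), ('D', 1), ('E', 1), ('A', 1), ('B', 1), ('C', 1),
#  ('D', 1), ('E', 1), ('F', 1), ('ABC', 3), ('B', 1)]
```

Because the scan is greedy, it cannot produce the first grouping (ABCDE ABCDE
F ...); that needs some search over alternative segmentations.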

If you had "ABCABCABCABC", should that result in
ABC ABC ABC ABC, with 4 repetitions,
or ABCABC ABCABC, with 2 repetitions?
In this example one might say the higher count is obviously better, but I
think it depends on the context. Maybe the user is looking for the longest
patterns rather than the highest counts.
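To make the two readings concrete, a small sketch (the function split_repeats
and its prefer_short flag are hypothetical, not from the thread) that checks
whether a whole string is one unit repeated two or more times. The only
difference between the two policies is a lazy versus a greedy group: the lazy
`(.+?)` finds the shortest unit (highest count), the greedy `(.+)` backtracks
from the full string and finds the longest unit (lowest count). It relies on
backtracking, so treat it as an illustration rather than something efficient
for long inputs.

```python
import re

def split_repeats(s, prefer_short=True):
    """Return (unit, count) if s is one unit repeated 2+ times, else None.

    prefer_short=True uses a lazy group: the shortest unit (highest count) wins.
    prefer_short=False uses a greedy group: the longest unit (lowest count) wins.
    """
    pattern = r"(.+?)\1+" if prefer_short else r"(.+)\1+"
    m = re.fullmatch(pattern, s, re.DOTALL)
    if m is None:
        return None
    unit = m.group(1)
    return unit, len(s) // len(unit)

print(split_repeats("ABCABCABCABC"))                      # ('ABC', 4)
print(split_repeats("ABCABCABCABC", prefer_short=False))  # ('ABCABC', 2)
```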