Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #107437

Re: Detecting repeated subsequences of identical items

From Michael Selik <michael.selik@gmail.com>
Newsgroups comp.lang.python
Subject Re: Detecting repeated subsequences of identical items
Date 2016-04-21 06:35 +0000
Message-ID <mailman.3.1461220545.23626.python-list@python.org> (permalink)
References <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> <CAGgTfkMFqsiqfb-bV7e10D+FNXCh-TLeSr5vP37xdffNy-a0aw@mail.gmail.com>

Show all headers | View raw


On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <steve@pearwood.info>
wrote:

> I want to group [repeated] subsequences. For example, I have:
> "ABCABCABCDEABCDEFABCABCABCB"
> and I want to group it into repeating subsequences. I can see two
> ways... How can I do this? Does this problem have a standard name and/or
> solution?
>

I'm not aware of a standard name. This sounds like an unsupervised learning
problem. There's no objectively correct answer unless you add more
specificity to the problem statement.

Regexes may sound tempting at first, but because a repeating subsequence
may have nested repeating subsequences and this can go on infinitely, I
think we at least need a push-down automata.

I checked out some links for clustering algorithms that work on series
subsequences and I found some fun results.

Clustering is meaningless!
http://www.cs.ucr.edu/~eamonn/meaningless.pdf

I think you're in "no free lunch" territory. "Clustering of subsequence
time series remains an open issue in time series clustering"
http://www.hindawi.com/journals/tswj/2014/312521/

Any more detail on the problem to add constraints?

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 13:07 +1000
  Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 20:57 -0700
  Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 21:15 -0700
  Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-21 15:37 +1000
  Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:35 +0000
    Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-04-21 18:05 +1000
      Re: Detecting repeated subsequences of identical items Nobody <nobody@nowhere.invalid> - 2016-04-21 13:02 +0100
  Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:49 +0000
  Re: Detecting repeated subsequences of identical items Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-04-21 08:54 +0200
  Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 07:05 +0000
  Re: Detecting repeated subsequences of identical items Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2016-04-21 09:25 +0200
  Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 09:53 +0100
    Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 22:15 +1000
      Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:01 +0100
      Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 00:12 +1000
        Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-23 01:00 +1000
      Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:30 +0100
      Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 01:02 +1000
  Re: Detecting repeated subsequences of identical items Serhiy Storchaka <storchaka@gmail.com> - 2016-04-21 14:56 +0300

csiph-web