Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #107438
| From | Michael Selik <michael.selik@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Detecting repeated subsequences of identical items |
| Date | 2016-04-21 06:49 +0000 |
| Message-ID | <mailman.4.1461221408.23626.python-list@python.org> (permalink) |
| References | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> <CAGgTfkMFqsiqfb-bV7e10D+FNXCh-TLeSr5vP37xdffNy-a0aw@mail.gmail.com> <CAGgTfkPRaq3+s3kuSvGa7k_-tFhYSuimjcm6mfhEWqbd2vY3oQ@mail.gmail.com> |
On Thu, Apr 21, 2016 at 2:35 AM Michael Selik <michael.selik@gmail.com> wrote: > On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <steve@pearwood.info> > wrote: > >> I want to group [repeated] subsequences. For example, I have: >> "ABCABCABCDEABCDEFABCABCABCB" >> and I want to group it into repeating subsequences. I can see two >> ways... How can I do this? Does this problem have a standard name and/or >> solution? >> > > I'm not aware of a standard name. This sounds like an unsupervised > learning problem. There's no objectively correct answer unless you add more > specificity to the problem statement. > > Regexes may sound tempting at first, but because a repeating subsequence > may have nested repeating subsequences and this can go on infinitely, I > think we at least need a push-down automata. > > I checked out some links for clustering algorithms that work on series > subsequences and I found some fun results. > > Clustering is meaningless! > http://www.cs.ucr.edu/~eamonn/meaningless.pdf > > I think you're in "no free lunch" territory. "Clustering of subsequence > time series remains an open issue in time series clustering" > http://www.hindawi.com/journals/tswj/2014/312521/ > > Any more detail on the problem to add constraints? > Some light reading suggests that you can improve your problem by defining a minimum size for a subsequence to qualify. One paper suggests calling these more interesting repetitions a "motif" to use a music metaphor. Looking for any repetitions results in too many trivial results. Is that valid for your usage?
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 13:07 +1000
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 20:57 -0700
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 21:15 -0700
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-21 15:37 +1000
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:35 +0000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-04-21 18:05 +1000
Re: Detecting repeated subsequences of identical items Nobody <nobody@nowhere.invalid> - 2016-04-21 13:02 +0100
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:49 +0000
Re: Detecting repeated subsequences of identical items Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-04-21 08:54 +0200
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 07:05 +0000
Re: Detecting repeated subsequences of identical items Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2016-04-21 09:25 +0200
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 09:53 +0100
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 22:15 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:01 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 00:12 +1000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-23 01:00 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:30 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 01:02 +1000
Re: Detecting repeated subsequences of identical items Serhiy Storchaka <storchaka@gmail.com> - 2016-04-21 14:56 +0300
csiph-web