Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #107437
| Path | csiph.com!news.swapon.de!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Michael Selik <michael.selik@gmail.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: Detecting repeated subsequences of identical items |
| Date | Thu, 21 Apr 2016 06:35:33 +0000 |
| Lines | 29 |
| Message-ID | <mailman.3.1461220545.23626.python-list@python.org> (permalink) |
| References | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> <CAGgTfkMFqsiqfb-bV7e10D+FNXCh-TLeSr5vP37xdffNy-a0aw@mail.gmail.com> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8 |
| X-Trace | news.uni-berlin.de kjrj+OiTj8w/ObOrjnO2DwuyNVdl0F3Nm65DTZ/986ig== |
| Return-Path | <michael.selik@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.038 |
| X-Spam-Evidence | '*H*': 0.92; '*S*': 0.00; 'subsequence': 0.07; 'wed,': 0.15; '2016': 0.16; 'clustering': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'statement.': 0.16; 'subject:Detecting': 0.16; 'subsequences': 0.16; 'to:addr:pearwood.info': 0.16; "to:name:steven d'aprano": 0.16; 'wrote:': 0.16; 'have:': 0.18; 'nested': 0.18; 'first,': 0.20; 'to:2**1': 0.21; 'url:edu': 0.24; 'header:In-Reply-To:1': 0.24; 'skip:" 20': 0.26; 'least': 0.27; 'message-id:@mail.gmail.com': 0.27; 'correct': 0.28; '"no': 0.29; "i'm": 0.30; 'skip:& 30': 0.30; 'checked': 0.31; 'problem': 0.33; "d'aprano": 0.33; 'steven': 0.33; 'open': 0.33; 'this?': 0.34; 'add': 0.34; 'received:google.com': 0.35; 'received:74.125.82': 0.35; 'problem.': 0.35; 'but': 0.36; 'to:addr:python-list': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'detail': 0.38; 'does': 0.39; 'to:addr:python.org': 0.40; 'some': 0.40; 'more': 0.63; 'url:pdf': 0.64; 'series': 0.65; '20,': 0.66; 'url:2014': 0.66; 'results.': 0.67; 'sound': 0.72; 'sounds': 0.76; 'territory.': 0.84; 'url:ucr': 0.84 |
| DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=pNwdSuJ78KlDrpBiTzj2Wf+Xjo3nHzMZrh196CLeV+E=; b=yF3sXEtgz61G+QW5qU9amO+h18JraCaV0jmC1WFxzVnxXUWsZaC4O/q6WRVbMGSqaD ig4I0YW9nORKKazDbUtNcqfYsny7iFmeH6hcsNwIMDOQIRttvsgjbRq3lQ1yBD+mkyUZ 7HrjEPOwa8AZ1j7jYxwg07evr4wwopRRC2SkcEpQBEzV7NQCBurentrDP8Qqy8vDBHnO +leLeQCndZpJzcn3G5qIvH8IS9W+bd/tqW9aWQo6rit5Bl2ZTKMlVJNzlwj6X2tK8xr6 teeLobM8F62wULQuoGuR5R/rS0ts6qEpfxAJnRsQd/ec/MPq1uIzLzAwMprdPvYvyqP5 4uWg== |
| X-Google-DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=pNwdSuJ78KlDrpBiTzj2Wf+Xjo3nHzMZrh196CLeV+E=; b=hkZ3BL3NjPfls6yHA5xgzM2Rsz1OJxsbAbGMUJzcmHls2oCRaT87S37Q7o79ABV8ie bp6VWh9oFgH/apT9Mdxh7Gl3pnN8LBV7NmDdkSXc7b7udVwfCvJNugfl0EuNjc+dd2xw yswM0fiimJNTYfMKK/S9bjdLCjoDaE8NMhIuW0xFkAjXfuHKbn6F0GYKEWwNR+uTVwKD mOu4Cstx4Ud1TlbgztsylPepwJKkINySo7lSb4sBzxMYm77EEm5QhLyJe833/zfHCx83 ALSTPK51P2SMCSwba6ZyHSK16zeZE84DXwvheTvar5UfUismj6ShTqn/OPptcH9Ryevk zwcQ== |
| X-Gm-Message-State | AOPr4FUcBvGFFXQOx8dHYtSqR7AjhUZXn+o8SCXdlChFtvFtuMSSWDrT4bbuO34sfqVdxJyMFtgYELeJt8z8fg== |
| X-Received | by 10.195.6.65 with SMTP id cs1mr6205820wjd.8.1461220543761; Wed, 20 Apr 2016 23:35:43 -0700 (PDT) |
| In-Reply-To | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> |
| X-Content-Filtered-By | Mailman/MimeDel 2.1.22 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.22 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| X-Mailman-Original-Message-ID | <CAGgTfkMFqsiqfb-bV7e10D+FNXCh-TLeSr5vP37xdffNy-a0aw@mail.gmail.com> |
| X-Mailman-Original-References | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> |
| Xref | csiph.com comp.lang.python:107437 |
Show key headers only | View raw
On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <steve@pearwood.info> wrote: > I want to group [repeated] subsequences. For example, I have: > "ABCABCABCDEABCDEFABCABCABCB" > and I want to group it into repeating subsequences. I can see two > ways... How can I do this? Does this problem have a standard name and/or > solution? > I'm not aware of a standard name. This sounds like an unsupervised learning problem. There's no objectively correct answer unless you add more specificity to the problem statement. Regexes may sound tempting at first, but because a repeating subsequence may have nested repeating subsequences and this can go on infinitely, I think we at least need a push-down automata. I checked out some links for clustering algorithms that work on series subsequences and I found some fun results. Clustering is meaningless! http://www.cs.ucr.edu/~eamonn/meaningless.pdf I think you're in "no free lunch" territory. "Clustering of subsequence time series remains an open issue in time series clustering" http://www.hindawi.com/journals/tswj/2014/312521/ Any more detail on the problem to add constraints?
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 13:07 +1000
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 20:57 -0700
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 21:15 -0700
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-21 15:37 +1000
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:35 +0000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-04-21 18:05 +1000
Re: Detecting repeated subsequences of identical items Nobody <nobody@nowhere.invalid> - 2016-04-21 13:02 +0100
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:49 +0000
Re: Detecting repeated subsequences of identical items Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-04-21 08:54 +0200
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 07:05 +0000
Re: Detecting repeated subsequences of identical items Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2016-04-21 09:25 +0200
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 09:53 +0100
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 22:15 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:01 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 00:12 +1000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-23 01:00 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:30 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 01:02 +1000
Re: Detecting repeated subsequences of identical items Serhiy Storchaka <storchaka@gmail.com> - 2016-04-21 14:56 +0300
csiph-web