Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #107452
| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: Detecting repeated subsequences of identical items |
| Date | Thu, 21 Apr 2016 15:01:27 +0100 |
| Lines | 75 |
| Message-ID | <mailman.11.1461247315.23626.python-list@python.org> (permalink) |
| References | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> <CAHVvXxTCQT0Tp3-Pd0O5mqSMDA6s=dqj+DDTpkq1AvYoCYs-7Q@mail.gmail.com> <mailman.7.1461228807.23626.python-list@python.org> <5718c457$0$1605$c3e8da3$5496439d@news.astraweb.com> <CAHVvXxQAPP8-31Nb3vM+=O4wYpXEuH1FH3n05N59KnqpC8C0PQ@mail.gmail.com> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=UTF-8 |
| X-Trace | news.uni-berlin.de wtCpHugHlva/Mf7UVlRCbgGQkRiGtqzC+BFWRvJr7sYQ== |
| Return-Path | <oscar.j.benjamin@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.000 |
| X-Spam-Evidence | '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'elif': 0.04; 'python)': 0.05; 'calls.': 0.07; 'overflow': 0.07; 'raises': 0.07; 'repeated': 0.07; 'cc:addr:python-list': 0.09; 'benjamin': 0.09; 'valueerror': 0.09; 'bug': 0.10; 'exception': 0.13; 'stack': 0.13; 'def': 0.13; 'thu,': 0.15; '2016': 0.16; 'cc:name:python list': 0.16; 'itertools': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'sequence.': 0.16; 'subject:Detecting': 0.16; 'to:addr:pearwood.info': 0.16; "to:name:steven d'aprano": 0.16; 'wrote:': 0.16; 'detect': 0.18; 'have:': 0.18; '>>>': 0.20; 'cc:2**0': 0.20; 'cc:addr:python.org': 0.20; 'import': 0.24; 'header:In-Reply-To:1': 0.24; "doesn't": 0.26; 'skip:" 20': 0.26; 'message-id:@mail.gmail.com': 0.27; 'sequence': 0.27; 'function': 0.28; 'key,': 0.29; 'periodic': 0.29; 'url:wikipedia': 0.29; 'raise': 0.29; 'url:wiki': 0.30; "i'd": 0.31; 'usually': 0.33; "d'aprano": 0.33; 'steven': 0.33; 'though.': 0.33; 'received:google.com': 0.35; "isn't": 0.35; 'item': 0.35; 'but': 0.36; 'there': 0.36; 'url:org': 0.36; 'received:209.85': 0.36; 'pm,': 0.36; 'subject:: ': 0.37; 'two': 0.37; 'received:209': 0.38; 'end': 0.39; 'url:en': 0.39; 'still': 0.40; 'some': 0.40; "you'll": 0.61; 'leading': 0.61; 'matter': 0.63; 'necessarily': 0.63; 'here:': 0.63; 'oscar': 0.84; 'abc': 0.91 |
| DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=1d3HcHLXm83DqC+hE5qbdjDz3hdaDnm/3MlUohICoLg=; b=hP3FL+YF9l8C0Raea+8KnvuUKw9UU4aCRj5ZlIeDKOEL9kLyj5tmDx0iNbR45ZB9Yr nRiQ61s2xLX/Qy/mrFHYgHVSTYtEH6p34r57jn6HaexGMp5YC3n4kZ86VMR0l3lCSIkP 2pIMXcqKOdCYaBVRmzCAGRZkT3mVGVzAJQZ6etJnoDEEGJCxHT7mKprhHtUgxyP4Hprv q122CtfLNn29NQQKgsKZWQexxLH85mfFt9Hij5Xc16GhmickQ7MInsXS0aRlDUz8NfBY IlfBsimVjKVRbKltwv2kRpdkRkYCVbDGgTSYSl3oJWbgRED1OFCoDXXndnRFhaTlR44C D1XQ== |
| X-Google-DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=1d3HcHLXm83DqC+hE5qbdjDz3hdaDnm/3MlUohICoLg=; b=DQP+jFTIGyw4oMU0G2nvPjLT7IjXu//LAeNbMutAmoBwQ0nLdJ++zZiGMT0JkSxNt4 BFxABDmtg3vfhTRdzis2DbP80Ck+Kxbv6deJlU6ws4/DErzU1/LO0FwmrEu/4eb6+S9Y v+EF4O6Ny4ntE4WAyXfeSswpx5UQkWAcj3dYEIiznaP4ArUHAaKnW2URo3EoIxG/8fVx /PhCRNzakhpW/pIoUIvjp9lgq2AWzI9oHimPRb9eEHIzExUrBUDVNyPILybtFwFlXSyK jYh/7e7OOketSyI3eEox/4N20RSZzlo8dxZV7+auF5Z0jR5KuF8Pp+2i1Ukl2CVeX06N rejw== |
| X-Gm-Message-State | AOPr4FXhUlPIJ4axnTUNhOWZiCQrZRIzKbS6xobkLt6t7SK2Abiql7ptorkHpcgczXGe8TUSCWWSyUxlYa4u8Q== |
| X-Received | by 10.25.24.170 with SMTP id 42mr4783311lfy.47.1461247306728; Thu, 21 Apr 2016 07:01:46 -0700 (PDT) |
| In-Reply-To | <5718c457$0$1605$c3e8da3$5496439d@news.astraweb.com> |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.22 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| X-Mailman-Original-Message-ID | <CAHVvXxQAPP8-31Nb3vM+=O4wYpXEuH1FH3n05N59KnqpC8C0PQ@mail.gmail.com> |
| X-Mailman-Original-References | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> <CAHVvXxTCQT0Tp3-Pd0O5mqSMDA6s=dqj+DDTpkq1AvYoCYs-7Q@mail.gmail.com> <mailman.7.1461228807.23626.python-list@python.org> <5718c457$0$1605$c3e8da3$5496439d@news.astraweb.com> |
| Xref | csiph.com comp.lang.python:107452 |
Show key headers only | View raw
On 21 April 2016 at 13:15, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 21 Apr 2016 06:53 pm, Oscar Benjamin wrote:
>
>> On 21 April 2016 at 04:07, Steven D'Aprano <steve@pearwood.info> wrote:
>>> I want to group repeated items in a sequence. For example, I can group
>>> repeated sequences of a single item at a time using groupby:
>>>
>>>
>>> from itertools import groupby
>>> for key, group in groupby("AAAABBCDDEEEFFFF"):
>>> group = list(group)
>>> print(key, "count =", len(group))
>>>
>>>
>>> outputs:
>>>
>>> A count = 4
>>> B count = 2
>>> C count = 1
>>> D count = 2
>>> E count = 3
>>> F count = 4
>>>
>>>
>>> Now I want to group subsequences. For example, I have:
>>>
>>> "ABCABCABCDEABCDEFABCABCABCB"
>>>
>>> and I want to group it into repeating subsequences. I can see two ways to
>>> group it:
>>>
>>> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>>
>> There are some algorithms (helpfully shown in Python) here:
>>
>> https://en.wikipedia.org/wiki/Cycle_detection
>
> It's not necessarily a cycle though. Consider a sequence of function calls:
>
> def f(x):
> return g(x)
>
> def g(x):
> if x < 7:
> return h(x)
> elif x < 50:
> return g(x//2)
> else:
> return x+f(x-1)
>
> def h(x):
> raise ValueError # oops, a bug
>
>
> and a function call f(54). That will result in the chain of calls:
>
> f(54) -> g(54) -> f(53) -> g(53) -> f(52) -> g(52) -> f(51) -> g(51) ->
> f(50) -> g(50) -> g(25) -> g(12) -> g(6) -> h(6) raises
>
> So you have that almost-cycle f calls g calls f, but it isn't periodic
> because you break out of it. I'd still like to detect the repeated f->g
> calls.
It doesn't matter that you break out of it. It's periodic for some
part and the algorithms listed on that page can find the cycle. In the
recursive stack overflow case what you'll usually have is
1) A few frames leading up to the start of recursion
2) A long repetitive sequence of frames
3) A few frames at the end showing how the exception was ultimately triggered.
You just need to find the cycle that makes that big long sequence.
--
Oscar
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 13:07 +1000
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 20:57 -0700
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 21:15 -0700
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-21 15:37 +1000
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:35 +0000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-04-21 18:05 +1000
Re: Detecting repeated subsequences of identical items Nobody <nobody@nowhere.invalid> - 2016-04-21 13:02 +0100
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:49 +0000
Re: Detecting repeated subsequences of identical items Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-04-21 08:54 +0200
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 07:05 +0000
Re: Detecting repeated subsequences of identical items Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2016-04-21 09:25 +0200
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 09:53 +0100
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 22:15 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:01 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 00:12 +1000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-23 01:00 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:30 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 01:02 +1000
Re: Detecting repeated subsequences of identical items Serhiy Storchaka <storchaka@gmail.com> - 2016-04-21 14:56 +0300
csiph-web