Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #107432 > unrolled thread
| Started by | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| First post | 2016-04-21 13:07 +1000 |
| Last post | 2016-04-21 14:56 +0300 |
| Articles | 19 — 10 participants |
Back to article view | Back to comp.lang.python
Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 13:07 +1000
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 20:57 -0700
Re: Detecting repeated subsequences of identical items Ethan Furman <ethan@stoneleaf.us> - 2016-04-20 21:15 -0700
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-21 15:37 +1000
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:35 +0000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-04-21 18:05 +1000
Re: Detecting repeated subsequences of identical items Nobody <nobody@nowhere.invalid> - 2016-04-21 13:02 +0100
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 06:49 +0000
Re: Detecting repeated subsequences of identical items Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-04-21 08:54 +0200
Re: Detecting repeated subsequences of identical items Michael Selik <michael.selik@gmail.com> - 2016-04-21 07:05 +0000
Re: Detecting repeated subsequences of identical items Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2016-04-21 09:25 +0200
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 09:53 +0100
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-21 22:15 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:01 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 00:12 +1000
Re: Detecting repeated subsequences of identical items Steven D'Aprano <steve@pearwood.info> - 2016-04-23 01:00 +1000
Re: Detecting repeated subsequences of identical items Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-04-21 15:30 +0100
Re: Detecting repeated subsequences of identical items Chris Angelico <rosuav@gmail.com> - 2016-04-22 01:02 +1000
Re: Detecting repeated subsequences of identical items Serhiy Storchaka <storchaka@gmail.com> - 2016-04-21 14:56 +0300
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-04-21 13:07 +1000 |
| Subject | Detecting repeated subsequences of identical items |
| Message-ID | <571843f9$0$1585$c3e8da3$5496439d@news.astraweb.com> |
I want to group repeated items in a sequence. For example, I can group
repeated sequences of a single item at a time using groupby:
from itertools import groupby
for key, group in groupby("AAAABBCDDEEEFFFF"):
group = list(group)
print(key, "count =", len(group))
outputs:
A count = 4
B count = 2
C count = 1
D count = 2
E count = 3
F count = 4
Now I want to group subsequences. For example, I have:
"ABCABCABCDEABCDEFABCABCABCB"
and I want to group it into repeating subsequences. I can see two ways to
group it:
ABC ABC ABCDE ABCDE F ABC ABC ABC B
giving counts:
(ABC) count = 2
(ABCDE) count = 2
F count = 1
(ABC) count = 3
B repeats 1 time
or:
ABC ABC ABC D E A B C D E F ABC ABC ABC B
giving counts:
(ABC) count = 3
D count = 1
E count = 1
A count = 1
B count = 1
C count = 1
D count = 1
E count = 1
F count = 1
(ABC) count = 3
B count = 1
How can I do this? Does this problem have a standard name and/or solution?
--
Steven
[toc] | [next] | [standalone]
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2016-04-20 20:57 -0700 |
| Message-ID | <mailman.0.1461210969.23626.python-list@python.org> |
| In reply to | #107432 |
On 04/20/2016 08:07 PM, Steven D'Aprano wrote: > Now I want to group subsequences. For example, I have: > > "ABCABCABCDEABCDEFABCABCABCB" > > and I want to group it into repeating subsequences. I can see two ways to > group it: > > ABC ABC ABCDE ABCDE F ABC ABC ABC B > > giving counts: > > (ABC) count = 2 > (ABCDE) count = 2 > F count = 1 > (ABC) count = 3 > B repeats 1 time or ABC ABC ABC D E A B C D E F ABC ABC B -- ~Ethan~
[toc] | [prev] | [next] | [standalone]
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2016-04-20 21:15 -0700 |
| Message-ID | <mailman.1.1461212038.23626.python-list@python.org> |
| In reply to | #107432 |
On 04/20/2016 08:57 PM, Ethan Furman wrote: > [snip same pattern as Steven wrote] Nevermind. It's obviously time for me to go to bed. :/ -- ~Ethan~
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-21 15:37 +1000 |
| Message-ID | <mailman.2.1461217068.23626.python-list@python.org> |
| In reply to | #107432 |
On Thu, Apr 21, 2016 at 1:07 PM, Steven D'Aprano <steve@pearwood.info> wrote: > Now I want to group subsequences. For example, I have: > > "ABCABCABCDEABCDEFABCABCABCB" > > and I want to group it into repeating subsequences. I can see two ways to > group it: > > ABC ABC ABCDE ABCDE F ABC ABC ABC B > > or: > > ABC ABC ABC D E A B C D E F ABC ABC ABC B Interesting. I've *almost* managed to (ab)use re.split for this purpose. A one-step solution can be done with re.match: >>> txt = "ABCABCABCDEABCDEFABCABCABCB" >>> re.match(r'(.+)\1+', txt) <_sre.SRE_Match object; span=(0, 9), match='ABCABCABC'> But split then returns only the grouped part: >>> re.split(r'(.+)\1+', txt) ['', 'ABC', 'DEABCDEF', 'ABC', 'B'] or *all* the grouped parts: >>> re.split(r'((.+)\2+)', txt) ['', 'ABCABCABC', 'ABC', 'DEABCDEF', 'ABCABCABC', 'ABC', 'B'] There's definitely a partial solution happening here, but I can't quite make it work. And no, I don't know if there's a standard name for it. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Michael Selik <michael.selik@gmail.com> |
|---|---|
| Date | 2016-04-21 06:35 +0000 |
| Message-ID | <mailman.3.1461220545.23626.python-list@python.org> |
| In reply to | #107432 |
On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <steve@pearwood.info> wrote: > I want to group [repeated] subsequences. For example, I have: > "ABCABCABCDEABCDEFABCABCABCB" > and I want to group it into repeating subsequences. I can see two > ways... How can I do this? Does this problem have a standard name and/or > solution? > I'm not aware of a standard name. This sounds like an unsupervised learning problem. There's no objectively correct answer unless you add more specificity to the problem statement. Regexes may sound tempting at first, but because a repeating subsequence may have nested repeating subsequences and this can go on infinitely, I think we at least need a push-down automata. I checked out some links for clustering algorithms that work on series subsequences and I found some fun results. Clustering is meaningless! http://www.cs.ucr.edu/~eamonn/meaningless.pdf I think you're in "no free lunch" territory. "Clustering of subsequence time series remains an open issue in time series clustering" http://www.hindawi.com/journals/tswj/2014/312521/ Any more detail on the problem to add constraints?
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2016-04-21 18:05 +1000 |
| Message-ID | <571889d8$0$2820$c3e8da3$76491128@news.astraweb.com> |
| In reply to | #107437 |
On Thursday 21 April 2016 16:35, Michael Selik wrote: > On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <steve@pearwood.info> > wrote: > >> I want to group [repeated] subsequences. For example, I have: >> "ABCABCABCDEABCDEFABCABCABCB" >> and I want to group it into repeating subsequences. I can see two >> ways... How can I do this? Does this problem have a standard name and/or >> solution? >> > > I'm not aware of a standard name. This sounds like an unsupervised > learning problem. There's no objectively correct answer unless you add > more specificity to the problem statement. > > Regexes may sound tempting at first, Ah, I may have mislead you all. I cannot use regexes, since the *actual* sequences I'm working on are sequences (lists, tuples, etc) or iterators of arbitrary items. The items themselves may be strings, but the sequence itself is definitely not a string. I just showed a string for convenience. Sorry about that. So no regexes. > but because a repeating subsequence > may have nested repeating subsequences and this can go on infinitely, I > think we at least need a push-down automata. Fortunately, for my *immediate* problem, I would be good with some fairly arbitrary restrictions on subsequence detection. > I checked out some links for clustering algorithms that work on series > subsequences and I found some fun results. > > Clustering is meaningless! > http://www.cs.ucr.edu/~eamonn/meaningless.pdf > > I think you're in "no free lunch" territory. "Clustering of subsequence > time series remains an open issue in time series clustering" > http://www.hindawi.com/journals/tswj/2014/312521/ > > Any more detail on the problem to add constraints? The specific problem I am trying to solve is that I have a sequence of strings (in this case, error messages from a Python traceback) and I'm looking for repeated groups that may indicate mutually recursive calls. E.g. suppose I have a function f which calls g, and g calls h, and h calls f again, and there's an exception, you will see a traceback in part: File "<stdin>", line 2, in f File "<stdin>", line 5, in g File "<stdin>", line 9, in h File "<stdin>", line 2, in f File "<stdin>", line 5, in g File "<stdin>", line 9, in h File "<stdin>", line 2, in f File "<stdin>", line 5, in g File "<stdin>", line 9, in h File "<stdin>", line 7, in f File "<stdin>", line 5, in g File "<stdin>", line 9, in h etc. Note that I only care about lines which are identical, e.g. if the line numbers differ (as in the last call to f), they will be treated as distinct items. So I'd like to group the above as: File "<stdin>", line 2, in f File "<stdin>", line 5, in g File "<stdin>", line 9, in h *** above 3 calls repeated 3 times *** File "<stdin>", line 7, in f File "<stdin>", line 5, in g File "<stdin>", line 9, in h -- Steve
[toc] | [prev] | [next] | [standalone]
| From | Nobody <nobody@nowhere.invalid> |
|---|---|
| Date | 2016-04-21 13:02 +0100 |
| Message-ID | <pan.2016.04.21.12.03.44.178000@nowhere.invalid> |
| In reply to | #107443 |
On Thu, 21 Apr 2016 18:05:40 +1000, Steven D'Aprano wrote:
> The specific problem I am trying to solve is that I have a sequence of
> strings (in this case, error messages from a Python traceback) and I'm
> looking for repeated groups that may indicate mutually recursive calls. E.g.
> suppose I have a function f which calls g, and g calls h, and h calls f
> again, and there's an exception, you will see a traceback in part:
This is a specific case of finding cycles in a directed graph. But
treating it as such probably isn't useful here, because you're interested
in a specific traversal of that graph rather than the graph itself.
One way to approach it is:
sofar = []
for line in traceback:
if line in sofar:
j = sofar.index(line)
if sofar[:j] == sofar[j:j*2]:
# found repeat
sofar = [line] + sofar
Note that sofar needs to be in reverse order, because list doesn't have
.rindex() or .rfind().
Detecting nested cycles is somewhat harder because given e.g.
ababxabababababxababab
you'd want the five repeats of ab in the middle to be treated as two
repeats of ab followed by three repeats of ab, but there's no way to
spot that until later.
[toc] | [prev] | [next] | [standalone]
| From | Michael Selik <michael.selik@gmail.com> |
|---|---|
| Date | 2016-04-21 06:49 +0000 |
| Message-ID | <mailman.4.1461221408.23626.python-list@python.org> |
| In reply to | #107432 |
On Thu, Apr 21, 2016 at 2:35 AM Michael Selik <michael.selik@gmail.com> wrote: > On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano <steve@pearwood.info> > wrote: > >> I want to group [repeated] subsequences. For example, I have: >> "ABCABCABCDEABCDEFABCABCABCB" >> and I want to group it into repeating subsequences. I can see two >> ways... How can I do this? Does this problem have a standard name and/or >> solution? >> > > I'm not aware of a standard name. This sounds like an unsupervised > learning problem. There's no objectively correct answer unless you add more > specificity to the problem statement. > > Regexes may sound tempting at first, but because a repeating subsequence > may have nested repeating subsequences and this can go on infinitely, I > think we at least need a push-down automata. > > I checked out some links for clustering algorithms that work on series > subsequences and I found some fun results. > > Clustering is meaningless! > http://www.cs.ucr.edu/~eamonn/meaningless.pdf > > I think you're in "no free lunch" territory. "Clustering of subsequence > time series remains an open issue in time series clustering" > http://www.hindawi.com/journals/tswj/2014/312521/ > > Any more detail on the problem to add constraints? > Some light reading suggests that you can improve your problem by defining a minimum size for a subsequence to qualify. One paper suggests calling these more interesting repetitions a "motif" to use a music metaphor. Looking for any repetitions results in too many trivial results. Is that valid for your usage?
[toc] | [prev] | [next] | [standalone]
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
|---|---|
| Date | 2016-04-21 08:54 +0200 |
| Message-ID | <mailman.5.1461221647.23626.python-list@python.org> |
| In reply to | #107432 |
2016-04-21 5:07 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
> I want to group repeated items in a sequence. For example, I can group
> repeated sequences of a single item at a time using groupby:
>
>
> from itertools import groupby
> for key, group in groupby("AAAABBCDDEEEFFFF"):
> group = list(group)
> print(key, "count =", len(group))
>
>
> outputs:
>
> A count = 4
> B count = 2
> C count = 1
> D count = 2
> E count = 3
> F count = 4
>
>
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 2
> (ABCDE) count = 2
> F count = 1
> (ABC) count = 3
> B repeats 1 time
>
>
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 3
> D count = 1
> E count = 1
> A count = 1
> B count = 1
> C count = 1
> D count = 1
> E count = 1
> F count = 1
> (ABC) count = 3
> B count = 1
>
>
>
> How can I do this? Does this problem have a standard name and/or solution?
>
>
>
>
> --
> Steven
>
> --
> https://mail.python.org/mailman/listinfo/python-list
Hi,
if I am not missing something, the latter form of grouping might be
achieved with the following regex:
t="ABCABCABCDEABCDEFABCABCABCB"
grouped = re.findall(r"((?:(\w+?)\2+)|\w+?)", t)
print(grouped)
for grp, subseq in grouped:
if subseq:
print(subseq, grp.count(subseq))
else:
print(grp, "1")
the printed output is:
[('ABCABCABC', 'ABC'), ('D', ''), ('E', ''), ('A', ''), ('B', ''),
('C', ''), ('D', ''), ('E', ''), ('F', ''), ('ABCABCABC', 'ABC'),
('B', '')]
ABC 3
D 1
E 1
A 1
B 1
C 1
D 1
E 1
F 1
ABC 3
B 1
The former one seems to be more tricky...
hth,
vbr
[toc] | [prev] | [next] | [standalone]
| From | Michael Selik <michael.selik@gmail.com> |
|---|---|
| Date | 2016-04-21 07:05 +0000 |
| Message-ID | <mailman.6.1461222341.23626.python-list@python.org> |
| In reply to | #107432 |
On Thu, Apr 21, 2016 at 2:55 AM Vlastimil Brom <vlastimil.brom@gmail.com> wrote: > 2016-04-21 5:07 GMT+02:00 Steven D'Aprano <steve@pearwood.info>: > > I want to group subsequences. > > "ABCABCABCDEABCDEFABCABCABCB" > > ABC ABC ABCDE ABCDE F ABC ABC ABC B > > or: > > ABC ABC ABC D E A B C D E F ABC ABC ABC B > > if I am not missing something, the latter form of grouping might be > achieved with the following regex: [snip] > The former one seems to be more tricky... > Right. If the problem is constrained to say that repeated subsequences can have no nested repeated subsequences, it's much easier to solve. If you had "ABCABCABCABC" should that result in ABC ABC ABC ABC, with 4 repetitions or ABCABC ABCABC with 2 repetitions? In this example, one might say the higher count is obviously better, but I think it depends on the context. Maybe the user is looking for the biggest patterns rather than the biggest counts.
[toc] | [prev] | [next] | [standalone]
| From | Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> |
|---|---|
| Date | 2016-04-21 09:25 +0200 |
| Message-ID | <87inzbjueb.fsf@universite-de-strasbourg.fr.invalid> |
| In reply to | #107432 |
Steven D'Aprano <steve@pearwood.info> writes: > I want to group repeated items in a sequence. For example, I can group > repeated sequences of a single item at a time using groupby: [...] > Now I want to group subsequences. For example, I have: > > "ABCABCABCDEABCDEFABCABCABCB" > > and I want to group it into repeating subsequences. I can see two ways to > group it: > > ABC ABC ABCDE ABCDE F ABC ABC ABC B [...] > or: > > ABC ABC ABC D E A B C D E F ABC ABC ABC B [...] > How can I do this? Does this problem have a standard name and/or solution? Looks like a tough one. I don't think it has a name (I'm not even sure to be able to formally define it). Lets say it is an instance of "longest repeating substring" (https://en.wikipedia.org/wiki/Longest_repeated_substring_problem -- which really does not say much). Anyway, it looks like a job for a suffix trees. Depending on what you are after, you may also be interested in the sequitur algorithm (http://www.sequitur.info/). -- Alain.
[toc] | [prev] | [next] | [standalone]
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2016-04-21 09:53 +0100 |
| Message-ID | <mailman.7.1461228807.23626.python-list@python.org> |
| In reply to | #107432 |
On 21 April 2016 at 04:07, Steven D'Aprano <steve@pearwood.info> wrote:
> I want to group repeated items in a sequence. For example, I can group
> repeated sequences of a single item at a time using groupby:
>
>
> from itertools import groupby
> for key, group in groupby("AAAABBCDDEEEFFFF"):
> group = list(group)
> print(key, "count =", len(group))
>
>
> outputs:
>
> A count = 4
> B count = 2
> C count = 1
> D count = 2
> E count = 3
> F count = 4
>
>
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
There are some algorithms (helpfully shown in Python) here:
https://en.wikipedia.org/wiki/Cycle_detection
Note that those are for a sequence made as x[n+1] = f(x[n]) for some
function f. In your case that's just the function that gets the next
frame up/down in the call stack.
--
Oscar
--
Oscar
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-04-21 22:15 +1000 |
| Message-ID | <5718c457$0$1605$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #107444 |
On Thu, 21 Apr 2016 06:53 pm, Oscar Benjamin wrote:
> On 21 April 2016 at 04:07, Steven D'Aprano <steve@pearwood.info> wrote:
>> I want to group repeated items in a sequence. For example, I can group
>> repeated sequences of a single item at a time using groupby:
>>
>>
>> from itertools import groupby
>> for key, group in groupby("AAAABBCDDEEEFFFF"):
>> group = list(group)
>> print(key, "count =", len(group))
>>
>>
>> outputs:
>>
>> A count = 4
>> B count = 2
>> C count = 1
>> D count = 2
>> E count = 3
>> F count = 4
>>
>>
>> Now I want to group subsequences. For example, I have:
>>
>> "ABCABCABCDEABCDEFABCABCABCB"
>>
>> and I want to group it into repeating subsequences. I can see two ways to
>> group it:
>>
>> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> There are some algorithms (helpfully shown in Python) here:
>
> https://en.wikipedia.org/wiki/Cycle_detection
It's not necessarily a cycle though. Consider a sequence of function calls:
def f(x):
return g(x)
def g(x):
if x < 7:
return h(x)
elif x < 50:
return g(x//2)
else:
return x+f(x-1)
def h(x):
raise ValueError # oops, a bug
and a function call f(54). That will result in the chain of calls:
f(54) -> g(54) -> f(53) -> g(53) -> f(52) -> g(52) -> f(51) -> g(51) ->
f(50) -> g(50) -> g(25) -> g(12) -> g(6) -> h(6) raises
So you have that almost-cycle f calls g calls f, but it isn't periodic
because you break out of it. I'd still like to detect the repeated f->g
calls.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2016-04-21 15:01 +0100 |
| Message-ID | <mailman.11.1461247315.23626.python-list@python.org> |
| In reply to | #107450 |
On 21 April 2016 at 13:15, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 21 Apr 2016 06:53 pm, Oscar Benjamin wrote:
>
>> On 21 April 2016 at 04:07, Steven D'Aprano <steve@pearwood.info> wrote:
>>> I want to group repeated items in a sequence. For example, I can group
>>> repeated sequences of a single item at a time using groupby:
>>>
>>>
>>> from itertools import groupby
>>> for key, group in groupby("AAAABBCDDEEEFFFF"):
>>> group = list(group)
>>> print(key, "count =", len(group))
>>>
>>>
>>> outputs:
>>>
>>> A count = 4
>>> B count = 2
>>> C count = 1
>>> D count = 2
>>> E count = 3
>>> F count = 4
>>>
>>>
>>> Now I want to group subsequences. For example, I have:
>>>
>>> "ABCABCABCDEABCDEFABCABCABCB"
>>>
>>> and I want to group it into repeating subsequences. I can see two ways to
>>> group it:
>>>
>>> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>>
>> There are some algorithms (helpfully shown in Python) here:
>>
>> https://en.wikipedia.org/wiki/Cycle_detection
>
> It's not necessarily a cycle though. Consider a sequence of function calls:
>
> def f(x):
> return g(x)
>
> def g(x):
> if x < 7:
> return h(x)
> elif x < 50:
> return g(x//2)
> else:
> return x+f(x-1)
>
> def h(x):
> raise ValueError # oops, a bug
>
>
> and a function call f(54). That will result in the chain of calls:
>
> f(54) -> g(54) -> f(53) -> g(53) -> f(52) -> g(52) -> f(51) -> g(51) ->
> f(50) -> g(50) -> g(25) -> g(12) -> g(6) -> h(6) raises
>
> So you have that almost-cycle f calls g calls f, but it isn't periodic
> because you break out of it. I'd still like to detect the repeated f->g
> calls.
It doesn't matter that you break out of it. It's periodic for some
part and the algorithms listed on that page can find the cycle. In the
recursive stack overflow case what you'll usually have is
1) A few frames leading up to the start of recursion
2) A long repetitive sequence of frames
3) A few frames at the end showing how the exception was ultimately triggered.
You just need to find the cycle that makes that big long sequence.
--
Oscar
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-22 00:12 +1000 |
| Message-ID | <mailman.12.1461247946.23626.python-list@python.org> |
| In reply to | #107450 |
On Fri, Apr 22, 2016 at 12:01 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote: > In the recursive stack overflow case what you'll usually have is > > 1) A few frames leading up to the start of recursion > 2) A long repetitive sequence of frames > 3) A few frames at the end showing how the exception was ultimately triggered. > > You just need to find the cycle that makes that big long sequence. If the stack got overflowed, there won't usually be a part 3, as part 2 is the bit that hits sys.recursionlimit (unless increasing the recursion limit by a finite number would solve the problem). For other exceptions, yes, this is what you'd see. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-04-23 01:00 +1000 |
| Message-ID | <571a3c78$0$1597$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #107453 |
On Fri, 22 Apr 2016 12:12 am, Chris Angelico wrote:
> On Fri, Apr 22, 2016 at 12:01 AM, Oscar Benjamin
> <oscar.j.benjamin@gmail.com> wrote:
>> In the recursive stack overflow case what you'll usually have is
>>
>> 1) A few frames leading up to the start of recursion
>> 2) A long repetitive sequence of frames
>> 3) A few frames at the end showing how the exception was ultimately
>> triggered.
>>
>> You just need to find the cycle that makes that big long sequence.
I am not convinced that it is a cycle in the technical sense, or that the
two algorithms that Oscar linked to will necessarily find them. They may,
by chance, happen to find them sometimes, but the way the algorithms work
is by detecting periodic behaviour, and this is not periodic.
Look at the comment for Floyd's algorithm here:
https://en.wikipedia.org/wiki/Cycle_detection
# The hare moves twice as quickly as the tortoise and
# the distance between them increases by 1 at each step.
# Eventually they will both be inside the cycle and then, ...
but that is violated in this scenario. The hare can escape the (non-)cycle
before the tortoise even enters it. Even if both enter the "cycle", there's
no guarantee that the termination condition `tortoise == hare` will ever be
true because the cycle doesn't loop and doesn't go back to the beginning.
Hence it is not a cycle.
> If the stack got overflowed, there won't usually be a part 3, as part
> 2 is the bit that hits sys.recursionlimit (unless increasing the
> recursion limit by a finite number would solve the problem).
Don't just think of breaking the recursion limit. You might be 100 calls
deep in some complex chain of function calls when something raises
ValueError. In principle, there might not even be any recursive calls at
all.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2016-04-21 15:30 +0100 |
| Message-ID | <mailman.13.1461249058.23626.python-list@python.org> |
| In reply to | #107450 |
On 21 April 2016 at 15:12, Chris Angelico <rosuav@gmail.com> wrote:
> On Fri, Apr 22, 2016 at 12:01 AM, Oscar Benjamin
> <oscar.j.benjamin@gmail.com> wrote:
>> In the recursive stack overflow case what you'll usually have is
>>
>> 1) A few frames leading up to the start of recursion
>> 2) A long repetitive sequence of frames
>> 3) A few frames at the end showing how the exception was ultimately triggered.
>>
>> You just need to find the cycle that makes that big long sequence.
>
> If the stack got overflowed, there won't usually be a part 3, as part
> 2 is the bit that hits sys.recursionlimit (unless increasing the
> recursion limit by a finite number would solve the problem). For other
> exceptions, yes, this is what you'd see.
If you have:
def f(x):
return g(x+1)
def g(x):
x = h(x) # <-- stack can overflow inside here
return f(x+1)
# etc.
So you have a long sequence that goes f, g, f, g but at the end the
stack can overflow while (not recursively) calling h leaving a small
non-cyclic part at the end.
--
Oscar
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-04-22 01:02 +1000 |
| Message-ID | <mailman.16.1461250956.23626.python-list@python.org> |
| In reply to | #107450 |
On Fri, Apr 22, 2016 at 12:30 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote: > On 21 April 2016 at 15:12, Chris Angelico <rosuav@gmail.com> wrote: >> On Fri, Apr 22, 2016 at 12:01 AM, Oscar Benjamin >> <oscar.j.benjamin@gmail.com> wrote: >>> In the recursive stack overflow case what you'll usually have is >>> >>> 1) A few frames leading up to the start of recursion >>> 2) A long repetitive sequence of frames >>> 3) A few frames at the end showing how the exception was ultimately triggered. >>> >>> You just need to find the cycle that makes that big long sequence. >> >> If the stack got overflowed, there won't usually be a part 3, as part >> 2 is the bit that hits sys.recursionlimit (unless increasing the >> recursion limit by a finite number would solve the problem). For other >> exceptions, yes, this is what you'd see. > > If you have: > > def f(x): > return g(x+1) > > def g(x): > x = h(x) # <-- stack can overflow inside here > return f(x+1) > > # etc. > > So you have a long sequence that goes f, g, f, g but at the end the > stack can overflow while (not recursively) calling h leaving a small > non-cyclic part at the end. Right, good point. Forgot about that. So this situation can be triggered by anywhere up to <cycle length> lines up. Not as helpful an optimization now. Oh well. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2016-04-21 14:56 +0300 |
| Message-ID | <mailman.10.1461239778.23626.python-list@python.org> |
| In reply to | #107432 |
On 21.04.16 06:07, Steven D'Aprano wrote: > Now I want to group subsequences. For example, I have: > > "ABCABCABCDEABCDEFABCABCABCB" > > and I want to group it into repeating subsequences. [...] > How can I do this? Does this problem have a standard name and/or solution? This is a part of lossless data compression algorithms. See for example LZ, LZW.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web