Groups > comp.lang.python > #2322 > unrolled thread
| Started by | candide <candide@free.invalid> |
|---|---|
| First post | 2011-04-01 02:43 +0200 |
| Last post | 2011-04-01 21:39 +0200 |
| Articles | 7 — 5 participants |
Extracting subsequences composed of the same character candide <candide@free.invalid> - 2011-04-01 02:43 +0200
Re: Extracting subsequences composed of the same character MRAB <python@mrabarnett.plus.com> - 2011-04-01 02:16 +0100
Re: Extracting subsequences composed of the same character Roy Smith <roy@panix.com> - 2011-03-31 21:40 -0400
Re: Extracting subsequences composed of the same character Tim Chase <python.list@tim.thechases.com> - 2011-03-31 20:58 -0500
Re: Extracting subsequences composed of the same character Tim Chase <python.list@tim.thechases.com> - 2011-03-31 21:20 -0500
Re: Extracting subsequences composed of the same character Terry Reedy <tjreedy@udel.edu> - 2011-04-01 00:18 -0400
Re: Extracting subsequences composed of the same character candide <candide@free.invalid> - 2011-04-01 21:39 +0200
| From | candide <candide@free.invalid> |
|---|---|
| Date | 2011-04-01 02:43 +0200 |
| Subject | Extracting subsequences composed of the same character |
| Message-ID | <4d952008$0$3943$426a74cc@news.free.fr> |
Suppose you have a string, for instance
"pyyythhooonnn ---> ++++"
and you search for the subsequences composed of the same character; here
you get:
'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
It's not difficult to write Python code that solves the problem, for
instance:
def f(text):
    ch=text
    r=[]
    if not text:
        return r
    else:
        x=ch[0]
        i=0
        for c in ch:
            if c!=x:
                if i>1:
                    r+=[x*i]
                x=c
                i=1
            else:
                i+=1
        return r+(i>1)*[i*x]
print(f("pyyythhooonnn ---> ++++"))
I must confess that this code is rather cumbersome, so I was looking
for an alternative. I imagine that a regular-expression approach could
provide a better method. Does such code exist? Note that the string
is not restricted to the ASCII charset.
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2011-04-01 02:16 +0100 |
| Message-ID | <mailman.59.1301620676.2990.python-list@python.org> |
| In reply to | #2322 |
On 01/04/2011 01:43, candide wrote:
> Suppose you have a string, for instance
>
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character; here
> you get:
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
>
> It's not difficult to write Python code that solves the problem, for
> instance:
>
[snip]
>
> I must confess that this code is rather cumbersome, so I was looking
> for an alternative. I imagine that a regular-expression approach could
> provide a better method. Does such code exist? Note that the string
> is not restricted to the ASCII charset.
>>> import re
>>> s = "pyyythhooonnn ---> ++++"
>>> re.findall(r"((.)\2+)", s)
[('yyy', 'y'), ('hh', 'h'), ('ooo', 'o'), ('nnn', 'n'), ('---', '-'),
('++++', '+')]
>>> [m[0] for m in re.findall(r"((.)\2+)", s)]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
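For readers unfamiliar with backreferences, the pattern above can be spelled out with named groups. In r"((.)\2+)" the inner group captures one character and \2 then requires that same character again, one or more times. The following is only a sketch for Python 3; the names RUN_PATTERN, repeated_runs, run and ch are made up for illustration and do not appear in the thread.

```python
import re

# Named-group spelling of r"((.)\2+)":
#   (?P<ch>.)   captures a single character
#   (?P=ch)+    backreference: the same character, one or more further times
#   (?P<run>..) wraps the whole run so it can be retrieved by name
RUN_PATTERN = re.compile(r"(?P<run>(?P<ch>.)(?P=ch)+)")


def repeated_runs(text):
    """Return every maximal run of two or more identical characters."""
    return [m.group("run") for m in RUN_PATTERN.finditer(text)]


print(repeated_runs("pyyythhooonnn ---> ++++"))
```

Because finditer yields match objects, the run can be read directly and there is no need to pick element 0 out of each tuple as with findall.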
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2011-03-31 21:40 -0400 |
| Message-ID | <roy-7F3220.21403831032011@news.panix.com> |
| In reply to | #2322 |
In article <4d952008$0$3943$426a74cc@news.free.fr>,
candide <candide@free.invalid> wrote:
> Suppose you have a string, for instance
>
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character; here
> you get:
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
I got the following. It's O(n) (with the minor exception that the string
addition isn't, but that's trivial to fix, and in practice, the bunches
are short enough it hardly matters).
#!/usr/bin/env python
s = "pyyythhooonnn ---> ++++"
answer = ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
last = None
bunches = []
bunch = ''
for c in s:
    if c == last:
        bunch += c
    else:
        if bunch:
            bunches.append(bunch)
        bunch = c
        last = c
bunches.append(bunch)
multiples = [bunch for bunch in bunches if len(bunch) > 1]
print(multiples)
assert(multiples == answer)
[eagerly awaiting a PEP for collections.bunch and
collections.frozenbunch]
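One possible way to avoid the repeated string addition mentioned above is to remember where the current run starts and slice it out once it ends. This is a sketch under that assumption, not necessarily the fix the poster had in mind; the names runs_by_slicing and min_len are hypothetical.

```python
def runs_by_slicing(text, min_len=2):
    """Collect runs of identical characters by slicing, without `+=` on strings."""
    runs = []
    start = 0  # index where the current run begins
    # Walk one position past the end so the final run is flushed as well.
    for i in range(1, len(text) + 1):
        if i == len(text) or text[i] != text[start]:
            if i - start >= min_len:
                runs.append(text[start:i])
            start = i
    return runs


print(runs_by_slicing("pyyythhooonnn ---> ++++"))
```

Each character is compared once and copied at most once, and an empty string simply yields an empty list.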
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2011-03-31 20:58 -0500 |
| Message-ID | <mailman.61.1301623137.2990.python-list@python.org> |
| In reply to | #2322 |
On 03/31/2011 07:43 PM, candide wrote:
> Suppose you have a string, for instance
>
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character; here
> you get:
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
>>> import re
>>> s = "pyyythhooonnn ---> ++++"
>>> [m.group(0) for m in re.finditer(r"(.)\1+", s)]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
>>> [(m.group(0),m.group(1)) for m in re.finditer(r"(.)\1+", s)]
[('yyy', 'y'), ('hh', 'h'), ('ooo', 'o'), ('nnn', 'n'), ('---',
'-'), ('++++', '+')]
-tkc
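On the remark that the string is not restricted to ASCII: in Python 3, str is Unicode, so the same pattern should handle accented letters unchanged. One thing to watch is that "." does not match a newline unless re.DOTALL is given, so runs of blank lines are skipped by default. A small sketch, assuming Python 3; the names ANY_CHAR_RUN, LINE_CHAR_RUN and find_runs are illustrative only.

```python
import re

# Same idea as r"(.)\1+"; re.DOTALL additionally lets "." match "\n".
ANY_CHAR_RUN = re.compile(r"(.)\1+", re.DOTALL)
LINE_CHAR_RUN = re.compile(r"(.)\1+")


def find_runs(text, pattern=ANY_CHAR_RUN):
    """Return runs of two or more identical code points found by `pattern`."""
    return [m.group(0) for m in pattern.finditer(text)]


# Three e-acute, one "a", three newlines, two sharp-s.
sample = "\u00e9\u00e9\u00e9a\n\n\n\u00df\u00df"
print(find_runs(sample))                 # includes the run of newlines
print(find_runs(sample, LINE_CHAR_RUN))  # newline run is not reported
```

The comparison is per code point, so a letter written as a base character plus a combining mark counts as two different characters here.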
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2011-03-31 21:20 -0500 |
| Message-ID | <mailman.62.1301624436.2990.python-list@python.org> |
| In reply to | #2322 |
On 03/31/2011 07:43 PM, candide wrote:
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character; here
> you get:
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
Or, if you want to do it with itertools instead of the "re" module:
>>> s = "pyyythhooonnn ---> ++++"
>>> from itertools import groupby
>>> [c*length for c, length in ((k, len(list(g))) for k, g in groupby(s)) if length > 1]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
-tkc
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2011-04-01 00:18 -0400 |
| Message-ID | <mailman.63.1301631548.2990.python-list@python.org> |
| In reply to | #2322 |
On 3/31/2011 10:20 PM, Tim Chase wrote:
> On 03/31/2011 07:43 PM, candide wrote:
>> "pyyythhooonnn ---> ++++"
>>
>> and you search for the subsequences composed of the same character; here
>> you get:
>>
>> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
>
> Or, if you want to do it with itertools instead of the "re" module:
>
> >>> s = "pyyythhooonnn ---> ++++"
> >>> from itertools import groupby
> >>> [c*length for c, length in ((k, len(list(g))) for k, g in
> groupby(s)) if length > 1]
> ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
Slightly shorter:
[r for r in (''.join(g) for k, g in groupby(s)) if len(r) > 1]
--
Terry Jan Reedy
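The groupby one-liners above can also be wrapped in a small generator, which may be easier to reuse and to read. This is a sketch only; the function name iter_runs and the min_len parameter are assumptions added for illustration.

```python
from itertools import groupby


def iter_runs(text, min_len=2):
    """Lazily yield runs of identical consecutive characters of at least min_len."""
    # groupby() with no key groups consecutive equal items together.
    for _, group in groupby(text):
        run = "".join(group)
        if len(run) >= min_len:
            yield run


print(list(iter_runs("pyyythhooonnn ---> ++++")))
```

Since groupby only merges adjacent equal items, two separate runs of the same character are reported separately, which matches what the regex versions do.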
| From | candide <candide@free.invalid> |
|---|---|
| Date | 2011-04-01 21:39 +0200 |
| Message-ID | <4d962a3e$0$14990$426a34cc@news.free.fr> |
| In reply to | #2322 |
Thanks, your responses gave me the opportunity to understand the "backreference" feature; it was not clear to me in spite of my intensive study of the well-known RE HOWTO manual.