Re: Extracting subsequences composed of the same character

From Terry Reedy <tjreedy@udel.edu>
Subject Re: Extracting subsequences composed of the same character
Date 2011-04-01 00:18 -0400
References <4d952008$0$3943$426a74cc@news.free.fr> <4D95366D.9010706@tim.thechases.com>
Newsgroups comp.lang.python
Message-ID <mailman.63.1301631548.2990.python-list@python.org>


On 3/31/2011 10:20 PM, Tim Chase wrote:
> On 03/31/2011 07:43 PM, candide wrote:
>> "pyyythhooonnn ---> ++++"
>>
>> and you search for the subsequences composed of the same character, here
>> you get:
>>
>> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
>
> Or, if you want to do it with itertools instead of the "re" module:
>
>  >>> s = "pyyythhooonnn ---> ++++"
>  >>> from itertools import groupby
>  >>> [c*length for c, length in ((k, len(list(g))) for k, g in groupby(s)) if length > 1]
> ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

Slightly shorter:
[r for r in (''.join(g) for k, g in groupby(s)) if len(r) > 1]
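
For anyone pasting this into a fresh session, here is a self-contained sketch of the same idea. The function name repeated_runs and the min_len parameter are hypothetical, added only for illustration; the one-liner above is the actual suggestion.

```python
from itertools import groupby


def repeated_runs(s, min_len=2):
    """Return the maximal runs of a repeated character in s.

    repeated_runs and min_len are illustrative names, not part of
    itertools or of the original one-liner.
    """
    # groupby(s) yields (char, iterator) pairs, one per run of
    # consecutive equal characters; joining the iterator rebuilds the run.
    runs = (''.join(g) for _, g in groupby(s))
    # Keep only the runs that are at least min_len characters long.
    return [r for r in runs if len(r) >= min_len]


s = "pyyythhooonnn ---> ++++"
print(repeated_runs(s))  # ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
```

Joining the group directly avoids counting it and then multiplying the key back out, which is where the saving over the quoted version comes from.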

-- 
Terry Jan Reedy
