


Re: regex multiple patterns in order

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2014-01-20 22:10 +1100
Last post: 2014-01-20 22:10 +1100
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#64355 — Re: regex multiple patterns in order

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-20 22:10 +1100
Subject: Re: regex multiple patterns in order
Message-ID: <mailman.5747.1390216250.18130.python-list@python.org>

On Mon, Jan 20, 2014 at 9:44 PM, km <srikrishnamohan@gmail.com> wrote:
> >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAA', 'TCT', 'TA')]
>
> But I instead find only one instance of the CAA/TCT/TA in that order.
> How can I get 3 matches of CAA, followed by four matches of TCT followed by
> 2 matches of TA?
> Well, these patterns (CAA/TCT/TA) can occur any number of times and at least
> once, so I have to use + in the regex.

You're capturing the single instance, not the repeated one. The regex
does match all three CAA units, but a repeated capturing group keeps
only one repetition (the last one it matched). Try this:

>>> p = re.compile('((?:CAA)+)((?:TCT)+)((?:TA)+)')
>>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
[('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA')]

This groups "CAA" with non-capturing parentheses, (?:regex), applies
the + to that group, and then captures the whole repeated run with the
outer parentheses.
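
If what you actually want is the number of repeats rather than the
repeated text, one option is to divide the length of each captured run
by the length of its unit. This is only a rough sketch: UNITS, PATTERN
and count_repeats are names made up for illustration, and it assumes
the units always appear in this fixed order.

```python
import re

# Hypothetical names for illustration; not from the original question.
UNITS = ("CAA", "TCT", "TA")

# Builds '((?:CAA)+)((?:TCT)+)((?:TA)+)' from the unit list:
# one capturing group per unit, each wrapping a repeated non-capturing group.
PATTERN = re.compile("".join("((?:%s)+)" % re.escape(u) for u in UNITS))

def count_repeats(seq):
    """Return one tuple of (unit, repeat count) pairs per match in seq."""
    results = []
    for m in PATTERN.finditer(seq):
        # Each captured run is the unit repeated, so length // unit length
        # gives the number of repetitions.
        results.append(tuple((u, len(g) // len(u))
                             for u, g in zip(UNITS, m.groups())))
    return results

print(count_repeats('CAACAACAATCTTCTTCTTCTTATATA'))
# [(('CAA', 3), ('TCT', 4), ('TA', 3))]
```

Note that the + here is greedy, unlike the +? in the original pattern.
With a lazy +? on the last group the match would stop after the first
TA, so the count for TA would come out as 1.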

ChrisA




