| Started by | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| First post | 2014-01-20 03:09 -0800 |
| Last post | 2014-01-20 03:09 -0800 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: regex multiple patterns in order Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-01-20 03:09 -0800
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| Date | 2014-01-20 03:09 -0800 |
| Subject | Re: regex multiple patterns in order |
| Message-ID | <mailman.5746.1390216232.18130.python-list@python.org> |
On Mon, Jan 20, 2014 at 2:44 AM, km <srikrishnamohan@gmail.com> wrote:
> I am trying to find subsequence patterns, constrained by the order in
> which they occur.
> For example:
>
> >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAA', 'TCT', 'TA')]
>
> But I instead find only one instance of the CAA/TCT/TA in that order.
> How can I get 3 matches of CAA, followed by four matches of TCT followed by
> 2 matches of TA ?
> Well, these patterns (CAA/TCT/TA) can occur any number of times, but at
> least once, so I have to use + in the regex.
You want the '+' inside the capturing parens so that the repetitions
are included in the captured text, but you still need CAA etc. grouped
as a unit for the '+' to apply to; for that, you can use non-capturing
groups, (?:...).
I don't see how TA could ever match twice, though. It'd match once
as-is, or three times if you make the repetition greedy (get rid of
the ?s).
>>> p = re.compile('((?:CAA)+?)((?:TCT)+?)((?:TA)+?)')
>>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
[('CAACAACAA', 'TCTTCTTCTTCT', 'TA')]
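
For the greedy case mentioned above, here is a minimal sketch (not from
the original post) that drops the ?s and then derives how many times
each unit repeated by dividing the length of each captured run by the
length of its unit. The names `greedy`, `seq` and `repeat_counts` are
made up for illustration, and it assumes you only care about the first
ordered CAA/TCT/TA run in the string.

```python
import re

seq = 'CAACAACAATCTTCTTCTTCTTATATA'

# Greedy variant of the pattern above: each group takes as many repeats
# of its unit as it can, so the trailing TA run is captured in full.
greedy = re.compile(r'((?:CAA)+)((?:TCT)+)((?:TA)+)')


def repeat_counts(sequence):
    """Return (unit, count) pairs for the first ordered run, or None."""
    m = greedy.search(sequence)
    if m is None:
        return None
    units = ('CAA', 'TCT', 'TA')
    # Each captured group is a whole number of copies of its unit,
    # so integer division gives the repeat count.
    return [(u, len(g) // len(u)) for u, g in zip(units, m.groups())]


print(greedy.findall(seq))  # [('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA')]
print(repeat_counts(seq))   # [('CAA', 3), ('TCT', 4), ('TA', 3)]
```

Note that the TA count comes out as 3, not 2, which is what the input
string actually contains.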
-- Devin