Re: regex multiple patterns in order

From Ben Finney <ben+python@benfinney.id.au>
Subject Re: regex multiple patterns in order
Date Mon, 20 Jan 2014 22:18:29 +1100
Newsgroups comp.lang.python
Message-ID <mailman.5748.1390216721.18130.python-list@python.org>

km <srikrishnamohan@gmail.com> writes:

> I am trying to find subsequence patterns, constrained by the order
> in which they occur.

There are also specific resources for understanding and testing regex
patterns, such as <URL:http://www.pythonregex.com/>.

> For example
>
> >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAA', 'TCT', 'TA')]
>
> But I instead find only one instance of the CAA/TCT/TA in that order.

Yes, because the grouping operator (the parens ‘()’) in each case
contains exactly “CAA”, “TCT”, or “TA”, so each group reports only a
single unit (the last one it matched), however many times it repeats.
If you want the repetitions to be part of the group, you need the
repetition operator (in your case, ‘+’) to be part of the group.
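
A small sketch of that difference, using the same input string (the
variable names here are only illustrative and are not from the original
post): with the ‘+’ outside the parens the group keeps just the last
repetition, while a capturing group wrapped around a repeated
non-capturing group ‘(?:...)’ keeps the whole run.

```python
import re

seq = 'CAACAACAATCTTCTTCTTCTTATATA'

# Repetition outside the group: the group holds only the last 'CAA' matched.
outside = re.match(r'(CAA)+', seq)
last_repeat = outside.group(1)

# Repetition inside the group: the group holds the entire run of 'CAA's.
inside = re.match(r'((?:CAA)+)', seq)
whole_run = inside.group(1)

print(last_repeat)  # CAA
print(whole_run)    # CAACAACAA
```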

> How can I get 3 matches of CAA, followed by 4 matches of TCT, followed
> by 2 matches of TA?

With a little experimenting I get:

    >>> p = re.compile('((?:CAA)+)?((?:TCT)+)?((?:TA)+)?')
    >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
    [('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA'), ('', '', '')]

Remember that you'll get no more than one group returned for each group
you specify in the pattern.
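
The trailing ('', '', '') appears because every group in that pattern is
optional, so the pattern also matches the empty string at the end of
the input. As a sketch, assuming each unit must occur at least once, the
outer ‘?’ can be dropped, and the repeat counts can be recovered from
the length of each captured run (the names ‘units’, ‘runs’ and ‘counts’
are illustrative):

```python
import re

seq = 'CAACAACAATCTTCTTCTTCTTATATA'
units = ('CAA', 'TCT', 'TA')

# Same grouping as above, but without the outer '?', so no empty match.
p = re.compile(r'((?:CAA)+)((?:TCT)+)((?:TA)+)')

matches = p.findall(seq)
runs = p.search(seq).groups()

# Each run is a whole number of copies of its unit, so divide the lengths.
counts = [len(run) // len(unit) for run, unit in zip(runs, units)]

print(matches)  # [('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA')]
print(counts)   # [3, 4, 3]
```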

> Well, these patterns (CAA/TCT/TA) can occur any number of times and
> at least once, so I have to use + in the regex.

Be aware that regex is not the solution to all parsing problems; for
many parsing problems it is an attractive but inappropriate tool. You
may need to construct a more specific parser for your needs. Even if
it's possible with regex, the resulting pattern may be so complex that
it's better to write it out more explicitly.
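
As one possible illustration of writing it out explicitly, here is a
minimal sketch of a hand-written scanner. The function ‘count_runs’ is
hypothetical, not something from this thread, and it assumes the whole
input must consist of the units in order, each at least once, which is
stricter than searching with a regex.

```python
def count_runs(seq, units):
    """Consume each unit in order, one or more times; return the repeat counts.

    Returns None if a unit is missing or if any input is left over.
    """
    pos = 0
    counts = []
    for unit in units:
        n = 0
        # Greedily consume consecutive copies of this unit.
        while seq.startswith(unit, pos):
            pos += len(unit)
            n += 1
        if n == 0:
            return None  # this unit did not occur where it was expected
        counts.append(n)
    return counts if pos == len(seq) else None


result = count_runs('CAACAACAATCTTCTTCTTCTTATATA', ['CAA', 'TCT', 'TA'])
print(result)  # [3, 4, 3]
```

Because each unit is consumed greedily, this sketch would need more care
if one unit could be mistaken for the start of the next.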

-- 
 \     “To punish me for my contempt of authority, Fate has made me an |
  `\                   authority myself.” —Albert Einstein, 1930-09-18 |
_o__)                                                                  |
Ben Finney
