From: km
To: Ben Finney
Cc: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: regex multiple patterns in order
Date: Mon, 20 Jan 2014 16:57:39 +0530

Aah! I understand now.
Thank you

Regards,
Krishna Mohan



On Mon, Jan 20, 2014 at 4:48 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
km <srikrishnamohan@gmail.com> writes:

> I am trying to find subsequence patterns, but constrained by the order
> in which they occur

There are also specific resources for understanding and testing regex
patterns, such as <URL:http://www.pythonregex.com/>.

> For example
>
> >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAA', 'TCT', 'TA')]
>
> But I instead find only one instance of the CAA/TCT/TA in that order.

Yes, because the grouping operator (the parens ‘()’) in each case
contains exactly “CAA”, “TCT”, “TA”. If you want the repetitions to be
part of the group, you need the repetition operator (in your case, ‘+’)
to be part of the group.
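
To make the point about quantifier placement concrete, here is a minimal sketch (not from the original post) comparing the two placements on a shortened input. The variable names `sequence`, `outside` and `inside` are illustrative only.

```python
import re

sequence = 'CAACAACAA'

# Quantifier outside the capturing group: the group is overwritten on each
# repetition, so only the last 'CAA' is reported.
outside = re.findall(r'(CAA)+', sequence)

# Quantifier inside the capturing group (the inner (?:...) does not capture):
# the group spans the whole run of repetitions.
inside = re.findall(r'((?:CAA)+)', sequence)

print(outside)  # ['CAA']
print(inside)   # ['CAACAACAA']
```

In both cases the overall match covers all nine characters; only what the group reports differs.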

> How can I get 3 matches of CAA, followed by four matches of TCT followed
> by 2 matches of TA ?

With a little experimenting I get:

    >>> p = re.compile('((?:CAA)+)?((?:TCT)+)?((?:TA)+)?')
    >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
    [('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA'), ('', '', '')]

Remember that you'll get no more than one group returned for each group
you specify in the pattern.
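
Since each group returns a single string, one way to recover the repeat counts is to divide the length of each captured run by the length of its motif. The sketch below is an illustration rather than part of the original reply: the helper `count_runs` and the `motifs` tuple are hypothetical names, and the trailing `?` on each group is dropped on the assumption that every motif must occur at least once. Dropping it also avoids the `('', '', '')` entry, which appears because an all-optional pattern can match the empty string at the end of the input.

```python
import re

# Each outer group captures a whole run; the inner (?:...) is non-capturing.
# Without the trailing '?', every run is mandatory ("at least once").
pattern = re.compile(r'((?:CAA)+)((?:TCT)+)((?:TA)+)')
motifs = ('CAA', 'TCT', 'TA')


def count_runs(sequence):
    """Return one {motif: repeat_count} dict per match found in `sequence`."""
    results = []
    for match in pattern.finditer(sequence):
        results.append({
            motif: len(run) // len(motif)
            for motif, run in zip(motifs, match.groups())
        })
    return results


print(count_runs('CAACAACAATCTTCTTCTTCTTATATA'))
# [{'CAA': 3, 'TCT': 4, 'TA': 3}]
```

Note that the sample string ends in `TATATA`, so the count for TA comes out as 3.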

> Well, these patterns (CAA/TCT/TA) can occur any number of times and
> at least once, so I have to use + in the regex.

Be aware that regex is not the solution to all parsing problems; for
many parsing problems it is an attractive but inappropriate tool. You
may need to construct a more specific parser for your needs. Even if
it's possible with regex, the resulting pattern may be so complex that
it's better to write it out more explicitly.
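
As one possible reading of "write it out more explicitly", the following sketch walks the string by hand instead of using a regex. It is an assumption about what such a parser might look like, not code from the thread; `parse_runs` and its `motifs` parameter are hypothetical names.

```python
def parse_runs(sequence, motifs=('CAA', 'TCT', 'TA')):
    """Consume each motif in order, at least once, from the start of `sequence`.

    Returns a list of (motif, repeat_count) pairs, or None if a motif is
    missing or characters are left over after the last motif.
    """
    position = 0
    runs = []
    for motif in motifs:
        count = 0
        # Greedily consume consecutive copies of the current motif.
        while sequence.startswith(motif, position):
            position += len(motif)
            count += 1
        if count == 0:
            return None  # this motif did not occur where it was expected
        runs.append((motif, count))
    if position != len(sequence):
        return None  # trailing characters that belong to no motif
    return runs


print(parse_runs('CAACAACAATCTTCTTCTTCTTATATA'))
# [('CAA', 3), ('TCT', 4), ('TA', 3)]
```

This version never backtracks, which is enough for these three motifs; motifs where one run could swallow the start of the next would need a more careful approach.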

--
 \     “To punish me for my contempt of authority, Fate has made me an |
  `\           authority myself.” -- Albert Einstein, 1930-09-18       |
_o__)                                                                  |
Ben Finney
