From: Ben Finney
Subject: Re: regex multiple patterns in order
Date: Mon, 20 Jan 2014 22:18:29 +1100
To: python-list@python.org
Newsgroups: comp.lang.python

km writes:

> I am trying to find subsequence patterns but constrained by the order
> in which they occur

There are also specific resources for understanding and testing regex
patterns, such as .

> For example
>
> >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAA', 'TCT', 'TA')]
>
> But I instead find only one instance of the CAA/TCT/TA in that order.

Yes, because the grouping operator (the parens ‘()’) in each case
contains exactly “CAA”, “TCT”, “TA”. If you want the repetitions to be
part of the group, you need the repetition operator (in your case, ‘+’)
to be inside the group.

> How can I get 3 matches of CAA, followed by four matches of TCT followed
> by 2 matches of TA?

With a little experimenting I get:

    >>> p = re.compile('((?:CAA)+)?((?:TCT)+)?((?:TA)+)?')
    >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
    [('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA'), ('', '', '')]

Remember that each match gives you only one string for each capturing
group you specify in the pattern, however many times that group repeated.

> Well these patterns (CAA/TCT/TA) can occur any number of times and
> at least once so I have to use + in the regex.

Be aware that regex is not the solution to all parsing problems; for
many parsing problems it is an attractive but inappropriate tool.
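For this particular question both routes are short, so here is a minimal Python 3 sketch of each, assuming (per the question) that CAA, TCT and TA must each appear at least once, in that order, and that what's wanted is how many times each unit repeats. The names RUNS, regex_counts and scan_counts are hypothetical, not from the thread. The regex variant is the pattern above with the trailing `?` removed from each group, which makes every run mandatory and also gets rid of the extra ('', '', '') match; the second function does the same job with a hand-written scanner and no regex at all.

```python
import re

# Each unit sits in a non-capturing group (?:...) so "+" repeats the whole
# unit; the outer capturing group collects the entire run. There is no
# trailing "?", so each run must occur at least once and findall() does not
# return an empty ('', '', '') match.
RUNS = re.compile(r'((?:CAA)+)((?:TCT)+)((?:TA)+)')

UNITS = ('CAA', 'TCT', 'TA')


def regex_counts(seq):
    """Return one tuple of (unit, count) pairs per match of RUNS in seq."""
    results = []
    for groups in RUNS.findall(seq):
        # A captured run is an exact multiple of its unit, so dividing the
        # lengths gives the number of repetitions.
        results.append(tuple((unit, len(run) // len(unit))
                             for unit, run in zip(UNITS, groups)))
    return results


def scan_counts(seq, units=UNITS):
    """Hand-written alternative: consume each unit greedily, in order.

    Returns a tuple of (unit, count) pairs, or None if any unit is missing
    or characters remain after the last run. It does not backtrack.
    """
    pos = 0
    counts = []
    for unit in units:
        count = 0
        while seq.startswith(unit, pos):
            pos += len(unit)
            count += 1
        if count == 0:
            return None  # every unit must appear at least once
        counts.append((unit, count))
    return tuple(counts) if pos == len(seq) else None


if __name__ == '__main__':
    seq = 'CAACAACAATCTTCTTCTTCTTATATA'
    print(regex_counts(seq))  # → [(('CAA', 3), ('TCT', 4), ('TA', 3))]
    print(scan_counts(seq))   # → (('CAA', 3), ('TCT', 4), ('TA', 3))
```

The two are not quite equivalent: findall() looks for runs anywhere in the string, while the scanner insists the whole sequence is made of the three runs. The scanner also relies on greedy consumption being unambiguous, which holds for CAA/TCT/TA but would need more care for units that can overlap.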
You may need to construct a more specific parser for your needs. Even
if it's possible with regex, the resulting pattern may be so complex
that it's better to write it out more explicitly.

-- 
 \       “To punish me for my contempt of authority, Fate has made me an |
  `\                   authority myself.” —Albert Einstein, 1930-09-18 |
_o__)                                                                  |
Ben Finney