From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Help for a complex RE
Date: Sun, 08 May 2016 18:15:25 +0200

Sergio Spina wrote:

> In the following ipython session:
>
>> Python 3.5.1+ (default, Feb 24 2016, 11:28:57)
>> Type "copyright", "credits" or "license" for more information.
>>
>> IPython 2.3.0 -- An enhanced Interactive Python.
>>
>> In [1]: import re
>>
>> In [2]: patt = r"""  # the match pattern is:
>>    ...:     .+          # one or more characters
>>    ...:     [ ]         # followed by a space
>>    ...:     (?=[@#D]:)  # that is followed by one of the
>>    ...:                 # chars "@#D" and a colon ":"
>>    ...: """
>>
>> In [3]: pattern = re.compile(patt, re.VERBOSE)
>>
>> In [4]: m = pattern.match("Jun@i Bun#i @:Janji")
>>
>> In [5]: m.group()
>> Out[5]: 'Jun@i Bun#i '
>>
>> In [6]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji")
>>
>> In [7]: m.group()
>> Out[7]: 'Jun@i Bun#i @:Janji '
>>
>> In [8]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
>>
>> In [9]: m.group()
>> Out[9]: 'Jun@i Bun#i @:Janji D:Banji '
>
> Why the regex engine stops the search at last piece of string?
> Why not at the first match of the group "@:"?
> What can it be a regex pattern with the following result?
>
>> In [1]: m = pattern.match("Jun@i Bun#i @:Janji D:Banji #:Junji")
>>
>> In [2]: m.group()
>> Out[2]: 'Jun@i Bun#i '

Compare:

>>> re.compile("a+").match("aaaa").group()
'aaaa'
>>> re.compile("a+?").match("aaaa").group()
'a'

By default pattern matching is "greedy" -- the ".+" part of your regex
matches as many characters as possible. Adding a ? like in ".+?" triggers
non-greedy matching.
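
Applied to the pattern from the question, the same change might look like the sketch below. It is only an illustration: the names GREEDY, LAZY and text are made up here, and the sole difference between the two patterns is the extra ? after ".+".

```python
import re

# The pattern from the question: ".+" is greedy, so it runs to the
# last space that is followed by one of "@#D" and a colon.
GREEDY = re.compile(r"""
    .+          # one or more characters, as many as possible
    [ ]         # followed by a space
    (?=[@#D]:)  # that is followed by one of "@#D" and a colon
""", re.VERBOSE)

# Same pattern with ".+?": the engine now stops at the first space
# for which the lookahead succeeds.
LAZY = re.compile(r"""
    .+?         # one or more characters, as few as possible
    [ ]         # followed by a space
    (?=[@#D]:)  # that is followed by one of "@#D" and a colon
""", re.VERBOSE)

text = "Jun@i Bun#i @:Janji D:Banji #:Junji"

greedy_result = GREEDY.match(text).group()
lazy_result = LAZY.match(text).group()

print(repr(greedy_result))  # 'Jun@i Bun#i @:Janji D:Banji '
print(repr(lazy_result))    # 'Jun@i Bun#i '
```

The non-greedy version should give 'Jun@i Bun#i ' for the two shorter strings in the session as well, since the first space followed by "@:" is in the same place in all three.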