From: Ben Finney
Newsgroups: comp.lang.python
Subject: Re: Regular expressions
Date: Thu, 05 Nov 2015 14:03:45 +1100
Steven D'Aprano writes:

> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>
> > I tried Tim's example
> >
> > $ seq 5 | grep '1*'
> > 1
> > 2
> > 3
> > 4
> > 5
> > $
>
> I don't understand this. What on earth is grep matching? How does "4"
> match "1*"?

You can experiment with regular expressions to find out. Here's a link
to the RegExr tool for the above pattern.

Matching patterns can include a specification meaning “match some
number of the preceding segment”, written with the ‘{n,m}’ notation.
That means “match at least n, and at most m, occurrences of the
preceding segment”. Either ‘n’ or ‘m’ can be omitted, meaning “at
least 0” and “no maximum” respectively.

Those are quite useful, so there are shortcuts for the most common
cases: ‘?’ is a shortcut for ‘{0,1}’, ‘*’ is a shortcut for ‘{0,}’,
and ‘+’ is a shortcut for ‘{1,}’.

In this case, ‘*’ is a shortcut for ‘{0,}’, meaning “match 0 or more
occurrences of the preceding segment”. The segment here is the atom
‘1’.

Since ‘1*’ is the entirety of the pattern, the pattern can match zero
characters, anywhere within any string. So it matches every possible
string, including “4”.

To match (some atom) 1 or more times, use ‘+’, the shortcut for
‘{1,}’, meaning “match 1 or more occurrences of the preceding segment”.

-- 
 \       學而不思則罔,思而不學則殆。 (To study and not think is a waste. |
  `\                      To think and not study is dangerous.) |
_o__)                     —孔夫子 Confucius (551 BCE – 479 BCE) |
Ben Finney
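The quantifier behaviour described in the post can be checked with a short sketch using Python's standard `re` module. The `grep` function below is a hypothetical helper, not grep itself: it assumes grep's "print each line in which the pattern matches anywhere" behaviour is approximated by `re.search`. Note that grep's default basic regex syntax treats `+` as a literal character and spells intervals as `\{n,m\}`, so Python's syntax corresponds more closely to `grep -E`.

```python
import re

# Hypothetical helper: approximate `grep PATTERN` by keeping each line in
# which the pattern matches somewhere (re.search, not re.match).
def grep(pattern, lines):
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# The same five lines that `seq 5` prints.
lines = [str(n) for n in range(1, 6)]

# '1*' may match zero characters, so every line matches.
print(grep(r"1*", lines))           # ['1', '2', '3', '4', '5']

# In "4", '1*' matches the empty string at position 0.
m = re.search(r"1*", "4")
print(repr(m.group()), m.span())    # '' (0, 0)

# '1+' needs at least one '1', so only the line "1" matches.
print(grep(r"1+", lines))           # ['1']

# Each shortcut finds exactly the same matches as its brace form.
pairs = [("1?", "1{0,1}"), ("1*", "1{0,}"), ("1+", "1{1,}")]
samples = ["", "4", "1", "11", "2113"]
for short, braces in pairs:
    for s in samples:
        assert re.findall(short, s) == re.findall(braces, s)
print("shortcuts agree with brace forms")
```

The empty match at position 0 is the whole answer to how “4” matches ‘1*’: the pattern succeeds without consuming any characters.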