From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Regular expressions
Date: Thu, 05 Nov 2015 09:33:39 +0100
Steven D'Aprano wrote:

> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>
>> I tried Tim's example
>>
>> $ seq 5 | grep '1*'
>> 1
>> 2
>> 3
>> 4
>> 5
>> $
>
> I don't understand this. What on earth is grep matching? How does "4"
> match "1*"?

Look for zero or more "1". Every line contains at least zero of them, because the empty string at the start of the line already matches, so every line is printed. Written in Python:

    import re
    import sys

    for line in sys.stdin:
        if re.compile("1*").search(line):
            print(line, end="")

>> which surprised me because I remembered that there usually weren't any
>> matching lines when I invoked grep instead of egrep by mistake. So I
>> tried another one
>>
>> $ seq 5 | grep '[1-3]+'
>> $
>>
>> and then headed for the man page. Apparently there is a subset called
>> "basic regular expressions":
>>
>> """
>> Basic vs Extended Regular Expressions
>> In basic regular expressions the meta-characters ?, +, {, |, (,
>> and ) lose their special meaning; instead use the backslashed
>> versions \?, \+, \{, \|, \(, and \).
>> """
>
> None of this appears relevant, as the metacharacter * is not listed.

That's the very point.

> So what's going on?

Most special characters don't work in grep's basic syntax, but * does. The quote explains why many regular expressions like "[1-3]+" that you may know from Python's re don't work, while a small subset, including the ominous "1*", does.
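The following is a small illustration of the "zero or more" point using only Python's re module. It shows that "1*" succeeds on "4" by matching the empty string. A match object is truthy even when the match is empty, so a filter built on search() keeps every line. The list of lines is a stand-in for the output of `seq 5`.

```python
import re

pattern = re.compile("1*")

# "1*" means "zero or more 1s", so it can succeed by matching nothing.
m = pattern.search("4")
print(repr(m.group()), m.span())  # '' (0, 0)

# search() returns a (truthy) match object even for an empty match,
# so every line passes the filter, just as with grep.
lines = ["1", "2", "3", "4", "5"]
print([line for line in lines if pattern.search(line)])

# Requiring at least one "1" ("11*", which means the same thing in
# grep's basic syntax) gives the intuitive result.
print([line for line in lines if re.search("11*", line)])  # ['1']
```

"11*" is used rather than "1+" because it spells "one or more" the same way in both Python and grep's basic regular expressions.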
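The man page rule can be made concrete with a rough sketch. The function below translates a basic regular expression into Python re syntax by swapping the backslashed and bare forms of the characters the quoted paragraph lists. The names `bre_to_python` and `grep` are hypothetical helpers, not part of grep or Python, and the translation is deliberately simplified. Bracket expressions are copied verbatim and must not contain "]" or [:class:] syntax. A leading "*", which is literal in a basic regular expression, is not handled.

```python
import re

# Per the grep man page, these are literal in a basic regular expression
# (BRE) unless backslashed; Python's re works the other way round.
# "}" is included as the partner of "{".
SWAPPED = set("?+{}|()")


def bre_to_python(pattern):
    """Translate a simplified GNU grep BRE into Python re syntax (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "[":
            # Copy a bracket expression unchanged; its members are literal in
            # both dialects. Assumption: no "]" member, no [:class:] syntax.
            end = pattern.index("]", i + 1)
            out.append(pattern[i:end + 1])
            i = end + 1
        elif c == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # "\+" in a BRE is the operator "+" in Python; other escapes
            # such as "\." keep the same meaning in both.
            out.append(nxt if nxt in SWAPPED else c + nxt)
            i += 2
        elif c in SWAPPED:
            # A bare "+" in a BRE is a literal plus sign.
            out.append("\\" + c)
            i += 1
        else:
            # "*", ".", "^", "$" and ordinary characters pass through.
            out.append(c)
            i += 1
    return "".join(out)


def grep(pattern, lines):
    """Hypothetical helper: keep the lines that `grep PATTERN` would print."""
    regex = re.compile(bre_to_python(pattern))
    return [line for line in lines if regex.search(line)]


seq5 = ["1", "2", "3", "4", "5"]
print(grep("1*", seq5))        # ['1', '2', '3', '4', '5']
print(grep("[1-3]+", seq5))    # []
print(grep("[1-3]\\+", seq5))  # ['1', '2', '3']
```

Under this translation "[1-3]+" becomes the Python pattern `[1-3]\+`, which demands a literal plus sign. That is why nothing from `seq 5` matches. "1*" passes through unchanged, so it behaves the same as in Python.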