From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Whittle it on down
Date: Thu, 05 May 2016 10:17:47 +0200

Steven D'Aprano wrote:

> Oh, a further thought...
>
>
> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
>
>> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>>> Start by writing a function or a regex that will distinguish strings
>>> that match your conditions from those that don't. A regex might be
>>> faster, but here's a function version.
>>> ... snip ...
>>
>> Yikes. I'm all for the idea that one shouldn't go to regex when Python's
>> powerful string type can answer the problem more clearly, but this seems
>> to go out of its way to do otherwise.
>>
>> I don't even care about faster: It's overly complicated. Sometimes a
>> regular expression really is the clearest way to solve a problem.
>
> Putting non-ASCII letters aside for the moment, how would you match these
> specs as a regular expression?
>
> - All uppercase ASCII letters (A to Z only), optionally separated into
>   words by either a bare ampersand (e.g. "AAA&AAA") or an ampersand with
>   leading and trailing spaces (spaces only, not arbitrary whitespace):
>   "AAA & AAA".
>
> - The number of spaces on either side of the ampersands need not be the
>   same: "AAA& BBB & CCC" should match.
>
> - Leading or trailing spaces, or spaces not surrounding an ampersand, must
>   not match: "AAA BBB" must be rejected.
>
> - Leading or trailing ampersands must also be rejected. This includes the
>   case where the string is nothing but ampersands.
>
> - Consecutive ampersands "AAA&&&BBB" and the empty string must be
>   rejected.
>
>
> I get something like this:
>
> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
>
>
> but it fails on strings like "AA & A & A". What am I doing wrong?
>
>
> For the record, here's my brief test suite:
>
>
> def test(pat):
>     for s in ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"):
>         assert re.match(pat, s) is None
>     for s in ("A", "A & A", "AA&A", "AA & A & A"):
>         assert re.match(pat, s)

>>> import re
>>> def test(pat):
...     for s in ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"):
...         assert re.match(pat, s) is None
...     for s in ("A", "A & A", "AA&A", "AA & A & A"):
...         assert re.match(pat, s)
...
>>> test("^A+( *& *A+)*$")
>>>
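
The pattern above uses a literal A because the test strings contain nothing else. As a rough sketch of how it might be widened to the full spec, the version below swaps A for [A-Z]. The names ORIGINAL, GENERAL, REJECT, ACCEPT and check() are made up for this illustration, and the extra test strings are taken from the examples in the quoted spec. It also hints at why the quoted attempt fails: its repeated group is a whole "word & word" unit, so a middle word has to supply a letter to the unit on each side, which a one-letter word like the middle "A" in "AA & A & A" cannot do.

```python
import re

# Steven's attempt, as quoted above: the repeated group is "word & word".
ORIGINAL = r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"

# Assumption: the pattern from the session above, widened from A to A-Z.
# One word first, then zero or more "& word" tails.
GENERAL = r"^[A-Z]+( *& *[A-Z]+)*$"

# Strings from the quoted test suite plus the examples named in the spec.
REJECT = ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A",
          "AAA BBB", "AAA&&&BBB", " AAA", "AAA ")
ACCEPT = ("A", "A & A", "AA&A", "AA & A & A",
          "AAA&AAA", "AAA & AAA", "AAA& BBB & CCC")


def check(pat):
    """Return the test strings that the pattern classifies wrongly."""
    wrong = [s for s in REJECT if re.match(pat, s)]
    wrong += [s for s in ACCEPT if not re.match(pat, s)]
    return wrong


print(check(GENERAL))
print(check(ORIGINAL))
```

With these strings, check(GENERAL) should come back empty, while check(ORIGINAL) should report only "AA & A & A". One caveat: "$" also matches just before a trailing newline, so if input like "A\n" must be rejected, re.fullmatch (Python 3.4+) may be the stricter choice.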