From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Whittle it on down
Date: Thu, 05 May 2016 10:17:47 +0200
Message-ID: <mailman.402.1462436282.32212.python-list@python.org>
References: <ngejmj$gc4$1@dont-email.me> <572ae25f$0$2821$c3e8da3$76491128@news.astraweb.com> <1462430766.25079.598726825.1B90C7A1@webmail.messagingengine.com> <mailman.398.1462430769.32212.python-list@python.org> <572af811$0$1608$c3e8da3$5496439d@news.astraweb.com> <ngevjc$v4l$1@ger.gmane.org>

Steven D'Aprano wrote:

> Oh, a further thought...
> 
> 
> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> 
>> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>>> Start by writing a function or a regex that will distinguish strings
>>> that match your conditions from those that don't. A regex might be
>>> faster, but here's a function version.
>>> ... snip ...
>> 
>> Yikes. I'm all for the idea that one shouldn't go to regex when Python's
>> powerful string type can answer the problem more clearly, but this seems
>> to go out of its way to do otherwise.
>> 
>> I don't even care about faster: It's overly complicated. Sometimes a
>> regular expression really is the clearest way to solve a problem.
> 
> Putting non-ASCII letters aside for the moment, how would you match these
> specs as a regular expression?
> 
> - All uppercase ASCII letters (A to Z only), optionally separated into
> words by either a bare ampersand (e.g. "AAA&AAA") or an ampersand with
> leading and trailing spaces (spaces only, not arbitrary whitespace):
> "AAA   & AAA".
> 
> - The number of spaces on either side of the ampersands need not be the
> same: "AAA&   BBB &       CCC" should match.
> 
> - Leading or trailing spaces, or spaces not surrounding an ampersand, must
> not match: "AAA BBB" must be rejected.
> 
> - Leading or trailing ampersands must also be rejected. This includes the
> case where the string is nothing but ampersands.
> 
> - Consecutive ampersands "AAA&&&BBB" and the empty string must be
> rejected.
> 
> 
> I get something like this:
> 
> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
> 
> 
> but it fails on strings like "AA   &  A &  A". What am I doing wrong?
> 
> 
> For the record, here's my brief test suite:
> 
> 
> def test(pat):
>     for s in ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"):
>         assert re.match(pat, s) is None
>     for s in ("A", "A & A", "AA&A", "AA   &  A &  A"):
>         assert re.match(pat, s)

>>> import re
>>> def test(pat):
...     for s in ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"):
...         assert re.match(pat, s) is None
...     for s in ("A", "A & A", "AA&A", "AA   &  A &  A"):
...         assert re.match(pat, s)
... 
>>> test("^A+( *& *A+)*$")
>>>
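
The pattern above only knows the letter "A", which is all the test suite uses. As a rough sketch of how it might be widened to the full A-Z spec, here is a version with [A-Z] in place of A. The names WORDS and is_valid are made up for illustration, and using re.fullmatch() instead of ^...$ is my assumption, not part of the original one-liner. The last two assertions hint at why the quoted pattern fails: its repeated group always wants a complete "word & word" pair, so a single-letter middle word cannot be shared between two pairs.

```python
import re

# Steven's pattern, copied verbatim from the quoted post.
ORIGINAL = r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"

# Assumed generalisation of "^A+( *& *A+)*$" from the letter A to A-Z:
# one word, then zero or more "optional spaces, &, optional spaces, word".
WORDS = re.compile(r"[A-Z]+(?: *& *[A-Z]+)*")


def is_valid(s):
    """Return True if s is uppercase words joined by optionally spaced '&'."""
    # fullmatch() requires the whole string to match, so no anchors are needed.
    return WORDS.fullmatch(s) is not None


rejected = ["", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"]
accepted = ["A", "A & A", "AA&A", "AA   &  A &  A", "AAA&   BBB &       CCC"]

assert not any(is_valid(s) for s in rejected)
assert all(is_valid(s) for s in accepted)

# The quoted pattern repeats whole "word & word" pairs. A one-letter middle
# word cannot end one pair and start the next, so this string is rejected...
assert re.match(ORIGINAL, "AA   &  A &  A") is None
# ...while a two-letter middle word can be split between the pairs.
assert re.match(ORIGINAL, "AA & AA & A")
```

One side effect of fullmatch(): "$" also matches just before a trailing newline, so the anchored pattern with re.match() would accept "A\n", while is_valid("A\n") is False. Whether that matters depends on where the strings come from.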