From: Random832
Newsgroups: comp.lang.python
Subject: Re: Whittle it on down
Date: Thu, 05 May 2016 09:21:39 -0400

On Thu, May 5, 2016, at 03:36, Steven D'Aprano wrote:
> Putting non-ASCII letters aside for the moment, how would you match these
> specs as a regular expression?

Well, obviously *your* language (not the OP's), given the cases you
reject, is "one or more sequences of letters separated by
space*-ampersand-space*", and that is actually one of the easiest kinds
of regex to write: "[A-Z]+( *& *[A-Z]+)*".
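For anyone who wants to poke at that pattern, here is a minimal Python sketch. It assumes re.fullmatch, so that the whole string has to match and leading or trailing spaces are rejected; the helper name matches_strict and the sample strings are made up for illustration and are not the OP's data.

```python
import re

# Strict pattern: runs of capital letters, joined only by an ampersand
# that may have any number of spaces on either side.
STRICT = re.compile(r"[A-Z]+( *& *[A-Z]+)*")


def matches_strict(text):
    """Return True if the entire string matches the strict pattern."""
    return STRICT.fullmatch(text) is not None


# Hypothetical sample strings, chosen to exercise each rule.
samples = [
    "AAA",
    "AAA & BBB",
    "AAA&BBB",
    "AAA  &  BBB & CCC",
    "AAA BBB",   # space without an ampersand
    " AAA",      # leading space
    "AAA &",     # dangling ampersand
]

for s in samples:
    print(repr(s), matches_strict(s))
```

Using search or match instead of fullmatch would need explicit anchors, otherwise a string with trailing junk could still report a partial match.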
However, your spec is wrong:

> - Leading or trailing spaces, or spaces not surrounding an ampersand,
>   must not match: "AAA BBB" must be rejected.

The *very first* item in OP's list of good outputs is 'PHYSICAL FITNESS
CONSULTANTS & TRAINERS'.

If you want something that's extremely conservative (except for the
*very odd in context* choice of allowing arbitrary numbers of spaces -
why would you allow this but reject leading or trailing space?) and
accepts all of OP's input:

[A-Z]+(( *& *| +)[A-Z]+)*
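A rough way to see the difference the extra " +" alternative makes, again assuming Python's re.fullmatch; the helper name matches_relaxed is hypothetical, and apart from the one string quoted above the test inputs are invented:

```python
import re

# Relaxed pattern: words may be separated either by an ampersand with
# optional surrounding spaces, or by one or more plain spaces.
RELAXED = re.compile(r"[A-Z]+(( *& *| +)[A-Z]+)*")


def matches_relaxed(text):
    """Return True if the entire string matches the relaxed pattern."""
    return RELAXED.fullmatch(text) is not None


# The first string is the one quoted from the OP's list; the rest are
# made-up edge cases.
for s in [
    "PHYSICAL FITNESS CONSULTANTS & TRAINERS",
    "AAA BBB",
    "AAA  BBB",
    " AAA",
    "AAA ",
    "AAA & & BBB",
]:
    print(repr(s), matches_relaxed(s))
```

Leading and trailing spaces are still rejected, because every separator must be followed by at least one letter.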