From: Jason Friedman
Cc: Python
Date: Wed, 3 Jul 2013 20:49:23 -0600
Subject: Re: Regular expression negative look-ahead
Newsgroups: comp.lang.python

Huh, I did not realize that endswith accepts a tuple of suffixes. I'll remember that in the future.

This need is actually for http://schemaspy.sourceforge.net/, which allows one to include only tables/views that match a pattern.

Either there is a bug in Schemaspy's code, or Java's implementation of regular expressions differs from Python's, or there is a flaw in my logic, because the pattern I verified using Python produces different results when used with Schemaspy. I suppose I'll open a bug there unless I can find the aforementioned flaw.
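One possible source of the discrepancy, offered only as a guess: Java's Matcher.matches() succeeds only when the pattern consumes the whole input, whereas Python's re.search() accepts a match anywhere, including a zero-width one. Whether Schemaspy calls matches() is an assumption here, not something confirmed in this thread. The sketch below uses Python's re.fullmatch() to imitate whole-string matching; the `consuming` pattern is a hypothetical variant, not one anyone in the thread proposed.

```python
import re

names = ["MY_TABLE", "MY_TABLE_CTL", "YOUR_TABLE", "YOUR_TABLE_RUN"]

# Pattern from the thread: it only asserts, so any match is zero-width.
lookahead_only = re.compile(r"^(?!.*(CTL|DEL|RUN))")

# Hypothetical variant: same idea, but it also consumes the name,
# so a whole-string match can succeed.
consuming = re.compile(r"(?!.*_(CTL|DEL|RUN)$).*")

# search() is satisfied by the zero-width match at position 0.
search_hits = [n for n in names if lookahead_only.search(n)]

# fullmatch() requires the match to span the whole string (assumed here
# to mimic a whole-input match in Java).
full_hits = [n for n in names if lookahead_only.fullmatch(n)]
consuming_hits = [n for n in names if consuming.fullmatch(n)]

print(search_hits)
print(full_hits)
print(consuming_hits)
```

If the whole-string assumption holds, the thread's pattern would reject every table name, while a pattern that ends in `.*` would behave the same way under both kinds of matching.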
On Mon, Jul 1, 2013 at 11:44 PM, Ian Kelly wrote:
> On Mon, Jul 1, 2013 at 8:27 PM, Jason Friedman wrote:
> > Found this:
> > http://stackoverflow.com/questions/13871833/negative-lookahead-assertion-not-working-in-python
> >
> > This pattern seems to work:
> > pattern = re.compile(r"^(?!.*(CTL|DEL|RUN))")
> >
> > But I am not sure why.
> >
> > On Mon, Jul 1, 2013 at 5:07 PM, Jason Friedman wrote:
> >>
> >> I have table names in this form:
> >> MY_TABLE
> >> MY_TABLE_CTL
> >> MY_TABLE_DEL
> >> MY_TABLE_RUN
> >> YOUR_TABLE
> >> YOUR_TABLE_CTL
> >> YOUR_TABLE_DEL
> >> YOUR_TABLE_RUN
> >>
> >> I am trying to create a regular expression that will return true for
> >> only these tables:
> >> MY_TABLE
> >> YOUR_TABLE
> >>
> >> I tried these:
> >> pattern = re.compile(r"_(?!(CTL|DEL|RUN))")
> >> pattern = re.compile(r"\w+(?!(CTL|DEL|RUN))")
> >> pattern = re.compile(r"(?!(CTL|DEL|RUN)$)")
> >>
> >> But all three match.
> >> I do not need to capture anything.
>
> For some reason I don't seem to have a copy of your initial post.
>
> The reason that regex works is because you're anchoring it at the
> start of the string and then telling it to match only if
> ".*(CTL|DEL|RUN)" /doesn't/ match. For the names with a suffix, that
> inner pattern does match starting from the beginning of the string, so
> the pattern as a whole does not match.
>
> The reason that the other three do not work is because the forward
> assertions are not properly anchored. The first one can match the
> first underscore in "MY_TABLE_CTL" instead of the second, and then the
> next three characters are "TAB", not any of the verboten strings, so
> it matches. The second one matches any substring of "MY_TABLE_CTL"
> that isn't followed by "CTL". So it will just match the entire string
> "MY_TABLE_CTL", and the rest of the string is then empty, so does not
> match any of those three strings, so it too gets accepted.
> The third one simply matches an empty string that isn't followed by
> one of those three, so it will just match at the very start of the
> string and see that the next three characters meet the forward
> assertion.
>
> Now, all that said, are you sure you actually need a regular
> expression for this? It seems to me that you're overcomplicating
> things. Since you don't need to capture anything, your need can be
> met more simply with:
>
> if not table_name.endswith(('_CTL', '_DEL', '_RUN')):
>     # Do whatever
> --
> http://mail.python.org/mailman/listinfo/python-list
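
The behaviour described in the quoted explanation can be checked with a short script. This is only a sketch: it assumes the patterns are applied with search(), which the thread never states, and the helper name `kept` is made up for illustration.

```python
import re

TABLES = [
    "MY_TABLE", "MY_TABLE_CTL", "MY_TABLE_DEL", "MY_TABLE_RUN",
    "YOUR_TABLE", "YOUR_TABLE_CTL", "YOUR_TABLE_DEL", "YOUR_TABLE_RUN",
]


def kept(pattern):
    """Return the table names for which pattern.search() finds a match."""
    return [name for name in TABLES if pattern.search(name)]


# Anchored at the start: the lookahead inspects the whole string.
anchored = kept(re.compile(r"^(?!.*(CTL|DEL|RUN))"))

# Unanchored attempts: each finds some position where the assertion holds.
first = kept(re.compile(r"_(?!(CTL|DEL|RUN))"))      # first underscore, then "TAB"
second = kept(re.compile(r"\w+(?!(CTL|DEL|RUN))"))   # \w+ swallows the whole name
third = kept(re.compile(r"(?!(CTL|DEL|RUN)$)"))      # empty match at position 0

# The non-regex alternative: endswith() with a tuple of suffixes.
by_suffix = [t for t in TABLES if not t.endswith(("_CTL", "_DEL", "_RUN"))]

print(anchored)
print(len(first), len(second), len(third))
print(by_suffix)
```

Note that the anchored pattern rejects any name containing CTL, DEL, or RUN anywhere, not only as a suffix, so the endswith() check is the stricter reading of the original requirement.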