Subject: Re: On re / regex replacement
From: Vlastimil Brom
Date: Sun, 28 Aug 2011 15:40:05 +0200
Newsgroups: comp.lang.python

2011/8/28 jmfauth:
> There is actually a discussion on the dev-list about the replacement
> of "re" by "regex".
>...
> If I can understand the ASCII flag, ASCII being the "lingua franca" of
> almost all encodings, I am more skeptical about the LOCALE/UNICODE
> flags.
>
> There is in my mind some kind of conflict here. What is 100% unicode
> compliant should be locale independent ("Unicode.org") and a locale
> dependency means a loss of unicode compliance.
>
> I'm fearing some potential problems here: Users or modules working
> in one mode, while some others are working in the other mode.
>
>...
> jmf

As I understand it, regex was designed to be as compatible with re as
possible; some behaviour that could be considered problematic is even
retained as the default and only "corrected" via the NEW flag (e.g.
zero-width split).
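For illustration, a minimal sketch of the zero-width split case using only the standard-library re module (regex itself is not assumed to be installed). Note that it assumes Python 3.7 or later, where re.split() was changed to split on empty matches; the re of 2011 (Python 2.7 / 3.2) never split on an empty match, which, as I understand it, is the behaviour regex keeps by default and changes only under NEW.

```python
import re

# Python 3.7+: re.split() splits on zero-width matches such as \b.
# In Python 2.7 / 3.2 the same call returned the whole string unsplit,
# and Python 3.5 / 3.6 rejected patterns that can only match empty strings.
text = "Words, words, words."
parts = re.split(r"\b", text)
print(parts)  # ['', 'Words', ', ', 'words', ', ', 'words', '.']
```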
Also, the LOCALE flag seems to be considered a legacy feature and is
kept with the same behaviour as in re; cf.:
http://code.google.com/p/mrab-regex-hg/issues/detail?id=6&can=1

In my opinion, the LOCALE flag is not reliable (in the way I would
imagine) in either re or regex.

In the area of flags, regex should work the same way as re, or it just
adds more possibilities (REVERSE for backwards search, ASCII as the
complement of UNICODE, NEW to enable some incompatible additions or
corrections, so that code relying on the original behaviour keeps
working).

The only (understandable) incompatibilities I encounter in regex are
the new features requiring special syntax, which would obviously raise
errors in re or would be matched literally instead. See
http://code.google.com/p/mrab-regex-hg/wiki/GeneralDetails#Additional_features
for an overview of the additions.

Personally I am very happy with regex, both with its features and with
the support and maintenance by its developer; however, I mostly use it
for manually entered patterns, less for hardcoded operations.

regards,
   Vlastimil Brom
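For illustration of the mode mismatch jmfauth worries about (one module matching in Unicode mode, another in ASCII mode), a minimal sketch using only the standard-library re under Python 3.6 or later rather than regex itself; the sample string is arbitrary. It also shows that later versions of re restrict LOCALE to bytes patterns, in line with treating it as a legacy flag.

```python
import re

text = "naïve café"  # arbitrary sample containing non-ASCII letters

# Default for str patterns in Python 3: Unicode matching, so \w covers accented letters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so words are cut at non-ASCII letters.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# Since Python 3.6, re.LOCALE is accepted only with bytes patterns;
# using it with a str pattern raises ValueError.
try:
    re.compile(r"\w+", re.LOCALE)
    locale_with_str_ok = True
except ValueError:
    locale_with_str_ok = False

print(unicode_words)       # ['naïve', 'café']
print(ascii_words)         # ['na', 've', 'caf']
print(locale_with_str_ok)  # False
```

The same pattern text yields different tokens depending on the flag, which is exactly the kind of disagreement that appears when two pieces of code share a pattern but not a mode.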