From: Tim Chase
Newsgroups: comp.lang.python
Subject: Re: Regular expressions
Date: Tue, 3 Nov 2015 21:12:08 -0600

On 2015-11-03 19:04, Michael Torrie wrote:
> Grep can use regular expressions (and I do so with it regularly),
> but it's default mode is certainly not regular expressions, and it
> is still very powerful.

I suspect you're thinking of `fgrep` (AKA "grep -F"), which uses fixed
strings rather than regular expressions.  By default, `grep`
certainly does use regular expressions:

  tim@linux$ seq 5 | grep "1*"
  tim@bsd$ jot 5 | grep "1*"

will output the entire input (the regexp "1*" matches zero or more
"1" characters, so it matches every line), not just lines containing
a "1" followed by an asterisk.

> I've never used regular expressions in a database query language;
> until this moment I didn't know any supported such things in their
> queries.  Good to know.  How you would index for regular
> expressions in queries I don't know.

At least PostgreSQL allows for creating indexes on a particular
regular expression.  E.g. (shooting from the hip so I might have
missed something):

  CREATE TABLE contacts (
    -- ...
    phonenumber VARCHAR(15),
    -- ...
  );

  -- the 'g' flag replaces every match, not just the first one
  CREATE INDEX contacts_just_phone_digits_idx
    ON contacts((regexp_replace(phonenumber, '[^0-9]', '', 'g')));

  INSERT INTO contacts(..., phonenumber, ...)
    VALUES (..., '800-555-1212', ...);

  SELECT *
  FROM contacts
  WHERE
    -- should use contacts_just_phone_digits_idx
    regexp_replace(phonenumber, '[^0-9]', '', 'g') = '8005551212';

It's not as helpful as one might hope because you're stuck using a
fixed regexp rather than an arbitrary regexp, but if you have a
particular regexp you search for frequently, you can index it.
Otherwise, you'd be doing full table-scans (or at least a full scan
of whatever subset the active non-regexp'ed index yields), which can
be pretty killer on performance.

You'd have to research other DB engines.

-tkc
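
For the Python-minded, here is a minimal sketch of the grep behaviour
described above.  It assumes Python's `re` module treats the simple
pattern "1*" the same way grep's default syntax does, and the helper
names `grep_regex` and `grep_fixed` are made up for illustration
(they are not real tools or library functions):

```python
import re

# Stand-in for the output of `seq 5` / `jot 5`: the lines "1" .. "5".
lines = [str(n) for n in range(1, 6)]


def grep_regex(pattern, lines):
    """Keep lines where the regex matches anywhere (roughly plain `grep`)."""
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]


def grep_fixed(needle, lines):
    """Keep lines containing the literal string (roughly `grep -F`)."""
    return [ln for ln in lines if needle in ln]


# "1*" as a regex means "zero or more 1s", which matches every line.
regex_hits = grep_regex("1*", lines)

# "1*" as a fixed string needs a literal "1" followed by "*".
fixed_hits = grep_fixed("1*", lines)

print(regex_hits)
print(fixed_hits)
```

The regex version keeps all five lines, while the fixed-string
version keeps none of them, since no line contains a literal "1*".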