Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Tim Chase <python.list@tim.thechases.com>
Newsgroups: comp.lang.python
Subject: Re: Regular expressions
Date: Tue, 3 Nov 2015 21:12:08 -0600
Lines: 54
Message-ID: <mailman.1.1446607194.16136.python-list@python.org>
References: <662g3blobme52hfoududj27err185v2npm@4ax.com> <mailman.0.1446519578.8789.python-list@python.org> <hp9g3b9hsn06edb0po8bduegjqkmpo4p8n@4ax.com> <mailman.3.1446523111.8789.python-list@python.org> <d39290cf-cb26-470f-a987-2f71e3860f97@googlegroups.com> <mailman.5.1446525488.8789.python-list@python.org> <bb15756d-7181-421d-835e-b2fbfc1c1774@googlegroups.com> <563967A7.4060308@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
In-Reply-To: <563967A7.4060308@gmail.com>
Precedence: list
Xref: csiph.com comp.lang.python:98200

On 2015-11-03 19:04, Michael Torrie wrote:
> Grep can use regular expressions (and I do so with it regularly),
> but its default mode is certainly not regular expressions, and it
> is still very powerful.

I suspect you're thinking of `fgrep` (AKA "grep -F"), which uses fixed
strings rather than regular expressions.  By default, `grep` certainly
does use regular expressions (POSIX basic regular expressions, to be
precise):

  tim@linux$ seq 5 | grep "1*"
  tim@bsd$ jot 5 | grep "1*"

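will output the entire input, not just lines containing a "1"
followed by an asterisk.  The pattern `1*` means "zero or more 1s",
which matches the empty string and therefore matches every line.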
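The same effect can be reproduced with Python's `re` module, as a rough
sketch.  Python regexes are not POSIX basic regular expressions, but for
a pattern as simple as `1*` they behave the same.  The list of strings
stands in for the output of `seq 5`:

```python
import re

# Stand-in for the output of `seq 5`: the lines "1" through "5".
lines = [str(n) for n in range(1, 6)]

# Regex search, as grep does by default: "1*" means "zero or more 1s",
# which matches the empty string, so every line qualifies.
regex_hits = [line for line in lines if re.search("1*", line)]

# Fixed-string search, as fgrep / grep -F does: look for a literal "1*".
fixed_hits = [line for line in lines if "1*" in line]

print(regex_hits)  # ['1', '2', '3', '4', '5']
print(fixed_hits)  # []
```

Note that an empty match still returns a (truthy) match object, which is
exactly why every line "matches".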

> I've never used regular expressions in a database query language;
> until this moment I didn't know any supported such things in their
> queries.  Good to know.  How you would index for regular
> expressions in queries I don't know.

PostgreSQL, at least, lets you create an index on the result of a
particular regular-expression function (an "expression index").  E.g.
(shooting from the hip, so I might have missed something):

  CREATE TABLE contacts (
   -- ...
   phonenumber VARCHAR(15),
   -- ...
   );
  -- the 'g' flag replaces every non-digit, not just the first one
  CREATE INDEX contacts_just_phone_digits_idx
   ON contacts((regexp_replace(phonenumber, '[^0-9]', '', 'g')));

  INSERT INTO contacts(..., phonenumber, ...)
   VALUES (..., '800-555-1212', ...);

  SELECT *
  FROM contacts
  WHERE -- should use contacts_just_phone_digits_idx, since the
        -- expression matches the indexed one exactly
   regexp_replace(phonenumber, '[^0-9]', '', 'g') = '8005551212';
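One gotcha worth knowing: PostgreSQL's regexp_replace() replaces only
the first match unless it is passed the 'g' flag, so without it
'800-555-1212' would normalize to '800555-1212' rather than
'8005551212'.  Python's re.sub() makes the difference easy to see
(count=1 approximates the no-flag behaviour; this is a Python
illustration, not PostgreSQL itself):

```python
import re

phone = "800-555-1212"

# Replace only the first non-digit, like PostgreSQL's
# regexp_replace(phonenumber, '[^0-9]', '') without the 'g' flag.
first_only = re.sub(r"[^0-9]", "", phone, count=1)

# Replace every non-digit, like regexp_replace(..., '[^0-9]', '', 'g').
all_matches = re.sub(r"[^0-9]", "", phone)

print(first_only)   # 800555-1212
print(all_matches)  # 8005551212
```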

It's not as helpful as one might hope because you're stuck using a
fixed regexp rather than an arbitrary regexp, but if you have a
particular regexp you search for frequently, you can index it.
Otherwise, you'd be doing full table scans (or at least a full scan
of whatever subset the active non-regexp'ed index yields) which can
be pretty killer on performance.
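If you want to play with the idea without a PostgreSQL server, SQLite
(via Python's stdlib sqlite3 module) also supports indexes on
expressions.  This is a swapped-in sketch, not the PostgreSQL setup:
SQLite has no built-in regexp_replace(), so it registers a hypothetical
digits_only() Python function, marked deterministic because SQLite
refuses non-deterministic functions in index expressions (the flag needs
Python 3.8+).  The extra name column and the sample rows are made up for
illustration:

```python
import re
import sqlite3

# Hypothetical helper: keep only ASCII digits, mirroring the PostgreSQL
# expression regexp_replace(phonenumber, '[^0-9]', '', 'g').
NON_DIGITS = re.compile(r"[^0-9]")


def digits_only(value):
    return None if value is None else NON_DIGITS.sub("", value)


conn = sqlite3.connect(":memory:")
# Functions used in an index expression must be deterministic.
conn.create_function("digits_only", 1, digits_only, deterministic=True)

conn.executescript(
    """
    CREATE TABLE contacts (
        name TEXT,               -- hypothetical extra column
        phonenumber VARCHAR(15)
    );
    CREATE INDEX contacts_just_phone_digits_idx
        ON contacts (digits_only(phonenumber));
    """
)
conn.executemany(
    "INSERT INTO contacts (name, phonenumber) VALUES (?, ?)",
    [("Info line", "800-555-1212"), ("Office", "(212) 555-0100")],
)

# The WHERE clause repeats the indexed expression exactly, so the
# planner can satisfy it from the index instead of scanning the table.
query = "SELECT name FROM contacts WHERE digits_only(phonenumber) = ?"
rows = conn.execute(query, ("8005551212",)).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("8005551212",)).fetchall()
# The last column of each plan row is a human-readable description.
plan_details = [row[-1] for row in plan]

print(rows)          # [('Info line',)]
print(plan_details)  # should mention contacts_just_phone_digits_idx
```

The plan should report a SEARCH using contacts_just_phone_digits_idx.
A query written with some other expression (a different function or
pattern) can't use that index, and with no other index on the column
you're back to a full table scan.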

You'd have to research other DB engines yourself.

-tkc