From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Regular expressions
Date: Wed, 4 Nov 2015 14:26:42 +1100

On Wed, Nov 4, 2015 at 2:12 PM, Tim Chase wrote:
> It's not as helpful as one might hope because you're stuck using a
> fixed regexp rather than an arbitrary regexp, but if you have a
> particular regexp you search for frequently, you can index it.
> Otherwise, you'd be doing full table-scans (or at least a full scan
> of whatever subset the active non-regexp'ed index yields) which can
> be pretty killer on performance.

If the regex anchors the start of the string, you can generally use an
index to save at least some effort. Otherwise, you're relying on some
kind of alternate indexing style, such as:

http://www.postgresql.org/docs/current/static/pgtrgm.html

which specifically mentions regex searches as being indexable. Some
more info, including 'explain' results:

http://www.depesz.com/2013/04/10/waiting-for-9-3-support-indexing-of-regular-expression-searches-in-contribpg_trgm/

But this kind of thing isn't widely supported across databases.

ChrisA
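
To illustrate the point about anchored regexes: here is a rough Python sketch of why a start anchor lets an ordered index help. It is not how any particular database does it. A sorted list stands in for a B-tree index, and the names literal_prefix and indexed_search are made up for this example. The prefix extraction is deliberately conservative and assumes the pattern is compiled without flags.

```python
import bisect
import re

# Characters that end the literal run at the start of a pattern.
META = set(".^$*+?{}[]\\|()")


def literal_prefix(pattern):
    """Return literal text every match must start with, or '' if unknown.

    Hypothetical helper: it gives up on alternation and on anything
    that is not a plain character, so it may return a shorter prefix
    than a real query planner would find.
    """
    if not pattern.startswith("^") or "|" in pattern:
        return ""
    prefix = []
    for ch in pattern[1:]:
        if ch in META:
            # These quantifiers may allow zero repetitions, so the
            # character before them is not guaranteed to be present.
            if ch in "*?{" and prefix:
                prefix.pop()
            break
        prefix.append(ch)
    return "".join(prefix)


def indexed_search(sorted_keys, pattern):
    """Return the keys matching pattern, using the sort order if possible."""
    regex = re.compile(pattern)
    prefix = literal_prefix(pattern)
    if not prefix:
        # No usable prefix: the equivalent of a full table scan.
        return [key for key in sorted_keys if regex.search(key)]
    matches = []
    # Keys sharing a prefix are contiguous in sorted order, so jump to
    # the first candidate and stop at the first key without the prefix.
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        if regex.search(sorted_keys[i]):
            matches.append(sorted_keys[i])
        i += 1
    return matches


keys = sorted(["apple", "application", "apply", "apricot", "banana", "cherry"])
print(indexed_search(keys, r"^app.*n$"))  # only the "app..." keys are tested
print(indexed_search(keys, r"an+a"))      # unanchored, so every key is tested
```

The regex still has to be run on each candidate, which is the "at least some effort" part: the index only narrows down which rows get tested.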