From: Ian Kelly
Date: Tue, 5 Jul 2011 16:09:46 -0600
Subject: Re: emacs lisp text processing example (html5 figure/figcaption)
To: Xah Lee
Cc: python-list@python.org
Newsgroups: comp.lang.python

On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee wrote:
> but in anycase, i can't see how this part would work
>

((?:[^<]|<(?!/p>))+)

It's not that different from the pattern 「alt="[^"]+"」 earlier in the regex. The capture group accepts one or more characters that either aren't '<', or that are '<' but are not immediately followed by '/p>'. Thus it stops capturing when it sees exactly '</p>' without consuming the '<'.

Using my regex with the example that you posted earlier demonstrates that it works:

>>> import re
>>> s = '''
... 3D"jamie's ...

jamie's cat! Her blog is http://example.com/jamie/

...
'''
>>> print re.sub(pattern, replace, s)
3D"jamie's
jamie's cat! Her blog is http://example.com/jamie/
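
To see the capture group on its own, here is a minimal sketch. It assumes Python 3 (so print is a function, unlike the session above), and it assumes the group is wrapped in a literal <p> ... </p> pair; that wrapper and the sample string are made up for illustration and are not the full pattern or the markup from the earlier post.

```python
import re

# The capture group discussed above: one or more characters that are either
# not '<', or a '<' that is not immediately followed by '/p>'.
# The surrounding <p>...</p> literals are an assumption for this sketch.
body = re.compile(r'<p>((?:[^<]|<(?!/p>))+)</p>')

# Hypothetical sample text with a nested tag inside the first paragraph.
sample = '<p>a <b>bold</b> word</p><p>second</p>'

# The group walks past '<b>' and '</b>' but stops at the first '</p>'.
captured = body.match(sample).group(1)
all_bodies = body.findall(sample)

print(captured)
print(all_bodies)
```

The negative lookahead only inspects the characters after '<'; it never consumes them, which is why the literal '</p>' that follows the group can still match.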
Cheers,
Ian