Date: Fri, 16 Dec 2011 18:19:52 +0000
From: MRAB <python@mrabarnett.plus.com>
To: python-list@python.org
Subject: Re: re.sub(): replace longest match instead of leftmost match?
References: <jcfsrk$skh$1@reader1.panix.com> <4EEB81B3.6020600@mrabarnett.plus.com> <CALwzidnvZOhNi_YFsp__e=C==12CtZ4QaJeDZESJxTqPn4ojpg@mail.gmail.com>
In-Reply-To: <CALwzidnvZOhNi_YFsp__e=C==12CtZ4QaJeDZESJxTqPn4ojpg@mail.gmail.com>
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.3743.1324059573.27778.python-list@python.org>

On 16/12/2011 17:57, Ian Kelly wrote:
> On Fri, Dec 16, 2011 at 10:36 AM, MRAB <python@mrabarnett.plus.com> wrote:
>>  On 16/12/2011 16:49, John Gordon wrote:
>>>
>>>  According to the documentation on re.sub(), it replaces the leftmost
>>>  matching pattern.
>>>
>>>  However, I want to replace the *longest* matching pattern, which is
>>>  not necessarily the leftmost match.  Any suggestions?
>>>
>>>  I'm working with IPv6 CIDR strings, and I want to replace the longest
>>>  match of "(0000:|0000$)+" with ":".  But when I use re.sub() it replaces
>>>  the leftmost match, even if there is a longer match later in the string.
>>>
>>>  I'm also looking for a regexp that will remove leading zeroes in each
>>>  four-digit group, but will leave a single zero if the group was all
>>>  zeroes.
>>>
>>  How about this:
>>
>>  result = re.sub(r"\b0+(\d)\b", r"\1", string)
>
> Close.
>
> pattern = r'\b0+([1-9a-f]+|0)\b'
> re.sub(pattern, r'\1', string, flags=re.IGNORECASE)
>
Ah, OK.

The OP said "digit" instead of "hex digit". That's my excuse. :-)
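The first question, replacing the longest match rather than the leftmost one, isn't addressed above. re.sub() has no option for it, but you can collect the matches with re.finditer(), pick the longest, and splice in the replacement yourself. A minimal sketch follows; replace_longest() is a hypothetical helper, not part of the re module. It only considers the non-overlapping matches that finditer() reports (enough for a greedy pattern like "(0000:|0000$)+"), breaks ties in favour of the leftmost match, and makes no attempt at the full RFC 5952 rules for compressing IPv6 addresses.

```python
import re

def replace_longest(pattern, repl, text):
    """Replace only the longest match of pattern in text (hypothetical helper).

    Considers the non-overlapping matches found by re.finditer(); among
    equally long matches the leftmost wins. Returns text unchanged if
    there is no non-empty match.
    """
    matches = [m for m in re.finditer(pattern, text) if m.end() > m.start()]
    if not matches:
        return text
    # max() returns the first maximal element, so ties resolve to the leftmost
    longest = max(matches, key=lambda m: m.end() - m.start())
    # expand() applies backreferences such as \1 in repl, like re.sub() would
    return text[:longest.start()] + longest.expand(repl) + text[longest.end():]

addr = "2001:0000:0db8:0000:0000:0000:abcd:0001"
print(replace_longest(r"(0000:|0000$)+", ":", addr))
# 2001:0000:0db8::abcd:0001
```

By contrast, re.sub() with count=1 would have collapsed the single 0000 group right after 2001.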
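One caveat on Ian's pattern, r'\b0+([1-9a-f]+|0)\b': because [1-9a-f]+ only matches non-zero hex digits, a group with a zero after its first significant digit, such as 0100 or 0a00, can't reach the trailing \b and is left unchanged. A sketch of an adjusted pattern that allows any hex digits after the first non-zero one (strip_leading_zeros() and LEADING_ZEROS are hypothetical names):

```python
import re

# Strip leading zeros from each hex group, keeping a single "0" for an
# all-zero group. [1-9a-f][0-9a-f]* lets the kept part contain zeros
# after its first digit, so "0100" becomes "100" rather than staying as-is.
LEADING_ZEROS = re.compile(r'\b0+([1-9a-f][0-9a-f]*|0)\b', re.IGNORECASE)

def strip_leading_zeros(addr):
    return LEADING_ZEROS.sub(r'\1', addr)

print(strip_leading_zeros("2001:0db8:0000:0100:0a00:0000:0000:0001"))
# 2001:db8:0:100:a00:0:0:1
```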