From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Powerful perl paradigm I don't find in python
Date: Fri, 15 Jan 2016 14:34:40 +0100

Charles T. Smith wrote:

> What the original snippet does is parse *and consume* a string - actually,
> to avoid maintaining a cursor traverse the string.  The perl feature is
> that substitute allows the found pattern to be replaced, but retains the
> group after the expression is complete.

That is too technical for my taste. When is your "paradigm" more useful than 
a simple

re.finditer(), re.findall(), or re.split()

? 
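For comparison, here is a minimal sketch of what the plain re functions give you. The sample text and pattern are only illustrative assumptions, and the names chunks, spans and rest are made up for this example.

```python
import re

# Illustrative input: repeated "word number" records followed by a tail.
text = "foo 12 bar 34 baz"
pattern = re.compile(r"[a-z]+ \d+\s*")

# findall() returns every non-overlapping match as a string.
chunks = pattern.findall(text)

# finditer() yields match objects, so the positions are available as well.
spans = [(m.start(), m.end()) for m in pattern.finditer(text)]

# Whatever follows the last match is the unconsumed remainder.
rest = text[spans[-1][1]:] if spans else text

print(chunks)  # ['foo 12 ', 'bar 34 ']
print(rest)    # baz
```

One difference to keep in mind: findall() and finditer() search, so they silently skip text that does not match, whereas a consuming loop built on match() notices it.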

>> things = []
>> while some_str != tail:
>>      m = re.match(pattern_str, some_str)
>>      things.append(some_str[:m.end()])
>>      some_str = some_str[m.end():]
 
If that were common (or even ever occurred) I'd write a helper which avoids 
the brittle some_str != tail comparison and exposes the functionality in a 
for loop:

import re


class MissingTailError(ValueError):
    pass


class UnparsedRestError(ValueError):
    pass


def shave_off(regex, text, tail=None):
    """
    >>> for s in shave_off(r"[a-z]+ \\d+\\s*",
    ...        "foo 12 bar 34 baz", tail="baz"):
    ...     s
    'foo 12 '
    'bar 34 '
    """
    if tail is not None:
        if text.endswith(tail):
            end = len(text) - len(tail)
        else:
            raise MissingTailError("%r does not end with %r" % (text, tail))
    else:
        end = len(text)

    start = 0
    r = re.compile(regex)
    while start != end:
        m = r.match(text, start, end)
        if m is None or m.end() == start:
            # No match, or an empty match that would never advance the cursor.
            raise UnparsedRestError(
                "%r does not match pattern %r"
                % (text[start:end], r.pattern))
        yield text[m.start():m.end()]
        start = m.end()
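
The helper never slices the string. It relies on the optional pos and endpos arguments of a compiled pattern's match() method to move a cursor through the text. A small sketch of that mechanism, using the same illustrative text and pattern as the doctest (the variable names are assumptions for this example only):

```python
import re

r = re.compile(r"[a-z]+ \d+\s*")
text = "foo 12 bar 34 baz"

# Exclude the tail "baz" from the region the pattern may look at.
end = len(text) - len("baz")

# match() anchors at pos and never reads past endpos.
first = r.match(text, 0, end)
second = r.match(text, first.end(), end)

print(repr(first.group()))   # 'foo 12 '
print(repr(second.group()))  # 'bar 34 '

# The cursor has reached end, so the loop in the helper would stop here;
# matching against the empty remaining region gives None.
print(second.end() == end)                  # True
print(r.match(text, second.end(), end))     # None
```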