Date: Fri, 04 Jul 2014 16:57:19 +0100
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: Why is regexp not working?
References: <lp66j1$68k$1@ger.gmane.org>
In-Reply-To: <lp66j1$68k$1@ger.gmane.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.11500.1404489628.18130.python-list@python.org>
Lines: 65
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:73966

On 2014-07-04 13:27, Florian Lindner wrote:
> Hello,
>
> I have that piece of code:
>
>      def _split_block(self, block):
>          cre = [re.compile(r, flags = re.MULTILINE) for r in self.regexps]
>          block = "".join(block)
>          print(block)
>          print("-------------------")
>          for regexp in cre:
>              match = regexp.match(block)
>              for grp in regexp.groupindex:
>                  data = match.group(grp) if match else None
>                  self.data[grp].append(data)
>
>
> block is a list of strings, terminated by \n. self.regexps:
>
>
> self.regexps = [r"it (?P<coupling_iterations>\d+) .* dt complete yes | write-iteration-checkpoint |",
>                 r"it (?P<it_read_ahead>\d+) read ahead"]
>
>
> If I run my program it looks like that:
>
>
> it 1 ahadf dt complete yes | write-iteration-checkpoint |
> Timestep completed
>
> -------------------
> it 1 read ahead
> it 2 ahgsaf dt complete yes | write-iteration-checkpoint |
> Timestep completed
>
> -------------------
> it 4 read ahead
> it 3 dfdsag dt complete yes | write-iteration-checkpoint |
> Timestep completed
>
> -------------------
> it 9 read ahead
> it 4 dsfdd dt complete yes | write-iteration-checkpoint |
> Timestep completed
>
> -------------------
> it 16 read ahead
> -------------------
> {'it_read_ahead': [None, '1', '4', '9', '16'], 'coupling_iterations': ['1', None, None, None, None]}
>
> it_read_ahead is always matched when it should be (all blocks but the first).
> But why is the regexp containing coupling_iterations only matched in the
> first block?
>
> I tried different combinations using re.match vs. re.search and with or
> without re.MULTILINE.
>
The character '|' is a metacharacter that separates alternatives. For
example, the regex 'a|b' will match 'a' or 'b'.
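A quick interactive check of alternation, and of how to get a literal '|' by escaping it:

```python
import re

# '|' separates alternatives: 'a|b' matches either 'a' or 'b'.
pattern = re.compile(r"a|b")
print(pattern.match("apple").group())   # a
print(pattern.match("banana").group())  # b
print(pattern.match("cherry"))          # None

# Escaped, '\|' is just a literal vertical bar.
literal = re.compile(r"a\|b")
print(literal.match("a|b").group())     # a|b
```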

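Your first regex contains two unescaped '|' characters, one of them at
the very end, so its last alternative is empty. When the other
alternatives fail, the regex still succeeds by matching an empty string
at the start of the target string, and the named group is None. If you
want a literal '|', escape it as '\|'.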
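Here's a small sketch reproducing that with one of your blocks (the block text is copied from your output). It also shows why switching to re.search didn't help: search tries position 0 first, and the empty alternative already succeeds there. The "fixed" pattern, with '|' escaped and '^' plus re.MULTILINE used to anchor at a line start, is only a suggestion. search is needed because in the later blocks the "dt complete" line isn't the first line:

```python
import re

# Your first pattern as posted, unescaped '|' included (the bug).
buggy = re.compile(
    r"it (?P<coupling_iterations>\d+) .* dt complete yes | write-iteration-checkpoint |"
)

# The second block from your output.
block = ("it 1 read ahead\n"
         "it 2 ahgsaf dt complete yes | write-iteration-checkpoint |\n"
         "Timestep completed\n")

m = buggy.match(block)
print(repr(m.group(0)))                # '' -- the empty last alternative matched
print(m.group("coupling_iterations"))  # None
print(buggy.search(block).group("coupling_iterations"))  # None, same empty match at 0

# Suggested fix: escape the bars, anchor at a line start, and use search.
fixed = re.compile(
    r"^it (?P<coupling_iterations>\d+) .* dt complete yes \| write-iteration-checkpoint \|",
    re.MULTILINE,
)
print(fixed.match(block))              # None: match only tries position 0
print(fixed.search(block).group("coupling_iterations"))  # 2
```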