Re: unexpected regexp behaviour using 'A|B|C.....'

From Peter Otten <__peter__@web.de>
Subject Re: unexpected regexp behaviour using 'A|B|C.....'
Date 2011-07-28 12:57 +0200
Organization None
References <6c92b791-58d2-4ea5-8997-48ef21ce69f8@z7g2000vbp.googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.1566.1311850649.1164.python-list@python.org>


AlienBaby wrote:

> When using re patterns of the form 'A|B|C|...'  the docs seem to
> suggest that once any of A,B,C.. match, it is captured and no further
> patterns are tried.  But I am seeing,
> 
> st='  Id Name                    Prov Type  CopyOf              BsId
> Rd -Detailed_State-    Adm     Snp      Usr VSize'
> 
> p='Type *'
> re.search(p,st).group()
> 'Type  '
> 
> p='Type *|  *Type'
> re.search(p,st).group()
> ' Type'
> 
> 
> Shouldn’t the second search return the same as the first, if further
> patterns are not tried?
> 
> The documentation appears to suggest the first match should be
> returned, or am I misunderstanding?

All alternatives are tried at a given starting position in the string before 
the algorithm advances to the next position. The second alternative, 
"  *Type" (at least one space followed by the character sequence "Type"), 
matches starting at the space right after "Prov" in your example. The first 
alternative, "Type" followed by any number of spaces, fails at that position 
and would only match one character later, after "Prov ". The search returns 
as soon as it has a match, so the first alternative is never tried there.

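A minimal sketch of that behaviour, assuming a shortened stand-in for your string (only the "Prov Type  CopyOf" part matters; the variable names m1 and m2 are just for illustration). It also shows the case the docs describe, where the order of alternatives decides the result because both can match at the same starting position:

```python
import re

# Shortened stand-in for the header line in the question (an assumption
# for illustration; only the "Prov Type  CopyOf" part is relevant).
st = 'Prov Type  CopyOf'

# Single pattern: the leftmost match starts at the "T" of "Type".
m1 = re.search('Type *', st)
print(repr(m1.group()), m1.start())

# Alternation: at the space after "Prov" the first alternative fails,
# but the second one matches, so the search stops at that position.
m2 = re.search('Type *|  *Type', st)
print(repr(m2.group()), m2.start())

# The order of alternatives only matters when more than one of them
# can match at the SAME starting position.
first_wins_short = re.search('a|ab', 'ab').group()
first_wins_long = re.search('ab|a', 'ab').group()
print(first_wins_short, first_wins_long)
```

Comparing m1.start() with m2.start() shows that the alternation matched one character earlier, which is why it wins.
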
Maybe you accidentally typed one extra " "? If you didn't, " +Type" would be 
clearer.
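
A quick check of the two readings, again on a shortened sample string that I am assuming for illustration (the names loose, clear, and the "ProvType" test input are hypothetical):

```python
import re

st = 'Prov Type  CopyOf'  # shortened sample string, assumed for illustration

# One literal space followed by zero or more spaces ...
loose = re.search('  *Type', st)
# ... is the same as one or more spaces, just harder to read.
clear = re.search(' +Type', st)
same_span = loose.span() == clear.span()
print(same_span)

# If the extra space was a typo, the pattern ' *Type' also accepts
# "Type" with no space in front of it, which ' +Type' does not.
no_space_ok = re.search(' *Type', 'ProvType')
no_space_fails = re.search(' +Type', 'ProvType')
print(no_space_ok.group(), no_space_fails)
```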


Thread

unexpected regexp behaviour using 'A|B|C.....' AlienBaby <matt.j.warren@gmail.com> - 2011-07-28 02:56 -0700
  Re: unexpected regexp behaviour using 'A|B|C.....' Peter Otten <__peter__@web.de> - 2011-07-28 12:57 +0200