From: Neil Cerutti
Newsgroups: comp.lang.python
Subject: Re: Groups in regular expressions don't repeat as expected
Date: 21 Apr 2011 13:16:36 GMT
Organization: Norwich University

On 2011-04-20, John Nagle wrote:
> Findall does something a bit different. It returns a list of
> matches of the entire pattern, not repeats of groups within
> the pattern.
>
> Consider a regular expression for matching domain names:
>
> >>> kre = re.compile(r'^([a-zA-Z0-9\-]+)(?:\.([a-zA-Z0-9\-]+))+$')
> >>> s = 'www.example.com'
> >>> ms = kre.match(s)
> >>> ms.groups()
> ('www', 'com')
> >>> msall = kre.findall(s)
> >>> msall
> [('www', 'com')]
>
> This is just a simple example. But it illustrates an unnecessary
> limitation. The matcher can do the repeated matching; you just can't
> get the results out.

Thanks for the further explanation. Assuming a fake API that
returned multiple group matches as a tuple:

>>> print(re.match(r"^([a-z])+$", "abcdef").groups())
(('a', 'b', 'c', 'd', 'e', 'f'),)

I was thinking of applying findall something like this, but you
have to make multiple calls:

>>> s = "abcdef"
>>> m = re.match(r"^[a-z]+$", s)
>>> if m:
...     print(re.findall(r"[a-z]", m.group()))
...
['a', 'b', 'c', 'd', 'e', 'f']

I can see that getting really annoying.
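
The two-call workaround can at least be wrapped in a helper. This is
only a minimal sketch: repeated_matches is a made-up name, not anything
in the re module, and it assumes the "whole" and "part" patterns are
written separately by hand.

```python
import re

def repeated_matches(whole, part, text):
    """Validate text against `whole`, then list every match of `part`.

    Hypothetical helper for illustration only. Returns None when the
    whole-string pattern does not match.
    """
    m = re.match(whole, text)
    if m is None:
        return None
    # `part` has no groups here, so findall returns plain strings.
    return re.findall(part, m.group())

# The single-letter example from this post.
print(repeated_matches(r"^[a-z]+$", r"[a-z]", "abcdef"))

# The domain-name example from the quoted post, one label per match.
label = r"[a-zA-Z0-9\-]+"
print(repeated_matches(r"^%s(?:\.%s)+$" % (label, label), label,
                       "www.example.com"))
```

It still scans the string twice, so it hides the annoyance rather than
removing it.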

Is there a better way to make multiple group matches accessible
without adding a third element type as a group element?

-- 
Neil Cerutti