Groups > comp.lang.python > #5524 > unrolled thread

regular expression i'm going crazy

Started by	Tracubik <affdfsdfdsfsd@b.com>
First post	2011-05-16 16:25 +0000
Last post	2011-05-16 18:11 +0100
Articles	4 — 4 participants


  regular expression i'm going crazy Tracubik <affdfsdfdsfsd@b.com> - 2011-05-16 16:25 +0000
    Re: regular expression i'm going crazy Robert Kern <robert.kern@gmail.com> - 2011-05-16 11:51 -0500
    Re: regular expression i'm going crazy Alexander Kapps <alex.kapps@web.de> - 2011-05-16 19:01 +0200
    Re: regular expression i'm going crazy andy baxter <andy@earthsong.free-online.co.uk> - 2011-05-16 18:11 +0100

#5524 — regular expression i'm going crazy

From	Tracubik <affdfsdfdsfsd@b.com>
Date	2011-05-16 16:25 +0000
Subject	regular expression i'm going crazy
Message-ID	<4dd14fdb$0$18238$4fafbaef@reader2.news.tin.it>

Please help me fix this:

import re
s = "linka la baba"
re_s = re.compile(r'(link|l)a' , re.IGNORECASE)

print re_s.findall(s)

output: 
['link', 'l']

Why?
I want re_s to find "linka" and "la", but it only finds "link" and "l" and 
drops the trailing "a".

Can anyone help me? When I try the regular expression in redemo.py (a program 
provided with Python for exploring regular expressions) I get what 
I want, so I guess re_s is OK, but it still fails...
Why?
Help!

Nico


#5526

From	Robert Kern <robert.kern@gmail.com>
Date	2011-05-16 11:51 -0500
Message-ID	<mailman.1647.1305564724.9059.python-list@python.org>
In reply to	#5524

On 5/16/11 11:25 AM, Tracubik wrote:
> pls help me fixing this:
>
> import re
> s = "linka la baba"
> re_s = re.compile(r'(link|l)a' , re.IGNORECASE)
>
> print re_s.findall(s)
>
> output:
> ['link', 'l']
>
> why?
> i want my re_s to find linka and la, he just find link and l and forget
> about the ending a.
>
> can anyone help me? trying the regular expression in redemo.py (program
> provided with python to explore the use of regular expression) i get what
> i want, so i guess re_s is ok, but it still fail...
> why?

The parentheses () create a capturing group, which specifies that the contents 
of the group should be extracted. See the "(...)" entry here:

   http://docs.python.org/library/re#regular-expression-syntax

You can use the non-capturing version of parentheses if you want to just isolate 
the | from affecting the rest of the regex:

"""
(?:...)  A non-capturing version of regular parentheses. Matches whatever 
regular expression is inside the parentheses, but the substring matched by the 
group cannot be retrieved after performing a match or referenced later in the 
pattern.
"""

[~]
|1> import re

[~]
|2> s = "linka la baba"

[~]
|3> re_s = re.compile(r'(?:link|l)a' , re.IGNORECASE)

[~]
|4> print re_s.findall(s)
['linka', 'la']
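
For illustration, a minimal sketch of why some form of parentheses is still
needed around the alternation. It assumes Python 3 (where print is a function,
unlike in the Python 2 session above), and the variable names are only
illustrative. Without any grouping, | splits the whole pattern, so 'link|la'
means "link" or "la":

```python
import re

s = "linka la baba"

# Without grouping, | has the lowest precedence: this is "link" OR "la".
ungrouped = re.findall(r'link|la', s, re.IGNORECASE)

# (?:...) limits the alternation to "link" or "l", then requires an "a".
grouped = re.findall(r'(?:link|l)a', s, re.IGNORECASE)

print(ungrouped)  # ['link', 'la']
print(grouped)    # ['linka', 'la']
```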

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


#5527

From	Alexander Kapps <alex.kapps@web.de>
Date	2011-05-16 19:01 +0200
Message-ID	<mailman.1648.1305565658.9059.python-list@python.org>
In reply to	#5524

On 16.05.2011 18:25, Tracubik wrote:
> pls help me fixing this:
>
> import re
> s = "linka la baba"
> re_s = re.compile(r'(link|l)a' , re.IGNORECASE)
>
> print re_s.findall(s)
>
> output:
> ['link', 'l']
>
> why?

As the docs say:

"If one or more groups are present in the pattern, return a list of 
groups;"

http://docs.python.org/library/re.html?highlight=findall#re.findall
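
A small sketch of what that sentence means in practice, assuming Python 3
syntax (the variable names are illustrative only). The shape of the findall
result depends on how many capturing groups the pattern contains:

```python
import re

s = "linka la baba"

# No capturing groups: a list of the full matches.
no_groups = re.findall(r'linka|la', s, re.IGNORECASE)

# One capturing group: a list of what that group matched.
one_group = re.findall(r'(link|l)a', s, re.IGNORECASE)

# Two capturing groups: a list of tuples, one item per group.
two_groups = re.findall(r'((link|l)a)', s, re.IGNORECASE)

print(no_groups)   # ['linka', 'la']
print(one_group)   # ['link', 'l']
print(two_groups)  # [('linka', 'link'), ('la', 'l')]
```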

> i want my re_s to find linka and la, he just find link and l and forget
> about the ending a.

Try non-capturing parentheses:

re_s = re.compile(r'(?:link|l)a' , re.IGNORECASE)


#5528

From	andy baxter <andy@earthsong.free-online.co.uk>
Date	2011-05-16 18:11 +0100
Message-ID	<mailman.1649.1305565877.9059.python-list@python.org>
In reply to	#5524

On 16/05/11 17:25, Tracubik wrote:
> pls help me fixing this:
>
> import re
> s = "linka la baba"
> re_s = re.compile(r'(link|l)a' , re.IGNORECASE)
>
> print re_s.findall(s)
>
> output:
> ['link', 'l']
>
> why?
> i want my re_s to find linka and la, he just find link and l and forget
> about the ending a.

The round brackets define a 'capturing group', i.e. when you call findall 
it returns only the parts of each match that fall inside the brackets, not 
the whole match. If you want to get linka and la, you need something like this:

 >>> re_s = re.compile(r'((link|l)a)' , re.IGNORECASE)
 >>> print re_s.findall(s)
[('linka', 'link'), ('la', 'l')]

Then just look at the first element of each tuple in the list 
(which corresponds to the outer set of brackets).
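
As a hedged sketch of both options, assuming Python 3 syntax (the variable
names are illustrative): the first elements can be picked out of the tuples,
or, as an alternative that is not from the post above, the original
single-group pattern can be kept and finditer used instead, since each match
object exposes the whole match as group(0):

```python
import re

s = "linka la baba"

# Option 1: outer group added, then take the first item of each tuple.
outer = re.compile(r'((link|l)a)', re.IGNORECASE)
first_items = [t[0] for t in outer.findall(s)]

# Option 2: keep the original pattern and iterate over match objects.
# group(0) is the whole match, group(1) is the capturing group.
re_s = re.compile(r'(link|l)a', re.IGNORECASE)
pairs = [(m.group(0), m.group(1)) for m in re_s.finditer(s)]

print(first_items)  # ['linka', 'la']
print(pairs)        # [('linka', 'link'), ('la', 'l')]
```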

see:
http://www.regular-expressions.info/python.html

