Date: Wed, 30 Sep 2015 14:20:15 -0500
From: Tim Chase <python.list@tim.thechases.com>
To: python-list@python.org
Subject: Re: Question about regular expression
In-Reply-To: <811788b6-9955-4dcc-bf49-9647891d17ec@googlegroups.com>
References: <811788b6-9955-4dcc-bf49-9647891d17ec@googlegroups.com>
Newsgroups: comp.lang.python
Message-ID: <mailman.276.1443641310.28679.python-list@python.org>

On 2015-09-30 11:34, massi_srb@msn.com wrote:
> firstly the description of my problem. I have a string in the
> following form:
> 
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
> 
> that is a string made up of groups in the form 'name' (letters
> only) plus possibly a tuple containing 1 or 2 integer values.
> Blanks can be placed between names and tuples or not, but they
> surely are placed between two groups. I would like to process this
> string in order to get a dictionary like this:
> 
> d = {
>     "name1":(0, 0),
>     "name2":(1, 0),
>     "name3":(0, 0),
>     "name4":(1, 4),
>     "name5":(2, 0),
> }
> 
> I guess this problem can be tackled with regular expressions, b

First out of the gate, I suggest you follow Emile's advice and try
using string methods.  However, if you *want* to do it with
regular expressions, you can.  It's ugly and might be fragile, but
here's one way:

#############################################################
import re
s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
r = re.compile(r"""
    \b       # start at a word boundary
    (\w+)    # capture the word
    \s*      # optional whitespace
    (?:      # start an optional grouping for things in the parens
     \(      # a literal open-paren
      \s*    # optional whitespace
      (\d+)  # capture the number in those parens
      (?:    # start a second optional grouping for the stuff after a comma
       \s*   # optional whitespace
       ,     # a literal comma
       \s*   # optional whitespace
       (\d+) # the second number
      )?     # make the comma and following number optional
     \)      # a literal close-paren
    )?       # make that stuff in parens optional
    """, re.X)
d = {}
for m in r.finditer(s):
    name, first, second = m.groups()
    # groups that didn't match come back as None; default them to 0
    d[name] = (int(first or 0), int(second or 0))

from pprint import pprint
pprint(d)
#############################################################
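For comparison, here's a rough sketch of the non-regex route.  Emile's
actual suggestion isn't quoted here, so this is just one guess at what
a string-method version might look like.  The name parse_groups() is
made up for illustration, and it assumes parens are never nested or
empty and that names are alphanumeric:

```python
from pprint import pprint

s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."

def parse_groups(text):
    """Parse 'name', 'name(a)' and 'name(a, b)' groups with str methods.

    Hypothetical helper: assumes parentheses are never nested or empty,
    and that names are alphanumeric (other tokens, like the trailing
    '...', are skipped).
    """
    result = {}
    # Every ")" closes one tuple, so each chunk between them holds some
    # bare words followed by at most one "(numbers" for the last word.
    for chunk in text.split(")"):
        head, paren, inside = chunk.partition("(")
        words = head.split()
        for word in words:
            if word.isalnum():
                result[word] = (0, 0)
        if paren and words and words[-1].isalnum():
            # int() ignores surrounding whitespace, so " 4" parses fine
            numbers = [int(n) for n in inside.split(",")]
            numbers += [0] * (2 - len(numbers))  # pad a lone number to (a, 0)
            result[words[-1]] = tuple(numbers)
    return result

pprint(parse_groups(s))
```

Splitting on ")" sidesteps the "optional space before the paren"
problem, since whatever precedes each "(" is simply the last word of
its chunk.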


If you use this anywhere, I'd stick with the commented (re.X) version
of the regexp so that others can follow what you're doing.

-tkc