From: Tim Chase
Date: Wed, 30 Sep 2015 14:20:15 -0500
To: python-list@python.org
Subject: Re: Question about regular expression
In-Reply-To: <811788b6-9955-4dcc-bf49-9647891d17ec@googlegroups.com>
Newsgroups: comp.lang.python

On 2015-09-30 11:34, massi_srb@msn.com wrote:
> firstly the description of my problem. I have a string in the
> following form:
>
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
>
> that is a string made up of groups in the form 'name' (letters
> only) plus possibly a tuple containing 1 or 2 integer values.
> Blanks can be placed between names and tuples or not, but they
> surely are placed between two groups. I would like to process this
> string in order to get a dictionary like this:
>
> d = {
>     "name1":(0, 0),
>     "name2":(1, 0),
>     "name3":(0, 0),
>     "name4":(1, 4),
>     "name5":(2, 0),
> }
>
> I guess this problem can be tackled with regular expressions, b

First out of the gate, I suggest you follow Emile's advice and try
using string expressions. However, if you *want* to do it with
regular expressions, you can. It's ugly and might be fragile, but

#############################################################
import re
s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
r = re.compile(r"""
    \b              # start at a word boundary
    (\w+)           # capture the word
    \s*             # optional whitespace
    (?:             # start an optional grouping for things in the parens
        \(          # a literal open-paren
        \s*         # optional whitespace
        (\d+)       # capture the number in those parens
        (?:         # start a second optional grouping for the stuff after a comma
            \s*     # optional whitespace
            ,       # a literal comma
            \s*     # optional whitespace
            (\d+)   # the second number
        )?          # make the comma and following number optional
        \)          # a literal close-paren
    )?              # make that stuff in parens optional
    """, re.X)
d = {}
for m in r.finditer(s):
    a, b, c = m.groups()
    # groups that didn't participate in the match come back as None
    d[a] = (int(b or 0), int(c or 0))

from pprint import pprint
pprint(d)
#############################################################

I'd stick with the commented version of the regexp if you were to use
this anywhere so that others can follow what you're doing.

-tkc
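
For comparison, here is a rough sketch of the no-regex route using
only string methods. I'm guessing at what "string expressions" would
look like in practice, so this is not necessarily what Emile had in
mind. The name parse_groups is made up for illustration, and the code
assumes well-formed input: every "(" has a matching ")" with one or
two integers inside, and anything that isn't purely alphanumeric (such
as the trailing "...") is not a name.

```python
def parse_groups(s):
    """Hypothetical helper: map each name to a 2-tuple, padding with 0."""
    d = {}
    # Cut at every ")" so each chunk holds some bare names followed by
    # at most one "(..." belonging to the last name in that chunk.
    for chunk in s.split(")"):
        head, paren, nums = chunk.partition("(")
        # Keep only alphanumeric words; this skips the "..." placeholder.
        names = [w for w in head.split() if w.isalnum()]
        for name in names:
            d[name] = (0, 0)
        if paren and names:
            # int() tolerates surrounding whitespace, so " 4" is fine.
            values = [int(n) for n in nums.split(",")]
            values += [0] * (2 - len(values))  # pad a single value with 0
            d[names[-1]] = tuple(values)
    return d


s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
d = parse_groups(s)
```

Unlike the regexp, this raises ValueError on things like empty parens
rather than quietly skipping them, which may or may not be what you
want.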