From: Matt Funk
Reply-To: matze999@gmail.com
To: python-list@python.org
Subject: Re: Help with regular expression in python
Date: Fri, 19 Aug 2011 15:55:53 -0600
Newsgroups: comp.lang.python

On Friday, August 19, 2011, Carl Banks wrote:
> On Friday, August 19, 2011 10:33:49 AM UTC-7, Matt Funk wrote:
> > number = r"\d\.\d+e\+\d+"
> > numbersequence = r"%s( %s){31}(.+)" % (number,number)
> > instance_linetype_pattern = re.compile(numbersequence)
> >
> > The results obtained are:
> > results:
> > [(' 2.199000e+01', ' : (instance: 0)\t:\tsome description')]
> > so this matches the last number plus the string at the end of the line,
> > but does not retain the previous numbers.
> >
> > Anyway, I think at this point I will go another route. Not sure where the
> > issue lies at this point.
>
> I think the problem is that repeat counts don't actually repeat the
> groupings; they just repeat the matchings. Take this expression:
>
> r"(\w+\s*){2}"

I see

> This will match exactly two words separated by whitespace. But the match
> result won't contain two groups; it'll only contain one group, and the
> value of that group will match only the very last thing repeated:
>
> Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
> [GCC 4.5.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> import re
> >>> m = re.match(r"(\w+\s*){2}","abc def")
> >>> m.group(1)
> 'def'
>
> So you see, the regular expression is doing what you think it is, but the
> way it forms groups is not.
>
>
> Just a little advice (I know you've found a different method, and that's
> good, this is for the general reader).
>
> The functions re.findall and re.finditer could have helped here, they find
> all the matches in a string and let you iterate through them. (findall
> returns the strings matched, and finditer returns the sequence of match
> objects.) You could have done something like this:

I did use findall, but when I tried to match everything (including the
'some description' part) it did not work. But I think the explanation you
gave above matches this case and explains why it did not.

> row = [ float(x) for x in re.findall(r'\d+\.\d+e\+\d+',line) ]
>
> And regexp matching is often overkill for a particular problem; this may be
> one of them. line.split() could have been sufficient:
>
> row = [ float(x) for x in line.split() ]
>
> Of course, these solutions don't account for the case where you have lines,
> some of which aren't 32 floating-point numbers. You need extra error
> handling for that, but you get the idea.

thanks
matt

> Carl Banks
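
For the general reader, here is a minimal sketch of one way to keep all 32
numbers and the trailing text in a single match. It assumes each line starts
with exactly 32 whitespace-separated numbers in the format shown above. The
repeat is placed inside a non-capturing group and the whole number sequence is
captured once, then split. The names NUMBER, LINE_PATTERN and parse_line are
hypothetical and do not come from the thread.

```python
import re

# One number in the format from the thread, e.g. 2.199000e+01.
NUMBER = r"\d\.\d+e\+\d+"

# Group 1 captures the whole run of 32 numbers; (?:...) repeats without
# creating a group, so nothing is overwritten by later repetitions.
# Group 2 captures whatever follows the last number.
LINE_PATTERN = re.compile(r"\s*((?:%s\s+){31}%s)(.*)" % (NUMBER, NUMBER))


def parse_line(line):
    """Return (list of 32 floats, trailing text) or raise ValueError."""
    m = LINE_PATTERN.match(line)
    if m is None:
        raise ValueError("line does not start with 32 numbers")
    numbers = [float(x) for x in m.group(1).split()]
    return numbers, m.group(2)


# Illustrative input only: the same number repeated 32 times.
sample = " ".join(["2.199000e+01"] * 32) + " : (instance: 0)\t:\tsome description"
row, description = parse_line(sample)
```

Lines that do not fit the assumed layout raise ValueError, which is one way
to provide the extra error handling mentioned above.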