From: MRAB
Date: Fri, 03 Jun 2011 23:38:50 +0100
To: python-list@python.org
Subject: Re: how to avoid leading white spaces
Newsgroups: comp.lang.python

On 03/06/2011 23:11, Ethan Furman wrote:
> Chris Torek wrote:
>>> On 2011-06-03, rurpy@yahoo.com wrote:
>> [prefers]
>>>> re.split ('[ ,]', source)
>>
>> This is probably not what you want in dealing with
>> human-created text:
>>
>> >>> re.split('[ ,]', 'foo bar, spam,maps')
>> ['foo', '', 'bar', '', 'spam', 'maps']
>
> I think you've got a typo in there... this is what I get:
>
> --> re.split('[ ,]', 'foo bar, spam,maps')
> ['foo', 'bar', '', 'spam', 'maps']
>
> I would add a * to get rid of that empty element, myself:
> --> re.split('[ ,]*', 'foo bar, spam,maps')
> ['foo', 'bar', 'spam', 'maps']
>
It's better to use + instead of *, because you don't want the separator
to be able to match zero characters (a zero-width separator). The fact
that * works should be treated as an idiosyncrasy of the current re
module, which can't split on a zero-width match.
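A minimal sketch of the difference between the two separator patterns, assuming a Python 3 interpreter (the post itself predates this). The input 'foo bar, spam,maps' comes from the thread; ', foo bar,' is a hypothetical extra input added only to show what happens with separators at the ends. The version check reflects a later change: since Python 3.7, re.split() does split on empty matches, so the '[ ,]*' trick no longer returns whole words.

```python
import re
import sys

text = 'foo bar, spam,maps'  # sample input from the thread

# '+' requires at least one space or comma, so the separator can never
# match an empty string, and a run such as ', ' is consumed as one separator.
print(re.split('[ ,]+', text))
# ['foo', 'bar', 'spam', 'maps']

# Hypothetical input: separators at either end still leave empty strings
# at the start and end of the result.
print(re.split('[ ,]+', ', foo bar,'))
# ['', 'foo', 'bar', '']

# With '*' the pattern can also match the empty string. On Python 3.7+
# re.split() splits on those empty matches too, so the result is made of
# single characters and empty strings rather than words.
if sys.version_info >= (3, 7):
    print(re.split('[ ,]*', text))
```

Using '+' gives the same word list on every Python version, which is why it is the safer choice.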