Date: Sun, 06 Jan 2013 01:32:46 -0500
From: Mitya Sirenef
To: python-list@python.org
Subject: Re: Need a specific sort of string modification. Can someone help?
Newsgroups: comp.lang.python

On 01/05/2013 03:35 AM, Sia wrote:
> I have strings such as:
>
> tA.-2AG.-2AG,-2ag
> or
> .+3ACG.+5CAACG.+3ACG.+3ACG
>
> The plus and minus signs are always followed by a number (say, i). I want python to find each single plus or minus, remove the sign, the number after it and remove i characters after that. So the two strings above become:
>
> tA..,
> and
> ...
>
> How can I do that?
> Thanks.

I think it's a bit cleaner and nicer to do something similar to itertools.takewhile, but takewhile 'eats' one extra value: the first item that fails the test is consumed from the iterator and lost. I was actually doing some stuff that also needed this. I wonder if there's a more elegant, robust way to do this?
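To make the "eats a value" point concrete, here is a minimal sketch using only the standard library. The sample string and variable names are just for illustration:

```python
from itertools import takewhile

it = iter(".+3ACG")

# takewhile pulls items from `it` until the test fails. The failing
# item ('+') has already been pulled from `it` and is not handed back.
prefix = ''.join(takewhile(lambda c: c not in "+-", it))
rest = ''.join(it)

print(repr(prefix))  # '.'
print(repr(rest))    # '3ACG' (the '+' is gone)
```

Here losing the sign happens to be harmless, but the same thing happens to the first non-digit after the number, and that character is one we need to keep or count.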
Here's what I got for now:

    class BIterator(object):
        """Iterator with 'buffered' takewhile."""

        def __init__(self, seq):
            self.seq = iter(seq)
            self.buffer = []              # values pushed back by takewhile
            self.end_marker = object()    # unique sentinel for end of input
            self.last = None

        def consume(self, n):
            for _ in range(n):
                self.next()

        def next(self):
            # Serve pushed-back values first, then the underlying iterator.
            val = self.buffer.pop() if self.buffer else next(self.seq, self.end_marker)
            self.last = val
            return val

        def takewhile(self, test):
            lst = []
            while True:
                val = self.next()
                if val is self.end_marker:
                    return lst
                elif test(val):
                    lst.append(val)
                else:
                    # Keep the failing value for the next call.
                    self.buffer.append(val)
                    return lst

        def joined_takewhile(self, test):
            return ''.join(self.takewhile(test))

        def done(self):
            return bool(self.last is self.end_marker)

    s = ".+3ACG.+5CAACG.+3ACG.+3ACG"
    not_plusminus = lambda x: x not in "+-"
    isdigit = lambda x: x.isdigit()

    def process(s):
        lst = []
        s = BIterator(s)

        while True:
            lst.extend(s.takewhile(not_plusminus))
            if s.done():
                break
            s.next()                                # skip the sign
            n = int(s.joined_takewhile(isdigit))    # the number after it
            s.consume(n)                            # skip n characters

        return ''.join(lst)

    print(process(s))

Obviously it assumes the input is well-formed, but the logic would be very easy to change to, for example, check for s.done() after each step.

 - mitya

-- 
Lark's Tongue Guide to Python: http://lightbird.net/larks/
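For comparison, a rough sketch of what checking for end of input after each step might look like. To stay self-contained it walks the string by index instead of reusing BIterator, so it is a different technique from the code above; the name process_checked and the error messages are hypothetical.

```python
def process_checked(s):
    """Drop each sign, the number after it, and that many characters.

    Raises ValueError on malformed input instead of assuming it is
    well-formed.
    """
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch not in "+-":
            out.append(ch)
            i += 1
            continue

        i += 1                      # skip the sign
        start = i
        while i < len(s) and s[i].isdigit():
            i += 1                  # collect the digits of the count
        if i == start:
            raise ValueError("sign at index %d has no number after it" % (start - 1))

        n = int(s[start:i])
        if i + n > len(s):
            raise ValueError("count %d at index %d runs past the end" % (n, start))
        i += n                      # skip the n characters after the number

    return ''.join(out)

print(process_checked("tA.-2AG.-2AG,-2ag"))           # tA..,
print(process_checked(".+3ACG.+5CAACG.+3ACG.+3ACG"))  # ....
```

With this version, truncated input such as "A+5CG" raises ValueError rather than silently returning a shortened result.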