From: Friedrich Rentsch
Date: Sun, 06 Sep 2015 14:43:37 +0200
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Reading \n unescaped from a file

On 09/06/2015 09:51 AM, Peter Otten wrote:
> Friedrich Rentsch wrote:
>
>> My response was meant for the list, but went to Peter by mistake. So I
>> repeat it with some delay:
>>
>> On 09/03/2015 04:24 PM, Peter Otten wrote:
>>> Friedrich Rentsch wrote:
>>>
>>>> On 09/03/2015 11:24 AM, Peter Otten wrote:
>>>>> Friedrich Rentsch wrote:
>>>> I appreciate your identifying two mistakes. I am curious to know what
>>>> they are.
>>> Sorry for not being explicit.
>>>
>>>>>> substitutes = [self.table [item] for item in hits if
>>>>>> item
>>>>>> in valid_hits] + [] # Make lengths equal for zip to work right
>>>>> That looks wrong...
>>> You are adding an empty list here. I wondered what you were trying to
>>> achieve with that.
>> Right you are! It doesn't do anything. I remember my idea was to pad the
>> substitutes list by one, because the list of intervening text segments
>> is longer by one element and zip uses the least common length,
>> discarding all overhang. The remedy was totally ineffective and, what's
>> more, not needed, judging by the way the editor performs as expected.
> That's because you are getting the same effect later by adding
>
> nohits[-1]
>
> You could avoid that by replacing [] with [""].
>
> >>> substitutes = list("12")
> >>> nohits = list("abc")
> >>> zipped = zip(nohits, substitutes)
> >>> "".join(list(reduce(lambda a, b: a+b, [zipped][0]))) + nohits[-1]
> 'a1b2c'
> >>> zipped = zip(nohits, substitutes + [""])
> >>> "".join(list(reduce(lambda a, b: a+b, [zipped][0])))
> 'a1b2c'
>
> By the way, even those who are into functional programming might find
>
> >>> "".join(map("".join, zipped))
> 'a1b2c'
>
> more readable.
>
> But there's a more general change that I suggest: instead of processing the
> string twice, first to search for matches, then for the surrounding text you
> could achieve the same in one pass with a cool feature of the re.sub()
> method -- it accepts a function:
>
> >>> def replace(text, replacements):
> ...     table = dict(replacements)
> ...     def substitute(match):
> ...         return table[match.group()]
> ...     regex = "|".join(re.escape(find) for find, replace in replacements)
> ...     return re.compile(regex).sub(substitute, text)
> ...
> >>> replace("1 foo 2 bar 1 baz", [("1", "one"), ("2", "two")])
> 'one foo two bar one baz'
>

I didn't think of using sub. But you're right. It is better, likely
faster too. Building the regex from the targets in reverse sorted order
makes it handle overlapping targets correctly, because a longer target
such as "12" then comes before its prefix "1" in the alternation, e.g.:

r = (
    ("1", "one"),
    ("2", "two"),
    ("12", "twelve"),
)

Your function as posted:

>>> replace('1 foo 2 bar 12 baz', r)
'one foo two bar onetwo baz'

With the regex line changed to:

regex = "|".join(re.escape(find) for find, replace in reversed(sorted(replacements)))

>>> replace('1 foo 2 bar 12 baz', r)
'one foo two bar twelve baz'

Thanks for the hints

Frederic
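
For anyone who wants to try the two points from this exchange on Python 3, here is a minimal, self-contained sketch. It assumes the targets are plain strings and that at least one replacement pair is given. The name interleave is only illustrative and does not appear in the code discussed above. The session quoted above calls reduce as a builtin, which in Python 3 would have to be imported from functools, so this sketch uses the map/join form instead.

```python
import re


def replace(text, replacements):
    """Replace every target in text in one pass, via re.sub() with a function."""
    table = dict(replacements)

    def substitute(match):
        # Look up the replacement for whichever target matched.
        return table[match.group()]

    # Reverse lexicographic order puts "12" before "1", so the alternation
    # tries a longer target before any target that is a prefix of it.
    targets = sorted(table, reverse=True)
    regex = "|".join(re.escape(find) for find in targets)
    return re.compile(regex).sub(substitute, text)


def interleave(nohits, substitutes):
    """Join text segments and substitutes; nohits is longer by one element."""
    # Padding with [""] makes both lists the same length, so zip() keeps
    # the final text segment instead of discarding it.
    return "".join(map("".join, zip(nohits, substitutes + [""])))


r = (("1", "one"), ("2", "two"), ("12", "twelve"))
print(replace("1 foo 2 bar 12 baz", r))      # one foo two bar twelve baz
print(interleave(list("abc"), list("12")))   # a1b2c
```

Sorting the targets by length, longest first, would be another way to get the same ordering guarantee. The reverse sort is used here because it is what the post proposes.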