
Re: Reading \n unescaped from a file

Started by	Friedrich Rentsch <anthra.norell@bluewin.ch>
First post	2015-09-06 14:43 +0200
Last post	2015-09-06 14:43 +0200
Articles	1 — 1 participant


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Reading \n unescaped from a file Friedrich Rentsch <anthra.norell@bluewin.ch> - 2015-09-06 14:43 +0200

#96064 — Re: Reading \n unescaped from a file

From	Friedrich Rentsch <anthra.norell@bluewin.ch>
Date	2015-09-06 14:43 +0200
Subject	Re: Reading \n unescaped from a file
Message-ID	<mailman.175.1441543490.8327.python-list@python.org>


On 09/06/2015 09:51 AM, Peter Otten wrote:
> Friedrich Rentsch wrote:
>
>> My response was meant for the list, but went to Peter by mistake. So I
>> repeat it with some delay:
>>
>> On 09/03/2015 04:24 PM, Peter Otten wrote:
>>> Friedrich Rentsch wrote:
>>>
>>>> On 09/03/2015 11:24 AM, Peter Otten wrote:
>>>>> Friedrich Rentsch wrote:
>>>> I appreciate your identifying two mistakes. I am curious to know what
>>>> they are.
>>> Sorry for not being explicit.
>>>
>>>>>>                 substitutes = [self.table [item] for item in hits if
>>>>>>                 item
>>>>>> in valid_hits] + []  # Make lengths equal for zip to work right
>>>>> That looks wrong...
>>> You are adding an empty list here. I wondered what you were trying to
>>> achieve with that.
>> Right you are! It doesn't do anything. I remember my idea was to pad the
>> substitutes list by one, because the list of intervening text segments
>> is longer by one element and zip uses the least common length,
>> discarding all overhang. The remedy was totally ineffective and, what's
>> more, not needed, judging by the way the editor performs as expected.
> That's because you are getting the same effect later by adding
>
> nohits[-1]
>
> You could avoid that by replacing [] with [""].
>
>>>> substitutes = list("12")
>>>> nohits = list("abc")
>>>> zipped = zip(nohits, substitutes)
>>>> "".join(list(reduce(lambda a, b: a+b, [zipped][0]))) + nohits[-1]
> 'a1b2c'
>>>> zipped = zip(nohits, substitutes + [""])
>>>> "".join(list(reduce(lambda a, b: a+b, [zipped][0])))
> 'a1b2c'
>
> By the way, even those who are into functional programming might find
>
>>>> "".join(map("".join, zipped))
> 'a1b2c'
>
> more readable.
>
> But there's a more general change that I suggest: instead of processing the
> string twice, first to search for matches, then for the surrounding text you
> could achieve the same in one pass with a cool feature of the re.sub()
> method -- it accepts a function:
>
>>>> def replace(text, replacements):
> ...     table = dict(replacements)
> ...     def substitute(match):
> ...         return table[match.group()]
> ...     regex = "|".join(re.escape(find) for find, replace in replacements)
> ...     return re.compile(regex).sub(substitute, text)
> ...
>>>> replace("1 foo 2 bar 1 baz", [("1", "one"), ("2", "two")])
> 'one foo two bar one baz'
>
>

I didn't think of using sub, but you're right: it is better, and likely
faster too. Building the regex from the targets in reverse-sorted order
makes it handle overlapping targets correctly, because a longer target
such as "12" is then tried before its prefix "1", e.g.:

r = (
     ("1", "one"),
     ("2", "two"),
     ("12", "twelve"),
)

Your function as posted:

replace('1 foo 2 bar 12 baz', r)
'one foo two bar onetwo baz'

With the regex line changed to:

regex = "|".join(re.escape(find) for find, replace in reversed(sorted(replacements)))

replace('1 foo 2 bar 12 baz', r)
'one foo two bar twelve baz'
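
Put together, the change could look like the sketch below. It is only a sketch, assuming Python 3 and that every search string is a plain literal (hence re.escape). The local name `targets` is introduced here for illustration and is not in the function as posted. It sorts only the search strings rather than the (find, replace) pairs, which gives the same order as long as each search string appears once.

```python
import re

def replace(text, replacements):
    table = dict(replacements)

    def substitute(match):
        # Look up the replacement for whatever alternative matched.
        return table[match.group()]

    # Regex alternation takes the first alternative that matches at a
    # position, so a target must be listed before any of its prefixes.
    # Reverse lexicographic order guarantees that: "12" sorts after "1",
    # so it comes first once the order is reversed.
    targets = sorted(table, reverse=True)
    regex = "|".join(re.escape(find) for find in targets)
    return re.compile(regex).sub(substitute, text)

r = (
    ("1", "one"),
    ("2", "two"),
    ("12", "twelve"),
)
print(replace("1 foo 2 bar 12 baz", r))  # one foo two bar twelve baz
```

Sorting by length, longest first, would presumably work as well, since it also places every target ahead of its prefixes.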

Thanks for the hints

Frederic
