To: python-list@python.org
From: Dave Angel <davea@davea.name>
Subject: Re: Find and Replace Simplification
Date: Fri, 19 Jul 2013 18:45:28 -0400
Newsgroups: comp.lang.python
Message-ID: <mailman.4888.1374273947.3114.python-list@python.org>

On 07/19/2013 05:44 PM, Devyn Collier Johnson wrote:
>
> On 07/19/2013 12:22 PM, Steven D'Aprano wrote:
>> On Fri, 19 Jul 2013 09:22:48 -0400, Devyn Collier Johnson wrote:
>>
>>> I have some code that I want to simplify. I know that a for-loop would
>>> work well, but can I make re.sub perform all of the below tasks at once,
>>> or can I write this in a way that is more efficient than using a
>>> for-loop?
>>>
>>> DATA = re.sub(',', '', 'DATA')
>>> DATA = re.sub('\'', '', 'DATA')
>>> DATA = re.sub('(', '', 'DATA')
>>> DATA = re.sub(')', '', 'DATA')
>>
>> I don't think you intended to put DATA in quotes on the right hand side.
>> That makes it literally the string D A T A, so all those replacements are
>> no-ops, and you could simplify it to:
>>
>> DATA = 'DATA'
>>
>> But that's probably not what you wanted.
>>
>> My prediction is that this will be by far the most efficient way to do
>> what you are trying to do:
>>
>> py> DATA = "Hello, 'World'()"
>> py> DATA.translate(dict.fromkeys(ord(c) for c in ",'()"))
>> 'Hello World'
>>
>> That's in Python 3 -- in Python 2, using translate will still probably be
>> the fastest, but you'll need to call it like this:
>>
>> import string
>> DATA.translate(string.maketrans("", ""), ",'()")
>>
>> I also expect that the string replace() method will be second fastest,
>> and re.sub will be the slowest, by a very long way.
>>
>> As a general rule, you should avoid using regexes unless the text you
>> are searching for actually contains a regular expression of some kind. If
>> it's merely a literal character or substring, standard string methods
>> will probably be faster.
>>
>>
>> Oh, and a tip for you:
>>
>> - don't escape quotes unless you need to; use the other quote instead.
>>
>> s = '\''  # No, don't do this!
>> s = "'"  # Better!
>>
>> and vice versa.
>>
>>
>>
>>
> Thanks for finding that error; DATA should not be in quotes. I cannot
> believe I missed that. Good eye Steven!
>
> Using the replace command is a brilliant idea; I will implement that
> wherever I can. I want to perform all of the replacements at once.
> Is that possible?
>

Read what you're quoting from.  The translate() method does just that:
it handles every character in a single pass over the string.  And
maketrans() is the way to build a translate table.
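
A minimal sketch of that for Python 3, assuming the same sample string Steven used. The names table, data and cleaned are only for illustration. The three-argument form of str.maketrans() takes the characters to delete as its third argument:

```python
# Python 3: build the table once, then delete all four characters in one pass.
# str.maketrans(x, y, z): every character in z is mapped to None (i.e. deleted).
table = str.maketrans("", "", ",'()")

data = "Hello, 'World'()"   # sample input taken from the quoted example
cleaned = data.translate(table)
print(cleaned)              # Hello World
```

The table can be built once and reused for every string you need to clean, so the setup cost is paid only one time.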

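On an Intel processor, the xlat instruction does a translate for one
byte, looking it up in a 256-entry table.  A REP prefix does not apply
to it (REP only works on the string instructions), so doing an entire
buffer takes a short loop around xlat, typically with lodsb and stosb.
No idea if Python takes advantage of that, however.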
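
For what it's worth, the same byte-through-a-table lookup can be sketched at the Python level with bytes.translate().  This only illustrates the idea; it says nothing about what the interpreter does internally, and the names xlat_table, buf, one and out are made up for the example:

```python
# Build a 256-entry lookup table: byte value i maps to xlat_table[i].
# Lowercase ASCII letters map to uppercase; every other byte maps to itself.
xlat_table = bytes.maketrans(b"abcdefghijklmnopqrstuvwxyz",
                             b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")

buf = b"hello, 'world'()"

# A single lookup for one byte, which is the job xlat does per execution.
one = xlat_table[buf[0]]

# The whole buffer in one call; the second argument lists bytes to delete.
out = buf.translate(xlat_table, b",'()")
print(one, out)
```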




-- 
DaveA