Date: Mon, 20 Jun 2011 20:29:02 -0700
From: Ethan Furman
To: python list
Subject: Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries?
Newsgroups: comp.lang.python

Steven D'Aprano wrote:
> On Mon, 20 Jun 2011 12:43:52 -0700, deathweaselx86 wrote:
>> I've been converting lists to sets, then back to lists again to get
>> unique lists.
>>
>> I used to use list comps to do this instead.
>>>>> foo = ['1','2','3']
>>>>> bar = ['2','5']
>>>>> foo.extend([a for a in bar if a not in foo])
>>>>> foo
>> ['1', '2', '3', '5']
>>
>> Is there any performance hit to using one of these methods over the
>> other for rather large lists?
>
> Absolutely!
>
> For small lists, it really doesn't matter what you do. This probably only
> matters beyond a few tens of thousands of items.

Depends on the complexity of the object.
It only took a couple thousand dbf records to notice a *huge* slowdown using 'in' tests on regular lists.

~Ethan~
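A minimal sketch in modern Python 3 (not code from the thread) contrasting the list-comp approach quoted above with a set-backed version. It also includes a hypothetical `Record` class that counts `__eq__` calls, to illustrate why the cost of each 'in' test on a list grows with how expensive the objects are to compare. A list scans and compares every element. A set compares hashes first and only calls `__eq__` on a hash match. The function names, the `Record` class, and the data sizes are assumptions for illustration, and timings will vary by machine.

```python
import timeit


def extend_unique_listcomp(foo, bar):
    # The approach from the thread: every "a not in foo" is a linear scan of foo.
    # Note: foo is not updated while the comp runs, so duplicates inside bar survive.
    foo.extend([a for a in bar if a not in foo])
    return foo


def extend_unique_set(foo, bar):
    # Same idea, but membership is tested against a set (average O(1) per test).
    # Unlike the list comp, this also drops duplicates that occur within bar.
    seen = set(foo)
    for a in bar:
        if a not in seen:
            seen.add(a)
            foo.append(a)
    return foo


class Record:
    # Hypothetical stand-in for a record type (e.g. a dbf row) whose __eq__ does real work.
    eq_calls = 0

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        Record.eq_calls += 1
        return isinstance(other, Record) and self.key == other.key

    def __hash__(self):
        return hash(self.key)


def count_eq_calls(container, probe):
    # Count how many __eq__ calls a single membership test triggers.
    Record.eq_calls = 0
    _ = probe in container
    return Record.eq_calls


if __name__ == "__main__":
    print(extend_unique_listcomp(['1', '2', '3'], ['2', '5']))  # ['1', '2', '3', '5']
    print(extend_unique_set(['1', '2', '3'], ['2', '5']))       # ['1', '2', '3', '5']

    records = [Record(i) for i in range(2000)]
    missing = Record(10**6)
    print("list __eq__ calls:", count_eq_calls(records, missing))       # 2000
    print("set  __eq__ calls:", count_eq_calls(set(records), missing))  # 0

    # Rough timing on hypothetical data: n strings, half of bar overlaps foo.
    n = 2000
    foo = [str(i) for i in range(n)]
    bar = [str(i) for i in range(n // 2, n + n // 2)]
    for func in (extend_unique_listcomp, extend_unique_set):
        t = timeit.timeit(lambda: func(list(foo), bar), number=3)
        print(f"{func.__name__}: {t:.4f}s")
```

If the original order does not matter, `list(set(foo) | set(bar))` is shorter. On Python 3.7+, `list(dict.fromkeys(foo + bar))` also removes duplicates while keeping first-seen order.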