Re: filter a list of strings

From Jussi Piitulainen <harvesting@is.invalid>
Newsgroups comp.lang.python
Subject Re: filter a list of strings
Date Thu, 03 Dec 2015 08:32:49 +0200
Organization A noiseless patient Spider
Lines 53
Message-ID <lf51tb459f2.fsf@ling.helsinki.fi>
References <mailman.155.1449122975.14615.python-list@python.org>


<c.buhtz@posteo.jp> writes:

> I would like to know how this could be done more elegant/pythonic.
>
> I have a big list (over 10.000 items) with strings (each 100 to 300
> chars long) and want to filter them.
>
> list = .....
>
> for item in list[:]:
>   if 'Banana' in item:
>      list.remove(item)
>   if 'Car' in item:
>      list.remove(item)
>
> There are a lot more conditions of course. This is just example
> code.  It doesn't look nice to me. Too much redundancy.

Yes. The initial copy is redundant and the repeated .remove calls are
not only expensive (quadratic time loop that could have been linear),
they are also incorrect if there are duplicates in the list. You want to
copy and filter in one go:

list = ...
list = [ item for item in list
         if ( 'Banana' not in item and
              'Car' not in item ) ]
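
A small sketch of one way the original loop can misbehave, using made-up
sample data: a string that contains both substrings has .remove called
on it twice, and the second call raises ValueError because the item is
already gone. The function names here are illustrative only and do not
come from the question.

```python
def remove_in_loop(strings):
    """The loop from the question, run on a private copy of the input."""
    result = list(strings)
    for item in result[:]:
        if 'Banana' in item:
            result.remove(item)
        if 'Car' in item:
            result.remove(item)   # fails if the item was just removed
    return result

def keep_clean(strings):
    """Copy and filter in one go, as in the comprehension above."""
    return [item for item in strings
            if 'Banana' not in item and 'Car' not in item]

sample = ['Apple', 'Banana Car', 'Pear']   # made-up data

try:
    remove_in_loop(sample)
    loop_outcome = 'ok'
except ValueError:
    loop_outcome = 'ValueError'

clean = keep_clean(sample)
```

The comprehension tests every item exactly once and never searches the
list again, so a string matching several conditions is simply left out.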

It's better to use another name, since "list" is the name of a built-in
type and rebinding it hides the built-in. It may be a good idea to
define a complex condition as a separate function:

def isbad(item):
    return ( 'Banana' in item or
             'Car' in item )

def isgood(item):
    return not isbad(item)

items = ...
items = [ item for item in items if isgood(item) ]
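
Since the question mentions many more conditions, here is one possible
way to keep isbad short, assuming every condition is a plain substring
test. The BAD_WORDS tuple and the sample strings are hypothetical.

```python
BAD_WORDS = ('Banana', 'Car')   # hypothetical; extend as needed

def isbad(item):
    # True as soon as any unwanted substring is found in the item.
    return any(word in item for word in BAD_WORDS)

def isgood(item):
    return not isbad(item)

items = ['Apple pie', 'Banana split', 'Car wash', 'Pear tart']
items = [item for item in items if isgood(item)]
```

Adding a condition then means adding one string to the tuple rather
than another line to the function.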

Then there's also filter, which is easy to use now that the condition is
already a named function:

items = list(filter(isgood, items))
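
If only isbad is defined, the standard library's itertools.filterfalse
keeps the items for which the predicate is false, so the separate
isgood wrapper is not needed. A minimal sketch with made-up data; it
assumes the name "list" has not been rebound, as discussed above.

```python
from itertools import filterfalse

def isbad(item):
    return 'Banana' in item or 'Car' in item

items = ['Apple pie', 'Banana split', 'Car wash', 'Pear tart']

# filterfalse yields the items where isbad(item) is false.
items = list(filterfalse(isbad, items))
```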

> btw: Is it correct to iterate over a copy (list[:]) of that string
> list and not the original one?

I think it's a good idea to iterate over a copy if you are modifying the
original during the iteration, but the above suggestions are better for
other reasons.
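
To illustrate why the copy matters when removing during iteration, a
short sketch with made-up data: removing from the list being iterated
shifts the remaining items left, so the element right after a removed
one is skipped.

```python
# Iterating over the list itself while removing from it.
words = ['Banana', 'Banana bread', 'Apple']
for word in words:
    if 'Banana' in word:
        words.remove(word)
skipped = words            # 'Banana bread' was never examined

# Iterating over a copy while removing from the original.
words = ['Banana', 'Banana bread', 'Apple']
for word in words[:]:
    if 'Banana' in word:
        words.remove(word)
with_copy = words
```

Here skipped still contains 'Banana bread', while with_copy contains
only 'Apple'.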


