| Started by | <c.buhtz@posteo.jp> |
|---|---|
| First post | 2015-12-03 02:15 +0100 |
| Last post | 2015-12-03 13:17 +0100 |
| Articles | 13 — 10 participants |
filter a list of strings <c.buhtz@posteo.jp> - 2015-12-03 02:15 +0100
Re: filter a list of strings Jussi Piitulainen <harvesting@is.invalid> - 2015-12-03 08:32 +0200
Re: filter a list of strings <c.buhtz@posteo.jp> - 2015-12-03 10:27 +0100
Re: filter a list of strings Jussi Piitulainen <harvesting@is.invalid> - 2015-12-03 13:53 +0200
Re: filter a list of strings Peter Pearson <pkpearson@nowhere.invalid> - 2015-12-05 19:42 +0000
Re: filter a list of strings Chris Angelico <rosuav@gmail.com> - 2015-12-03 20:40 +1100
Re: filter a list of strings Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2015-12-03 10:46 +0100
Re: filter a list of strings Laura Creighton <lac@openend.se> - 2015-12-03 10:53 +0100
Re: filter a list of strings jmp <jeanmichel@sequans.com> - 2015-12-03 11:03 +0100
Re: filter a list of strings Peter Otten <__peter__@web.de> - 2015-12-03 11:13 +0100
Re: filter a list of strings Denis McMahon <denismfmcmahon@gmail.com> - 2015-12-03 14:16 +0000
Re: filter a list of strings Jussi Piitulainen <harvesting@is.invalid> - 2015-12-03 17:02 +0200
Re: filter a list of strings Grobu <snailcoder@retrosite.invalid> - 2015-12-03 13:17 +0100
| From | <c.buhtz@posteo.jp> |
|---|---|
| Date | 2015-12-03 02:15 +0100 |
| Subject | filter a list of strings |
| Message-ID | <mailman.155.1449122975.14615.python-list@python.org> |
I would like to know how this could be done in a more elegant/pythonic way.

I have a big list (over 10,000 items) of strings (each 100 to 300
chars long) and want to filter them.

    list = .....

    for item in list[:]:
        if 'Banana' in item:
            list.remove(item)
        if 'Car' in item:
            list.remove(item)

There are a lot more conditions of course. This is just example code.
It doesn't look nice to me. Too much redundancy.

btw: Is it correct to iterate over a copy (list[:]) of that string list
and not the original one?
--
GnuPGP-Key ID 0751A8EC
| From | Jussi Piitulainen <harvesting@is.invalid> |
|---|---|
| Date | 2015-12-03 08:32 +0200 |
| Message-ID | <lf51tb459f2.fsf@ling.helsinki.fi> |
| In reply to | #99930 |
<c.buhtz@posteo.jp> writes:
> I would like to know how this could be done in a more elegant/pythonic way.
>
> I have a big list (over 10,000 items) of strings (each 100 to 300
> chars long) and want to filter them.
>
>   list = .....
>
>   for item in list[:]:
>       if 'Banana' in item:
>           list.remove(item)
>       if 'Car' in item:
>           list.remove(item)
>
> There are a lot more conditions of course. This is just example
> code. It doesn't look nice to me. Too much redundancy.
Yes. The initial copy is redundant and the repeated .remove calls are
not only expensive (quadratic time loop that could have been linear),
they are also incorrect if there are duplicates in the list. You want to
copy and filter in one go:
    list = ...
    list = [ item for item in list
             if ( 'Banana' not in item and
                  'Car' not in item ) ]
It's better to use another name, since "list" is the name of a built-in
function. It may be a good idea to define a complex condition as a
separate function:
    def isbad(item):
        return ( 'Banana' in item or
                 'Car' in item )

    def isgood(item):
        return not isbad(item)

    items = ...
    items = [ item for item in items if isgood(item) ]
Then there's also filter, which is easy to use now that the condition is
already a named function:
    items = list(filter(isgood, items))
> btw: Is it correct to iterate over a copy (list[:]) of that string
> list and not the original one?
I think it's a good idea to iterate over a copy if you are modifying the
original during the iteration, but the above suggestions are better for
other reasons.
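
A minimal sketch of the points above, using made-up sample data. The function names `remove_in_place` and `filter_copy` are hypothetical and only serve the illustration. It shows one way the repeated `.remove` calls can go wrong (an item that matches both conditions is removed twice) and the single-pass alternative:

```python
def remove_in_place(items):
    """The original approach: iterate over a copy, remove from the original."""
    for item in items[:]:
        if 'Banana' in item:
            items.remove(item)
        if 'Car' in item:
            items.remove(item)  # raises ValueError if the item is already gone
    return items


def filter_copy(items):
    """Copy and filter in one pass; no .remove calls are needed."""
    return [item for item in items
            if 'Banana' not in item and 'Car' not in item]


sample = ['Apple', 'Banana Car', 'Cherry']   # made-up test data

try:
    remove_in_place(list(sample))
    outcome = 'ok'
except ValueError:
    outcome = 'ValueError'

print(outcome)              # ValueError
print(filter_copy(sample))  # ['Apple', 'Cherry']
```

Each `.remove` call also scans the list from the start, which is where the quadratic running time comes from.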
| From | <c.buhtz@posteo.jp> |
|---|---|
| Date | 2015-12-03 10:27 +0100 |
| Message-ID | <mailman.165.1449134847.14615.python-list@python.org> |
| In reply to | #99934 |
Thank you for your suggestion. This will help a lot.
On 2015-12-03 08:32 Jussi Piitulainen <harvesting@is.invalid> wrote:
> list = [ item for item in list
>          if ( 'Banana' not in item and
>               'Car' not in item ) ]

I have often seen constructions like this

    x for x in y if ...

But I don't understand that combination of the Python keywords (for,
in, if) I already know. It is too complex for me to imagine what
really happens there.

I understand this

    for x in y:
        if ...

But what about the 'x' in front of all that?
--
GnuPGP-Key ID 0751A8EC
| From | Jussi Piitulainen <harvesting@is.invalid> |
|---|---|
| Date | 2015-12-03 13:53 +0200 |
| Message-ID | <lf5twnzsq8m.fsf@ling.helsinki.fi> |
| In reply to | #99946 |
<c.buhtz@posteo.jp> writes:
> Thank you for your suggestion. This will help a lot.
>
> On 2015-12-03 08:32 Jussi Piitulainen wrote:
>> list = [ item for item in list
>>          if ( 'Banana' not in item and
>>               'Car' not in item ) ]
>
> I have often seen constructions like this
>   x for x in y if ...
> But I don't understand that combination of the Python keywords (for,
> in, if) I already know. It is too complex for me to imagine what
> really happens there.
Others have given the crucial search word, "list comprehension".
The brackets are part of the notation. Without brackets, or grouped in
parentheses, it would be a generator expression, whose value would yield
the items on demand. Curly braces would make it a set or dict
comprehension; the latter also uses a colon.
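
As a small illustration of these four notations, here is a sketch with made-up data (the variable names are only for this example):

```python
words = ['Apple', 'Banana split', 'Cherry', 'Apple']   # made-up data

as_list = [w for w in words if 'Banana' not in w]          # list comprehension
as_gen = (w for w in words if 'Banana' not in w)           # generator expression
as_set = {w for w in words if 'Banana' not in w}           # set comprehension
as_dict = {w: len(w) for w in words if 'Banana' not in w}  # dict comprehension

# The generator yields its items on demand and can be consumed only once.
from_gen = list(as_gen)

print(as_list == from_gen)             # True
print(as_set == {'Apple', 'Cherry'})   # True
print(as_dict)                         # {'Apple': 5, 'Cherry': 6}
```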
> I understand this
>   for x in y:
>       if ...
>
> But what about the 'x' in front of all that?
You can understand the notation as collecting the values from nested
for-loops and conditions, just like you are attempting here, together
with a fresh list that will be the result. The "x" in front can be any
expression involving the loop variables; it corresponds to a
result.append(x) inside the nested loops and conditions. Roughly:
    result = []
    for x in xs:
        for y in ys:
            if x != y:
                result.append((x,y))
    ==>
    result = [(x,y) for x in xs for y in ys if x != y]
On python.org, this information seems to be in the tutorial but not in
the language reference.
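
The correspondence can be checked directly. This is a small sketch with assumed sample values for `xs` and `ys`:

```python
xs = [1, 2, 3]   # sample values, chosen only for illustration
ys = [2, 3]

# Explicit nested loops collecting into a fresh list.
result_loops = []
for x in xs:
    for y in ys:
        if x != y:
            result_loops.append((x, y))

# The same thing as a list comprehension: the clauses keep their order.
result_comp = [(x, y) for x in xs for y in ys if x != y]

print(result_comp)                   # [(1, 2), (1, 3), (2, 3), (3, 2)]
print(result_loops == result_comp)   # True
```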
| From | Peter Pearson <pkpearson@nowhere.invalid> |
|---|---|
| Date | 2015-12-05 19:42 +0000 |
| Message-ID | <dcgt0cF9d3fU2@mid.individual.net> |
| In reply to | #99946 |
On Thu, 3 Dec 2015 10:27:19 +0100, <c.buhtz@posteo.jp> wrote:
[snip]
> I have often seen constructions like this
>   x for x in y if ...
> But I don't understand that combination of the Python keywords (for,
> in, if) I already know. It is too complex for me to imagine what
> really happens there.

Don't give up! List comprehensions are one of the coolest things
in Python. Maybe this simple example will make it click for you:

    >>> [x**2 for x in [1,2,3,4] if x != 2]
    [1, 9, 16]

--
To email me, substitute nowhere->runbox, invalid->com.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-12-03 20:40 +1100 |
| Message-ID | <mailman.166.1449135660.14615.python-list@python.org> |
| In reply to | #99934 |
On Thu, Dec 3, 2015 at 8:27 PM, <c.buhtz@posteo.jp> wrote:
> Thank you for your suggestion. This will help a lot.
>
> On 2015-12-03 08:32 Jussi Piitulainen <harvesting@is.invalid> wrote:
>> list = [ item for item in list
>>          if ( 'Banana' not in item and
>>               'Car' not in item ) ]
>
> I have often seen constructions like this
>   x for x in y if ...
> But I don't understand that combination of the Python keywords (for,
> in, if) I already know. It is too complex for me to imagine what
> really happens there.
>
> I understand this
>   for x in y:
>       if ...
>
> But what about the 'x' in front of all that?
It's called a *list comprehension*. The code Jussi posted is broadly
equivalent to this:
    new_list = []
    for item in list:
        if ( 'Banana' not in item and
             'Car' not in item ):
            new_list.append(item)
    list = new_list
I recently came across this blog post, which visualizes comprehensions
fairly well.
http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/
The bit at the beginning (before the first 'for') goes inside a
list.append(...) call, and then everything else is basically the same.
ChrisA
| From | Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> |
|---|---|
| Date | 2015-12-03 10:46 +0100 |
| Message-ID | <mailman.167.1449136035.14615.python-list@python.org> |
| In reply to | #99934 |
On 03.12.2015 10:27, c.buhtz@posteo.jp wrote:
>
> I have often seen constructions like this
>   x for x in y if ...
> But I don't understand that combination of the Python keywords (for,
> in, if) I already know. It is too complex for me to imagine what
> really happens there.
>
> I understand this
>   for x in y:
>       if ...
>
> But what about the 'x' in front of all that?
>
The leading x states which value you want to put in the new list. This
may seem obvious in the simple case, but quite often it's not the
original x's found in y that you want to store, but some
transformation of them, e.g.:

    squares = [x**2 for x in y]

is equivalent to:

    squares = []
    for x in y:
        squares.append(x**2)
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-12-03 10:53 +0100 |
| Message-ID | <mailman.168.1449136443.14615.python-list@python.org> |
| In reply to | #99934 |
In a message of Thu, 03 Dec 2015 10:27:19 +0100, c.buhtz@posteo.jp writes:
>Thank you for your suggestion. This will help a lot.
>
>On 2015-12-03 08:32 Jussi Piitulainen <harvesting@is.invalid> wrote:
>> list = [ item for item in list
>>          if ( 'Banana' not in item and
>>               'Car' not in item ) ]
>
>I have often seen constructions like this
>  x for x in y if ...
>But I don't understand that combination of the Python keywords (for,
>in, if) I already know. It is too complex for me to imagine what
>really happens there.
>
>I understand this
>  for x in y:
>      if ...
>
>But what about the 'x' in front of all that?
This is a list comprehension.
see: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
But I would solve your problem like this:
    things_I_do_not_want = ['Car', 'Banana', <add all of them here>]
    things_I_want = []

    for item in list_of_everything_I_started_with:
        if item not in things_I_do_not_want:
            things_I_want.append(item)
Laura
| From | jmp <jeanmichel@sequans.com> |
|---|---|
| Date | 2015-12-03 11:03 +0100 |
| Message-ID | <mailman.169.1449137041.14615.python-list@python.org> |
| In reply to | #99934 |
On 12/03/2015 10:27 AM, c.buhtz@posteo.jp wrote:
> I have often seen constructions like this
>   x for x in y if ...
> But I don't understand that combination of the Python keywords (for,
> in, if) I already know. It is too complex for me to imagine what
> really happens there.
>
> I understand this
>   for x in y:
>       if ...
>
> But what about the 'x' in front of all that?
>

I'd advise you to insist on understanding this construct, as it is a very
common (and useful) construct in Python. It's a list comprehension; you
can google it to get some clues about it.

Consider this example:

    [2*i for i in [0,1,2,3,4] if i%2] == [2,6]

You can split it into 3 parts:

1/ for i in [0,1,2,3,4]
2/ if i%2
3/ 2*i

1/ I'm assuming you understand this one
2/ this is the filter part
3/ this is the mapping part; it applies a function to each element

To go back to your question "what about the 'x' in front of all
that": the x is the mapping part, but the function applied is the
identity function, which simply keeps the element as is.

    # map each element, no filter
    [2*i for i in [0,1,2,3,4]] == [0, 2, 4, 6, 8]

    # no mapping, keeping only odd elements
    [i for i in [0,1,2,3,4] if i%2] == [1,3]

JM
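
To make the "filter part" and "mapping part" concrete, here is a sketch that spells out the same example with the built-in `filter()` and `map()`. This is only one way to read the comprehension, not necessarily how anyone would write it in practice:

```python
numbers = [0, 1, 2, 3, 4]

# Comprehension: mapping part (2*i), loop, then filter part (i % 2).
comp = [2 * i for i in numbers if i % 2]

# The same steps with built-ins: filter first, then map.
odd_only = filter(lambda i: i % 2, numbers)          # keeps 1 and 3
doubled = list(map(lambda i: 2 * i, odd_only))       # doubles each kept item

print(comp)             # [2, 6]
print(comp == doubled)  # True
```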
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-12-03 11:13 +0100 |
| Message-ID | <mailman.170.1449137650.14615.python-list@python.org> |
| In reply to | #99934 |
Laura Creighton wrote:
> In a message of Thu, 03 Dec 2015 10:27:19 +0100, c.buhtz@posteo.jp writes:
>>Thank you for your suggestion. This will help a lot.
>>
>>On 2015-12-03 08:32 Jussi Piitulainen <harvesting@is.invalid> wrote:
>>> list = [ item for item in list
>>>          if ( 'Banana' not in item and
>>>               'Car' not in item ) ]
>>
>>I have often seen constructions like this
>>  x for x in y if ...
>>But I don't understand that combination of the Python keywords (for,
>>in, if) I already know. It is too complex for me to imagine what
>>really happens there.
>>
>>I understand this
>>  for x in y:
>>      if ...
>>
>>But what about the 'x' in front of all that?
>
> This is a list comprehension.
> see:
> https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
>
> But I would solve your problem like this:
>
> things_I_do_not_want = ['Car', 'Banana', <add all of them here>]
> things_I_want = []
>
> for item in list_of_everything_I_started_with:
>     if item not in things_I_do_not_want:
>         things_I_want.append(item)
Note that unlike the original code your variant will not reject
"Blue Banana". If the OP wants to preserve the '"Banana" in item' test he
can use
    for item in list_of_everything_I_started_with:
        for unwanted in things_I_do_not_want:
            if unwanted in item:
                break
        else: # executed unless the for loop exits with break
            things_I_want.append(item)

or

    things_I_want = [
        item for item in list_of_everything_I_started_with
        if not any(unwanted in item for unwanted in things_I_do_not_want)
    ]
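
The difference between the exact-match test and the substring test shows up on a few made-up strings. The sample data and the names `everything`, `exact` and `substring` are assumptions for this sketch:

```python
things_I_do_not_want = ['Car', 'Banana']
everything = ['Blue Banana', 'Banana', 'Apple', 'Carpet']   # made-up data

# Exact match: only items equal to an unwanted string are rejected.
exact = [item for item in everything if item not in things_I_do_not_want]

# Substring match: items containing an unwanted string are rejected.
substring = [item for item in everything
             if not any(unwanted in item for unwanted in things_I_do_not_want)]

print(exact)      # ['Blue Banana', 'Apple', 'Carpet']
print(substring)  # ['Apple']
```

Note that the substring test also rejects 'Carpet', because it contains 'Car'; that matches the behavior of the original `'Car' in item` check.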
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-12-03 14:16 +0000 |
| Message-ID | <n3pis2$aa6$1@dont-email.me> |
| In reply to | #99934 |
On Thu, 03 Dec 2015 08:32:49 +0200, Jussi Piitulainen wrote:
> def isbad(item):
>     return ( 'Banana' in item or
>              'Car' in item )
>
> def isgood(item):
>     return not isbad(item)

    badthings = [ 'Banana', 'Car', ........]

    def isgood(item):
        for thing in badthings:
            if thing in item:
                return False
        return True
--
Denis McMahon, denismfmcmahon@gmail.com
| From | Jussi Piitulainen <harvesting@is.invalid> |
|---|---|
| Date | 2015-12-03 17:02 +0200 |
| Message-ID | <lf5oae7r2xc.fsf@ling.helsinki.fi> |
| In reply to | #99959 |
Denis McMahon writes:

> On Thu, 03 Dec 2015 08:32:49 +0200, Jussi Piitulainen wrote:
>
>> def isbad(item):
>>     return ( 'Banana' in item or
>>              'Car' in item )
>>
>> def isgood(item):
>>     return not isbad(item)
>
> badthings = [ 'Banana', 'Car', ........]
>
> def isgood(item):
>     for thing in badthings:
>         if thing in item:
>             return False
>     return True

As long as all conditions are of that shape.
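
If some conditions are not plain substring tests, one possible approach (a sketch, not something proposed in the thread) is to keep a list of predicate functions instead of a list of strings. The rules `is_too_short` and `starts_with_hash` below are hypothetical examples of differently shaped conditions:

```python
bad_substrings = ['Banana', 'Car']   # the substring-shaped conditions


# Hypothetical extra rules that are not substring tests.
def is_too_short(item):
    return len(item) < 3


def starts_with_hash(item):
    return item.startswith('#')


# Turn each substring into a predicate; s=s binds the current value.
rejecters = [lambda item, s=s: s in item for s in bad_substrings]
rejecters += [is_too_short, starts_with_hash]


def isgood(item):
    """An item is good if no rejecting predicate matches it."""
    return not any(reject(item) for reject in rejecters)


items = ['Apple pie', 'Blue Banana', '# comment', 'ab', 'Cherry']  # made-up data
good = [item for item in items if isgood(item)]
print(good)  # ['Apple pie', 'Cherry']
```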
| From | Grobu <snailcoder@retrosite.invalid> |
|---|---|
| Date | 2015-12-03 13:17 +0100 |
| Message-ID | <n3pbns$i4e$1@dont-email.me> |
| In reply to | #99930 |
On 03/12/15 02:15, c.buhtz@posteo.jp wrote:
> I would like to know how this could be done in a more elegant/pythonic way.
>
> I have a big list (over 10,000 items) of strings (each 100 to 300
> chars long) and want to filter them.
>
>   list = .....
>
>   for item in list[:]:
>       if 'Banana' in item:
>           list.remove(item)
>       if 'Car' in item:
>           list.remove(item)
>
> There are a lot more conditions of course. This is just example code.
> It doesn't look nice to me. Too much redundancy.
>
> btw: Is it correct to iterate over a copy (list[:]) of that string list
> and not the original one?
>

No idea how 'Pythonic' this would be considered, but you could use a
combination of filter() with a regular expression:

# ------------------------------------------------------------------
import re

list = ...

pattern = re.compile( r'banana|car', re.I )
filtered_list = filter( lambda line: not pattern.search(line), list )
# ------------------------------------------------------------------

HTH
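
A runnable variant of the regular-expression idea, as a sketch with made-up data. Two details are worth noting: the `re.I` flag makes the match case-insensitive, which differs from the original `'Banana' in item` test, and in Python 3 `filter()` returns a lazy iterator rather than a list. Building the pattern with `re.escape` is an addition here, not part of the post above:

```python
import re

unwanted = ['Banana', 'Car']   # substrings from the thread
items = ['Blue Banana', 'banana bread', 'Apple', 'Carpet']   # made-up data

# re.escape keeps any special characters in the substrings literal.
alternation = '|'.join(re.escape(s) for s in unwanted)

case_sensitive = re.compile(alternation)
case_insensitive = re.compile(alternation, re.I)

# Wrap filter() in list() to get an actual list in Python 3.
kept_sensitive = list(
    filter(lambda line: not case_sensitive.search(line), items))
kept_insensitive = list(
    filter(lambda line: not case_insensitive.search(line), items))

print(kept_sensitive)    # ['banana bread', 'Apple']
print(kept_insensitive)  # ['Apple']
```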