
zip as iterator and bad/good practices

Started by	Fabien <fabien.maussion@gmail.com>
First post	2015-06-12 17:00 +0200
Last post	2015-06-13 16:16 +0000
Articles	16 — 10 participants


  zip as iterator and bad/good practices Fabien <fabien.maussion@gmail.com> - 2015-06-12 17:00 +0200
    Re: zip as iterator and bad/good practices Fabien <fabien.maussion@gmail.com> - 2015-06-12 17:05 +0200
    Re: zip as iterator and bad/good practices Ian Kelly <ian.g.kelly@gmail.com> - 2015-06-12 09:26 -0600
      Re: zip as iterator and bad/good practices Fabien <fabien.maussion@gmail.com> - 2015-06-12 17:34 +0200
      Re: zip as iterator and bad/good practices Fabien <fabien.maussion@gmail.com> - 2015-06-12 17:59 +0200
    Re: zip as iterator and bad/good practices Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-06-12 17:22 +0100
    Re: zip as iterator and bad/good practices Laura Creighton <lac@openend.se> - 2015-06-12 22:34 +0200
    Re: zip as iterator and bad/good practices Terry Reedy <tjreedy@udel.edu> - 2015-06-12 19:27 -0400
    Re: zip as iterator and bad/good practices Terry Reedy <tjreedy@udel.edu> - 2015-06-12 19:43 -0400
      Re: zip as iterator and bad/good practices sohcahtoa82@gmail.com - 2015-06-12 17:02 -0700
        Re: zip as iterator and bad/good practices Chris Angelico <rosuav@gmail.com> - 2015-06-13 10:26 +1000
          Re: zip as iterator and bad/good practices sohcahtoa82@gmail.com - 2015-06-12 17:39 -0700
    Re: zip as iterator and bad/good practices jimages <jimages123@gmail.com> - 2015-06-13 13:32 +0800
      Re: zip as iterator and bad/good practices Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-13 07:17 +0000
        Re: zip as iterator and bad/good practices Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-06-13 13:48 +0100
          Re: zip as iterator and bad/good practices Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-06-13 16:16 +0000

#92530 — zip as iterator and bad/good practices

From	Fabien <fabien.maussion@gmail.com>
Date	2015-06-12 17:00 +0200
Subject	zip as iterator and bad/good practices
Message-ID	<mles66$sk2$1@speranza.aioe.org>

Folks,

I am developing a program which I'd like to be python 2 and 3 
compatible. I am still relatively new to python and I use primarily py3 
for development. Every once in a while I use a py2 interpreter to see if 
my tests pass through.

I just spent several hours tracking down a bug which was related to the 
fact that zip is an iterator in py3 but not in py2. Of course I did not 
know about that difference. I've found the izip() function which should 
do what I want, but that awful bug made me wonder: is it a bad practice 
to interactively modify the list you are iterating over?

I am computing mass fluxes along glacier branches ordered by 
hydrological order, i.e. branch i is guaranteed to flow into a branch 
later in that list. Branches are objects which have a pointer to the 
object they are flowing into.

In pseudo code:

for stuff, branch in zip(stuffs, branches):
	# compute flux
	...
	# add to the downstream branch
	id_branch = branches.index(branch.flows_to)
	branches[id_branch].property.append(stuff_i_computed)

So, all downstream branches in python2 were missing information from 
their tributaries. It is quite dangerous code, but I can't find a more 
elegant solution.
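A minimal Python 3 sketch of how the two zip() behaviours can diverge. It assumes, hypothetically, that the loop *replaces* not-yet-visited items of the first list (rather than mutating them in place); Python 2's eager zip() is emulated with list(zip(...)), which builds every tuple before the loop starts. The function name and data are illustrative only.

```python
def values_seen(stuffs, branches, eager):
    """Return the 'stuff' values the loop body actually receives.

    eager=True emulates Python 2's zip(), which builds a list of tuples up
    front; eager=False uses Python 3's lazy zip() iterator.
    """
    stuffs = list(stuffs)  # private copy so the caller's data is untouched
    pairs = list(zip(stuffs, branches)) if eager else zip(stuffs, branches)
    seen = []
    for i, (stuff, _branch) in enumerate(pairs):
        seen.append(stuff)
        if i + 1 < len(stuffs):
            # Hypothetical update: replace the *next* item while looping.
            stuffs[i + 1] = stuffs[i + 1] + stuff
    return seen


print(values_seen([1, 1, 1], "abc", eager=False))  # [1, 2, 3]
print(values_seen([1, 1, 1], "abc", eager=True))   # [1, 1, 1]
```

The lazy iterator picks up replacements made to items it has not reached yet, while the eager version only ever sees its snapshot, so code that depends on either behaviour acts differently on Python 2 and 3 unless it imports the iterator version explicitly.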

Thanks!

Fabien


#92531

From	Fabien <fabien.maussion@gmail.com>
Date	2015-06-12 17:05 +0200
Message-ID	<mlesfd$sk2$2@speranza.aioe.org>
In reply to	#92530

On 06/12/2015 05:00 PM, Fabien wrote:
> I've found the izip() function which should do what I want

I've just come across a Stack Overflow post where they recommend:

from future_builtins import zip

which is OK since I don't want to support versions <= 2.6


#92532

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2015-06-12 09:26 -0600
Message-ID	<mailman.428.1434122838.13271.python-list@python.org>
In reply to	#92530

On Fri, Jun 12, 2015 at 9:00 AM, Fabien <fabien.maussion@gmail.com> wrote:
> Folks,
>
> I am developing a program which I'd like to be python 2 and 3 compatible. I
> am still relatively new to python and I use primarily py3 for development.
> Every once in a while I use a py2 interpreter to see if my tests pass
> through.
>
> I just spent several hours tracking down a bug which was related to the fact
> that zip is an iterator in py3 but not in py2. Of course I did not know
> about that difference. I've found the izip() function which should do what I
> want

If you're supporting both 2 and 3, you may want to look into using the
third-party "six" library, which provides utilities for writing
cross-compatible code.  Using the correct zip() function with six is
just:

    from six.moves import zip

> but that awful bug made me wonder: is it a bad practice to
> interactively modify the list you are iterating over?

Generally speaking, yes, it's bad practice to add or remove items
because this may result in items being visited more than once or not
at all. Modifying or replacing items however is usually not an issue.
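To illustrate the safe case Ian mentions, here is a small sketch (the helper name is illustrative): replacing items by index while iterating with enumerate() never changes the list's length, so every item is still visited exactly once.

```python
def scale_in_place(values, factor):
    """Replace each item while iterating; the list length never changes."""
    for i, v in enumerate(values):
        values[i] = v * factor  # replacement, not insertion or removal
    return values


print(scale_in_place([1, 2, 3], 10))  # [10, 20, 30]
```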

> I am computing mass fluxes along glacier branches ordered by hydrological
> order, i.e. branch i is guaranteed to flow in a branch later in that list.
> Branches are objects which have a pointer to the object they are flowing
> into.
>
> In pseudo code:
>
> for stuff, branch in zip(stuffs, branches):
>         # compute flux
>         ...
>         # add to the downstream branch
>         id_branch = branches.index(branch.flows_to)
>         branches[id_branch].property.append(stuff_i_computed)

Er, I don't see the problem here. The branch object in the zip list
and the branch object in branches should be the *same* object, so the
downstream branch update should be reflected when you visit it later
in the iteration, regardless of whether zip returns a list or an iterator.

Tangentially, unless you're using id_branch for something else that
isn't shown here, is it really necessary to search the list for the
downstream branch when it looks like you already have a reference to
it? Could the above simply be replaced with:

    branch.flows_to.property.append(stuff_i_computed)
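A runnable sketch of Ian's suggestion, under loudly stated assumptions: the Branch class, its inflow attribute, the accumulate() helper and the flux numbers are all hypothetical stand-ins, and the real flux computation (elided in Fabien's post) is reduced to "own flux plus everything received from upstream".

```python
class Branch:
    """Hypothetical stand-in for a glacier branch object."""

    def __init__(self, name, flows_to=None):
        self.name = name
        self.flows_to = flows_to  # downstream Branch, or None at the outlet
        self.inflow = []          # totals received from tributaries


def accumulate(branches, local_fluxes):
    """Route fluxes downstream; branches must be in hydrological order."""
    totals = {}
    for flux, branch in zip(local_fluxes, branches):
        total = flux + sum(branch.inflow)  # own flux plus everything upstream
        totals[branch.name] = total
        if branch.flows_to is not None:
            # Follow the reference directly instead of branches.index(...)
            branch.flows_to.inflow.append(total)
    return totals


outlet = Branch("outlet")
a = Branch("a", flows_to=outlet)
b = Branch("b", flows_to=outlet)
print(accumulate([a, b, outlet], [1.0, 2.0, 0.5]))
# {'a': 1.0, 'b': 2.0, 'outlet': 3.5}
```

Because the downstream branch is updated through the shared object, the result is the same whether zip() is lazy or eager: only the branch objects are mutated, never the lists being zipped.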


#92533

From	Fabien <fabien.maussion@gmail.com>
Date	2015-06-12 17:34 +0200
Message-ID	<mleu5a$1pf$1@speranza.aioe.org>
In reply to	#92532

On 06/12/2015 05:26 PM, Ian Kelly wrote:
>> for stuff, branch in zip(stuffs, branches):
>>         # compute flux
>>         ...
>>         # add to the downstream branch
>>         id_branch = branches.index(branch.flows_to)
>>         branches[id_branch].property.append(stuff_i_computed)
> Er, I don't see the problem here. The branch object in the zip list
> and the branch object in branches should be the *same* object, so the
> downstream branch update should be reflected when you visit it later
> in the iteration, regardless of whether zip returns a list or an iterator.
>
> Tangentially, unless you're using id_branch for something else that
> isn't shown here, is it really necessary to search the list for the
> downstream branch when it looks like you already have a reference to
> it? Could the above simply be replaced with:
>
>      branch.flows_to.property.append(stuff_i_computed)

Thanks a lot for your careful reading! I oversimplified my example, 
and indeed this line works fine. I was also adding things to "stuffs", 
which is a list of lists... Sorry for the confusion!


#92535

From	Fabien <fabien.maussion@gmail.com>
Date	2015-06-12 17:59 +0200
Message-ID	<mlevkm$4nt$1@speranza.aioe.org>
In reply to	#92532

On 06/12/2015 05:26 PM, Ian Kelly wrote:
>> but that awful bug made me wonder: is it a bad practice to
>> interactively modify the list you are iterating over?
> Generally speaking, yes, it's bad practice to add or remove items
> because this may result in items being visited more than once or not
> at all. Modifying or replacing items however is usually not an issue.
>

Thanks. In that case I was modifying items and needed them to be updated 
during the loop. I kept the solution as is and my tests pass in 2 and 3.

I will consider using six. Currently all my modules begin with:

from __future__ import division
try:
    from itertools import izip as zip
except ImportError:
    pass

Which might even become longer if I find other bugs ;-)

Fabien


#92539

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-06-12 17:22 +0100
Message-ID	<mailman.430.1434126164.13271.python-list@python.org>
In reply to	#92530

On 12/06/2015 16:00, Fabien wrote:
> Folks,
>
> I am developing a program which I'd like to be python 2 and 3
> compatible. I am still relatively new to python and I use primarily py3
> for development. Every once in a while I use a py2 interpreter to see if
> my tests pass through.
>
> I just spent several hours tracking down a bug which was related to the
> fact that zip is an iterator in py3 but not in py2. Of course I did not
> know about that difference. I've found the izip() function which should
> do what I want, but that awful bug made me wonder: is it a bad practice
> to interactively modify the list you are iterating over?
>
> I am computing mass fluxes along glacier branches ordered by
> hydrological order, i.e. branch i is guaranteed to flow in a branch
> later in that list. Branches are objects which have a pointer to the
> object they are flowing into.
>
> In pseudo code:
>
> for stuff, branch in zip(stuffs, branches):
>      # compute flux
>      ...
>      # add to the downstream branch
>      id_branch = branches.index(branch.flows_to)
>      branches[id_branch].property.append(stuff_i_computed)
>
> So, all downstream branches in python2 where missing information from
> their tributaries. It is quite a dangerous code but I can't find a more
> elegant solution.
>
> Thanks!
>
> Fabien
>

Start here https://docs.python.org/3/howto/pyporting.html

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


#92552

From	Laura Creighton <lac@openend.se>
Date	2015-06-12 22:34 +0200
Message-ID	<mailman.439.1434141281.13271.python-list@python.org>
In reply to	#92530

The real problem is removing things from lists when you are iterating
over them, not adding things to the end of lists.

Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> mylist = [1,2,3]
>>> for i in mylist:
...     print i
...     mylist.remove(i)
...
1
3
>>> mylist
[2]

Most people expect 1 2 and 3 to get printed, and mylist to be empty at
the end of this loop.
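The same effect in Python 3, as a small sketch (helper names are illustrative): a list iterator keeps an index, so removing the current item shifts the next one into the slot that was just visited, and the iterator skips it. Iterating over a slice copy avoids this.

```python
def remove_while_iterating(items):
    """Remove each item during the loop; returns (items printed, list left)."""
    printed = []
    for x in items:
        printed.append(x)
        items.remove(x)  # shifts later items left, so the next one is skipped
    return printed, items


def remove_over_copy(items):
    """Same loop, but iterating over a slice copy of the list."""
    printed = []
    for x in items[:]:   # the copy is unaffected by removals from items
        printed.append(x)
        items.remove(x)
    return printed, items


print(remove_while_iterating([1, 2, 3]))  # ([1, 3], [2])
print(remove_over_copy([1, 2, 3]))        # ([1, 2, 3], [])
```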

Laura


#92566

From	Terry Reedy <tjreedy@udel.edu>
Date	2015-06-12 19:27 -0400
Message-ID	<mailman.446.1434151715.13271.python-list@python.org>
In reply to	#92530

On 6/12/2015 11:00 AM, Fabien wrote:
> is it a bad practice
> to interactively modify the list you are iterating over?

One needs care.  Appending to the end of the list is OK, unless you 
append a billion items or so ;-)  Appending to the end of a queue while 
*removing* items from the front of the queue, where the queue resizes 
itself at the front as needed, is standard for breadth-first search.  A 
collections.deque can be used for this.  Depth-first search appends to and 
deletes from the end (or top) of a stack, but this is NOT 
forward-iteration as implemented by Python iterators.
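A sketch of the two patterns Terry describes. Items are added and removed inside a while loop rather than a for loop, so no iterator is ever invalidated. The adjacency-dict graph format and the function names are assumptions for illustration.

```python
from collections import deque


def bfs_order(graph, start):
    """Breadth-first: append at the back of a deque, pop from the front."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs_order(graph, start):
    """Depth-first: push onto and pop from the end of a list used as a stack."""
    seen = set()
    stack = [start]
    order = []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first neighbour is popped (visited) first.
        stack.extend(reversed(graph.get(node, ())))
    return order


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_order(graph, "a"))  # ['a', 'b', 'c', 'd']
print(dfs_order(graph, "a"))  # ['a', 'b', 'd', 'c']
```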

-- 
Terry Jan Reedy


#92567

From	Terry Reedy <tjreedy@udel.edu>
Date	2015-06-12 19:43 -0400
Message-ID	<mailman.447.1434152635.13271.python-list@python.org>
In reply to	#92530

On 6/12/2015 4:34 PM, Laura Creighton wrote:
> The real problem is removing things from lists when you are iterating
> over them, not adding things to the end of lists.

One needs to iterate backwards.

>>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>>> for i in range(len(ints)-1, -1, -1):
...     if ints[i] % 2:
...         del ints[i]
...
>>> ints
[0, 2, 2, 4, 6]

But using a list comp and, if necessary, copying the result back into 
the original list is much easier.

>>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>>> ints[:] = [i for i in ints if not i % 2]
>>> ints
[0, 2, 2, 4, 6]


-- 
Terry Jan Reedy


#92569

From	sohcahtoa82@gmail.com
Date	2015-06-12 17:02 -0700
Message-ID	<e5d750bd-d7e3-4d93-9794-e9f16a4b40bd@googlegroups.com>
In reply to	#92567

On Friday, June 12, 2015 at 4:44:08 PM UTC-7, Terry Reedy wrote:
> On 6/12/2015 4:34 PM, Laura Creighton wrote:
> > The real problem is removing things from lists when you are iterating
> > over them, not adding things to the end of lists.
> 
> One needs to iterate backwards.
> 
>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
> 
>  >>> for i in range(len(ints)-1, -1, -1):
> 	if ints[i] % 2:
> 		del ints[i]
> 	
>  >>> ints
> [0, 2, 2, 4, 6]
> 
> But using a list comp and, if necessary, copying the result back into 
> the original list is much easier.
> 
>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>  >>> ints[:] = [i for i in ints if not i % 2]
>  >>> ints
> [0, 2, 2, 4, 6]
> 
> 
> -- 
> Terry Jan Reedy

On the second line of your final solution, is there any reason you're using `ints[:]` rather than just `ints`?


#92572

From	Chris Angelico <rosuav@gmail.com>
Date	2015-06-13 10:26 +1000
Message-ID	<mailman.450.1434155212.13271.python-list@python.org>
In reply to	#92569

On Sat, Jun 13, 2015 at 10:02 AM,  <sohcahtoa82@gmail.com> wrote:
>>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
>>  >>> ints[:] = [i for i in ints if not i % 2]
>>  >>> ints
>> [0, 2, 2, 4, 6]
>>
>>
>> --
>> Terry Jan Reedy
>
> On the second line of your final solution, is there any reason you're using `ints[:]` rather than just `ints`?

If you use "ints = [...]", it rebinds the name ints to the new list.
If you use "ints[:] = [...]", it replaces the entire contents of the
list with the new list. The two are fairly similar if there are no
other references to that list, but the replacement matches the
mutation behaviour of remove().

def just_some(ints):
    ints[:] = [i for i in ints if not i % 2]

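A small sketch of the difference Chris describes (function and variable names are illustrative). Rebinding inside a function only changes the local name, so the caller's list is untouched. Slice assignment changes the list object itself, so every reference to it, including the caller's, sees the result.

```python
def drop_odds_rebind(ints):
    ints = [i for i in ints if not i % 2]  # rebinds the local name only


def drop_odds_in_place(ints):
    ints[:] = [i for i in ints if not i % 2]  # replaces the shared list's contents


data = [0, 1, 2, 3]
alias = data
drop_odds_rebind(data)
print(data)                  # [0, 1, 2, 3]
drop_odds_in_place(data)
print(data)                  # [0, 2]
print(alias is data, alias)  # True [0, 2]
```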
ChrisA


#92574

From	sohcahtoa82@gmail.com
Date	2015-06-12 17:39 -0700
Message-ID	<df28e29a-cbc9-48b9-bf30-8b1f311848c2@googlegroups.com>
In reply to	#92572

On Friday, June 12, 2015 at 5:27:21 PM UTC-7, Chris Angelico wrote:
> On Sat, Jun 13, 2015 at 10:02 AM,  <sohcahtoa82@gmail.com> wrote:
> >>  >>> ints = [0, 1, 2, 2, 1, 4, 6, 5, 5]
> >>  >>> ints[:] = [i for i in ints if not i % 2]
> >>  >>> ints
> >> [0, 2, 2, 4, 6]
> >>
> >>
> >> --
> >> Terry Jan Reedy
> >
> > On the second line of your final solution, is there any reason you're using `ints[:]` rather than just `ints`?
> 
> If you use "ints = [...]", it rebinds the name ints to the new list.
> If you use "ints[:] = [...]", it replaces the entire contents of the
> list with the new list. The two are fairly similar if there are no
> other references to that list, but the replacement matches the
> mutation behaviour of remove().
> 
> def just_some(ints):
>     ints[:] = [i for i in ints if not i % 2]
> 
> ChrisA

Ah that makes sense.  Thanks.


#92582

From	jimages <jimages123@gmail.com>
Date	2015-06-13 13:32 +0800
Message-ID	<mailman.452.1434175078.13271.python-list@python.org>
In reply to	#92530


> On Jun 12, 2015, at 11:00 PM, Fabien <fabien.maussion@gmail.com> wrote:
> but that awful bug made me wonder: is it a bad practice to interactively modify the list you are iterating over?
Yes.
I am a newbie. I was also confused when I read the tutorial. It recommends making a copy before looping. Then I tried this:
#--------------------------
Test = [1, 2]
for i in Test:
    Test.append(i)
#--------------------------
But when I executed it, the script did not end. I knew something must be wrong, so I launched a debugger and observed the list after each loop.
This is what I saw:
Loop 1: [ 1, 2, 1]
Loop 2: [ 1, 2, 1, 2]
Loop 3: [ 1, 2, 1, 2, 1]
Loop 4: [ 1, 2, 1, 2, 1, 2]
......
So you can see that loop will *never* end.
So I think you can regard 'i' as a pointer. After each pass through the loop, the pointer moves to the next element, but at the same time you are appending an element, so the pointer will *never* reach the last element.
How to solve it?
Change the code to this:
#--------------------------
Test = [1, 2]
for i in Test[:]:
    Test.append(i)
#--------------------------


#92583

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-06-13 07:17 +0000
Message-ID	<557bd903$0$11125$c3e8da3@news.astraweb.com>
In reply to	#92582

On Sat, 13 Jun 2015 13:32:59 +0800, jimages wrote:

> I am a newbie. I also have been confused when I read the tutorial. It
> recommends make a copy before looping. Then I try.
> #--------------------------
> Test = [1, 2]
> For i in Test:
>     Test.append(i)
> #--------------------------

You don't make a copy of Test here. You could try this instead:

Test = [1, 2]
copy_test = Test[:]  # [:] makes a slice copy of the whole list
for i in copy_test:  # iterate over the copy
    Test.append(i)  # and append to the original

print(Test)


But an easier way is:

Test = [1, 2]
Test.extend(Test)
print(Test)


> But when i execute. The script does not end. I know there must something
> wrong. So I launch debugger and deserve the list after each loop. And I
> see:
> Loop 1: [ 1, 2, 1]
> Loop 2: [ 1, 2, 1, 2]
> Loop 3: [ 1, 2, 1, 2, 1]
> Loop 4: [ 1, 2, 1, 2, 1, 2]
> ......
> So you can see that loop will *never* end. So I think you regard the 'i'
> as a pointer.

i is not a pointer. It is just a variable that gets a value from the 
list, the same as:

    # first time through the loop
    i = Test[0]
    # second time through the loop
    i = Test[1]  # the second item


The for loop statement:

    for item in seq: ...

understands sequences, lists, and other iterables, not "item". item is 
just an ordinary variable, nothing special about it. The for statement 
takes the items in seq, one at a time, and assigns them to the variable 
"item". In English:

    for each item in seq ...

or to put it another way:

    get the first item of seq
    assign it to "item"
    process the block
    get the second item of seq
    assign it to "item"
    process the block
    get the third item of seq
    assign it to "item"
    process the block
    ...

and so on, until seq runs out of items. But if you keep appending items 
to the end, it will never run out.
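Steven's step-by-step description of the for statement can be written out with iter() and next(). In this sketch the function name and the `cap` parameter are illustrative: the list stops growing at `cap` items so the demo terminates. Without the cap it would run forever, as jimages observed.

```python
def grow_while_iterating(data, cap):
    """Walk `data` the way a for loop does, appending each item as we go."""
    visited = []
    it = iter(data)           # what `for item in data:` does first
    while True:
        try:
            item = next(it)   # "get the next item of seq"
        except StopIteration:
            break             # seq ran out of items
        visited.append(item)  # "process the block"
        if len(data) < cap:   # cap the growth so the demo terminates
            data.append(item)
    return visited


print(grow_while_iterating([1, 2], 6))  # [1, 2, 1, 2, 1, 2]
```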


> Change code to this
> #--------------------------
> Test = [1, 2]
> For i in Test[:] :
>     Test.append(i)
> #--------------------------


Yes, this will work.


-- 
Steve


#92587

From	Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date	2015-06-13 13:48 +0100
Message-ID	<mailman.455.1434199754.13271.python-list@python.org>
In reply to	#92583

On 13 June 2015 at 08:17, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sat, 13 Jun 2015 13:32:59 +0800, jimages wrote:
>
>> I am a newbie. I also have been confused when I read the tutorial. It
>> recommends make a copy before looping. Then I try.
>> #--------------------------
>> Test = [1, 2]
>> For i in Test:
>>     Test.append(i)
>> #--------------------------
>
> You don't make a copy of Test here. You could try this instead:
>
> Test = [1, 2]
> copy_test = Test[:]  # [:] makes a slice copy of the whole list
> for i in copy_test:  # iterate over the copy
>     Test.append(i)  # and append to the original
>
> print(Test)
>
>
> But an easier way is:
>
> Test = [1, 2]
> Test.extend(Test)
> print(Test)

I can't see anything in the docs that specify the behaviour that
occurs here. If I change it to

    Test.extend(iter(Test))

then it borks my system in 1s after consuming 8GB of RAM (I recovered
with killall python in the tty).

According to the docs:
"""
list.extend(L)

Extend the list by appending all the items in the given list;
equivalent to a[len(a):] = L.
"""
https://docs.python.org/2/tutorial/datastructures.html#more-on-lists

The alternate form

    Test[len(Test):] = Test

is equivalent but

    Test[len(Test):] = iter(Test)

is not since it doesn't bork my system.

I looked here:
https://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
but I don't see anything that specifies how self-referential slice
assignment should behave.

I checked under pypy and all behaviour is the same but I'm not sure if
this shouldn't be considered implementation-defined or undefined
behaviour. It's not hard to see how a rearrangement of the list.extend
method would lead to a change of behaviour and I can't see that the
current behaviour is really guaranteed by the language and in fact
it's inconsistent with the docs for list.extend.

As an aside they say that pypy is fast but it took about 10 times
longer than cpython to bork my system. :)
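A sketch of the terminating cases Oscar compares, as observed on CPython 3. As he notes, the documentation does not clearly guarantee this behaviour, so treat it as implementation behaviour rather than a language rule. The non-terminating Test.extend(iter(Test)) is deliberately left out. Making an explicit snapshot first (list(c)) is the portable way to get the doubling without relying on implementation details.

```python
def self_extend_examples():
    a = [1, 2]
    a.extend(a)             # extend a list with itself: doubles it once
    b = [1, 2]
    b[len(b):] = iter(b)    # CPython materialises the iterable before assigning
    c = [1, 2]
    c.extend(list(c))       # portable: take an explicit snapshot, then extend
    return a, b, c
    # Not run here: c.extend(iter(c)) never terminates, because the
    # iterator keeps seeing the items that extend() appends.


print(self_extend_examples())  # ([1, 2, 1, 2], [1, 2, 1, 2], [1, 2, 1, 2])
```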

--
Oscar


#92592

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-06-13 16:16 +0000
Message-ID	<557c575d$0$11128$c3e8da3@news.astraweb.com>
In reply to	#92587

On Sat, 13 Jun 2015 13:48:45 +0100, Oscar Benjamin wrote:

> On 13 June 2015 at 08:17, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:

>> But an easier way is:
>>
>> Test = [1, 2]
>> Test.extend(Test)
>> print(Test)
> 
> I can't see anything in the docs that specify the behaviour that occurs
> here. 

Neither do I, but there is a test for it:

        a.extend(a)
        self.assertEqual(a, self.type2test([0, 0, 1, 0, 0, 1]))

https://hg.python.org/cpython/file/a985b6455fde/Lib/test/list_tests.py

> If I change it to
> 
>     Test.extend(iter(Test))
> 
> then it borks my system in 1s after consuming 8GB of RAM (I recovered
> with killall python in the tty).

The reason that fails should be obvious: as new items keep getting added 
to Test, the iterator likewise sees more items to iterate over. I don't 
know if this is documented, but you can see what happens here:

py> L = [10, 20]
py> it = iter(L)
py> L.append(next(it)); print L
[10, 20, 10]
py> L.append(next(it)); print L
[10, 20, 10, 20]
py> L.append(next(it)); print L
[10, 20, 10, 20, 10]
py> L.append(next(it)); print L
[10, 20, 10, 20, 10, 20]


So as Test.extend tries to iterate over iter(Test), it just keeps growing 
as more items are added to Test.


-- 
Steven D'Aprano
