Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #70639 > unrolled thread

Re: Proper deletion of selected items during map iteration in for loop: Thanks to all

Started byCharles Hixson <charleshixsn@earthlink.net>
First post2014-04-26 12:25 -0700
Last post2014-04-26 22:57 -0400
Articles 5 — 5 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Charles Hixson <charleshixsn@earthlink.net> - 2014-04-26 12:25 -0700
    Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-04-27 02:14 +0000
      Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Chris Angelico <rosuav@gmail.com> - 2014-04-27 12:49 +1000
        Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Duncan Booth <duncan.booth@invalid.invalid> - 2014-04-28 10:48 +0000
      Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Roy Smith <roy@panix.com> - 2014-04-26 22:57 -0400

#70639 — Re: Proper deletion of selected items during map iteration in for loop: Thanks to all

FromCharles Hixson <charleshixsn@earthlink.net>
Date2014-04-26 12:25 -0700
SubjectRe: Proper deletion of selected items during map iteration in for loop: Thanks to all
Message-ID<mailman.9524.1398540343.18130.python-list@python.org>
On 04/25/2014 10:53 AM, Charles Hixson wrote:
> What is the proper way to delete selected items during iteration of a 
> map?  What I want to do is:
>
> for (k, v) in m.items():
>    if f(k):
>       #  do some processing of v and save result elsewhere
>       del m[k]
>
> But this gives (as should be expected):
>         RuntimeError: dictionary changed size during iteration
> In the past I've accumulated the keys to be deleted in a separate 
> list, but this time there are likely to be a large number of them, so 
> is there some better way?
>

Going over the various responses, it looks like saving the "to be 
deleted" keys to a list, and then iterating over that to delete is the 
best answer.  I expect that I'll be deleting around 1/3 during each 
iteration of the process...and then adding new ones back in. There 
shouldn't be a really huge number of deletions on any particular pass, 
but it will be looped through many times...so if there were any better 
way to do this, it would speed things up considerably...but it's not 
really surprising that there doesn't appear to be.  So now it translates 
into (approximately, not yet tested):

toDel = []
for (k, v) in m.items():
    if f(k):
       #  do some processing of v and save result elsewhere
       toDel.append(k)
    else:
       # do something else
for k in toDel:
    del m[k]
toDel = None


-- 
Charles Hixson

[toc] | [next] | [standalone]


#70642

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-04-27 02:14 +0000
Message-ID<535c67e9$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to#70639
On Sat, 26 Apr 2014 12:25:27 -0700, Charles Hixson wrote:

> On 04/25/2014 10:53 AM, Charles Hixson wrote:
>> What is the proper way to delete selected items during iteration of a
>> map?  What I want to do is:
>>
>> for (k, v) in m.items():
>>    if f(k):
>>       #  do some processing of v and save result elsewhere 
>>       del m[k]
>>
>> But this gives (as should be expected):
>>         RuntimeError: dictionary changed size during iteration
>> In the past I've accumulated the keys to be deleted in a separate list,
>> but this time there are likely to be a large number of them, so is
>> there some better way?
>>
>>
> Going over the various responses, it looks like saving the "to be
> deleted" keys to a list, and then iterating over that to delete is the
> best answer.  


I don't think that there is any one "best" answer that always applies. 
"My dict has three items, and I'll be deleting most of them" is likely to 
have different performance characteristics from "My dict has three 
billion items, and I'll be deleting two [or two billion] of them".

So how much time do you want to spend tuning this for optimum performance 
(or memory use, or programmer time)? Or is "good enough" good enough?

I think the two obviously good enough approaches are:

- save a "to be deleted" list, then delete those keys;

- copy the "not to be deleted" items into a new dict

My intuition is that the second will probably be faster, unless your dict 
is truly monstrously big. Not millions, but tens or hundreds or thousands 
of millions of items, depending on how much memory your computer has.

You've already got a code snippet using the "to be deleted" list, here's 
how I would do the other way:

new = {}
for k, v in m.items():
    if f(k):
        process(v)
    else:
        new[k] = v
m.clear()
m.update(new)
del new


If you don't care about modifying the existing dict, but can afford to 
write in a more functional style, you don't even need to bother doing 
that m.clear(), m.update(new). Just return the new dict, stop using the 
old one (it will be garbage collected), and use the new one.

Oh, in case you're wondering, this will be more efficient than it may 
seem, because the actual data in the dict isn't copied. The only things 
being copied are references to the keys and values, not the keys and 
values themselves.



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#70643

FromChris Angelico <rosuav@gmail.com>
Date2014-04-27 12:49 +1000
Message-ID<mailman.9527.1398566956.18130.python-list@python.org>
In reply to#70642
On Sun, Apr 27, 2014 at 12:14 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> I think the two obviously good enough approaches are:
>
> - save a "to be deleted" list, then delete those keys;
>
> - copy the "not to be deleted" items into a new dict

For a small enough dict that the performance question doesn't matter,
I'd go with the other option: iterate over a snapshot of the keys.
Compare:

# Naive approach:
for k in d:
    if f(k): del d[k]

# Snapshot of keys:
for k in list(d):
    if f(k): del d[k]

No extra loop at the end, no switching out and in of contents, just
one little change in the loop header. Obviously you don't want to do
this when you're deleting two out of three billion, but for smallish
dicts, that won't make a visible change.

ChrisA

[toc] | [prev] | [next] | [standalone]


#70674

FromDuncan Booth <duncan.booth@invalid.invalid>
Date2014-04-28 10:48 +0000
Message-ID<XnsA31D7830AB42Cduncanbooth@127.0.0.1>
In reply to#70643
Chris Angelico <rosuav@gmail.com> wrote:

> # Snapshot of keys:
> for k in list(d):
>     if f(k): del d[k]
> 
> No extra loop at the end, no switching out and in of contents, just
> one little change in the loop header. Obviously you don't want to do
> this when you're deleting two out of three billion, but for smallish
> dicts, that won't make a visible change.

Even if you have three billion keys, the extra memory needed to create a 
list that references those keys is going to be a lot less than the memory 
used by the keys themselves. For example if the keys are 6 character 
strings then each string needs I think at least 45 bytes (64 bit Python 
2.x, up to double that in Python 3.x) but the list only needs one 8 byte 
pointer per key.

I would always choose this simple solution until such time as it is proved 
to be a problem.

-- 
Duncan Booth

[toc] | [prev] | [next] | [standalone]


#70645

FromRoy Smith <roy@panix.com>
Date2014-04-26 22:57 -0400
Message-ID<roy-29F440.22571426042014@news.panix.com>
In reply to#70642
In article <535c67e9$0$29965$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> I think the two obviously good enough approaches are:
> 
> - save a "to be deleted" list, then delete those keys;
> 
> - copy the "not to be deleted" items into a new dict

There is a third possibility:

Iterate over the dict, storing keys to be deleted in a list, but break 
out after the list reaches N items.  Delete those keys from the dict.  
Repeat the whole process until you reach the end of the dictionary 
before you reach N saved keys.

It's going to take multiple (perhaps many) passes over the dict, but it 
will limit the amount of extra memory used.  In the extreme case, if 
N=1, with k keys in the dict, it will turn a process which is O(k) in 
time and O(k) in extra memory into one which is O(k^2) in time and O(1) 
in extra memory.  Is that a good tradeoff?  Only your hairdresser knows 
for sure.

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web