Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #70639 > unrolled thread
| Started by | Charles Hixson <charleshixsn@earthlink.net> |
|---|---|
| First post | 2014-04-26 12:25 -0700 |
| Last post | 2014-04-26 22:57 -0400 |
| Articles | 5 — 5 participants |
Back to article view | Back to comp.lang.python
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Charles Hixson <charleshixsn@earthlink.net> - 2014-04-26 12:25 -0700
Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-04-27 02:14 +0000
Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Chris Angelico <rosuav@gmail.com> - 2014-04-27 12:49 +1000
Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Duncan Booth <duncan.booth@invalid.invalid> - 2014-04-28 10:48 +0000
Re: Proper deletion of selected items during map iteration in for loop: Thanks to all Roy Smith <roy@panix.com> - 2014-04-26 22:57 -0400
| From | Charles Hixson <charleshixsn@earthlink.net> |
|---|---|
| Date | 2014-04-26 12:25 -0700 |
| Subject | Re: Proper deletion of selected items during map iteration in for loop: Thanks to all |
| Message-ID | <mailman.9524.1398540343.18130.python-list@python.org> |
On 04/25/2014 10:53 AM, Charles Hixson wrote:
> What is the proper way to delete selected items during iteration of a
> map? What I want to do is:
>
> for (k, v) in m.items():
> if f(k):
> # do some processing of v and save result elsewhere
> del m[k]
>
> But this gives (as should be expected):
> RuntimeError: dictionary changed size during iteration
> In the past I've accumulated the keys to be deleted in a separate
> list, but this time there are likely to be a large number of them, so
> is there some better way?
>
Going over the various responses, it looks like saving the "to be
deleted" keys to a list, and then iterating over that to delete is the
best answer. I expect that I'll be deleting around 1/3 during each
iteration of the process...and then adding new ones back in. There
shouldn't be a really huge number of deletions on any particular pass,
but it will be looped through many times...so if there were any better
way to do this, it would speed things up considerably...but it's not
really surprising that there doesn't appear to be. So now it translates
into (approximately, not yet tested):
toDel = []
for (k, v) in m.items():
if f(k):
# do some processing of v and save result elsewhere
toDel.append(k)
else:
# do something else
for k in toDel:
del m[k]
toDel = None
--
Charles Hixson
[toc] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-04-27 02:14 +0000 |
| Message-ID | <535c67e9$0$29965$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #70639 |
On Sat, 26 Apr 2014 12:25:27 -0700, Charles Hixson wrote:
> On 04/25/2014 10:53 AM, Charles Hixson wrote:
>> What is the proper way to delete selected items during iteration of a
>> map? What I want to do is:
>>
>> for (k, v) in m.items():
>> if f(k):
>> # do some processing of v and save result elsewhere
>> del m[k]
>>
>> But this gives (as should be expected):
>> RuntimeError: dictionary changed size during iteration
>> In the past I've accumulated the keys to be deleted in a separate list,
>> but this time there are likely to be a large number of them, so is
>> there some better way?
>>
>>
> Going over the various responses, it looks like saving the "to be
> deleted" keys to a list, and then iterating over that to delete is the
> best answer.
I don't think that there is any one "best" answer that always applies.
"My dict has three items, and I'll be deleting most of them" is likely to
have different performance characteristics from "My dict has three
billion items, and I'll be deleting two [or two billion] of them".
So how much time do you want to spend tuning this for optimum performance
(or memory use, or programmer time)? Or is "good enough" good enough?
I think the two obviously good enough approaches are:
- save a "to be deleted" list, then delete those keys;
- copy the "not to be deleted" items into a new dict
My intuition is that the second will probably be faster, unless your dict
is truly monstrously big. Not millions, but tens or hundreds or thousands
of millions of items, depending on how much memory your computer has.
You've already got a code snippet using the "to be deleted" list, here's
how I would do the other way:
new = {}
for k, v in m.items():
if f(k):
process(v)
else:
new[k] = v
m.clear()
m.update(new)
del new
If you don't care about modifying the existing dict, but can afford to
write in a more functional style, you don't even need to bother doing
that m.clear(), m.update(new). Just return the new dict, stop using the
old one (it will be garbage collected), and use the new one.
Oh, in case you're wondering, this will be more efficient than it may
seem, because the actual data in the dict isn't copied. The only things
being copied are references to the keys and values, not the keys and
values themselves.
--
Steven D'Aprano
http://import-that.dreamwidth.org/
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-04-27 12:49 +1000 |
| Message-ID | <mailman.9527.1398566956.18130.python-list@python.org> |
| In reply to | #70642 |
On Sun, Apr 27, 2014 at 12:14 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> I think the two obviously good enough approaches are:
>
> - save a "to be deleted" list, then delete those keys;
>
> - copy the "not to be deleted" items into a new dict
For a small enough dict that the performance question doesn't matter,
I'd go with the other option: iterate over a snapshot of the keys.
Compare:
# Naive approach:
for k in d:
if f(k): del d[k]
# Snapshot of keys:
for k in list(d):
if f(k): del d[k]
No extra loop at the end, no switching out and in of contents, just
one little change in the loop header. Obviously you don't want to do
this when you're deleting two out of three billion, but for smallish
dicts, that won't make a visible change.
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Duncan Booth <duncan.booth@invalid.invalid> |
|---|---|
| Date | 2014-04-28 10:48 +0000 |
| Message-ID | <XnsA31D7830AB42Cduncanbooth@127.0.0.1> |
| In reply to | #70643 |
Chris Angelico <rosuav@gmail.com> wrote: > # Snapshot of keys: > for k in list(d): > if f(k): del d[k] > > No extra loop at the end, no switching out and in of contents, just > one little change in the loop header. Obviously you don't want to do > this when you're deleting two out of three billion, but for smallish > dicts, that won't make a visible change. Even if you have three billion keys, the extra memory needed to create a list that references those keys is going to be a lot less than the memory used by the keys themselves. For example if the keys are 6 character strings then each string needs I think at least 45 bytes (64 bit Python 2.x, up to double that in Python 3.x) but the list only needs one 8 byte pointer per key. I would always choose this simple solution until such time as it is proved to be a problem. -- Duncan Booth
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-04-26 22:57 -0400 |
| Message-ID | <roy-29F440.22571426042014@news.panix.com> |
| In reply to | #70642 |
In article <535c67e9$0$29965$c3e8da3$5496439d@news.astraweb.com>, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > I think the two obviously good enough approaches are: > > - save a "to be deleted" list, then delete those keys; > > - copy the "not to be deleted" items into a new dict There is a third possibility: Iterate over the dict, storing keys to be deleted in a list, but break out after the list reaches N items. Delete those keys from the dict. Repeat the whole process until you reach the end of the dictionary before you reach N saved keys. It's going to take multiple (perhaps many) passes over the dict, but it will limit the amount of extra memory used. In the extreme case, if N=1, with k keys in the dict, it will turn a process which is O(k) in time and O(k) in extra memory into one which is O(k^2) in time and O(1) in extra memory. Is that a good tradeoff? Only your hairdresser knows for sure.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web