Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.nosignal.org!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Date: Sun, 06 Jan 2013 23:18:43 +0000
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: Numpy outlier removal
References: <mailman.179.1357501521.2939.python-list@python.org> <50e9fbd5$0$6848$e4fe514c@news2.news.xs4all.nl>
In-Reply-To: <50e9fbd5$0$6848$e4fe514c@news2.news.xs4all.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.196.1357514503.2939.python-list@python.org>
Lines: 36
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:36303

On 2013-01-06 22:33, Hans Mulder wrote:
> On 6/01/13 20:44:08, Joseph L. Casale wrote:
>> I have a dataset that consists of a dict with text descriptions and values that are integers. If
>> required, I collect the values into a list and create a numpy array running it through a simple
>> routine: data[abs(data - mean(data)) < m * std(data)] where m is the number of std deviations
>> to include.
>>
>>
>> The problem is I lose track of which were removed so the original display of the dataset is
>> misleading when the processed average is returned as it includes the removed key/values.
>>
>>
>> Anyone know how I can maintain the relationship and when I exclude a value, remove it from
>> the dict?
>
> Assuming your data and the dictionary are keyed by a common set of keys:
>
> for key in descriptions:
>      if abs(data[key] - mean(data)) >= m * std(data):
>          del data[key]
>          del descriptions[key]
>
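It's generally a bad idea to modify a collection over which you're
iterating; deleting from a dict inside a loop over that same dict will
usually raise a RuntimeError ("dictionary changed size during
iteration"). It's better to, say, make a list of what you're going to
delete and then iterate over that list to make the deletions: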

deletions = []

for key in descriptions:
     if abs(data[key] - mean(data)) >= m * std(data):
         deletions.append(key)

for key in deletions:
     del data[key]
     del descriptions[key]
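
For the original numpy version, the same "decide first, delete afterwards"
idea can be sketched with a boolean mask. This is only a rough sketch under
some assumptions: the dataset is a single dict mapping each text description
to its integer value, and the names remove_outliers, dataset, kept and
removed are made up for illustration. The mean and standard deviation are
computed once, before anything is dropped, so every value is judged against
the same statistics.

```python
import numpy as np


def remove_outliers(dataset, m=2.0):
    """Split dataset into kept entries and removed keys.

    dataset is assumed to map text descriptions to numeric values;
    m is the number of standard deviations to include.
    """
    # Freeze the key order so keys and values stay aligned.
    keys = list(dataset)
    values = np.array([dataset[key] for key in keys], dtype=float)

    # Compute the statistics once, on the full, unmodified data.
    centre = values.mean()
    spread = values.std()

    # True where the value lies within m standard deviations of the mean.
    keep = np.abs(values - centre) < m * spread

    # Build the results from the mask instead of deleting while iterating.
    kept = {key: dataset[key] for key, ok in zip(keys, keep) if ok}
    removed = [key for key, ok in zip(keys, keep) if not ok]
    return kept, removed
```

Calling kept, removed = remove_outliers(dataset, m) leaves the original dict
untouched, so you can display kept alongside its average and still report
which descriptions were excluded via removed.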