Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <50ea28e7$0$30003$c3e8da3$5496439d@news.astraweb.com>
References: <mailman.179.1357501521.2939.python-list@python.org> <50ea28e7$0$30003$c3e8da3$5496439d@news.astraweb.com>
Date: Mon, 7 Jan 2013 02:29:27 +0000
Subject: Re: Numpy outlier removal
From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
To: "Steven D'Aprano" <steve+comp.lang.python@pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Python List <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.205.1357525775.2939.python-list@python.org>
Lines: 25
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:36316

On 7 January 2013 01:46, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 06 Jan 2013 19:44:08 +0000, Joseph L. Casale wrote:
>
>> I have a dataset that consists of a dict with text descriptions and
>> values that are integers. If required, I collect the values into a list
>> and create a numpy array running it through a simple routine:
>>
>> data[abs(data - mean(data)) < m * std(data)]
>>
>> where m is the number of std deviations to include.
>
> I'm not sure that this approach is statistically robust. No, let me be
> even more assertive: I'm sure that this approach is NOT statistically
> robust, and may be scientifically dubious.

Whether or not this is "statistically robust" depends on what the OP
intends to do with the result. Thus far, the OP has not given any
reason/motivation for excluding data, or even for having any data in
the first place! It's hard to say whether any technique is really
accurate/robust without knowing *anything* about the purpose of the
operation.
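
For concreteness, here is a minimal sketch of the quoted routine. The
function name remove_outliers and the sample values are made up for
illustration; they are not the OP's code or data. It assumes that
mean() and std() in the quoted line are the NumPy ones, so std() is
the population standard deviation (ddof=0).

```python
import numpy as np


def remove_outliers(data, m=2.0):
    """Keep the values lying within m standard deviations of the mean.

    Hypothetical wrapper around the one-liner quoted above.
    """
    data = np.asarray(data, dtype=float)
    # Boolean mask: True where the value is close enough to the mean.
    mask = np.abs(data - data.mean()) < m * data.std()
    return data[mask]


# Made-up sample: seven similar values and one very large one.
values = [10, 11, 9, 10, 12, 10, 11, 1000]

kept_m2 = remove_outliers(values, m=2.0)
kept_m3 = remove_outliers(values, m=3.0)

print(len(values), len(kept_m2), len(kept_m3))
```

With this sample, m=2 drops the value 1000 but m=3 keeps all eight
values, because the large value itself inflates both the mean and the
standard deviation used in the test. Whether that behaviour is a
problem depends, again, on what the filtered data is for.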


Oscar