| Started by | "Joseph L. Casale" <jcasale@activenetwerx.com> |
|---|---|
| First post | 2013-01-06 19:44 +0000 |
| Last post | 2013-01-07 02:12 +0000 |
| Articles | 8 on this page of 28 — 11 participants |
Numpy outlier removal "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-01-06 19:44 +0000
Re: Numpy outlier removal Hans Mulder <hansmu@xs4all.nl> - 2013-01-06 23:33 +0100
RE: Numpy outlier removal "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-01-06 22:50 +0000
Re: Numpy outlier removal MRAB <python@mrabarnett.plus.com> - 2013-01-06 23:18 +0000
Re: Numpy outlier removal Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-07 01:46 +0000
Re: Numpy outlier removal "Paul Simon" <psimon@sonic.net> - 2013-01-06 18:21 -0800
Re: Numpy outlier removal Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-01-07 02:29 +0000
Re: Numpy outlier removal Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-07 05:11 +0000
Re: Numpy outlier removal Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-01-07 15:20 +0000
[Offtopic] Line fitting [was Re: Numpy outlier removal] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-07 17:58 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Chris Angelico <rosuav@gmail.com> - 2013-01-08 06:43 +1100
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-08 02:06 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Chris Angelico <rosuav@gmail.com> - 2013-01-08 17:35 +1100
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Robert Kern <robert.kern@gmail.com> - 2013-01-08 15:55 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Chris Angelico <rosuav@gmail.com> - 2013-01-09 07:14 +1100
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-09 07:50 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Robert Kern <robert.kern@gmail.com> - 2013-01-08 22:59 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-01-07 22:32 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-08 01:23 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Terry Reedy <tjreedy@udel.edu> - 2013-01-08 04:07 -0500
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Maarten <maarten.sneep@knmi.nl> - 2013-01-08 08:47 -0800
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Maarten <maarten.sneep@knmi.nl> - 2013-01-08 08:47 -0800
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-01-09 00:02 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-01-08 13:50 +0000
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Jason Friedman <jason@powerpull.net> - 2013-01-08 19:22 -0700
Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] Jason Friedman <jason@powerpull.net> - 2013-01-08 19:23 -0700
Re: Numpy outlier removal Robert Kern <robert.kern@gmail.com> - 2013-01-07 15:35 +0000
RE: Numpy outlier removal "Joseph L. Casale" <jcasale@activenetwerx.com> - 2013-01-07 02:12 +0000
| From | Maarten <maarten.sneep@knmi.nl> |
|---|---|
| Date | 2013-01-08 08:47 -0800 |
| Subject | Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] |
| Message-ID | <7dafc98b-99c9-4727-bdb5-087dc846546c@googlegroups.com> |
| In reply to | #36421 |
On Tuesday, January 8, 2013 10:07:08 AM UTC+1, Terry Reedy wrote:

> With the line constrained to go through 0,0, a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured in
> the same units).

In that case use an appropriate algorithm to perform the fit. ODR comes to mind.

http://docs.scipy.org/doc/scipy/reference/odr.html

Maarten
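
For readers who want to see the difference between vertical and perpendicular offsets concretely, here is a minimal pure-Python sketch for a line constrained through the origin. It is not `scipy.odr` and makes no claim to match its algorithm; the function names and the sample data are made up for illustration. It assumes both variables are in the same units, as the quoted text requires, and uses the closed-form angle of the principal axis of the uncentred second moments.

```python
import math


def fit_through_origin_vertical(xs, ys):
    """Slope of y = m*x that minimises the squared vertical offsets."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def fit_through_origin_perpendicular(xs, ys):
    """Slope of the line through (0, 0) that minimises the squared
    perpendicular offsets."""
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Angle of the principal axis of the (uncentred) second-moment matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # tan() becomes very large for a near-vertical line.
    return math.tan(theta)


# Hypothetical measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.3, 3.8, 5.1]

print("vertical offsets:     ", fit_through_origin_vertical(xs, ys))
print("perpendicular offsets:", fit_through_origin_perpendicular(xs, ys))
```

One property worth noting: swapping `xs` and `ys` turns the perpendicular-offset slope into its reciprocal, so the fitted line does not depend on which variable is called x. The vertical-offset fit does not behave that way in general.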
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-01-09 00:02 +0000 |
| Subject | Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] |
| Message-ID | <50ecb382$0$30003$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #36421 |
On Tue, 08 Jan 2013 04:07:08 -0500, Terry Reedy wrote:

>> But that is not fitting a line by eye, which is what I am talking
>> about.
>
> With the line constrained to go through 0,0 a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured
> in the same units).

It is conventional to talk about "residuals" rather than deviations.

And it could even more easily be worse than a regression line. And since eyeballing is entirely subjective and impossible to objectively verify, the line that you claim minimizes the residuals might be very different from the line that I claim minimizes the residuals, and no way to decide between the two claims.

In any case, there is a technique for working out ordinary least squares (OLS) linear regression using perpendicular offsets rather than vertical offsets:

http://mathworld.wolfram.com/LeastSquaresFittingPerpendicularOffsets.html

but in general, if you have to care about errors in the dependent variable, you're better off using a more powerful technique than just OLS.

The point I keep making, that everybody seems to be ignoring, is that eyeballing a line of best fit is subjective, unreliable and impossible to verify. How could I check that the line you say is the "best fit" actually *is* the *best fit* for the given data, given that you picked that line by eye? Chances are good that if you came back to the data a month later, you'd pick a different line!

As I have said, eyeballing a line is fine for rough back of the envelope type calculations, where you only care that you have a line pointing more or less in the right direction. But for anything where accuracy is required, line fitting by eye is down in the pits of things not to do, right next to "making up the answers you prefer".

--
Steven
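
As an illustration of the verifiability point, the sketch below scores two hypothetical eyeballed lines against an ordinary least squares fit, using the sum of squared vertical residuals as the criterion. The data, the two candidate lines and the function names are all invented for this example, and vertical residuals are only one possible criterion; the point is just that a stated criterion gives the same answer every time it is run.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) minimising the squared vertical residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and two hypothetical lines drawn "by eye".
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
candidates = {
    "eyeballed A": (1.0, 0.0),
    "eyeballed B": (0.9, 0.2),
    "OLS fit": ols_fit(xs, ys),
}

for label, (slope, intercept) in candidates.items():
    ssr = sum_squared_residuals(xs, ys, slope, intercept)
    print(f"{label}: slope={slope:.3f} intercept={intercept:.3f} SSR={ssr:.4f}")
```

By construction no straight line can score a lower sum of squared vertical residuals than the OLS fit on the same data, so an eyeballed line can at best tie it under this particular criterion.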
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2013-01-08 13:50 +0000 |
| Subject | Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] |
| Message-ID | <mailman.275.1357653482.2939.python-list@python.org> |
| In reply to | #36397 |
On 8 January 2013 01:23, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> On Mon, 07 Jan 2013 22:32:54 +0000, Oscar Benjamin wrote:
>
> [...]
>> I also think it would
>> be highly foolish to go so far with refusing to eyeball data that you
>> would accept the output of some regression algorithm even when it
>> clearly looks wrong.
>
> I never said anything of the sort.
>
> I said, don't fit lines to data by eye. I didn't say not to sanity check
> your straight line fit is reasonable by eyeballing it.

I should have been a little clearer. That was the situation when I decided to just use a (digital) ruler - although really it was more of a visual bisection (1, 2, 1.5, 1.25...). The regression result was clearly wrong (and also invalid for the reasons Terry has described). Some of the problems were easily fixable and others were not. I could have spent an hour getting the code to make the line go where I wanted it to, or I could just fit the line visually in about 2 minutes.

Oscar
| From | Jason Friedman <jason@powerpull.net> |
|---|---|
| Date | 2013-01-08 19:22 -0700 |
| Subject | Re: [Offtopic] Line fitting [was Re: Numpy outlier removal] |
| Message-ID | <mailman.301.1357698180.2939.python-list@python.org> |
| In reply to | #36368 |
> Statistical analysis is a huge science. So is lying. And I'm not sure
> most people can pick one from the other.

Chris, your sentence causes me to think of Mr. Twain's sentence, or at least the one he popularized: http://www.twainquotes.com/Statistics.html.
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2013-01-07 15:35 +0000 |
| Message-ID | <mailman.224.1357572924.2939.python-list@python.org> |
| In reply to | #36321 |
On 07/01/2013 15:20, Oscar Benjamin wrote:

> On 7 January 2013 05:11, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> On Mon, 07 Jan 2013 02:29:27 +0000, Oscar Benjamin wrote:
>>
>>> On 7 January 2013 01:46, Steven D'Aprano
>>> <steve+comp.lang.python@pearwood.info> wrote:
>>>> On Sun, 06 Jan 2013 19:44:08 +0000, Joseph L. Casale wrote:
>>>>
>>>> I'm not sure that this approach is statistically robust. No, let me be
>>>> even more assertive: I'm sure that this approach is NOT statistically
>>>> robust, and may be scientifically dubious.
>>>
>>> Whether or not this is "statistically robust" requires more explanation
>>> about the OP's intention.
>>
>> Not really. Statistics robustness is objectively defined, and the user's
>> intention doesn't come into it. The mean is not a robust measure of
>> central tendency, the median is, regardless of why you pick one or the
>> other.
>
> Okay, I see what you mean. I wasn't thinking of robustness as a
> technical term but now I see that you are correct.
>
> Perhaps what I should have said is that whether or not this matters
> depends on the problem at hand (hopefully this isn't an important
> medical trial) and the particular type of data that you have; assuming
> normality is fine in many cases even if the data is not "really"
> normal.

"Having outliers" literally means that assuming normality is not fine. If assuming normality were fine, then you wouldn't need to remove outliers.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
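
The quoted claim that the mean is not a robust measure of central tendency while the median is can be checked with a small sketch using only the standard library. The readings below are invented for illustration; the single value of 55.0 stands in for a gross measurement error.

```python
import statistics

# Hypothetical readings that cluster around 10, plus one gross outlier.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
with_outlier = readings + [55.0]

# How far each summary moves when the single outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(readings)
median_shift = statistics.median(with_outlier) - statistics.median(readings)

print("mean shift:  ", mean_shift)
print("median shift:", median_shift)
```

One bad value out of six drags the mean several units away from the cluster, while the median moves by only a small fraction of a unit.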
| From | "Joseph L. Casale" <jcasale@activenetwerx.com> |
|---|---|
| Date | 2013-01-07 02:12 +0000 |
| Message-ID | <mailman.212.1357552594.2939.python-list@python.org> |
| In reply to | #36314 |
> In other words: this approach for detecting outliers is nothing more than
> a very rough, and very bad, heuristic, and should be avoided.

Heh, very true but the results will only be used for conversational purposes. I am making an assumption that the data is normally distributed and I do expect valid results to all be very nearly the same.

> You can read up more about outlier detection, and the difficulties
> thereof, here:

I much appreciate the links and the thought in the post. I'll admit I didn't realize outlier detection was as involved.

Again, thanks!
jlc