| From | Snit <usenet@gallopinginsanity.com> |
|---|---|
| Newsgroups | comp.os.linux.advocacy |
| Subject | A lesson on outliers for cc |
| Date | 2012-06-27 22:45 -0700 |
| Message-ID | <CC113D82.41B3%usenet@gallopinginsanity.com> |
<http://wp.me/ps1hD-bb>
-----
Diagnosing Outliers
There are a couple of ways to check data for outliers. These
include:
Visually Inspect Data
Plot your data on a chart or graph. Do some points stand out
from the "crowd?" If so, what is the record? Can you verify
that it was entered correctly?
-----
As you have been shown, the data set is rather "smooth" - not much stands
out: <http://tmp.gallopinginsanity.com/outliers.pdf>
In the first image, there is one point that clearly stands out... it is an
outlier. In the other two images there are no such points - no obvious
outliers. The second image shows a pretty clear curve. In the third image
(the one that charts the full set of data we are discussing), the high end
is actually smoother, with fewer points that stand out, than the data which
you are *not* claiming are "outliers". When you claim the high points from
the end of 2011 are "outliers" you are making a very, very poor judgment call
based on this method.
Based on this method cc screwed up - but he will never admit to it.
-----
Automatically Minimize Exposure to Outliers
One way to check for outliers is to assume you'll have some
and adjust your data accordingly. You may say that a set
percentage (say 1% to 5%) of your data on both ends is an
outlier and then either remove those observations, or set a
floor or ceiling based on the remaining data. For example, if
you have 1,000 records in your data set and you assume that
1% on both ends is an outlier, you can either remove the
bottom and top 10 observations from your analysis, or you can
change the values of the bottom 10 to the value of the 11th
lowest and those of the top 10 to that of the 11th highest
value.
The problem here is that this approach is arbitrary and
doesn't take into account the uniqueness of each individual
data set. ... I don't particularly recommend this approach,
but in the interest of expediency it may be helpful.
-----
Right: with a large data set of 1000 data points one might assume 2-10% are
outliers... but even that is not recommended. You did much worse and assumed
17% of a data set of only 24 data points were outliers. This is completely
idiotic, esp. considering the visual inspection goes completely against this
idea.
Based on this method cc screwed up - but he will never admit to it.
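
For anyone who wants to try the trim-or-clamp approach quoted above, here is
a rough Python sketch. The function names and the 1..1000 records are made up
for illustration; they are not from the article or from the data being
discussed. Each function takes the number of points to treat as outliers on
each end, which matches the article's example of 10 per end for 1,000 records.

```python
def trim(values, k):
    """Drop the k lowest and k highest values."""
    data = sorted(values)
    return data[k:len(data) - k]


def clamp_ends(values, k):
    """Set the k lowest values to the (k+1)th lowest value and the
    k highest values to the (k+1)th highest value."""
    data = sorted(values)
    if k == 0:
        return data
    low, high = data[k], data[-k - 1]
    return [min(max(x, low), high) for x in data]


# Hypothetical data: 1,000 records with the values 1..1000.
records = list(range(1, 1001))

trimmed = trim(records, 10)        # remove bottom and top 10
clamped = clamp_ends(records, 10)  # floor/ceiling at 11th lowest/highest

print(len(trimmed), trimmed[0], trimmed[-1])  # 980 11 990
print(len(clamped), clamped[0], clamped[-1])  # 1000 11 990
```

Either way the cutoff is picked in advance, not derived from the data, which
is exactly the arbitrariness the article warns about.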
-----
Parametric Extremity
In parametric extremity, we use the data set's parameters to
determine how a particular value diverges from the center of
the data set's distribution. The obvious center of the
distribution is the mean; the measure of divergence is the
standard deviation. When data is normally distributed,
virtually all observations are located within three standard
deviations from the mean (in each direction). Hence, we may
set a rule that an outlier is any value that is at least +/-
3 standard deviations from the mean.
This approach also has some drawbacks. The mean and standard
deviation are computed from all values, including outliers.
Hence, outliers tend to pull the mean towards them and
inflate the standard deviation. As a result, they tend to
bias the criteria used for judging whether a value is an
outlier. Indeed, outliers introduce bias towards including
extreme values.
-----
This sounds closer to what you did... but when you use 17% of the data as
your "outliers" and *all* of those points are on an extreme in the same
direction (high values), you are completely screwing up by basing your
decision on the mean and standard deviation. Yet this is what you keep
claiming you did!
Based on this method cc screwed up - but he will never admit to it.
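
Here is a small Python sketch of the 3 standard deviation rule quoted above,
along with the drawback the article describes. The data is made up (it is NOT
the data from this thread), the function name is my own, and I am assuming
the sample standard deviation (statistics.stdev), since the article does not
say which one to use.

```python
import statistics


def three_sigma_outliers(values):
    """Return the values at least 3 sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical: nothing can stand out
    return [x for x in values if abs(x - mean) >= 3 * sd]


# Hypothetical data sets of 24 points each.
one_high = [10] * 23 + [20]       # a single high point
four_high = [10] * 20 + [20] * 4  # four high points (about 17% of 24)

print(three_sigma_outliers(one_high))   # [20]
print(three_sigma_outliers(four_high))  # []
```

In this toy example the single high point is flagged, but the four high
points are not: they drag the mean up and inflate the standard deviation
enough that none of them is 3 standard deviations out.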
-----
Non-Parametric Extremity
Another approach to measuring divergence is through
non-parametric methods. Essentially, the concept is the same,
and the mean is still the center; however the divergence is
measured by the inter-quartile range (IQR). Essentially, you
order your data set and then break it into four equal parts.
The lowest 25% is your first quartile; the next 25% is your
second quartile (whose upper bound is the median); and so on.
Essentially, anything higher than the top of the third
quartile or lower than the bottom of the second quartile is
reviewed for outliers.
If done haphazardly, non-parametric extremity will give you
the same problem as establishing a set percentage on both
ends as outliers. To avoid this drawback, again inspect the
points that fall outside the second and third quartiles.
Those closest to the outer bounds of the IQR can remain in
your data set; those far away should be measured for
accuracy, and if accurate can be adjusted or removed.
-----
You did no such measuring. And, again, you have the problem that you want
to see just the high range as the outliers. Completely screwed up
methodology on your part.
Based on this method cc screwed up - but he will never admit to it.
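
For reference, a rough Python sketch of an IQR-based check. The article only
says to look at how far points fall outside the quartiles; the 1.5 x IQR
cutoff used here is a common convention that I am assuming, not something the
article states. The data and the function name are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data, NOT the data from this thread: 24 points, one far out.
sample = list(range(1, 21)) + [22, 24, 26, 60]

print(iqr_outliers(sample))  # [60]
```

Points flagged this way are only candidates; per the article they still need
to be checked for accuracy before anything is adjusted or removed.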
-----
These are just a few of the ways you can identify outliers in
your data set. Frequently, classifying a value as an outlier
is a judgment call...
-----
Right: it is a "judgment call" - a subjective decision. You claimed your
17% of the data being outliers was a "fact" - implying it was an *objective*
claim. It was not. Your claim that it was a "fact" shows how you are
completely ignorant of the topic.
You blew it again, cc... you proved you have no clue what you are talking
about.
And here is the funny part, at least for me: even though this has been
repeatedly proved to you, it is 100% predictable you will deny this and run
from the fact you completely screwed up in your claims about outliers.
Based on *any* method of determining outliers, cc screwed up - but he will
never admit to it. And I just laugh at him.
--
Summary of cc's statistical BS: <http://tinyurl.com/7rwazxw>
Details on cc's "outliers" BS: <http://tinyurl.com/84r3ypq>
Details on cc's sigma and R^2 BS: <http://tinyurl.com/7vambev>
cc repeatedly pretends to be knowledgeable on topics he is clueless about.