A lesson on outliers for cc

From	Snit <usenet@gallopinginsanity.com>
Newsgroups	comp.os.linux.advocacy
Subject	A lesson on outliers for cc
Date	2012-06-27 22:45 -0700
Message-ID	<CC113D82.41B3%usenet@gallopinginsanity.com>

<http://wp.me/ps1hD-bb>
    -----
    Diagnosing Outliers

    There are a couple of ways to check data for outliers. These
    include:

    Visually Inspect Data

    Plot your data on a chart or graph. Do some points stand out
    from the "crowd?" If so, what is the record? Can you verify
    that it was entered correctly?
    -----
    
As you have been shown, the data set is rather "smooth" - not much stands
out: <http://tmp.gallopinginsanity.com/outliers.pdf>

In the first image, there is one point that clearly stands out... it is an
outlier.  In the other two images there are no such points - no obvious
outliers.  The second image shows a pretty clear curve.  In the third image
(the one that charts the full set of data we are discussing), the high end is
actually smoother, with fewer points standing out, than the data which you
are *not* claiming are "outliers".  When you claim the high points from the
end of 2011 are "outliers" you are making a very, very poor judgment call
based on this method.

Based on this method cc screwed up - but he will never admit to it.
  
    -----    
    Automatically Minimize Exposure to Outliers

    One way to check for outliers is to assume you'll have some
    and adjust your data accordingly. You may say that a set
    percentage (say 1% to 5%) of your data on both ends is an
    outlier and then either remove those observations, or set a
    floor or ceiling based on the remaining data. For example, if
    you have 1,000 records in your data set and you assume that
    1% on both ends is an outlier, you can either remove the
    bottom and top 10 observations from your analysis, or you can
    change the values of the bottom 10 to the value of the 11th
    lowest and those of the top 10 to that of the 11th highest
    value.

    The problem here is that this approach is arbitrary and
    doesn't take into account the uniqueness of each individual
    data set. ... I don't particularly recommend this approach,
    but in the interest of expediency it may be helpful.
    -----
    
Right: with a large data set of 1000 data points one might assume 2-10% (1-5%
on each end) are outliers... but this is not recommended.  But you did much
worse and assumed 17% of a data set of only 24 data points were outliers.
This is completely idiotic, especially considering the visual inspection
completely goes against this idea.
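
To make the quoted rule concrete, here is a minimal Python sketch of the two
options it describes: dropping a set percentage from each end, or capping
those points at the nearest remaining value.  The function names and the
1,000 made-up records are assumptions for illustration only; they come from
neither the quoted article nor the data set under discussion.

```python
def trim(values, percent):
    """Drop the lowest and highest `percent` of the values (per end)."""
    data = sorted(values)
    k = len(data) * percent // 100  # points cut from each end, rounded down
    return data[k:len(data) - k] if k else data


def winsorize(values, percent):
    """Cap the lowest and highest `percent` at the nearest remaining value."""
    data = sorted(values)
    k = len(data) * percent // 100
    if k == 0:
        return data
    low, high = data[k], data[-k - 1]  # (k+1)th lowest and (k+1)th highest
    return [min(max(v, low), high) for v in data]


# Hypothetical data: 1,000 records with values 1..1000.
records = list(range(1, 1001))
trimmed = trim(records, 1)      # bottom 10 and top 10 removed
capped = winsorize(records, 1)  # bottom 10 set to 11th lowest, top 10 to 11th highest

print(len(trimmed), trimmed[0], trimmed[-1])
print(len(capped), capped[0], capped[-1])
```

Both options treat the two ends symmetrically.  Applied at 5% per end to a
set of 24 points, the same rule would touch only 1 point on each end
(24 * 5 // 100 = 1), not about 4 points (17% of 24) all taken from the high
end.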

Based on this method cc screwed up - but he will never admit to it.
    
    -----
    Parametric Extremity

    In parametric extremity, we use the data set's parameters to
    determine how a particular value diverges from the center of
    the data set's distribution. The obvious center of the
    distribution is the mean; the measure of divergence is the
    standard deviation. When data is normally distributed,
    virtually all observations are located within three standard
    deviations from the mean (in each direction). Hence, we may
    set a rule that an outlier is any value that is at least +/-
    3 standard deviations from the mean.

    This approach also has some drawbacks. The mean and standard
    deviation are computed from all values, including outliers.
    Hence, outliers tend to pull the mean towards them and
    inflate the standard deviation. As a result, they tend to
    bias the criteria used for judging whether a value is an
    outlier. Indeed, outliers introduce bias towards including
    extreme values.
    -----
    
This sounds closer to what you did... but when you use 17% of the data as
your "outliers" and *all* of those points are at the same extreme (high
values), you are completely screwing up by basing your decision on the mean
and standard deviation: those same points pull the mean toward them and
inflate the standard deviation.  Yet this is what you keep claiming you did!
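
The masking problem described in the quote can be seen in a small sketch.
The function below applies the 3 standard deviation rule using Python's
statistics module.  The name three_sigma_outliers and both sample lists are
hypothetical, chosen only to show the effect; they are not the data being
argued about.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Return values at least k population standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, so nothing can stand out
    return [v for v in values if abs(v - mean) / sd >= k]


# Hypothetical sets of 24 points each.
one_high = [10] * 23 + [100]      # a single extreme point
four_high = [10] * 20 + [20] * 4  # four high points, about 17% of the set

print(three_sigma_outliers(one_high))
print(three_sigma_outliers(four_high))
```

The single extreme point is flagged, but the four high points are not:
together they drag the mean up and inflate the standard deviation, so each
one ends up only about 2.2 standard deviations out.  More generally, the
one-sided Chebyshev (Cantelli) bound says at most 1/(1 + 3^2) = 10% of any
data set can sit 3 or more population standard deviations above its mean, so
17% of the points cannot all qualify on the high side.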

Based on this method cc screwed up - but he will never admit to it.

    -----    
    Non-Parametric Extremity

    Another approach to measuring divergence is through
    non-parametric methods. Essentially, the concept is the same,
    and the mean is still the center; however the divergence is
    measured by the inter-quartile range (IQR). Essentially, you
    order your data set and then break it into four equal parts.
    The lowest 25% is your first quartile; the next 25% is your
    second quartile (whose upper bound is the median); and so on.
    Essentially, anything higher than the top of the third
    quartile or lower than the bottom of the second quartile is
    reviewed for outliers.

    If done haphazardly, non-parametric extremity will give you
    the same problem as establishing a set percentage on both
    ends as outliers. To avoid this drawback, again inspect the
    points that fall outside the second and third quartiles.
    Those closest to the outer bounds of the IQR can remain in
    your data set; those far away should be measured for
    accuracy, and if accurate can be adjusted or removed.
    -----
    
You did no such measuring.  And, again, you have the problem that you want
to see just the high range as the outliers.  Completely screwed up
methodology on your part.
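
For comparison, one common way to do this kind of measuring is Tukey's
fences: flag points more than 1.5 times the IQR below the first quartile or
above the third.  The 1.5 multiplier is a widely used convention and an
assumption here, since the quoted article gives no cutoff; the function name
and the sample data are likewise hypothetical.

```python
import statistics


def iqr_outliers(values, multiplier=1.5):
    """Return values outside [Q1 - m*IQR, Q3 + m*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # needs Python 3.8+
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data: a smooth run of 12 points, and the same run with a spike.
smooth = list(range(1, 13))
with_spike = list(range(1, 12)) + [50]

print(iqr_outliers(smooth))
print(iqr_outliers(with_spike))
```

The fences come from the quartiles, which barely move when a few extreme
values are added, and they are applied to both ends of the data rather than
to the high end alone.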

Based on this method cc screwed up - but he will never admit to it.

    -----
    These are just a few of the ways you can identify outliers in
    your data set. Frequently, classifying a value as an outlier
    is a judgment call...
    -----
    
Right: it is a "judgment call" - a subjective decision.  You claimed your
17% of the data being outliers was a "fact" - implying it was an *objective*
claim.  It was not.  Your claim that it was a "fact" shows how you are
completely ignorant of the topic.

You blew it again, cc... you proved you have no clue what you are talking
about.

And here is the funny part, at least for me: even though this has been
repeatedly proved to you, it is 100% predictable that you will deny this and
run from the fact you completely screwed up in your claims about outliers.

Based on *any* method of determining outliers, cc screwed up - but he will
never admit to it.  And I just laugh at him.

-- 
Summary of cc's statistical BS: <http://tinyurl.com/7rwazxw>
Details on cc's "outliers" BS: <http://tinyurl.com/84r3ypq>
Details on cc's sigma and R^2 BS: <http://tinyurl.com/7vambev>
cc repeatedly pretends to be knowledgeable on topics he is clueless about.
