
Re: Python package statistics

From Terry Reedy <tjreedy@udel.edu>
Subject Re: Python package statistics
Date Fri, 18 Oct 2013 14:54:11 -0400
Newsgroups comp.lang.python
Message-ID <mailman.1240.1382122470.18130.python-list@python.org> (permalink)



On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
> Hi people,
>
> I collected some data on PyPI and published some statistics about
> packages on PyPI. I think you might find it an interesting read:
>
> http://ysar.net/python/python-package-statistics.html

"b2gpopulate (36MB)
...
Total sizes on packages in PyPI amounted to 4.2 GB. Average package size 
is 161 KB and standard deviation is 1MB."

For such highly skewed data, the mean and especially the standard
deviation and confidence intervals are meaningless. They are
'parametric' statistics, which is to say, they were designed for
bell-shaped distributions. (I will not say 'normal' == Gaussian
distributions because they are *not* normal for much raw data.)

A better summary is obtained from either 'non-parametric' statistics
(median, inter-quartile range) or from 'normalizing' the data (if
possible). For the latter, try taking the square root or log of the
sizes and plotting the distribution. If either gives a roughly
bell-shaped result, take the mean and sd of the transformed values.
Then report those and also the back-transformed mean and mean +/- sd.
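
A rough sketch of both approaches, using only the standard library
(Python 3.8+). The sizes below are made up for illustration and are not
the PyPI data; the function names are my own, and the log is only one of
the two transforms suggested above.

```python
import math
import statistics


def robust_summary(sizes):
    """Non-parametric summary: median and inter-quartile range."""
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return {"median": statistics.median(sizes), "iqr": q3 - q1}


def log_summary(sizes):
    """Mean and sd of log(size), plus the back-transformed values."""
    logs = [math.log(s) for s in sizes]
    m = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return {
        "log_mean": m,
        "log_sd": sd,
        # exp(mean of logs) is the geometric mean of the raw sizes
        "back_mean": math.exp(m),
        # mean +/- sd, mapped back to the original units
        "back_low": math.exp(m - sd),
        "back_high": math.exp(m + sd),
    }


# Hypothetical package sizes in KB, with one huge outlier
sizes_kb = [4, 8, 12, 15, 20, 30, 45, 80, 150, 36000]

print("arithmetic mean:", statistics.mean(sizes_kb))
print("robust summary: ", robust_summary(sizes_kb))
print("log summary:    ", log_summary(sizes_kb))
```

With data like this, the single outlier drags the arithmetic mean far
above most of the values, while the median and the back-transformed mean
stay close to what a typical package looks like. Note that the
back-transformed interval is asymmetric around the back-transformed
mean, which is what one wants for skewed data.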

-- 
Terry Jan Reedy
