| Path | csiph.com!usenet.pasdenom.info!news.etla.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <yasar11732@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| MIME-Version | 1.0 |
| X-Received | by 10.205.22.71 with SMTP id qv7mr902321bkb.20.1382130455735; Fri, 18 Oct 2013 14:07:35 -0700 (PDT) |
| In-Reply-To | <l3s04b$8v0$1@ger.gmane.org> |
| References | <CAFEUn8YPfRsqKSfB5SCUo5=PDfMLJ3iMQ2Uq=S2o=XRWJVQNXg@mail.gmail.com> <l3s04b$8v0$1@ger.gmane.org> |
| Date | Sat, 19 Oct 2013 00:07:35 +0300 |
| Subject | Re: Python package statistics |
| From | Yaşar Arabacı <yasar11732@gmail.com> |
| To | Terry Reedy <tjreedy@udel.edu> |
| Content-Type | text/plain; charset=ISO-8859-9 |
| Content-Transfer-Encoding | quoted-printable |
| X-Mailman-Approved-At | Sat, 19 Oct 2013 02:34:42 +0200 |
| Cc | python-list@python.org |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1248.1382142882.18130.python-list@python.org> |
| Lines | 52 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1382142882 news.xs4all.nl 16005 [2001:888:2000:d::a6]:59675 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:57084 |
Hi Terry,

Thanks for pointing it out. matplotlib's hist function wasn't broken
after all :)

I published non-parametric statistics here:
http://ysar.net/python/python-package-statistics-additions.html

2013/10/18 Terry Reedy <tjreedy@udel.edu>:
> On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
>>
>> Hi people,
>>
>> I collected some data on PyPI and published some statistics about
>> packages on PyPI. I think you might find it an interesting read:
>>
>> http://ysar.net/python/python-package-statistics.html
>
>
> "b2gpopulate (36MB)
> ...
> Total sizes on packages in PyPI amounted to 4.2 GB. Average package size is
> 161 KB and standard deviation is 1MB."
>
> For such highly skewed data, the mean and especially the standard deviation
> and confidence intervals are meaningless. They are 'parametric' statistics,
> which is to say, they were designed for bell-shaped distributions. (I will
> not say 'normal' == Gaussian distributions because they are *not* normal for
> much raw data.)
>
> A better summary is obtained from either 'non-parametric' statistics
> (median, inter-quartile range) or from 'normalizing' the data (if possible).
> For the latter, try taking the square root or log of the sizes and plot the
> distribution. If either works, take the mean and sd of the transformed
> values. Then report those and also the back-transformed mean and mean+-sd.
>
> --
> Terry Jan Reedy
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
http://ysar.net/
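For readers following the thread, here is a minimal sketch of the two summaries Terry describes: the non-parametric one (median and inter-quartile range) and the log-transform one (mean and sd of the logs, then transformed back). It is not the code behind either linked page. The function name `summarize_sizes` and the sample sizes are made up for illustration, and it assumes Python 3.8+ for `statistics.quantiles`.

```python
import math
import statistics


def summarize_sizes(sizes_kb):
    """Summarize a skewed sample of strictly positive sizes (hypothetical helper)."""
    # Non-parametric summary: quartiles do not assume a bell-shaped distribution.
    q1, median, q3 = statistics.quantiles(sizes_kb, n=4)

    # 'Normalizing' attempt: take logs, then compute mean and sd on the log scale.
    logs = [math.log(s) for s in sizes_kb]
    log_mean = statistics.mean(logs)
    log_sd = statistics.stdev(logs)

    return {
        "median": median,
        "iqr": q3 - q1,
        "log_mean": log_mean,
        "log_sd": log_sd,
        # Transformed back to KB; exp(mean of logs) is the geometric mean.
        "back_mean": math.exp(log_mean),
        "back_low": math.exp(log_mean - log_sd),
        "back_high": math.exp(log_mean + log_sd),
    }


# Made-up package sizes in KB with one huge outlier; NOT the real PyPI data.
sample_kb = [4, 8, 12, 20, 35, 60, 150, 900, 36000]
summary = summarize_sizes(sample_kb)

for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

Note that the back-transformed interval is asymmetric around the back-transformed mean, which is what you would expect for data skewed to the right. Whether the log (or square root) actually makes the sizes roughly bell-shaped still has to be checked by plotting the transformed values, as suggested above.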