From: Ben Finney
Newsgroups: comp.lang.python
Subject: Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random)
Date: Mon, 21 Dec 2015 14:45:31 +1100
References: <56776b9d$0$1615$c3e8da3$5496439d@news.astraweb.com>

Steven D'Aprano writes:

> Let's call the second group "random" and the first "non-random",
> without getting bogged down into arguments about whether they are
> really random or not.

I think we should discuss it, even at risk of getting bogged down. As
you know better than I, “random” is not an observable property of the
value, but of the process that produced it.

So, I don't think “random” is at all helpful as a descriptor of the
criteria you need for discriminating these values.

Can you give a better definition of what criteria distinguish the
values, based only on their observable properties?

You used “meaningless”; that seems at least more hopeful as a criterion
we can use by examining text values. So, what counts as meaningless?

> I wish to process the strings and automatically determine whether each
> string is random or not. I need to split the strings into three groups:
>
> - those that I'm confident are random
> - those that I'm unsure about
> - those that I'm confident are non-random
>
> Ideally, I'll get some sort of numeric score so I can tweak where the
> boundaries fall.

Perhaps you could measure Shannon entropy (“expected information
value”) as a proxy? Or maybe I don't quite understand the criteria.

-- 
 \     “Actually I made up the term “object-oriented”, and I can tell |
  `\       you I did not have C++ in mind.” - Alan Kay, creator of |
_o__)                                 Smalltalk, at OOPSLA 1997 |
Ben Finney
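
For what it's worth, the Shannon entropy suggestion above could be tried along these lines. This is only a rough sketch, not anything posted in the thread: the function names `shannon_entropy` and `classify` and the `low`/`high` cut-offs are made up for illustration, and the cut-offs would need tuning against real data. It measures the entropy of the per-character frequency distribution within a single string, which is just one possible reading of "expected information value".

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Empirical Shannon entropy of the characters in text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # H = -sum(p * log2(p)) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify(text: str, low: float = 2.5, high: float = 3.5) -> str:
    """Split strings into three groups by entropy score.

    The low/high boundaries are hypothetical placeholders, meant to be
    tweaked, as the quoted request asks for.
    """
    score = shannon_entropy(text)
    if score < low:
        return "low-entropy"
    if score > high:
        return "high-entropy"
    return "unsure"


if __name__ == "__main__":
    for sample in ["aaaa", "abcd", "listen", "silent"]:
        print(sample, round(shannon_entropy(sample), 3), classify(sample))
```

One caveat with this particular measure: it only looks at character counts, so any rearrangement of the same characters ("listen" versus "silent") gets an identical score. It may serve as a crude numeric proxy, but it says nothing by itself about whether a string is meaningful.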