NNTP-Posting-Date: Tue, 01 May 2012 02:53:03 -0500
From: "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Newsgroups: comp.programming
References: <12217875.401.1335542191031.JavaMail.geo-discussion-forums@ynjj38> <W5udnasYne_KmQDSnZ2dnUVZ7q2dnZ2d@bt.com> <1rnzov5qdfjg9$.1xzgbukwvzdqc$.dlg@40tude.net>
Subject: Re: quantifying bloat
Date: Tue, 1 May 2012 08:53:12 +0100
Message-ID: <oOOdncDIyY_CCwLSnZ2dnUVZ7vCdnZ2d@bt.com>

Dmitry A. Kazakov wrote:

> The notion of information complexity is just rubbish. There is no
> information without an observer. So there is no complexity in raw data.

I think you've misunderstood the word "information" in the phrase "information 
theory".  In that context, it doesn't have the normal English meaning 
(something similar to "knowledge" -- which must certainly have a "knower"), but 
has a narrow technical (jargon) meaning which is very roughly -- the 
information in a message is the [size of the] set of other messages which might 
have been transmitted instead.
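
To put a number on that rough paraphrase: if every message in the set is equally likely (an assumption added here for simplicity), the usual measure is the base-2 logarithm of the size of the set, counted in bits. A minimal sketch, with a hypothetical function name:

```python
import math

def information_bits(num_possible_messages: int) -> float:
    """Bits needed to single out one message from a set of equally
    likely alternatives (the equal-likelihood part is an assumption)."""
    if num_possible_messages < 1:
        raise ValueError("need at least one possible message")
    return math.log2(num_possible_messages)

# One message out of two possibilities carries one bit;
# one out of 256 carries eight.
print(information_bits(2))
print(information_bits(256))
```

Note that a "set" containing only one possible message gives zero bits: if nothing else could have been sent, receiving it tells you nothing.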

That's very rough, of course, but it captures the important point that 
"information theory" isn't about information, as that word is normally 
understood, at all.

That definition (the real version, or my paraphrase) only applies when there is 
a known set of potential messages to consider.  So it doesn't directly apply to 
just one program (what set is that program drawn from?), but it is very common 
to wave one's hands a bit, and treat individual passages from the text as if 
drawn from a set which is exemplified by the whole (available) program.  In 
which case, it becomes possible to talk of the information-density of "the 
program" (I don't like this misuse of words myself, I think it's confusing, 
although there is a perfectly well-defined concept there).

So, consider a program made of many function definitions (or lines, or classes 
or whatever).  If knowing the text of all the other function definitions gives 
you a better guess of the text of some arbitrarily chosen remaining one than 
you would have if you did the same exercise with a different program, then the 
first is definitely more redundant/compressible than the second.  The 
hypothesis here is that similar reasoning might justify the claim that the 
first was more "bloated" than the second.
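
One crude way to try that exercise is to let a general-purpose compressor stand in for the "guesser": measure how many extra compressed bytes a function costs once all the others have already been seen, and compare that with what it costs on its own. The sketch below uses Python's zlib for this; the function names and the two toy "programs" are made up for illustration, and zlib is only a rough proxy for real predictability.

```python
import zlib
from typing import List

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def extra_cost(target: str, others: List[str]) -> int:
    """Extra compressed bytes needed for `target` once `others` are known."""
    context = "\n".join(others)
    return compressed_size(context + "\n" + target) - compressed_size(context)

def redundancy(functions: List[str]) -> float:
    """Average fraction of each function's standalone compressed size
    that is saved by already knowing all the other functions."""
    savings = []
    for i, target in enumerate(functions):
        others = functions[:i] + functions[i + 1:]
        savings.append(1 - extra_cost(target, others) / compressed_size(target))
    return sum(savings) / len(savings)

# Hypothetical program 1: the same function pasted six times with a new name.
copy_paste = [
    f"def area_{i}(w, h):\n    if w < 0 or h < 0:\n"
    f"        raise ValueError('negative size')\n    return w * h\n"
    for i in range(6)
]

# Hypothetical program 2: six unrelated small functions.
varied = [
    "def mean(xs):\n    return sum(xs) / len(xs)\n",
    "def is_palindrome(s):\n    t = s.lower()\n    return t == t[::-1]\n",
    "def clamp(v, lo, hi):\n    return max(lo, min(v, hi))\n",
    "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n"
    "        a, b = b, a + b\n    return a\n",
    "def count_words(text):\n    return len(text.split())\n",
    "def to_celsius(f):\n    return (f - 32) * 5 / 9\n",
]

print(redundancy(copy_paste) > redundancy(varied))
```

The score is only meaningful as a comparison between programs measured the same way; the absolute numbers depend heavily on the compressor and on its fixed per-stream overhead.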

I think that one could use that sort of technique to identify programs where a 
lot of copy-paste repetition exists, and that is certainly something one 
/could/ label as "bloat" -- for all it's not the only meaning of "bloat", nor 
does that label really capture the essence of what's wrong with the code.

    -- chris