From: "Chris Uppal"
Newsgroups: comp.programming
Subject: Re: quantifying bloat
Date: Tue, 1 May 2012 08:53:12 +0100
References: <12217875.401.1335542191031.JavaMail.geo-discussion-forums@ynjj38> <1rnzov5qdfjg9$.1xzgbukwvzdqc$.dlg@40tude.net>

Dmitry A. Kazakov wrote:

> The notion of information complexity is just rubbish. There is no
> information without an observer. So there is no complexity in raw data.

I think you've misunderstood the word "information" in the phrase
"information theory". In that context, it doesn't have the normal English
meaning (something similar to "knowledge" -- which must certainly have a
"knower"), but has a narrow technical (jargon) meaning which is very
roughly -- the information in a message is the [size of the] set of other
messages which might have been transmitted instead.
That's very rough, of course, but it captures the important point that
"information theory" isn't about information, as that word is normally
understood, at all.

That definition (the real version, or my paraphrase) only applies when
there is a known set of potential messages to consider. So it doesn't
directly apply to just one program (what set is that program drawn from?),
but it is very common to wave one's hands a bit, and treat individual
passages from the text as if drawn from a set which is exemplified by the
whole (available) program. In which case, it becomes possible to talk of
the information-density of "the program" (I don't like this misuse of words
myself, I think it's confusing, although there is a perfectly well-defined
concept there).

So, consider a program made of many function definitions (or lines, or
classes or whatever). If knowing the text of all the other function
definitions gives you a better guess of the text of some arbitrarily chosen
remaining one than you would have if you did the same exercise with a
different program, then the first is definitely more redundant/compressible
than the second. The hypothesis here is that similar reasoning might
justify the claim that the first was more "bloated" than the second.

I think that one could use that sort of technique to identify programs
where a lot of copy-paste repetition exists, and that is certainly
something one /could/ label as "bloat" -- for all it's not the only meaning
of "bloat", nor does that label really capture the essence of what's wrong
with the code.

    -- chris
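
P.S. For what it's worth, the "guess one function from all the others" exercise can be roughed out with an off-the-shelf compressor. What follows is only a sketch under assumptions of my own: zlib stands in for the ideal guesser, and the names compressed_size, conditional_cost and residual_ratio, like the two toy "programs", are invented for illustration. It measures how many extra compressed bytes each unit costs once the rest of the program is known, relative to what that unit costs on its own.

```python
import zlib
from typing import Sequence


def compressed_size(text: str) -> int:
    """Bytes zlib needs for the text (a crude proxy for its information content)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def conditional_cost(target: str, others: Sequence[str]) -> int:
    """Extra compressed bytes the target costs once the other units are known."""
    context = "\n".join(others)
    return compressed_size(context + "\n" + target) - compressed_size(context)


def residual_ratio(units: Sequence[str]) -> float:
    """Leave-one-out cost of all units, relative to their cost in isolation.

    Lower values mean the rest of the program predicts each unit well,
    i.e. the program is more redundant/compressible.
    """
    conditional = 0
    standalone = 0
    for i, unit in enumerate(units):
        others = list(units[:i]) + list(units[i + 1:])
        conditional += conditional_cost(unit, others)
        standalone += compressed_size(unit)
    return conditional / standalone


# Hypothetical "program" built by copy-paste: one template, six field names.
TEMPLATE = (
    "def get_{0}(record):\n"
    "    value = record.get('{0}')\n"
    "    if value is None:\n"
    "        raise KeyError('{0}')\n"
    "    return value.strip().lower()\n"
)
copy_pasted = [
    TEMPLATE.format(name)
    for name in ("name", "email", "city", "phone", "title", "notes")
]

# Hypothetical "program" whose functions have little in common.
varied = [
    "def mean(xs):\n    return sum(xs) / len(xs)\n",
    "def read_lines(path):\n    with open(path) as handle:\n"
    "        return handle.read().splitlines()\n",
    "def is_prime(n):\n    return n > 1 and all(n % d for d in "
    "range(2, int(n ** 0.5) + 1))\n",
    "def reverse_words(sentence):\n"
    "    return ' '.join(reversed(sentence.split()))\n",
    "def clamp(value, low, high):\n    return max(low, min(high, value))\n",
    "def count_vowels(text):\n"
    "    return sum(ch in 'aeiou' for ch in text.lower())\n",
]

if __name__ == "__main__":
    print("copy-pasted:", round(residual_ratio(copy_pasted), 2))
    print("varied:     ", round(residual_ratio(varied), 2))
```

The copy-pasted program should come out with the lower ratio. Two caveats: zlib only looks back about 32 KiB, so on a real code base most of "all the other functions" would be out of its reach, and a low ratio shows repetition, which (as said above) is only one of the things "bloat" might mean.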