From: "Chris Uppal"
Newsgroups: comp.programming
Subject: Re: quantifying bloat
Date: Sat, 5 May 2012 10:03:29 +0100
References: <12217875.401.1335542191031.JavaMail.geo-discussion-forums@ynjj38> <1rnzov5qdfjg9$.1xzgbukwvzdqc$.dlg@40tude.net>

Dmitry A. Kazakov wrote:

> > [me]
> > I think you've misunderstood the word "information" in the phrase
> > "information theory". In that context, it doesn't have the normal
> > English meaning (something similar to "knowledge" -- which must
> > certainly have a "knower"), but has a narrow technical (jargon) meaning
> > which is very roughly -- the information in a message is the [size of
> > the] set of other messages which might have been transmitted instead.
>
> There are technical terms to describe what you mean, e.g. code density,
> bandwidth etc.

The "information" in "information theory" /is/ a technical term.
I know it has other meanings in other contexts, but mixing those meanings up
(as you are doing) will not help communication here. The idea under
consideration is that the tools of information theory might help identify
code bloat. Confounding the irrelevant (here) meanings of the word
"information" will only obscure that question.

> > So, consider a program made of many function definitions (or lines, or
> > classes or whatever). If knowing the text of all the other function
> > definitions gives you a better guess of the text of some arbitrarily
> > chosen remaining one than you would have if you did the same exercise
> > with a different program, then the first is definitely more
> > redundant/compressible than the second. The hypothesis here is that
> > similar reasoning might justify the claim that the first was more
> > "bloated" than the second.
>
> My point was that this particular issue, provided OP indeed meant that,
> has very little to do with information theory (coding theory). It does
> with psychology and linguistics, with how human beings sense, comprehend,
> feel about programs.

But if we (or someone) /can/ use the tools of information theory
(objectively, perhaps even automatically) to identify factors in code which
correlate well with (some aspects of) what we informally describe as code
bloat, then that would constitute a disproof of your claim. /Starting/ with
the assertion that it's impossible is begging the question.

> > I think that one could use that sort of technique to identify programs
> > where a lot of copy-paste repetition exists, and that is certainly
> > something one /could/ label as "bloat" -- for all it's not the only
> > meaning of "bloat", nor does that label really capture the essence of
> > what's wrong with the code.
>
> Yes, the level of reuse can be considered as one characteristic. However,
> in any language reuse comes at the cost of means used to factor out the
> repeated piece of code.
> Be it a class, a subprogram, a template, it always
> "bloats" a bit.

If I refactor (or redesign) a 1MLoC program into an equivalent 200KLoC
program, then I don't think that /anyone/ could call that bloating the
program. Adding layers of abstraction? For sure. Making it harder for a
total newbie to understand? Quite possibly. But "bloated"? Never!

(Note: there's nothing in the above paragraph to claim that information
theory would help me refactor, or identify the need/possibility to refactor,
the original code. But it /might/ and that's the question.)

Another place where one might try to apply information theory, would be in
the external interface to the program. Say it's a GUI application. You
could identify a "language" which described the legal sequences of
commands/gestures/inputs and responses/outputs. Perhaps a context-free
grammar, perhaps something more complex, perhaps something simpler. The
more tightly the language captures the actual sequences the better, but even
a loose characterisation has value. You then build some sort of a
probabilistic model of the relative frequencies of sentences and clauses in
that grammar as actually used by the real users of the app. That gives you
/a/ model to plug into your information-theoretic calculations. (Note: /a/
model, not /the/ model -- it may be good or bad). Armed with that model you
can talk about the "information" content (relative /to/ the model) of
commands issued to the app.

Two ways you might use that are:

1) compare that model with another one for the same app ten years ago (or a
current competitor). If the user is required to supply (or consume)
significantly more "information" (as calculated objectively using the model)
when using the later app compared to the earlier, then we can say that (in
some technical, objective, sense) the interface has become more complicated
to use.
If technical, objective, facts like that correlate well with what people
think of as "bloat" in the UI (think MS Office here ;-) then we have a tool
for the automatic identification of "bloat" (admittedly heuristic, but we've
never been aiming for mathematical proof).

2) if the information density of the language the user uses to control the
app stays more or less constant, but the code base size increases
significantly, or the memory/cpu/disk-space required by the app increases
significantly, then again we can say that the app has become bloated in its
implementation. Support from more examples (and not too many
counter-examples) of the same thing would justify us in using the
(objective) heuristic to detect implementation bloat.

Personally, I'm more interested in the possibility of analysing the code
base directly (because I'm a programmer, not a user, I suppose). I've got
too many projects on hand already, but it's tempting to go find some chunk
of software where the source history is available, and which is commonly
supposed to have bloated (the Linux kernel, perhaps, or the JRE, or even
just *IX "cat") and do some modelling and analysis. (If only the gzip hack
I mentioned earlier.) I'd need some examples of code that has grown or
changed /without/ bloating too, of course -- not quite so easy to think of
candidates ;-)

    -- chris
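
For anyone who wants to try the measurements discussed above, here is a
minimal sketch in Python. It is only an illustration under loud
assumptions: zlib stands in for a real predictive model of source text
(a crude proxy at best), the command model is a plain unigram frequency
count rather than anything grammar-based, and all four function names
are hypothetical, invented for this example.

```python
import math
import zlib
from collections import Counter


def compressed_size(text: str) -> int:
    """Bytes zlib (level 9) needs for the text; a rough stand-in for
    its information content relative to zlib's own model."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def redundancy(text: str) -> float:
    """1 - compressed/raw. Values nearer 1 mean more compressible text."""
    raw = len(text.encode("utf-8"))
    return 1.0 - compressed_size(text) / raw if raw else 0.0


def extra_bytes_given_rest(chunks: list[str], i: int) -> int:
    """Extra compressed bytes that chunk i costs once all the other
    chunks are already known to the compressor."""
    rest = "\n".join(c for j, c in enumerate(chunks) if j != i)
    return compressed_size(rest + "\n" + chunks[i]) - compressed_size(rest)


def mean_surprisal_bits(commands: list[str]) -> float:
    """Average -log2 p(command), with p estimated from the log itself
    (a unigram model: /a/ model, not /the/ model)."""
    counts = Counter(commands)
    total = len(commands)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical data: a copy-pasted function and a short command log.
    pasted = "def f(x):\n    return x * 2 + 1\n" * 50
    print("redundancy of pasted code:", redundancy(pasted))
    print("bits per command:", mean_surprisal_bits(["open", "save", "save"]))
```

Comparing redundancy() across versions of one code base, or
mean_surprisal_bits() across logs from an old and a new release, gives
the kind of objective (if heuristic) number the post has in mind.
Whether either correlates with what people call "bloat" is exactly the
open question.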