| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Newsgroups | comp.programming |
| References | <12217875.401.1335542191031.JavaMail.geo-discussion-forums@ynjj38> <W5udnasYne_KmQDSnZ2dnUVZ7q2dnZ2d@bt.com> <1rnzov5qdfjg9$.1xzgbukwvzdqc$.dlg@40tude.net> <oOOdncDIyY_CCwLSnZ2dnUVZ7vCdnZ2d@bt.com> <ojotwsvdqgff$.am755c0uxjj2$.dlg@40tude.net> |
| Subject | Re: quantifying bloat |
| Date | 2012-05-05 10:03 +0100 |
| Message-ID | <RfudnQbbhYaxcDnSnZ2dnUVZ8vCdnZ2d@bt.com> |
Dmitry A. Kazakov wrote:
> > [me]
> > I think you've misunderstood the word "information" in the phrase
> > "information theory". In that context, it doesn't have the normal
> > English meaning (something similar to "knowledge" -- which must
> > certainly have a "knower"), but has a narrow technical (jargon) meaning
> > which is very roughly -- the information in a message is the [size of
> > the] set of other messages which might have been transmitted instead.
>
> There are technical terms to describe what you mean, e.g. code density,
> bandwidth etc.
The "information" in "information theory" /is/ a technical term. I know it has
other meanings in other contexts, but mixing those meanings up (as you are
doing) will not help communication here. The idea under consideration is that
the tools of information theory might help identify code bloat. Dragging in
the (here) irrelevant meanings of the word "information" will only obscure that
question.
> > So, consider a program made of many function definitions (or lines, or
> > classes or whatever). If knowing the text of all the other function
> > definitions gives you a better guess of the text of some arbitrarily
> > chosen remaining one than you would have if you did the same exercise
> > with a different program, then the first is definitely more
> > redundant/compressible than the second. The hypothesis here is that
> > similar reasoning might justify the claim that the first was more
> > "bloated" than the second.
>
> My point was that this particular issue, provided OP indeed meant that,
> has very little to do with information theory (coding theory). It does
> with psychology and linguistics, with how human beings sense, comprehend,
> feel about programs.
But if we (or someone) /can/ use the tools of information theory (objectively,
perhaps even automatically) to identify factors in code which correlate well
with (some aspects of) what we informally describe as code bloat, then that
would constitute a disproof of your claim. /Starting/ with the assertion that
it's impossible is begging the question.
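A minimal sketch of how such a measurement might be automated, assuming a general-purpose compressor (zlib here) as a crude stand-in for a real predictive model of code. The function names and the sample program text are hypothetical, and the byte counts are only a rough approximation of "how much is left to say about this chunk once the rest of the program is known".

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of `text` after zlib compression at the highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def conditional_cost(rest: str, chunk: str) -> int:
    """Approximate extra bytes needed to encode `chunk` once `rest` is known.

    A low cost means the rest of the program already "predicts" the chunk,
    i.e. the chunk is largely redundant relative to this compressor.
    """
    return compressed_size(rest + chunk) - compressed_size(rest)


if __name__ == "__main__":
    # Hypothetical program text: thirty near-identical function definitions.
    rest = "".join(
        f"def f{i}(x):\n    return x * {i} + {i * i}\n" for i in range(30)
    )
    pasted = "def f7(x):\n    return x * 7 + 49\n"  # already present in `rest`
    novel = "class Parser:\n    tokens = ['alpha', 'beta', 'gamma']\n"
    print("pasted:", conditional_cost(rest, pasted))
    print("novel: ", conditional_cost(rest, novel))
```

The pasted chunk should cost far fewer bytes than the novel one. Note that DEFLATE only looks back 32 KB, so on a real code base this particular compressor would miss repetition between distant files; a serious attempt would need a model with a longer memory.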
> > I think that one could use that sort of technique to identify programs
> > where a lot of copy-paste repetition exists, and that is certainly
> > something one /could/ label as "bloat" -- for all it's not the only
> > meaning of "bloat", nor does that label really capture the essence of
> > what's wrong with the code.
>
> Yes, the level of reuse can be considered as one characteristic. However,
> in any language reuse comes at the cost of means used to factor out the
> repeated piece of code. Be it a class, a subprogram, a template, it always
> "bloats" a bit.
If I refactor (or redesign) a 1 MLOC program into an equivalent 200 KLOC program,
then I don't think that /anyone/ could call that bloating the program. Adding
layers of abstraction? For sure. Making it harder for a total newbie to
understand? Quite possibly. But "bloated"? Never!
(Note: there's nothing in the above paragraph to claim that information theory
would help me refactor, or identify the need/possibility to refactor, the
original code. But it /might/ and that's the question.)
Another place where one might try to apply information theory would be in the
external interface to the program. Say it's a GUI application. You could
identify a "language" which described the legal sequences of
commands/gestures/inputs and responses/outputs. Perhaps a context-free
grammar, perhaps something more complex, perhaps something simpler. The more
tightly the language captures the actual sequences the better, but even a loose
characterisation has value. You then build some sort of a probabilistic model
of the relative frequencies of sentences and clauses in that grammar as
actually used by the real users of the app. That gives you /a/ model to plug
into your information-theoretic calculations. (Note: /a/ model, not /the/
model -- it may be good or bad).
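As a rough illustration of the modelling step, here is about the simplest possible model: it ignores the grammar entirely and just estimates the relative frequency of each command from logged sessions (a unigram model). That is a deliberate simplification, the command names are invented, and every function name is hypothetical.

```python
import math
from collections import Counter


def build_model(sessions):
    """Estimate command probabilities from sessions (lists of command names)."""
    counts = Counter(cmd for session in sessions for cmd in session)
    total = sum(counts.values())
    return {cmd: n / total for cmd, n in counts.items()}


def information_bits(model, session):
    """Total information (surprisal, in bits) of a session relative to the model.

    Raises KeyError for a command the model has never seen; a real model
    would need some form of smoothing for that case.
    """
    return sum(-math.log2(model[cmd]) for cmd in session)


def entropy_bits(model):
    """Average information per command, in bits, under the model."""
    return -sum(p * math.log2(p) for p in model.values())


if __name__ == "__main__":
    # Hypothetical usage logs for a small editor.
    logs = [
        ["open", "edit", "edit", "save"],
        ["open", "edit", "save", "print"],
        ["open", "save"],
    ]
    model = build_model(logs)
    print("bits per command:", entropy_bits(model))
    print("bits in one session:", information_bits(model, ["open", "print"]))
```

A model that respected the grammar (say, conditioning each command on the previous one) would usually assign fewer bits to the same sessions, which is one sense in which a tighter characterisation of the language is better.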
Armed with that model you can talk about the "information" content (relative
/to/ the model) of commands issued to the app. Two ways you might use that
are:
1) compare that model with another one for the same app ten years ago (or a
current competitor). If the user is required to supply (or consume)
significantly more "information" (as calculated objectively using the model)
when using the later app compared to the earlier, then we can say that (in some
technical, objective, sense) the interface has become more complicated to use.
If technical, objective, facts like that correlate well with what people think
of as "bloat" in the UI (think MS Office here ;-) then we have a tool for the
automatic identification of "bloat" (admittedly heuristic, but we've never been
aiming for mathematical proof).
2) if the information density of the language the user uses to control the app
stays more or less constant, but the code base size increases significantly, or
the memory/cpu/disk-space required by the app increase significantly, then
again we can say that the app has become bloated in its implementation.
Support from more examples (and not too many counter-examples) of the same
thing would justify us in using the (objective) heuristic to detect
implementation bloat.
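The two comparisons above could be sketched along the following lines, again assuming a bare unigram model of command usage and using lines of code as the only measure of implementation size. The function names, the tolerance and the growth threshold are all hypothetical; the thresholds in particular are arbitrary placeholders that would have to be calibrated against cases people actually agree are bloated.

```python
import math
from collections import Counter


def bits_per_command(sessions):
    """Shannon entropy, in bits per command, of a unigram model of usage."""
    counts = Counter(cmd for session in sessions for cmd in session)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compare_versions(old_sessions, new_sessions, old_loc, new_loc,
                     bits_tolerance=0.25, growth_threshold=2.0):
    """Apply both heuristics to an old and a new version of an app."""
    old_bits = bits_per_command(old_sessions)
    new_bits = bits_per_command(new_sessions)
    code_growth = new_loc / old_loc
    return {
        "old_bits": old_bits,
        "new_bits": new_bits,
        "code_growth": code_growth,
        # Case 1: the user must supply noticeably more information.
        "interface_more_complex": new_bits - old_bits > bits_tolerance,
        # Case 2: much the same information density, but much more code.
        "implementation_bloat_suspected": (
            abs(new_bits - old_bits) <= bits_tolerance
            and code_growth >= growth_threshold
        ),
    }


if __name__ == "__main__":
    # Hypothetical logs and code sizes for two releases of the same app.
    old_logs = [["open", "edit", "save"], ["open", "save"]]
    new_logs = [["open", "edit", "save"], ["open", "save"]]
    print(compare_versions(old_logs, new_logs, old_loc=50_000, new_loc=400_000))
```

Bits per command is only one possible yardstick; bits per completed task would arguably be closer to what the user experiences, but it needs logs that mark task boundaries.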
Personally, I'm more interested in the possibility of analysing the code base
directly (because I'm a programmer, not a user, I suppose). I've got too many
projects on hand already, but it's tempting to go find some chunk of software
where the source history is available, and which is commonly supposed to have
bloated (the Linux kernel, perhaps, or the JRE, or even just *IX "cat") and do
some modelling and analysis. (If only with the gzip hack I mentioned earlier.) I'd
need some examples of code that has grown or changed /without/ bloating too, of
course -- not quite so easy to think of candidates ;-)
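One crude way to start on such a history, assuming compression ratio as a proxy for redundancy (this may or may not match the gzip hack referred to above): compute, for each snapshot of the source, the fraction of bytes that zlib manages to squeeze out. The function names and the sample snapshots are hypothetical, and how the source text of each release is obtained is left out entirely.

```python
import zlib


def redundancy(source: str) -> float:
    """Fraction of the raw bytes removed by zlib; higher means more repetitive."""
    raw = source.encode("utf-8")
    if not raw:
        return 0.0
    return 1.0 - len(zlib.compress(raw, 9)) / len(raw)


def history_report(snapshots):
    """Map (label, source_text) pairs to (label, raw_size_bytes, redundancy)."""
    return [
        (label, len(text.encode("utf-8")), redundancy(text))
        for label, text in snapshots
    ]


if __name__ == "__main__":
    # Hypothetical snapshots: the second release is mostly copy-paste.
    v1 = "def area(w, h):\n    return w * h\n"
    v2 = v1 + "".join(
        f"def area{i}(w, h):\n    return w * h\n" for i in range(40)
    )
    for label, size, red in history_report([("v1", v1), ("v2", v2)]):
        print(f"{label}: {size} bytes, redundancy {red:.2f}")
```

A rising redundancy figure alongside a rising raw size would be the pattern to look for. Very small inputs give unreliable figures because of the compressor's fixed overhead, and source code is fairly compressible to begin with, so only the trend across snapshots (or the difference between comparable projects) means much.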
-- chris