| Started by | bob <bob@coolfone.comze.com> |
|---|---|
| First post | 2012-04-27 08:56 -0700 |
| Last post | 2012-04-29 11:01 +0100 |
| Articles | 20 on this page of 26 — 10 participants |
quantifying bloat bob <bob@coolfone.comze.com> - 2012-04-27 08:56 -0700
Re: quantifying bloat hopcode <hopcode@invalid.de> - 2012-04-27 18:16 +0200
Re: quantifying bloat Nomen Nescio <nobody@dizum.com> - 2012-04-29 16:22 +0200
Re: quantifying bloat "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-04-29 10:03 +0100
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-04-29 11:36 +0200
Re: quantifying bloat Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-04-29 15:09 -0700
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-04-30 10:09 +0200
Re: quantifying bloat "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-05-01 08:53 +0100
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-05-01 10:52 +0200
Re: quantifying bloat hopcode <hopcode@invalid.de> - 2012-05-02 04:02 +0200
Re: quantifying bloat "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-05-05 10:03 +0100
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-05-05 12:50 +0200
Re: quantifying bloat hopcode <hopcode@invalid.de> - 2012-05-05 16:23 +0200
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-05-05 17:43 +0200
Re: quantifying bloat gremnebulin <peterdjones@yahoo.com> - 2012-05-03 09:27 -0700
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-05-03 18:50 +0200
Re: quantifying bloat Willem <willem@toad.stack.nl> - 2012-05-04 13:52 +0000
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-05-04 16:05 +0200
Re: quantifying bloat hopcode <hopcode@invalid.de> - 2012-05-04 20:44 +0200
Re: quantifying bloat Willem <willem@toad.stack.nl> - 2012-05-04 20:32 +0000
Re: quantifying bloat "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-05-05 10:16 +0100
Re: quantifying bloat James Dow Allen <jdallen2000@yahoo.com> - 2012-05-02 02:44 -0700
Re: quantifying bloat "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-05-05 10:11 +0100
Re: quantifying bloat "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-05-05 13:22 +0200
Re: quantifying bloat hopcode <hopcode@invalid.de> - 2012-05-05 16:27 +0200
Re: quantifying bloat rossum <rossum48@coldmail.com> - 2012-04-29 11:01 +0100
| From | bob <bob@coolfone.comze.com> |
|---|---|
| Date | 2012-04-27 08:56 -0700 |
| Subject | quantifying bloat |
| Message-ID | <12217875.401.1335542191031.JavaMail.geo-discussion-forums@ynjj38> |
Has anyone ever tried to apply information theory to source code to quantitatively determine if code is bloated or not?
| From | hopcode <hopcode@invalid.de> |
|---|---|
| Date | 2012-04-27 18:16 +0200 |
| Message-ID | <jnegp6$qjn$1@dont-email.me> |
| In reply to | #1504 |
On 27.04.2012 17:56, bob wrote:
> Has anyone ever tried to apply information theory to source code to
> quantitatively determine if code is bloated or not?
>
Me. I am still working on a few formulations. The goal is described here:
http://sites.google.com/site/x64lab/home/uncategorized/programs-by-code-languages-by-semantics
Cheers,
--
.:mrk[hopcode]
.:x64lab:.
group http://groups.google.com/group/x64lab
site http://sites.google.com/site/x64lab
| From | Nomen Nescio <nobody@dizum.com> |
|---|---|
| Date | 2012-04-29 16:22 +0200 |
| Message-ID | <2a56ef3122afad094c80ae86801158ff@dizum.com> |
| In reply to | #1506 |
hopcode <hopcode@invalid.de> wrote:
> On 27.04.2012 17:56, bob wrote:
> > Has anyone ever tried to apply information theory to source code to
> > quantitatively determine if code is bloated or not?
That is pretty easy. I use the following logic:
if (env_windows || env_unix_gnu)
    code_bloat = yes;
| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Date | 2012-04-29 10:03 +0100 |
| Message-ID | <W5udnasYne_KmQDSnZ2dnUVZ7q2dnZ2d@bt.com> |
| In reply to | #1504 |
bob wrote:
> Has anyone ever tried to apply information theory to source code to
> quantitatively determine if code is bloated or not?
tar -cf - $codebase | gzip -v > /dev/null
;-)
More seriously (though the above certainly isn't entirely silly), it depends on
what you mean by "bloat". Wordy/verbose language design? Verbose APIs?
Copy-paste redundancy? Missing abstractions[*]? Excess features? Dead code
left unpruned? ...
Some of those could be attacked, I think, with information theory.
But note that the closer you get to some kind of information-theoretic ideal,
with no "wasted" bandwidth, the nearer you get to the situation where any error
in transmission results in a /different/, but /still valid/ message. Not
something that I'd like in a programming environment.
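A small sketch of that last point, in Python (the language is my choice, nothing here comes from the thread). It assumes a toy space of 3-bit messages and a single even-parity bit as the only redundancy; `add_parity`, `flip` and `undetected_errors` are hypothetical helper names.

```python
from itertools import product

def add_parity(bits):
    """Append one even-parity bit: deliberate redundancy."""
    return bits + (sum(bits) % 2,)

def flip(word, index):
    """Return a copy of `word` with the bit at `index` inverted."""
    return word[:index] + (1 - word[index],) + word[index + 1:]

# Dense code: all 8 three-bit words are legal messages.
dense_code = set(product((0, 1), repeat=3))
# Redundant code: 4-bit words, of which only the 8 with even parity are legal.
parity_code = {add_parity(word) for word in dense_code}

def undetected_errors(code):
    """Count single-bit corruptions that land on another legal message."""
    return sum(
        flip(word, i) in code
        for word in code
        for i in range(len(word))
    )

print(undetected_errors(dense_code))   # 24: every corruption is still a valid message
print(undetected_errors(parity_code))  # 0: every single-bit corruption is detectable
```

With no redundancy, all 24 possible single-bit errors turn one legal message into another; one "wasted" bit is enough to make each of them detectable.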
-- chris
[*] abstraction can be thought of as a compression technique.
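The gzip one-liner above can be approximated in Python with the standard `zlib` module. This is only a rough sketch: the two byte strings are made-up stand-ins for a code base, and `compression_ratio` is a hypothetical helper, not anything proposed in the thread.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more redundant."""
    if not data:
        return 1.0
    return len(zlib.compress(data, 9)) / len(data)

# Stand-in for copy-paste-heavy source: the same line repeated 200 times.
pasted = b"total = total + price * quantity\n" * 200

# Stand-in for incompressible input: pseudo-random bytes of the same length.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(pasted)))

print(f"repetitive text: {compression_ratio(pasted):.3f}")
print(f"random bytes:    {compression_ratio(noise):.3f}")
```

A low ratio only says the text is redundant under DEFLATE's model (repeated substrings within a 32 KB window); whether that redundancy counts as "bloat" is the open question of this thread.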
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-04-29 11:36 +0200 |
| Message-ID | <1rnzov5qdfjg9$.1xzgbukwvzdqc$.dlg@40tude.net> |
| In reply to | #1510 |
On Sun, 29 Apr 2012 10:03:46 +0100, Chris Uppal wrote:
>> Has anyone ever tried to apply information theory to source code to
>> quantitatively determine if code is bloated or not?
>
> tar -cf - $codebase | gzip -v > /dev/null
>
> ;-)
>
> More seriously (though the above certainly isn't entirely silly), it depends on
> what you mean by "bloat". Wordy/verbose language design? Verbose APIs?
> Copy-paste redundancy? Missing abstractions[*]? Excess features? Dead code
> left unpruned? ...
>
> Some of those could be attacked, I think, with information theory.
The notion of information complexity is just rubbish. There is no information without an observer. So there is no complexity in raw data.
> But note that the closer you get to some kind of information-theoretic ideal,
> with no "wasted" bandwidth, the nearer you get to the situation where any error
> in transmission results in a /different/, but /still valid/ message. Not
> something that I'd like in a programming environment.
Yes, that is one. There is fundamentally no way to distinguish noise from tightly encoded messages.
Another is the meaning of the message. For example, Pi cannot be written out in full, but there is no problem passing the message "Pi" to a recipient who knows what Pi is. Is Pi complex? A meaningless question.
The bottom line: complexity of code is not a subject of information theory. It is a subject of psychology if we consider how complex it is for an average programmer to understand the code. It is a subject of compiler construction if we consider how to build a compiler to translate the code. Etc.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-04-29 15:09 -0700 |
| Message-ID | <S_inr.17525$zA4.4715@newsfe19.iad> |
| In reply to | #1511 |
On 4/29/12 2:36 AM, Dmitry A. Kazakov wrote:
> On Sun, 29 Apr 2012 10:03:46 +0100, Chris Uppal wrote:
>
>>> Has anyone ever tried to apply information theory to source code to
>>> quantitatively determine if code is bloated or not?
>>
>> tar -cf - $codebase | gzip -v > /dev/null
>>
>> ;-)
>>
>> More seriously (though the above certainly isn't entirely silly), it depends on
>> what you mean by "bloat". Wordy/verbose language design? Verbose APIs?
>> Copy-paste redundancy? Missing abstractions[*]? Excess features? Dead code
>> left unpruned? ...
>>
>> Some of those could be attacked, I think, with information theory.
>
> The notion of information complexity is just rubbish. There is no
> information without an observer. So there is no complexity in raw data.
The information is there if it *can* be observed, not only when it *is* observed. Well, maybe Erwin Schrödinger would disagree, but the point is that the information stored has a certain amount of entropy in it, and a specific piece of information, to be discernible from any other piece, has a specific minimum amount of space needed to encode it.
Though, I think bloat in software terms is a bit different. See below...
> The bottom line, complexity of code is not a subject of information theory.
> It is a subject of psychology if we consider how complex is it for an
> average programmer to understand the code. It is a subject of compiler
> construction if we consider how to build a compiler to translate the code.
> etc.
I would characterize software bloat as the difference in resource usage between the "most optimal" and the "actual" implementation. Resources being memory, CPU, disk space, etc.
With this definition, most software is slightly bloated, since it relies on abstraction layers (such as the OS, standard libraries, etc.). Bloat isn't entirely bad, as allowing for bloat allows for these abstractions, and these abstractions make it easier to produce software that is complex and reasonably correct.
Good tools can provide these abstractions with less bloat than a raw translation of the abstractions strictly requires (optimizing compilers, for example). Though some bloat is caused by poor use of abstractions, or by poor abstractions themselves.
The difficulty sometimes is determining what the "optimum" resource usage actually is. Some operations have several algorithms that perform slightly differently depending on input, and some operations don't have a "proven minimum" big O.
I do think you can quantify bloat, but that value will have large error bars for most real-world situations.
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-04-30 10:09 +0200 |
| Message-ID | <687fvjke62hx$.1hlswyyvuygns.dlg@40tude.net> |
| In reply to | #1514 |
On Sun, 29 Apr 2012 15:09:53 -0700, Daniel Pitts wrote:
> On 4/29/12 2:36 AM, Dmitry A. Kazakov wrote:
>> The notion of information complexity is just rubbish. There is no
>> information without an observer. So there is no complexity in raw data.
> The information is there, if *can* be observed, not if it *is* observed.
No, a system of N independent states contains nothing without an association of these states with meanings by a concrete observer. 123 *can* mean 123 employees or 123 beer bottles or "S". As such it means nothing.
> Well, maybe Erwin Schrödinger would disagree, but the point is that the
> information stored has a certain amount of entropy in it, and a specific
> piece of information, to be discernible from any other piece, has a
> specific minimum amount of space needed to encode it.
This is wrong too. A specific piece of information (just one) needs no space to encode. Consider a medium which has only one state. You assign your piece to that state and you are done. You can encode the whole Britannica this way. Which returns us to the difference between *can* and *is*.
> Though, I think bloat in software terms is a bit different. See below...
>
>> The bottom line, complexity of code is not a subject of information theory.
>> It is a subject of psychology if we consider how complex is it for an
>> average programmer to understand the code. It is a subject of compiler
>> construction if we consider how to build a compiler to translate the code.
>> etc.
>
> I would characterize software bloat as the difference in resources usage
> between the "most optimal" and the "actual" implementation. Resources
> being memory, cpu, disk space, etc...
This is a case where the observer is the machine hardware. I think most people would disagree with this definition of "bloat", because except for very specific embedded and heavy-duty applications, machine resources play a far lesser role than maintenance costs, safety, security and other non-functional requirements.
> With this definition, most software is slightly bloated, since they rely
> on abstraction layers (such as the O.S., standard libraries, etc...).
No, it is hugely "bloated" in this sense. If you compare the resources of a PC now and those of a workstation 20 years ago against the capabilities of the software used for typical activities (word processing, editing source code, compiling), the latter are almost the same. 99.9% of the resource gain is just wasted.
Of course gaming and other computation-intensive stuff is another beast. But there too, development is not focused on resources as it was before. Mass Effect is far more "bloated" than Pac-Man running on a 32K PDP-11.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Date | 2012-05-01 08:53 +0100 |
| Message-ID | <oOOdncDIyY_CCwLSnZ2dnUVZ7vCdnZ2d@bt.com> |
| In reply to | #1511 |
Dmitry A. Kazakov wrote:
> The notion of information complexity is just rubbish. There is no
> information without an observer. So there is no complexity in raw data.
I think you've misunderstood the word "information" in the phrase "information
theory". In that context, it doesn't have the normal English meaning
(something similar to "knowledge" -- which must certainly have a "knower"), but
has a narrow technical (jargon) meaning which is very roughly -- the
information in a message is the [size of the] set of other messages which might
have been transmitted instead.
That's very rough, of course, but it captures the important point that
"information theory" isn't about information, as that word is normally
understood, at all.
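As a hedged illustration of that technical sense, here is the textbook arithmetic in Python: when a message is one of N equally likely alternatives, its information is log2(N) bits, and more generally a message of probability p carries -log2(p) bits. The function names are hypothetical.

```python
import math

def bits_for_uniform_choice(n_messages: int) -> float:
    """Information in one message drawn uniformly from n_messages alternatives."""
    return math.log2(n_messages)

def surprisal_bits(probability: float) -> float:
    """Information in a message that had the given probability of being sent."""
    return -math.log2(probability)

print(bits_for_uniform_choice(8))  # 3.0 bits: one of eight possible messages
print(surprisal_bits(0.5))         # 1.0 bit: a fair coin flip
print(surprisal_bits(1.0) == 0.0)  # True: a message that was certain carries no information
```

Note that nothing in the calculation refers to what the message means; only the set of alternatives (and their probabilities) enters.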
That definition (the real version, or my paraphrase) only applies when there is
a known set of potential messages to consider. So it doesn't directly apply to
just one program (what set is that program drawn from?), but it is very common
to wave one's hands a bit, and treat individual passages from the text as if
drawn from a set which is exemplified by the whole (available) program. In
which case, it becomes possible to talk of the information-density of "the
program" (I don't like this misuse of words myself, I think it's confusing,
although there is a perfectly well-defined concept there)
So, consider a program made of many function definitions (or lines, or classes
or whatever). If knowing the text of all the other function definitions gives
you a better guess of the text of some arbitrarily chosen remaining one than
you would have if you did the same exercise with a different program, then the
first is definitely more redundant/compressible than the second. The
hypothesis here is that similar reasoning might justify the claim that the
first was more "bloated" than the second.
I think that one could use that sort of technique to identify programs where a
lot of copy-paste repetition exists, and that is certainly something one
/could/ label as "bloat" -- for all it's not the only meaning of "bloat", nor
does that label really capture the essence of what's wrong with the code.
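One way to try this out, sketched in Python with `zlib` standing in for a proper statistical model (an assumption on my part; a compressor is only a crude proxy for "how well can you guess the rest"). The function bodies used as sample data and the helper names are made up for illustration.

```python
import zlib
from typing import Sequence

def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))

def extra_bytes_given_rest(target: str, others: Sequence[str]) -> int:
    """Approximate cost of `target` once the other definitions are known."""
    context = "\n".join(others).encode()
    with_target = context + b"\n" + target.encode()
    return compressed_size(with_target) - compressed_size(context)

# Hypothetical "rest of the program".
others = [
    "def area_square(side):\n    return side * side\n",
    "def area_rect(width, height):\n    return width * height\n",
    "def save_report(path, rows):\n"
    "    with open(path, 'w') as handle:\n"
    "        for row in rows:\n"
    "            handle.write(','.join(row) + '\\n')\n",
]

# A copy-pasted variant of save_report versus an unrelated function.
pasted = (
    "def save_summary(path, rows):\n"
    "    with open(path, 'w') as handle:\n"
    "        for row in rows:\n"
    "            handle.write(','.join(row) + '\\n')\n"
)
novel = (
    "def parse_flags(argv):\n"
    "    return {arg.lstrip('-'): True for arg in argv if arg.startswith('-')}\n"
)

print("pasted:", extra_bytes_given_rest(pasted, others))
print("novel: ", extra_bytes_given_rest(novel, others))
```

The copy-pasted function adds far fewer compressed bytes than the novel one, even though it is longer as plain text, which is the kind of signal such a tool would look for.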
-- chris
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-05-01 10:52 +0200 |
| Message-ID | <ojotwsvdqgff$.am755c0uxjj2$.dlg@40tude.net> |
| In reply to | #1517 |
On Tue, 1 May 2012 08:53:12 +0100, Chris Uppal wrote:
> Dmitry A. Kazakov wrote:
>
>> The notion of information complexity is just rubbish. There is no
>> information without an observer. So there is no complexity in raw data.
>
> I think you've misunderstood the word "information" in the phrase "information
> theory". In that context, it doesn't have the normal English meaning
> (something similar to "knowledge" -- which must certainly have a "knower"), but
> has a narrow technical (jargon) meaning which is very roughly -- the
> information in a message is the [size of the] set of other messages which might
> have been transmitted instead.
There are technical terms to describe what you mean, e.g. code density, bandwidth etc.
> So, consider a program made of many function definitions (or lines, or classes
> or whatever). If knowing the text of all the other function definitions gives
> you a better guess of the text of some arbitrarily chosen remaining one than
> you would have if you did the same exercise with a different program, then the
> first is definitely more redundant/compressible than the second. The
> hypothesis here is that similar reasoning might justify the claim that the
> first was more "bloated" than the second.
My point was that this particular issue, provided the OP indeed meant that, has very little to do with information theory (coding theory). It has to do with psychology and linguistics, with how human beings sense, comprehend, feel about programs.
> I think that one could use that sort of technique to identify programs where a
> lot of copy-paste repetition exists, and that is certainly something one
> /could/ label as "bloat" -- for all it's not the only meaning of "bloat", nor
> does that label really capture the essence of what's wrong with the code.
Yes, the level of reuse can be considered as one characteristic. However, in any language reuse comes at the cost of the means used to factor out the repeated piece of code. Be it a class, a subprogram, a template, it always "bloats" a bit. Additionally it needs a variance, which leads to all sorts of substitutability issues and ways to formalize them, and thus to more code. Furthermore, it also requires the reader to understand the abstraction behind it, i.e. roughly speaking the software pattern applied. If the reader does not recognize the pattern, the code would appear extremely bloated to him. I.e. the result heavily depends on the observer again.
Another issue is that you have to consider a set of equivalent programs, ones having the same semantics, in order to compare them for bloating. This alone is a problem (undecidable).
All in all, I think we have to live with empirical software metrics for a long while...
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| From | hopcode <hopcode@invalid.de> |
|---|---|
| Date | 2012-05-02 04:02 +0200 |
| Message-ID | <jnq4k9$ise$1@dont-email.me> |
| In reply to | #1518 |
Hi all,
I agree 100% with Dmitry. I would add that the *can* versus *is* of
information creates a complexity that cannot be resolved without
relying on probabilistic methods. *Nor*, AFAIK, has the enigma of two or
more overlapping states of an observed piece of information or event
(its "black cat") been clearly solved by quantum computing.
So I suggest we avoid entering Schrödinger's realm, because most of
the time I find that concept rather confusing for people who use it
as a tautological confirmation of their own beliefs. The conceptual
abuse consists in arguing that, because the black cat in the box may
have 2 or more overlapping states, this is enough to justify
abandoning heuristics about it, since 2 or more overlapping states
can /already/ fully satisfy every answer we expect from the analysis
of the whole (das Ganze).
We should also stay firmly on this planet and consider, concretely, the
machine as actor and "observer", and the information itself as a
"vector" of itself, meaningful when observed in a well-defined context.
I would instead sum up some relevant points outlined by Dmitry,
because they are fundamental for practical reasons.
Dmitry starts, in general terms, from information theory:
1) the states (attributes) of information have meaning only when observed.
2) information has no direct relation to space-time
archetypes (= categories; it may need no encoding space/time)
He then relates these two points in detail, back on the
planet, to the "observer", that is to say, to the machine.
And that is the way I would approach it.
On 01.05.2012 10:52, Dmitry A. Kazakov wrote:
>
> Yes, the level of reuse can be considered as one characteristic. However,
> in any language reuse comes at the cost of means used to factor out the
> repeated piece of code. Be it a class, a subprogram, a template, it always
> "bloats" a bit. Additionally it needs a variance with leads to all sorts of
> substitutability issues and ways to formalize them, and thus to more code.
> Furthermore, it also requires the reader to understand the abstraction
> behind, e.g. roughly speaking the software pattern applied. If the reader
> does not recognize the pattern, the code would appear extremely bloated to
> him. E.g. the result heavily depends on the observer again.
>
> Another issue is that you have to consider a set of equivalent programs,
> ones having same semantics in order to compare them for bloating. This
> alone is a problem (undecidable).
If the meaning given to "semantics" is "functionalities"/"aims",
in the sense that the Opera browser shares the same "semantics" as
Firefox, then we need to distinguish functionalities: for example the
"Anonymous Browsing Session" or the capability to install plug-ins,
when both turn out to implement that functionality.
Anyway, they are fully "decidable", provided that we have previously
built a standard scale.
It seems obvious that they do not have the same performance,
nor waste the same resources, nor share the same "skin". But
all of this needs evaluation too!
>
> All in one, I think we have to live with empirical software metrics for a
> long while...
>
And I propose a new metric based on the difference between
output code and language semantics. Consider the observer, the machine,
and tell me what the difference is, from the observer's point of view,
between the two lines
if (alpha == beta)
and
cmp eax,ebx
because this will give some hints about the reasons for my position:
"Programs by code, languages by semantics"
Cheers,
--
.:mrk[hopcode]
.:x64lab:.
group http://groups.google.com/group/x64lab
site http://sites.google.com/site/x64lab
| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Date | 2012-05-05 10:03 +0100 |
| Message-ID | <RfudnQbbhYaxcDnSnZ2dnUVZ8vCdnZ2d@bt.com> |
| In reply to | #1518 |
Dmitry A. Kazakov wrote:
> > [me]
> > I think you've misunderstood the word "information" in the phrase
> > "information theory". In that context, it doesn't have the normal
> > English meaning (something similar to "knowledge" -- which must
> > certainly have a "knower"), but has a narrow technical (jargon) meaning
> > which is very roughly -- the information in a message is the [size of
> > the] set of other messages which might have been transmitted instead.
>
> There are technical terms to describe what you mean, e.g. code density,
> bandwidth etc.
The "information" in "information theory" /is/ a technical term. I know it has
other meanings in other contexts, but mixing those meanings up (as you are
doing) will not help communication here. The idea under consideration is that
the tools of information theory might help identify code bloat. Confounding
the irrelevant (here) meanings of the word "information" will only obscure that
question.
> > So, consider a program made of many function definitions (or lines, or
> > classes or whatever). If knowing the text of all the other function
> > definitions gives you a better guess of the text of some arbitrarily
> > chosen remaining one than you would have if you did the same exercise
> > with a different program, then the first is definitely more
> > redundant/compressible than the second. The hypothesis here is that
> > similar reasoning might justify the claim that the first was more
> > "bloated" than the second.
>
> My point was that this particular issue, provided OP indeed meant that,
> has very little to do with information theory (coding theory). It does
> with psychology and linguistics, with how human beings sense, comprehend,
> feel about programs.
But if we (or someone) /can/ use the tools of information theory (objectively,
perhaps even automatically) to identify factors in code which correlate well
with (some aspects of) what we informally describe as code bloat, then that
would constitute a disproof of your claim. /Starting/ with the assertion that
it's impossible is begging the question.
> > I think that one could use that sort of technique to identify programs
> > where a lot of copy-paste repetition exists, and that is certainly
> > something one /could/ label as "bloat" -- for all it's not the only
> > meaning of "bloat", nor does that label really capture the essence of
> > what's wrong with the code.
>
> Yes, the level of reuse can be considered as one characteristic. However,
> in any language reuse comes at the cost of means used to factor out the
> repeated piece of code. Be it a class, a subprogram, a template, it always
> "bloats" a bit.
If I refactor (or redesign) a 1 MLoC program into an equivalent 200 KLoC program,
then I don't think that /anyone/ could call that bloating the program. Adding
layers of abstraction? For sure. Making it harder for a total newbie to
understand? Quite possibly. But "bloated" ? Never!
(Note: there's nothing in the above paragraph to claim that information theory
would help me refactor, or identify the need/possibility to refactor, the
original code. But it /might/ and that's the question.)
Another place where one might try to apply information theory, would be in the
external interface to the program. Say it's a GUI application. You could
identify a "language" which described the legal sequences of
commands/gestures/inputs and responses/outputs. Perhaps a context-free
grammar, perhaps something more complex, perhaps something simpler. The more
tightly the language captures the actual sequences the better, but even a loose
characterisation has value. You then build some sort of a probabilistic model
of the relative frequencies of sentences and clauses in that grammar as
actually used by the real users of the app. That gives you /a/ model to plug
into your information-theoretic calculations. (Note: /a/ model, not /the/
model -- it may be good or bad).
Armed with that model you can talk about the "information" content (relative
/to/ the model) of commands issued to the app. Two ways you might use that
are:
1) compare that model with another one for the same app ten years ago (or a
current competitor). If the user is required to supply (or consume)
significantly more "information" (as calculated objectively using the model)
when using the later app compared to the earlier, then we can say that (in some
technical, objective, sense) the interface has become more complicated to use.
If technical, objective, facts like that correlate well with what people think
of as "bloat" in the UI (think MS Office here ;-) then we have a tool for the
automatic identification of "bloat" (admittedly heuristic, but we've never been
aiming for mathematical proof).
2) if the information density of the language the user uses to control the app
stays more or less constant, but the code base size increases significantly, or
the memory/cpu/disk-space required by the app increase significantly, then
again we can say that the app has become bloated in its implementation.
Support from more examples (and not too many counter-examples) of the same
thing, would justify us in using the (objective) heuristic to detect
implementation bloat.
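To make the modelling step concrete, here is a deliberately crude Python sketch. It assumes the simplest possible model (each command drawn independently from its observed frequency, with no grammar at all), and the two command logs are invented for illustration.

```python
import math
from collections import Counter
from typing import Iterable

def entropy_bits(commands: Iterable[str]) -> float:
    """Shannon entropy, in bits per command, of the observed command frequencies."""
    counts = Counter(commands)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )

# Hypothetical usage logs for an old and a new release of the same app.
old_log = ["open", "save"] * 50
new_log = ["open", "save", "share", "sync"] * 25

print(entropy_bits(old_log))  # 1.0 bit per command
print(entropy_bits(new_log))  # 2.0 bits per command
```

Under this toy model the newer interface asks the user for twice as many bits per command. A real study would need the grammar-based model described above, and real usage data, before reading anything into such a number.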
Personally, I'm more interested in the possibility of analysing the code base
directly (because I'm a programmer, not a user, I suppose). I've got too many
projects on hand already, but it's tempting to go find some chunk of software
where the source history is available, and which is commonly supposed to have
bloated (the Linux kernel, perhaps, or the JRE, or even just *IX "cat") and do
some modelling and analysis. (If only the gzip hack I mentioned earlier.) I'd
need some examples of code that has grown or changed /without/ bloating too, of
course -- not quite so easy to think of candidates ;-)
-- chris
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-05-05 12:50 +0200 |
| Message-ID | <1kct04w5vsodz.kh2ib37r71em$.dlg@40tude.net> |
| In reply to | #1537 |
On Sat, 5 May 2012 10:03:29 +0100, Chris Uppal wrote: > Dmitry A. Kazakov wrote: > >>> [me] >>> I think you've misunderstood the word "information" in the phrase >>> "information theory". In that context, it doesn't have the normal >>> English meaning (something similar to "knowledge" -- which must >>> certainly have a "knower"), but has a narrow technical (jargon) meaning >>> which is very roughly -- the information in a message is the [size of >>> the] set of other messages which might have been transmitted instead. >> >> There are technical terms to describe what you mean, e.g. code density, >> bandwidth etc. > > The "information" in "information theory" /is/ a technical term. It is a buzz word used here and there. > The idea under consideration is that > the tools of information theory might help identify code bloat. If "information theory" means here a theory that utilizes mathematical statistics to deal with encoding issues of signal transmission, then the clear answer is no. > Confounding > the irrelevant (here) meanings of the word "information" will only obscure that > question. The irrelevant meanings here are ones of the information theory. Which is why it does not work here. >>> So, consider a program made of many function definitions (or lines, or >>> classes or whatever). If knowing the text of all the other function >>> definitions gives you a better guess of the text of some arbitrarily >>> chosen remaining one than you would have if you did the same exercise >>> with a different program, then the first is definitely more >>> redundant/compressible than the second. The hypothesis here is that >>> similar reasoning might justify the claim that the first was more >>> "bloated" than the second. >> >> My point was that this particular issue, provided OP indeed meant that, >> has very little to do with information theory (coding theory). It does >> with psychology and linguistics, with how human beings sense, comprehend, >> feel about programs. 
> > But if we (or someone) /can/ use the tools of information theory (objectively, > perhaps even automatically) to identify factors in code which correlate well > with (some aspects of) what we informally describe as code bloat, then that > would constitute a disproof of your claim. Any theory has its application domain. In order to be useful or just meaningful certain premises has to be met. [ The burden of proof is on the applicant. ] As for the "tools" these are just of mathematical statistics and nothing else. It is applied mathematics, which per definition of has no fundamental merit of its own. Considering the mathematical statistics, if that to apply to the code analysis, I doubt it could be any useful here, because: 1. Properties of the code are not random. In the overwhelming majority of relevant cases it is all about the deterministic behavior of the program. 2. Human perception of the code as being bloated or not is not stochastic either. [ Perception of the code by a population (the only case where statistics may apply) is, firstly, of no interest, and, secondly, would be a subject of sociology. ] >>> I think that one could use that sort of technique to identify programs >>> where a lot of copy-paste repetition exists, and that is certainly >>> something one /could/ label as "bloat" -- for all it's not the only >>> meaning of "bloat", nor does that label really capture the essence of >>> what's wrong with the code. >> >> Yes, the level of reuse can be considered as one characteristic. However, >> in any language reuse comes at the cost of means used to factor out the >> repeated piece of code. Be it a class, a subprogram, a template, it always >> "bloats" a bit. > > If I refactor (or redesign) a 1MLoC program into an equivalent 200KLOC program, > then I don't think that /anyone/ could call that bloating the program. Adding > layers of abstraction? For sure. Making it harder for a total newbie to > understand? Quite possibly. But "bloated" ? Never! 

Under the presumption of a reduction to 1/5 of the code size? A quite hard
one. But even so, 200K is much code, more than can be seen as a whole. The
perception of code is local. Being "bloated" is felt about the pieces the
programmer is aware of right now. It is quite possible to feel these pieces
bloated even if all the code is actually shorter. It is irrelevant whether
the code is "objectively" shorter, because the negative effects of bloating
are inflicted on the programmers, not on the hard drives (with possible
exceptions of course).

Note also that code is difficult to insulate from the libraries and
frameworks it relies on. "Bloating" can migrate. Is a 10-line program
bloated when it requires a 1 GB OS in RAM?

> Another place where one might try to apply information theory, would be in the
> external interface to the program.

But there is already an applied science to deal with that: ergonomics.

> 1) compare that model with another one for the same app ten years ago (or a
> current competitor).

Windows 2000 vs Windows 8 metro? (:-))

> If the user is required to supply (or consume)
> significantly more "information" (as calculated objectively using the model)

Two problems:

1. Measurement. Counting mouse clicks and the distance the mouse travelled
is the minor concern. The real problem is how to compare mouse clicks with
keystrokes: how do you quantify this? [ Stochastic models are known for not
working here. ]

2. The model itself. Much "simpler" problems like ranking have not been
satisfactorily solved so far.

> 2) if the information density of the language the user uses to control the app
> stays more or less constant, but the code base size increases significantly, or
> the memory/cpu/disk-space required by the app increase significantly, then
> again we can say that the app has become bloated in its implementation.

[I leave aside the meaningless "information density"]

The vendor could say that the application spends more resources addressing
non-functional issues, e.g. being more user friendly, more secure, easier to
maintain etc. You need a measure, and this measure has to be more or less
additive in order to allow quantification. Information theory is no help
here; it is not even on the subject.

> Personally, I'm more interested in the possibility of analysing the code base
> directly (because I'm a programmer, not a user, I suppose). I've got too many
> projects on hand already, but it's tempting to go find some chunk of software
> where the source history is available, and which is commonly supposed to have
> bloated (the Linux kernel, perhaps, or the JRE, or even just *IX "cat") and do
> some modelling and analysis. (If only the gzip hack I mentioned earlier.) I'd
> need some examples of code that has grown or changed /without/ bloating too, of
> course -- not quite so easy to think of candidates ;-)

Yes, much (all?) code migrates as the libraries change. Even kernel code
does, because the hardware changes too. To have a model invariant to this...
You already know how sceptical I am.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
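
[Editorial sketch] The "gzip hack" mentioned in the quoted text is not spelled out in this part of the thread. One plausible reading, offered only as an assumption, is to use a general-purpose compressor as a rough stand-in for "how well does the rest of the program predict this piece". The function names and the sample routine below are hypothetical, and the result says something about textual redundancy only, which is exactly the point under dispute.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(context: str, piece: str) -> int:
    """Rough proxy for the information in `piece` once `context` is known:
    how many more compressed bytes does appending `piece` cost?"""
    return compressed_size(context + piece) - compressed_size(context)


def routine(i: int) -> str:
    """Hypothetical sample: one tiny routine, pasted with only its name changed."""
    return f"def area_{i}(w, h):\n    return w * h\n\n"


context = "".join(routine(i) for i in range(49))
piece = routine(49)

cost_alone = extra_bytes("", piece)            # nothing else is known
cost_in_context = extra_bytes(context, piece)  # the other 49 copies are known
```

For copy-pasted code `cost_in_context` comes out well below `cost_alone`. Whether that difference tracks what programmers perceive as bloat is the open question of the thread, and the sketch does not settle it.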
| From | hopcode <hopcode@invalid.de> |
|---|---|
| Date | 2012-05-05 16:23 +0200 |
| Message-ID | <jo3d62$5an$1@dont-email.me> |
| In reply to | #1540 |
On 05.05.2012 12:50, Dmitry A. Kazakov wrote:

> As for the "tools", these are just those of mathematical statistics and
> nothing else. It is applied mathematics, which by definition has no
> fundamental merit of its own. As for applying mathematical statistics to
> code analysis, I doubt it could be of any use here, because:
>
> 1. Properties of the code are not random. In the overwhelming majority of
> relevant cases it is all about the deterministic behavior of the program.
>
> 2. Human perception of the code as being bloated or not is not stochastic
> either.

You set the thing up as an identity ;-)
In fact we just want to trace what the "bloat" is, and how it arises,
within that deterministic behavior. Also, those math tools turn out to be
useless when used in a biased way.

The deterministic behavior of a program is a function of some well-known
variables, for example: the market of compilers; the habit of using this
toolchain instead of that one. In exactly the same way, for natural
languages the information (as a useful acknowledgment) is a function of
some other well-known variables like gesture recognition etc., things
belonging to semiology. But these are variables "without" time-space; they
are there, meaning something precise, but concretely unutterable, as if
they were practically random in their significance!

Isn't it "random" that most people like C's toolchains?

Concretely: an application that contains the same "things", and behaves
the same way as its counterpart on x86, is damaging for ARM, because ARM,
being low-power etc., doesn't like the same "bloat" that, they say, is
useful when running on the x86 platform. But it is not obvious that ARM
will force users to reduce those "bloated things" as used on x86.

And now human perception comes onto the scene. Whether or not stochastic,
it's an instinctive guideline, not to be neglected.

In fact C's toolchains have been adapted to ARM for the sake of a presumed
*perception* of people used to C's toolchains. This is in order to
preserve the user habits of x86 on ARM, they say. Consequently, when
outputting for ARM, the same compiler converts/hides and
inserts/cuts/adapts a lot of behaviours/information automagically, they
say. They.

I would like to take the above 2 points as working hypotheses, not as
obviously accepted reasons/limits. Information theory does not seem to me
such a perfect branch. It may be extended, imo.

Cheers,

--
.:mrk[hopcode]
.:x64lab:.
group http://groups.google.com/group/x64lab
site http://sites.google.com/site/x64lab
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-05-05 17:43 +0200 |
| Message-ID | <1c1b2jzia4k4m.d3v0gcvuts69$.dlg@40tude.net> |
| In reply to | #1543 |
On Sat, 05 May 2012 16:23:56 +0200, hopcode wrote:

> On 05.05.2012 12:50, Dmitry A. Kazakov wrote:
>> As for the "tools", these are just those of mathematical statistics and
>> nothing else. It is applied mathematics, which by definition has no
>> fundamental merit of its own. As for applying mathematical statistics to
>> code analysis, I doubt it could be of any use here, because:
>>
>> 1. Properties of the code are not random. In the overwhelming majority of
>> relevant cases it is all about the deterministic behavior of the program.
>>
>> 2. Human perception of the code as being bloated or not is not stochastic
>> either.
>
> You set the thing up as an identity ;-)
> In fact we just want to trace what the "bloat" is, and how it arises,
> within that deterministic behavior. Also, those math tools turn out to be
> useless when used in a biased way.

Statistics gets misused all the time. As I said, the burden of proof is on
the applicant's side. There is a set of axioms (the Kolmogorov axioms) for
the probability to satisfy. If anybody wants to apply probability theory
and the methods of mathematical statistics to program behavior or human
perception or whatever, he is obliged to show what the elementary events
are, how they are independent, random etc.

> Isn't it "random" that most people like C's toolchains?

Don't you confuse "random" with "illogical"? My pet hypothesis is that
people's love of C is somehow related to the original sin. Though I must
admit that my knowledge of theology is rather superficial. (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
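
[Editorial note] For reference, the Kolmogorov axioms named above, in their usual textbook form. The notation $(\Omega, \mathcal{F}, P)$ is the standard one and does not come from the post: $\Omega$ is the set of elementary events, $\mathcal{F}$ a $\sigma$-algebra of events over it, and $P$ the probability measure.

```latex
\begin{align*}
  &\text{(1) Non-negativity:}       && P(A) \ge 0 \quad \text{for all } A \in \mathcal{F}, \\
  &\text{(2) Normalisation:}        && P(\Omega) = 1, \\
  &\text{(3) Countable additivity:} && P\Bigl(\,\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
     \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}.
\end{align*}
```

The demand in the post amounts to asking what $\Omega$ and $P$ would be for "program text" or "perception of bloat" before any statistical tool is applied.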
| From | gremnebulin <peterdjones@yahoo.com> |
|---|---|
| Date | 2012-05-03 09:27 -0700 |
| Message-ID | <1dae75e0-2ddc-425f-99e4-3af9f7406926@k13g2000vbm.googlegroups.com> |
| In reply to | #1511 |
On Apr 29, 10:36 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote:

> Another is the meaning of the message. For example, Pi is incomputable, but
> there is no problem to pass a message "Pi" to a recipient knowing what Pi
> is. Is Pi complex? A meaningless question.

Pi is computable. You could pass a finite string of code for computing Pi
to a recipient as well. Check out Chaitin and Kolmogorov.
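
[Editorial sketch] To make "a finite string of code for computing Pi" concrete, here is one such string. It is a widely circulated Python form of an unbounded spigot algorithm (after Gibbons), not anything posted in the thread, and the function name is our own. It uses only integer arithmetic and yields decimal digits one at a time for as long as it is asked.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, ...) indefinitely.

    Unbounded spigot: all state is exact integers, so no digit is ever
    affected by floating-point rounding.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: consume one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k) + 2 + r * l) // (t * l),
                l + 2,
            )


first_ten = list(islice(pi_digits(), 10))
```

The program text is finite and fixed; only the running time and memory grow with the number of digits requested.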
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-05-03 18:50 +0200 |
| Message-ID | <3leyi3uyxhlh$.vl287d3q1va2.dlg@40tude.net> |
| In reply to | #1527 |
On Thu, 3 May 2012 09:27:22 -0700 (PDT), gremnebulin wrote:

> On Apr 29, 10:36 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>
>> Another is the meaning of the message. For example, Pi is incomputable, but
>> there is no problem to pass a message "Pi" to a recipient knowing what Pi
>> is. Is Pi complex? A meaningless question.
>
> Pi is computable.

Not its decimal representation by a FSM.

> You could pass a finite string of code for computing Pi
> to a recipient as well.

I already did, quoting myself: "Pi."

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| From | Willem <willem@toad.stack.nl> |
|---|---|
| Date | 2012-05-04 13:52 +0000 |
| Message-ID | <slrnjq7np3.280u.willem@toad.stack.nl> |
| In reply to | #1528 |
Dmitry A. Kazakov wrote:
) On Thu, 3 May 2012 09:27:22 -0700 (PDT), gremnebulin wrote:
)> Pi is computable.
)
) Not its decimal representation by a FSM.
'computable' has a specific mathematical definition,
by which pi is computable.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
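
[Editorial note] The definition alluded to above is usually stated along the following lines. The symbols $x$, $M$ and $q_n$ are our notation, not the poster's, and Turing machines are taken as the computation model.

```latex
A real number $x$ is \emph{computable} if there exists a Turing machine $M$
that, given any $n \in \mathbb{N}$ as input, halts and outputs a rational
number $q_n$ such that
\[
  \lvert x - q_n \rvert < 2^{-n}.
\]
```

Under this definition $\pi$ is computable, and so is $10/3$: having infinitely many decimal digits is no obstacle, since only a finite approximation is required for each $n$.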
| From | "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> |
|---|---|
| Date | 2012-05-04 16:05 +0200 |
| Message-ID | <1saien0an92og.iuio4t54i82a$.dlg@40tude.net> |
| In reply to | #1531 |
On Fri, 4 May 2012 13:52:35 +0000 (UTC), Willem wrote:

> Dmitry A. Kazakov wrote:
> ) On Thu, 3 May 2012 09:27:22 -0700 (PDT), gremnebulin wrote:
> )> Pi is computable.
> )
> ) Not its decimal representation by a FSM.
>
> 'computable' has a specific mathematical definition,
> by which pi is computable.

That definition requires specification of a formal computation model.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| From | hopcode <hopcode@invalid.de> |
|---|---|
| Date | 2012-05-04 20:44 +0200 |
| Message-ID | <jo182e$93n$1@dont-email.me> |
| In reply to | #1532 |
On 04.05.2012 16:05, Dmitry A. Kazakov wrote:

> On Fri, 4 May 2012 13:52:35 +0000 (UTC), Willem wrote:
>
>> Dmitry A. Kazakov wrote:
>> ) On Thu, 3 May 2012 09:27:22 -0700 (PDT), gremnebulin wrote:
>> )> Pi is computable.
>> )
>> ) Not its decimal representation by a FSM.
>>
>> 'computable' has a specific mathematical definition,
>> by which pi is computable.
>
> That definition requires specification of a formal computation model.

pi is not computable.

IIRC from school, pi is a real number; it has an infinite number of decimal
digits, just like the result of the division 10/3. The digits of pi may be
countable (enumerable?). The count of its digits after the integral part is
a function depending on the limits/resources/algorithm implemented for it on
a Turing machine, i.e., broadly speaking, bound to a computation model.

But please, don't forget the subject, which is interesting imho:

"...apply information theory to source code to quantitatively determine
if code is bloated or not"

Cheers,

--
.:mrk[hopcode]
.:x64lab:.
group http://groups.google.com/group/x64lab
site http://sites.google.com/site/x64lab
| From | Willem <willem@toad.stack.nl> |
|---|---|
| Date | 2012-05-04 20:32 +0000 |
| Message-ID | <slrnjq8f72.upi.willem@toad.stack.nl> |
| In reply to | #1534 |
hopcode wrote:
) On 04.05.2012 16:05, Dmitry A. Kazakov wrote:
)> On Fri, 4 May 2012 13:52:35 +0000 (UTC), Willem wrote:
)>
)>> > Dmitry A. Kazakov wrote:
)>> > ) On Thu, 3 May 2012 09:27:22 -0700 (PDT), gremnebulin wrote:
)>> > )> Pi is computable.
)>> > )
)>> > ) Not its decimal representation by a FSM.
)>> >
)>> > 'computable' has a specific mathematical definition,
)>> > by which pi is computable.
)> That definition requires specification of a formal computation model.
)
) pi is not computable.
Again: According to mathematicians, it *is* computable.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT