Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!nerds-end
From: BGB <cr88192@hotmail.com>
Newsgroups: comp.compilers
Subject: Re: Have we reached the asymptotic plateau of innovation in programming la
Date: Thu, 15 Mar 2012 00:06:46 -0700
Organization: albasani.net
Lines: 354
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <12-03-039@comp.compilers>
References: <12-03-012@comp.compilers> <12-03-014@comp.compilers> <12-03-022@comp.compilers> <12-03-027@comp.compilers> <12-03-030@comp.compilers> <12-03-035@comp.compilers>
NNTP-Posting-Host: news.iecc.com
X-Trace: leila.iecc.com 1332055646 53675 64.57.183.58 (18 Mar 2012 07:27:26 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Sun, 18 Mar 2012 07:27:26 +0000 (UTC)
Keywords: design, history
Posted-Date: 18 Mar 2012 03:27:26 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:506

On 3/13/2012 10:19 PM, glen herrmannsfeldt wrote:
> BGB<cr88192@hotmail.com>  wrote:
>
> (snip)
>> in such a scenario, language convergence will have been so widespread
>> that people may have, by this point, ceased to clarify which language
>> they were using, since most would have become largely copy-paste
>> compatible anyways.
>
> Well, first there is the division between interpreted languages, such
> as Mathematica, Matlab, S/R, ..., and for that matter, Excel, that are
> good for quick one time problems, and compiled languages that are
> faster when you want to do something many times.
>
> It seems to me that division will stay, though the languages could
> still tend to converge.
>

there are languages that straddle the border, such as JavaScript and
ActionScript, where the language use ranges from purely interpreted
(such as a naive JavaScript implementation), to statically-compiled
(some uses of Flash and Flex apparently have people compiling
ActionScript into native code, in addition to past versions of Flash
using an interpreter and newer versions using a JIT).

so, it may be more a matter of tiny DSLs vs larger general-purpose
languages, than of compiled-vs-interpreted languages.


many DSLs may be interpreted, yes, but some may also be compiled.
likewise, some industrial-strength / "compiled" languages, such as Java,
C, and C++, may sometimes (if rarely) be run within an interpreted
environment.


> (snip)
>
>> what about a language which is more complex than JavaScript, maybe
>> roughly on-par with C or Java, and generally simpler than C++ and C# ?
>> what about a VM where the bytecode has 100s of unique operations?
>> ...
>
> The bytecode question should be independent of the language, and
> should be optimized, as RISC processors are, for compiler generated
> code instead of human written assembly code.
>

it typically is focused on compiler output.

the bytecode is actually somewhat specialized for the compiler output,
and has a number of opcodes mapping to high-level language constructs.
however, as-is, there is a reasonably tight language/bytecode coupling,
but this is planned to be relaxed some (eventually...).


a particular example in my case (mentioned previously) was "lpostinc"
(along with "lpostinc_fn"), which map, fairly directly, to the
expression "i++" (both for lexical scope, where the latter 'fn' opcode
being for the case when 'i' is known to be an integer or fixnum type).

so, consider a few common cases ('x'=lexical+generic,
'i'=lexical+integer, 'v'=global/package scope, and ';' to mark statement):
inc_s		"v++;" or "++v;"
dec_s		"v--;" or "--v;"
linc		"x++;" or "++x;"
ldec		"x--;" or "--x;"
linc_fn		"i++;" or "++i;"
ldec_fn		"i--;" or "--i;"
postinc_s	"v++"
preinc_s	"++v"
lpostinc	"x++"
lpreinc		"++x"
lpostinc_fn	"i++"
lpreinc_fn	"++i"
postdec_s	"v--"
predec_s	"--v"
...

however, this level of specificity helps somewhat when one is dealing
with a plain interpreter, since many of these operations are fairly
common, and dedicating more opcodes to these special cases helps reduce
passes through the interpreter loop (as well as the number of push/pop
operations on the stack, ...).

for example, one can note the absence of float-specialized versions, but
there is a reason: because "++" and "--" are very rarely used on floats
(however, there are float specialized versions of many other ops).

likewise, there are some similar combinations for things like
load+compare+jump as well.

for example, "if(i>j)" can be encoded as a single bytecode operation.

also some opcodes dedicated to implementing "switch()" as well (after
noting that each "case" ended up producing a 3 opcode chain which could
be collapsed to 1 opcode, ...).


but, yes, if one follows these sorts of patterns, it is not hard to
figure out why there are 100s of opcodes (errm... closer to about 500 at
present).

most recent additions have been more narrowly focused though, mostly
related to adding new functionality, rather than related to
micro-optimizing things.


>>but, OTOH:
>> in a language like JavaScript you can type "a+b*c", and get the expected
>> precedence.
>
>> this is different than typing, say (in Scheme):
>> "(+ a (* b c))"
>> or (in Self):
>> "b * c + a" (noting that "a + b * c" will give a different answer).
>
> Well, first there are a number of precedence cases in C that many
> would have done differently. But also there is the question of which
> should be an operator and which a function. Note that C has the %
> operator and pow() function, where Fortran has mod() and **.
>

but, having precedence is different than not having precedence (say,
requiring parens for everything, or everything being left-associative...).

people at least sort of expect PEMDAS to be followed, even if yes, the
standard C family operator precedences are far from ideal (it is mostly
a matter of leaving them as-is, and have stupid precedence, or changing
them and have an increased risk of people writing code which breaks
because it had assumed the standard C-style operator precedences).


>> as I see it, the sorts of minimalism where one can't "afford" to have
>> things like operator precedence, or including conventional control-flow
>> mechanisms, is needless minimalism.
>
> Then there are funny little differences, where language designers try
> to force a programming style. Fortran added the ability to use
> floating point variables as DO loop variables in Fortran 77, then took
> it away again in Fortran 90.
>

both C-family languages and my language allow using whatever for looping
will go there (could be an integer, or floats, or walking over
characters in a string, ...).


> Another difference is the abilty to use array or structure expressions
> instead of explicit loops.
>

yes.

C family languages have traditionally not had this, but in newer ones
there is "foreach()" or "for(...in...)".


>> most real programmers have better things to do than sit around working
>> around awkward ways of expressing arithmetic, and figuring out how to
>> accomplish all of their control flow via recursion and if/else (and so
>> help you if you want something like sane file IO, or sockets, or
>> threads, ...).
>
> But each time you make something easier to use, something else usually
> gets at least slightly harder. You don't really want 200 different
> operators, with way too many precedence levels. It is just too hard
> for humans, not that it bothers compilers much at all.
>

fair enough.

"the basics" are needed though.


>> (and, it is not necessarily a good sign when things like loops, file IO,
>> arrays, ... are supported by an implementation as... language
>> extensions...).
>
> Well, yes, but how many different loop constructs do you need?
>

probably all the standard ones:
"for()", "while()", "do ... while();", ...

ideally also with "break" and "continue".

when one has to roll their own loops via recursion and conditionals and
similar, it often renders break and continue problematic (the loop
instead becomes an awkward mess of deeply nested if/else chains).


granted, I have before seen people get annoyed at things like doing a
"return" in the middle of a loop, ... but, if the rest of the function
is known to be unnecessary, what is the point of having to wait for it
to reach the end of the function?...

granted, these are probably the same types of people who complain about
things like "while(i<j)i+=i>>1;" and "(4.0/3)*x/(1<<i)" and similar as well.


>> but, many people hate on C and C++ and so on, claiming that languages
>> "should" have such minimalist syntax and semantics. however, such
>> minimalist languages have generally failed to gain widespread acceptance.
>
>> likewise, although a person can make an interpreter with a small number
>> of total opcodes, typically this means the program will need a larger
>> number of them to complete a task, and thus run slower.
>
> OK, but high-level language design should be toward making things
> easier for humans. Unless there is a big trend back toward assembly
> programming, instruction sets, including byte-code interpreters, should
> be designed for faster machine processing, and not for people.
>
> (It is useful in debugging if it isn't too hard for humans to read,
> but most of the time that shouldn't be needed.)
>

yes.

hence, why there are lots of opcodes:
lots of opcodes means shorter opcode traces, and generally faster
execution by an interpreter.

granted, a JIT doesn't really care so much, but it is easier for a JIT
to decompose complex instructions than it is for an interpreter to
recompose macro-operations from long chains of minimal instructions.


>> for example, a person could make an interpreter with roughly 3 opcodes
>> which fairly directly implements lambda calculus... but it will perform
>> like crap.
>
>> 10 or 15 is a bit better, then one probably at least has "the basics".
>
> Well, you can have the data tagged such that only one add instruction
> is needed to add any size or type (byte, short, int, long, float,
> double, etc.) or separate opcodes for each. It doesn't make a huge
> difference, but likely you could see the performance difference.
>

with 10 or 15 opcode ISAs, typically one has opcodes like "binary" and
"unary" which deal with all binary and unary ops, or (worse yet) they
are implemented as function calls.

my VM uses a mixed strategy, with a larger operator set which is
accessible from the "binary" or "unary" opcodes, a number which are
accessible directly by bytecode ops, and some number which are
type-specialized opcodes.


"add_fn", now it knows that it is both doing an add operation and that
it is dealing with integers, allowing it to skip some amount of internal
logic.

"add", OTOH, tells what operation is being performed, but still requires
additional type-checking.

the main operators with dedicated opcodes are:
binary: add/sub/mul/div, and/or/xor/shl/shr/shrr (int)
unary: pos/neg/not/lnot
conditional: eqv/neqv/eq/neq/lt/gt/le/ge/unord
(eqv/neqv is ==/!=, eq/neq is ===/!==)

in my current (incomplete) JIT, the JIT mostly re-merges them and then
performs more advanced type analysis, but again this level of
specialization can help more with an interpreter, even if mostly
redundant for a JIT.


>> with several hundred opcodes, arguably a lot of them are "redundant",
>> being easily expressible in terms of "simpler" opcodes, but at the same
>> time, a single opcode can express what would otherwise require a chain
>> of simpler opcodes.
>
> The RISC vs. CISC argument in processor architecture.
>

yeah, pretty much.
my design is probably fairly solidly CISC.


I am not sure how directly they compare, but the design as it was mostly
emerged some years back when a past version of my interpreter was stuck
at "the switch limit", which basically means it was bounded primarily by
how rapidly it could spin through a switch block.

finding ways to shorten the chain was the main way of making it faster
(at the time, and was mostly driven by a mix of looking at profiler
output and instruction traces for compiled bytecode).

more recently, this is less of an issue (partly due to me switching to
using threaded code for interpretation instead of a giant switch block),
but it probably still helps some (albeit, according to the profiler most
of the time at present in benchmarks goes apparently into
getting/setting lexical variables and similar).


granted, not everything in that story turned out well:
after my successes (with that particular interpreter, and a short-lived
JIT which followed), I spent several years in a partially
performance-crazed state, writing an ill-fated C compiler and a bunch of
overly complex micro-optimized systems in the process.

(the C compiler project might have turned out better had I made sure the
everything worked solidly before attempting to micro-optimize it, since
it ended up being a mess which ultimately proved beyond my abilities to
debug...).


luckily, not everything turned out all that badly either...

at least I ended up with a spiffy C FFI, and a reasonably fast (if
albeit horridly complicated) Class/Instance OO API.


>> like, "wow, there is this here 'lpostinc' opcode to load a value from a
>> variable and store an incremented version of the value back into the
>> variable". is this opcode justified vs, say: "load x; dup; push 1;
>> binary add; store x;"? I figure such cases are likely justified (they do
>> tend to show favorably in a benchmark).
>
>
>> OTOH, a person can go too far in the other direction as well.
>
> Like VAX.
>

possibly...



hmm, what would be the output for "while(i<j)i+=i>>1;"?...

probably something like (variable indices chosen arbitrarily):
L1: jmp_ge_lfn 1,2,L0
lload_f 1
shr_fnc 1
lload_f 1
add_fn
lstore_f 1
jmp L1
L0:

idle thought, if I had a few more opcodes...
I could shave down the above sequence to, say:
L1: jmp_ge_lfn 1,2,L0
lxl_shr_fnc 1,1
lxs_add_fn 1
jmp L1
L0:

these would likely add a chunk of approx 22-30 new opcodes.

but, alas, I probably have more than enough complex opcodes as-is...