Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!nerds-end From: BGB Newsgroups: comp.compilers Subject: Re: Have we reached the asymptotic plateau of innovation in programming la Date: Thu, 15 Mar 2012 00:06:46 -0700 Organization: albasani.net Lines: 354 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <12-03-039@comp.compilers> References: <12-03-012@comp.compilers> <12-03-014@comp.compilers> <12-03-022@comp.compilers> <12-03-027@comp.compilers> <12-03-030@comp.compilers> <12-03-035@comp.compilers> NNTP-Posting-Host: news.iecc.com X-Trace: leila.iecc.com 1332055646 53675 64.57.183.58 (18 Mar 2012 07:27:26 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Sun, 18 Mar 2012 07:27:26 +0000 (UTC) Keywords: design, history Posted-Date: 18 Mar 2012 03:27:26 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:506 On 3/13/2012 10:19 PM, glen herrmannsfeldt wrote: > BGB wrote: > > (snip) >> in such a scenario, language convergence will have been so widespread >> that people may have, by this point, ceased to clarify which language >> they were using, since most would have become largely copy-paste >> compatible anyways. > > Well, first there is the division between interpreted languages, such > as Mathematica, Matlab, S/R, ..., and for that matter, Excel, that are > good for quick one time problems, and compiled languages that are > faster when you want to do something many times. > > It seems to me that division will stay, though the languages could > still tend to converge. > there are languages that straddle the border, such as JavaScript and ActionScript, where the language use ranges from purely interpreted (such as a naive JavaScript implementation), to statically-compiled (some uses of Flash and Flex apparently have people compiling ActionScript into native code, in addition to past versions of Flash using an interpreter and newer versions using a JIT). so, it may be more a matter of tiny DSLs vs larger general-purpose languages, than of compiled-vs-interpreted languages. many DSLs may be interpreted, yes, but some may also be compiled. likewise, some industrial-strength / "compiled" languages, such as Java, C, and C++, may sometimes (if rarely) be run within an interpreted environment. > (snip) > >> what about a language which is more complex than JavaScript, maybe >> roughly on-par with C or Java, and generally simpler than C++ and C# ? >> what about a VM where the bytecode has 100s of unique operations? >> ... > > The bytecode question should be independent of the language, and > should be optimized, as RISC processors are, for compiler generated > code instead of human written assembly code. > it typically is focused on compiler output. the bytecode is actually somewhat specialized for the compiler output, and has a number of opcodes mapping to high-level language constructs. however, as-is, there is a reasonably tight language/bytecode coupling, but this is planned to be relaxed some (eventually...). a particular example in my case (mentioned previously) was "lpostinc" (along with "lpostinc_fn"), which map, fairly directly, to the expression "i++" (both for lexical scope, where the latter 'fn' opcode being for the case when 'i' is known to be an integer or fixnum type). so, consider a few common cases ('x'=lexical+generic, 'i'=lexical+integer, 'v'=global/package scope, and ';' to mark statement): inc_s "v++;" or "++v;" dec_s "v--;" or "--v;" linc "x++;" or "++x;" ldec "x--;" or "--x;" linc_fn "i++;" or "++i;" ldec_fn "i--;" or "--i;" postinc_s "v++" preinc_s "++v" lpostinc "x++" lpreinc "++x" lpostinc_fn "i++" lpreinc_fn "++i" postdec_s "v--" predec_s "--v" ... however, this level of specificity helps somewhat when one is dealing with a plain interpreter, since many of these operations are fairly common, and dedicating more opcodes to these special cases helps reduce passes through the interpreter loop (as well as the number of push/pop operations on the stack, ...). for example, one can note the absence of float-specialized versions, but there is a reason: because "++" and "--" are very rarely used on floats (however, there are float specialized versions of many other ops). likewise, there are some similar combinations for things like load+compare+jump as well. for example, "if(i>j)" can be encoded as a single bytecode operation. also some opcodes dedicated to implementing "switch()" as well (after noting that each "case" ended up producing a 3 opcode chain which could be collapsed to 1 opcode, ...). but, yes, if one follows these sorts of patterns, it is not hard to figure out why there are 100s of opcodes (errm... closer to about 500 at present). most recent additions have been more narrowly focused though, mostly related to adding new functionality, rather than related to micro-optimizing things. >>but, OTOH: >> in a language like JavaScript you can type "a+b*c", and get the expected >> precedence. > >> this is different than typing, say (in Scheme): >> "(+ a (* b c))" >> or (in Self): >> "b * c + a" (noting that "a + b * c" will give a different answer). > > Well, first there are a number of precedence cases in C that many > would have done differently. But also there is the question of which > should be an operator and which a function. Note that C has the % > operator and pow() function, where Fortran has mod() and **. > but, having precedence is different than not having precedence (say, requiring parens for everything, or everything being left-associative...). people at least sort of expect PEMDAS to be followed, even if yes, the standard C family operator precedences are far from ideal (it is mostly a matter of leaving them as-is, and have stupid precedence, or changing them and have an increased risk of people writing code which breaks because it had assumed the standard C-style operator precedences). >> as I see it, the sorts of minimalism where one can't "afford" to have >> things like operator precedence, or including conventional control-flow >> mechanisms, is needless minimalism. > > Then there are funny little differences, where language designers try > to force a programming style. Fortran added the ability to use > floating point variables as DO loop variables in Fortran 77, then took > it away again in Fortran 90. > both C-family languages and my language allow using whatever for looping will go there (could be an integer, or floats, or walking over characters in a string, ...). > Another difference is the abilty to use array or structure expressions > instead of explicit loops. > yes. C family languages have traditionally not had this, but in newer ones there is "foreach()" or "for(...in...)". >> most real programmers have better things to do than sit around working >> around awkward ways of expressing arithmetic, and figuring out how to >> accomplish all of their control flow via recursion and if/else (and so >> help you if you want something like sane file IO, or sockets, or >> threads, ...). > > But each time you make something easier to use, something else usually > gets at least slightly harder. You don't really want 200 different > operators, with way too many precedence levels. It is just too hard > for humans, not that it bothers compilers much at all. > fair enough. "the basics" are needed though. >> (and, it is not necessarily a good sign when things like loops, file IO, >> arrays, ... are supported by an implementation as... language >> extensions...). > > Well, yes, but how many different loop constructs do you need? > probably all the standard ones: "for()", "while()", "do ... while();", ... ideally also with "break" and "continue". when one has to roll their own loops via recursion and conditionals and similar, it often renders break and continue problematic (the loop instead becomes an awkward mess of deeply nested if/else chains). granted, I have before seen people get annoyed at things like doing a "return" in the middle of a loop, ... but, if the rest of the function is known to be unnecessary, what is the point of having to wait for it to reach the end of the function?... granted, these are probably the same types of people who complain about things like "while(i>1;" and "(4.0/3)*x/(1<> but, many people hate on C and C++ and so on, claiming that languages >> "should" have such minimalist syntax and semantics. however, such >> minimalist languages have generally failed to gain widespread acceptance. > >> likewise, although a person can make an interpreter with a small number >> of total opcodes, typically this means the program will need a larger >> number of them to complete a task, and thus run slower. > > OK, but high-level language design should be toward making things > easier for humans. Unless there is a big trend back toward assembly > programming, instruction sets, including byte-code interpreters, should > be designed for faster machine processing, and not for people. > > (It is useful in debugging if it isn't too hard for humans to read, > but most of the time that shouldn't be needed.) > yes. hence, why there are lots of opcodes: lots of opcodes means shorter opcode traces, and generally faster execution by an interpreter. granted, a JIT doesn't really care so much, but it is easier for a JIT to decompose complex instructions than it is for an interpreter to recompose macro-operations from long chains of minimal instructions. >> for example, a person could make an interpreter with roughly 3 opcodes >> which fairly directly implements lambda calculus... but it will perform >> like crap. > >> 10 or 15 is a bit better, then one probably at least has "the basics". > > Well, you can have the data tagged such that only one add instruction > is needed to add any size or type (byte, short, int, long, float, > double, etc.) or separate opcodes for each. It doesn't make a huge > difference, but likely you could see the performance difference. > with 10 or 15 opcode ISAs, typically one has opcodes like "binary" and "unary" which deal with all binary and unary ops, or (worse yet) they are implemented as function calls. my VM uses a mixed strategy, with a larger operator set which is accessible from the "binary" or "unary" opcodes, a number which are accessible directly by bytecode ops, and some number which are type-specialized opcodes. "add_fn", now it knows that it is both doing an add operation and that it is dealing with integers, allowing it to skip some amount of internal logic. "add", OTOH, tells what operation is being performed, but still requires additional type-checking. the main operators with dedicated opcodes are: binary: add/sub/mul/div, and/or/xor/shl/shr/shrr (int) unary: pos/neg/not/lnot conditional: eqv/neqv/eq/neq/lt/gt/le/ge/unord (eqv/neqv is ==/!=, eq/neq is ===/!==) in my current (incomplete) JIT, the JIT mostly re-merges them and then performs more advanced type analysis, but again this level of specialization can help more with an interpreter, even if mostly redundant for a JIT. >> with several hundred opcodes, arguably a lot of them are "redundant", >> being easily expressible in terms of "simpler" opcodes, but at the same >> time, a single opcode can express what would otherwise require a chain >> of simpler opcodes. > > The RISC vs. CISC argument in processor architecture. > yeah, pretty much. my design is probably fairly solidly CISC. I am not sure how directly they compare, but the design as it was mostly emerged some years back when a past version of my interpreter was stuck at "the switch limit", which basically means it was bounded primarily by how rapidly it could spin through a switch block. finding ways to shorten the chain was the main way of making it faster (at the time, and was mostly driven by a mix of looking at profiler output and instruction traces for compiled bytecode). more recently, this is less of an issue (partly due to me switching to using threaded code for interpretation instead of a giant switch block), but it probably still helps some (albeit, according to the profiler most of the time at present in benchmarks goes apparently into getting/setting lexical variables and similar). granted, not everything in that story turned out well: after my successes (with that particular interpreter, and a short-lived JIT which followed), I spent several years in a partially performance-crazed state, writing an ill-fated C compiler and a bunch of overly complex micro-optimized systems in the process. (the C compiler project might have turned out better had I made sure the everything worked solidly before attempting to micro-optimize it, since it ended up being a mess which ultimately proved beyond my abilities to debug...). luckily, not everything turned out all that badly either... at least I ended up with a spiffy C FFI, and a reasonably fast (if albeit horridly complicated) Class/Instance OO API. >> like, "wow, there is this here 'lpostinc' opcode to load a value from a >> variable and store an incremented version of the value back into the >> variable". is this opcode justified vs, say: "load x; dup; push 1; >> binary add; store x;"? I figure such cases are likely justified (they do >> tend to show favorably in a benchmark). > > >> OTOH, a person can go too far in the other direction as well. > > Like VAX. > possibly... hmm, what would be the output for "while(i>1;"?... probably something like (variable indices chosen arbitrarily): L1: jmp_ge_lfn 1,2,L0 lload_f 1 shr_fnc 1 lload_f 1 add_fn lstore_f 1 jmp L1 L0: idle thought, if I had a few more opcodes... I could shave down the above sequence to, say: L1: jmp_ge_lfn 1,2,L0 lxl_shr_fnc 1,1 lxs_add_fn 1 jmp L1 L0: these would likely add a chunk of approx 22-30 new opcodes. but, alas, I probably have more than enough complex opcodes as-is...