From: BGB
Newsgroups: comp.compilers
Subject: Re: Need an interesting topic for an undergraduate project on Compilers
Date: Sat, 06 Aug 2011 14:10:12 -0700
Organization: albasani.net
Message-ID: <11-08-008@comp.compilers>
References: <11-08-006@comp.compilers>
Keywords: courses

On 8/6/2011 10:28 AM, amit karmakar wrote:
> I would like to have some suggestions as to what *new* and
> *innovative* project i can do which are based on compiler design.
> Also, considering the time i have to implement the compiler, i can
> think of cutting down work, like working on subset of a language. I
> would preferably not tend to work on only a specific part(phase) of
> compiler. It will be better if I implement a complete compiler for
> some architecture and see the executable running.

new+innovative and compilers, don't often go together, and another
problem is that terms like new/innovative/interesting/... depend highly
on who one is dealing with and their personal biases and preferences (a
cool idea for one person, may be considered stale, boring, unworkable,
... by another).

a few thoughts:

most traditional research into compilers has been in how to squeeze as
much performance as possible out of them.
maybe one can look into trying for new and interesting features instead.

rather than work on subset languages, it may make sense to work with a
simpler language design.

for example, a fairly simple language is Scheme (except for a few edge
cases), where a person can often throw together a working implementation
fairly quickly (or, at least IME with R5RS and earlier; dunno about R6RS,
as I was mostly no longer dealing much with Scheme by that point, and
R6RS at the time looked a bit strange vs what came before).

a slightly less simplistic, but still relatively simple language, is
ECMAScript (basic core language for JavaScript, ActionScript, ...).

probably not worth trying to implement up-front are languages like:
C or C++ (fairly complex languages to implement);
Java (a lot more hairy than it looks, syntax can be deceiving);
...

note that dynamic typing generally makes things much easier to implement
(static typing makes things faster, and is "closer to the metal", ...
but it doesn't make things easier).

a more recent language of mine is using a "soft typing" model, which
basically layers elements of static typing on top of an otherwise
dynamically-typed VM (potentially using types as optimization hints in
the codegen, but treating type-checking, behavioral semantics, and
optimization as separate issues).

personally, I like RPN / Stack-Machine style ILs (recently got into a big
argument over this though, with a person who for whatever reason really
dislikes stack-machine ILs, despite them being well proven in the JVM,
.NET, AVM2, ...).

examples of stack-machine languages would include Forth, PostScript,
Factor, ... (PostScript has had a notable influence on the design of my
ILs).

the upside of stack machines is that they are fairly easy to produce
code for (it is often very straightforward to unwind an AST into a stack
machine format), are themselves relatively simple, and are very capable
despite their relative simplicity.
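
to make the "unwind an AST into a stack machine format" point concrete,
here is a minimal sketch in Python. it is only an illustration, not the
IL described in this post: the opcode names (push, load, add, ...) and
the helpers compile_expr and run are made up for the example, and
Python's own ast module stands in for a parser. the compiler is just a
post-order walk: emit code for both operands, then emit the operator.

```python
import ast
import operator

# Hypothetical opcode names for this sketch, keyed by AST operator type.
BINOPS = {
    ast.Add: ("add", operator.add),
    ast.Sub: ("sub", operator.sub),
    ast.Mult: ("mul", operator.mul),
    ast.Div: ("div", operator.truediv),
}


def compile_expr(node, code):
    """Post-order walk: operands are emitted first, then the operator."""
    if isinstance(node, ast.Expression):
        compile_expr(node.body, code)
    elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        code.append(("push", node.value))
    elif isinstance(node, ast.Name):
        code.append(("load", node.id))
    elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        compile_expr(node.left, code)
        compile_expr(node.right, code)
        code.append((BINOPS[type(node.op)][0], None))
    else:
        raise NotImplementedError(f"unsupported node: {type(node).__name__}")
    return code


def run(code, env):
    """Tiny interpreter for the ops emitted by compile_expr."""
    funcs = {name: fn for name, fn in BINOPS.values()}
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(env[arg])
        else:
            # Binary operator: right operand is on top of the stack.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(funcs[op](lhs, rhs))
    return stack.pop()


if __name__ == "__main__":
    code = compile_expr(ast.parse("(a + 2) * 3 - b", mode="eval"), [])
    print(code)
    print(run(code, {"a": 4, "b": 5}))  # prints 13
```

note that the emitted code for "(a + 2) * 3 - b" is simply the RPN form
of the expression (a 2 add 3 mul b sub), which is why this style of IL is
easy to target from a tree.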
a downside though is that they are relatively fussy about ordering
issues, and a general-purpose native codegen can get a bit hairy (mostly
due to ABI interfacing; for example, the SysV/AMD64 ABI is itself a
complex beast, and one has to effectively "pull a rabbit out of a hat" to
mesh it up directly with a stack machine IL).

they are also far less "du jour" with many people than are other options,
such as TAC-SSA (Three Address Code - Static Single Assignment).

granted, things should be much simpler if one doesn't want to go about
trying to directly call into native (statically-compiled) code, but
instead uses special functions to marshal the calls (I have later found
that this strategy can be fairly transparent as well).

also possibly useful is allowing for eval/... as well...

also, in my case, I am working to try to make the C interface fairly
transparent (marshaling calls and data-types and similar in both
directions, ideally eliminating nearly all cases of manually-written
boilerplate code).

ideally, the time of isolated languages and frameworks, and of languages
which don't have features like eval, will soon be nearing an end (this
doesn't mean I want many of the existing languages to go away, but
ideally most should have eval as a relatively common library feature,
...).

for example, my language has:
"native import C.foo;"
which allows implementing libraries from C land (foo is a library name,
and a tool is used to mine information from C headers/...).

"native package C.foo { ...body... }"
allows exporting the code ("...body...") to C land (in this case, the
boilerplate is written automatically by a tool).

granted, yes, none of this is really terribly new or original, as most of
this has been around for decades.
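
as a rough illustration of the "special functions to marshal the calls"
strategy mentioned above, here is a small Python sketch. everything in it
is hypothetical: Box, NativeSig, NATIVE_TABLE and call_native are names
invented for the example, ordinary Python functions (math.pow, len) stand
in for native C functions, and the signature table is written by hand
here, whereas a real setup might generate it by a tool from C headers.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical tagged value as a dynamically-typed VM might hold it."""
    tag: str       # "int", "float", or "str"
    value: object


# VM value -> "native" argument, keyed by the declared native type.
TO_NATIVE = {
    "int": lambda b: int(b.value),
    "double": lambda b: float(b.value),
    "string": lambda b: str(b.value),
}

# "native" return value -> VM value, keyed by the declared native type.
FROM_NATIVE = {
    "int": lambda v: Box("int", int(v)),
    "double": lambda v: Box("float", float(v)),
    "string": lambda v: Box("str", str(v)),
}


@dataclass(frozen=True)
class NativeSig:
    func: Callable
    arg_types: Sequence[str]
    ret_type: str


# Hand-written stand-in for signatures that would come from C headers.
NATIVE_TABLE = {
    "pow": NativeSig(math.pow, ("double", "double"), "double"),
    "strlen": NativeSig(len, ("string",), "int"),
}


def call_native(name, *args):
    """Marshal boxed VM arguments, call the native function, box the result."""
    sig = NATIVE_TABLE[name]
    if len(args) != len(sig.arg_types):
        raise TypeError(f"{name} expects {len(sig.arg_types)} argument(s)")
    native_args = [TO_NATIVE[t](a) for t, a in zip(sig.arg_types, args)]
    return FROM_NATIVE[sig.ret_type](sig.func(*native_args))


if __name__ == "__main__":
    print(call_native("pow", Box("int", 2), Box("int", 10)))
    print(call_native("strlen", Box("str", "hello")))
```

the point of the design is that the VM never needs to know the native
calling convention: all conversions live in one place (call_native and
the two converter tables), so adding a function is just adding a table
entry.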
as for languages containing some interesting ideas:
Scheme (nice core language design);
Self (nice object system, partly carried over in a limited form into
JavaScript);
PostScript (relatively clean stack-machine model);
ECMAScript / JavaScript (simplistic yet conventional syntax);
ActionScript (like JavaScript but more "grown up");
Erlang (concurrent programming features);
...

granted, to be original, one needs to be, errm, original.

like maybe try to come up with some new/interesting language feature or
idea to try exploring, or something interesting to do at the
compiler/codegen level, ...