
Thoughts on the JVM as a compilation Target?

Started by: "Aaron W. Hsu" <arcfide@sacrideo.us>
First post: 2012-05-22 13:33 -0400
Last post: 2012-05-24 13:06 -0500
Articles: 9 (6 participants)



Contents

  Thoughts on the JVM as a compilation Target? "Aaron W. Hsu" <arcfide@sacrideo.us> - 2012-05-22 13:33 -0400
    Re: Thoughts on the JVM as a compilation Target? glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-05-24 04:21 +0000
      Re: Thoughts on the JVM as a compilation Target? Jeremy Wright <jeremy.wright@microfocus.com> - 2012-05-24 08:30 +0000
        Re: Thoughts on the JVM as a compilation Target? BGB <cr88192@hotmail.com> - 2012-05-25 14:26 -0500
          Re: Thoughts on the JVM as a compilation Target? torbenm@diku.dk (Torben Ægidius Mogensen) - 2012-05-29 17:40 +0200
      Re: Thoughts on the JVM as a compilation Target? "Aaron W. Hsu" <arcfide@sacrideo.us> - 2012-05-24 10:05 -0400
    Re: Thoughts on the JVM as a compilation Target? torbenm@diku.dk (Torben Ægidius Mogensen) - 2012-05-24 10:34 +0200
    Re: Thoughts on the JVM as a compilation Target? "lpsantil@gmail.com" <lpsantil@gmail.com> - 2012-05-24 10:39 -0700
    Re: Thoughts on the JVM as a compilation Target? BGB <cr88192@hotmail.com> - 2012-05-24 13:06 -0500

#642 — Thoughts on the JVM as a compilation Target?

From: "Aaron W. Hsu" <arcfide@sacrideo.us>
Date: 2012-05-22 13:33 -0400
Subject: Thoughts on the JVM as a compilation Target?
Message-ID: <12-05-013@comp.compilers>

Hey folks:

What are your thoughts on JVM as a compilation target, especially with new
languages all targeting high performance or multi-core type features?

--
Aaron W. Hsu | arcfide@sacrideo.us | http://www.sacrideo.us
Programming is just another word for the lost art of thinking.



#644

From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: 2012-05-24 04:21 +0000
Message-ID: <12-05-015@comp.compilers>
In reply to: #642

Aaron W. Hsu <arcfide@sacrideo.us> wrote:

> What are your thoughts on JVM as a compilation target,
> especially with new languages all targeting high
> performance or multi-core type features?

Some years ago I was wondering about JVM as a target for a C compiler.
It is a little tricky, as there are some things that people expect
even though the C standard doesn't require them.

I once knew about a COBOL compiler for JVM, though I don't know
any more than that.

What language(s) were you thinking about?

-- glen



#645

From: Jeremy Wright <jeremy.wright@microfocus.com>
Date: 2012-05-24 08:30 +0000
Message-ID: <12-05-016@comp.compilers>
In reply to: #644

Micro Focus has a COBOL compiler that compiles direct to JVM byte code.

Last year's The Server Side Java Symposium had a panel discussion "Who Invited
All These Other Languages to My JVM?" but I can't find a write up anywhere.

To answer the specific question, the lack of "unsigned" can be inconvenient.

Jeremy



#653

From: BGB <cr88192@hotmail.com>
Date: 2012-05-25 14:26 -0500
Message-ID: <12-05-024@comp.compilers>
In reply to: #645

On 5/24/2012 3:30 AM, Jeremy Wright wrote:
> Micro Focus has a COBOL compiler that compiles direct to JVM byte code.
>
> Last year's The Server Side Java Symposium had a panel discussion "Who Invited
> All These Other Languages to My JVM?" but I can't find a write up anywhere.

yeah.

in general though, single-language VMs may be slowly coming to an end
(along with hopefully the "one language to rule them all" developer
mindset, but alas, for humans this one may be an intrinsic feature).


> To answer the specific question, the lack of "unsigned" can be inconvenient.


among other things.

(others may disagree on things here, I don't make any claim to be an
expert on the JVM here).


maybe trying for a small list here:
lack of unsigned operations;
lack of pointers;
lack of function or method references;
lack of variable references (need not be pointers);
lack of a lexical environment;
lack of good dynamic types;
lack of package-scoped declarations;
lack of operators over object types;
lack of pass-by-value / structure types;
lack of RAII or similar;
lack of a good C FFI.


lack of unsigned operations:
wouldn't need its own types, but a few additional integer operations
would be sufficient ("idivu", "imodu", "ldivu", "lmodu", ...).
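A minimal sketch of what a compiler targeting the JVM has to emit in the absence of such opcodes, assuming the usual widen-to-long trick. The method names `idivu` and `imodu` are hypothetical and only mirror the opcode names suggested above. Java 8 later added library methods such as `Integer.divideUnsigned`, used here as a cross-check; the bytecode set itself still has no unsigned divide.

```java
final class UnsignedOps {
    // Hypothetical helper names, mirroring the "idivu"/"imodu" opcodes proposed above.

    /** Unsigned 32-bit division: zero-extend both operands to long, then divide. */
    static int idivu(int a, int b) {
        return (int) ((a & 0xFFFFFFFFL) / (b & 0xFFFFFFFFL));
    }

    /** Unsigned 32-bit remainder, using the same zero-extension. */
    static int imodu(int a, int b) {
        return (int) ((a & 0xFFFFFFFFL) % (b & 0xFFFFFFFFL));
    }

    public static void main(String[] args) {
        int allOnes = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned
        System.out.println(idivu(allOnes, 2));                  // 2147483647
        System.out.println(imodu(allOnes, 10));                 // 5
        System.out.println(Integer.divideUnsigned(allOnes, 2)); // 2147483647
    }
}
```

The 64-bit case ("ldivu", "lmodu") cannot be widened the same way, which is part of why dedicated operations would be convenient.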


lack of pointers:
this is more complex, but pointers are not outside what can likely be
handled by a verifier. if the pointer does something which can't be
validated or is clearly wrong, then the code is rejected. potentially,
some other cases could be handled using run-time bounds-checks (similar
to arrays), say the pointer is not allowed to "leave" the object it
points into, ... the verifier could also disallow using pointers to
spoof references (not allowing any writes of a non-reference value into
a reference).
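For comparison, here is a rough sketch of how a compiler can emulate such pointers on the JVM as it stands: a "fat pointer" that pairs a base array with an offset and is never allowed to leave the object it points into. The class name `IntPtr` and its methods are made up for illustration; this is the emulation, not the verifier extension described above.

```java
final class IntPtr {
    private final int[] base;   // the object this pointer points into
    private final int offset;   // element index; base.length means "one past the end"

    IntPtr(int[] base, int offset) {
        // The pointer may sit anywhere inside the array or one past its end.
        if (offset < 0 || offset > base.length) {
            throw new IndexOutOfBoundsException("pointer left its object: " + offset);
        }
        this.base = base;
        this.offset = offset;
    }

    /** Pointer arithmetic; the result is re-validated by the constructor. */
    IntPtr plus(int delta) {
        return new IntPtr(base, Math.addExact(offset, delta));
    }

    /** Dereference; the JVM's own array bounds check rejects a one-past-the-end read. */
    int load() {
        return base[offset];
    }

    void store(int value) {
        base[offset] = value;
    }

    public static void main(String[] args) {
        IntPtr p = new IntPtr(new int[] {10, 20, 30}, 0);
        p.plus(1).store(25);
        System.out.println(p.plus(1).load()); // 25
    }
}
```

Every arithmetic step allocates a new object here, which gives some idea of the overhead that native pointer support in the VM would avoid.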


lack of function or method references:
this is sort of done by the newer "invokedynamic" mechanism (JDK 1.7),
although IIRC these handles are kept internal, and are still not
directly usable as first-class values.
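For what it is worth, the `java.lang.invoke` API that shipped alongside "invokedynamic" in Java 7 does expose `MethodHandle` as an ordinary object that can be stored, passed around, and invoked. A small sketch, assuming a hypothetical wrapper class `HandleDemo`; only the `java.lang.invoke` calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class HandleDemo {
    /** Looks up Math.max(int, int) and returns it as an ordinary object. */
    static MethodHandle maxOfInts() {
        try {
            return MethodHandles.lookup().findStatic(
                    Math.class, "max",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls whatever (int, int) -> int handle it is given. */
    static int apply(MethodHandle binaryIntOp, int a, int b) {
        try {
            // invokeExact requires the call-site types to match the handle's type exactly.
            return (int) binaryIntOp.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle max = maxOfInts();       // can be stored, passed, returned
        System.out.println(apply(max, 3, 7)); // 7
    }
}
```

Whether this is convenient enough for a given source language is a separate question; the handle is still an object reference rather than a raw code address.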


lack of variable references:
obvious enough, there are cases where things like this could be useful.


lack of a lexical environment:
this one is a bit of an issue IMO, as there is no good way I know of to
do lexical scoping in the JVM without it being rather costly. it can be
faked though, and one at least passable mechanism could be to use arrays
(any shared/captured variables are held in these arrays). objects are
also possible (and could be higher performance), but then the compiler
is likely to spit out a large number of inner classes in this case.
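The array approach can be sketched roughly as follows: a captured variable that is mutated lives in a one-element array, and every closure that shares it holds a reference to that array. The class name `BoxedCapture` is hypothetical, and Java 8 lambda syntax is used only for brevity; a compiler would emit the equivalent classes itself.

```java
import java.util.function.IntSupplier;

final class BoxedCapture {
    /**
     * Returns two closures over the same variable: [0] increments and returns it,
     * [1] only reads it. The variable lives in a one-element array so that both
     * closures observe each other's updates.
     */
    static IntSupplier[] makeCounterPair() {
        final int[] count = {0};                  // the "lexical environment" cell
        IntSupplier increment = () -> ++count[0]; // mutates the shared cell
        IntSupplier read = () -> count[0];        // sees the mutation
        return new IntSupplier[] {increment, read};
    }

    public static void main(String[] args) {
        IntSupplier[] pair = makeCounterPair();
        pair[0].getAsInt();
        pair[0].getAsInt();
        System.out.println(pair[1].getAsInt()); // 2
    }
}
```

Each access pays for an extra indirection and an array bounds check, which is the cost being complained about.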

note that although Java has recently added lambdas, they are implemented
via a trick, namely classes with "SAM" ("Single Abstract Method")
interfaces, and all captured bindings are "implicitly final" (the values
are captured and stored into object fields).

another use case of lexical environments though is to implement a method
which may have parts of its body folded off into other code-blocks
(useful for implementing things like "ifdef" style mechanisms or
similar, which may need to share bindings but don't actually need to
"capture" them).


lack of good dynamic types:
although most dynamically-typed operations can be done via "Object", it
is not particularly efficient.

lack of "fixnum" and "flonum" types is a notable issue here, as
allocating large numbers of "Integer" or "Double" objects on the heap
can easily become horridly expensive.
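To make the cost concrete, here is a rough sketch of what a dynamically typed "+" tends to look like when every value is an `Object`. The class `DynOps` and its `add` method are hypothetical and not taken from any particular language runtime.

```java
final class DynOps {
    /** Dynamically typed "+": dispatch on the runtime classes of both operands. */
    static Object add(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            // Unbox both operands, add, then box the result again.
            return Integer.valueOf((Integer) a + (Integer) b);
        }
        if (a instanceof Number && b instanceof Number) {
            // Mixed numeric case: fall back to double arithmetic and box a Double.
            return Double.valueOf(((Number) a).doubleValue() + ((Number) b).doubleValue());
        }
        throw new IllegalArgumentException("add: unsupported operand types");
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2));   // 3
        System.out.println(add(1, 2.5)); // 3.5
    }
}
```

`Integer.valueOf` is required to cache only a small range of values (-128 to 127), so outside that range each result is typically a fresh heap object, on top of the type tests and casts on every operation.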

apparently Oracle is working on this (experimental JVMs have added
fixnum support).



lack of package-scoped declarations:
this one does have some reason, in that it would likely require a
partial redesign of the class file-format and loader to make it work
well (making the "class" files "contain" classes, rather than "be"
classes). one example here would be if most of the currently fixed
structures were relocated into the constant pool (with multiple classes
in a class file, and possibly with special "package" classes for holding
package-scoped declarations).

a cheap trick here is, of course, to create a hidden "default" class or
similar for the package, which basically contains any package-scoped
declarations, but this is kind of ugly.


lack of operators over object types:
this one is a bit of a hassle IMO, although it can be argued that, say:
aadd "Lfoo/Bar;Lfoo/Baz;"
offers little over something like, say:
invokestatic "foo/Bar/operator_add(Lfoo/Bar;Lfoo/Baz;)Lfoo/Bar;"
but still...


lack of pass-by-value / structure types:
though to be fair, this one is apparently on Oracle's "planned features"
list.
as-is, this is a bit of an annoyance when wanting to extend the basic
numeric tower without taking a performance hit.


lack of RAII or similar:
enough said, besides just reclaiming the memory of an object, it may be
useful to also be able to handle logic for when it goes out of scope.

this can be faked though, such as via generating implicit "finally"
blocks, and generating explicit destructor calls for when the object
leaves scope, but this is lame and brittle IMO.
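A sketch of that lowering, assuming a hypothetical `Guard` type with an explicit `destroy()` method standing in for a destructor. Each scoped object gets its own nested try/finally so that destruction runs in reverse order of construction, even when the body throws.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopedLowering {
    /** Stand-in for a source-language object with a destructor. */
    static final class Guard {
        private final List<String> log;
        private final String name;

        Guard(List<String> log, String name) {
            this.log = log;
            this.name = name;
            log.add("acquire " + name);
        }

        void destroy() {
            log.add("release " + name);
        }
    }

    /** What a compiler might emit for a block declaring two scoped objects. */
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            Guard a = new Guard(log, "a");
            try {
                Guard b = new Guard(log, "b");
                try {
                    if (fail) {
                        throw new IllegalStateException("boom");
                    }
                    log.add("body");
                } finally {
                    b.destroy(); // implicit destructor call for b
                }
            } finally {
                a.destroy();     // implicit destructor call for a
            }
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

The nesting grows with every scoped object, which illustrates the brittleness; at the Java source level, try-with-resources (Java 7) covers the same pattern.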

never mind that it would also help if the JVM had some sort of explicit
"delete" operation (basically so the compiler can give a hint to the VM
that it no longer needs the object).


lack of a good C FFI:
apparently also on Oracle's to-do list.
admittedly, JNI and JNA are so bad I was almost left to wonder if Sun
originally did it intentionally. I have little idea what exactly
Oracle's plan here is.


...


also, although arguably the JVM is fairly high-performance by VM
standards, it still tends to run a bit slower than what is possible via
code written in C, which in turn is slower than what is possible using ASM.

usually, optimized Java code can compete with naive C code, and
optimized C code can compete with naive ASM (and well optimized ASM
beats pretty much anything). but, it isn't really a major difference.

the main reason is not so much due to elaborate optimizations, but more
often what doesn't need to be optimized away in the first place.

note that an app written directly in ASM may actually easily be slower
than one written in C or Java, partly due to the limitations of a human
programmer. this is less true of C vs Java, as the actual difference in
terms of "level of abstraction" is likely much smaller than it is often
made out to be.


an advantage though that the JVM has is that, yes, it is more portable
than targeting ASM. also, it may make more sense if one is operating
more within the "Java landscape", rather than starting from the "C and
C++ landscape".

also, although C is not widely binary portable, code portability usually
works "well enough".


I personally like .NET a little more than the JVM, but it introduced
some of its own problems and limitations, and presently it is not as
widely or effectively implemented as the JVM.

there is also the AVM2 (AKA: Adobe Flash), which despite being an
interesting piece of technology, seems to have a less certain future.


as I see it, there is no ideal solution at present.
so, pick something and run with it I guess.

personally, I just stay within native-land, as this better
matches what I am doing.



#656

From: torbenm@diku.dk (Torben Ægidius Mogensen)
Date: 2012-05-29 17:40 +0200
Message-ID: <12-05-027@comp.compilers>
In reply to: #653

BGB <cr88192@hotmail.com> writes:


> maybe trying for a small list here:
> lack of unsigned operations;
> lack of pointers;
> lack of function or method references;
> lack of variable references (need not be pointers);
> lack of a lexical environment;
> lack of good dynamic types;
> lack of package-scoped declarations;
> lack of operators over object types;
> lack of pass-by-value / structure types;
> lack of RAII or similar;
> lack of a good C FFI.

A few more:

 - Lack of proper tail calls.

   This can be implemented using trampolines, but this is inefficient
   and kludgy.  Scala (AFAIR) implements only tail recursion and not
   proper tail calls because of this limitation.

 - Lack of non-nullable reference types.

   Object/reference types are implicitly always nullable, though it is
   simple to statically verify non-nullable types.  The consequence is
   that the VM always has to check for null pointers when following
   references, even when the reference can never be to null.

 - Lack of structural type equivalence.

   This means that you have to resort to kludges when implementing pair
   types and similar structures, and it gives problems when implementing
   (type-safe) polymorphic pair types.  Which leads to

 - Lack of parametric polymorphism.

   Generics is currently implemented with type erasure to Object and
   (runtime-checked) downcasts, which is inefficient.  Statically
   verified parametric polymorphism could avoid this.

 - Lack of an unbounded integer type.

   Though this can be implemented using JVM primitives, it is slow and
   kludgy to do so.  Many languages (Scheme, Haskell, ...) have
   unbounded integers.

 - Inefficient exception handling.

   This was a real limitation for some students that tried to implement
   a subset of Prolog in JVM.  Exceptions were the natural way to
   implement cut (!), but it was just way too slow.
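The trampoline technique from the first item can be sketched as follows, using a hypothetical `Step` type: instead of making a tail call, a function returns a thunk describing the call, and a driver loop keeps running thunks until a value appears. Mutually recursive `isEven`/`isOdd` would overflow the stack for large inputs if written as direct calls.

```java
import java.util.function.Supplier;

final class Trampoline {
    /** Either a finished value (next == null) or a deferred tail call (next != null). */
    static final class Step<T> {
        final T value;
        final Supplier<Step<T>> next;

        private Step(T value, Supplier<Step<T>> next) {
            this.value = value;
            this.next = next;
        }

        static <T> Step<T> done(T value) {
            return new Step<>(value, null);
        }

        static <T> Step<T> more(Supplier<Step<T>> next) {
            return new Step<>(null, next);
        }
    }

    /** Driver loop: bounces until a finished step appears, using constant stack depth. */
    static <T> T run(Step<T> step) {
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }

    static Step<Boolean> isEven(long n) {
        if (n == 0) {
            return Step.done(true);
        }
        return Step.more(() -> isOdd(n - 1));  // "tail call" returned as data
    }

    static Step<Boolean> isOdd(long n) {
        if (n == 0) {
            return Step.done(false);
        }
        return Step.more(() -> isEven(n - 1));
    }

    public static void main(String[] args) {
        System.out.println(run(isEven(1_000_000L))); // true
    }
}
```

Every bounce allocates a `Step` and a closure, which is where the inefficiency comes from.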

This is what I could think of off the top of my head, but I'm sure more
would come up if I tried to use JVM as target for a realistically-sized
language (as opposed to toy languages).
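On the exception-handling item, one commonly used mitigation is an exception that skips stack-trace capture, via the four-argument `Throwable` constructor added in Java 7. The sketch below uses a hypothetical `CutSignal` class and a toy search loop; it only illustrates the mechanism, and whether it would have been fast enough for the Prolog project mentioned above cannot be judged from the post.

```java
final class CutDemo {
    /** Control-flow signal that does not record a stack trace. */
    static final class CutSignal extends RuntimeException {
        static final CutSignal INSTANCE = new CutSignal();

        private CutSignal() {
            // message, cause, enableSuppression = false, writableStackTrace = false
            super("cut", null, false, false);
        }
    }

    /** Returns the first candidate divisible by d, or -1; the signal abandons the search. */
    static int firstMultiple(int[] candidates, int d) {
        int found = -1;
        try {
            for (int c : candidates) {
                if (c % d == 0) {
                    found = c;
                    throw CutSignal.INSTANCE; // discard the remaining alternatives
                }
            }
        } catch (CutSignal cut) {
            // search abandoned early on purpose
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(firstMultiple(new int[] {3, 5, 8, 12}, 4)); // 8
    }
}
```

Reusing a single preallocated instance also avoids an allocation per throw; the unwinding itself still has a cost.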

	Torben



#647

From: "Aaron W. Hsu" <arcfide@sacrideo.us>
Date: 2012-05-24 10:05 -0400
Message-ID: <12-05-018@comp.compilers>
In reply to: #644

glen herrmannsfeldt wrote:

> What language(s) were you thinking about?

It's a new research language for studying optimizations in parallel
programming.

--
Aaron W. Hsu | arcfide@sacrideo.us | http://www.sacrideo.us
Programming is just another word for the lost art of thinking.



#646

From: torbenm@diku.dk (Torben Ægidius Mogensen)
Date: 2012-05-24 10:34 +0200
Message-ID: <12-05-017@comp.compilers>
In reply to: #642

"Aaron W. Hsu" <arcfide@sacrideo.us> writes:
> What are your thoughts on JVM as a compilation target, especially with new
> languages all targeting high performance or multi-core type features?

JVM is O.K. as a target if you want to compile an object-oriented
statically-typed language, since this is what the VM assumes.  For other
languages, you will have to jump through hoops to express the semantics
of your language in terms of the JVM primitives.  Since JVM is
Turing-complete, it is, of course, possible to do so.  But it won't be
pretty and likely not very efficient.

JVM, however, has the huge advantage of being implemented on a lot of
different platforms, and when new platforms appear, it is likely that
these will soon support JVM, so you don't have to worry about porting
your language to new platforms.  There is also a large standard library
that you can use, but unless your language is an object-oriented
statically-typed language, these libraries may be hard to use
effectively from programs written in your language.

	Torben



#648

From: "lpsantil@gmail.com" <lpsantil@gmail.com>
Date: 2012-05-24 10:39 -0700
Message-ID: <12-05-019@comp.compilers>
In reply to: #642

> What are your thoughts on JVM as a compilation target, especially with new
> languages all targeting high performance or multi-core type features?

NestedVM (http://nestedvm.ibex.org/) has some tech to target GCC to JVM by
compiling code to MIPS ISA, implementing a MIPS VM, and a MIPS libc/system
call to JNI interface.  Some wild concepts that actually work well
(http://www.zentus.com/sqlitejdbc/) and are starting to be used by others like
emscripten (https://github.com/kripken/emscripten,
https://github.com/kripken, https://github.com/kripken/emscripten/wiki) and
pdf.js (https://github.com/mozilla/pdf.js).

-L



#649

From: BGB <cr88192@hotmail.com>
Date: 2012-05-24 13:06 -0500
Message-ID: <12-05-020@comp.compilers>
In reply to: #642

On 5/22/2012 12:33 PM, Aaron W. Hsu wrote:
> Hey folks:
>
> What are your thoughts on JVM as a compilation target, especially with new
> languages all targeting high performance or multi-core type features?

it likely depends on the language.

Java-like languages can be targeted to the JVM fairly well (especially
if they have fewer features than Java, or are basically just "Java with
certain syntactic differences" or similar).

some high-level scripting languages can be passably compiled to the JVM,
and some newer VM features (such as "invokedynamic") allow making this a
little more efficient (more on-par performance-wise with natively
compiled versions of the VMs).


for much else, the JVM's architecture may be seriously deficient in some
areas, and trying to target code to it is likely to be a major pain (vs,
say, targeting C, or targeting x86 or x86-64 ASM). (consider, for
example, the pain of trying to compile something like C or C++ to the JVM).


this is partly due to the level of abstraction:
in some ways, it is too high-level, offering only a narrowly defined set
of abstractions, which have to be crufted around to build new features
(features need to be built using classes, interfaces, and method calls).

in other ways, it is too low-level, offering no way to work around
arbitrary limitations in the type-system or scoping-model apart from
falling back to high-level mechanisms (such as method calls or
reflection, ...).

some other cases are due to arbitrary limitations, many related to
design choices made in the Java language (such as the lack of
package-scoped functions or variables, lack of either first-class
methods or a way to refer to methods indirectly, ...).

yes, the JVM does have some fairly elaborate optimizations, but when
they are mostly being used to try to optimize around cruft introduced in
trying to fit the language onto the VM, this is not nearly so good.


a lot of these issues could be addressed (and some are apparently being
addressed), but Oracle is very slow-moving about it.


I am rather hesitant to get more specific about things I would change
in the JVM architecture if given the choice, as some people are likely
to get rather defensive (note: most would relate to the class
file-format and bytecode; even if a new class-format and bytecode were
introduced, it need not break backwards compatibility).

most changes in general would be related to increasing generality and
orthogonality, and likely introducing more of a "middle layer" regarding
bytecode abstraction.



personally though (at least for now), I would just as soon target
native code (or maybe target C, if writing a static compiler).

in my case, my front-ends tend to compile into bytecode-based formats,
which are then either interpreted, or could be fed through a JIT or
similar. as is, my mainly-used bytecode VM currently uses an interpreter
which converts the bytecode into threaded code and then runs the program
as threaded code.
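As a generic illustration of the bytecode-to-threaded-code step (not the VM described here, whose details are not given), the sketch below pre-decodes a toy stack bytecode into an array of closure objects, so the run loop no longer decodes opcodes or operands. The opcodes, the `State` class, and the `Op` interface are all made up, and only straight-line code is handled.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadedVm {
    // Hypothetical toy opcodes; PUSH is followed by one immediate operand.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    /** Mutable machine state: a small fixed-size operand stack. */
    static final class State {
        final int[] stack = new int[64];
        int sp = 0;
    }

    /** One pre-decoded instruction. */
    interface Op {
        void exec(State s);
    }

    /** Decodes the bytecode once, binding immediates into closures. */
    static Op[] translate(int[] code) {
        List<Op> ops = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            switch (code[pc++]) {
                case PUSH: {
                    final int k = code[pc++];           // operand decoded ahead of time
                    ops.add(s -> s.stack[s.sp++] = k);
                    break;
                }
                case ADD:
                    ops.add(s -> { int b = s.stack[--s.sp]; int a = s.stack[--s.sp]; s.stack[s.sp++] = a + b; });
                    break;
                case MUL:
                    ops.add(s -> { int b = s.stack[--s.sp]; int a = s.stack[--s.sp]; s.stack[s.sp++] = a * b; });
                    break;
                case HALT:
                    pc = code.length;                   // stop decoding
                    break;
                default:
                    throw new IllegalArgumentException("bad opcode at " + (pc - 1));
            }
        }
        return ops.toArray(new Op[0]);
    }

    /** Runs the threaded code and returns the value left on top of the stack. */
    static int run(Op[] ops) {
        State s = new State();
        for (Op op : ops) {
            op.exec(s);
        }
        return s.stack[s.sp - 1];
    }

    public static void main(String[] args) {
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(translate(program))); // 20
    }
}
```

Supporting branches would mean having each `Op` return the index of its successor (or hold a direct reference to it) instead of relying on the sequential loop.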

sadly, this bytecode is itself far from perfect: lots of stale
instructions, non-optimal organization, operation types are typically
encoded as prefixes, ...


