Metascala is a tiny metacircular Java Virtual Machine (JVM) written in the Scala programming language. Metascala is barely 3000 lines of Scala, and is complete enough that it is able to interpret itself metacircularly. Being written in Scala and compiled to Java bytecode, the Metascala JVM requires a host JVM in order to run.
The goal of Metascala is to create a platform for experimenting with the JVM. A 3000-line JVM written in Scala is probably much more approachable than the 1,000,000 lines of C/C++ that make up HotSpot, the standard implementation, and more amenable to implementing fun features like continuations, isolates or value classes. The 3000 lines of code give you:
● The bytecode interpreter, together with all the run-time data structures
● A stack-machine to SSA register-machine bytecode translator
● A custom heap, complete with a stop-the-world, copying garbage collector
● Implementations of parts of the JVM's native interface
Although it is far from a complete implementation, Metascala already provides the ability to run untrusted bytecode securely (albeit slowly), since every operation which could potentially cause harm (including memory allocations and CPU usage) is virtualized and can be controlled. Ongoing work includes tightening of the security guarantees, improving compatibility and increasing performance.
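As a rough illustration of what "virtualized and can be controlled" can mean in an interpreter, here is a minimal sketch of a loop that charges one step per instruction and refuses allocations beyond a quota. The instruction set, the `run` function and its parameters are hypothetical and are not Metascala's actual API.

```scala
object BoundedVm {
  // Hypothetical toy instruction set; not Metascala's actual opcodes.
  sealed trait Op
  final case class Push(value: Int) extends Op
  case object Add extends Op
  final case class Alloc(words: Int) extends Op // reserve `words` heap words
  final case class Jump(target: Int) extends Op // unconditional jump

  sealed trait Outcome
  final case class Finished(stack: List[Int]) extends Outcome
  case object OutOfSteps extends Outcome
  case object OutOfMemory extends Outcome

  /** Runs `program`, charging one step per instruction and rejecting any
    * allocation that would push heap usage past `maxHeapWords`. */
  def run(program: Vector[Op], maxSteps: Long, maxHeapWords: Int): Outcome = {
    var pc = 0
    var steps = 0L
    var heapUsed = 0
    var stack = List.empty[Int]
    while (pc >= 0 && pc < program.length) {
      if (steps >= maxSteps) return OutOfSteps // CPU budget exhausted
      steps += 1
      program(pc) match {
        case Push(v) =>
          stack = v :: stack
          pc += 1
        case Add =>
          stack match {
            case b :: a :: rest => stack = (a + b) :: rest
            case _ => throw new IllegalStateException("stack underflow")
          }
          pc += 1
        case Alloc(n) =>
          if (n < 0 || heapUsed.toLong + n > maxHeapWords) return OutOfMemory
          heapUsed += n
          pc += 1
        case Jump(target) =>
          pc = target
      }
    }
    Finished(stack)
  }
}
```

Because the host decides when to stop, an infinite loop such as `Vector(Jump(0))` ends with `OutOfSteps` instead of hanging the host.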
5. Basic Usage
● Create a new Metascala VM
● Plain Old Java Object
● No global state
● Captured variables are serialized into the VM’s environment
● The closure’s class file is given to the VM to load/parse/execute
● Any other classes necessary to evaluate the closure are loaded from the current classpath
● The result is extracted from the VM into the host environment
6. It’s Metacircular!
● A VM inside a VM!
● Need to give the outer VM more than the 1 MB default heap
● A simpler program avoids initializing the Scala/Java standard libraries, which takes forever under double interpretation
● Takes a while (~10s) to produce a result
9. Why Metascala?
● Fun to explore the innards of the JVM
● An almost-fully secure Java runtime!
● Small size makes fiddling fun
11. Quick Tour
Immutable Defs: ~380 loc
Native Bindings: ~650 loc
Bytecode SSA transform: ~650 loc
Runtime data structures: 820 loc
Binary heap & Copying GC: 132 loc
DIY scala-pickling: 132 loc
“This is a VM”: 243 loc
12. Quick Tour: Tests
Tests for basic Java features
GC Fuzz-tests
Test Metacircularity!
Scala std lib usage
14. What’s a Garbage Collector?
● Blit (copy) all roots to the new heap
● Scan the already copied things for more things and copy them too
● Stop when you’ve scanned everything
● Not pseudocode
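The slide's code is not reproduced in this transcript, so here is a minimal sketch of the same three steps (copy roots, scan copied objects, stop when the scan catches up), in the style of a Cheney copying collector. The object layout is an assumption for illustration: each object is a header word holding its field count, followed by that many fields, and every field is a reference (a heap address) or `Null`. It is not Metascala's actual heap format.

```scala
object CopyingGc {
  val Null = -1 // assumed encoding for a null reference

  /** Copies everything reachable from `roots` out of `from` into a fresh heap.
    * Returns the new heap, the relocated roots, and the number of words in use.
    * Note: `from` is overwritten with forwarding pointers as a side effect. */
  def collect(from: Array[Int], roots: Vector[Int]): (Array[Int], Vector[Int], Int) = {
    val to = new Array[Int](from.length)
    var free = 0 // next unused word in the new heap

    // Copy one object, or reuse its forwarding address if it was already moved.
    def copy(addr: Int): Int =
      if (addr == Null) Null
      else if (from(addr) < 0) -(from(addr) + 1) // negative header = forwarded
      else {
        val size = 1 + from(addr) // header word plus fields
        val newAddr = free
        System.arraycopy(from, addr, to, newAddr, size)
        free += size
        from(addr) = -(newAddr + 1) // leave a forwarding pointer behind
        newAddr
      }

    // 1. Blit (copy) all roots to the new heap.
    val newRoots = roots.map(copy)

    // 2. Scan the already copied objects for more references and copy those too.
    var scan = 0
    while (scan < free) { // 3. Stop when everything copied has been scanned.
      val fields = to(scan)
      var i = 1
      while (i <= fields) {
        to(scan + i) = copy(to(scan + i))
        i += 1
      }
      scan += 1 + fields
    }
    (to, newRoots, free)
  }
}
```

Forwarding pointers are what make cycles and shared objects safe: an object reached twice is copied once, and the second visit just picks up its new address. Unreachable objects are never visited, so they are simply left behind in the old heap.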
20. Security Holes
● Classloader can read from anywhere
● Time spent classloading not accounted
● Memory spent on classes not accounted
● GC time not accounted
● “native” methods’ time/memory not accounted
21. Basic Problem
● User code resource consumption is bounded
● VM’s runtime resource usage can be made to grow arbitrarily large
[Diagram labels: Outside World, User Code, Classes, Native method calls, Runtime Data Structures, Garbage Collector]
22. Possible Solution
● Put a VM inside a VM!
● Works, ... but 10000x slowdown
[Diagram labels: Outside World, Fixed Unaccounted Costs, Outside World, Classes, Native method calls, User Code, Runtime Data Structures, Garbage Collector]
23. Another Possible Solution
● Move more components into the virtual runtime
● Difficult to bootstrap correctly
● WIP
[Diagram labels: Outside World, Native method calls, Classes, Garbage Collector, User Code, Runtime Data Structures]
28. Nasty JVM Interface
Ideal World
● Linear initialization
● Clean interfaces
Real World
● Lazy initialization means repeated dives back into lib/native code
● Nasty Language/VM interface
[Diagram labels in both panels: VM, Std Library, Initialized, User Code]
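To make the "repeated dives" point concrete, here is a small sketch of lazy class initialization in a toy VM: the first use of a class runs its static initializer, which may in turn touch other classes and re-enter the VM. `ToyVm`, `touch` and the dependency map are hypothetical names for illustration, and the class names in the usage below are made up rather than taken from the real standard library's initialization graph.

```scala
import scala.collection.mutable

object LazyInit {
  /** Toy "VM": `dependencies` maps a class name to the classes its
    * static initializer uses (an assumed, made-up graph). */
  final class ToyVm(dependencies: Map[String, List[String]]) {
    private val initialized = mutable.LinkedHashSet.empty[String]
    val log = mutable.ArrayBuffer.empty[String]

    /** Called on first use of a class. The class is marked before its
      * initializer runs, so cyclic dependencies terminate. */
    def touch(cls: String, depth: Int = 0): Unit =
      if (initialized.add(cls)) {
        log += ("  " * depth) + "init " + cls
        // Running the initializer re-enters the VM for every class it uses.
        dependencies.getOrElse(cls, Nil).foreach(touch(_, depth + 1))
      }
  }
}
```

For example, with `Map("UserCode" -> List("System", "String"), "System" -> List("String", "Charset"))`, calling `touch("UserCode")` logs `System` nested under `UserCode`, with `String` and `Charset` nested under `System`. Initialization order is driven by whatever the code happens to touch first, not by a fixed linear sequence.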
29. Java’s dirty little secret
The Verbosity of Java with the Safety of C
WTF! I’d never use these things!
30. You probably do
What happens if you don’t have them
Almost every Java program ever uses these things.
31. Next Steps
● Maximize correctness
○ Implement Threads & IO
○ Fix bugs (GC, native calls, etc.)
● Solidify security characteristics
○ Still plenty of unaccounted-for memory/processing
○ Some can be hosted “within” VM itself
● Simplify Std-Lib/VM interface
○ Try using Android Std Lib?
32. Possible Experiments
● Native codegen instead of an interpreter?
○ Generate/exec native code through JNI
○ Heap is already a binary blob that can be easily passed to native code
● Bytecode transforms and optimizations?
○ Already in SSA form
● Continuations, Isolates, Value Classes?
● Port the whole thing to Scala.js?
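For readers unfamiliar with what "already in SSA form" buys you, here is a minimal sketch of turning stack-machine code into register instructions where every register is written exactly once. It simulates the operand stack symbolically, with register numbers in place of values. The op and instruction names are hypothetical, and it handles straight-line code only. A real translator, Metascala's included, must also deal with control-flow merges, which this sketch does not attempt.

```scala
object StackToSsa {
  // Hypothetical stack-machine ops, loosely in the spirit of JVM bytecode.
  sealed trait StackOp
  final case class Const(value: Int) extends StackOp
  final case class Load(local: Int) extends StackOp
  final case class Store(local: Int) extends StackOp
  case object Add extends StackOp
  case object Mul extends StackOp

  // Register instructions: each `dest` register is assigned exactly once.
  sealed trait Insn
  final case class SetConst(dest: Int, value: Int) extends Insn
  final case class BinOp(dest: Int, op: String, a: Int, b: Int) extends Insn

  /** Translates straight-line stack code. Registers 0 until `numArgs` hold
    * the arguments. Returns the instructions plus the registers left on the
    * symbolic operand stack (top first). */
  def translate(code: Seq[StackOp], numArgs: Int): (Vector[Insn], List[Int]) = {
    var nextReg = numArgs
    var locals = (0 until numArgs).map(i => i -> i).toMap // local slot -> register
    var stack = List.empty[Int] // symbolic stack of register numbers
    val out = Vector.newBuilder[Insn]

    def fresh(): Int = { val r = nextReg; nextReg += 1; r }
    def pop(): Int = stack match {
      case r :: rest => stack = rest; r
      case Nil => throw new IllegalArgumentException("operand stack underflow")
    }
    def binary(op: String): Unit = {
      val b = pop(); val a = pop(); val d = fresh()
      out += BinOp(d, op, a, b)
      stack = d :: stack
    }

    code.foreach {
      case Const(v) =>
        val d = fresh()
        out += SetConst(d, v)
        stack = d :: stack
      case Load(i) => stack = locals(i) :: stack // no instruction needed
      case Store(i) => locals = locals.updated(i, pop()) // just rebinds the slot
      case Add => binary("+")
      case Mul => binary("*")
    }
    (out.result(), stack)
  }
}
```

Loads and stores vanish entirely: they only change which register a local slot names, which is part of why the register form is convenient for further transforms.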
33. Metascala: a tiny DIY JVM
Ask me about:
● Static Single Assignment form
● Copying Garbage Collection
● sun.misc.Unsafe
● Warts of the .class file format
● Capabilities-based security
● Abandoned approaches