Saturday, September 12, 2009

Inside the Java Virtual Machine


What is a Java Virtual Machine?

A JVM can mean any of three things, depending on the context:

· the abstract specification,

· a concrete implementation, or

· a runtime instance.

The abstract specification is a concept (The Java Virtual Machine Specification). Concrete implementations, which exist on many platforms and come from many vendors, are either all software or a combination of hardware and software. A runtime instance hosts a single running Java application. Each Java application runs inside a runtime instance of some concrete implementation of the abstract specification of the Java virtual machine.

The Lifetime of a Java Virtual Machine

A runtime instance of the Java virtual machine has a clear mission in life: to run one Java application. When a Java application starts, a runtime instance is born. When the application completes, the instance dies. If you start three Java applications at the same time, on the same computer, using the same concrete implementation, you'll get three Java virtual machine instances. Each Java application runs inside its own Java virtual machine.
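One way to see that each application gets its own runtime instance is to ask the running JVM to describe itself. The following is a minimal sketch using the standard java.lang.management API; the class name JvmInstanceInfo is made up for illustration. Starting the program several times at once should report a different instance name for each run (on many implementations the name includes the process id, but that format is not guaranteed).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical demo class: reports details of the JVM instance running it.
class JvmInstanceInfo {
    // Builds a one-line description of the current runtime instance.
    static String describe() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return "instance=" + runtime.getName()          // identifies this runtime instance
                + ", implementation=" + runtime.getVmName() // the concrete implementation in use
                + ", uptimeMs=" + runtime.getUptime();      // time since this instance was born
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```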

Data Types in the Java Virtual Machine

The data types can be divided into a set of primitive types and a reference type. Variables of the primitive types hold primitive values, and variables of the reference type hold reference values. Reference values refer to objects, but are not objects themselves. Primitive values, by contrast, do not refer to anything. They are the actual data themselves.
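The difference between primitive values and reference values shows up as soon as a variable is copied. The sketch below (the class and method names are made up for illustration) copies an int and then copies a reference to an array: changing the copied int leaves the original alone, while writing through the copied reference changes the one array that both variables refer to.

```java
// Hypothetical demo class contrasting primitive values with reference values.
class ValueVsReference {
    // Copies a primitive, changes the copy, and returns the original.
    static int primitiveCopy() {
        int original = 10;
        int copy = original;    // the value itself is copied
        copy = 99;              // only the copy changes
        return original;
    }

    // Copies a reference, writes through the copy, and reads through the original.
    static int referenceCopy() {
        int[] original = {10};
        int[] alias = original; // only the reference is copied; both refer to one array
        alias[0] = 99;          // the shared array changes
        return original[0];
    }

    public static void main(String[] args) {
        System.out.println(primitiveCopy());  // prints 10
        System.out.println(referenceCopy());  // prints 99
    }
}
```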

Although boolean qualifies as a primitive type of the Java virtual machine, the instruction set has very limited support for it. When a compiler translates Java source code into bytecodes, it uses ints or bytes to represent booleans. In the Java virtual machine, false is represented by integer zero and true by any non-zero integer.

The Java virtual machine works with one other primitive type that is unavailable to the Java programmer: the returnAddress type. This primitive type is used to implement finally clauses of Java programs.

The basic unit of size for data values in the Java virtual machine is the word, a fixed size chosen by the designer of each Java virtual machine implementation. The word size must be large enough to hold a value of type byte, short, int, char, float, returnAddress, or reference.

The Architecture of the Java Virtual Machine


Fig 1.1 Architecture of the Java Virtual Machine


The JVM implementation has three major subsystems:

1. Class Loader
2. Runtime Data Areas
   a. Method Area
   b. Heap
   c. Java Stack
   d. pc Register
   e. Native Method Stack
3. Execution Engine

Class Loader Subsystem:

The Java virtual machine contains two kinds of class loaders: a bootstrap class loader and user-defined class loaders. The bootstrap class loader is a part of the virtual machine implementation, and user-defined class loaders are part of the running Java application. Classes loaded by different class loaders are placed into separate name spaces inside the Java virtual machine.
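The two kinds of class loader can be observed from Java code. The sketch below assumes the common convention, allowed by the Class.getClassLoader() documentation, that classes loaded by the bootstrap class loader report null as their loader; the class name LoaderDemo is made up for illustration.

```java
// Hypothetical demo class: asks which loader defined a given class.
class LoaderDemo {
    // True when the class reports no loader object, which is how many
    // implementations represent the bootstrap class loader.
    static boolean loadedByBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        // A core library class, loaded as part of the virtual machine itself.
        System.out.println("String from bootstrap loader: " + loadedByBootstrap(String.class));
        // An application class, loaded by a loader that is an ordinary Java object.
        System.out.println("LoaderDemo loader: " + LoaderDemo.class.getClassLoader());
    }
}
```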

The class loader subsystem is responsible for more than just locating and importing the binary data for classes. It must also verify the correctness of imported classes, allocate and initialize memory for class variables, and assist in the resolution of symbolic references. These activities are performed in a strict order:

  • Loading: finding and importing the binary data for a type
  • Linking: performing verification, preparation, and (optionally) resolution
      a. Verification: ensuring the correctness of the imported type
      b. Preparation: allocating memory for class variables and initializing the memory to default values
      c. Resolution: transforming symbolic references from the type into direct references
  • Initialization: invoking Java code that initializes class variables to their proper starting values
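The separation between loading and initialization can be observed from Java code. The sketch below is only an illustration, with made-up class names (InitOrderDemo, Lazy): it asks Class.forName to load a nested class with initialization turned off, confirms that the static initializer has not run, and then triggers initialization by reading a static field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing that a type can be loaded before it is initialized.
class InitOrderDemo {
    // Kept in the outer class so that reading it never initializes Lazy.
    static final List<String> LOG = new ArrayList<>();

    static class Lazy {
        // Preparation gives this field the default value 0; initialization assigns 42.
        static int value = 42;
        static { LOG.add("Lazy initialized"); }
    }

    // Loads Lazy without initializing it and reports whether the
    // static initializer has still not run.
    static boolean loadOnly() {
        try {
            Class.forName(Lazy.class.getName(), false, InitOrderDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return LOG.isEmpty();
    }

    // Reading a non-constant static field is a first active use, so it triggers initialization.
    static int firstUse() {
        return Lazy.value;
    }

    public static void main(String[] args) {
        System.out.println(loadOnly());  // prints true
        System.out.println(firstUse());  // prints 42
        System.out.println(LOG);         // prints [Lazy initialized]
    }
}
```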

The Method Area:

Inside a Java virtual machine instance, information about loaded types is stored in a logical area of memory called the method area. All threads share the same method area, so access to the method area's data structures must be designed to be thread-safe. For each type it loads, a Java virtual machine must store the following kinds of information in the method area:

  • The fully qualified name of the type
  • The fully qualified name of the type's direct superclass (unless the type is an interface or class java.lang.Object, neither of which has a superclass)
  • Whether the type is a class or an interface
  • The type's modifiers (some subset of public, abstract, final)
  • An ordered list of the fully qualified names of any direct superinterfaces

In addition to the basic type information listed previously, the virtual machine must also store for each loaded type:

  • The constant pool for the type - an ordered set of the constants used by the type, including literals (string, integer, and floating-point constants) and symbolic references to types, fields, and methods
  • Field information - for each field declared in the type, the field's name, type, and modifiers
  • Method information - for each method declared in the type, the method's name, return type, modifiers, and the number and types (in order) of its parameters
  • All class (static) variables declared in the type, except constants
  • A reference to class ClassLoader
  • A reference to class Class
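Much of this per-type information is visible to a running program through the reflection API, which makes it a convenient way to explore what the virtual machine keeps for a loaded type. The following is a minimal sketch; the class name TypeInfoDemo and the output format are made up for illustration, and reflection is only a window onto the data, not a description of how any implementation lays out its method area.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class: prints basic type information through reflection.
class TypeInfoDemo {
    static String describe(Class<?> type) {
        // getSuperclass() returns null for interfaces and for java.lang.Object.
        Class<?> superclass = type.getSuperclass();
        StringBuilder sb = new StringBuilder();
        sb.append("name=").append(type.getName());
        sb.append(", superclass=").append(superclass == null ? "none" : superclass.getName());
        sb.append(", kind=").append(type.isInterface() ? "interface" : "class");
        sb.append(", modifiers=").append(Modifier.toString(type.getModifiers()));
        sb.append(", superinterfaces=").append(type.getInterfaces().length);
        sb.append(", fields=").append(type.getDeclaredFields().length);
        sb.append(", methods=").append(type.getDeclaredMethods().length);
        sb.append(", loader=").append(type.getClassLoader());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(java.util.ArrayList.class));
        System.out.println(describe(Runnable.class));
        System.out.println(describe(Object.class));
    }
}
```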

Method Tables:

The type information stored in the method area must be organized to be quickly accessible. In addition to the raw type information listed previously, implementations may include other data structures that speed up access to the raw data. One example of such a data structure is a method table. For each non-abstract class a Java virtual machine loads, it could generate a method table and include it as part of the class information it stores in the method area.
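To make the idea of a method table concrete, here is a toy model of one. It is purely illustrative and does not reflect the internals of any real virtual machine: the class MethodTableSketch, the string-based entries, and the slot layout are all assumptions. A subclass's table starts as a copy of its superclass's table, an overriding method replaces the inherited entry in the same slot, and a newly declared method gets a new slot at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of a per-class method table.
class MethodTableSketch {
    // Inherited slots keep their positions, overrides replace the
    // inherited entry, and new methods are appended in new slots.
    static List<String> build(List<String> superTable, String className, List<String> declared) {
        List<String> table = new ArrayList<>(superTable);
        for (String method : declared) {
            int slot = slotOf(table, method);
            String entry = className + "." + method;
            if (slot >= 0) {
                table.set(slot, entry);   // override: same slot, new target
            } else {
                table.add(entry);         // new method: next free slot
            }
        }
        return table;
    }

    // Finds the slot whose entry names the given method, or -1 if there is none.
    static int slotOf(List<String> table, String method) {
        for (int i = 0; i < table.size(); i++) {
            if (table.get(i).endsWith("." + method)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> animal = build(Collections.<String>emptyList(), "Animal",
                Arrays.asList("speak()", "eat()"));
        List<String> dog = build(animal, "Dog", Arrays.asList("speak()", "fetch()"));
        System.out.println(animal); // prints [Animal.speak(), Animal.eat()]
        System.out.println(dog);    // prints [Dog.speak(), Animal.eat(), Dog.fetch()]
    }
}
```

Because an overriding method keeps the slot of the method it overrides, a call through a superclass reference can be dispatched by slot number alone, which is the kind of speed-up such a table is meant to provide.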

The Heap

Whenever a class instance or array is created in a running Java application, the memory for the new object is allocated from a single heap. As there is only one heap inside a Java virtual machine instance, all threads share it. Because a Java application runs inside its "own" exclusive Java virtual machine instance, there is a separate heap for every individual running application. Two different threads of the same application, however, could trample on each other's heap data. This is why you must be concerned about proper synchronization of multi-threaded access to objects (heap data) in your Java programs.
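The sketch below illustrates the point about shared heap data with two threads that update one counter object. The class name SharedCounterDemo is made up for illustration. Because both methods are synchronized, no increment is lost; removing the synchronized keyword would let the two threads interleave their updates and usually produce a smaller total.

```java
// Hypothetical demo class: one heap object updated by two threads.
class SharedCounterDemo {
    private int count; // stored in the heap object, so every thread sees the same field

    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts two threads that each increment the same counter, then returns the total.
    static int runTwoThreads(int incrementsPerThread) {
        SharedCounterDemo counter = new SharedCounterDemo();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.increment();
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000)); // prints 200000
    }
}
```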

The Java virtual machine has an instruction that allocates memory on the heap for a new object, but has no instruction for freeing that memory. Usually, a Java virtual machine implementation uses a garbage collector to manage the heap.

The Program Counter

Each thread of a running program has its own pc register, or program counter, which is created when the thread is started. The pc register is one word in size, so it can hold both a native pointer and a returnAddress. As a thread executes a Java method, the pc register contains the address of the current instruction being executed by the thread. An "address" can be a native pointer or an offset from the beginning of a method's bytecodes. If a thread is executing a native method, the value of the pc register is undefined.

The Java Stack

When a new thread is launched, the Java virtual machine creates a new Java stack for the thread. A Java stack stores a thread's state in discrete frames. The Java virtual machine performs only two operations directly on Java stacks: it pushes and pops frames. When a thread invokes a Java method, the virtual machine creates and pushes a new frame onto the thread's Java stack. This new frame then becomes the current frame. As the method executes, it uses the frame to store parameters, local variables, intermediate computations, and other data.

A method can complete in either of two ways. If a method completes by returning, it is said to have normal completion. If it completes by throwing an exception, it is said to have abrupt completion. When a method completes, whether normally or abruptly, the Java virtual machine pops and discards the method's stack frame. The frame for the previous method then becomes the current frame. All the data on a thread's Java stack is private to that thread.
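The pushing and popping of frames can be watched indirectly by counting the entries in a stack trace. This is only a sketch with made-up names (FrameDepthDemo and its methods), and it assumes the virtual machine reports every frame in the trace, which the Throwable documentation does not strictly guarantee. Each extra method call adds one frame, and a frame is gone again once its method has completed, whether normally or abruptly.

```java
// Hypothetical demo class: counts frames on the current thread's Java stack.
class FrameDepthDemo {
    // Number of frames reported for the calling thread, including this method's own frame.
    static int depth() {
        return new Throwable().getStackTrace().length;
    }

    // Adds exactly one frame between the caller and depth().
    static int depthOneCallDeeper() {
        return depth();
    }

    // Completes abruptly by throwing, so its frame is popped and discarded.
    static void thrower() {
        throw new IllegalStateException("abrupt completion");
    }

    // Measures the depth after a callee has completed abruptly.
    static int depthAfterAbruptCompletion() {
        try {
            thrower();
        } catch (IllegalStateException e) {
            // thrower()'s frame is already gone by the time we get here
        }
        return depth();
    }

    public static void main(String[] args) {
        int here = depth();
        System.out.println(depthOneCallDeeper() - here);         // prints 1
        System.out.println(depthAfterAbruptCompletion() - here); // prints 1
    }
}
```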

The stack frame has three parts: local variables, operand stack, and frame data. The sizes of the local variables and operand stack, which are measured in words, depend upon the needs of each individual method. These sizes are determined at compile time and included in the class file data for each method. The size of the frame data is implementation dependent.

When the Java virtual machine invokes a Java method, it checks the class data to determine the number of words required by the method in the local variables and operand stack. It creates a stack frame of the proper size for the method and pushes it onto the Java stack. The Java stack frame includes data to support constant pool resolution, normal method return, and exception dispatch. This data is stored in the frame data portion of the Java stack frame.
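The following toy interpreter sketches how a frame's local variables and operand stack work together. It is an illustration only: the class FrameSketch and the three opcode numbers are made up and are not real JVM opcode values, and the frame data portion is left out. The caller supplies the two sizes up front, just as a class file records them for each method.

```java
// Hypothetical toy model of one stack frame executing a tiny instruction sequence.
class FrameSketch {
    // Made-up opcode numbers for illustration; not the real JVM encodings.
    static final int ILOAD = 0;   // push locals[operand] onto the operand stack
    static final int IADD = 1;    // pop two ints, push their sum
    static final int IRETURN = 2; // pop the result and return it

    static int run(int[] code, int maxLocals, int maxStack, int... args) {
        int[] locals = new int[maxLocals];  // sized from the method's declared needs
        int[] stack = new int[maxStack];    // operand stack, also sized in advance
        System.arraycopy(args, 0, locals, 0, args.length); // parameters land in the first locals
        int sp = 0; // index of the next free operand stack slot
        int pc = 0; // offset of the next instruction from the start of the code
        while (true) {
            int opcode = code[pc++];
            switch (opcode) {
                case ILOAD:
                    stack[sp++] = locals[code[pc++]];
                    break;
                case IADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Equivalent of: static int add(int a, int b) { return a + b; }
        int[] add = {ILOAD, 0, ILOAD, 1, IADD, IRETURN};
        System.out.println(run(add, 2, 2, 3, 4)); // prints 7
    }
}
```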

Native Method Stacks

In addition to all the runtime data areas defined by the Java virtual machine specification and described previously, a running Java application may use other data areas created by or for native methods. When a thread invokes a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. A native method can likely access the runtime data areas of the virtual machine (it depends upon the native method interface), but can also do anything else it wants. It may use registers inside the native processor, allocate memory on any number of native heaps, or use any kind of stack.
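From Java code, a native method is recognizable by its native modifier, which reflection can report. The sketch below uses a made-up class name (NativeMethodCheck) and only handles methods that take no parameters. Which library methods are native is an implementation detail: in OpenJDK, Object.hashCode() is declared native while String.length() is ordinary Java code, but other class libraries may differ.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class: reports whether a public no-argument method is native.
class NativeMethodCheck {
    static boolean isNative(Class<?> type, String methodName) {
        try {
            return Modifier.isNative(type.getMethod(methodName).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Object.hashCode native: " + isNative(Object.class, "hashCode"));
        System.out.println("String.length native: " + isNative(String.class, "length"));
    }
}
```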

Execution Engine

At the core of any Java virtual machine implementation is its execution engine. In the Java virtual machine specification, the behavior of the execution engine is defined in terms of an instruction set. For each instruction, the JVM specification describes in detail what an implementation should do when it encounters the instruction as it executes bytecodes, but says very little about how.

Use the javap tool that ships with the JDK to print the bytecode mnemonics of a class file (javap -verbose). The output lists the instructions that the compiler generated for each method in that class file.
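As a small example to try this on, consider the class below. The name Adder is made up for illustration.

```java
// Hypothetical example class to disassemble with javap.
class Adder {
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

After compiling with javac Adder.java, running javap -c Adder (or javap -verbose Adder for more detail) should show a listing for add along these lines, although the exact layout depends on the JDK version:

    0: iload_0
    1: iload_1
    2: iadd
    3: ireturn

The two iload instructions push the parameters from the local variables onto the operand stack, iadd replaces them with their sum, and ireturn hands that value back to the caller. The verbose output also reports the operand stack and local variable sizes recorded for the method.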

A JVM implementation may use any of several execution techniques: interpreting, just-in-time compiling, adaptive optimization, or native execution in silicon.

One of the most interesting, and speediest, execution techniques is adaptive optimization. When an adaptive optimizing virtual machine decides that a particular method is a hot spot (part of the 10 to 20 percent of the code that is executed 80 to 90 percent of the time), it fires off a background thread that compiles the method's bytecodes to native code and heavily optimizes that native code. Meanwhile, the program can still execute the method by interpreting its bytecodes. Because the program isn't held up, and because the virtual machine compiles and optimizes only the hot spots, the virtual machine has more time than a traditional JIT to perform optimizations.
