Tuesday, September 29, 2009

Interesting information about empty strings and the indexOf() method in the Java String class

This post looks at some interesting behavior of the Java String class when it deals with the empty string. Here are the code snippets, with an explanation of the output in each case.

1. System.out.println("Karthic".indexOf("K"));
Prints: 0. As expected, the function returns the index of the first occurrence of the target string in the source string, which in this case is zero.

2. System.out.println("Karthic".indexOf("k"));
Prints: -1, as expected the function returns -1 when the target string is not found in the source string.

3. System.out.println("Karthic".indexOf(""));
Prints: 0. Now comes the interesting part. At first glance this looks very similar to case 2, as there seems to be no empty string in the source string. But that is not the case. Everybody (even in the object world) starts life with nothing and keeps adding things as life goes on. In the same way, you can think of every string as starting out empty, with characters added after that, so the empty string is found right at the start.
Anyway, I went to the source code of String.class and here is what I found. If the target string length is 0, the method returns fromIndex, which is 0 in this call (see the code below). That seemed strange to me at first; I expected it to return -1.

 
if (targetCount == 0) {
    return fromIndex;
}

public int indexOf(String str, int fromIndex) {
    return indexOf(value, offset, count,
                   str.value, str.offset, str.count, fromIndex);
}

static int indexOf(char[] source, int sourceOffset, int sourceCount,
                   char[] target, int targetOffset, int targetCount,
                   int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }
    // ... the actual search follows (not shown in this excerpt)
}

Well, can you prove your String doesn't begin with an empty string? :-) 
String name = "Karthic"; 
String name2 = "" + "Karthic"; 
System.out.println(name.equals(name2));  // true
The empty string "" is treated as occurring at every position of a string, including the leftmost position 0 and the rightmost position string.length(), and the implementation relies on this. So the results of the following calls are:
"Karthic".indexOf("") -> 0 
"Karthic".lastIndexOf("") -> 7
Try the following example to understand better what is going on:
String name = "Karthic";
name = name.replace("", ".");
System.out.println(name); // prints: .K.a.r.t.h.i.c.
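The positions at which the empty string is "found" can also be checked directly. The following is a small sketch, not part of the original experiment; the class name EmptyStringIndexDemo and the helper emptyIndexFrom are made up for illustration. It follows the source excerpt quoted earlier: a fromIndex inside the string is returned as is, and a fromIndex at or past the end yields the string length.

```java
public class EmptyStringIndexDemo {

    // Hypothetical helper: where does indexOf find "" when searching from 'from'?
    static int emptyIndexFrom(String s, int from) {
        return s.indexOf("", from);
    }

    public static void main(String[] args) {
        String name = "Karthic"; // length 7

        System.out.println(name.indexOf(""));          // 0
        System.out.println(emptyIndexFrom(name, 3));   // 3: fromIndex is returned unchanged
        System.out.println(emptyIndexFrom(name, 7));   // 7: fromIndex == length
        System.out.println(emptyIndexFrom(name, 100)); // 7: past the end, the length is returned
        System.out.println(emptyIndexFrom(name, -5));  // 0: negative fromIndex is treated as 0
        System.out.println(name.lastIndexOf(""));      // 7
    }
}
```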
Between any two adjacent characters there is nothing, and nothing is exactly what the empty string denotes. For example, in "Karthic" we can say that a sits between K and r, but between K and a there is nothing, that is, the empty string "".
If you try "Karthic".split("") on Java 7 or earlier, you get "", K, a, r, t, h, i and c, and the length is 8. Since Java 8, split() no longer produces the leading empty string for a zero-width match at the start, so there the result is K, a, r, t, h, i, c with length 7.
The first empty string is the one between the start of the string and the first character, K.
int i = 0;
for (String s : "Karthic".split("")) {
System.out.println(++i + " : " + s);
}
1 : 
2 : K
3 : a
4 : r
5 : t
6 : h
7 : i
8 : c

So, we have the "rule" that says every string begins with nothing :) 
Of course every string ends with nothing as well. 

But let's think about why the rule does not seem to apply at the end of the string in this case: split("") shows no empty string after the c. Note first that "" is not nothing in the sense of null; "" is a real String object of length 0. The trailing empty string is missing only because split(regex) is documented to discard trailing empty strings from its result.
So there is a "" at the end of the string as well, as lastIndexOf("") shows; split("") simply does not report it. Calling split("", -1) keeps it.

Finally, " " is not an empty string. It is a string with a space in it.
System.out.println("Karthic".indexOf(" ")); // prints: -1
This is much the same as case 2.

Comments on this blog are welcome for further discussion.

Saturday, September 12, 2009

Inside the Java Virtual Machine


What is a Java Virtual Machine?

A JVM can be any of these, depending on the context.

· the abstract specification,

· a concrete implementation, or

· a runtime instance.

The abstract specification is a concept (The Java Virtual Machine Specification). Concrete implementations, which exist on many platforms and come from many vendors, are either all software or a combination of hardware and software. A runtime instance hosts a single running Java application. Each Java application runs inside a runtime instance of some concrete implementation of the abstract specification of the Java virtual machine.

The Lifetime of a Java Virtual Machine

A runtime instance of the Java virtual machine has a clear mission in life: to run one Java application. When a Java application starts, a runtime instance is born. When the application completes, the instance dies. If you start three Java applications at the same time, on the same computer, using the same concrete implementation, you'll get three Java virtual machine instances. Each Java application runs inside its own Java virtual machine.

Data types in Java Virtual Machine

The data types can be divided into a set of primitive types and a reference type. Variables of the primitive types hold primitive values, and variables of the reference type hold reference values. Reference values refer to objects, but are not objects themselves. Primitive values, by contrast, do not refer to anything. They are the actual data themselves.
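The difference between holding a value and holding a reference can be seen from Java source code. This is only an illustrative sketch; the class name ValueVsReferenceDemo and the method demo are invented for this example.

```java
public class ValueVsReferenceDemo {

    // Returns {a, x[0]} after the assignments below.
    static int[] demo() {
        int a = 1;
        int b = a;   // the primitive value is copied
        b = 2;       // changing b leaves a untouched

        int[] x = {1};
        int[] y = x; // the reference is copied; x and y refer to the same array object
        y[0] = 2;    // the change is visible through x as well

        return new int[] {a, x[0]};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("a = " + result[0]);    // a = 1
        System.out.println("x[0] = " + result[1]); // x[0] = 2
    }
}
```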

Although boolean qualifies as a primitive type of the Java virtual machine, the instruction set has very limited support for it. When a compiler translates Java source code into bytecodes, it uses ints or bytes to represent booleans. In the Java virtual machine, false is represented by integer zero and true by any non-zero integer. The Java virtual machine works with one other primitive type that is unavailable to the Java programmer: the returnAddress type. This primitive type is used to implement finally clauses of Java programs. The basic unit of size for data values in the Java virtual machine is the word--a fixed size chosen by the designer of each Java virtual machine implementation. The word size must be large enough to hold a value of type byte, short, int, char, float, returnAddress, or reference.

The Architecture of the Java Virtual Machine


Fig 1.1 Architecture of the Java Virtual Machine


The JVM implementation has three major subsystems.

1. ClassLoader
2. Runtime Data area
a. Method Area
b. Heap
c. Java Stack
d. pc register
e. Native method stack
3. Execution Engine.

Class Loader Subsystem:

The Java virtual machine contains two kinds of class loaders: a bootstrap class loader and user-defined class loaders. The bootstrap class loader is a part of the virtual machine implementation, and user-defined class loaders are part of the running Java application. Classes loaded by different class loaders are placed into separate name spaces inside the Java virtual machine.
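A quick way to see that different classes come from different loaders is to ask a class for its loader. The sketch below uses an invented class name, ClassLoaderDemo. It assumes a JVM that represents the bootstrap class loader as null, which the Class.getClassLoader() documentation allows and which common JVMs do.

```java
public class ClassLoaderDemo {

    public static void main(String[] args) {
        // Core classes such as String are loaded by the bootstrap class loader,
        // which is typically reported as null.
        System.out.println(String.class.getClassLoader());

        // Application classes are loaded by a loader that is itself a Java object.
        ClassLoader appLoader = ClassLoaderDemo.class.getClassLoader();
        System.out.println(appLoader);

        // Class loaders form a parent chain; walk it up to the top.
        for (ClassLoader cl = appLoader; cl != null; cl = cl.getParent()) {
            System.out.println("  loader: " + cl.getClass().getName());
        }
    }
}
```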

The class loader subsystem is responsible for more than just locating and importing the binary data for classes. It must also verify the correctness of imported classes, allocate and initialize memory for class variables, and assist in the resolution of symbolic references. These activities are performed in a strict order:

  1. Loading: finding and importing the binary data for a type
  2. Linking: performing verification, preparation, and (optionally) resolution
     a. Verification: ensuring the correctness of the imported type
     b. Preparation: allocating memory for class variables and initializing the memory to default values
     c. Resolution: transforming symbolic references from the type into direct references
  3. Initialization: invoking Java code that initializes class variables to their proper starting values
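That initialization is a separate, later step than loading can be observed from Java code. The following is a minimal sketch under stated assumptions: the names InitOrderDemo, Config and EVENTS are hypothetical, and it relies on the language rule that a class literal (Config.class) does not trigger initialization, while reading a non-constant static field does.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {

    // Records the order in which things happen.
    static final List<String> EVENTS = new ArrayList<>();

    static class Config {
        static int size = 10; // assigned during initialization of Config

        static {
            EVENTS.add("Config initialized");
        }
    }

    static List<String> run() {
        EVENTS.add("before class literal");
        Class<?> type = Config.class; // may load the class, but does not initialize it
        EVENTS.add("after class literal: " + type.getSimpleName());
        int size = Config.size;       // first active use: initialization runs here
        EVENTS.add("size read: " + size);
        return EVENTS;
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

In the recorded events, "Config initialized" appears after the class literal has been used and before the value of size is read.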

The Method Area:

Inside a Java virtual machine instance, information about loaded types is stored in a logical area of memory called the method area. All threads share the same method area, so access to the method area's data structures must be designed to be thread-safe. For each type it loads, a Java virtual machine must store the following kinds of information in the method area:

  • The fully qualified name of the type
  • The fully qualified name of the type's direct superclass (unless the type is an interface or class java.lang.Object, neither of which has a superclass)
  • Whether the type is a class or an interface
  • The type's modifiers (some subset of public, abstract, final)
  • An ordered list of the fully qualified names of any direct superinterfaces

In addition to the basic type information listed previously, the virtual machine must also store for each loaded type:

  • The constant pool for the type - an ordered set of the constants used by the type, including literals (string, integer, and floating point constants) and symbolic references to types, fields, and methods
  • Field information - for each field declared in the type, the field's name, type, and modifiers
  • Method information - for each method declared in the type, the method's name, return type, modifiers, and the number and types (in order) of its parameters
  • All class (static) variables declared in the type, except constants
  • A reference to class ClassLoader
  • A reference to class Class

Method Tables:

The type information stored in the method area must be organized to be quickly accessible. In addition to the raw type information listed previously, implementations may include other data structures that speed up access to the raw data. One example of such a data structure is a method table. For each non-abstract class a Java virtual machine loads, it could generate a method table and include it as part of the class information it stores in the method area.

The Heap

Whenever a class instance or array is created in a running Java application, the memory for the new object is allocated from a single heap. As there is only one heap inside a Java virtual machine instance, all threads share it. Because a Java application runs inside its "own" exclusive Java virtual machine instance, there is a separate heap for every individual running application. Two different threads of the same application, however, could trample on each other's heap data. This is why you must be concerned about proper synchronization of multi-threaded access to objects (heap data) in your Java programs.

The Java virtual machine has an instruction that allocates memory on the heap for a new object, but has no instruction for freeing that memory. Usually, a Java virtual machine implementation uses a garbage collector to manage the heap.
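Because all threads of an application share the one heap, access to a shared object has to be synchronized. Here is a minimal sketch, with an invented class SharedCounterDemo, in which two threads update the same heap object. Without the synchronized keyword the final count could come out lower than expected.

```java
public class SharedCounterDemo {

    private int count = 0; // lives in the heap object shared by both threads

    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Runs two threads that each increment the same counter 'perThread' times.
    static int runTwoThreads(final int perThread) {
        final SharedCounterDemo counter = new SharedCounterDemo();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100000)); // 200000
    }
}
```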

The Program Counter

Each thread of a running program has its own pc register, or program counter, which is created when the thread is started. The pc register is one word in size, so it can hold both a native pointer and a returnAddress. As a thread executes a Java method, the pc register contains the address of the current instruction being executed by the thread. An "address" can be a native pointer or an offset from the beginning of a method's bytecodes. If a thread is executing a native method, the value of the pc register is undefined.

The Java Stack

When a new thread is launched, the Java virtual machine creates a new Java stack for the thread. A Java stack stores a thread's state in discrete frames. The Java virtual machine only performs two operations directly on Java Stacks: it pushes and pops frames. When a thread invokes a Java method, the virtual machine creates and pushes a new frame onto the thread's Java stack. This new frame then becomes the current frame. As the method executes, it uses the frame to store parameters, local variables, intermediate computations, and other data.

A method can complete in either of two ways. If a method completes by returning, it is said to have normal completion. If it completes by throwing an exception, it is said to have abrupt completion. When a method completes, whether normally or abruptly, the Java virtual machine pops and discards the method's stack frame. The frame for the previous method then becomes the current frame. All the data on a thread's Java stack is private to that thread.

The stack frame has three parts: local variables, operand stack, and frame data. The sizes of the local variables and operand stack, which are measured in words, depend upon the needs of each individual method. These sizes are determined at compile time and included in the class file data for each method. The size of the frame data is implementation dependent.

When the Java virtual machine invokes a Java method, it checks the class data to determine the number of words required by the method in the local variables and operand stack. It creates a stack frame of the proper size for the method and pushes it onto the Java stack. The Java stack frame includes data to support constant pool resolution, normal method return, and exception dispatch. This data is stored in the frame data portion of the Java stack frame.
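The pushing of one frame per method invocation can be made visible with a stack trace. This is only a sketch with invented names (StackDepthDemo, framesAtDepth); it assumes the JVM reports every frame in the trace, which common JVMs do, although the Throwable documentation allows frames to be omitted in some circumstances.

```java
public class StackDepthDemo {

    // Calls itself n times, then reports how many frames are on this thread's stack.
    static int framesAtDepth(int n) {
        if (n == 0) {
            return new Throwable().getStackTrace().length;
        }
        return framesAtDepth(n - 1); // each call pushes one more frame
    }

    public static void main(String[] args) {
        int base = framesAtDepth(0);
        int deeper = framesAtDepth(3);
        // Three extra invocations mean three extra frames.
        System.out.println(deeper - base);
    }
}
```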

Native Method Stacks

In addition to all the runtime data areas defined by the Java virtual machine specification and described previously, a running Java application may use other data areas created by or for native methods. When a thread invokes a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. A native method can likely access the runtime data areas of the virtual machine (it depends upon the native method interface), but can also do anything else it wants. It may use registers inside the native processor, allocate memory on any number of native heaps, or use any kind of stack.

Execution Engine

At the core of any Java virtual machine implementation is its execution engine. In the Java virtual machine specification, the behavior of the execution engine is defined in terms of an instruction set. For each instruction, the JVM specification describes in detail what an implementation should do when it encounters the instruction as it executes bytecodes, but says very little about how.

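Use the javap tool that ships with the JDK to see the mnemonics in a class file (javap -c prints the bytecode of each method, javap -verbose adds the constant pool and other details). This shows the instructions that the compiler generated for a given class.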
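As a small illustration, here is a class (the name Adder is made up) together with the instructions a Java compiler typically generates for its add method. The exact layout of javap output differs between JDK versions, so the listing in the comment is an outline, not a verbatim transcript.

```java
public class Adder {

    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // 5
    }
}

// After "javac Adder.java", running "javap -c Adder" lists, for add(int, int),
// instructions along these lines:
//
//   iload_0    push local variable 0 (a) onto the operand stack
//   iload_1    push local variable 1 (b) onto the operand stack
//   iadd       pop both ints, push their sum
//   ireturn    return the int on top of the operand stack
```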

Various execution techniques may be used by a JVM implementation: interpreting, just-in-time compiling, adaptive optimization, and native execution in silicon. One of the most interesting, and speediest, execution techniques is adaptive optimization. When an adaptive optimizing virtual machine decides that a particular method is in the hot spot (the 10 to 20 percent of the code that is executed 80 to 90 percent of the time), it fires off a background thread that compiles those bytecodes to native code and heavily optimizes that native code. Meanwhile, the program can still execute the method by interpreting its bytecodes. Because the program isn't held up, and because the virtual machine compiles and optimizes only the hot spot, it has more time than a traditional JIT to perform optimizations.

Thursday, September 10, 2009

Telephonic Interview with Ariba Bangalore.

One of my friends recently attended a telephonic interview with Ariba Technologies. This is the information that he shared with me.

The interview started with the objective of gauging the candidate's skill in core Java. The interviewer described the team that is recruiting. The candidate would be responsible for developing new features in the Ariba Buyer team.

The questions were:
1. Details of the latest project that you are involved in?
This was followed by core Java questions.

2. Difference between the == operator and the equals method?
== checks whether two references point to the same object. The equals method inherited from the Object class does the same, but it can be overridden to check whether two objects are meaningfully equivalent. Say:

Person a = new Person(1);
Person b = new Person(1);
Person c = a;

Person takes an int that sets the id of the Person, and the overridden equals method of Person compares this id.

In this case, the == test returns true only for a and c, while the equals comparison returns true for any pair of a, b and c.
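The Person class itself was not part of the interview notes. A minimal sketch of what it might look like, with equals (and hashCode, which should always be overridden together with it) based on the id, is:

```java
public class Person {

    private final int id;

    public Person(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;                 // same object
        }
        if (!(other instanceof Person)) {
            return false;                // null or a different type
        }
        return id == ((Person) other).id; // meaningful equality: same id
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);     // equal objects must have equal hash codes
    }

    public static void main(String[] args) {
        Person a = new Person(1);
        Person b = new Person(1);
        Person c = a;

        System.out.println(a == c);      // true: same reference
        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same id
        System.out.println(a.equals(c)); // true
    }
}
```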

3. What is the difference between an abstract class and an interface in terms of design?
An abstract class is a class declared abstract; it usually has one or more abstract methods and can also have concrete methods that are shared with its subclasses. An interface is like a strictly abstract class in which all the methods are abstract (this was before Java 8 added default methods). (This answer was considered bookish, and he was asked to answer in terms of design.) Extending an abstract class makes the subclass a kind of that type, and a class can extend only one class. Implementing an interface only adds a capability, so the class can be of other types too.

4. What happens when you run a Java program?
This needs an explanation of the internals of the JVM. Please refer to the post on the JVM.

5. What is the method area in Java?
The method area is a runtime data area in the JVM. Once the class loader has successfully loaded a class, the information about the type is stored in the method area. Refer to the JVM post for more details.

6. What is a pattern? Explain any pattern.
A pattern is a reusable solution to a common design problem encountered by developers. For example, the Decorator pattern is used to add behaviour to an existing class. This gives the flexibility to add behaviour without changing the existing code. An example is the java.io package, where a FileReader or FileWriter can be wrapped in (decorated with) a BufferedReader or BufferedWriter to add buffering and line-oriented access.

The telephonic interview went on for 30 minutes. I will update the post if I get any more details about the personal interview.