HotSpots?!

Many folks were asking me about the Just In Time(aka JIT) heuristics after reading my One Java Two Compilers post

The basic thing that the JIT does is deciding weather a specific portion of the code should get compiled or get executed by the interpreter.

The portions that better get compiled called hotspots“Once the  … optimizer has gathered information during execution about program hot spots, it not only compiles the hot spot into native code”

The JIT has to consider what would be faster, notice that compiling a hot spot takes time, running code line by line is faster than compiling it and execute the compiled native-code, yet in case the code will be called many times in the future, getting the compiled native code out of the cache and execute it is faster than interpreting it

The JIT select hotspots to be compiled based on the prediction of how often they are likely to be called during the rest of the program’s execution.

This prediction is usually done by taking into account two major factors (1) the size of the code (2) the so-far execution statistics

Moreover: upon compiling, the JIT might optimize the code, those optimization’s sometime proven to be wrong and even cause internal JVM errors which get fixed by going back to the beginning of the hotspot and interpreting it line by line (while removing the erroneous compiled code from the memory).

You probably figure by now that the JVM is taking sophisticated statics while running.

Here is a list of possible optimization’s:

  1.  Inline, replacing method call with the code of the called method
  2. Eliminating null-verification’s (e.g. removing “if(x!=null)”)
  3. Stack-allocation (I learned this cool feature about a week ago): JIT tries to identify local variables that are almost surely local to a specific method and allocate them on the stack instead of the heap, this way the GC works faster as it has less garbage to clean..
  4. Solving polymorphic calls, instead of using a virtual table, the JIT perdict which type of object is referencing at.
  5. Array range check, when iterating an array, instead of checking for each index if it’s in the approved range of the array, it check only the first and last indices <- this makes numeric calculations run faster

Extra reading: http://java.sun.com/products/hotspot/whitepaper.html

Advertisements

One Java, Two compilers..

Lately I was discussing Java with few students of mine..

It seems like that for students there is a lot of confusion regarding how Java/The JVM works because there are TWO compilers involve, so when someone mentions a compiler or the Just In Time compiler some of them would imagine it’s the same one, the Java Compiler..

So how does it really works?

It’s simple..

1) You write Java code (file.java) which compiles to “bytecode“, this is done using the javacthe 1st compiler.

It’s well known fact that Java can be written once get compiled and run anywhere (on any platform) which mean that different types of JVM can get installed over any type of platform and read the same good old byte code

2) Upon execution of a Java program (the class file or Jar file that consists of some classes and other resources) the JVM should somehow execute the program and somehow translate it to the specific platform machine code.

In the first versions of Java, the JVM was a “stupid” interprater that executes byte-code line by line….that was extremely slow…people got mad, there were a lot of “lame-java, awesome c” talks…and the JVM guys got irratated and reinvented the JVM.

the “new” JVM initially was available as an add-on for Java 1.2 later it became the default Sun JVM (1.3+).

So what did they do? they added a second compiler.. Just In Time compiler(aka JIT)..

Instead of interpreting line by line, the JIT compiler compiles the byte-code to machine-code right after the execution..

Moreover, the JVM is getting smarter upon every release, it “knows” when it should interpat the code line-by-line and what parts of the code should get compiled beforehand (still on runtime).

It does that by taking real-usage statistics, and a long-list of super-awesome heuristics..

The JVM can get configured by the user in order to disable/enable some of those heuristics..

To summarize, In order to execute java code, you use two different compilers, the first one(javac) is generic and compiles java to bytecode, the second(jit) is platform-dependent and compiles some portions of the bytecode to machine-code in runtime!

Quick JDK 8 Suggestion

JDK 7 is just behind that door, and I am really excited about all the goodies that it brings with it.

While I was trying to benchmark JDK7 vs. older JDKs, I realized that the GC(Garbage Collector) is an unknown factor, i.e. while some piece of code is running, one can never know if the GC is running in parallel , in such a case that specific iteration might take much more time, hence the benchmark data will get corrupted.

So my suggestion is adding an awesome new block type: (JDK 8!!!)

no Garbage Collection block – noGC

noGC{

//some code here;

}

catch(AlmostOutOfMemoryError err){

}

While the code inside the noGC block is running, it’s promised that the GC won’t run in parallel

The AlmostOutOfMemoryError will get thrown in case the Heap is X% full (whereas X is configurable as -xnogcf )

Just to be clear, it wouldn’t help me out with benchmarking since the older JDKs do not support it, yet that was the trigger..

how would such a block would work in a multi-threaded environment is another issue..

In my opinion applications with a tiny real-time need would benefit a lot using such a block, much more than all those real-time Java frameworks out there…

Would love to hear your opinion

Avoid memory leaks using Weak&Soft references

Some Java developers believe that there is no such a thing as memory leak in Java (thanks to the fabulous automatic Garbage Collection concept)

Some others had met the OutOfMemoryError and understood that the JVM has encountered some memory issue but they are not sure if it’s all about the code or maybe even an OS issue…

The OutOfMemoryError API docs reveals that it “Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector. ”

As we know, the JVM has a parameter that represents the maximum heap size(-Xmx), hence we can defiantly try to increase the heap size. yet some code can generate new instances all the time, if those instances are accessible(being referenced by the main program – in a recursive manner) for the entire program life span, then the GC won’t reclaim those instances. hence the heap will keep increasing and eventually a OutOfMemoryError will be thrown <- we call that memory leak.

Our job as Java developers is to release references (that are accessible by the main program) that we won’t use in the future. by doing that we are making sure that the GC will reclaim those instances (free the memory that those instances occupying in the heap).

In some cases we reference an instance from 2 different roots. one root represent a fast-retrieval space(e.g. HashMap) and the other manages the real lifespan of that instance. Sometimes we would like to remove the reference of that instance from one root and get the other root(fast retrieval) reference removed automatically.

We wouldn’t want to do it manually due to the fact that we are not C++ developers and we wouldn’t like to manage the memory manually..

Weak references

In order to solve that we can use WeakReference.

Instances that are being referenced by only Weak references will get collected on the next collection! (Weakly reachable), in other words those references don’t protect their value from the garbage collector.

Hence if we would like to manage the life span of an instance by one reference only, we will use the WeakReference object to create all the other references. ( usage: WeakReference wr = new WeakReference(someObject);)

In some apps we would like to add all our existing references to some static list, those references should not be strong, otherwise we would have to clean those references manually, we would add those references to the list using this code.

public static void addWeakReference(Object o){
 refList.add(new WeakReference(o));
}

since most of the WeakReferences use cases needs a Map data structure, there is an implementation of Map that add a WeakReference automatically for you – WeakHashMap

Soft References

I saw few implementations of Cache using weak references (e.g. the cache is just a WeakHashMap => the GC is cleaning old objects in the cahce), without WeakReferences naive cache can easily cause memory leaks and therefor weak references might be a solution for that.

The main problem is that the GC will clean the cached-object probably and most-likely faster then you need.

Soft references solve that, those references are exactly like weak references, yet the GC won’t claim them as fast. we can be sure that the JVM won’t throw an OutOfMemory before it will claim all the soft and weak references!

using a soft references in order to cache considered the naive generic cache solution. (poor’s men cache)

( usage:SoftReference sr = new SoftReference(someObject);)

How-to speed-up your java code myths

In my last post I covered tips that I  have collected trough out the years on how to speed up your java code,

After reviewing the tips and reading my friends criticism, I updated the list and created a new list of myths, here it is:

final: developers might think that final methods are more efficient due to the fact that the compiler will be able to inline those methods. it’s false, imagine that you are compiling the class Main with the class Inline, the non static method Main.main() creates an instance of Inline and invokes the method inline.finalMethod() which is final. on compile time everything looks great, yet in runtime we might use a different version of the compiled Inline class whereas the finalMethod is not final and can be overwritten….

Synchronization blocks: old VMs used to pay a lot of overhead for running a synchronized method, new VMs mostly knows how to trace a synchronized method that is not running concurrently and treat it as a non-synchronized one.

Calling the garbage collection manually: calling the garbage collector manually (System.gc()) is usually a mistake, the new VMs garbage collection mechanism are state-of-the-art and most likely it will invoke the GC on a better timing. moreover manual GC triggers a full collection of all generations -> that’s not a smart move.

Object pooling: allocating object on the heap is not cheep but for non-complex objects it’s not that expensive as well, design an object-pooling for simple object will cause an over-head of managing the pool  in many cases.

In general it seems like performance tips should always be revisited since new compilers and VMs try to solve exactly those problems.

Immutable objects: in general immutable objects has many advantages (1) automate thread-safety (2) their hashCode value is cacheable (3) easy to work with

a quote from Effective Java: “Classes should be immutable unless there’s a very good reason to make them mutable……..If a class cannot be made immutable, limit its mutability as much as possible.”