Burrito, delicious static-analysis tool for JavaScript

Few hours ago I got introduced to burrito, a JavaScript lib that can analyze JavaScript code in a super simplified way.

I start playing with it and I love it!

https://github.com/substack/node-burrito

It built up to work on top of node.js

Here is a simple example: generating a list of all the functions

var fs = require('fs');
var burrito = require('burrito');
var functionList = [];
fs.readFile('someFile.js', function (err, data) {
	  if (err) throw err;
	  var dataStr = data.toString();
	  burrito(dataStr, function (node) {
		  if (node.name === 'function' || node.name === 'defun'){
                        functionList.push(node.value[0]);
		  }
		});
});

This is fun!

You can even wrap any function-invocation easily utilizing node.wrap(—)..

This makes profiler development much easier than ever before..

HotSpots?!

Many folks were asking me about the Just In Time(aka JIT) heuristics after reading my One Java Two Compilers post

The basic thing that the JIT does is deciding weather a specific portion of the code should get compiled or get executed by the interpreter.

The portions that better get compiled called hotspots“Once the  … optimizer has gathered information during execution about program hot spots, it not only compiles the hot spot into native code”

The JIT has to consider what would be faster, notice that compiling a hot spot takes time, running code line by line is faster than compiling it and execute the compiled native-code, yet in case the code will be called many times in the future, getting the compiled native code out of the cache and execute it is faster than interpreting it

The JIT select hotspots to be compiled based on the prediction of how often they are likely to be called during the rest of the program’s execution.

This prediction is usually done by taking into account two major factors (1) the size of the code (2) the so-far execution statistics

Moreover: upon compiling, the JIT might optimize the code, those optimization’s sometime proven to be wrong and even cause internal JVM errors which get fixed by going back to the beginning of the hotspot and interpreting it line by line (while removing the erroneous compiled code from the memory).

You probably figure by now that the JVM is taking sophisticated statics while running.

Here is a list of possible optimization’s:

  1.  Inline, replacing method call with the code of the called method
  2. Eliminating null-verification’s (e.g. removing “if(x!=null)”)
  3. Stack-allocation (I learned this cool feature about a week ago): JIT tries to identify local variables that are almost surely local to a specific method and allocate them on the stack instead of the heap, this way the GC works faster as it has less garbage to clean..
  4. Solving polymorphic calls, instead of using a virtual table, the JIT perdict which type of object is referencing at.
  5. Array range check, when iterating an array, instead of checking for each index if it’s in the approved range of the array, it check only the first and last indices <- this makes numeric calculations run faster

Extra reading: http://java.sun.com/products/hotspot/whitepaper.html

JavaScript & Hashtable

JavaScript doesn’t come with a native Hashtable object.

Yet, In Javascript, one can use any object as an associative array, similar to a Hashtable structure.

Since one can add to an object any arbitrary property on the fly. It turns out that any object is just a set of key/value pairs.

The only constraint  is that the key must be String.

e.g.

var myStrMap = new Object(); //init an object

myStrMap ["key"] = value; //put a value

var val = myStrMap ["key"] ; // get a value

Notice that if one would use non-string object as a key, JavaScript will translate it to string (utilizing the toString() function),

This can cause serious problems since some different objects might return the same toString() value..

Moreover all objects that don’t implement the toString() function will return the same value.. (mostly “[Object object]”)

In order to use non-string object as a key one could do one of the following:

  • Implement hash(myObject) function
  • Implement myObj.toString() function, bare in mind toString() might be needed for other stuff..
  • Use or develop hashtable library for javascript (e.g. http://www.timdown.co.uk/jshashtable/ )

If you are using Chrome/Chrome Frame/NodeJS and more, it means that you are using Google’s JavaScript Virtual Machine,V8, behined the scene

Specifically V8 doesn’t implement Object properties access as hashtable, it actually implement it in a better way (performance wise)

So how does it work? “V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes” – that make the access to properties almost as fast as accessing properties of Java/C++ objects.

Why? because in fixed class each property can be found on a specific fixed offset location of the memory..

So in general accessing property of an object in V8 is faster than Hashtable..which mean that if one implements Hashtable (string keys) utilizing the object.properties approche, he/she will mostly get better performance than utilizing an Hashtable in Java where Hashtable come out of the box

More info can be found here: http://code.google.com/intl/sv/apis/v8/design.html#prop_access

The good news is that the next JavaScript version will probably ship with a native map object as suggested on ECMAScript – harmony 

One Java, Two compilers..

Lately I was discussing Java with few students of mine..

It seems like that for students there is a lot of confusion regarding how Java/The JVM works because there are TWO compilers involve, so when someone mentions a compiler or the Just In Time compiler some of them would imagine it’s the same one, the Java Compiler..

So how does it really works?

It’s simple..

1) You write Java code (file.java) which compiles to “bytecode“, this is done using the javacthe 1st compiler.

It’s well known fact that Java can be written once get compiled and run anywhere (on any platform) which mean that different types of JVM can get installed over any type of platform and read the same good old byte code

2) Upon execution of a Java program (the class file or Jar file that consists of some classes and other resources) the JVM should somehow execute the program and somehow translate it to the specific platform machine code.

In the first versions of Java, the JVM was a “stupid” interprater that executes byte-code line by line….that was extremely slow…people got mad, there were a lot of “lame-java, awesome c” talks…and the JVM guys got irratated and reinvented the JVM.

the “new” JVM initially was available as an add-on for Java 1.2 later it became the default Sun JVM (1.3+).

So what did they do? they added a second compiler.. Just In Time compiler(aka JIT)..

Instead of interpreting line by line, the JIT compiler compiles the byte-code to machine-code right after the execution..

Moreover, the JVM is getting smarter upon every release, it “knows” when it should interpat the code line-by-line and what parts of the code should get compiled beforehand (still on runtime).

It does that by taking real-usage statistics, and a long-list of super-awesome heuristics..

The JVM can get configured by the user in order to disable/enable some of those heuristics..

To summarize, In order to execute java code, you use two different compilers, the first one(javac) is generic and compiles java to bytecode, the second(jit) is platform-dependent and compiles some portions of the bytecode to machine-code in runtime!

Asynchronous Semaphore

Semaphore

Semaphore is a super simple notion, in short it is a “method of controlling access by several processes to a common resource”

Think about a Database that can provide no more than 5 parallel connections OR a bit more complex, a buffer that has X slots with few parallel readers and writers..  ( readers-writers problem)

When process A “need” a unit of a specific resource, process A calls the P() method which checks: if the semaphore value is larger than 0 (which mean that the resource has some free units) then it decreases the semaphore in order to state that one of the units is now taken. otherwise P() makes process A to wait until some other process increases the semaphore value.

When process A doesn’t need the unit anymore, process A increases the semaphore value ( by calling the V() method which also signal the other processes that have been blocked).

As for naming conventions, in case the resource has only one unit we call it a binary semaphore otherwise it’s a counting semaphore.

** it’s important to mention that P() and V() are atomic methods.

Asynchronous Semaphore

This was a short description of a Semaphore, but what is Asynchronous Semaphore (hereinafter: ASemaphore)?

In the asynchronous programming world, sometimes we would like to call N async methods, imagine that we would like to invoke a different F method (async or sync) only after ALL the N methods got invoked.

Asynchronous Semaphore is a construct that support this requirement.

(In short asynchronous method is a method that will get invoked later on in time, yet the thread doesn’t block and wait() for its results, the thread runtime continues to the next line of code. the async method usually get a “callback() method” as a parameter and invoke it when the results arrive.  ( E.g. setTimeout,  DHTML events, most WebServices invocations))

The ASemaphore works a bit different than the Semaphore, it gets the F function as a parameter , and it fires it only when the counting variable equals to zero, before calling a async method we increases the counter and when the async method get invoked we decreases the counter.

Here are the details: One should initialize the ASemaphore with the F method and increases it (calling V()) to one.

Before calling a async method, One should increase the ASemaphore (V())

Internally to the callback() function one should decrease the semaphore (P()).

After calling all the N async methods, One should decrease (p()) the ASemaphore.

The Asemaphore fires() only when the counting variable equals to ZERO.

* notice that the ASemaphore is a solution for the asynchronous world for a specific process and therefor parallelism is not an issue ( the methods P() and V() aren’t atomic)
** notice that it’s irrelevant to use a binary ASemaphore, in case of only 1 async call you could easily invoke the F() method internal to the async method callback()

Examples

1) In my Alligator project, when rendering a page on the server-side, one might develop a JSSP scirpt that invokes few asynce calls (e.g. putting a variable on the memcahced, dealing with the file system, connecting a database), yet Alligator needs the send the HTTP response back to the client only when all those methods have been invoked…

2) One had like to call 10 timers with random time and notify only when the last one is all set.

JavaScript Impl’

I have implemented an ASeamphore in JavaScript, you can find it on my github repository.

The actual code can be found here

var ASemaphore = function(semaphore,fireFunc){
	this.semaphore = (semaphore==undefined?0:semaphore);
	this.fire = fireFunc;
	log.debug("Starting Asemaphore with semaphore:" +this.semaphore);
};

ASemaphore.prototype.v = function(){
	++this.semaphore;
	log.debug("Asemaphore semaphore after v(): " +this.semaphore);
	return this.semaphore;
};

ASemaphore.prototype.p = function(){
	if((--this.semaphore)<1){
		log.debug("Asemaphore fire() after p()");
		this.fire.apply(this,arguments);
	}
	//log.debug("Asemaphore counter after p(): " +this.counter);
	return this.counter;
};

JSSP – JavaScript Server Page engine on top of NodeJS (Alligator)

I would like to introduce Alligator, a JSP/PHP/ASP-style template engine for JavaScript on top of NodeJS.

Most web-apps have been written in Java, PHP, ASP.net, Ruby or Python, on the server side, and JavaScript on the client side.

This approach of using different technologies and different programming languages for the different “sides” has few drawbacks:

1) Serialization of objects from one language to another has to be taken care of , e.g. take a look at GSON, a library that translates Java-to-JSON and vice versa.

2) Client-side code can’t be reused on the server-side and vice versa.

3) Web-apps “factories” has to hire different people with different expertises.

4) I bet you can think of at least 2 more drawbacks

Other than that, the multi-threaded approach that most of the current web-server/application-server utilize is problematic for load since it is limited by the maximum number of OS thread per machine.

NodeJS solved the later issue, by taking advantage of asynchronous I/O and the event-loop paradigm ( I will blog about it later on.. )

Moreover, NodeJS, let you write JavaScript code that runs outside a browser (utilizing the V8 runtime) + it includes many useful built-in Async I/O libraries as HttpServer and File System I/O management.

In the last few weeks, I have developed Alligator on top of this top-notch technology.

Alligator enable you to write PHP-style code but in the coolest and most popular language, JavaScript!!! 🙂

Let’s “Hello Word”-ing using Alligator:

<HTML>
   <BODY>
   <?="Hello World"?>
   </BODY>
</HTML>

In this example the <?= code?> just print whatever code.toString() evaluates to, yet it does that on the Server-Side..

Lets take a look at something a bit more “complex”:

<HTML>
   <BODY>
      <?  var test = 1;
          if (1 == test){?>
          YOOOO!!!!
          <?}?>
   </BODY>
</HTML>

Well, it’s pretty straight forward, the <? code ?> get rendered on the server-side, which mean that if test equals 1
then the String “YOOOO!!!!” will be written to the HTTP response.

Alligatror supports 3 type of object scopes, request, session and application, both session and application support set() and get() methods that receive key, value (if needed) and a callback function.  **One must pay attention to the fact that a callback function might get invoked later on in time regardless it’s chronological line in the code block.

The request parameters can be accessed easily using request.parameters.myKey..

Lets see the counter example which count how many people have been visit this page since the application is alive.

In this example, we have separated the logic and view and used the “commands.forward() built-in command” in order to pass the logic outcome to the view..

logic.jssp:

<?
	var counter = 1;
	application.get("counter",function(value){
		log.debug("ApplicationLOGIC.JSSP, value - " +value);
		if(value == undefined){
			application.set("counter",1);
		}else{
			counter = value+1;
			application.set("counter",counter);
		}
		request.parameters.counter = counter;
		commands.forward("counter/view.jssp");
	});
?>

view.jssp:

<HTML>
	<HEAD><TITLE>Application Scope Counter Tester</TITLE></HEAD>
	<BODY>
	<?
		var counter = request.parameters.counter;
		if(counter==1)
			commands.write("First Time");
		else
			commands.write("Number of users:" + counter);

	?>
	</BODY>
</HTML>
 

The main drawback of NodeJS is that it utlize only O(1) threads per instance, which mean that only one core can be used in a multi-core CPU system, in order to solver it, Alligator is using multi-node and memcached.

Multi-node duplicates the Node process as many time as you want, the smart thing to do is to set it to the number of available cores.

Memcached served as a shared memory between those processes.

One can easily set up Alligator to “work” on a distributed network, using a load balancer.

This is pretty existing, isn’t it? How can you start using it?
1 ) Install NodeJS – http://nodejs.org/#download

2 ) Extract Alligator somewhere – http://github.com/mrohad/Alligator/downloads

3 ) Setup alligator, by modifying settings.json -> read more here – http://github.com/mrohad/Alligator
the main thing you need to set is the root folder and the port.
I must mention that Alligator is built on top of anti-node and serves static files as well!

One can setup the server-side tags notations, for instance if you are a Java-person you can change the <?= code?> notation to <%=code%> in a super easy fashion.

Also, Alligator enables you to change the extension of the server-side script files, the default is JSSP.

If you would like to utilize a multi-core architecture, you should download memcached and set up the memcached values ( server, port ..)

Example for a setting file:

{
	"web_app_name" : "Alligator Test App",
	"port"         : 80,
	"path"         : {
				"root":"/home/vadmin/ws/Alligator/WWW/",
				"lib":"/home/vadmin/ws/Alligator/WWWlib/"
			},
	"server_script": {
				"ext":"jssp",
				"begin":"<?",
				"begin_additional_write":"=",
				"end":"?>",
				"session_minutes":30,
				"memcached":{
						"enable":1,
						"server":"localhost",
						"port":11211
					}
			 },
	"debug_mode"   : 1,
	"nodes"	      : 2
}

5 ) Run memcached ( if needed)

6 ) Run Alligator – $ sudo node alligator.js

7 ) Add your first .jssp file to the root folder

8 ) Test it , go to http://yourServer:port/yourfile.jssp

We are looking for people who would like to contribute code and suggest featuressssssss 🙂

Enjoy!!

P!=NP?!

WOW, Vinay Deolalikar from HP Labs claimed that he has proved that P!=NP…

The PDF can be found here: http://www.scribd.com/doc/35539144/pnp12pt (~100 pages of complexity )

More info and thoughts can be found here and here

Personally, I believe Deolalikar , yet I only read the abstract, but it seems like a serious guy who wouldn’t publish anything without a proper unitTest 🙂

**EDIT**:unfortunately, “they” found bugs in his proof…

From Wikipedia – “The proof has been reviewed publicly by academics,[22] and it was found to contain irreparable conceptual-level errors as well as a number of concrete errors and flaws.[23]

Coolest Google search’s new feature

Cong’ to Google for changing the search results page layout.

There is one feature which that I find extremely useful.

The “anytime” one, where once can search for “pages” that has been updated in the last X month/weeks/days

Moreover, one can search for “pages” between a range , e.g. between last July to last Aug..

Why do I find it that useful?

1st because I sent such a “feature request” my self 🙂  (2 years ago!!!)

2nd, in many cases I found my self searching for some “event”, yet the first results were much older “pages” then those I expected, Especially the keyword “new” has no meaning since things were “new” back in 2005 as well.

e.g when you search for “new features in Java” you will probably get new features in JDK 1.4 as those pages have much higher PageRank, yet you probably searched for what’s new in Java for the last few months…NOW you can do that.

Another cool feature that solves the same problem is the “Sorted by date” feature

GoodJob

Quick JDK 8 Suggestion

JDK 7 is just behind that door, and I am really excited about all the goodies that it brings with it.

While I was trying to benchmark JDK7 vs. older JDKs, I realized that the GC(Garbage Collector) is an unknown factor, i.e. while some piece of code is running, one can never know if the GC is running in parallel , in such a case that specific iteration might take much more time, hence the benchmark data will get corrupted.

So my suggestion is adding an awesome new block type: (JDK 8!!!)

no Garbage Collection block – noGC

noGC{

//some code here;

}

catch(AlmostOutOfMemoryError err){

}

While the code inside the noGC block is running, it’s promised that the GC won’t run in parallel

The AlmostOutOfMemoryError will get thrown in case the Heap is X% full (whereas X is configurable as -xnogcf )

Just to be clear, it wouldn’t help me out with benchmarking since the older JDKs do not support it, yet that was the trigger..

how would such a block would work in a multi-threaded environment is another issue..

In my opinion applications with a tiny real-time need would benefit a lot using such a block, much more than all those real-time Java frameworks out there…

Would love to hear your opinion