Current Projects – Part II: Where the hell is that class? Or, The Jar Indexer


First, a little background for the novice developer:

As any (Java) developer knows, an application depends on a ton of external classes. These dependencies are pulled in via “import” statements (or fully qualified class names) in the code itself. Every language has this type of mechanism in one form or another. It’s a very simple and straightforward process.

Problems soon follow, however, as the compiler needs to be able to find these resources to successfully build the software. Each language has its own dependency packaging scheme and also a means of finding these packages. Java uses .jar files, which are really nothing more than zip files with a specific layout. These jar files need to be included on the application’s “classpath” (a special variable that Java consults in order to find its compile-time and run-time dependencies) in order to compile and run.

Ok, now that the primer is out of the way we’ll get down to the issue at hand.

I actually spend a lot of my time going from project to project converting them to use the Maven build process. This entails finding and declaring every top-level dependency that is required at test, compile, and run time. Keep in mind that enterprise-level applications have dozens of dependencies. As you can imagine, this is a daunting and very boring task.

Why so daunting, you ask? To be honest, it’s a basic flaw of the Java dependency packaging scheme. The class is the basic unit of Java, and it is on classes that software depends. Classes, to maintain a unique namespace, are housed in packages – which are no more than a directory structure inside of the jar file itself (e.g. the class com.mgensystems.TestClass is stored as com/mgensystems/TestClass.class). All that is fine and good, but Java doesn’t enforce any sort of standard tying the name of a jar file to what it contains. A single jar can contain any number of package and class combinations. Now you can start to see my problem – not being able to locate a jar that contains a particular package / class combination…
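To make the package-to-directory mapping concrete, here is a minimal sketch using only the JDK. The class and method names (JarEntryNames, toClassName, packageOf) are made up for illustration and are not part of the Jar Indexer itself.

```java
import java.util.Optional;

public class JarEntryNames {

    /**
     * Converts a jar entry path such as "com/mgensystems/TestClass.class"
     * into a fully qualified class name. Returns empty for entries that
     * are not class files (manifests, resources, directories).
     */
    public static Optional<String> toClassName(String entryName) {
        String suffix = ".class";
        if (!entryName.endsWith(suffix)) {
            return Optional.empty();
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - suffix.length());
        return Optional.of(withoutSuffix.replace('/', '.'));
    }

    /** Returns the package part of a class name, or "" for the default package. */
    public static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    public static void main(String[] args) {
        String entry = "com/mgensystems/TestClass.class";
        String className = toClassName(entry).orElse("(not a class)");
        System.out.println(className + " lives in package " + packageOf(className));
    }
}
```

Notice that nothing in the entry path says anything about the name of the jar it came from, which is exactly why finding the right jar is guesswork without an index.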

Well, being lazy at heart, I decided to come up with a solution that would allow me to find these pesky dependencies with a lot less effort. What I wanted to do was find some way to record every jar file on the filesystem and have every package and class they contained indexed so that I could find a specific jar file when looking for a particular class. What I came up with is the very appropriately named Jar Indexer. And yes – I do this a lot. Some folk would call it lazy – I call it being ingenious and efficient 🙂

The Jar Indexer is, at its core, very simple. It utilizes Lucene, an open source indexing engine from the Apache project. The Lucene engine allows you to index whatever tidbits of information you want and provides you the means to search on them – which sounds like something that could be very handy!

Having the most difficult part of the project done for me (thank you open source!) all that I needed to do was a) provide the engine with the information to index, and b) wrap a simple interface around the index and search functions.

To accomplish Part A of the design I chose to rely upon another open source tool from the Jakarta Commons project – commons-io. Commons-IO allows me to specify a root location and have a list of recursively scanned files returned to me based on a filter – how nice is that? I take that output and break open each jar, noting every package and class it contains. I then pass that information into Lucene to index. Of course it was a little more complicated than that (I added a caching mechanism and whatnot for speed plus some other pieces of niftiness), but that was the high level process in a nutshell.
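The scan-and-index loop described above can be sketched with nothing but the JDK. To be clear, this is not the Jar Indexer's code: it swaps in Files.walk for commons-io and a plain in-memory map for the Lucene index, and the class name SimpleJarIndex and its methods are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SimpleJarIndex {

    // Fully qualified class name -> jars containing it (a stand-in for Lucene).
    private final Map<String, List<Path>> jarsByClass = new TreeMap<>();

    /** Recursively finds every *.jar file under root and indexes its classes. */
    public void indexTree(Path root) throws IOException {
        List<Path> jars;
        try (Stream<Path> paths = Files.walk(root)) {
            jars = paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".jar"))
                        .collect(Collectors.toList());
        }
        for (Path jar : jars) {
            indexJar(jar);
        }
    }

    /** Opens one jar and records each .class entry under its class name. */
    public void indexJar(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    // "com/foo/Bar.class" -> "com.foo.Bar"
                    String className = name.substring(0, name.length() - ".class".length())
                                           .replace('/', '.');
                    jarsByClass.computeIfAbsent(className, k -> new ArrayList<>()).add(jar);
                }
            }
        }
    }

    /** Returns the jars that contain the given class, or an empty list. */
    public List<Path> find(String className) {
        return jarsByClass.getOrDefault(className, new ArrayList<>());
    }

    public static void main(String[] args) throws IOException {
        SimpleJarIndex index = new SimpleJarIndex();
        index.indexTree(Paths.get(args[0]));              // directory to scan
        index.find(args[1]).forEach(System.out::println); // class to look up
    }
}
```

A map like this only answers exact class-name lookups and vanishes when the program exits. A persistent, searchable index is what Lucene brings to the real tool. Also note that this sketch stops at the first unreadable directory or corrupt jar, which a scan of an entire filesystem would need to tolerate.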

Now for Part B – a simple user interface. I again called upon the open source community and used another Jakarta Commons project – commons-cli. This handy API provides an easy way to create a command-line interface. You specify the arguments and it will parse them. Extremely simple, but it’s a tremendous time saver.

Below is an example of the command-line for indexing using the Jar Indexer:

usage: JarIndexer -p paths -i path
 -i,--index-path    full path to the location of the index
 -p,--jar-paths     comma separated full paths to directories
                    containing jars to index

and the command-line for searching the index:

usage: JarIndexer [-q queries] [-o format] [-l repo] -i index [-f field]
 -q,--path-to-file      file containing multiple queries
 -o,--result-format     set to 'm' for maven dependency
 -l,--repo-location     the base path of the maven repository for
                        building the dep section
 -f,--field-name        sets the default field
 -i,--index-path        full path to the location of the index
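For the curious, here is one plausible way a Maven dependency section could be built from a jar's location and the repository base path. This is a sketch and not the Jar Indexer's actual code: it assumes the jar sits in a repository using the standard Maven layout (group directories, then artifactId, then version, then the jar), and the class name MavenCoordinates, the method toDependencyXml, and the /repo base path in the example are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class MavenCoordinates {

    /**
     * Builds a dependency snippet from a jar path inside a repository laid out as
     * repoRoot/group/dirs/artifactId/version/artifactId-version.jar
     */
    public static String toDependencyXml(Path repoRoot, Path jar) {
        Path relative = repoRoot.relativize(jar);
        int count = relative.getNameCount();
        if (count < 4) {
            throw new IllegalArgumentException("Not in repository layout: " + jar);
        }
        // The last three path elements are artifactId, version, and the jar file.
        String version = relative.getName(count - 2).toString();
        String artifactId = relative.getName(count - 3).toString();
        // Everything before the artifactId directory forms the groupId.
        StringBuilder groupId = new StringBuilder();
        for (int i = 0; i < count - 3; i++) {
            if (i > 0) {
                groupId.append('.');
            }
            groupId.append(relative.getName(i).toString());
        }
        return "<dependency>\n"
             + "  <groupId>" + groupId + "</groupId>\n"
             + "  <artifactId>" + artifactId + "</artifactId>\n"
             + "  <version>" + version + "</version>\n"
             + "</dependency>";
    }

    public static void main(String[] args) {
        Path repo = Paths.get("/repo"); // hypothetical repository base path
        Path jar = repo.resolve("org/apache/lucene/lucene-core/2.4.0/lucene-core-2.4.0.jar");
        System.out.println(toDependencyXml(repo, jar));
    }
}
```

The catch is that this only works for jars that already live in a repository. A jar found somewhere else on the filesystem carries no reliable coordinates in its path.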

When this application is run on the root of the filesystem it will index every jar / package / class on your machine and you can easily find in what jar any particular class is located… very handy indeed…

You can download the Jar Indexer from here.

What’s next for Jar Indexer? More information indexed, more output formats, and a fancy GUI… I’ll keep you posted. Also, any ideas would be fantastic as well.

I hope this tool will be useful to folks. But more importantly, it was the first step toward my goal of automating the Maven setup for new and existing projects. Be sure to read my next blog post on the Maven Source Dependency Scanner!


Current Projects – Part I: A Preface


What do I do?

Well, I’m a software developer / architect by trade. I’ve worked for various consulting companies over the years and have worked for dozens of clients using a half dozen languages on a bevy of platforms…

Now that we have that out of the way, my main focus over the last couple of years – outside of straightforward development – has been continuous integration, or more precisely the implementation of continuous integration in the enterprise.

Don’t know what that is? Check out the Wikipedia entry for a decent overview. If you are still interested after that thrilling description – please continue reading…

Nowadays I mostly work in the Java domain – so keep that in mind when I talk about the toolsets of CI. This is particularly important as the build tool of choice (at least for me and what I’ve implemented for the enterprise at Nationwide Insurance) is Maven – which you can check out at the Apache Maven site…

Maven deals with the “how” of compiling a project and producing an artifact. Unlike other build tools available, it is built from the ground up to handle the dependencies of a project (in this case other jar, sar, and similar *ar files). Dependencies are the main headache in any build / deployment scenario, from simply finding out what they are and where they are located to working out which version of each is needed (including their transitive dependencies).

It can be very nightmarish…

In any case, this series will not be a Maven tutorial. The projects I’ve created are for use (mainly) with Maven, and I will go over the pertinent details in a JIT manner (just-in-time for you non-Java folk) so that you won’t be completely lost 🙂

I’m just getting back into blogging – so be gentle…

Next Up – The jar-indexer