Current Projects – Part II: Where the hell is that class? Or, The Jar Indexer


First, a little background for the developer novice:

As any (Java) developer knows, an application under development pulls in a ton of external class dependencies. These dependencies are declared via “import” statements (or fully qualified class names) in the code itself. Every language has this type of mechanism in one form or another. It’s a very simple and straightforward process.

Problems soon follow, however, as the compiler needs to be able to find these resources to successfully build the software. Each language has its own dependency packaging scheme and a means of finding those packages. Java uses .jar files, which are really nothing more than zip files with a specific layout. These jar files need to be included on the application’s “classpath” (a special variable that Java consults to find its compile-time and run-time dependencies) in order to compile and run.

Ok, now that the primer is out of the way we’ll get down to the issue at hand.

I actually spend a lot of my time going from project to project converting them to use the Maven build process. This entails finding and declaring every top-level dependency that is required at test, compile, and run time. Keep in mind that enterprise-level applications have dozens of dependencies. As you can imagine this is a daunting and very boring task.

Why so daunting, you ask? To be honest, it’s a basic flaw of the Java dependency packaging scheme. The class is the basic unit of Java, and it is on classes that software depends. Classes, to maintain a unique namespace, are housed in packages – which are no more than a directory structure inside of the jar file itself (i.e. com.mgensystems.TestClass). All that is fine and good, but Java doesn’t force any sort of standard on naming a jar file by what it contains. A single jar can contain any number of package and class combinations. Now you can start to see my problem – not being able to locate the jar that contains a particular package / class combination…
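To make the package-inside-a-jar layout concrete, here is a small sketch (the class and method names are mine, not from the actual Jar Indexer) that uses nothing but the JDK to list every class a given jar contains:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarClassLister {

    // A jar entry like "com/mgensystems/TestClass.class" maps to the
    // fully qualified class name "com.mgensystems.TestClass".
    public static String entryToClassName(String entryName) {
        return entryName
                .substring(0, entryName.length() - ".class".length())
                .replace('/', '.');
    }

    // Open the jar and collect the fully qualified name of every class entry.
    public static List<String> listClasses(String jarPath) throws IOException {
        List<String> classes = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.getName().endsWith(".class")) {
                    classes.add(entryToClassName(entry.getName()));
                }
            }
        }
        return classes;
    }
}
```

Nothing in that listing tells you which jar file to open in the first place – which is exactly the problem.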

Well, being lazy at heart, I decided to come up with a solution that would allow me to find these pesky dependencies with a lot less effort. What I wanted to do was find some way to record every jar file on the filesystem and have every package and class they contained indexed, so that I could find a specific jar file when looking for a particular class. What I came up with is the very appropriately named Jar Indexer. And yes – I do this a lot. Some folk would call it lazy – I call it being ingenious and efficient 🙂

The Jar Indexer is, at its core, very simple. It utilizes Lucene, an open source indexing engine from the Apache project. The Lucene engine allows you to index whatever tidbits of information you want and provides the means to search on them – which sounds like something that could be very handy!

Having the most difficult part of the project done for me (thank you, open source!), all that I needed to do was a) provide the engine with the information to index, and b) wrap a simple interface around the index and search functions.

To accomplish Part A of the design I chose to rely upon another open source tool from the Jakarta Commons project – commons-io. Commons-IO allows me to specify a root location and have a list of recursively scanned files returned to me based on a filter – how nice is that? I take that output and break open each jar, noting every package and class it contains. I then pass that information into Lucene to index. Of course it was a little more complicated than that (I added a caching mechanism and whatnot for speed plus some other pieces of niftiness), but that was the high level process in a nutshell.
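The recursive scan itself is simple enough that it can be sketched with just the JDK – the following stand-in (my own illustrative code, not the Jar Indexer’s, which uses commons-io) walks a root directory and keeps only the .jar files:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarScanner {

    // The filter: keep only files whose name ends in ".jar".
    public static boolean isJar(Path path) {
        return path.getFileName().toString().toLowerCase().endsWith(".jar");
    }

    // Recursively walk the tree under root and return every jar found.
    public static List<Path> findJars(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                        .filter(JarScanner::isJar)
                        .collect(Collectors.toList());
        }
    }
}
```

Each jar on that list then gets cracked open and its package / class entries handed to Lucene for indexing.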

Now for Part B – a simple user interface. I again called upon the open source community and used another Jakarta Commons project – commons-cli. This handy API provides an easy way to create a command-line interface. You specify the arguments and it will parse them. Extremely simple, but it’s a tremendous time saver.
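To give a feel for the kind of drudgery commons-cli saves you from, here is a bare-bones “-flag value” parser of my own (purely illustrative – the real library also handles long options, help text, and validation):

```java
import java.util.HashMap;
import java.util.Map;

public class MiniCli {

    // Walk the args two at a time, treating "-x value" pairs as options.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length - 1; i += 2) {
            if (args[i].startsWith("-")) {
                options.put(args[i].substring(1), args[i + 1]);
            }
        }
        return options;
    }
}
```

With commons-cli you declare the options once and get parsing, validation, and the usage text below essentially for free.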

Below is an example of the command-line for indexing using the Jar Indexer:

usage: JarIndexer -p paths -i path
-i,--index-path   full path to the location of the index
-p,--jar-paths    comma separated full paths to directories
                  containing jars to index

and the command-line for searching the index:

usage: JarIndexer [-q queries] [-o format] [-l repo] -i index [-f field]
-q,--path-to-file    file containing multiple queries
-o,--result-format   set to 'm' for maven dependency
-l,--repo-location   the base path of the maven repository for
                     building the dep section
-f,--field-name      sets the default field
-i,--index-path      full path to the location of the index

When this application is run on the root of the filesystem, it will index every jar / package / class on your machine and you can easily find in which jar any particular class is located… very handy indeed…

You can download the Jar Indexer from sourceforge.net here.

What’s next for Jar Indexer? More information indexed, more output formats, and a fancy GUI… I’ll keep you posted. Any ideas would be fantastic as well.

I hope this tool will be useful to folks. But more importantly, it was the first step in my goal of automating the Maven setup for new and existing projects. Be sure to read my next blog on the Maven Dependency Source Scanner!


Current Projects – Part III: The Maven Dependency Source Scanner


Maven has a marvelous dependency management feature which allows an application to declare its top-level dependencies and then have Maven not only automatically bundle them with the application – but also discover the transitive dependency graph of all of the declared dependencies. This is an extremely useful and time-saving process and allows for a bevy of management options when it comes to the project itself.
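For example, a single declaration like the following in a pom.xml (the coordinates here are just illustrative) is enough for Maven to fetch the artifact and its entire transitive graph:

```xml
<dependencies>
  <!-- One declared top-level dependency; Maven resolves everything it
       in turn depends on. -->
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>
```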


Current Projects – Part I: A Preface


What do I do?

Well, I’m a software developer / architect by trade. I’ve worked for various consulting companies over the years and have worked for dozens of clients using a half dozen languages on a bevy of platforms…

Now that we have that out of the way, my main focus over the last couple of years – outside of straightforward development – has been continuous integration, or more precisely the implementation of continuous integration in the enterprise.

Don’t know what that is? Check out the wikipedia entry for a decent overview. If you are still so extremely interested after that thrilling description – please continue reading…

Nowadays I mostly work in the Java domain – so keep that in mind when I talk about the toolsets of CI. This is particularly important as the build tool of choice (at least for me and what I’ve implemented for the enterprise at Nationwide Insurance) is Maven – which you can check out at the Apache Maven site…

Maven deals with the “how” of compiling a project and producing an artifact. Unlike other build tools available, it is built from the ground up to handle the dependencies of a project (in this case other jar, sar, and *ar files). Dependencies are the main headache in any build / deployment scenario – from simply finding out what they are and where they are located, to figuring out which versions are needed (including those of their transitive dependencies).

It can be very nightmarish…

In any case, this series will not be a Maven tutorial. The projects I’ve created are for use (mainly) with Maven and I will go over the pertinent details in a JIT manner (just-in-time, for you non-Java folk) so that you won’t be completely lost 🙂

I’m just getting back into blogging – so be gentle…

Next Up – The jar-indexer

Maven 2 Remote Repositories – Part II


It appears that Archiva doesn’t work right out of the box – at least not in its current version. After downloading and building the project it was still throwing configuration exceptions and wouldn’t deploy. So I searched around JIRA and found a fix for the bug. After following the prescribed steps and creating my own basic archiva.xml in my .m2 directory it worked – at least the test did…

When I continued on to deploying the standalone version to my destination server there was another issue – a NamingException. It turns out someone had checked in a plexus.xml config that duplicated a datasource. I just had to go to the conf/plexus.xml file and fix it… I crossed my fingers, closed my eyes, and ran the run.sh script…

It worked!

Now for configuration…

Follow the directions to set up your managed repositories and the repositories that they proxy. Pretty straightforward and works out of the box. The tricky part is setting up your settings.xml.

It appears that at this time just setting up mirrors doesn’t work on its own. Mirroring works for any non-plugin repositories. However, for each plugin repository you will need to set up pluginRepository elements in a profile. This is clunky and will hopefully get worked out as the product matures.
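A sketch of the kind of profile entry I mean in settings.xml (the id and URL are placeholders for your own Archiva instance):

```xml
<profiles>
  <profile>
    <id>archiva-plugins</id>
    <pluginRepositories>
      <!-- Point plugin resolution at the managed Archiva repository. -->
      <pluginRepository>
        <id>archiva.default</id>
        <url>http://archiva.example.com/repository/internal</url>
      </pluginRepository>
    </pluginRepositories>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>archiva-plugins</activeProfile>
</activeProfiles>
```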

The last tidbit that took me a while to figure out is this: any connection to the managed Archiva repository is expected to be secure – meaning it wants a userid and password. This was not abundantly clear in the documentation… You need to set up a server entry in your settings.xml for each mirror / pluginRepository that you plan on proxying. The userid and password are those defined in Archiva. I simply defined a maven-user user with no password and assigned it the role of Repository Observer.
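The corresponding server entry looks something like this (the id must match the id of the mirror / pluginRepository it secures; values here are placeholders):

```xml
<servers>
  <server>
    <!-- Same id as the mirror/pluginRepository this credential is for. -->
    <id>archiva.default</id>
    <username>maven-user</username>
    <password></password>
  </server>
</servers>
```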

Once you have these set up you are good to go!


Maven 2 Remote Repositories


In Maven 1.x the repositories were simple – there wasn’t a difference between a local repository and a remote repository. The layouts were the same and there wasn’t additional information in one that wasn’t contained in the other. The only variant was where the repository was located.

In Maven 2.x that all changed. With the addition of transitive dependencies everything got a little more complicated. I will attempt to explain…

A remote repository – and a local one, for that matter – contains a few more files. The obligatory jars are still there, as are the deployed poms. The additional files come in the way of metadata and their checksums.

Each artifact has at its root level (i.e. not per version) a maven-metadata.xml file (on the server) or multiple maven-${serverId}-metadata.xml files (on the local), containing all the releases of the artifact, the latest and released versions, and its deployed timestamp (on the remote) or downloaded timestamp (on the local).
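A representative maven-metadata.xml (coordinates and versions are made up for illustration) looks roughly like this:

```xml
<metadata>
  <groupId>com.mgensystems</groupId>
  <artifactId>jar-indexer</artifactId>
  <versioning>
    <latest>1.1</latest>
    <release>1.1</release>
    <versions>
      <version>1.0</version>
      <version>1.1</version>
    </versions>
    <!-- Deployed timestamp on the remote; downloaded on the local. -->
    <lastUpdated>20070601120000</lastUpdated>
  </versioning>
</metadata>
```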

These files are used for a couple of things. The first is to allow Maven to check for updates based on time. If you have repositories in your settings.xml or POM that allow updates (daily, for example), Maven will check these timestamps and compare local versus remote to determine if a download is required. The second comes into play when a dependency is declared without a version: Maven will first check the local repository and its metadata to determine the latest version of the artifact, then download if necessary.
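The update check is driven by a repository definition along these lines (id and URL are placeholders):

```xml
<repository>
  <id>internal</id>
  <url>http://repo.example.com/internal</url>
  <releases>
    <!-- Compare local vs. remote metadata timestamps once a day. -->
    <updatePolicy>daily</updatePolicy>
  </releases>
  <snapshots>
    <updatePolicy>daily</updatePolicy>
  </snapshots>
</repository>
```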

This poses a small problem when trying to create an enterprise remote repository that doesn’t allow access to the internet at large. These metadata files need to be maintained by hand (or by an automated process) outside of the realm of Maven’s dependency management.

Why can’t you just copy a local repository to the remote? You can, but it won’t work for these dynamic version checks. The problem is that the local metadata files are renamed to include the id of the server from which a particular version was downloaded. There can be several, depending on the artifact, so you can’t just rename the file back to what Maven is expecting to find.

I’m checking into a couple of options. The first I’ve implemented as a stopgap – a basic wget script that downloads an artifact’s complete directory structure. It works, but it’s clunky and doesn’t automatically handle transitive dependency downloads. The second tool I’m going to test-drive is Archiva.

Check back to see the results…
