Configure Maven pom.xml to build integrated executable .jar (fat jar)

Configuring Apache Maven to build an integrated .jar file took a little research, especially when building with NetBeans, but it can be done with a little hand editing.

NOTE: The project used in this example can be forked/pulled from: IngeniiCode/AvMet

Setting up the basic POM

My project originated as a NetBeans nbproject, but I wanted to convert it to Maven for a variety of reasons, not the least of which was standardization. To do this, I created a dummy Maven project, copied its pom.xml and reconfigured my project. There are many tutorials on that, so I won’t cover it here; but I will cover the pom.xml itself, for reference for others as well as for myself.

Main Block

The entire pom.xml is bounded by this tag group:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>

Basic Properties

This block defines the most basic properties of the application. I’m including the bounding block here one more time just for continuity / reference, even though all blocks are within this bounding block. The ellipsis ( [...] ) is not part of the POM; it only denotes that there is more to the POM than just this section.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ingeniigroup.stratux</groupId>
    <artifactId>AvMet</artifactId>
    <version>0.1.0</version>
    <packaging>jar</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    [...]
</project>

Dependencies Block

For the current iteration of this project, I am importing the JDBC library for accessing an SQLite database. This is a fairly heavyweight chunk of code (the resulting jar is over 6 MB). Before I switched to the Maven configuration, the project simply included a copy of the current jar. That was OK for a proof of concept, but bad for security patching and for keeping updates integrated when they are released. Your own Maven version-handling scheme will of course dictate when/if you define later updates, but this will get you started.

This configuration was the latest as of the time this was originally published (2-NOV-2017).

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.xerial/sqlite-jdbc -->
    <dependency>
        <groupId>org.xerial</groupId>
        <artifactId>sqlite-jdbc</artifactId>
        <version>3.20.0</version>
    </dependency>
</dependencies>

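For reference, here is a minimal sketch of how the sqlite-jdbc dependency is typically used from Java: open a connection with a `jdbc:sqlite:` URL and list the tables through the standard `DatabaseMetaData` API. The class name `SqliteTables` and its argument handling are hypothetical, not taken from AvMet, and it assumes sqlite-jdbc is on the classpath at run time (which is what the fat jar provides).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the tables in a SQLite database file using plain JDBC. */
public class SqliteTables {

    /** Builds the JDBC URL understood by the org.xerial sqlite-jdbc driver. */
    static String jdbcUrl(String dbPath) {
        return "jdbc:sqlite:" + dbPath;
    }

    /** Opens the database and returns the names of all ordinary tables. */
    static List<String> listTables(String dbPath) throws SQLException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the result set and the connection, even on error
        try (Connection conn = DriverManager.getConnection(jdbcUrl(dbPath));
             ResultSet rs = conn.getMetaData().getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: SqliteTables <path-to-sqlite-file>");
            System.exit(1);
        }
        for (String table : listTables(args[0])) {
            System.out.println(table);
        }
    }
}
```

The driver registers itself through the JDBC service-loader mechanism, so no explicit `Class.forName` call is needed as long as the jar is on the classpath.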
The Build Configuration

Finally, the build block will determine how your jar(s) are built.

This configuration will end up generating two jars (as of the original publishing date, the version number was 0.1.0). AvMet-0.1.0.jar is the stripped jar; to execute it, the other supporting jars must be available in a ‘lib/’ subdirectory (this build does not populate that directory itself). The other jar, AvMet.jar, is the integrated (fat) executable jar.

  74382 Nov 2 10:57 target/AvMet-0.1.0.jar
6708128 Nov 2 10:57 target/AvMet.jar

This build block will create those two executables. To prevent Maven from appending the string ‘jar-with-dependencies’ to the name of your combined executable, the option ‘<appendAssemblyId>false</appendAssemblyId>’ must be defined in your build configuration.

To create an integrated (single) jar, this ‘<goals>’ block must be defined:

<goals>
    <goal>single</goal>
</goals>

In addition, to generate a specific final jar name (for example, one without the version number), the ‘<finalName>${project.name}</finalName>’ tag in the assembly plugin configuration sets that name.
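The naming behavior described above can be summarized in a small function. This is a sketch of the rule as I understand it, not Maven’s own code: the jar plugin uses the build’s final name, which defaults to `${artifactId}-${version}`, while the assembly plugin uses its own `finalName` and appends `-` plus the assembly id (here `jar-with-dependencies`) unless `appendAssemblyId` is `false`. The class and method names are hypothetical, and the last example assumes `${project.name}` resolves to `AvMet`.

```java
/** Hypothetical model of how the two jar file names in target/ are derived. */
public class JarNames {

    /** maven-jar-plugin, assuming no <finalName> in <build>: artifactId-version.jar */
    static String thinJarName(String artifactId, String version) {
        return artifactId + "-" + version + ".jar";
    }

    /**
     * maven-assembly-plugin: uses its own finalName (which defaults to the build finalName)
     * and appends "-" + assemblyId only when appendAssemblyId is true.
     */
    static String fatJarName(String finalName, String assemblyId, boolean appendAssemblyId) {
        String base = appendAssemblyId ? finalName + "-" + assemblyId : finalName;
        return base + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(thinJarName("AvMet", "0.1.0"));                            // AvMet-0.1.0.jar
        System.out.println(fatJarName("AvMet-0.1.0", "jar-with-dependencies", true)); // AvMet-0.1.0-jar-with-dependencies.jar
        System.out.println(fatJarName("AvMet", "jar-with-dependencies", false));       // AvMet.jar
    }
}
```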

This is what my Maven build block looks like:

<build>
    <plugins>
        <plugin>
            <!-- Build an executable JAR -->
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>3.0.2</version>
            <configuration>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <classpathPrefix>lib/</classpathPrefix>
                        <mainClass>com.ingeniigroup.stratux.AvMet.AvMet</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
        <plugin>
            <!-- Build a *fat* executable JAR -->
            <artifactId>maven-assembly-plugin</artifactId>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <!-- tell the plugin NOT to append the descriptorRef (assembly id) to the target filename -->
                <appendAssemblyId>false</appendAssemblyId>
                <!-- define the final assembly name -->
                <finalName>${project.name}</finalName>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <mainClass>com.ingeniigroup.stratux.AvMet.AvMet</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
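To confirm what the two `<archive><manifest>` sections actually produced, the manifest of each jar can be read back with the standard `java.util.jar` API. This is a small sketch with a hypothetical class name, `ShowManifest`. For both jars I would expect a `Main-Class` of `com.ingeniigroup.stratux.AvMet.AvMet`, and for the thin jar a `Class-Path` whose entries start with `lib/`.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical utility: prints the Main-Class and Class-Path entries of one or more jars. */
public class ShowManifest {

    /** Returns a main attribute from the jar's manifest, or null if it is absent. */
    static String mainAttribute(String jarPath, Attributes.Name name) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest();   // null when the jar has no META-INF/MANIFEST.MF
            return mf == null ? null : mf.getMainAttributes().getValue(name);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path);
            System.out.println("  Main-Class: " + mainAttribute(path, Attributes.Name.MAIN_CLASS));
            System.out.println("  Class-Path: " + mainAttribute(path, Attributes.Name.CLASS_PATH));
        }
    }
}
```

After `javac ShowManifest.java`, it can be run as `java ShowManifest target/AvMet-0.1.0.jar target/AvMet.jar`.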


Once you have a good pom.xml set up, you can build within NetBeans and end up with an integrated executable that will run with this simple command:

java -jar target/AvMet.jar ../sqlite-dbs/stratux.sqlite.11-01 keepdb scrub
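A quick way to see the practical difference between the two jars is to check whether the SQLite driver class can be loaded at run time. This is a hedged sketch: `org.sqlite.JDBC` is the driver class shipped in sqlite-jdbc, while `DriverCheck` is a hypothetical name. With the fat jar on the classpath (for example `java -cp target/AvMet.jar:. DriverCheck`), the driver should be found. With only the thin jar and no populated `lib/` directory next to it, the driver should not be found.

```java
/** Hypothetical check: reports whether a class (such as a JDBC driver) is visible on the classpath. */
public class DriverCheck {

    /** Tries to load the class without initializing it; false means it is not on the classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.sqlite.JDBC";   // driver class from org.xerial:sqlite-jdbc
        System.out.println(driver + (isOnClasspath(driver) ? " found" : " NOT found"));
    }
}
```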

Deploying Java Gearman Job Server

OK, after more than a year it’s time to get down to really building out a Gearman system.

I’ve recently taken on a project that I believe to be the perfect fit for Gearman’s pipelined distributed work manager. I’m, of course, not at liberty to discuss the details of this project, but I can provide a high-level description for the purposes of justification.

The Project – distributed harvesting

The objective of the project is to provide a distributed method of web page scraping and parsing. This project requires that the scraping and profiling occur for 2,000,000+ websites in under 18 hours. No small feat, for certain. The good news is that I’ve built a system in the past (circa 2006) that did just this, using MySQL as the task manager. It worked, but it had its issues, and almost every single one of them can be mitigated by using Gearman. The rest will be mitigated by applying NoSQL solutions to site list management.
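To put that target in perspective, here is the back-of-the-envelope arithmetic as a tiny Java sketch. The figure of 5 seconds per site is purely an assumption for illustration. By Little’s law, the number of scrapes that must be in flight at once is the required completion rate multiplied by the time each scrape takes.

```java
import java.util.Locale;

/** Back-of-the-envelope sizing for the harvesting target (inputs are illustrative). */
public class HarvestSizing {

    /** Sites that must be completed per second to finish within the time window. */
    static double requiredRate(long sites, double hours) {
        return sites / (hours * 3600.0);
    }

    /** Little's law: concurrent jobs = completion rate x time per job, rounded up. */
    static long concurrentWorkers(long sites, double hours, double secondsPerSite) {
        return (long) Math.ceil(requiredRate(sites, hours) * secondsPerSite);
    }

    public static void main(String[] args) {
        long sites = 2_000_000L;
        double hours = 18.0;
        double secondsPerSite = 5.0;   // ASSUMPTION: average fetch + parse time per site
        System.out.printf(Locale.ROOT, "rate: %.2f sites/s%n", requiredRate(sites, hours));
        System.out.println("workers: " + concurrentWorkers(sites, hours, secondsPerSite));
    }
}
```

With these inputs the sketch reports about 30.86 sites per second, or 155 scrapes in flight at all times. That is the kind of fan-out a job server is meant to coordinate.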

What is Gearman

Here is the synopsis from the Gearman.org main page.

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.

The Job Server – implementing in Amazon’s EC2 environment

For the project I’m working on, I’ve opted for the Java implementation of the Job Server. This implementation’s main page is located [HERE].

Information about the Java Job Server:

Java Gearman Service is an easy-to-use distributed network application framework implementing the gearman protocol used to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages.

04-23-2012 java-gearman-service v0.6 has been released. [DOWNLOAD]

  • The service now uses the slf4j logging facade, allowing the user to have better control over logging
  • Persistent background jobs are now supported through an application hook
  • The API has been updated to be more user friendly, and it makes it easier to create divide-and-conquer/mapreduce applications (breaks the code of previous versions)
  • A .properties file now may be used to set property values and fine-tune the application.
Requirements to deploy the Java job server:
  • Java SE 7
  • slf4j 1.6.4+

For my implementation, I’ve extracted the zip into a vendor directory from which I plan to launch the .jar. Development is occurring on an Apple OS X laptop, and the result is then deployed to the AWS EC2 cluster for production. I expect that some library path and configuration work will be required to make this all work.

Starting up the Java Gearman Service

It took a little time to locate the instructions for starting up the Java implementation of the Gearman Service. They are located [HERE].

Instructions on how I started the Gearman server on my OS X development machine are located [HERE].

Once it is started, you should be able to communicate with it from your client and worker code!!
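Before any client or worker code exists, one minimal way to confirm the job server is reachable is to send an `ECHO_REQ` packet using the Gearman binary protocol. A request consists of a 12-byte header (the magic `\0REQ`, then a big-endian packet type, then a big-endian payload length), followed by the payload. The server should answer with `ECHO_RES` (type 17) carrying the same bytes. This sketch speaks the raw protocol rather than the java-gearman-service API. The class name, `localhost`, and the default port 4730 are assumptions about your setup.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical connectivity check: sends ECHO_REQ to a Gearman job server and reads ECHO_RES. */
public class GearmanEcho {

    static final int ECHO_REQ = 16;
    static final int ECHO_RES = 17;
    static final byte[] REQ_MAGIC = {0, 'R', 'E', 'Q'};
    static final byte[] RES_MAGIC = {0, 'R', 'E', 'S'};

    /** Encodes a request packet: magic, type and payload length (both big-endian), then payload. */
    static byte[] encodeRequest(int type, byte[] payload) {
        return ByteBuffer.allocate(12 + payload.length)   // ByteBuffer is big-endian by default
                .put(REQ_MAGIC)
                .putInt(type)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Sends the text as ECHO_REQ and returns the echoed payload as a string. */
    static String echo(String host, int port, String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try (Socket socket = new Socket(host, port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.write(encodeRequest(ECHO_REQ, payload));
            out.flush();

            // Read the 12-byte response header, then the payload it announces
            DataInputStream in = new DataInputStream(socket.getInputStream());
            byte[] magic = new byte[4];
            in.readFully(magic);
            int type = in.readInt();
            int size = in.readInt();
            if (!Arrays.equals(magic, RES_MAGIC) || type != ECHO_RES) {
                throw new IOException("unexpected reply, packet type " + type);
            }
            byte[] body = new byte[size];
            in.readFully(body);
            return new String(body, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4730;
        System.out.println("echo: " + echo(host, port, "hello gearman"));
    }
}
```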

Next steps:

Installing Gearman PHP components

Build a Gearman Client Demonstrator

Build a Gearman Worker Demonstrator

Drop keyspace using Cassandra Cli

Dropping an entire keyspace using cassandra-cli is exceptionally simple.

First, access your cluster using the CLI. I have an alias in my .bash_profile, so I only need to type ‘cass’ to access the CLI. In an attempt to be helpful, though, I shall show the full command syntax for my environment. Your host and port may vary.

  alias cass='cassandra-cli -h'

In this example, I am going to drop the keyspace I was loading with test data in previous posts, ks33.

hpcass: ~$ cass
Connected to: "Test1" on
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

DROP keyspace ks33;

Waiting for schema agreement...
... schemas agree across the cluster

That’s all there was to it. Keyspace destroyed.


Starting gearmand daemon

Getting the gearmand daemon running should be simple work, yet, as with a lot of things in the software world, it’s not quite that simple. My first attempt to start up gearmand failed with the following error:

Corsa-3:~ root# gearmand
Error creating socket: IO::Socket::INET: Address already in use
Corsa-3:~ root#

Now, I’m not sure why that would be the case, since I checked the system process list and did NOT see any other instance of gearman executing. (If you are not sure what you are looking at: the only match for gearman was the grep against the process list itself, which is not what I’m looking for.)

Corsa-3:~ root# ps -ealf | grep gearman
0 33324 32942 4006 0 31 0 2425520 168 - R+ 5bb5a80 ttys000 0:00.00 grep gearman 0:00.00

A check of netstat did not reveal a listener on either of the known gearman ports (7003, or the new official port, 4730).
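`Address already in use` is the error a program gets when it cannot bind its listening socket. So one way to reproduce gearmand’s situation, independently of `ps` and `netstat`, is to try binding the same ports yourself. The following is a small sketch with a hypothetical class name that checks the legacy port 7003 and the official port 4730 on all interfaces. A failed bind is a hint about what gearmand ran into, not a diagnosis of its cause.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical diagnostic: tries to bind each Gearman port to see whether it is already taken. */
public class PortCheck {

    /** Returns true if a listening socket could be bound to the port on the wildcard address. */
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(port));   // wildcard address, as most daemons use
            return true;
        } catch (IOException e) {
            // BindException ("Address already in use") is a subclass of IOException
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {7003, 4730};                     // legacy and official Gearman ports
        for (int port : ports) {
            System.out.println(port + (canBind(port) ? ": free" : ": cannot bind"));
        }
    }
}
```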


Installing Gearman APIs for JAVA and PERL

Following yesterday’s marathon package installation ‘experiment’, today I started to acquire and install the Client/Worker API modules.

JAVA – installing gearman-java-0.03.jar

Java modules can be located on the downloads page for Gearman. They are also available at the Launchpad page for Gearman. I selected the code from the gearman.org site to try first, which ended up directing me to: https://launchpad.net/gearman-java.

I downloaded all three available files:

  • gearman-java-0.03.jar
  • gearman-java-0.03-javadocs.jar
  • gearman-java-0.03-src.tar.gz

Once they were downloaded, I moved all three files, along with the original gearmand C source, to my Development directory.

PERL – installing Gearman::XS

For the PERL implementation, I’m going with the PERL wrapper around the core gearman C libraries. I’d rather leverage as much C as possible; the focus for me is on speed.
It is available via CPAN as Gearman::XS.

Installing PERL packages is generally pretty painless.

#root:  cpan Gearman::XS
Running install for module 'Gearman::XS'
Running make for D/DS/DSCHOEN/Gearman-XS-0.8.tar.gz
Fetching with LWP:
Writing /Library/Perl/5.10.0/darwin-thread-multi-2level/auto/Gearman/XS/.packlist
Appending installation info to /Library/Perl/Updates/5.10.0/darwin-thread-multi-2level/perllocal.pod
/usr/bin/make install  -- OK

With the installation completed, I wrote and executed a very simple script to test PERL’s ability to locate the new module.

#!/usr/bin/perl -w
use strict;
use Gearman::XS qw(:constants);
use Gearman::XS::Client;

print "\nLoaded and ran fine\n";
exit 0;

This was the result:

david$ ./gtest0.pl

Loaded and ran fine

Installation of the PERL module Gearman::XS was successful.

NOTE: This is part of a series of posts centered around the installation and evaluation of Gearman as a distributed scheduling product. Here are the other articles in this group: