Getting it all ready…
First, I wanted to see how much of a system footprint three instances of Cassandra have on this little system. Here you can see the three instances patiently waiting for something to do. After sitting idle for about 24 hours (note: TIME+ is accumulated CPU time, not wall-clock time), memory utilization has crept up from 11% to 14% per process.
PID  USER    PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
4554 bigdata 20  0 891m 134m 4388 S  7.6 14.4 7:47.96 java
4632 bigdata 20  0 917m 133m 4340 S  0.7 14.3 7:45.64 java
4593 bigdata 20  0 896m 133m 4168 S  0.3 14.3 7:40.37 java
Keep in mind this test box has a single-core CPU and a whopping 1GB of memory. If I can get three instances working on this box without pushing it over, you should be able to run a single instance on just about any box and reasonably expect it to work.
The data model I wanted to use is pretty basic: IP traffic, consisting of the following elements:
* IPv4 address
* destination port
* timestamp
* TTL (a Cassandra construct that lets data be tombstoned automatically once its usefulness has expired)
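A minimal sketch of that record as a plain Java value class. The class name TrafficRecord, storing the timestamp as epoch milliseconds and keeping the TTL in seconds (the unit Cassandra uses for TTLs) are all assumptions for illustration. Cassandra counts a TTL down from the moment the column is written, so the expiry computed here is only an approximation when rows are inserted shortly after capture.

```java
/**
 * Hypothetical value class for one observed packet: IPv4 address,
 * destination port, capture timestamp and the Cassandra TTL to apply.
 */
public final class TrafficRecord {
    private final String ipv4;
    private final int destPort;
    private final long timestampMillis; // capture time, epoch milliseconds
    private final int ttlSeconds;       // Cassandra TTLs are expressed in seconds

    public TrafficRecord(String ipv4, int destPort, long timestampMillis, int ttlSeconds) {
        if (!isValidIpv4(ipv4)) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + ipv4);
        }
        if (destPort < 0 || destPort > 65535) {
            throw new IllegalArgumentException("port out of range: " + destPort);
        }
        if (ttlSeconds <= 0) {
            throw new IllegalArgumentException("TTL must be positive: " + ttlSeconds);
        }
        this.ipv4 = ipv4;
        this.destPort = destPort;
        this.timestampMillis = timestampMillis;
        this.ttlSeconds = ttlSeconds;
    }

    /** Accepts exactly four ASCII decimal octets, each in the range 0-255. */
    public static boolean isValidIpv4(String s) {
        if (s == null) return false;
        String[] parts = s.split("\\.", -1); // -1 keeps trailing empty parts, so "1.2.3." fails
        if (parts.length != 4) return false;
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) return false;
            for (int i = 0; i < p.length(); i++) {
                char c = p.charAt(i);
                if (c < '0' || c > '9') return false;
            }
            if (Integer.parseInt(p) > 255) return false;
        }
        return true;
    }

    /** Approximate wall-clock time after which the row should be expired (tombstoned). */
    public long expiresAtMillis() {
        return timestampMillis + ttlSeconds * 1000L;
    }

    public String getIpv4() { return ipv4; }
    public int getDestPort() { return destPort; }
    public long getTimestampMillis() { return timestampMillis; }
    public int getTtlSeconds() { return ttlSeconds; }
}
```

Validating in the constructor means a bad line from the capture fails fast instead of landing in the keyspace.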
To get this data, I’m thinking of simply running tcpdump on a box (or possibly my laptop) to capture some traffic, then streaming that output into a program that inserts it into Cassandra as fast as the packets go by.
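One way the "stream it into a program" step could look: read tcpdump output on standard input and pull out the fields of the data model. This sketch assumes tcpdump runs with -n (no name resolution) and -tt (timestamps as seconds since the epoch), so IPv4 lines with ports look like `1330790405.123456 IP 192.168.1.10.52344 > 93.184.216.34.80: ...`. The class name TcpdumpLineParser is hypothetical, and lines that don't match that shape (ARP, IPv6, ICMP and so on) are simply skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts (timestamp, source IPv4, destination IPv4,
 * destination port) from lines printed by "tcpdump -n -tt".
 */
public class TcpdumpLineParser {
    // <secs>.<fraction> IP <src a.b.c.d>.<sport> > <dst a.b.c.d>.<dport>:
    private static final Pattern LINE = Pattern.compile(
        "^(\\d+)\\.(\\d+) IP (\\d{1,3}(?:\\.\\d{1,3}){3})\\.(\\d{1,5}) > "
        + "(\\d{1,3}(?:\\.\\d{1,3}){3})\\.(\\d{1,5}):");

    /** Parsed fields of one packet line. */
    public static final class Packet {
        public final long timestampMillis;
        public final String srcIp;
        public final String dstIp;
        public final int dstPort;

        Packet(long timestampMillis, String srcIp, String dstIp, int dstPort) {
            this.timestampMillis = timestampMillis;
            this.srcIp = srcIp;
            this.dstIp = dstIp;
            this.dstPort = dstPort;
        }
    }

    /** Returns the parsed packet, or null if the line is not an IPv4 line with ports. */
    public static Packet parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) return null;
        long seconds = Long.parseLong(m.group(1));
        // Keep millisecond precision only: first three digits of the fraction, zero-padded.
        String frac = (m.group(2) + "000").substring(0, 3);
        long millis = seconds * 1000L + Integer.parseInt(frac);
        return new Packet(millis, m.group(3), m.group(5), Integer.parseInt(m.group(6)));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            Packet p = parse(line);
            if (p == null) continue; // not an IPv4 line with ports
            // Placeholder: this is where the insert into Cassandra would go.
            System.out.println(p.timestampMillis + " " + p.srcIp + " -> " + p.dstIp + ":" + p.dstPort);
        }
    }
}
```

A possible invocation is `sudo tcpdump -n -tt -l | java TcpdumpLineParser`, where -l makes tcpdump line-buffer its output so packets reach the program as they arrive.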
With the limited disk space on the box (see below), I can’t run it indefinitely, but I should be able to run it for an afternoon to load a keyspace, then start to figure out how to get the data back out!
Filesystem 1K-blocks    Used Available Use% Mounted on
/dev/sda1   75956320 4344788  67753152   7% /
none          470324     640    469684   1% /dev
none          478024     420    477604   1% /dev/shm
none          478024     108    477916   1% /var/run
none          478024       0    478024   0% /var/lock
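To put "an afternoon" into numbers, a back-of-envelope sketch helps. The only real figure below is the Available column for /dev/sda1 from the df output (1K-blocks, i.e. KiB); the packet rate and the on-disk bytes per row are hypothetical placeholders to replace with your own measurements. Cassandra also needs free headroom for compaction, so the real budget is smaller than this raw figure.

```java
/**
 * Hypothetical back-of-envelope estimate of how long a capture can run
 * before the disk fills. Rates in main() are placeholders, not measurements.
 */
public class DiskBudget {
    /** Hours of capture that fit into availableKb at the given write rate. */
    public static double hoursUntilFull(long availableKb, double packetsPerSecond, double bytesPerRow) {
        double availableBytes = availableKb * 1024.0; // df reports 1K-blocks
        double bytesPerSecond = packetsPerSecond * bytesPerRow;
        return availableBytes / bytesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long availableKb = 67753152L;  // "Available" for /dev/sda1 in the df output
        double packetsPerSecond = 500; // hypothetical capture rate
        double bytesPerRow = 200;      // hypothetical on-disk cost per row, overhead included
        System.out.printf("~%.1f hours until full%n",
                hoursUntilFull(availableKb, packetsPerSecond, bytesPerRow));
    }
}
```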
One thing I could do is load the data into the database, then run a second-pass processor over it to enrich the data with reverse DNS lookups: sort of a poor man’s Wireshark. Now, if I wire this into the RPZ-enabled DNS resolver I eventually plan to set up, I could track all the traffic on my network, including all the requests from my Apple TV. It might be interesting to see what it’s *really* doing on the network.
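The reverse-lookup pass could start from something like the sketch below. It relies on the standard java.net.InetAddress API: getByName on a dotted-quad literal does not query DNS, and getCanonicalHostName then performs the reverse lookup, returning the address text itself when no name is found. The class name ReverseLookupCache and the simple in-memory cache are assumptions; the cache is there because the same handful of addresses will appear in thousands of rows, and each lookup blocks.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical second-pass helper: reverse-resolves IPv4 addresses and
 * caches the answers so each address is looked up at most once per run.
 */
public class ReverseLookupCache {
    private final Map<String, String> cache = new HashMap<String, String>();

    /**
     * Returns the host name for ip, or ip itself if there is no reverse mapping.
     * Expects a dotted-quad literal; a non-literal would trigger a forward lookup.
     */
    public String lookup(String ip) {
        String cached = cache.get(ip);
        if (cached != null) return cached;
        String name;
        try {
            name = InetAddress.getByName(ip).getCanonicalHostName();
        } catch (UnknownHostException e) {
            name = ip; // lookup failed: keep the raw address
        }
        cache.put(ip, name);
        return name;
    }

    /** Number of distinct addresses resolved so far. */
    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        ReverseLookupCache resolver = new ReverseLookupCache();
        for (String ip : args) {
            System.out.println(ip + " -> " + resolver.lookup(ip));
        }
    }
}
```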
Downloading Support Packages for Development Environment
Before starting to code, though, it looks like I need to make sure my JDK and Java libraries are up to date. To make it easier to follow along with the documentation I’m reviewing, I’ll install Apache Ant as well.
Java JDK – Java Development Kit
The JDK is a development environment for building applications, applets, and components using the Java programming language.
The JDK includes tools useful for developing and testing programs written in the Java programming language and running on the Java™ platform.
Package URL: http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz
mkdir jdk
cd jdk
wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz
Extract the package:
tar xvzf jdk-7u3-linux-x64.tar.gz
Although I could simply run the JDK from the local user location, I decided to go for the ‘System Install’ option: I created a jdk location in /usr/lib, then copied the parts there according to the info in the docs. In this case I just downloaded the JDK again… you could skip that step and copy the .tar.gz file already downloaded above. Your call.
sudo mkdir /usr/lib/jdk
cd /usr/lib/jdk
sudo wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz
sudo tar xvzf jdk-7u3-linux-x64.tar.gz
sudo rm jdk-7u3-linux-x64.tar.gz
Oracle’s page says that it’s now ‘installed’, but I suspect there are more than a few steps left! This is almost as good as Oracle technical support… I’ll try to be a little more helpful.
Setting JAVA_HOME in my ~/.bash_profile resolves the path issue for Ant and JUnit. This is what I set in my file:
export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_03
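Before moving on, it can be worth confirming that the JVM actually being picked up is the new JDK rather than some older JRE. The sketch below (the class name JdkCheck is hypothetical) prints the Java version, java.home and the JAVA_HOME environment variable, and uses javax.tools.ToolProvider.getSystemJavaCompiler(), which returns null when no compiler is available, as a rough "is this a full JDK?" test.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/**
 * Hypothetical sanity check: which Java runtime is this, and does it
 * include a compiler (a full JDK rather than a bare JRE)?
 */
public class JdkCheck {
    /** True when the running JVM can supply the system Java compiler. */
    public static boolean hasCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
        System.out.println("compiler     = " + (hasCompiler() ? "available (JDK)" : "missing (JRE only?)"));
    }
}
```

Compile and run it with the JDK's own tools, for example `$JAVA_HOME/bin/javac JdkCheck.java && $JAVA_HOME/bin/java JdkCheck`. On JDK 7, java.home typically points at the jre subdirectory inside JAVA_HOME, so the two paths won't be identical.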
Ant – Apache Ant
Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications. Ant supplies a number of built-in tasks that allow you to compile, assemble, test and run Java applications. Ant can also be used effectively to build non-Java applications, for instance C or C++ applications. More generally, Ant can be used to pilot any type of process which can be described in terms of targets and tasks.
Package URL: http://www.carfab.com/apachesoftware//ant/binaries/apache-ant-1.8.3-bin.tar.gz
mkdir ant
cd ant
wget http://www.carfab.com/apachesoftware//ant/binaries/apache-ant-1.8.3-bin.tar.gz
Extract the package:
tar xvzf apache-ant-1.8.3-bin.tar.gz
The docs inside Ant say to go back to the web and read the installation instructions, located here: http://ant.apache.org/manual/install.html#installing. I happen to like where my Ant stuff was extracted, so I’m going to set ANT_HOME in my ~/.bash_profile to that location. Ideal? Probably not, but I’m doing this research on a perfectly good Saturday... you get what you pay for.
export ANT_HOME=/home/bigdata/ant/apache-ant-1.8.3
export PATH=$PATH:$ANT_HOME/bin
A quick test shows the paths and parts are in place. The error below is actually expected (we’ll write the build.xml later).
$ ant
Buildfile: build.xml does not exist!
Build failed
JUnit – Test framework for test-driven development
JUnit is a simple framework to write repeatable tests. It is an instance of the xUnit architecture for unit testing frameworks.
mkdir junit
cd junit
wget https://github.com/downloads/KentBeck/junit/junit-4.10.jar
wget https://github.com/downloads/KentBeck/junit/junit4.10.zip
Extract the source package, in case I need it:
unzip junit4.10.zip
I can’t say this is the best way to do this; it’s a cookie-cutter setup straight from the documentation. If you see something that does not make sense or is flat-out stupid, post a comment and let me know!
Development Environment Setup
Create the primary development folder and the expected sub-folders. Your naming conventions may vary:
mkdir cBuild
mkdir cBuild/src
mkdir cBuild/src/{java,test}
mkdir cBuild/lib
Populate lib with the libraries from the Cassandra distribution and JUnit.
cp cassA-1.0.8/lib/*.jar cBuild/lib/.
cp junit/*.jar cBuild/lib/.
To run the JUnit test harness through Ant, a build.xml file is required in the cBuild base directory. Here are sample contents. Your paths may differ if you went your own way on the directories.
vi cBuild/build.xml
<project name="jCas" default="dist" basedir=".">
  <property name="src" location="src/java"/>
  <property name="test.src" location="src/test"/>
  <property name="build" location="build"/>
  <property name="build.classes" location="build/classes"/>
  <property name="test.build" location="build/test"/>
  <property name="dist" location="dist"/>
  <property name="lib" location="lib"/>
  <!-- Tags used by Ant to help build paths, most useful when multiple .jar files are required -->
  <path id="jCas.classpath">
    <pathelement location="${build.classes}"/>
    <fileset dir="${lib}" includes="*.jar"/>
  </path>
  <!-- test classpath: compiled tests plus the main classpath. Tests compile to their own directory, which keeps them out of the final .jar -->
  <path id="jCas.test.classpath">
    <pathelement location="${test.build}"/>
    <path refid="jCas.classpath"/>
  </path>
  <!-- Define the 'init' target, used by other build phases -->
  <target name="init">
    <mkdir dir="${build}"/>
    <mkdir dir="${build.classes}"/>
    <mkdir dir="${test.build}"/>
  </target>
  <!-- 'compile' target -->
  <target name="compile" depends="init">
    <javac srcdir="${src}" destdir="${build.classes}">
      <classpath refid="jCas.classpath"/>
    </javac>
  </target>
  <!-- 'compile-test' target -->
  <target name="compile-test" depends="init">
    <javac srcdir="${test.src}" destdir="${test.build}">
      <classpath refid="jCas.test.classpath"/>
    </javac>
  </target>
  <!-- run every compiled class named Test.class under build/test as a JUnit test -->
  <target name="test" depends="compile-test,compile">
    <junit printsummary="yes" showoutput="true">
      <classpath refid="jCas.test.classpath"/>
      <batchtest>
        <fileset dir="${test.build}" includes="**/Test.class"/>
      </batchtest>
    </junit>
  </target>
  <!-- on a good build, the dist target creates the final JAR, jCas.jar -->
  <target name="dist" depends="compile">
    <mkdir dir="${dist}/lib"/>
    <jar jarfile="${dist}/lib/jCas.jar" basedir="${build.classes}"/>
  </target>
  <!-- run target allows execution of the built classes -->
  <target name="run" depends="dist">
    <java classname="${classToRun}">
      <classpath refid="jCas.classpath"/>
    </java>
  </target>
  <!-- clean target gets rid of all the leftover files from builds -->
  <target name="clean">
    <delete dir="${build}"/>
    <delete dir="${dist}"/>
  </target>
</project>
NOTE! There is a quirk in Ant 1.8 that requires adding the includeantruntime attribute to each javac task, or you will be plagued with nasty warnings:
...
Here is the proper way to modify the two javac blocks above to include this attribute:
<javac srcdir="${src}" destdir="${build.classes}" includeantruntime="false">
<javac srcdir="${test.src}" destdir="${test.build}" includeantruntime="false">
Testing this build environment
Having created the build.xml file, it needs to be tested to make sure it even works.
Create a test case and a simple program to build
cd cBuild/src
vi test/Test.java
import junit.framework.*;

public class Test extends TestCase {
    public void test() {
        assertEquals("Equality Test", 0, 0);
    }
}
Create a really simple program:
vi java/X1.java
public class X1 {
    public static void main(String[] args) {
        System.out.println("This is Java.... drink up!");
    }
}
Now the rubber meets the road: if everything is set up properly, we can build something!
Run ant with target set to 'test'
~/cBuild$ ant test
Buildfile: /home/bigdata/cBuild/build.xml
init:
compile-test:
compile:
[javac] Compiling 1 source file to /home/bigdata/cBuild/build/classes
test:
BUILD SUCCESSFUL
Total time: 7 seconds
Run ant with target set to 'dist'
~/cBuild$ ant dist
Buildfile: /home/bigdata/cBuild/build.xml
init:
compile:
dist:
[jar] Building jar: /home/bigdata/cBuild/dist/lib/jCas.jar
BUILD SUCCESSFUL
Total time: 1 second
It's a good idea to check your .jar to make sure your class is actually in it. Ant, for some reason beyond understanding or logic, WON'T let you know if your class was left out (it happened in my first build... exceptionally ungood).
~/cBuild$ jar -tf dist/lib/jCas.jar
META-INF/
META-INF/MANIFEST.MF
X1.class
As you can see, there is no Whiskey, but X1 is in the jar.
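If you'd rather have that check fail loudly instead of eyeballing `jar -tf` output, a small program using java.util.jar.JarFile can do it. The class name JarContentsCheck and its command-line arguments are hypothetical; it exits with status 1 when any expected entry is missing, which makes it easy to drop into a script after `ant dist`.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper that does the job of "jar -tf" plus a check:
 * it reports, and fails on, any expected entry missing from the jar.
 */
public class JarContentsCheck {
    /** Returns true if the jar at jarPath contains an entry with exactly this name. */
    public static boolean contains(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarContentsCheck dist/lib/jCas.jar X1.class [more entries...]
        if (args.length < 2) {
            System.err.println("usage: JarContentsCheck <jar> <entry> [entry...]");
            System.exit(2);
        }
        int missing = 0;
        for (int i = 1; i < args.length; i++) {
            boolean ok = contains(args[0], args[i]);
            System.out.println((ok ? "found   " : "MISSING ") + args[i]);
            if (!ok) missing++;
        }
        System.exit(missing == 0 ? 0 : 1);
    }
}
```

Entry names are matched exactly, so a class in a package has to be given with its path, e.g. `com/example/X1.class`.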
RUN!!!
~/cBuild$ ant -DclassToRun=X1 run
Buildfile: /home/bigdata/cBuild/build.xml
init:
compile:
dist:
run:
[java] This is Java.... drink up!
BUILD SUCCESSFUL
Total time: 1 second
SUCCESS!!!
All told, it took me about 3 1/2 hours to get this set up, install the parts, write up these notes and execute a SIMPLE Java program. So... set your own expectations accordingly. Hopefully you'll save a lot of time with the build.xml file; I typed that in char for char. You could just cut and paste it, fix up anything you don't like in my path names and let it rip.
Good luck.. more to follow on Cassandra!!! (even though this post was more about getting ready to write code to access it).