Tag Archives: software

Entrepreneurship, Software Development, Technology

JIRA – tracking projects in an Agile way

14 June 2012 David Leave a comment

With the kick-off of my new Start-Up Company (this is #8 for me, since I started my first company in 1984, Bay Auto Electronics), after taking 10 years to pursue some potentially lucrative (only time will tell if those efforts ever pay off, I’m not holding my breath) employment opportunities in the Internet Security / Anti-Fraud sector.

The short term plan is for that work to continue on a project consulting basis for the remainder of the year (that is the plan.. always subject to change), however in addition to that I’ve taken on two additional clients with very diverse project needs. Those needs need to be carefully manged and time properly allocated to each of these clients and their projects.

In the past, I’ve had adequate success using Work Diary spreadsheets to call out time per project and how it was spent within each of these projects. I continue to do that now. However I want a more useful, powerful and visual tool to track efforts, tasks, sprints, milestones, etc. And in addition to that I want to expose that information to each of my clients so they can get a status update on their projects near-real time, any time, day or night, and also help project their expenses as the projects move forwards.

To make this goal a reality, I have decided to Trail out a tool recently implemented at one of my former employers. It’s name is JIRA. And so far, having only used it there for 30 days or so, I’m impressed. Here is a screen shot of my current JIRA Dashboard (projects, names etc changed to protect the innocent, etc. etc. etc.).

Sample JIRA Project Dashboard. — My Sample JIRA Project Dashboard

All that said, and after communicating with one of the helpful JIRA engineers to make sure this tool would do what I want, and provide information for my clients as well, all on one system I host, the decision was made to move forward to the project!

To get further feet-wet, I’m first downloading the distributions for both MAC and LINUX. Initially I will be installing this on a MAC workstation to get the project defines, users entered etc. To test out the waters and learn on a test environment before cutting it loose in the wild. Eventually this will roll out with a public facing (for those with the right credentials) interface for project tracking. One of the first projects that I’ll be defining in my private installation will my forthcoming programming book. After 20+ years as a professional developer, trainer, sales engineer, IT Director and Entrepreneur, there are unique perspectives I can bring to the practice of programming. Keep any eye out for announcements on this by September! 🙂

Getting JIRA – downloading distributions

The current distributions, as of this blog, are located here:
http://www.atlassian.com/software/jira/download

Installation Instructions are found here, a Confluence site (another Atlassian product):

https://confluence.atlassian.com/display/JIRA/Installing+JIRA

Screen Shot 2012-06-13 at 11.25.06 PM — Installing JIRA Instructions

NOTE: – regarding OSX
As noted in the pages, installing on OSX is only suitable for evaluation purposes. That’s OK, not a big issue, I’ll have hardware available to host it in the next two weeks. Until then, running a local evaluation will be just fine. Unfortunate that the product can support Windows, but it’s not a surprising point since Apple has shuttered it’s proper Server production lines and is no only shipping MacMini servers and those horrendous beasts know as MAC Pro workstations. There IS A LOT to be said for 19″ rack compatible system, when it comes to REAL CORPORATE operations

Installing on MAC (in this case a laptop of all things)

I selected this package named: JIRA 5.0.6 (TAR.GZ Archive).

Instead of just creating more muck in my Downloads directory, I created a dedicates Atlassian directory under Applications.

I moved the file there and ran the extraction:

First order of business was setting my JIRA Home Directory. The instructions are found here at this link:
https://confluence.atlassian.com/display/JIRA050/Setting+your+JIRA+Home+Directory.

I chose to use the LINUX configuration script located at bin/config.sh to get JIRA setup. This I ran from a console:

You must also setup an environmen var that points to the same directory you configured using the JAVA Config dialog. Since I use the ‘bash’ shell (please, no need to comment on the virtues of ksh, sh, bash.. whatever… I’m not going to listen), I edited my .bash_profile adding these two lines:

## Required Element for JIRA export JIRA_HOME=/Applications/Atlassian/atlassian-jira-5.0.6-standalone

With that little step completed, I returned to the bin/ directory where I installed JIRA and lit up the night:

FotoCorsa-3:bin david$ ./start-jira.sh

To run JIRA in the foreground, start the server with start-jira.sh -fg
executing as current user
                .....
          .... .NMMMD.  ...
        .8MMM.  $MMN,..~MMMO.
        .?MMM.         .MMM?.

     OMMMMZ.           .,NMMMN~
     .IMMMMMM. .NMMMN. .MMMMMN,
       ,MMMMMM$..3MD..ZMMMMMM.
        =NMMMMMM,. .,MMMMMMD.
         .MMMMMMMM8MMMMMMM,
           .ONMMMMMMMMMMZ.
             ,NMMMMMMM8.
            .:,.$MMMMMMM
          .IMMMM..NMMMMMD.
         .8MMMMM:  :NMMMMN.
         .MMMMMM.   .MMMMM~.
         .MMMMMN    .MMMMM?.

      Atlassian JIRA
      Version : 5.0.6
                  
Detecting JVM PermGen support...
PermGen switch is supported. Setting to 256m

If you encounter issues starting or stopping JIRA, please see the Troubleshooting guide at http://confluence.atlassian.com/display/JIRA/Installation+Troubleshooting+Guide

Using JIRA_HOME:       /Applications/Atlassian/atlassian-jira-5.0.6-standalone

Server startup logs are located in /Applications/Atlassian/atlassian-jira-5.0.6-standalone/logs/catalina.out
Using CATALINA_BASE:   /Applications/Atlassian/atlassian-jira-5.0.6-standalone
Using CATALINA_HOME:   /Applications/Atlassian/atlassian-jira-5.0.6-standalone
Using CATALINA_TMPDIR: /Applications/Atlassian/atlassian-jira-5.0.6-standalone/temp
Using JRE_HOME:        /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
Using CLASSPATH:       /Applications/Atlassian/atlassian-jira-5.0.6-standalone/bin/bootstrap.jar
Using CATALINA_PID:    /Applications/Atlassian/atlassian-jira-5.0.6-standalone/work/catalina.pid

Opening up JIRA for the first time..

Having started JIRA on my localbox, I connected to port 8080 (the one I used as the default in the installation) and started to complete the setup:

It turns out I’ve made some sort of configuration/installation errors that was not called out in the documentation. Such is the story of software installation. I’ll have to get this one sorted out before continuing on.

Screen Shot 2012-06-14 at 12.10.04 AM — JIRA startup error.. this might take a little time to sort out my installation error.

Creating a dedicated JIRA user

Performing a little re-wind, I decided to create separate user account, that can be the JIRA home. This was suggested in the docs but I just didn’t grok it at the time (it’s after midnight.. some slack should be afforded).

Screen Shot 2012-06-14 at 12.13.58 AM — Created dedicated JIRA user.

Now.. back to the environment files… first I’m going to log into the new user and create a place for JIRA, copy it’s path, then update the configs.

I logged in to the new user, via the terminal window, then edited (creates actually) the .bash_profile for the user setting the following as the JIRA environment:

jira$ vi .bash_profile

## Required Element for JIRA export JIRA_HOME=/Users/jira/jira-home

Next, I had to sort out one permissions issue in the Applications directory, and that had to do with the permissions to updates config files in the Altassian directory. To do this, I switched to my root user (su –), moved to the install directory and executed this command to allow group write at all the directory levels for the group user (in this case ‘staff’).

su Password: sh-3.2# pwd /Applications/Atlassian/atlassian-jira-5.0.6-standalone sh-3.2# chmod -R 775 *

I closed that terminal window, then logged my desktop into my new jira user and re-launched the configuration program (see above if you’ve forgotten how that is started up), and reset the home directory:

Screen Shot 2012-06-14 at 12.23.57 AM — Re-Setting the home directory

Tested the connection:

Screen Shot 2012-06-14 at 12.25.36 AM — Testing DB connection.

Set the ports I wanted to use for JIRA (defaults shown):

Screen Shot 2012-06-14 at 12.28.51 AM — Checking / Setting ports

Then kicked off JIRA again, but this time as the jira user. This time it stuck, took and started:

Next step 2 of the installation is presented, and the requisite settings defined. I’m going to run in PRIVATE mode, as I don’t want to have people attempt to add users to my JIRA without my permissions. That sounds like a licensing seat disaster in the making….

Screen Shot 2012-06-14 at 12.49.26 AM — Step 2 of Setup.

NOTE: You will need to sign up and get an evaluation license key to go any further. Since I intend to purchase the product in the new future, unless the evaluation determines another course of action is required, this is a non-issue for me. You may be hesitant to do so, for some reason, one I won’t guess, but if so, be aware of that before digging yourself too deep a hole.

Two more quick steps follow, such as setting up your primary Admin User (sorry, NOT going to show you my settings there), and one last step confirming the setup was successful, before being shuttled over to your new Dashboard!

Screen Shot 2012-06-14 at 12.59.36 AM — Dashboard Login

And.. VIOLA!!! Notice the red warning at the far lowest left, the Evaluation DB attached is IN MEMORY only and most likely will be wrecked on a power fail or other shutdown. This could be a big issue on a laptop, wouldn’t you say? Regardless, this IS an evaluation after all…. so… next steps tomorrow will be to see how this all holds up over the next week when I’m back in CA and can install this on my office’s internal servers.

Screen Shot 2012-06-14 at 1.01.12 AM — Running, living, breathing JIRA!

MORE TO COME….

Software Development, Technology

Cassandra – Getting Started – (deployment Part 2 – Installing Ops Center)

26 March 2012 David 1 Comment

<< Previous: Cassandra – Going into Production – Part 2.

With an empty cluster running, the next step I’m going to take is to install and configure OpsCenter from DataStax. This is a fantastic tool for monitoring the health and performance of your cluster.

Installing Ops Center

The first order of business is to create a directory to store the Ops Center code on the server. I opted to do this within the user account used for Cassandra, as the directory datastax


:~$ mkdir datastax
:~$

Next, download and extract the OpsCenter package:


:~/datastax$ wget http://downloads.datastax.com/community/opscenter-1.4-free.tar.gz
--2012-03-26 08:25:30--  http://downloads.datastax.com/community/opscenter-1.4-free.tar.gz
Resolving downloads.datastax.com... 173.203.57.192
Connecting to downloads.datastax.com|173.203.57.192|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21539843 (21M) [application/octet-stream]
Saving to: `opscenter-1.4-free.tar.gz'

100%[=======================================================================>] 21,539,843  3.72M/s   in 7.5s    

2012-03-26 08:25:38 (2.74 MB/s) - `opscenter-1.4-free.tar.gz' saved [21539843/21539843]

:~/datastax$ tar -xvzf opscenter-1.4-free.tar.gz
opscenter-1.4/
opscenter-1.4/log/
opscenter-1.4/bin/
opscenter-1.4/bin/create-keystore.bat
opscenter-1.4/bin/create-key-pair.bat
[...]
opscenter-1.4/conf/event-plugins/email.conf
opscenter-1.4/conf/ssl.conf
opscenter-1.4/conf/opscenterd.conf

:~/datastax$

Next is the setup for OpsCenter. Setup is done via a Python script, located in the BIN directory. Have your listening IP ready and know which port you want to use for the Ops Center web portal. I’m going to use the default of port 8888. Make sure you have the port open on your machine. (click here to jump to the my section on ports).


:~/datastax$ ls
opscenter-1.4  opscenter-1.4-free.tar.gz
:~/datastax$ cd opscenter-1.4
:~/datastax/opscenter-1.4$ bin/setup.py
Generating a 1024 bit RSA private key
.........++++++
...++++++
writing new private key to 'ssl/opscenter.key'
-----
MAC verified OK
Certificate was added to keystore

:~/datastax/opscenter-1.4$

Configure the Ops Center deamon. Set the listening IP to an IP available on the system. I’m going to node’s interal IP address (10.1.0.23). The values I’ve changed are in bold.


:~/datastax/opscenter-1.4$ vi conf/opscenterd.conf
[...]
[jmx]
# The default jmx port for Cassandra >= 0.8.0 is 7199.  If you are using
# Cassandra 0.7.*, the default is 8080, and you should change this to
# reflect that.
port = 8001
[...] 
[webserver]
port = 8888
interface = 10.1.0.23
staticdir = ./content
log_path = ./log/http.log
[...]
[cassandra]
# a comma-separated list of places to try for a connection to your Cassandra
# cluster:
seed_hosts = 10.1.0.23,10.1.0.26
[...]

Installing the Ops Center Agents

Each node in the cluster must have a running Ops Center agent. The installation package for this was generated by the Ops Center setup process, and saves a compressed file. This file then needs to be copied and extracted on each node you plan to monitor with the Ops Center.


:~/datastax$ mkdir opscenter-agent
:~/datastax$ cp opscenter-1.4/agent.tar.gz opscenter-agent/
:~/datastax$ cd opscenter-agent/
:~/datastax/opscenter-agent$ tar -xvzf agent.tar.gz
agent/opscenter-agent-2.5-standalone.jar
agent/conf/log4j.properties
agent/bin/setup.bat
[...]
agent/bin/ssl/agentKeyStore.p12
agent/bin/ssl/opscenter.key
agent/doc/LICENSE

:~/datastax/opscenter-agent$

Now run the agent’s setup, assigning it’s IP and the Ops Center’s IP. 10.1.0.26 is this node’s IP address. 10.1.0.23 is the location of the Ops Center install (this may or may not be on the same system or even the same IP address):


:~/datastax/opscenter-agent$ agent/bin/setup 10.1.0.26 10.1.0.23

Make sure you copy the agent file to ALL your other nodes and follow the same setup procedure (this is an example of how I copied the file, your system, ports etc. may be different), and repeat the steps above, with the appropriate IPs.


:~/datastax/opscenter-agent$ scp -P41718 agent.tar.gz bigdata@10.1.0.26:.
RSA key fingerprint is 2b:5b:26:03:87:a4:b1:ea:90:b5:4e:42:60:88:cd:d1.
bigdata@10.1.0.26's password: 
agent.tar.gz                                                                   100%   10MB  10.3MB/s   00:01    
:~/datastax/opscenter-agent$

Start up Ops Center

On the Ops Center machine, move back to it’s installed directory and start the process.


:~/datastax$ cd opscenter-1.4
~/datastax/opscenter-1.4$ bin/opscenter &

Now connect to the IP address and port and you should see a base Ops Center instance. This is what you would typically see before starting up your agents:

OpsCenter 2012-03-26 at 9.05.18 AM — DataStax Ops Center 1.4

Start up the Node Agents

The last step is to start up the Agent deamons so that the OpsCenter knows the status of each node.


:~/datastax/opscenter-1.4$ cd ../opscenter-agent/
:~/datastax/opscenter-agent$ agent/bin/opscenter-agent &
:~/datastax/opscenter-agent$  INFO [main] 2012-03-26 09:12:40,465 Loading conf files: conf/address.yaml
 INFO [main] 2012-03-26 09:12:40,505 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
 INFO [main] 2012-03-26 09:12:40,505 Waiting for the config from OpsCenter
 INFO [main] 2012-03-26 09:12:40,637 SSL communication is enabled
 INFO [main] 2012-03-26 09:12:40,637 Creating stomp connection to 10.1.0.23:61620

With the Agents fired up, you will see a nice dashboard, showing the current status of the cluster, and some metrics on performance.

OpsCenter 2012-03-26 at 9.22.47 AM — Ops Center up and running.

Conclusion

This basically concludes the fast deployment steps required to download, install, configure and start up Cassandra, along with the DataStax Ops Center.

Total time required to deploy was under 4 hours.

Software Development, Technology

Cassandra – Getting Started – (deployment Part 1 – Installing Cassandra)

26 March 2012 David 2 Comments

It’s been almost a month since I started the Apache Cassandra investigation, and now it’s time to move into a production stance. Some of these steps will differ from the original steps documented here in my blog. Later this week I will go back and amend those posts to point at this post as the more recent information. Those old links are already being referenced by multiple sites, so deleting them would not be a kind thing to do. Thus.. onward we move!

Getting the right JVM/JDK/JRE

Originally, the OpenJDK was being used for this introduction and research into Cassandra. Being a proponent of Open Source, I was going to avoid the use of Oracle’s potentially proprietary JDK/JRE in this environment. I have since seen first had, that the JDK DOES IN FACT MATTER, and the one that supports the latest tools is the one from Oracle.

That is located here:

Java SE Development Kit 7 Downloads – download this if you plan to do any Java development.
Java SE Runtime Environment 7 Downloads – Download this if you just want to run Java programs, and do not plan to write your own.

Downloading the JRE/JDK from Oracle has enabled the reliable use of DataStax’s OpsCenter management tool (more on that later).

These are the recommended minimums for Cassandra and OpsCenter from DataStax, a respected partner of the Apache Cassandra project.

Sun Java Runtime Environment 1.6.0_19 or later
Python 2.5, 2.6, or 2.7
OpenSSL version listed in Configuring SSL unless you disable SSL

I ended up selecting the JDK (linked here) and deposited it in the following location on my system as user root (create the directory path if you don’t already have it):


/opt/java/64/jdk-7u3-linux-x64.tar.gz

Extract the file:


:/opt/java/64# tar -xvzf jdk-7u3-linux-x64.tar.gz
jdk1.7.0_03/
jdk1.7.0_03/include/
jdk1.7.0_03/include/jvmti.h
jdk1.7.0_03/include/jawt.h
[...]
jdk1.7.0_03/jre/plugin/desktop/sun_java.desktop
jdk1.7.0_03/jre/COPYRIGHT
jdk1.7.0_03/LICENSE
jdk1.7.0_03/COPYRIGHT
:/opt/java/64#

The Cassandra Build I decided to use is this one: apache-cassandra-1.1.0-beta1. I downloaded the file to the user I created for this using wget:


:~$ wget http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
--2012-03-25 22:52:27--  http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
Resolving apache.deathculture.net... 173.236.158.254
Connecting to apache.deathculture.net|173.236.158.254|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12505037 (12M) [application/x-gzip]
Saving to: `apache-cassandra-1.1.0-beta1-bin.tar.gz'

100%[=======================================================================>] 12,505,037  8.84M/s   in 1.3s    

2012-03-25 22:52:29 (8.84 MB/s) - `apache-cassandra-1.1.0-beta1-bin.tar.gz' saved [12505037/12505037]

Next the file is extracted, moved to a shorter directory name:


:~$ tar -xvzf apache-cassandra-1.1.0-beta1-bin.tar.gz
:~$ mv apache-cassandra-1.1.0-beta1 cass-beta1

Configuring a Node

Now the configuration is edited to define the node ring. The first file to edit is the cassandra.yaml file.

This initially will be only a 2 node cluster, but the tokens must still be calculated. Here are the node tokens I generated using a PERL script I wrote (see: Cassandra and Big Data – building a single-node “cluster” – Extra Credit for the code):


:~/cass-beta1$ ./token.pl 2
Calculate tokens for 2 nodes
factor = 170141183460469231731687303715884105728
node 0	token: 0
node 1	token: 85070591730234615865843651857942052864
:~/cass-beta1$

Edit the cluster name. I’m not testing, so I changed the name to one descriptive of the data I was storing. ‘ip’. In the example below, I’m showing configs for the 2nd of the two nodes. Note: The first node would have a different IP address and also a different initial token, in this case ‘0’, as calculated by the tool.


:~$ cd cass-beta1/
:~/cass-beta1$ vi conf/cassandra.yaml

[...]

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'ip'

[...]

 If blank, Cassandra will request a token bisecting the range of
# the heaviest-loaded existing node.  If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
initial_token: 85070591730234615865843651857942052864

[...]

# directories where Cassandra should store data on disk.
data_file_directories:
    - /home/bigdata/data/

[...]

# commit log
commitlog_directory: /home/bigdata/commitlog/

[...]

# saved caches
saved_caches_directory: /home/bigdata/saved_caches/

[...]

          # seeds is actually a comma-delimited list of addresses.
          # Ex: ",,"
          - seeds: "10.1.100.101,10.1.100.102"
[...]

# Setting this to 0.0.0.0 is always wrong.
listen_address: 10.1.1.101

[...]

rpc_address: 10.1.1.101

[...]

# Time to wait for a reply from other nodes before failing the command (this was done to increase timeout to 30 seconds, sometimes the search I need to run is pretty nasty)
rpc_timeout_in_ms: 30000

Following that, the shell file needs to be modified to designate the JMX listening port:


:~/cass-beta1$ vi conf/cassandra-env.sh

[...]

# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="8001"

[...]

Make sure your logfile is in the desired location. I decided to keep it within the account itself for now:


vi cassA-1.0.8/conf/log4j-server.properties
[...]

log4j.appender.R.File=/home/bigdata/log/cassA.log

[...]

Next I set the paths in the .bash configuration file for the account, using the following 3 environment variables (ANT_HOME is used by the ANT compiler, if you are not writing code, your JAVA_HOME will point at the JRE, not the JDK, and you won’t need the ANT_HOME path at all):


vi ~/.bash_profile
export JAVA_HOME=/opt/java/64/jdk1.7.0_03
export ANT_HOME=/usr/lib/ant/
export CASS_BIN=$HOME/cass-beta1/bin
export PATH=$PATH:$ANT_HOME/bin:$CASS_BIN

Systems Administration

Make sure there is a location for the cassandra server to write it’s log files. You’ll need your SysAdmin, or root privs, to do this. I set the ownership to root and the user under which I’m currently running cassandra (bigdata):


root:/data/feed/indata# cd /var/log
root:/var/log# mkdir cassandra
root:/var/log# chown root:bigdata cassandra
root:/var/log# chmod 775 cassandra

The following ports need to be opened up, if you are running a firewall on each system (you ARE, right!?!), to allow Cassandra nodes to communicate with each other. This is a snippet from my rules-based firewall control file.

Port Usage:

9160 – Thrift port, where the API is serviced for Reads/Writes to Cassandra
8001 – Individual node listening port. This is used for the command line (cli)
7000 – Commands and Data TCP port, used nodes for communications
7001 – SSL port used for storage communications
8888 – Only used on systems that will host an Ops Center installation
61620 – Required for Ops Center Agent Communications


## Cassandra
ACCEPT          loc             $FW             tcp     9160,8001,7000,7001
## OpsCenter
ACCEPT          loc             $FW             tcp     8888,61620

Starting up the Cluster

This is where the truth is told. The rubber meets the road. The money is placed where your mouth is. Light ’em up!


:~$ cassandra
:~$  INFO 23:52:54,232 Logging initialized
 INFO 23:52:54,236 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
 INFO 23:52:54,237 Heap size: 6291456000/6291456000
[...]
INFO 23:52:55,162 Node /10.1.0.23 state jump to normal
 INFO 23:52:55,163 Bootstrap/Replace/Move completed! Now serving reads.

IT LIVES!! Now start your other node(s), and check to verify you have a complete ring, properly configured. You should see something like this in subsequent nodes, I’ve highlighted the references to the other member node:


[...]
INFO 23:54:16,042 Node /10.1.0.23 has restarted, now UP
 INFO 23:54:16,043 InetAddress /10.1.0.23 is now UP
 INFO 23:54:16,043 Node /10.1.0.23 state jump to normal
 INFO 23:54:16,088 Compacted to [/home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-6-Data.db,].  544 to 413 (~75% of original) bytes for 4 keys at 0.003425MB/s.  Time: 115ms.
 INFO 23:54:16,109 Completed flushing /home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-5-Data.db (163 bytes)
 INFO 23:54:16,110 Node /10.1.0.26 state jump to normal
 INFO 23:54:16,111 Bootstrap/Replace/Move completed! Now serving reads.

Run nodetool:


:~$ nodetool -h10.1.0.23 -p 8001 ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               85070591730234615865843651857942052864      
10.1.0.23      datacenter1 rack1       Up     Normal  17.77 KB        50.00%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  17.66 KB        50.00%  85070591730234615865843651857942052864

WE HAVE A RING!

NEXT: SETTING UP OPS CENTER

Software Development, Technology

Drop keyspace using Cassandra Cli

18 March 2012 David Leave a comment

Dropping a an entire keyspace using the cassandra-cli is exceptionally simple.

First, access your cluster using the cli. I have an alias in my .bash_profile so I only need to type ‘cass’ to access the clid. In an attempt to be helpful though, I shall show the full command syntax for my environment. Your host and port may vary.


  alias cass='cassandra-cli -h 10.1.0.26'

In this example, I am going to drop the keyspace I was loading with test data in previous posts, ks33.


hpcass: ~$ cass
Connected to: "Test1" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

DROP keyspace ks33;

07ad5e00-7120-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown]

That’s all there was to it. Keyspace destroyed.

Previous Cassandra related articles

Apache Cassandra Project – processing “Big Data” – blog
Cassandra and Big Data – building a single-node “cluster” – blog
Cassandra – Getting a 3 node cluster built – apps
Java build env to prepare for Cassandra development – apps
Re-Configuring an Empty Cassandra Cluster – blog
Cassandra – Running some simple tests, including a multi-get strategy. – blog
Creating a simple Utils class – apps
Cassandra DB Connetor in Java, using Thrift API – apps
Java multi-get demonstrator for Cassandra NoSQL db – apps

Software Development, Technology

Cassandra – Running some simple tests, including a multi-get strategy.

14 March 2012 David 5 Comments

PREV: Re-Configuring an Empty Cassandra Cluster

Time for the rubber to meet the road. Get some data loaded and validate the theoretical concepts garnered from the documentation consumed.

This is an record example (IP’s have been changed to protect the clueless):

      ip_key: 1598595809
          ip: 10.2.162.225
     prop_id: 1033
    property: Bad Stuff
      threat: 1
   attribute: suspicious
        meta: 10.25.112.7
    detected: 2012-01-05 15:17:14
detected_sec: 1325805434
    reported: 2012-01-06 01:44:02
reported_sec: 1325843042

Preliminary model concept centers around the IP, however with over 60,000,000 records there are overlaps, so a single IP is not going to survive as the primary key. Trying to get a distribution out of MySQL takes some time. Here are some distributions by key. Thousands of of events per IP, and this is just a short 1 month window:

+------------+--------+
| ip_dec     | events |
+------------+--------+
| 3158358206 |   2705 |
|  652542280 |   2506 |
| 3495573656 |   2089 |
| 3232235778 |   2015 |
| 1072721396 |   1528 |
|  652542281 |   1432 |
| 3232235876 |   1427 |
| 3448822506 |   1232 |
| 1280052209 |   1106 |
| 3232235779 |   1086 |
+------------+--------+

Now, Cassandra will support MILLIONS of column items on a single row, thus, this actually might work, and scale without using Super Column Families (SCFs). Using the detected time seconds as the column name with an attribute suffix, then enclosing the data in a JSON blob could provide the required results. Using the datekey as a secondary index across the columns, or using them as a time progression. Concepts that need to be tested, which precisely the task at hand.

Considering that a good detected time is not always available, and the data is processed in batches, there could be a heavy grouping of timestamps. If there are a variety of issues detected on a specific IP, at the same obfuscated time, loss of data will occur. This is certainly NOT the desired result. Given this, the datastamp is not unique enough for a hash structure datastore such as Cassandra, without using SCFs.

A structure such as this could deliver the required granularity:


ipstore[$ipkey][$timekey][$propkey] = JSON:{}, JSON:{}, JSON{}...  ;

To get started with loading data, wrote a quick test program in Java, compliled it and ran it:

test1.java – source code


public class test1 {
  public static void main (String [] args) {
    System.out.println("Cassandra Calling!");
  }
}

compiling….


java/src/loader1$ javac test1.java -d ../../class/.

executing…


/java/class$ java test1
Cassandra Calling!

Environment confirmed for compiling loader code. With a model in mind…


ipstore[$ipkey][$popkey][$timestamp] = JSON:{}

..and IP data to load,

ipp < get_a_million.sql > a_million_ips.dta
cass:~$ ls -l
126180075 2012-03-13 13:06 a_million_ips.dta

cass:~$ wc -l a_million_ips.dta
1000001 a_million_ips.dta

...next it's designing the schema builder and loader.

REFERENCE: Setting up a Java build env to prepare for Cassandra development

With the environment confirmed, and a test file (test1.java) written, execute and verify function:


cass:~$ ant -DclassToRun=test1 run
Buildfile: ./build.xml

[...]

run:
     [java] This is Java.... drink up!

VERIFIED.

To get moving forward, I created a Utilities class and a DB connector Class. You can look at the source code for those at these two links:

Util Source Code

Cassandra DB Connector Source Code

With the code done, need to perform a couple of house keeping tasks to get it prepared for loading.

Adding the ks33 keyspace


[default@unknown] create keyspace ks33:
c7944700-6e2e-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster

[default@unknown] use ks33;
Authenticated to keyspace: ks33

Adding the cf33 ColumnFamily to ks33 Keyspace:


[default@ks33] create column family cf33 with comparator = UTF8Type; 
2501f8b0-6e2f-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster

Next, to load 100 trial rows. Here is a link to the source code:

Source for useMultiGet (tba)


hpcass@feed0:~/cassIP/java/cBuild$ host=10.1.0.123 port=9160 inserts=100 ks=ks33 cf=cf33 ant -DclassToRun=c01.useMultiGet run
Buildfile: /home/hpcass/cassIP/java/cBuild/build.xml

init:

compile:
    [javac] Compiling 1 source file to /home/hpcass/cassIP/java/cBuild/build/classes

dist:
      [jar] Building jar: /home/hpcass/cassIP/java/cBuild/dist/lib/cassIP.jar

run:
     [java] get time   89062577
     [java] mget time 494039096

BUILD SUCCESSFUL

Here are some results from multi-get tests. It's actually the inverse of my hope, the multi-get seems to rapidly lose it's benefit.

5 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  339041199  436440551  358115310
     [java] mget time 172484370  174690508  182833140

10 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  346512511  332820479  314136351
     [java] mget time 394049160  251152592  234719383

25 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  335286775  293802010  295948562
     [java] mget time 464933443  324505741  312226035

What I didn't expect to see, based on the information in the 'High Performance
Cookbook, was rapid fall-off in performance, and in face in all cases in the
slices of size 25 inverted the performance, showing that it became worse.

2 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  285509637  331970814  317512021
     [java] mget time 104567639   96477512  124040195

One thing I didn't think of testing was doing a slice of size 1, and see if maybe part of the perceived performance in the lower slices is really cache hits. AH! Look at this, it looks like the *test* is highly suspect at best. I think this shows some evidence the performance 'benefit' of the multi-get is really a cache hit artifact from extracting the exact same data a second time:

host=10.1.0.123 port=9160 inserts=1000 ks=ks33 cf=cf33 slice=1 ant -DclassToRun=c01.useMultiGet run
Buildfile: /home/hpcass/cassIP/java/cBuild/build.xml

1 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  295158535  298466321  283438099
     [java] mget time 109982545  103658894   98260286

This demonstrator failure to perform, is not a failure in and of itself. It's provided useful information regarding some concepts recommended in some documentation, but may not really be a true best practice. I long ago developed a healthy skepticism of expert advice in lieu of verification.

Software Development, Technology

Re-Configuring an Empty Cassandra Cluster

12 March 2012 David 1 Comment

PREV: Setting up a Java build env to prepare for Cassandra development

After doing more research, I decided the Ordered Partitioning was not going to buy me anything but a lop-sided distribution. Looking at this (it’s a case of IP distributions, not hostnames as originally envisioned, that will be a later evaluation).

I’d have 3 very heavy nodes and 3 very light nodes. This is a distribution of real world data.

Node:  Range:                             Dist:    
====== ================================== ======  
node00         0.0.0.0 to 42.170.170.171     6 %  
node01  42.170.170.172 to 85.85.85.87       32 %  
node02     85.85.85.88 to 128.0.0.3         34 %  
node03       128.0.0.4 to 170.170.170.175    2 %  
node04 170.170.170.176 to 213.85.85.91      21 %  
node05    213.85.85.92 to 255.255.255.255    3 %

Goofing around with pseudo random key naming to get a better balance only does one thing, make the keys I wanted to use (IPs) basically worthless, so the ordering is wrecked regardless. Random partitioning is the default configuration for Cassandra, so, that’s what I plan to use. Problem is, I’d built out this specific node set with this setting first:

– ByteOrderedPartitioner orders rows lexically by key bytes. BOP allows scanning rows in key order, but the ordering can generate hot spots for sequential insertion workloads.

I re-set the configurations to use the default instead:

– RandomPartitioner distributes rows across the cluster evenly by md5. When in doubt, this is the best option.

After changing the configuration from ByteOrderedPartitioner to RandomPartitioner and restarting the first node.. I am greeted with this happy message:

ERROR 13:03:36,113 Fatal exception in thread Thread[SSTableBatchOpen:3,5,main]
java.lang.RuntimeException: Cannot open /home/hpcass/data/node00/system/Versions-hc-3 because partitioner does not match org.apache.cassandra.dht.RandomPartitioner

In fact I’m greeted with a lot of them. This is then followed by what looks like possibly.. normal startup messaging?

 INFO 13:03:36,166 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331586216166.log
 INFO 13:03:36,175 Couldn't detect any schema definitions in local storage.
 INFO 13:03:36,175 Found table data in data directories. Consider using the CLI to define your schema.
 INFO 13:03:36,197 Replaying /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
 INFO 13:03:36,222 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
 INFO 13:03:36,227 Enqueuing flush of Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
 INFO 13:03:36,228 Writing Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
 INFO 13:03:36,228 Enqueuing flush of Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
 INFO 13:03:36,277 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-16-Data.db (377 bytes)
 INFO 13:03:36,285 Writing Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
 INFO 13:03:36,357 Completed flushing /home/hpcass/data/node00/system/Versions-hc-4-Data.db (247 bytes)
 INFO 13:03:36,358 Log replay complete, 9 replayed mutations
 INFO 13:03:36,366 Cassandra version: 1.0.8
 INFO 13:03:36,366 Thrift API version: 19.20.0
 INFO 13:03:36,367 Loading persisted ring state
 INFO 13:03:36,384 Starting up server gossip
 INFO 13:03:36,386 Enqueuing flush of Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
 INFO 13:03:36,386 Writing Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
 INFO 13:03:36,440 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-17-Data.db (196 bytes)
 INFO 13:03:36,446 Starting Messaging Service on port 7000
 INFO 13:03:36,452 Using saved token 0
 INFO 13:03:36,453 Enqueuing flush of Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
 INFO 13:03:36,454 Writing Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
 INFO 13:03:36,556 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-18-Data.db (148 bytes)
 INFO 13:03:36,558 Node /10.1.0.23 state jump to normal
 INFO 13:03:36,558 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 13:03:36,559 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 13:03:36,587 Binding thrift service to /10.1.0.23:9160
 INFO 13:03:36,590 Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO 13:03:36,593 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
 INFO 13:03:36,593 Listening for thrift clients...

Despite the fatal errors, it does seem to have restarted the cluster with the new Partition engine:

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               7169015515630842424558524306038950250903273734
10.1.0.27      datacenter1 rack1       Down   Normal  ?               93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23      datacenter1 rack1       Up     Normal  15.79 KB        86.37%  0                                           
10.1.0.26      datacenter1 rack1       Down   Normal  ?               77.79%  896682280808232140910919391534960240163386913
10.1.0.24      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  15.79 KB        35.85%  6138493926725652010223830601932265434881918085
10.1.0.28      datacenter1 rack1       Down   Normal  ?               53.08%  716901551563084242455852430603895025090327373

Starting up the other three nodes (example:)

 INFO 14:10:06,663 Node /10.1.0.25 has restarted, now UP
 INFO 14:10:06,663 InetAddress /10.1.0.25 is now UP
 INFO 14:10:06,664 Node /10.1.0.25 state jump to normal
 INFO 14:10:06,664 Node /10.1.0.24 has restarted, now UP
 INFO 14:10:06,665 InetAddress /10.1.0.24 is now UP
 INFO 14:10:06,665 Node /10.1.0.24 state jump to normal
 INFO 14:10:06,666 Node /10.1.0.23 has restarted, now UP
 INFO 14:10:06,667 InetAddress /10.1.0.23 is now UP
 INFO 14:10:06,668 Node /10.1.0.23 state jump to normal
 INFO 14:10:06,760 Completed flushing /home/hpcass/data/node01/system/LocationInfo-hc-18-Data.db (166 bytes)
 INFO 14:10:06,762 Node /10.1.0.26 state jump to normal
 INFO 14:10:06,763 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 14:10:06,764 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 14:10:06,862 Binding thrift service to /10.1.0.26:9160

Re-checking the ring displays:

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               7169015515630842424558524306038950250903273734
10.1.0.27      datacenter1 rack1       Up     Normal  11.37 KB        93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23      datacenter1 rack1       Up     Normal  15.79 KB        86.37%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  18.38 KB        77.79%  896682280808232140910919391534960240163386913
10.1.0.24      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  15.79 KB        35.85%  6138493926725652010223830601932265434881918085
10.1.0.28      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  7169015515630842424558524306038950250903273734

Switching partition engine appears to be easy enough. What I suspect however (and I’ve not confirmed this, is that the data would have been compromised or likely destroyed in this process. The documentation I’ve read so far indicated that you could not do this. Once setup with a specific partitioning engine that cluster was bound to it.

My conclusion is that if you have not yet started to saturate your cluster with data, and you wish to change the partitioning engine, it would appear that the right time to do it is now.. before you start to load data.

I plan to test this theory later after the first trial data load to see if in fact it mangles the information. More to follow!

UPDATE!

Despite the information that I thought nodetool was telling me, my cluster was unusable because of the partitioner change. What is the last step required to change partition? NUKE THE DATA. Unfun.. but that is what I need to do.

Having 6 nodes means 6 times the fun. Here is the kicker though, I’ll just move the data aside and re-construct, and that will let me swap it back in if I decided to go back and forth testing the impacts of Random vs. Ordered for my needs. Will I get away with this? I don’t know. That won’t stop me from trying!

The data was stored in ~/data/node00 (node## etc.). This is all I did:


mv data/node00 data/node00-bop       # bop = btye order partition.

Restarted node00:


hpcass:~/nodes$ node00/bin/cassandra -f
 INFO 16:38:46,525 Logging initialized
 INFO 16:38:46,529 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_0
 INFO 16:38:46,529 Heap size: 6291456000/6291456000
 INFO 16:38:46,529 Classpath: node00/bin/../conf:node00/bin/../build/classes/main:node00/bin/../build/classes/thrift:node00/bin/../lib/antlr-3.2.jar:node00/bin/../lib/apache-cassandra-1.0.8.jar:node00/bin/../lib/apache-cassandra-clientutil-1.0.8.jar:node00/bin/../lib/apache-cassandra-thrift-1.0.8.jar:node00/bin/../lib/avro-1.4.0-fixes.jar:node00/bin/../lib/avro-1.4.0-sources-fixes.jar:node00/bin/../lib/commons-cli-1.1.jar:node00/bin/../lib/commons-codec-1.2.jar:node00/bin/../lib/commons-lang-2.4.jar:node00/bin/../lib/compress-lzf-0.8.4.jar:node00/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:node00/bin/../lib/guava-r08.jar:node00/bin/../lib/high-scale-lib-1.1.2.jar:node00/bin/../lib/jackson-core-asl-1.4.0.jar:node00/bin/../lib/jackson-mapper-asl-1.4.0.jar:node00/bin/../lib/jamm-0.2.5.jar:node00/bin/../lib/jline-0.9.94.jar:node00/bin/../lib/json-simple-1.1.jar:node00/bin/../lib/libthrift-0.6.jar:node00/bin/../lib/log4j-1.2.16.jar:node00/bin/../lib/servlet-api-2.5-20081211.jar:node00/bin/../lib/slf4j-api-1.6.1.jar:node00/bin/../lib/slf4j-log4j12-1.6.1.jar:node00/bin/../lib/snakeyaml-1.6.jar:node00/bin/../lib/snappy-java-1.0.4.1.jar
 INFO 16:38:46,531 JNA not found. Native methods will be disabled.
 INFO 16:38:46,538 Loading settings from file:/home/hpcass/nodes/node00/conf/cassandra.yaml
 INFO 16:38:46,635 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 16:38:46,645 Global memtable threshold is enabled at 2000MB
 INFO 16:38:46,839 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331599126839.log
 INFO 16:38:46,848 Couldn't detect any schema definitions in local storage.
 INFO 16:38:46,849 Found table data in data directories. Consider using the CLI to define your schema.
 INFO 16:38:46,863 Replaying /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
 INFO 16:38:46,887 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
 INFO 16:38:46,892 Enqueuing flush of Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
 INFO 16:38:46,893 Enqueuing flush of Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
 INFO 16:38:46,894 Writing Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
 INFO 16:38:47,001 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-1-Data.db (208 bytes)
 INFO 16:38:47,009 Writing Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
 INFO 16:38:47,057 Completed flushing /home/hpcass/data/node00/system/Versions-hc-1-Data.db (247 bytes)
 INFO 16:38:47,057 Log replay complete, 6 replayed mutations
 INFO 16:38:47,066 Cassandra version: 1.0.8
 INFO 16:38:47,066 Thrift API version: 19.20.0
 INFO 16:38:47,067 Loading persisted ring state
 INFO 16:38:47,070 Starting up server gossip
 INFO 16:38:47,091 Enqueuing flush of Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
 INFO 16:38:47,092 Writing Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
 INFO 16:38:47,141 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-2-Data.db (196 bytes)
 INFO 16:38:47,149 Starting Messaging Service on port 7000
 INFO 16:38:47,155 Using saved token 0
 INFO 16:38:47,157 Enqueuing flush of Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
 INFO 16:38:47,157 Writing Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
 INFO 16:38:47,237 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-3-Data.db (148 bytes)
 INFO 16:38:47,239 Node /10.1.0.23 state jump to normal
 INFO 16:38:47,240 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 16:38:47,241 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 16:38:47,269 Binding thrift service to /10.1.0.23:9160
 INFO 16:38:47,272 Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO 16:38:47,274 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
 INFO 16:38:47,275 Listening for thrift clients...

^Z
[1]+  Stopped                 node00/bin/cassandra -f
hpcass:~/nodes$ bg
[1]+ node00/bin/cassandra -f &

With the process backgrounded, checked the files in the new data directory for my node:


hpcass:~/data/node00$ ls -1 system
LocationInfo-hc-1-Data.db
LocationInfo-hc-1-Digest.sha1
LocationInfo-hc-1-Filter.db
LocationInfo-hc-1-Index.db
LocationInfo-hc-1-Statistics.db
LocationInfo-hc-2-Data.db
LocationInfo-hc-2-Digest.sha1
LocationInfo-hc-2-Filter.db
LocationInfo-hc-2-Index.db
LocationInfo-hc-2-Statistics.db
LocationInfo-hc-3-Data.db
LocationInfo-hc-3-Digest.sha1
LocationInfo-hc-3-Filter.db
LocationInfo-hc-3-Index.db
LocationInfo-hc-3-Statistics.db
Versions-hc-1-Data.db
Versions-hc-1-Digest.sha1
Versions-hc-1-Filter.db
Versions-hc-1-Index.db
Versions-hc-1-Statistics.db

Following that clearing and rebuild, I see the node tool results look a lot better:

hpcass@feed0:~/nodes$ cass00/bin/nodetool -h localhost ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               6138493926725652010223830601932265434881918085
10.1.0.23      datacenter1 rack1       Up     Normal  15.68 KB        33.29%  0                                           
10.1.0.24      datacenter1 rack1       Up     Normal  18.34 KB        30.87%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  18.34 KB        35.85%  6138493926725652010223830601932265434881918085

After resetting the old numerated nodes, I had a complete disaster! Negative node tokens? How did that happen? Restarts did nothing to fix this either.

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               7169015515630842424558524306038950250903273734
10.1.0.27      datacenter1 rack1       Up     Normal  15.79 KB        93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23      datacenter1 rack1       Up     Normal  15.79 KB        86.37%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  15.79 KB        77.79%  896682280808232140910919391534960240163386913
10.1.0.24      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  15.79 KB        35.85%  6138493926725652010223830601932265434881918085
10.1.0.28      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  7169015515630842424558524306038950250903273734

To resolve this, I simply re-ran my token generator to get a new set of tokens:

node00	10.1.0.23  token: 0
node01	10.1.0.26  token: 28356863910078205288614550619314017621
node02	10.1.0.24  token: 56713727820156410577229101238628035242
node03	10.1.0.27  token: 85070591730234615865843651857942052863
node04	10.1.0.25  token: 113427455640312821154458202477256070485
node05	10.1.0.28  token: 141784319550391026443072753096570088106

Followed by manually setting the tokens in the ring:

bin/nodetool -h 10.1.0.24 move 56713727820156410577229101238628035242
bin/nodetool -h 10.1.0.25 move 113427455640312821154458202477256070485

bin/nodetool -h 10.1.0.26 move 28356863910078205288614550619314017621
bin/nodetool -h 10.1.0.27 move 85070591730234615865843651857942052863
bin/nodetool -h 10.1.0.28 move 141784319550391026443072753096570088106

This.. gave me the results I was expecting!

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               141784319550391026443072753096570088106     
10.1.0.23      datacenter1 rack1       Up     Normal  24.95 KB        16.67%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  20.72 KB        16.67%  28356863910078205288614550619314017621      
10.1.0.24      datacenter1 rack1       Up     Normal  25.1 KB         16.67%  56713727820156410577229101238628035242      
10.1.0.27      datacenter1 rack1       Up     Normal  13.38 KB        16.67%  85070591730234615865843651857942052863      
10.1.0.25      datacenter1 rack1       Up     Normal  25.1 KB         16.67%  113427455640312821154458202477256070485     
10.1.0.28      datacenter1 rack1       Up     Normal  25.14 KB        16.67%  141784319550391026443072753096570088106

Now, the question of actually connecting to the cluster can be answered. Pick one of the nodes and ports to connect too. I picked node00 on .23 (cli defaulted to port 9160 so I didn’t have to specify that):

node00/bin/cassandra-cli -h 10.1.0.23 
Connected to: "test-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

The big problem I had, was that the cli never did seem to respond. The trick is to end your command with a semi-colon. That might seem obvious to you, and generally obvious to me.. but I’d not seen the docs actually call out that little FACT.

[default@unknown] show cluster name;
test-ip

Created a test column family from the helpful Cassandra Wiki.

create keyspace Twissandra;
Keyspace names must be case-insensitively unique ("Twissandra" conflicts with "Twissandra")
[default@unknown] 
[default@unknown] 
[default@unknown] create column family User with comparator = UTF8Type;
Not authenticated to a working keyspace.
[default@unknown] use Twissandra;
Authenticated to keyspace: Twissandra
[default@Twissandra] create column family User with comparator = UTF8Type;
adf453a0-6cb0-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@Twissandra]

AND WE’RE OFF!! Next article will cover actually finishing up this last test and then adding real data. MORE TO COME!!

NEXT: Cassandra – A Use case examined (IP data)

Entrepreneurship, Software Development, Technology

Setting up a Java build env to prepare for Cassandra development

3 March 2012 David 5 Comments

Getting it all ready…
PREV: Cassandra and Big Data – building a single-node “cluster”

First, I wanted to see how much of a system footprint 3 instances of Cassandra had on this little system. Here you can see the 3 instances patiently waiting for something todo. Sitting idle for about 24 hours (note, TIME+ is system time, not wall clock), total memory utilization has crept up from 11% to 14% per process.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4554 bigdata 20 0 891m 134m 4388 S 7.6 14.4 7:47.96 java 4632 bigdata 20 0 917m 133m 4340 S 0.7 14.3 7:45.64 java 4593 bigdata 20 0 896m 133m 4168 S 0.3 14.3 7:40.37 java

Keep in mind this test box has a single core CPU with a whopping 1GB of memory. If I can get it to work on this box without pushing it over, you should be able to run a single instance on any box with a reasonable expectation of function.

The data model I wanted to use is pretty basic: IP traffic, consisting of the following elements:

* IPv4 address
* destination port
* timestamp
* TTL (this is a Cassandra construct to allow auto-tombstoning of data when it’s usefulness has expired)

To get this data, I’m thinking of simply running TCPdump on a box, or possibly my laptop, to generate some traffic, then stream that into a program to insert into Cassandra as fast as the packets go by.

With the limited disk space on the box (see below) I can’t run it indefinitely, but I should be able to run it for an afternoon to load a keyspace, then start to figure out how to get the data back out!

Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 75956320 4344788 67753152 7% / none 470324 640 469684 1% /dev none 478024 420 477604 1% /dev/shm none 478024 108 477916 1% /var/run none 478024 0 478024 0% /var/lock

One thing I could do is load the data into the database, then run a 2nd pass processor on it and mutate the data with reverse lookups. Sort of a poor-man’s Wireshark type of tool. Now, if I wire this into my eventually to be setup RPZ enabled DNS resolver, I could track all data on my network, including all the requests from my Apple TV device. It might be interesting to see what it’s *really* doing on the network.

Downloading Support Packages for Development Environment

Before staring to code though, it looks like I need to ensure my JDK / Java libs are all up to date… and also to facilitate working with the documentation I’m reviewing.. Apache ANT will be installed too.

Java JDK – Java Software Development Kit

The JDK is a development environment for building applications, applets, and components using the Java programming language.
The JDK includes tools useful for developing and testing programs written in the Java programming language and running on the Java&™; platform.

Package URL: http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz

mkdir jdk cd jdk wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz

Extract the package:
tar xvzf jdk-7u3-linux-x64.tar.gz

Although I could simply run the JDK from the local user location, I decided to go for the ‘System Install’ option, and created a jdk location in user/lib, then copied the parts there according to the info in the docs. In this case I just downloaded the JRE again… you could skip that step and copy the .gz file already downloaded above. Your call.

sudo mkdir /usr/lib/jdk cd /usr/lib/jdk sudo wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-x64.tar.gz sudo tar xvzf jdk-7u3-linux-x64.tar.gz sudo rm jdk-7u3-linux-x64.tar.gz

Oracle’s page says that it’s now ‘installed’ but I suspect there are a more than a few more steps required here! This is almost as good as Oracle technical support… I’ll try to be a little more helpful.

Setting the path in my ~/.bash_profile will resolve the path issue for Ant and JUnit. This is what I set in my file:
export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_03

ANT – Apache Ant

Apache Ant is a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other. The main known usage of Ant is the build of Java applications. Ant supplies a number of built-in tasks allowing to compile, assemble, test and run Java applications. Ant can also be used effectively to build non Java applications, for instance C or C++ applications. More generally, Ant can be used to pilot any type of process which can be described in terms of targets and tasks.

Package URL: http://www.carfab.com/apachesoftware//ant/binaries/apache-ant-1.8.3-bin.tar.gz

mkdir ant cd ant wget http://www.carfab.com/apachesoftware//ant/binaries/apache-ant-1.8.3-bin.tar.gz

Extract the package:

tar xvzf apache-ant-1.8.3-bin.tar.gz

Docs inside Ant say to go back to the web and read the installation instructions, located here: http://ant.apache.org/manual/install.html#installing I happen to like where my ant stuff was installed so I’m going to set ANT_HOME in my ~/.bash_profile to the location where I extracted the stuff. Ideal? Probably not but I’m doing this research on a perfectly good Saturday.. you get what you’re paying for.

export ANT_HOME=/home/bigdata/ant/apache-ant-1.8.3 export PATH=$PATH:$ANT_HOME/bin

Testing to see if the paths and parts are there worked. This error is actually expected (we’ll write the build.xml later).
$ ant Buildfile: build.xml does not exist! Build failed

JUnit – Test framework for test based development

JUnit is a simple framework to write repeatable tests. It is an instance of the xUnit architecture for unit testing frameworks.

mkdir junit cd junit wget https://github.com/downloads/KentBeck/junit/junit-4.10.jar wget https://github.com/downloads/KentBeck/junit/junit4.10.zip

Extract the source package, in case I need it:

unzip junit4.10.zip

I can’t say this is the best way to do this, it’s cookie-cutter implementation from documentation. If you see something that does not make sense or is flat out stupid, post comment and let me know!

Development Environment Setup

Create primary development folder and expected sub-folders. You’re naming conventions may vary:

mkdir cBuild mkdir cBuild/src mkdir cBuild/src/{java,test} mkdir cBuild/lib

Populate the lib with libraries from the Cassandra distribution and Junit.

cp cassA-1.0.8/lib/*.jar cBuild/lib/. cp junit/*.jar cBuild/lib/.

To employ JUnit testing harness via Ant Java builder, a build.xml file is required in the cBuild base directory. Here are sample contents. You’re paths may differ if you went your own way on the directories.

vi cBuild/build.xml


<project name="jCas" default="dist" basedir=".">
  <property name="src" location="src/java"/>
  <property name="test.src" location="src/test"/>
  <property name="build" location="build"/>
  <property name="build.classes" location="build/classes"/>
  <property name="test.build" location="build/test"/>
  <property name="dist" location="dist"/>
  <property name="lib" location="lib"/>
  <!-- Tags used by Ant to help build paths, most useful when multiple .jar files are required --> 
  <path id="jCas.classpath">
    <pathelement location="${build.classes}"/>
    <fileset dir="${lib}" includes="*.jar"/>
  </path>
  <!-- exclude test cases from the final .jar file, this defines that policy -->
  <path id="jCas.test.classpath">
     <pathelement location="${test.build}"/>
     <path refid="jCas.classpath"/>
  </path>
  <!-- Define the 'init' target, used by other build phases -->
  <target name="init">
    <mkdir dir="${build}"/>
    <mkdir dir="${build.classes}"/>
    <mkdir dir="${test.build}"/>
  </target>
  <!-- 'compile' target -->
  <target name="compile" depends="init">
    <javac srcdir="${src}" destdir="${build.classes}">
       <classpath refid="jCas.classpath"/>
    </javac>
  </target>
  <!-- 'test compile' target -->
  <target name="compile-test" depends="init">
    <javac srcdir="${test.src}" destdir="${test.build}">
      <classpath refid="jCas.test.classpath"/>
    </javac>
  </target>
  <!-- setup policies that tell JUnit to execute tests on files in test that end with .class -->
  <target name="test" depends="compile-test,compile">
    <junit printsummary="yes" showoutput="true">
      <classpath refid="jCas.test.classpath"/>
      <batchtest>
        <fileset dir="${test.build}" includes="**/Test.class"/>
      </batchtest>
    </junit>
  </target>
  <!-- on a good build, dist target creates final JAR  jCas.tar -->
  <target name="dist" depends="compile">
    <mkdir dir="${dist}/lib"/>
    <jar jarfile="${dist}/lib/jCas.jar" basedir="${build.classes}"/>
  </target>
  <!-- run target allows execution of the built classes -->
  <target name="run" depends="dist">
    <java classname="${classToRun}">
      <classpath refid="jCas.classpath"/>
    </java>
  </target>
  <!-- clean target gets rid of all the left over files from builds -->
  <target name="clean">
    <delete dir="${build}"/>
    <delete dir="${dist}"/>
  </target>
</project>

NOTE!. There is a bug in Ant 1.8 that requires the addition of this element, or you will be plagued with nasty warnings:

...

This is the proper way to modify that above two javac blocks to include these element:


<javac srcdir="${src}" destdir="${build.classes} includeantruntime="false"">
<javac srcdir="${test.src}" destdir="${test.build}" includeantruntime="false">

Testing this build environment
Having created the build.xml file, it needs to be tested to make sure it even works.

Create a test case and build case


cd cBuild/src
vi Test.java

import junit.framework.*;
public class Test extends TestCase {
  public void test() {
    assertEquals( "Equality Test", 0, 0);
  }
}

Create a really simple program..


vi X1.java

public class X1 {
  public static void main (String [] args) {
    System.out.println("This is Java.... drink up!");
  }
}

Now the rubber meets the road if everything is setup properly and we can build a file!

Run ant with target set to 'test'


~/cBuild$ ant test
Buildfile: /home/bigdata/cBuild/build.xml

init:

compile-test:

compile:
[javac] Compiling 1 source file to /home/bigdata/cBuild/build/classes

test:

BUILD SUCCESSFUL
Total time: 7 seconds

Run ant with target set to 'diet'


~/cBuild$ ~/cBuild$ ant dist
Buildfile: /home/bigdata/cBuild/build.xml

init:

compile:

dist:
[jar] Building jar: /home/bigdata/cBuild/dist/lib/jCas.jar

BUILD SUCCESSFUL
Total time: 1 second

It's a good idea to check your .jar to make sure your class is actually in it. Ant, for some reason beyond understanding or logic, WON'T let you know if your lib was skipped (had it happen in my first build.. exceptionally ungood).

~/cBuild$ jar -tf dist/lib/jCas.jar META-INF/ META-INF/MANIFEST.MF X1.class

As you can see, there is no Whiskey, but X1 is in the jar.

RUN!!!


~/cBuild$ ant -DclassToRun=X1 run
Buildfile: /home/bigdata/cBuild/build.xml

init:

compile:

dist:

run:
[java] This is Java.... drink up!

BUILD SUCCESSFUL
Total time: 1 second

SUCCESS!!!

All told it took me about 3 1/2 hours to get this setup, parts installed, these notes written up and a SIMPLE Java program executed. So.. let your own expectations accordingly. Hopefully you'll save a lot of time with the build.xml file.. I typed that in char for char. You could just do a cut-paste, fix up anything you don't like in my path names and let it rip.

Good luck.. more to follow on Cassandra!!! (even though this post was more about getting ready to write code to access it).

NEXT: Re-Configuring an Empty Cassandra Cluster

Entrepreneurship, Software Development, Technology

Cassandra and Big Data – building a single-node “cluster”

2 March 2012 David 5 Comments

Cassandra – Getting off the ground.
Continuation of post: Apache Cassandra Project – processing “Big Data”

While researching a project on Big Data services, I knew that I’d need a multi-node cluster to experiment with, but a pile of hardware was not immediately available.

Using the VERY helpful book Cassandra High Performance Cookbook I was able to build a 3 node cluster on a single machine. This is how I did it:

For this cluster test example, I am using Ubunto 10, with following JVM


      JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_22

Downloaded Cassandra 1.0.8 package from here:
http://apache.mirrors.tds.net//cassandra/1.0.8/apache-cassandra-1.0.8-bin.tar.gz

Created new user on system: bigdata

Create the required base data directories


  $ mkdir commitlog,log,data,saved_caches

Moved that package there and started the build


$ cp /tmp/apache-cassandra-1.0.8-bin.tar.gz .

Unzipped and extracted the contents


$ gunzip apache-cassandra-1.0.8-bin.tar.gz
$ tar xvf apache-cassandra-1.0.8-bin.tar

Moved the long directory name to first instance cassA-1.0.8


$ mv apache-cassandra-1.0.8 cassA-1.0.8

Extracted again and renamed this to the other two planned instances:


$ tar xfv apache-cassandra-1.0.8-bin.tar
$ mv apache-cassandra-1.0.8 cassB-1.0.8  

$ tar xfv apache-cassandra-1.0.8-bin.tar
$ mv apache-cassandra-1.0.8 cassC-1.0.8

This gave me three packages to build, and each with a unique IP


  cassA-1.0.8   10.1.1.101
  cassB-1.0.8   10.1.1.102
  cassC-1.0.8   10.1.1.103

Edit configuration files in each instance (casaA-1.0.8 used as example:)


$ vi cassA-1.0.8/conf/cassandra.yaml 

[...]

# directories where Cassandra should store data on disk.
data_file_directories: 
    - /home/bigdata/data/cassA

# commit log
commitlog_directory: /home/bigdata/commitlog/cassA

# saved caches
saved_caches_directory: /home/bigdata/saved_caches/cassA

[...]

# If blank, Cassandra will request a token bisecting the range of
# the heaviest-loaded existing node.  If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
initial_token: 0

[...]

# Setting this to 0.0.0.0 is always wrong.
listen_address: 10.1.1.101

[...]

rpc_address: 10.1.1.101

[...]

          # seeds is actually a comma-delimited list of addresses.
          # Ex: ",,"
          - seeds: "10.1.100.101,10.1.100.102,10.1.100.103"
[...]

Setting a separate logfile is recommended. Edit config to set separate log


vi cassA-1.0.8/conf/log4j-server.properties

[...]
log4j.appender.R.File=/home/bigdata/log/cassA.log
[...]

Repeat for instances cassB and cassC, setting the token value for B and C to appropriate values (see Extra Credit below if you need to know how to do *that* part):


#cassB
initial_token: 56713727820156410577229101238628035242

#cassC
initial_token: 113427455640312821154458202477256070485

To enable the JMX management console, each instance will require it’s own port. Edit the env file to set that up.


vi cassA-1.0.8/conf/cassandra-env.sh

[...]
# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="8001"
[...]

Repeated for the other two instances, defining 8002 and 8003 respectively.

Now, for the final trick, start up the instances:


  cassA-1.0.8/bin/cassandra
  cassB-1.0.8/bin/cassandra
  cassC-1.0.8/bin/cassandra

Cluster elements started up, and they can be seen active in the process table here:


$ ps -lf
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S bigdata   4554     1  2  80   0 - 226846 futex_ 12:13 pts/0   00:00:05 java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=
0 S bigdata   4593     1  2  80   0 - 210824 futex_ 12:13 pts/0   00:00:05 java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=
0 S bigdata   4632     1  2  80   0 - 226830 futex_ 12:13 pts/0   00:00:05 java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=
0 R bigdata   5047  3054  0  80   0 -  5483 -      12:16 pts/0    00:00:00 ps -lf

Finally, to check the status, connect to of the JMX node ports and check the ring. You only need to connect to one of the cluster’s nodes to check the complete cluster’s status:


$ bin/nodetool -h 10.1.100.101 -port 8001 ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               113427455640312821154458202477256070485     
10.1.100.101    datacenter1 rack1       Up     Normal  21.86 KB        33.33%  0                                           
10.1.100.102    datacenter1 rack1       Up     Normal  20.28 KB        33.33%  56713727820156410577229101238628035242      
10.1.100.103    datacenter1 rack1       Up     Normal  29.1 KB         33.33%  113427455640312821154458202477256070485

Now, that’s a functional 3 instance cluster running on a single node. These are not in separate VMs, and if you wanted to experiment with this on a larger cluster, running multiple instances on multiple VM’s on a single hypervisor.. I don’t really see why you cannot!

In the next article, I’m going to start feeding data into the cluster. Stay tuned for that!

Extra Credit:

To create the token value I needed for this three ring cluster, I used the following PERL script. BTW, bignum is required unless you want PERL printing these big numbers in scientific notation:


#!/usr/bin/perl
use bignum;
my $nodes = shift;
print "Calculate tokens for $nodes nodes\n";
print "node 0\ttoken: 0\n" unless $nodes;
exit unless $nodes;
my $factor = 2**127;
print "factor = $factor\n";
for (my $i=0;$i<$nodes;$i++) {
	my $token = $i * ( $factor / $nodes);
	print "node $i\ttoken: $token\n";
}

Running the script for three nodes gave me the following results:


$ ./maketokens.pl  3

Calculate tokens for 3 nodes
factor = 170141183460469231731687303715884105728
node 0	token: 0
node 1	token: 56713727820156410577229101238628035242.67
node 2	token: 113427455640312821154458202477256070485.34

Additional Comments:

If you are setting up a standard mutli-box cluster, make sure you have the following ports opened up on any firewalls. If not, the cluster members wont' find each other:


# TCP port, for commands and data
storage_port: 7000

# SSL port, for encrypted communication.  Unused unless enabled in
# encryption_options
ssl_storage_port: 7001

NEXT: Setting up a Java build env to prepare for Cassandra development

Software Development, Technology

Apache Cassandra Project – processing “Big Data”

17 February 2012 David 3 Comments

Being an old-school OSS’er, MySql has been my go-to DB for data storage since the turn of the century. It’s great, I love it (mostly) but it does have it’s drawbacks. Largest of which is it’s now owned by Oracle which does a HORRIBLE JOB of supporting it. I have personal experience with this, as the results of a recent issue with InnoDB and MySQL.

In the mean time, some of the hot-shot up-and-commers in another department have been facing their own data processing challenges (with MySql and other DB’s), and have started to look at some highly scalable alternatives. One of the front-runners right now is Apache’s Cassandra database project.

The synopsis from the page is (as would be most marketing verbiage) very encouraging!

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

This sounds too good to be true. Finally a solution that we might be able to implement and grow, and one that doe not have the incredibly frustrating drawback of InnoDB and MySql’s fragile replication architecture. I’ve found out exactly how fragile it is, despite have a cluster of high-speed specially designed DB servers, the amount of down time we had was NOT ACCEPTABLE!).

With a charter to handle ever growing amounts of data and the need for ultimate availability and reliability, an alternative to MySQL is almost certainly required.

Of the items discussed on the main page, this one really hits home and stands out to me:

Fault Tolerant

Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

I recently watched a video from the 2011 Cassandra Conference in San Francisco. A lot of good information shared. This video is available on the Cassandra home page. I recommend muscling through the marketing BS as the beginning and take in what they cover.

Job graph for ‘Big Data’ is skyrocketing.

Demand for Cassandra experts is also skyrocketing.

Big data players are using Cassandra.

It’s a known issue that RDBM’s (ex. MySql) have serious limitations (no kidding).

RDBM’s generally have an 8GB cache limit (this is interesting, and would explain some issues we’ve had with scalability in our DB severs, which have 64GB of memory).

The notion that Cassandra does not have good read speed, is a fallacy. Version 0.8 read speed is at parity of the already considered fast 0.6 write speed. Fast!?

No global or even low-level write locks. The column swap architecture alleviates the need for these locks, this allows high-speed writes.

Quorum reads and writes are consistent across the distribution.

New feature of local LOCAL_QUORUM allows quorums to be established from only the local nodes, alleviating latency waiting for a quorum including remote nodes in other geographic locations.

Cassandra uses XML files for schema modifications. In version 0.7 provides new features to allow on-line schema updates.

CLI for Cassandra is now very powerful.

Has a SQL language capability (yes!).

Latest version provides much easier to implement secondary indexing (indexes other than the primary).

Version 0.8 supports bulk loading. This is very interesting for my current project

There is wide support for Cassandra in both interpreted and compiled OSS languages, including the ones I most frequently use.

CQL Cassandra Query Language.

Replication architecture is vastly superior to MySQLs transaction and log replay strategy. Cassandra uses an rsync style replication where hash comparisons are exchanged to find which parts of the data tree a given replication node (that is responsible for that tree of data) might need updating, then then transferring just that data. Not only does this reduce bandwidth, but this implies asynchronous replication! Finally! Now this makes sense to me!!

Hadoop support exists for Cassandra, BUT, it’s not a great fit for Cassandra. Look into Brisk if Hadoop implementation is desired or required.

Division of Real-Time and Analytics nodes.

Nodes can be configured to communicate with each other in an encrypted fashion, but in general inter-node communication across public-private networks should be established using VPN tunnels.

This needs further research, but it’s very, VERY promising!

NEXT: “Cassandra and Big Data – building a single-node ‘cluster’“

Software Development, Technology

CIDR Calculator App picked up by Softpedia

1 February 2012 David 1 Comment

This just in…. (seems like a good thing). Notice that my new App CIDR Calculator for the MAC (in the wide for barely 24 hours now), was found by Softpedia, and linked in their site.

Congratulations,

CIDR Calculator, one of your products, has been added to Softpedia’s
database of software programs for Mac OS. It is featured with a description
text, screenshots, download links and technical details on this page:
http://mac.softpedia.com/get/Utilities/DeMartini-CIDR-Calculator.shtml

The description text was created by our editors, using sources such as text
from your product’s homepage, information from its help system, the PAD
file (if available) and the editor’s own opinions on the program itself.

Nothing wrong with a little free exposure. No ratings so far, but I hope to get some good feedback. It’s already sold several units so I know someone is out there giving it a test.

If you want to learn more about this entry in Softpedia, [ HERE IS THE LINK ].

If you want to check out the App itself at the Apple MAC App Store.. just click on the button below!

David DeMartini actual

Tag Archives: software

JIRA – tracking projects in an Agile way

Getting JIRA – downloading distributions

Installing on MAC (in this case a laptop of all things)

Opening up JIRA for the first time..

Creating a dedicated JIRA user

Cassandra – Getting Started – (deployment Part 2 – Installing Ops Center)

Installing Ops Center

Installing the Ops Center Agents

Start up Ops Center

Start up the Node Agents

Conclusion

Cassandra – Getting Started – (deployment Part 1 – Installing Cassandra)

Getting the right JVM/JDK/JRE

Configuring a Node

Systems Administration

Starting up the Cluster

Drop keyspace using Cassandra Cli

Previous Cassandra related articles

Cassandra – Running some simple tests, including a multi-get strategy.

Re-Configuring an Empty Cassandra Cluster

UPDATE!

Setting up a Java build env to prepare for Cassandra development

Downloading Support Packages for Development Environment

Development Environment Setup

Cassandra and Big Data – building a single-node “cluster”

Extra Credit:

Additional Comments:

Apache Cassandra Project – processing “Big Data”

CIDR Calculator App picked up by Softpedia

Racing, Photography, Software and Politics.