Create a Self-Signed wildcard SSL Certificate

9 June 2012 David

Justification

Are you a developer who regularly uses SSL / HTTPS on your websites? Do you have multiple development environments hosted on the same domain (such as separate client demo/eval/testing VirtualHosts)? Then a wildcard SSL cert might be for you.

Generating one is a very simple process. You will need to have the OpenSSL libraries installed on your computer. All but the worst of operating systems are likely to have them installed already. If not, you can always go here and get a package: [OpenSSL.org]

Enough reasoning and rationalization, time to get down to business.

Overview

First, you must have a private key generated and installed. Second, that key is used to create and self-sign the certificate in a single operation.

Once you have your files created, reference them in the web server of your choice (such as nginx or Apache2; if you are using IIS… my heart aches for your plight), using the documentation for that web server. I’m not going to go into that here, because I’m just taking the time to share this simple process for generating the cert.

Step 1 – Generate your private key

If you do not have a private key generated, I’m going to show you how to do it. If you already have one that you want to use, and you know where it is, move on to the next step.

Open a terminal window and execute the following openssl command to generate a private key. For my own installations I never use a key shorter than 2048 bits. Most of the time, I use one that is quite a bit longer. That said, 2048 bits should provide a sufficiently long key for any practical SSL purposes. Yes, SSL has security issues and a motivated hacker can likely piggy-back it, regardless of your key size… but for the sake of argument and getting through this post, we’ll pretend the Interwebs are a safe place.

Move to the location where you will store your private key (this is a typical location, you can use whatever you want):
cd /etc/ssl/private

Run the command to generate the key:
openssl genrsa 2048 > my.super-awesome.hostname.key

Generating RSA private key, 2048 bit long modulus
......................................+++
.........+++
e is 65537 (0x10001)

So, now we have a key:
ls -l
-rw-r--r-- 1 root wheel 1679 Jan 9 09:41 my.super-awesome.hostname.key
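One thing worth noticing in that listing: the key file is readable by every user on the machine (-rw-r--r--). A private key is normally restricted to its owner. Here is a minimal sketch of tightening it, written in Python; the function name is my own invention for illustration, and it assumes you own the file or are running as root.

```python
import os
import stat


def restrict_key_permissions(path):
    """Make a private key readable and writable by its owner only (mode 0600).

    Returns the resulting permission bits so the caller can confirm the change.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling restrict_key_permissions("my.super-awesome.hostname.key") from the directory holding the key should leave it at -rw-------, the same effect as chmod 600 on the command line.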

Step 2 – Generate your CERT

This is the fun part, and the 2nd of the super easy steps. To complete this you’ll want to know some important pieces of data up front, such as the hostname for your site (I’m going to use super-awesome.net for this example). Have the address you want to use handy, including the country. You will also want an e-mail address that will be published in the SSL cert to contact you, and a department and company name if so inclined. Below is the actual command, followed by the prompts and my responses. One caution: the -sha1 flag reflects 2012-era practice, and current browsers reject SHA-1 signatures, so substitute -sha256 on a modern system.

openssl req -new -x509 -nodes -sha1 -days 3650 -key my.super-awesome.hostname.key > my.super-awesome.hostname.cert
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]: US
State or Province Name (full name) [Some-State]: Kellyfornia
Locality Name (eg, city) []: Sac-of-Tomatoes
Organization Name (eg, company) [Internet Widgits Pty Ltd]: Crazy Assembly House
Organizational Unit Name (eg, section) []: Committee on wasting tax payer money
Common Name (eg, YOUR name) []: *.super-awesome.net
Email Address []: admin@super-awesome.net
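The -days 3650 flag is what sets the expiry date of the cert. As a quick sanity check of the arithmetic, here is a small Python sketch; the start timestamp is the Not Before value from the certificate dump in the Extra Credit section, and the function name is just illustrative.

```python
from datetime import datetime, timedelta


def not_after(not_before, days):
    """Return the expiry timestamp for a cert issued at not_before with -days N."""
    return not_before + timedelta(days=days)


issued = datetime(2012, 1, 9, 17, 50, 56)
print(not_after(issued, 3650))  # 2022-01-06 17:50:56
```

Note that 3650 days is not quite ten calendar years: the three leap days in between pull the expiry back from Jan 9 to Jan 6.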

Verify that you have the file:
ls -l
-rw-r--r-- 1 root wheel 1927 Jan 9 09:50 my.super-awesome.hostname.cert

That’s all there is to it! You’re done. Now you have a self-signed SSL wildcard cert for super-awesome.net. This would allow you to secure (and I always use the word secure with a certain degree of sarcasm) any sub-domain / hostname under super-awesome.net. Examples of what it would handle:

https://www.super-awesome.net
https://qa-server.super-awesome.net
https://some-client.super-awesome.net
https://another-client.super-awesome.net
https://ya-client.super-awesome.net

Now, it’s important to note that this DOES NOT secure anything beyond that first level. The wildcard stands in for exactly one hostname label, so it never spans a dot. Here are a couple more examples:

https://www.super-awesome.net -- OK
https://qa-server.super-awesome.net -- OK
https://some.client.super-awesome.net -- FAILS
https://another-client.super-awesome.net -- OK
https://test.ya-client.super-awesome.net -- FAILS
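The one-level rule is easy to express in code. This is a simplified Python sketch of the check, not what any particular browser or TLS library actually runs; the function name is hypothetical, and it only handles a '*' in the leftmost label.

```python
def wildcard_matches(pattern, hostname):
    """Return True if hostname is covered by a cert name such as '*.example.net'.

    The '*' is only honored as the leftmost label and matches exactly one
    label, so it can never swallow a dot.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A wildcard never changes the number of labels.
    if len(pattern_labels) != len(host_labels):
        return False
    first_ok = (
        (pattern_labels[0] == "*" and host_labels[0] != "")
        or pattern_labels[0] == host_labels[0]
    )
    # Everything to the right of the first label must match literally.
    return first_ok and pattern_labels[1:] == host_labels[1:]


for host in ("www.super-awesome.net", "some.client.super-awesome.net"):
    print(host, wildcard_matches("*.super-awesome.net", host))
```

By the same logic the bare domain super-awesome.net is not covered either, since it has one label fewer than the pattern.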

Extra Credit – viewing the contents of your CERT

It’s all well and good to generate the cert, but what if you want to verify it’s properly set up? What if you find a cert on your system and you want to know what it covers, when it expires, who might own it, etc.? Well, that’s possible too. Running a simple command, we’ll examine the SSL cert just created. The important info is in the ‘Issuer’ and ‘Subject’ blocks.

openssl x509 -noout -text -in my.super-awesome.hostname.cert
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: c4:3d:66:b4:e3:cc:61:86
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, OU=Committe on wasting tax payer money, CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
        Validity
            Not Before: Jan 9 17:50:56 2012 GMT
            Not After : Jan 6 17:50:56 2022 GMT
        Subject: C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, OU=Committe on wasting tax payer money, CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (2048 bit)
                Modulus (2048 bit):
                    [...] /* removed the modulus to keep the post short */
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Key Identifier:
                9D:72:0C:A0:E6:EB:77:2C:77:EF:E8:9E:B7:BC:9F:53:81:1A:40:9D
            X509v3 Authority Key Identifier:
                keyid:9D:72:0C:A0:E6:EB:77:2C:77:EF:E8:9E:B7:BC:9F:53:81:1A:40:9D
                DirName:/C=US/ST=Kellyfornia/L=Sac-of-Tomatoes/O=Crazy Assembly House/OU=Committe on wasting tax payer money/CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
                serial:C4:3D:66:B4:E3:CC:61:86
            X509v3 Basic Constraints:
                CA:TRUE
    Signature Algorithm: sha1WithRSAEncryption
        [...] /* removed the signature to keep the post short */

The Subject breaks down as follows:

Subject: C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, OU=Committe on wasting tax payer money, CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
C=US - Country code 'US'
ST=Kellyfornia - State or Province
L=Sac-of-Tomatoes - City/Location
O=Crazy Assembly House - Company or Organization name
OU=Committe on wasting tax payer money - Organizational Unit (department, etc.)
CN=*.super-awesome.net - Common Name (hostname / domain) that the cert services. In this case it's a wildcard, signified by the '*'
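If you want to pull those fields apart programmatically, a naive split is enough for a subject line in the format shown above. This Python sketch is only an illustration: the function name is made up, and it assumes no field value contains ", " or "/" (a real parser would need to handle escaping).

```python
def parse_subject(subject):
    """Split an OpenSSL one-line subject into a dict of field name to value."""
    fields = {}
    for part in subject.split(", "):
        # In this output format the e-mail address is appended to the CN
        # with a '/' separator, so split once more on that.
        for item in part.split("/"):
            key, _, value = item.partition("=")
            fields[key.strip()] = value.strip()
    return fields


subject = (
    "C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, "
    "OU=Committe on wasting tax payer money, "
    "CN=*.super-awesome.net/emailAddress=admin@super-awesome.net"
)
print(parse_subject(subject)["CN"])  # *.super-awesome.net
```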

That's all there is to it. Now, secure those website communications!

apache2 apple application cert how-to nginX openssl ssl web server xcode xcode 4

Entrepreneurship, Software Development, Technology

Creating a Logo – birth of a brand

3 June 2012 David

With the impending launch of my new enterprise, it’s time to get down to the business of creating a logo. This article will document the process. Good or bad, success or fail, the steps I take will be detailed here for your amusement, edification or horror. You be the judge.

STEP 1 – Get the idea formulated.

To get started, sketch out a general idea for the logo. Since I am still in a pre-launch state, that drawing will remain a work of the reader’s imagination. I will tell you, though, that after a few iterations I knew what I wanted to go after, and what base information I needed.

First I decided that one of the components I needed for this logo project was an image of the crescent moon. This was of course very easy to locate on the web. There is no lack of such images.

Google search for some sample images

I decided on a sampling of the images and saved them off to a special directory on my computer. Now it was time to step away for a few minutes, clear my mind of any prejudices regarding the images, and then re-open the directory and look at each one. After some time, I selected one of the images that seemed to have the most promise.

Now, it’s important to NOT get locked in or bogged down on one image. Don’t shoot for perfection here; you must be comfortable with the concept that your first, second, or maybe even your fifth attempt will be an abject failure. It’s a process, and if it will take you 5 attempts to get it right, you’ll never get there unless you get through those first four… so… let’s get to it.

Step 2 – Open and adjust image to suit

Select your first best guess as a starting point and open the image in some photo editing software. My choice is Photoshop:

The goal I have in mind requires the crescent to be on the other side of the logo, and the curve must extend over the top. So this image, as is, will not do. Using a variety of Photoshop tools (at this point I should point out this is NOT going to be a tutorial on how to use Photoshop; there are plenty of those done by people better than I at providing such help), I flipped the image and rotated it clockwise 22 degrees.

The next step was to open up the ‘Levels’ tool and start cranking in as much contrast, at both the white and black ends of the scale, as needed to remove as much detail as I felt practical. This is needed to get the image close to something you can work with when we open it up again in a vector editing suite. This will hopefully make sense shortly.

This is when things start to get tricky. I know that I may need to go back to Photoshop and crank in more contrast, or maybe I need to change which end of the contrast range I adjust to get just the right amount needed when I move to Illustrator and start my vector editing. Take a few moments to look at your image, if you are attempting something yourself, and apply any tweaks you feel you need now. You’ll notice the file name has changed. I like to keep a good clean copy of the original files aside in case I really destroy the current version. Hitting the “reset button” is more likely to happen than not. Don’t get discouraged if you have to start over. It’s better to have tried and failed than to have never tried at all, simply because you’re afraid to fail. The only way to completely fail is to never try at all. Be a DOER… not a WISH I DIDer! Here I am making a few more adjustments.

STEP 3 – Open in vector editor

Now, it’s time to find out if I did enough contrast adjustment. This is also the point when I find out if I have a clue what I’m doing in Illustrator. Caveat emptor: you’re getting what you paid for here.

Now, before doing anything, I’m going to save the workspace. Again, it’s nice to be able to get back to the beginning if something goes wrong.

Next, I want to increase the size of the workspace to roughly a 2×9 ratio (height × width). This will give me room for the next parts of the logo, including text etc. Finally, I used one of the built-in tools called “Live Trace” to convert the image into an Illustrator vector:

Here is what the resulting vector nodes look like once the trace is complete. I adjusted the minimum pixel and path size vars up and down until I had a trace I liked.

After inverting the image, running Live Trace and using the ‘Auto convert to Live Paint’ option, I had the primary image I wanted. Following the conversion, I added a target and underline, then selected a text I wanted to use. Again, I’m not totally in love with this text and I may decide to change it later, but for now, to test this image, I have to start somewhere. Of course, Sample Company is *not* the name I’m going to be using; this is just that, a sample. After about 3 hours of work, mostly poking around in Illustrator for the options I really wanted to use, I have this proof of concept.

Step 4 – wrapping things up

Once the new image is saved, make sure cropping is properly set and export the image to the apps you need. For me, I needed it in a transparent .png for use with invoices, letterheads and publishing.

This is just the beginning. Once the new company is fully launched I’ll be posting the final logos. Keep an eye out for more news on Jun 18th!

entrepreneurship illustrator llc photoshop sample company llc

Entrepreneurship, Software Development, Technology

California Dreamin’ — setting up the new California Office

24 April 2012 David

It has been a long 8 months since the decision was made to re-locate operations to Santa Cruz, California. On April 16th, the first of the equipment, furniture, and transient stuff arrived in California from the Washington office.

Despite all the best intentions and plans, construction of the office conference room is still underway, so equipment remains temporarily stacked until it can be placed in its designated location.

Below is a gallery of photos taken yesterday as things got closer and closer to completion.

Front Office – boxes and furniture make things a little crowded

Front Office – conference table becomes a temporary workspace

Front Office – looking back towards printer room

Printer room, looking back towards office

Printer Room (future)

Conference Room – looking back towards shop

Conference Room – looking forward towards printer room

Looking from Printer Room towards side shop

Side shop, looking back towards main shop

Main shop space, from exterior

Main shop space

Side shop seen from main shop

Main shop – workbench and tool boxes

Main shop space, tool boxes and stairs to loft

Workbench and tool boxes

Creative location of air compressors under loft stairs

Break area in front of loft

Loft beverage center

Temporary roof vent fan, ugly but creative

Loft vent shaft with temporary force-feed. Heat is an issue that we have to resolve

Loft break area and main storage

Transient storage in the loft

Rear of loft overlooks main shop area.

After 8 days of 10+ hours each moving equipment, negotiating with contractors, delivering two 16′ box trucks, a 20′ trailer and countless pickup truck loads, we have accomplished a lot. Much work remains to be done before the offices are fully functional, but we are already generating revenue out of the new location.

When I return to California following the final contractors’ work, presentation monitors, servers and printers will be installed in their proper locations. I’m looking forward to getting this all wrapped up so we can concentrate on moving forward with day-to-day operations.

Software Development, Technology

Cassandra – Getting Started – (deployment Part 2 – Installing Ops Center)

26 March 2012 David

<< Previous: Cassandra – Going into Production – Part 2.

With an empty cluster running, the next step I’m going to take is to install and configure OpsCenter from DataStax. This is a fantastic tool for monitoring the health and performance of your cluster.

Installing Ops Center

The first order of business is to create a directory to store the Ops Center code on the server. I opted to do this within the user account used for Cassandra, as the directory datastax:

:~$ mkdir datastax
:~$

Next, download and extract the OpsCenter package:

:~/datastax$ wget http://downloads.datastax.com/community/opscenter-1.4-free.tar.gz
--2012-03-26 08:25:30-- http://downloads.datastax.com/community/opscenter-1.4-free.tar.gz
Resolving downloads.datastax.com... 173.203.57.192
Connecting to downloads.datastax.com|173.203.57.192|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21539843 (21M) [application/octet-stream]
Saving to: `opscenter-1.4-free.tar.gz'
100%[=======================================================================>] 21,539,843 3.72M/s in 7.5s
2012-03-26 08:25:38 (2.74 MB/s) - `opscenter-1.4-free.tar.gz' saved [21539843/21539843]
:~/datastax$ tar -xvzf opscenter-1.4-free.tar.gz
opscenter-1.4/
opscenter-1.4/log/
opscenter-1.4/bin/
opscenter-1.4/bin/create-keystore.bat
opscenter-1.4/bin/create-key-pair.bat
[...]
opscenter-1.4/conf/event-plugins/email.conf
opscenter-1.4/conf/ssl.conf
opscenter-1.4/conf/opscenterd.conf
:~/datastax$

Next is the setup for OpsCenter. Setup is done via a Python script, located in the bin directory. Have your listening IP ready and know which port you want to use for the Ops Center web portal. I’m going to use the default of port 8888. Make sure you have the port open on your machine (see the section on ports in Part 1 of this series).

:~/datastax$ ls
opscenter-1.4 opscenter-1.4-free.tar.gz
:~/datastax$ cd opscenter-1.4
:~/datastax/opscenter-1.4$ bin/setup.py
Generating a 1024 bit RSA private key
.........++++++
...++++++
writing new private key to 'ssl/opscenter.key'
-----
MAC verified OK
Certificate was added to keystore
:~/datastax/opscenter-1.4$

Configure the Ops Center daemon. Set the listening IP to an IP available on the system. I’m going to use the node’s internal IP address (10.1.0.23). The values I changed are the JMX port, the web server interface, and the seed hosts.

:~/datastax/opscenter-1.4$ vi conf/opscenterd.conf
[...]
[jmx]
# The default jmx port for Cassandra >= 0.8.0 is 7199. If you are using
# Cassandra 0.7.*, the default is 8080, and you should change this to
# reflect that.
port = 8001
[...]
[webserver]
port = 8888
interface = 10.1.0.23
staticdir = ./content
log_path = ./log/http.log
[...]
[cassandra]
# a comma-separated list of places to try for a connection to your Cassandra
# cluster:
seed_hosts = 10.1.0.23,10.1.0.26
[...]
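Since the config excerpt is plain INI-style text, it is easy to double-check the values that matter before starting the daemon. The following is a rough Python sketch under that assumption; the SAMPLE text is a trimmed copy of the excerpt above, and the function name is hypothetical rather than anything OpsCenter ships.

```python
import configparser

# Trimmed copy of the excerpt above, used here as stand-in input.
SAMPLE = """
[jmx]
# The default jmx port for Cassandra >= 0.8.0 is 7199.
port = 8001

[webserver]
port = 8888
interface = 10.1.0.23

[cassandra]
seed_hosts = 10.1.0.23,10.1.0.26
"""


def summarize(conf_text):
    """Pull out the JMX port, web listener and seed hosts from the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    seeds = parser.get("cassandra", "seed_hosts").split(",")
    return {
        "jmx_port": parser.getint("jmx", "port"),
        "web_listener": (parser.get("webserver", "interface"),
                         parser.getint("webserver", "port")),
        "seed_hosts": [host.strip() for host in seeds],
    }


print(summarize(SAMPLE))
```

The JMX port reported here needs to agree with the JMX_PORT each Cassandra node was started with, otherwise OpsCenter will be knocking on the wrong door.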

Installing the Ops Center Agents

Each node in the cluster must have a running Ops Center agent. The installation package for this was generated by the Ops Center setup process and saved as a compressed file (agent.tar.gz). This file then needs to be copied and extracted on each node you plan to monitor with Ops Center.

:~/datastax$ mkdir opscenter-agent
:~/datastax$ cp opscenter-1.4/agent.tar.gz opscenter-agent/
:~/datastax$ cd opscenter-agent/
:~/datastax/opscenter-agent$ tar -xvzf agent.tar.gz
agent/opscenter-agent-2.5-standalone.jar
agent/conf/log4j.properties
agent/bin/setup.bat
[...]
agent/bin/ssl/agentKeyStore.p12
agent/bin/ssl/opscenter.key
agent/doc/LICENSE
:~/datastax/opscenter-agent$

Now run the agent’s setup, assigning its IP and the Ops Center’s IP. 10.1.0.26 is this node’s IP address. 10.1.0.23 is the location of the Ops Center install (this may or may not be on the same system or even the same IP address):

:~/datastax/opscenter-agent$ agent/bin/setup 10.1.0.26 10.1.0.23

Make sure you copy the agent file to ALL your other nodes and repeat the setup steps above with the appropriate IPs. This is an example of how I copied the file; your system, ports, etc. may be different.

:~/datastax/opscenter-agent$ scp -P41718 agent.tar.gz bigdata@10.1.0.26:.
RSA key fingerprint is 2b:5b:26:03:87:a4:b1:ea:90:b5:4e:42:60:88:cd:d1.
bigdata@10.1.0.26's password:
agent.tar.gz 100% 10MB 10.3MB/s 00:01
:~/datastax/opscenter-agent$
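With more than a couple of nodes, typing the copy and setup commands by hand gets old. Here is a small Python sketch that only builds the command lines per node so you can review them before running anything; the function name is made up, and the user name and SSH port defaults are simply the ones from my transcript above, so treat them as assumptions for your own systems.

```python
def agent_commands(node_ips, opscenter_ip, user="bigdata", ssh_port=41718):
    """Build (node, scp command, setup command) tuples for each node.

    Nothing is executed here; the lists can be inspected or handed to
    subprocess.run() once they look right.
    """
    plan = []
    for ip in node_ips:
        copy_cmd = ["scp", "-P%d" % ssh_port, "agent.tar.gz", "%s@%s:." % (user, ip)]
        # Setup takes the node's own IP first, then the Ops Center IP.
        setup_cmd = ["agent/bin/setup", ip, opscenter_ip]
        plan.append((ip, copy_cmd, setup_cmd))
    return plan


for node, copy_cmd, setup_cmd in agent_commands(["10.1.0.26"], "10.1.0.23"):
    print(node, " ".join(copy_cmd), "|", " ".join(setup_cmd))
```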

Start up Ops Center

On the Ops Center machine, move back to its installed directory and start the process.

:~/datastax$ cd opscenter-1.4
~/datastax/opscenter-1.4$ bin/opscenter &

Now connect to the IP address and port and you should see a base Ops Center instance. This is what you would typically see before starting up your agents:

DataStax Ops Center 1.4

Start up the Node Agents

The last step is to start up the Agent daemons so that OpsCenter knows the status of each node.

:~/datastax/opscenter-1.4$ cd ../opscenter-agent/
:~/datastax/opscenter-agent$ agent/bin/opscenter-agent &
:~/datastax/opscenter-agent$ INFO [main] 2012-03-26 09:12:40,465 Loading conf files: conf/address.yaml
INFO [main] 2012-03-26 09:12:40,505 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
INFO [main] 2012-03-26 09:12:40,505 Waiting for the config from OpsCenter
INFO [main] 2012-03-26 09:12:40,637 SSL communication is enabled
INFO [main] 2012-03-26 09:12:40,637 Creating stomp connection to 10.1.0.23:61620

With the Agents fired up, you will see a nice dashboard, showing the current status of the cluster, and some metrics on performance.

Ops Center up and running.

Conclusion

This basically concludes the fast deployment steps required to download, install, configure and start up Cassandra, along with the DataStax Ops Center.

Total time required to deploy was under 4 hours.

apache big data cassandra cql database db nosql open source software

Software Development, Technology

Cassandra – Getting Started – (deployment Part 1 – Installing Cassandra)

26 March 2012 David

It’s been almost a month since I started the Apache Cassandra investigation, and now it’s time to move into a production stance. Some of these steps will differ from the original steps documented here in my blog. Later this week I will go back and amend those posts to point at this post as the more recent information. Those old links are already being referenced by multiple sites, so deleting them would not be a kind thing to do. Thus.. onward we move!

Getting the right JVM/JDK/JRE

Originally, the OpenJDK was being used for this introduction and research into Cassandra. Being a proponent of Open Source, I was going to avoid the use of Oracle’s potentially proprietary JDK/JRE in this environment. I have since seen first hand that the JDK DOES IN FACT MATTER, and the one that supports the latest tools is the one from Oracle.

The downloads are located here:

Java SE Development Kit 7 Downloads – download this if you plan to do any Java development.

Java SE Runtime Environment 7 Downloads – Download this if you just want to run Java programs, and do not plan to write your own.

Downloading the JRE/JDK from Oracle has enabled the reliable use of DataStax’s OpsCenter management tool (more on that later).

These are the recommended minimums for Cassandra and OpsCenter from DataStax, a respected partner of the Apache Cassandra project.

Sun Java Runtime Environment 1.6.0_19 or later
Python 2.5, 2.6, or 2.7
OpenSSL version listed in Configuring SSL unless you disable SSL

I ended up selecting the JDK (linked here) and deposited it in the following location on my system as user root (create the directory path if you don’t already have it):

/opt/java/64/jdk-7u3-linux-x64.tar.gz

Extract the file:

:/opt/java/64# tar -xvzf jdk-7u3-linux-x64.tar.gz
jdk1.7.0_03/
jdk1.7.0_03/include/
jdk1.7.0_03/include/jvmti.h
jdk1.7.0_03/include/jawt.h
[...]
jdk1.7.0_03/jre/plugin/desktop/sun_java.desktop
jdk1.7.0_03/jre/COPYRIGHT
jdk1.7.0_03/LICENSE
jdk1.7.0_03/COPYRIGHT
:/opt/java/64#

The Cassandra Build I decided to use is this one: apache-cassandra-1.1.0-beta1. I downloaded the file to the user I created for this using wget:

:~$ wget http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
--2012-03-25 22:52:27-- http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
Resolving apache.deathculture.net... 173.236.158.254
Connecting to apache.deathculture.net|173.236.158.254|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12505037 (12M) [application/x-gzip]
Saving to: `apache-cassandra-1.1.0-beta1-bin.tar.gz'
100%[=======================================================================>] 12,505,037 8.84M/s in 1.3s
2012-03-25 22:52:29 (8.84 MB/s) - `apache-cassandra-1.1.0-beta1-bin.tar.gz' saved [12505037/12505037]

Next, the file is extracted and the resulting directory is renamed to something shorter:

:~$ tar -xvzf apache-cassandra-1.1.0-beta1-bin.tar.gz
:~$ mv apache-cassandra-1.1.0-beta1 cass-beta1

Configuring a Node

Now the configuration is edited to define the node ring. The first file to edit is the cassandra.yaml file.

Initially this will be only a 2-node cluster, but the tokens must still be calculated. Here are the node tokens I generated using a Perl script I wrote (see: Cassandra and Big Data – building a single-node “cluster” – Extra Credit for the code):

:~/cass-beta1$ ./token.pl 2
Calculate tokens for 2 nodes
factor = 170141183460469231731687303715884105728
node 0 token: 0
node 1 token: 85070591730234615865843651857942052864
:~/cass-beta1$
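The numbers in that output are consistent with spreading the nodes evenly over a ring of size 2^127 (the printed factor is exactly 2^127). The Perl script itself lives in the earlier post, so what follows is not that script, just a minimal Python sketch of the same arithmetic under that even-spacing assumption; the function name is my own.

```python
def initial_tokens(node_count):
    """Evenly spaced initial tokens over a ring of size 2**127."""
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]


for node, token in enumerate(initial_tokens(2)):
    print("node %d token: %d" % (node, token))
```

Python integers are arbitrary precision, so there is no need for a big-number library to handle values this large.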

Edit the cluster name. I’m not testing, so I changed the name to one descriptive of the data I was storing: ‘ip’. In the example below, I’m showing the config for the 2nd of the two nodes. Note: the first node would have a different IP address and also a different initial token, in this case ‘0’, as calculated by the tool.

:~$ cd cass-beta1/
:~/cass-beta1$ vi conf/cassandra.yaml
[...]
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'ip'
[...] If blank, Cassandra will request a token bisecting the range of
# the heaviest-loaded existing node. If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
initial_token: 85070591730234615865843651857942052864
[...]
# directories where Cassandra should store data on disk.
data_file_directories:
    - /home/bigdata/data/
[...]
# commit log
commitlog_directory: /home/bigdata/commitlog/
[...]
# saved caches
saved_caches_directory: /home/bigdata/saved_caches/
[...]
# seeds is actually a comma-delimited list of addresses.
# Ex: ",,"
- seeds: "10.1.100.101,10.1.100.102"
[...]
# Setting this to 0.0.0.0 is always wrong.
listen_address: 10.1.1.101
[...]
rpc_address: 10.1.1.101
[...]
# Time to wait for a reply from other nodes before failing the command
# (this was done to increase timeout to 30 seconds, sometimes the search I need to run is pretty nasty)
rpc_timeout_in_ms: 30000

Following that, the shell file needs to be modified to designate the JMX listening port:

:~/cass-beta1$ vi conf/cassandra-env.sh
[...]
# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="8001"
[...]

Make sure your logfile is in the desired location. I decided to keep it within the account itself for now:

vi cassA-1.0.8/conf/log4j-server.properties
[...]
log4j.appender.R.File=/home/bigdata/log/cassA.log
[...]

Next I set the paths in the .bash_profile for the account, using the following environment variables. ANT_HOME is used by the Ant build tool; if you are not writing code, your JAVA_HOME will point at the JRE, not the JDK, and you won’t need the ANT_HOME path at all:

vi ~/.bash_profile
export JAVA_HOME=/opt/java/64/jdk1.7.0_03
export ANT_HOME=/usr/lib/ant/
export CASS_BIN=$HOME/cass-beta1/bin
export PATH=$PATH:$ANT_HOME/bin:$CASS_BIN

Systems Administration

Make sure there is a location for the Cassandra server to write its log files. You’ll need your SysAdmin, or root privs, to do this. I set the ownership to root and the group of the user under which I’m currently running Cassandra (bigdata):

root:/data/feed/indata# cd /var/log
root:/var/log# mkdir cassandra
root:/var/log# chown root:bigdata cassandra
root:/var/log# chmod 775 cassandra

The following ports need to be opened up, if you are running a firewall on each system (you ARE, right!?!), to allow Cassandra nodes to communicate with each other. This is a snippet from my rules-based firewall control file.

Port Usage:

9160 – Thrift port, where the API is serviced for Reads/Writes to Cassandra

8001 – JMX port on each node (the JMX_PORT set in cassandra-env.sh). This is used by command-line tools such as nodetool

7000 – Commands and Data TCP port, used by nodes for communication with each other

7001 – SSL port used for storage communications

8888 – Only used on systems that will host an Ops Center installation

61620 – Required for Ops Center Agent Communications

## Cassandra
ACCEPT loc $FW tcp 9160,8001,7000,7001
## OpsCenter
ACCEPT loc $FW tcp 8888,61620
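Once the firewall rules are loaded and the services are running, it is worth confirming from another machine that the ports really are reachable. A minimal Python sketch of such a check follows; the function name and the port table are mine, built from the list above, and a port will only show as open if something is actually listening on it.

```python
import socket

# Ports from the list above, with a short label for each.
CASSANDRA_PORTS = {
    9160: "Thrift API",
    8001: "JMX",
    7000: "inter-node",
    7001: "inter-node SSL",
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A loop such as `for port, label in CASSANDRA_PORTS.items(): print(label, port_is_open("10.1.0.23", port))` gives a quick overview for one node.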

Starting up the Cluster

This is where the truth is told. The rubber meets the road. The money is placed where your mouth is. Light ’em up!

:~$ cassandra
:~$ INFO 23:52:54,232 Logging initialized
INFO 23:52:54,236 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
INFO 23:52:54,237 Heap size: 6291456000/6291456000
[...]
INFO 23:52:55,162 Node /10.1.0.23 state jump to normal
INFO 23:52:55,163 Bootstrap/Replace/Move completed! Now serving reads.

IT LIVES!! Now start your other node(s), and check to verify you have a complete ring, properly configured. You should see something like this on subsequent nodes; note the references to the other member node (10.1.0.23):

[...]
INFO 23:54:16,042 Node /10.1.0.23 has restarted, now UP
INFO 23:54:16,043 InetAddress /10.1.0.23 is now UP
INFO 23:54:16,043 Node /10.1.0.23 state jump to normal
INFO 23:54:16,088 Compacted to [/home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-6-Data.db,]. 544 to 413 (~75% of original) bytes for 4 keys at 0.003425MB/s. Time: 115ms.
INFO 23:54:16,109 Completed flushing /home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-5-Data.db (163 bytes)
INFO 23:54:16,110 Node /10.1.0.26 state jump to normal
INFO 23:54:16,111 Bootstrap/Replace/Move completed! Now serving reads.

Run nodetool:

:~$ nodetool -h10.1.0.23 -p 8001 ring
Address         DC          Rack        Status State   Load            Owns    Token
                                                                               85070591730234615865843651857942052864
10.1.0.23       datacenter1 rack1       Up     Normal  17.77 KB        50.00%  0
10.1.0.26       datacenter1 rack1       Up     Normal  17.66 KB        50.00%  85070591730234615865843651857942052864
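If you want to check the ring from a script instead of eyeballing it, the output is simple enough to parse. This Python sketch assumes the column layout shown above (whitespace-separated, with the load split into a value and a unit); other Cassandra versions may print different columns, and the function names are hypothetical.

```python
RING_OUTPUT = """\
Address DC Rack Status State Load Owns Token
85070591730234615865843651857942052864
10.1.0.23 datacenter1 rack1 Up Normal 17.77 KB 50.00% 0
10.1.0.26 datacenter1 rack1 Up Normal 17.66 KB 50.00% 85070591730234615865843651857942052864
"""


def parse_ring(text):
    """Turn 'nodetool ring' output into a list of per-node dicts."""
    nodes = []
    for line in text.splitlines():
        cols = line.split()
        # Node rows have 9 columns; the header and the lone token line do not.
        if len(cols) != 9 or cols[0] == "Address":
            continue
        nodes.append({
            "address": cols[0],
            "status": cols[3],
            "state": cols[4],
            "owns": float(cols[7].rstrip("%")),
            "token": int(cols[8]),
        })
    return nodes


def ring_is_healthy(nodes):
    """All nodes Up/Normal and ownership adding up to 100%."""
    all_up = all(n["status"] == "Up" and n["state"] == "Normal" for n in nodes)
    return all_up and abs(sum(n["owns"] for n in nodes) - 100.0) < 0.01


print(ring_is_healthy(parse_ring(RING_OUTPUT)))
```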

WE HAVE A RING!

NEXT: SETTING UP OPS CENTER

apache big data cassandra cql database db nosql open source software

Software Development, Technology

Inserting and Reading data from a Cassandra Cluster

19 March 2012 David

Rubber meeting the road. Time to insert some column families, then some data and finally pull it back off the stack.

First off, the keyspace was already defined, so I’m going to simply list its structure:

[default@unknown] describe ip_store;
Keyspace: ip_store:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]

With a keyspace ready for some column families, those are created next. Here I’m establishing that there will be 4 families in this single keyspace. This is contrary to suggestions in the High Performance Cassandra Handbook, but follows all other documentation I’ve seen. Considering that this is NOT a production implementation, I’m going to go with a more conventional strategy of organizing related data in the same keyspace.

The first action is to assume the desired keyspace, then add the desired column families:

[default@unknown] use ip_store;
Authenticated to keyspace: ip_store
[default@ip_store] create column family warehouse with comparator = UTF8Type;
595945d0-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster
[default@ip_store] create column family hourly with comparator = UTF8Type;
65ea2170-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster
[default@ip_store] create column family daily with comparator = UTF8Type;
6aaeae60-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster
[default@ip_store] create column family 30day with comparator = UTF8Type;
7b85bf30-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

OK, a basic schema has been established. Now.. to load the data. I’ll post the relevant sections of the loader code at a later date. At this point you only need to consider that the loader DOES work and it’s loading data. We’ll look at the extraction of the data following loading a very small set.

time host=10.1.0.23 port=9160 ks=ip_store cf=warehouse ttl=0 datafile=5.ips ant -DclassToRun=loader.bulkIpLoader run Buildfile: cBuild/build.xml init: compile: [javac] Compiling 1 source file to cBuild/build/classes dist: [jar] Building jar: cBuild/dist/lib/cass.jar run: [java] ks ip_store [java] cf warehouse [java] ttl 0 [java] datafile 5.ips BUILD SUCCESSFUL Total time: 1 second

The set contains five records covering three unique IPs; two of the records repeat an IP already in the set with a different timestamp (IMPORTANT NOTE: The IPs have been changed to protect the innocent and clueless):

2016468288 1011 suspicious 2012-03-13 18:40:01
2016468288 1011 suspicious 2012-03-13 18:40:02
3149138705 1011 suspicious 2012-03-13 18:40:00
3149138705 1011 suspicious 2012-03-13 18:40:01
2179293112 1011 suspicious 2012-03-13 18:39:59
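The first field of each record appears to be the IP address in decimal form; the key 3149138705, for example, works out to 187.180.11.17, the address that turns up in the CLI output further down. A minimal Java sketch of that conversion follows. The class and method names (IpKey, toDecimal, toDotted) are made up for illustration and are not taken from the loader:

```java
public class IpKey {
    /** Converts a dotted-quad IPv4 address to its unsigned 32-bit decimal value. */
    public static long toDecimal(String dotted) {
        String[] octets = dotted.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dotted);
        }
        long value = 0;
        for (String octet : octets) {
            int part = Integer.parseInt(octet);
            if (part < 0 || part > 255) {
                throw new IllegalArgumentException("octet out of range: " + octet);
            }
            // Shift the running value one byte left and append this octet.
            value = (value << 8) | part;
        }
        return value;
    }

    /** Converts an unsigned 32-bit decimal value back to dotted-quad form. */
    public static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("187.180.11.17")); // prints 3149138705
        System.out.println(toDotted(3149138705L));      // prints 187.180.11.17
    }
}
```

A long is used rather than an int because Java has no unsigned 32-bit type and addresses above 127.255.255.255 would otherwise come out negative.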

Having loaded these, I re-launch the command line interface, authenticate to the desired keyspace, and then issue a VERY important command to set an assumption about how we’re going to reference the keys. If you get a strange error like “cannot parse ‘187.180.11.17’ as hex bytes“, that means you likely forgot to issue the assume command. The commands I issued follow each prompt.

cass
Connected to: "ak-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8
[default@unknown] use ip_store
[default@ip_store] assume warehouse keys as utf8;
Assumption for column family 'warehouse' added successfully.
[default@ip_store] get warehouse['3149138705'];
=> (column=2012-03-13 18:40:00, value=7b227265706f72746564223a22323031322d30332d31332031383a34383a3031222c22617474726962757465223a22737573706963696f7573222c2270726f705f6964223a2231303131222c2270726f7065727479223a22426f74204375747761696c222c226465746563746564223a22323031322d30332d31332031383a34303a3030222c226d65746164617461223a22222c226970223a223139302e3137342e3235312e313435227d, timestamp=1332168862658)
=> (column=2012-03-13 18:40:01, value=7b227265706f72746564223a22323031322d30332d31332031383a34383a3031222c22617474726962757465223a22737573706963696f7573222c2270726f705f6964223a2231303131222c2270726f7065727479223a22426f74204375747761696c222c226465746563746564223a22323031322d30332d31332031383a34303a3031222c226d65746164617461223a22222c226970223a223139302e3137342e3235312e313435227d, timestamp=1331689681)
Returned 2 results.
Elapsed time: 39 msec(s).

There we go. A single row key, ip_store[‘warehouse’][‘3149138705’], containing two column records, each with a JSON blob in it. The next step is to set a validator assumption (ascii in this case) so that the values come back as output mere mortals such as yourselves can understand.
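Those value fields are simply the bytes of the JSON text printed as hex. A small Java sketch of decoding them by hand, assuming UTF-8 content, looks like this. HexValue and decode are illustrative names, not part of the CLI or the loader:

```java
import java.nio.charset.StandardCharsets;

public class HexValue {
    /** Decodes a string of hex digit pairs into text, assuming UTF-8 content. */
    public static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the stored value.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first 13 bytes of the value returned by the get above.
        System.out.println(decode("7b227265706f72746564223a22")); // prints {"reported":"
    }
}
```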

[default@ip_store] assume warehouse validator as ascii;
Assumption for column family 'warehouse' added successfully.
[default@ip_store] get warehouse['3149138705'];
=> (column=2012-03-13 18:40:00, value={ "reported":"2012-03-13 18:48:01", "attribute":"suspicious", "prop_id":"1011", "detected":"2012-03-13 18:40:00", "ip":"187.180.11.17" }, timestamp=1331689680)
=> (column=2012-03-13 18:40:01, value={ "reported":"2012-03-13 18:48:01", "attribute":"suspicious", "prop_id":"1011", "detected":"2012-03-13 18:40:01", "ip":"187.180.11.17" }, timestamp=1331689681)
Returned 2 results.
Elapsed time: 2 msec(s).

There it is! Data written, data read. Now, it’s up to you to think about how you might use this simple, flexible and powerful storage engine to solve your business needs.


Software Development, Technology

Drop keyspace using Cassandra Cli

18 March 2012 David Leave a comment

Dropping an entire keyspace using the cassandra-cli is exceptionally simple.

First, access your cluster using the CLI. I have an alias in my .bash_profile so I only need to type ‘cass’ to access the CLI. In an attempt to be helpful though, I shall show the full command syntax for my environment. Your host and port may vary.

alias cass='cassandra-cli -h 10.1.0.26'

In this example, I am going to drop the keyspace I was loading with test data in previous posts, ks33.

hpcass: ~$ cass
Connected to: "Test1" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
DROP keyspace ks33;
07ad5e00-7120-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown]

That’s all there was to it. Keyspace destroyed.

Previous Cassandra related articles

Apache Cassandra Project – processing “Big Data” – blog

Cassandra and Big Data – building a single-node “cluster” – blog

Cassandra – Getting a 3 node cluster built – apps

Java build env to prepare for Cassandra development – apps

Re-Configuring an Empty Cassandra Cluster – blog

Cassandra – Running some simple tests, including a multi-get strategy. – blog

Creating a simple Utils class – apps

Cassandra DB Connector in Java, using Thrift API – apps

Java multi-get demonstrator for Cassandra NoSQL db – apps


Military, photography

Rare View – 7 US Carriers in one photo

17 March 2012 David Leave a comment

You won’t see this any time soon, and possibly never again. 7 US Super Carriers in port together, in one photograph.

This is a stitch of 4 photos taken on 3-March-2012 of Naval Base Kitsap, showing 7 of the world’s largest warships in port at the same time.

On the left are four of the last conventionally (oil) powered Super Carriers. On the right side of the image is the most unusual aspect of this photograph: 3 Nimitz class nuclear Super Carriers in port together, including the class’s namesake ship, the USS Nimitz (CVN-68), just days prior to her relocating to her new home port of Everett, Washington.

Ships are labeled in the photo, left to right.

CV-62 – USS Independence – The fifth USS Independence (CV/CVA-62) is a Forrestal-class aircraft carrier of the United States Navy. It was the fourth and final member of the Forrestal-class conventional-powered Supercarrier. It entered service in 1959, with much of its early years spent in the Mediterranean Fleet.

CV-63 – USS Kitty Hawk – formerly CVA-63, was the second naval ship named after Kitty Hawk, North Carolina, the site of the Wright brothers’ first powered airplane flight. Kitty Hawk was both the first and last active ship of her class, and the last conventionally-fuelled aircraft carrier in service with the US Navy.

CV-64 – USS Constellation – a Kitty Hawk-class supercarrier, was the third ship of the United States Navy to be named in honor of the “new constellation of stars” on the flag of the United States and the only naval vessel ever authorized to display red, white, and blue designation numbers.

CV-61 – USS Ranger – The seventh USS Ranger (CV/CVA-61) is one of four Forrestal-class supercarriers built for the US Navy in the 1950s. Commissioned in 1957, she served extensively in the Pacific, especially the Vietnam War, for which she earned 13 battle stars.

CVN-68 – USS Nimitz – is a supercarrier in the United States Navy, the lead ship of her class. She is one of the largest warships in the world. She was laid down, launched and commissioned as CVAN-68 but was redesignated CVN 68 (nuclear-powered multimission aircraft carrier) on 30 June 1975 as part of the fleet realignment of that year.

CVN-74 – USS John C. Stennis – is the seventh Nimitz-class nuclear-powered supercarrier in the United States Navy, named for Senator John C. Stennis of Mississippi. She was commissioned on 9 December 1995. Her home port is Bremerton, Washington.

CVN-76 – USS Ronald Reagan – is a Nimitz-class nuclear-powered supercarrier in the service of the United States Navy. The ninth ship of her class, she is named in honor of former President Ronald Reagan, President of the United States from 1981 to 1989. Upon her christening in 2001, she was the first ship to be named for a former president still living at the time.


Software Development, Technology

Cassandra – Running some simple tests, including a multi-get strategy.

14 March 2012 David 5 Comments

PREV: Re-Configuring an Empty Cassandra Cluster

Time for the rubber to meet the road. Get some data loaded and validate the theoretical concepts garnered from the documentation consumed.

This is an example record (IPs have been changed to protect the clueless):

ip_key:       1598595809
ip:           10.2.162.225
prop_id:      1033
property:     Bad Stuff
threat:       1
attribute:    suspicious
meta:         10.25.112.7
detected:     2012-01-05 15:17:14
detected_sec: 1325805434
reported:     2012-01-06 01:44:02
reported_sec: 1325843042

The preliminary model concept centers around the IP. However, with over 60,000,000 records there are overlaps, so a single IP is not going to survive as the primary key. Trying to get a distribution out of MySQL takes some time. Here are the ten busiest keys: thousands of events per IP, and this is just a short 1 month window:

+------------+--------+
| ip_dec     | events |
+------------+--------+
| 3158358206 |   2705 |
|  652542280 |   2506 |
| 3495573656 |   2089 |
| 3232235778 |   2015 |
| 1072721396 |   1528 |
|  652542281 |   1432 |
| 3232235876 |   1427 |
| 3448822506 |   1232 |
| 1280052209 |   1106 |
| 3232235779 |   1086 |
+------------+--------+

Now, Cassandra will support MILLIONS of columns on a single row, so this might actually work, and scale, without using Super Column Families (SCFs). Using the detected time in seconds as the column name with an attribute suffix, then enclosing the data in a JSON blob, could provide the required results. The datekey could serve as a secondary index across the columns, or the columns could be read as a time progression. These are concepts that need to be tested, which is precisely the task at hand.

Considering that a good detected time is not always available, and the data is processed in batches, there could be a heavy grouping of timestamps. If there are a variety of issues detected on a specific IP, at the same obfuscated time, loss of data will occur. This is certainly NOT the desired result. Given this, the datastamp is not unique enough for a hash structure datastore such as Cassandra, without using SCFs.

A structure such as this could deliver the required granularity:

ipstore[$ipkey][$timekey][$propkey] = JSON:{}, JSON:{}, JSON:{}... ;
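To illustrate the collision problem, here is a rough Java sketch that models one row as a sorted map of column name to JSON value. It is a plain in-memory stand-in, not the Cassandra API, and using prop_id as the attribute suffix on the column name is an assumption made for the example:

```java
import java.util.Map;
import java.util.TreeMap;

public class RowModel {
    // One row: column name -> JSON value, kept sorted by column name.
    private final Map<String, String> columns = new TreeMap<>();

    /** Writing a column name that already exists replaces the old value. */
    public void put(String columnName, String json) {
        columns.put(columnName, json);
    }

    public int size() {
        return columns.size();
    }

    /** Hypothetical composite column name: detected time plus property id. */
    public static String compositeName(String detected, String propId) {
        return detected + "|" + propId;
    }

    public static void main(String[] args) {
        String detected = "2012-03-13 18:40:00";

        // Timestamp alone as the column name: the second event replaces the first.
        RowModel timeOnly = new RowModel();
        timeOnly.put(detected, "{\"prop_id\":\"1011\"}");
        timeOnly.put(detected, "{\"prop_id\":\"1033\"}");
        System.out.println(timeOnly.size()); // prints 1

        // Timestamp plus suffix: both events survive.
        RowModel composite = new RowModel();
        composite.put(compositeName(detected, "1011"), "{\"prop_id\":\"1011\"}");
        composite.put(compositeName(detected, "1033"), "{\"prop_id\":\"1033\"}");
        System.out.println(composite.size()); // prints 2
    }
}
```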

To get started with loading data, I wrote a quick test program in Java, compiled it and ran it:

test1.java – source code

public class test1 {
    // Smoke test: confirms that the Java toolchain compiles and runs.
    public static void main(String[] args) {
        System.out.println("Cassandra Calling!");
    }
}

compiling….

java/src/loader1$ javac test1.java -d ../../class/.

executing…

/java/class$ java test1
Cassandra Calling!

Environment confirmed for compiling loader code. With a model in mind…

ipstore[$ipkey][$propkey][$timestamp] = JSON:{}

..and IP data to load,

ipp < get_a_million.sql > a_million_ips.dta
cass:~$ ls -l
126180075 2012-03-13 13:06 a_million_ips.dta
cass:~$ wc -l a_million_ips.dta
1000001 a_million_ips.dta

...next it's designing the schema builder and loader.

REFERENCE: Setting up a Java build env to prepare for Cassandra development

With the environment confirmed, and a test file (test1.java) written, execute and verify function:

cass:~$ ant -DclassToRun=test1 run
Buildfile: ./build.xml
[...]
run:
    [java] This is Java.... drink up!

VERIFIED.

To get moving forward, I created a Utilities class and a DB connector Class. You can look at the source code for those at these two links:

Util Source Code

Cassandra DB Connector Source Code

With the code done, I need to perform a couple of housekeeping tasks to get the cluster prepared for loading.

Adding the ks33 keyspace

[default@unknown] create keyspace ks33;
c7944700-6e2e-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use ks33;
Authenticated to keyspace: ks33

Adding the cf33 ColumnFamily to ks33 Keyspace:

[default@ks33] create column family cf33 with comparator = UTF8Type;
2501f8b0-6e2f-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster

Next, to load 100 trial rows. Here is a link to the source code:

Source for useMultiGet (tba)

hpcass@feed0:~/cassIP/java/cBuild$ host=10.1.0.123 port=9160 inserts=100 ks=ks33 cf=cf33 ant -DclassToRun=c01.useMultiGet run
Buildfile: /home/hpcass/cassIP/java/cBuild/build.xml
init:
compile:
    [javac] Compiling 1 source file to /home/hpcass/cassIP/java/cBuild/build/classes
dist:
    [jar] Building jar: /home/hpcass/cassIP/java/cBuild/dist/lib/cassIP.jar
run:
    [java] get time 89062577
    [java] mget time 494039096
BUILD SUCCESSFUL

Here are some results from multi-get tests. It's actually the inverse of my hope: the multi-get seems to rapidly lose its benefit.

5 Item Slices (1000 item dataset)
=========================================================
run:               RUN 1       RUN 2       RUN 3
[java] get time    339041199   436440551   358115310
[java] mget time   172484370   174690508   182833140

10 Item Slices (1000 item dataset)
=========================================================
run:               RUN 1       RUN 2       RUN 3
[java] get time    346512511   332820479   314136351
[java] mget time   394049160   251152592   234719383

25 Item Slices (1000 item dataset)
=========================================================
run:               RUN 1       RUN 2       RUN 3
[java] get time    335286775   293802010   295948562
[java] mget time   464933443   324505741   312226035

What I didn't expect to see, based on the information in the 'High Performance Cookbook', was such a rapid fall-off in performance. In fact, in every run with slices of size 25 the result inverted, with the multi-get slower than the individual gets.

2 Item Slices (1000 item dataset)
=========================================================
run:               RUN 1       RUN 2       RUN 3
[java] get time    285509637   331970814   317512021
[java] mget time   104567639   96477512    124040195

One thing I didn't think of testing was doing a slice of size 1, and see if maybe part of the perceived performance in the lower slices is really cache hits. AH! Look at this, it looks like the *test* is highly suspect at best. I think this shows some evidence the performance 'benefit' of the multi-get is really a cache hit artifact from extracting the exact same data a second time:

host=10.1.0.123 port=9160 inserts=1000 ks=ks33 cf=cf33 slice=1 ant -DclassToRun=c01.useMultiGet run
Buildfile: /home/hpcass/cassIP/java/cBuild/build.xml

1 Item Slices (1000 item dataset)
=========================================================
run:               RUN 1       RUN 2       RUN 3
[java] get time    295158535   298466321   283438099
[java] mget time   109982545   103658894   98260286
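One way to probe the cache suspicion is to warm both code paths before timing anything and to alternate which one runs first. The Java sketch below is a generic harness under that assumption. FairTiming is a made-up name, the use of System.nanoTime is my choice, and the two Runnable arguments stand in for the get and multi-get calls of the test program, which are not shown here:

```java
public class FairTiming {
    /**
     * Runs both tasks once untimed, then times them over several rounds,
     * swapping which task goes first on every round.
     * Returns the average nanoseconds per round as { taskA, taskB }.
     */
    public static long[] compare(Runnable taskA, Runnable taskB, int rounds) {
        if (rounds <= 0) {
            throw new IllegalArgumentException("rounds must be positive");
        }
        // Warm-up pass so neither timed run pays for a cold cache.
        taskA.run();
        taskB.run();

        long totalA = 0;
        long totalB = 0;
        for (int i = 0; i < rounds; i++) {
            if (i % 2 == 0) {
                totalA += time(taskA);
                totalB += time(taskB);
            } else {
                totalB += time(taskB);
                totalA += time(taskA);
            }
        }
        return new long[] { totalA / rounds, totalB / rounds };
    }

    private static long time(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Placeholder workloads; swap in the real get and multi-get calls.
        Runnable fakeGet = () -> Math.sqrt(12345.678);
        Runnable fakeMultiGet = () -> Math.sqrt(87654.321);
        long[] avg = compare(fakeGet, fakeMultiGet, 10);
        System.out.println("get avg ns:  " + avg[0]);
        System.out.println("mget avg ns: " + avg[1]);
    }
}
```

If the multi-get advantage survives the warm-up and the order swap, it is less likely to be a cache artifact; if it disappears, the suspicion above looks well founded.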

This failure of the demonstrator to perform is not a failure in and of itself. It has provided useful information about a concept that some documentation recommends but that may not really be a true best practice. I long ago developed a healthy skepticism of expert advice accepted in lieu of verification.


Software Development, Technology

Re-Configuring an Empty Cassandra Cluster

12 March 2012 David 1 Comment

PREV: Setting up a Java build env to prepare for Cassandra development

After doing more research, I decided that Ordered Partitioning was not going to buy me anything but a lop-sided distribution. Look at this distribution of real world data (it’s a case of IP distributions, not hostnames as originally envisioned; that will be a later evaluation).

I’d have 3 very heavy nodes and 3 very light nodes.

Node:   Range:                               Dist:
======  ==================================  ======
node00  0.0.0.0 to 42.170.170.171              6 %
node01  42.170.170.172 to 85.85.85.87         32 %
node02  85.85.85.88 to 128.0.0.3              34 %
node03  128.0.0.4 to 170.170.170.175           2 %
node04  170.170.170.176 to 213.85.85.91       21 %
node05  213.85.85.92 to 255.255.255.255        3 %
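The ranges in the table work out to near-equal slices of the 32-bit address space, each 715,827,884 addresses wide (the last one is slightly shorter). A short Java sketch, assuming that width and six nodes, of mapping a decimal IP key to its node under this kind of ordered layout follows. OrderedRanges and nodeFor are illustrative names only:

```java
public class OrderedRanges {
    // Width of each node's range as implied by the table
    // (0.0.0.0 through 42.170.170.171 inclusive).
    static final long RANGE_WIDTH = 715827884L;
    static final int NODES = 6;

    /** Index of the node whose range contains the given decimal IP. */
    public static int nodeFor(long ipDecimal) {
        if (ipDecimal < 0 || ipDecimal > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("not a 32-bit address: " + ipDecimal);
        }
        return (int) Math.min(ipDecimal / RANGE_WIDTH, NODES - 1);
    }

    /** Counts how many of the given keys land on each node. */
    public static int[] tally(long[] ips) {
        int[] counts = new int[NODES];
        for (long ip : ips) {
            counts[nodeFor(ip)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(nodeFor(715827883L)); // 42.170.170.171, prints 0
        System.out.println(nodeFor(715827884L)); // 42.170.170.172, prints 1
        System.out.println(nodeFor(3149138705L)); // 187.180.11.17, prints 4
    }
}
```

Running tally over a real key set is a cheap way to see the skew before committing a cluster to it.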

Goofing around with pseudo random key naming to get a better balance only does one thing, make the keys I wanted to use (IPs) basically worthless, so the ordering is wrecked regardless. Random partitioning is the default configuration for Cassandra, so, that’s what I plan to use. Problem is, I’d built out this specific node set with this setting first:

– ByteOrderedPartitioner orders rows lexically by key bytes. BOP allows scanning rows in key order, but the ordering can generate hot spots for sequential insertion workloads.

I re-set the configurations to use the default instead:

– RandomPartitioner distributes rows across the cluster evenly by md5. When in doubt, this is the best option.
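For contrast with the ordered ranges, here is a hedged Java sketch of how a key can be turned into an MD5-based token. It assumes the token is the absolute value of the 16-byte digest read as a signed integer, which is my understanding of what RandomPartitioner does; treat that detail as an assumption rather than a statement about Cassandra internals. Md5Token and tokenFor are illustrative names:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Token {
    /** MD5-based token for a row key (assumed scheme, see the note above). */
    public static BigInteger tokenFor(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Read the 16 digest bytes as a signed integer, then drop the sign.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Two adjacent decimal IP keys end up far apart on the ring.
        System.out.println(tokenFor("3149138705"));
        System.out.println(tokenFor("3149138706"));
    }
}
```

The point of the hash is visible in the output: neighbouring keys no longer sit next to each other, which is what evens out the load and also what gives up ordered scans.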

After changing the configuration from ByteOrderedPartitioner to RandomPartitioner and restarting the first node.. I am greeted with this happy message:

ERROR 13:03:36,113 Fatal exception in thread Thread[SSTableBatchOpen:3,5,main]
java.lang.RuntimeException: Cannot open /home/hpcass/data/node00/system/Versions-hc-3 because partitioner does not match org.apache.cassandra.dht.RandomPartitioner

In fact I’m greeted with a lot of them. This is then followed by what looks like possibly.. normal startup messaging?

INFO 13:03:36,166 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331586216166.log
INFO 13:03:36,175 Couldn't detect any schema definitions in local storage.
INFO 13:03:36,175 Found table data in data directories. Consider using the CLI to define your schema.
INFO 13:03:36,197 Replaying /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
INFO 13:03:36,222 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
INFO 13:03:36,227 Enqueuing flush of Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
INFO 13:03:36,228 Writing Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
INFO 13:03:36,228 Enqueuing flush of Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
INFO 13:03:36,277 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-16-Data.db (377 bytes)
INFO 13:03:36,285 Writing Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
INFO 13:03:36,357 Completed flushing /home/hpcass/data/node00/system/Versions-hc-4-Data.db (247 bytes)
INFO 13:03:36,358 Log replay complete, 9 replayed mutations
INFO 13:03:36,366 Cassandra version: 1.0.8
INFO 13:03:36,366 Thrift API version: 19.20.0
INFO 13:03:36,367 Loading persisted ring state
INFO 13:03:36,384 Starting up server gossip
INFO 13:03:36,386 Enqueuing flush of Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
INFO 13:03:36,386 Writing Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
INFO 13:03:36,440 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-17-Data.db (196 bytes)
INFO 13:03:36,446 Starting Messaging Service on port 7000
INFO 13:03:36,452 Using saved token 0
INFO 13:03:36,453 Enqueuing flush of Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
INFO 13:03:36,454 Writing Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
INFO 13:03:36,556 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-18-Data.db (148 bytes)
INFO 13:03:36,558 Node /10.1.0.23 state jump to normal
INFO 13:03:36,558 Bootstrap/Replace/Move completed! Now serving reads.
INFO 13:03:36,559 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 13:03:36,587 Binding thrift service to /10.1.0.23:9160
INFO 13:03:36,590 Using TFastFramedTransport with a max frame size of 15728640 bytes.
INFO 13:03:36,593 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
INFO 13:03:36,593 Listening for thrift clients...

Despite the fatal errors, it does seem to have restarted the cluster with the new Partition engine:

Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 7169015515630842424558524306038950250903273734
10.1.0.27  datacenter1  rack1  Down    Normal  ?         93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23  datacenter1  rack1  Up      Normal  15.79 KB  86.37%  0
10.1.0.26  datacenter1  rack1  Down    Normal  ?         77.79%  896682280808232140910919391534960240163386913
10.1.0.24  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  15.79 KB  35.85%  6138493926725652010223830601932265434881918085
10.1.0.28  datacenter1  rack1  Down    Normal  ?         53.08%  716901551563084242455852430603895025090327373

Starting up the other three nodes (example:)

INFO 14:10:06,663 Node /10.1.0.25 has restarted, now UP
INFO 14:10:06,663 InetAddress /10.1.0.25 is now UP
INFO 14:10:06,664 Node /10.1.0.25 state jump to normal
INFO 14:10:06,664 Node /10.1.0.24 has restarted, now UP
INFO 14:10:06,665 InetAddress /10.1.0.24 is now UP
INFO 14:10:06,665 Node /10.1.0.24 state jump to normal
INFO 14:10:06,666 Node /10.1.0.23 has restarted, now UP
INFO 14:10:06,667 InetAddress /10.1.0.23 is now UP
INFO 14:10:06,668 Node /10.1.0.23 state jump to normal
INFO 14:10:06,760 Completed flushing /home/hpcass/data/node01/system/LocationInfo-hc-18-Data.db (166 bytes)
INFO 14:10:06,762 Node /10.1.0.26 state jump to normal
INFO 14:10:06,763 Bootstrap/Replace/Move completed! Now serving reads.
INFO 14:10:06,764 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 14:10:06,862 Binding thrift service to /10.1.0.26:9160

Re-checking the ring displays:

Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 7169015515630842424558524306038950250903273734
10.1.0.27  datacenter1  rack1  Up      Normal  11.37 KB  93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23  datacenter1  rack1  Up      Normal  15.79 KB  86.37%  0
10.1.0.26  datacenter1  rack1  Up      Normal  18.38 KB  77.79%  896682280808232140910919391534960240163386913
10.1.0.24  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  15.79 KB  35.85%  6138493926725652010223830601932265434881918085
10.1.0.28  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  7169015515630842424558524306038950250903273734

Switching the partition engine appears to be easy enough. What I suspect, however (and I’ve not confirmed this), is that any existing data would have been compromised or likely destroyed in this process. The documentation I’ve read so far indicated that you could not do this: once set up with a specific partitioning engine, a cluster was bound to it.

My conclusion is that if you have not yet started to saturate your cluster with data, and you wish to change the partitioning engine, it would appear that the right time to do it is now.. before you start to load data.

I plan to test this theory later after the first trial data load to see if in fact it mangles the information. More to follow!

UPDATE!

Despite the information that I thought nodetool was telling me, my cluster was unusable because of the partitioner change. What is the last step required to change partition? NUKE THE DATA. Unfun.. but that is what I need to do.

Having 6 nodes means 6 times the fun. Here is the kicker though, I’ll just move the data aside and re-construct, and that will let me swap it back in if I decided to go back and forth testing the impacts of Random vs. Ordered for my needs. Will I get away with this? I don’t know. That won’t stop me from trying!

The data was stored in ~/data/node00 (node## etc.). This is all I did:

mv data/node00 data/node00-bop # bop = byte order partition.

Restarted node00:

hpcass:~/nodes$ node00/bin/cassandra -f
INFO 16:38:46,525 Logging initialized
INFO 16:38:46,529 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_0
INFO 16:38:46,529 Heap size: 6291456000/6291456000
INFO 16:38:46,529 Classpath: node00/bin/../conf:node00/bin/../build/classes/main:node00/bin/../build/classes/thrift:node00/bin/../lib/antlr-3.2.jar:node00/bin/../lib/apache-cassandra-1.0.8.jar:node00/bin/../lib/apache-cassandra-clientutil-1.0.8.jar:node00/bin/../lib/apache-cassandra-thrift-1.0.8.jar:node00/bin/../lib/avro-1.4.0-fixes.jar:node00/bin/../lib/avro-1.4.0-sources-fixes.jar:node00/bin/../lib/commons-cli-1.1.jar:node00/bin/../lib/commons-codec-1.2.jar:node00/bin/../lib/commons-lang-2.4.jar:node00/bin/../lib/compress-lzf-0.8.4.jar:node00/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:node00/bin/../lib/guava-r08.jar:node00/bin/../lib/high-scale-lib-1.1.2.jar:node00/bin/../lib/jackson-core-asl-1.4.0.jar:node00/bin/../lib/jackson-mapper-asl-1.4.0.jar:node00/bin/../lib/jamm-0.2.5.jar:node00/bin/../lib/jline-0.9.94.jar:node00/bin/../lib/json-simple-1.1.jar:node00/bin/../lib/libthrift-0.6.jar:node00/bin/../lib/log4j-1.2.16.jar:node00/bin/../lib/servlet-api-2.5-20081211.jar:node00/bin/../lib/slf4j-api-1.6.1.jar:node00/bin/../lib/slf4j-log4j12-1.6.1.jar:node00/bin/../lib/snakeyaml-1.6.jar:node00/bin/../lib/snappy-java-1.0.4.1.jar
INFO 16:38:46,531 JNA not found. Native methods will be disabled.
INFO 16:38:46,538 Loading settings from file:/home/hpcass/nodes/node00/conf/cassandra.yaml
INFO 16:38:46,635 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 16:38:46,645 Global memtable threshold is enabled at 2000MB
INFO 16:38:46,839 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331599126839.log
INFO 16:38:46,848 Couldn't detect any schema definitions in local storage.
INFO 16:38:46,849 Found table data in data directories. Consider using the CLI to define your schema.
INFO 16:38:46,863 Replaying /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
INFO 16:38:46,887 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
INFO 16:38:46,892 Enqueuing flush of Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
INFO 16:38:46,893 Enqueuing flush of Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
INFO 16:38:46,894 Writing Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
INFO 16:38:47,001 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-1-Data.db (208 bytes)
INFO 16:38:47,009 Writing Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
INFO 16:38:47,057 Completed flushing /home/hpcass/data/node00/system/Versions-hc-1-Data.db (247 bytes)
INFO 16:38:47,057 Log replay complete, 6 replayed mutations
INFO 16:38:47,066 Cassandra version: 1.0.8
INFO 16:38:47,066 Thrift API version: 19.20.0
INFO 16:38:47,067 Loading persisted ring state
INFO 16:38:47,070 Starting up server gossip
INFO 16:38:47,091 Enqueuing flush of Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
INFO 16:38:47,092 Writing Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
INFO 16:38:47,141 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-2-Data.db (196 bytes)
INFO 16:38:47,149 Starting Messaging Service on port 7000
INFO 16:38:47,155 Using saved token 0
INFO 16:38:47,157 Enqueuing flush of Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
INFO 16:38:47,157 Writing Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
INFO 16:38:47,237 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-3-Data.db (148 bytes)
INFO 16:38:47,239 Node /10.1.0.23 state jump to normal
INFO 16:38:47,240 Bootstrap/Replace/Move completed! Now serving reads.
INFO 16:38:47,241 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 16:38:47,269 Binding thrift service to /10.1.0.23:9160
INFO 16:38:47,272 Using TFastFramedTransport with a max frame size of 15728640 bytes.
INFO 16:38:47,274 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
INFO 16:38:47,275 Listening for thrift clients...
^Z
[1]+ Stopped node00/bin/cassandra -f
hpcass:~/nodes$ bg
[1]+ node00/bin/cassandra -f &

With the process backgrounded, checked the files in the new data directory for my node:

hpcass:~/data/node00$ ls -1 system
LocationInfo-hc-1-Data.db
LocationInfo-hc-1-Digest.sha1
LocationInfo-hc-1-Filter.db
LocationInfo-hc-1-Index.db
LocationInfo-hc-1-Statistics.db
LocationInfo-hc-2-Data.db
LocationInfo-hc-2-Digest.sha1
LocationInfo-hc-2-Filter.db
LocationInfo-hc-2-Index.db
LocationInfo-hc-2-Statistics.db
LocationInfo-hc-3-Data.db
LocationInfo-hc-3-Digest.sha1
LocationInfo-hc-3-Filter.db
LocationInfo-hc-3-Index.db
LocationInfo-hc-3-Statistics.db
Versions-hc-1-Data.db
Versions-hc-1-Digest.sha1
Versions-hc-1-Filter.db
Versions-hc-1-Index.db
Versions-hc-1-Statistics.db

Following that clearing and rebuild, I see the node tool results look a lot better:

hpcass@feed0:~/nodes$ cass00/bin/nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 6138493926725652010223830601932265434881918085
10.1.0.23  datacenter1  rack1  Up      Normal  15.68 KB  33.29%  0
10.1.0.24  datacenter1  rack1  Up      Normal  18.34 KB  30.87%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  18.34 KB  35.85%  6138493926725652010223830601932265434881918085

After resetting the old numerated nodes, I had a complete disaster! Negative node tokens? How did that happen? Restarts did nothing to fix this either.

Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 7169015515630842424558524306038950250903273734
10.1.0.27  datacenter1  rack1  Up      Normal  15.79 KB  93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23  datacenter1  rack1  Up      Normal  15.79 KB  86.37%  0
10.1.0.26  datacenter1  rack1  Up      Normal  15.79 KB  77.79%  896682280808232140910919391534960240163386913
10.1.0.24  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  15.79 KB  35.85%  6138493926725652010223830601932265434881918085
10.1.0.28  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  7169015515630842424558524306038950250903273734

To resolve this, I simply re-ran my token generator to get a new set of tokens:

node00  10.1.0.23  token: 0
node01  10.1.0.26  token: 28356863910078205288614550619314017621
node02  10.1.0.24  token: 56713727820156410577229101238628035242
node03  10.1.0.27  token: 85070591730234615865843651857942052863
node04  10.1.0.25  token: 113427455640312821154458202477256070485
node05  10.1.0.28  token: 141784319550391026443072753096570088106
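The token generator itself is not shown, so here is a minimal Java sketch of the commonly used formula for evenly spaced RandomPartitioner tokens, i * 2^127 / N, rounded down. TokenGenerator and token are illustrative names. For six nodes it reproduces the list above except for node03, where it gives a value ending in ...864 rather than ...863, so the generator used here presumably rounded slightly differently:

```java
import java.math.BigInteger;

public class TokenGenerator {
    // Size of the token space assumed for the formula: 2^127.
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Token for node number index (0-based) out of nodeCount evenly spaced nodes. */
    public static BigInteger token(int index, int nodeCount) {
        if (nodeCount <= 0 || index < 0 || index >= nodeCount) {
            throw new IllegalArgumentException("bad index or node count");
        }
        return RING_SIZE
                .multiply(BigInteger.valueOf(index))
                .divide(BigInteger.valueOf(nodeCount));
    }

    public static void main(String[] args) {
        int nodeCount = 6;
        for (int i = 0; i < nodeCount; i++) {
            System.out.println("node0" + i + " token: " + token(i, nodeCount));
        }
    }
}
```

BigInteger is needed because the values run to 39 digits, well past what a long can hold.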

Followed by manually setting the tokens in the ring:

bin/nodetool -h 10.1.0.24 move 56713727820156410577229101238628035242
bin/nodetool -h 10.1.0.25 move 113427455640312821154458202477256070485
bin/nodetool -h 10.1.0.26 move 28356863910078205288614550619314017621
bin/nodetool -h 10.1.0.27 move 85070591730234615865843651857942052863
bin/nodetool -h 10.1.0.28 move 141784319550391026443072753096570088106

This.. gave me the results I was expecting!

Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 141784319550391026443072753096570088106
10.1.0.23  datacenter1  rack1  Up      Normal  24.95 KB  16.67%  0
10.1.0.26  datacenter1  rack1  Up      Normal  20.72 KB  16.67%  28356863910078205288614550619314017621
10.1.0.24  datacenter1  rack1  Up      Normal  25.1 KB   16.67%  56713727820156410577229101238628035242
10.1.0.27  datacenter1  rack1  Up      Normal  13.38 KB  16.67%  85070591730234615865843651857942052863
10.1.0.25  datacenter1  rack1  Up      Normal  25.1 KB   16.67%  113427455640312821154458202477256070485
10.1.0.28  datacenter1  rack1  Up      Normal  25.14 KB  16.67%  141784319550391026443072753096570088106

Now, the question of actually connecting to the cluster can be answered. Pick one of the nodes and ports to connect to. I picked node00 on .23 (the cli defaulted to port 9160 so I didn’t have to specify that):

node00/bin/cassandra-cli -h 10.1.0.23
Connected to: "test-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

The big problem I had was that the cli never did seem to respond. The trick is to end your command with a semicolon. That might seem obvious to you, and generally obvious to me, but I’d not seen the docs actually call out that little FACT.

[default@unknown] show cluster name;
test-ip

I created a test column family, following the example from the helpful Cassandra Wiki.

create keyspace Twissandra;
Keyspace names must be case-insensitively unique ("Twissandra" conflicts with "Twissandra")
[default@unknown]
[default@unknown]
[default@unknown] create column family User with comparator = UTF8Type;
Not authenticated to a working keyspace.
[default@unknown] use Twissandra;
Authenticated to keyspace: Twissandra
[default@Twissandra] create column family User with comparator = UTF8Type;
adf453a0-6cb0-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@Twissandra]

AND WE’RE OFF!! Next article will cover actually finishing up this last test and then adding real data. MORE TO COME!!

NEXT: Cassandra – A Use case examined (IP data)


Posts navigation

← Previous 1 … 15 16 17 … 47 Next →

Search
Search for:

Categories

3D Printing

Amateur Radio

automotive

Aviation

Beef Recipes

books

Cafe Racer Project

Chicken Recipes

Cuisine

Cyber Crime

Data

Data Visualization

Economy

Entrepreneurship

Enviromentalism

Fish Recipes

Formula 1

Forza 3

Golf

Health

Helicopters

Hot Sauces

Human Canvas Project

Literature

Lola the Dog

Military

Model Rocketry

Motorcycles

Movie Review

photography

Politics

racing

Radio Control

Raspberry Pi

RC Fixed Wing

RC Trucks

Ride Reports

Santa Cruz

Software Development

Sports

Stratux

Technology

Texas

Track Bike Project

travel

Trucking

Uncategorized

Video Gaming

YouTube

Tag Cloud

aircraft

apache

apple

Aviation

big data

bike

bremerton

cafe racer

california

casperjs

cassandra

database

db

ducati

exi-450

gearman

heli

helicopter

heli r/c heli

iphone

kawasaki

kz400

mercedes-benz

motorcycle

Motorcycles

NAVY

OSX

photo

photography

programming

r/c

racing

Radio Control

river

santa cruz

SC10

software

spektrum r/c

sprinter

stratux

Technology

travel

truck

video

weather

Racing, Photography, Software and Politics.

Recent Posts

Qidi 4Plus WiFi Solution – TP-Link AC600

Testing Qidi 4Plus Wireless Performance

Reno Air Races 2023 – Airside photos

Castle Air Museum – California

USS Lexington – Corpus Cristi Texas

STRATUX ADS-B Receiver – open ports inventory

Some Favorite Moments in Pictures Years Past

Archives

May 2026 (2)

September 2023 (2)

October 2021 (1)

March 2021 (2)

February 2021 (1)

October 2020 (1)

September 2020 (1)

August 2020 (1)

May 2020 (3)

October 2019 (3)

July 2019 (1)

December 2018 (3)

October 2018 (1)

September 2018 (2)

July 2018 (1)

June 2018 (3)

November 2017 (2)

October 2017 (2)

September 2017 (1)

June 2017 (1)

May 2017 (1)

April 2017 (6)

February 2017 (1)

October 2016 (5)

September 2016 (1)

August 2016 (1)

July 2016 (1)

June 2016 (2)

February 2016 (7)

January 2016 (1)

December 2015 (6)

October 2015 (5)

September 2015 (4)

August 2015 (2)

July 2015 (3)

May 2015 (2)

April 2015 (2)

March 2015 (1)

January 2015 (15)

December 2014 (4)

November 2014 (3)

October 2014 (2)

September 2014 (6)

July 2014 (1)

May 2014 (1)

April 2014 (2)

March 2014 (2)

November 2013 (1)

August 2013 (1)

July 2013 (3)

June 2013 (9)

February 2013 (1)

January 2013 (1)

December 2012 (2)

November 2012 (2)

October 2012 (1)

September 2012 (2)

August 2012 (4)

June 2012 (4)

April 2012 (1)

March 2012 (9)

February 2012 (6)

January 2012 (2)

December 2011 (4)

November 2011 (1)

October 2011 (7)

September 2011 (3)

August 2011 (2)

July 2011 (3)

June 2011 (2)

May 2011 (2)

April 2011 (4)

March 2011 (3)

February 2011 (10)

January 2011 (6)

December 2010 (4)

November 2010 (7)

October 2010 (11)

September 2010 (8)

August 2010 (6)

July 2010 (11)

June 2010 (24)

May 2010 (20)

April 2010 (16)

March 2010 (11)

February 2010 (6)

January 2010 (4)

December 2009 (10)

November 2009 (4)

October 2009 (4)

September 2009 (14)

August 2009 (16)

July 2009 (23)

June 2009 (13)

May 2009 (11)

April 2009 (14)

March 2009 (10)