Cassandra – Getting Started – (deployment Part 1 – Installing Cassandra)

It’s been almost a month since I started the Apache Cassandra investigation, and now it’s time to move into a production stance. Some of these steps will differ from the original steps documented here in my blog. Later this week I will go back and amend those posts to point at this post as the more recent information. Those old links are already being referenced by multiple sites, so deleting them would not be a kind thing to do. Thus.. onward we move!

Getting the right JVM/JDK/JRE

Originally, the OpenJDK was being used for this introduction and research into Cassandra. Being a proponent of Open Source, I was going to avoid the use of Oracle’s potentially proprietary JDK/JRE in this environment. I have since seen first had, that the JDK DOES IN FACT MATTER, and the one that supports the latest tools is the one from Oracle.

That is located here:

Downloading the JRE/JDK from Oracle has enabled the reliable use of DataStax’s OpsCenter management tool (more on that later).

These are the recommended minimums for Cassandra and OpsCenter from DataStax, a respected partner of the Apache Cassandra project.

Sun Java Runtime Environment 1.6.0_19 or later
Python 2.5, 2.6, or 2.7
OpenSSL version listed in Configuring SSL unless you disable SSL

I ended up selecting the JDK (linked here) and deposited it in the following location on my system as user root (create the directory path if you don’t already have it):

/opt/java/64/jdk-7u3-linux-x64.tar.gz

Extract the file:

:/opt/java/64# tar -xvzf jdk-7u3-linux-x64.tar.gz
jdk1.7.0_03/
jdk1.7.0_03/include/
jdk1.7.0_03/include/jvmti.h
jdk1.7.0_03/include/jawt.h
[...]
jdk1.7.0_03/jre/plugin/desktop/sun_java.desktop
jdk1.7.0_03/jre/COPYRIGHT
jdk1.7.0_03/LICENSE
jdk1.7.0_03/COPYRIGHT
:/opt/java/64# 

The Cassandra Build I decided to use is this one: apache-cassandra-1.1.0-beta1. I downloaded the file to the user I created for this using wget:

:~$ wget http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
--2012-03-25 22:52:27--  http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
Resolving apache.deathculture.net... 173.236.158.254
Connecting to apache.deathculture.net|173.236.158.254|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12505037 (12M) [application/x-gzip]
Saving to: `apache-cassandra-1.1.0-beta1-bin.tar.gz'

100%[=======================================================================>] 12,505,037  8.84M/s   in 1.3s    

2012-03-25 22:52:29 (8.84 MB/s) - `apache-cassandra-1.1.0-beta1-bin.tar.gz' saved [12505037/12505037]

Next the file is extracted, moved to a shorter directory name:

:~$ tar -xvzf apache-cassandra-1.1.0-beta1-bin.tar.gz
:~$ mv apache-cassandra-1.1.0-beta1 cass-beta1

Configuring a Node

Now the configuration is edited to define the node ring. The first file to edit is the cassandra.yaml file.

This initially will be only a 2 node cluster, but the tokens must still be calculated. Here are the node tokens I generated using a PERL script I wrote (see: Cassandra and Big Data – building a single-node “cluster” – Extra Credit for the code):

:~/cass-beta1$ ./token.pl 2
Calculate tokens for 2 nodes
factor = 170141183460469231731687303715884105728
node 0	token: 0
node 1	token: 85070591730234615865843651857942052864
:~/cass-beta1$ 

Edit the cluster name. I’m not testing, so I changed the name to one descriptive of the data I was storing. ‘ip’. In the example below, I’m showing configs for the 2nd of the two nodes. Note: The first node would have a different IP address and also a different initial token, in this case ‘0’, as calculated by the tool.

:~$ cd cass-beta1/
:~/cass-beta1$ vi conf/cassandra.yaml

[...]

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'ip'

[...]

 If blank, Cassandra will request a token bisecting the range of
# the heaviest-loaded existing node.  If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
initial_token: 85070591730234615865843651857942052864

[...]

# directories where Cassandra should store data on disk.
data_file_directories:
    - /home/bigdata/data/

[...]

# commit log
commitlog_directory: /home/bigdata/commitlog/

[...]

# saved caches
saved_caches_directory: /home/bigdata/saved_caches/

[...]

          # seeds is actually a comma-delimited list of addresses.
          # Ex: ",,"
          - seeds: "10.1.100.101,10.1.100.102"
[...]

# Setting this to 0.0.0.0 is always wrong.
listen_address: 10.1.1.101

[...]

rpc_address: 10.1.1.101

[...]

# Time to wait for a reply from other nodes before failing the command (this was done to increase timeout to 30 seconds, sometimes the search I need to run is pretty nasty)
rpc_timeout_in_ms: 30000

Following that, the shell file needs to be modified to designate the JMX listening port:

:~/cass-beta1$ vi conf/cassandra-env.sh

[...]

# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="8001"

[...]

Make sure your logfile is in the desired location. I decided to keep it within the account itself for now:

vi cassA-1.0.8/conf/log4j-server.properties
[...]

log4j.appender.R.File=/home/bigdata/log/cassA.log

[...]

Next I set the paths in the .bash configuration file for the account, using the following 3 environment variables (ANT_HOME is used by the ANT compiler, if you are not writing code, your JAVA_HOME will point at the JRE, not the JDK, and you won’t need the ANT_HOME path at all):

vi ~/.bash_profile
export JAVA_HOME=/opt/java/64/jdk1.7.0_03
export ANT_HOME=/usr/lib/ant/
export CASS_BIN=$HOME/cass-beta1/bin
export PATH=$PATH:$ANT_HOME/bin:$CASS_BIN

Systems Administration

Make sure there is a location for the cassandra server to write it’s log files. You’ll need your SysAdmin, or root privs, to do this. I set the ownership to root and the user under which I’m currently running cassandra (bigdata):

root:/data/feed/indata# cd /var/log
root:/var/log# mkdir cassandra
root:/var/log# chown root:bigdata cassandra
root:/var/log# chmod 775 cassandra

The following ports need to be opened up, if you are running a firewall on each system (you ARE, right!?!), to allow Cassandra nodes to communicate with each other. This is a snippet from my rules-based firewall control file.


Port Usage:

  • 9160 – Thrift port, where the API is serviced for Reads/Writes to Cassandra
  • 8001 – Individual node listening port. This is used for the command line (cli)
  • 7000 – Commands and Data TCP port, used nodes for communications
  • 7001 – SSL port used for storage communications
  • 8888 – Only used on systems that will host an Ops Center installation
  • 61620 – Required for Ops Center Agent Communications

## Cassandra
ACCEPT          loc             $FW             tcp     9160,8001,7000,7001
## OpsCenter
ACCEPT          loc             $FW             tcp     8888,61620


Starting up the Cluster

This is where the truth is told. The rubber meets the road. The money is placed where your mouth is. Light ’em up!

:~$ cassandra
:~$  INFO 23:52:54,232 Logging initialized
 INFO 23:52:54,236 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
 INFO 23:52:54,237 Heap size: 6291456000/6291456000
[...]
INFO 23:52:55,162 Node /10.1.0.23 state jump to normal
 INFO 23:52:55,163 Bootstrap/Replace/Move completed! Now serving reads.

IT LIVES!! Now start your other node(s), and check to verify you have a complete ring, properly configured. You should see something like this in subsequent nodes, I’ve highlighted the references to the other member node:

[...]
INFO 23:54:16,042 Node /10.1.0.23 has restarted, now UP
 INFO 23:54:16,043 InetAddress /10.1.0.23 is now UP
 INFO 23:54:16,043 Node /10.1.0.23 state jump to normal
 INFO 23:54:16,088 Compacted to [/home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-6-Data.db,].  544 to 413 (~75% of original) bytes for 4 keys at 0.003425MB/s.  Time: 115ms.
 INFO 23:54:16,109 Completed flushing /home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-5-Data.db (163 bytes)
 INFO 23:54:16,110 Node /10.1.0.26 state jump to normal
 INFO 23:54:16,111 Bootstrap/Replace/Move completed! Now serving reads.

Run nodetool:

:~$ nodetool -h10.1.0.23 -p 8001 ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               85070591730234615865843651857942052864      
10.1.0.23      datacenter1 rack1       Up     Normal  17.77 KB        50.00%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  17.66 KB        50.00%  85070591730234615865843651857942052864      
 

WE HAVE A RING!

NEXT: SETTING UP OPS CENTER

2 thoughts on “Cassandra – Getting Started – (deployment Part 1 – Installing Cassandra)”

  1. i am trying to open cassandra on ssl mode
    but after starting the ssl service ”
    INFO 08:25:11,611 Cassandra version: 2.0.4
    INFO 08:25:11,611 Thrift API version: 19.39.0
    INFO 08:25:11,615 CQL supported versions: 2.0.0,3.1.3 (default: 3.1.3)
    INFO 08:25:11,632 Loading persisted ring state
    INFO 08:25:11,658 Starting up server gossip
    INFO 08:25:11,666 Enqueuing flush of Memtable-local@1693888143(286/2860 serialized/live bytes, 10 ops)
    INFO 08:25:11,667 Writing Memtable-local@1693888143(286/2860 serialized/live bytes, 10 ops)
    INFO 08:25:11,677 Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-171-Data.db (261 bytes) for commitlog position ReplayPosition(segmentId=1391667910363, position=179425)
    INFO 08:25:11,958 Starting Encrypted Messaging Service on SSL port 7001
    ” it jump back to normal mode

    INFO 08:25:12,016 Enqueuing flush of Memtable-local@515755162(84/840 serialized/live bytes, 4 ops)
    INFO 08:25:12,017 Writing Memtable-local@515755162(84/840 serialized/live bytes, 4 ops)
    INFO 08:25:12,040 Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-172-Data.db (116 bytes) for commitlog position ReplayPosition(segmentId=1391667910363, position=179684)
    INFO 08:25:12,041 Compacting [SSTableReader(path=’/var/lib/cassandra/data/system/local/system-local-jb-170-Data.db’), SSTableReader(path=’/var/lib/cassandra/data/system/local/system-local-jb-171-Data.db’), SSTableReader(path=’/var/lib/cassandra/data/system/local/system-local-jb-172-Data.db’), SSTableReader(path=’/var/lib/cassandra/data/system/local/system-local-jb-169-Data.db’)]
    INFO 08:25:12,073 Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-jb-173,]. 6,223 bytes to 5,681 (~91% of original) in 28ms = 0.193494MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
    INFO 08:25:12,078 Enqueuing flush of Memtable-local@1005594181(10092/100920 serialized/live bytes, 257 ops)
    INFO 08:25:12,080 Writing Memtable-local@1005594181(10092/100920 serialized/live bytes, 257 ops)
    INFO 08:25:12,100 Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-174-Data.db (5266 bytes) for commitlog position ReplayPosition(segmentId=1391667910363, position=192102)
    INFO 08:25:12,176 Node /5.5.5.18 state jump to normal
    INFO 08:25:12,197 Startup completed! Now serving reads.
    INFO 08:25:12,343 Starting listening for CQL clients on /0.0.0.0:9042…
    INFO 08:25:12,419 Using TFramedTransport with a max frame size of 15728640 bytes.
    INFO 08:25:12,421 Binding thrift service to /0.0.0.0:9589
    INFO 08:25:17,596 Using synchronous/threadpool thrift server on 0.0.0.0 : 9589
    INFO 08:25:17,597 Listening for thrift clients…

    do you have any idea how to fix it or whom can i talk to?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.