Tag Archives: data

Stratux – repartition to use entire sd card

The issue..

It’s been an ongoing tail of woe, when it comes to installing a new version of STRATUX on a new card; out of the box, at least in all the installs I’ve done, it fails to make use of the entire card. With any sort of logging enabled to measure performance, run tests or just keep even a short history of contacts received.. you’re out of disc space… FAST:

Here is the ‘df’ dump of a STRATUX fixed position station running the latest version (released September 2019). As you can see, it’s only allocated 2GB of the 32GB card’s total available space. That is a lot of wasted space on the card.

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       1.8G  1.6G  110M  94% /
/dev/mmcblk0p1   60M   20M   41M  34% /boot

Get total device size:

fdisk -l | grep Disk

Disk /dev/ram0: 4 MiB, 4194304 bytes, 8192 sectors
...
Disk /dev/mmcblk0: 29.7 GiB, 31914983424 bytes, 62333952 sectors

Disklabel type: dos
Disk identifier: 0xe6a544c8

Repartitioning the Device

With the physical partition located.. start fdisk:
fdisk -u /dev/mmcblk0

I like to increase the size of the main partition to 6G to leave room for installing more system updates and tools.

To do this you will need to know the starting and ending blocks of the partition. That is available with the ‘print’ command:

Command (m for help): p

Results:

Disk /dev/mmcblk0: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Disk identifier: 0xe6a544c8

Device         Boot  Start     End Sectors  Size Id Type
/dev/mmcblk0p1        8192  131071  122880   60M  c W95 FAT32 (LBA)
/dev/mmcblk0p2      131072 3887103 3756032  1.8G 83 Linux

To increase the size, the partition must first be deleted, then re-create at the exact same starting block, or the filesystem will become corrupted.

First, delete the partition with the ‘d’ command, selecting partition #2.

Next, re-create the partition with the same starting block, but now with increased filesystem size:

Command (m for help): d 

  Partition number (1,2, default 2): 2

  Partition 2 has been deleted.

Command (m for help): p
 
  Device         Boot Start    End Sectors Size Id Type
  /dev/mmcblk0p1       8192 131071  122880  60M  c W95 FAT32 (LBA)


Command (m for help): n

  Partition type
     p   primary (1 primary, 0 extended, 3 free)
     e   extended (container for logical partitions)
  Select (default p): p
  Partition number (2-4, default 2): 2

First sector (2048-62333951, default 2048): 131072

Last sector, +sectors or +size{K,M,G,T,P} (131072-62333951, default 62333951): +6G

  Created a new partition 2 of type 'Linux' and of size 6 GiB.

Command (m for help): w
  
  The partition table has been altered.
  Calling ioctl() to re-read partition table.
  Re-reading the partition table failed.: Device or resource busy

  The kernel still uses the old table. The new table will be used
  at the next reboot or after you run partprobe(8) or kpartx(8).

The partition table will have been modified but the kernel will not be able to take that into account as some partitions are mounted.

In theory, the command ‘partx /dev/mmcblk0’ is all that is required.. however I’ve found that rebooting is the only way to really reload the partition, so that the filespace can be increased.

reboot

Once system comes back up, run ‘resize2fs‘ to expand the filesystem.

Fill up drive with current filesystem

Execute `resize2fs` and run an on-line expansion of the filesystem, and finally verify it again with ‘df -h’

resize2fs /dev/mmcblk0p2

  resize2fs 1.43.3 (04-Sep-2016)
  Filesystem at /dev/mmcblk0p2 is mounted on /; on-line resizing
  required
  old_desc_blocks = 1, new_desc_blocks = 1
  The filesystem on /dev/mmcblk0p2 is now 1572864 (4k) blocks long.

Running df shows that it has in fact resized. STEP 1 COMPLETED

root@STRATUX-FIXED:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       5.9G  1.6G  4.2G  28% /
devtmpfs        459M     0  459M   0% /dev
tmpfs           463M     0  463M   0% /dev/shm
tmpfs           463M  6.2M  457M   2% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           463M     0  463M   0% /sys/fs/cgroup
/dev/mmcblk0p1   60M   20M   41M  34% /boot

Inserting and Reading data from a Cassandra Cluster

Rubber meeting the road. Time to insert some column families, then some data and finally pull it back off the stack.

First off, the keyspace was already defined, so I’m going to simply list it’s structure:

[default@unknown] describe ip_store;

Keyspace: ip_store:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]

With a keyspace ready for some column families, those are created next. Here I’m establishing that there will be 4 families in this single keyspace. This is contrary to suggestions in the High Performance Cassandra Handbook, but follows all other documentation I’ve seen. Considering that this is NOT a production implementation, I’m going to go with a more conventional strategy of organizing related data in the same keyspace.

The first action is to assume the desired keyspace, then add the desired column families:

[default@unknown] use ip_store;
Authenticated to keyspace: ip_store

[default@ip_store] create column family warehouse with comparator = UTF8Type;
595945d0-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

[default@ip_store] create column family hourly with comparator = UTF8Type;
65ea2170-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

[default@ip_store] create column family daily with comparator = UTF8Type; 
6aaeae60-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

[default@ip_store] create column family 30day with comparator = UTF8Type;
7b85bf30-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

OK, a basic schema has been established. Now.. to load the data. I’ll post the relevant sections of the loader code at a later date. At this point you only need to consider that the loader DOES work and it’s loading data. We’ll look at the extraction of the data following loading a very small set.

time host=10.1.0.23 port=9160 ks=ip_store cf=warehouse ttl=0 datafile=5.ips ant -DclassToRun=loader.bulkIpLoader run
Buildfile: cBuild/build.xml

init:

compile:
    [javac] Compiling 1 source file to cBuild/build/classes

dist:
      [jar] Building jar: cBuild/dist/lib/cass.jar

run:
     [java] ks       ip_store
     [java] cf       warehouse
     [java] ttl      0
     [java] datafile 5.ips

BUILD SUCCESSFUL
Total time: 1 second

Of the set, there are three unique IPs and 2 are duplicates of other data (IMPORTANT NOTE: The IP’s have been changed to protect the innocent and clueless):

2016468288	1011	suspicious	2012-03-13 18:40:01
2016468288	1011	suspicious	2012-03-13 18:40:02
3149138705	1011	suspicious	2012-03-13 18:40:00
3149138705	1011	suspicious	2012-03-13 18:40:01
2179293112	1011	suspicious	2012-03-13 18:39:59

Having loaded these, I re-launch the command line interface, authenticate to the desired keyspace, and then a VERY important command to set an assumption about how we’re going to reference the keys. If you get a strange error like this “cannot parse ‘187.180.11.17’ as hex bytes“, that means you likely forgot to issue the assumes command. Commands I issued are in bold.

cass
Connected to: "ak-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

[default@unknown] use ip_store

[default@ip_store] assume warehouse keys as utf8;   
Assumption for column family 'warehouse' added successfully.

[default@ip_store]  get warehouse['3149138705'];

=> (column=2012-03-13 18:40:00, value=7b227265706f72746564223a22323031322d30332d31332031383a34383a3031222c22617474726962757465223a22737573706963696f7573222c2270726f705f6964223a2231303131222c2270726f7065727479223a22426f74204375747761696c222c226465746563746564223a22323031322d30332d31332031383a34303a3030222c226d65746164617461223a22222c226970223a223139302e3137342e3235312e313435227d, timestamp=1332168862658)
=> (column=2012-03-13 18:40:01, value=7b227265706f72746564223a22323031322d30332d31332031383a34383a3031222c22617474726962757465223a22737573706963696f7573222c2270726f705f6964223a2231303131222c2270726f7065727479223a22426f74204375747761696c222c226465746563746564223a22323031322d30332d31332031383a34303a3031222c226d65746164617461223a22222c226970223a223139302e3137342e3235312e313435227d, timestamp=1331689681)
Returned 2 results.
Elapsed time: 39 msec(s).

There we go. A single key row ip_store[‘warehouse’][‘3149138705’] containing to column records, each with a JSON blob within it. Now.. the next step, to set the assumption of utf8 when recalling the records and get output mere mortals such as yourselves can understand.

[default@ip_store] assume warehouse validator as ascii; 
Assumption for column family 'warehouse' added successfully.

[default@ip_store]  t warehouse['3149138705'];  
     
=> (column=2012-03-13 18:40:00, 
  value={
   "reported":"2012-03-13 18:48:01",
   "attribute":"suspicious",
   "prop_id":"1011",
   "detected":"2012-03-13 18:40:00",
   "ip":"187.180.11.17"
  }, timestamp=1331689680)

=> (column=2012-03-13 18:40:01, 
  value={
    "reported":"2012-03-13 18:48:01",
    "attribute":"suspicious",
    "prop_id":"1011",
    "detected":"2012-03-13 18:40:01",
     "ip":"187.180.11.17"
  }, timestamp=1331689681)

Returned 2 results.
Elapsed time: 2 msec(s).

There is it! Data written, data read. Now, it’s up to you to think about how you might use this simple, flexible and powerful storage engine to solve your business needs.

Drop keyspace using Cassandra Cli

Dropping a an entire keyspace using the cassandra-cli is exceptionally simple.

First, access your cluster using the cli. I have an alias in my .bash_profile so I only need to type ‘cass’ to access the clid. In an attempt to be helpful though, I shall show the full command syntax for my environment. Your host and port may vary.

  alias cass='cassandra-cli -h 10.1.0.26'

In this example, I am going to drop the keyspace I was loading with test data in previous posts, ks33.

hpcass: ~$ cass
Connected to: "Test1" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

DROP keyspace ks33;

07ad5e00-7120-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] 

That’s all there was to it. Keyspace destroyed.

Previous Cassandra related articles


Cassandra – Running some simple tests, including a multi-get strategy.

PREV: Re-Configuring an Empty Cassandra Cluster

Time for the rubber to meet the road. Get some data loaded and validate the theoretical concepts garnered from the documentation consumed.

This is an record example (IP’s have been changed to protect the clueless):

      ip_key: 1598595809
          ip: 10.2.162.225
     prop_id: 1033
    property: Bad Stuff
      threat: 1
   attribute: suspicious
        meta: 10.25.112.7
    detected: 2012-01-05 15:17:14
detected_sec: 1325805434
    reported: 2012-01-06 01:44:02
reported_sec: 1325843042

Preliminary model concept centers around the IP, however with over 60,000,000 records there are overlaps, so a single IP is not going to survive as the primary key. Trying to get a distribution out of MySQL takes some time. Here are some distributions by key. Thousands of of events per IP, and this is just a short 1 month window:

+------------+--------+
| ip_dec     | events |
+------------+--------+
| 3158358206 |   2705 |
|  652542280 |   2506 |
| 3495573656 |   2089 |
| 3232235778 |   2015 |
| 1072721396 |   1528 |
|  652542281 |   1432 |
| 3232235876 |   1427 |
| 3448822506 |   1232 |
| 1280052209 |   1106 |
| 3232235779 |   1086 |
+------------+--------+

Now, Cassandra will support MILLIONS of column items on a single row, thus, this actually might work, and scale without using Super Column Families (SCFs). Using the detected time seconds as the column name with an attribute suffix, then enclosing the data in a JSON blob could provide the required results. Using the datekey as a secondary index across the columns, or using them as a time progression. Concepts that need to be tested, which precisely the task at hand.

Considering that a good detected time is not always available, and the data is processed in batches, there could be a heavy grouping of timestamps. If there are a variety of issues detected on a specific IP, at the same obfuscated time, loss of data will occur. This is certainly NOT the desired result. Given this, the datastamp is not unique enough for a hash structure datastore such as Cassandra, without using SCFs.

A structure such as this could deliver the required granularity:

ipstore[$ipkey][$timekey][$propkey] = JSON:{}, JSON:{}, JSON{}...  ;

To get started with loading data, wrote a quick test program in Java, compliled it and ran it:

test1.java – source code

public class test1 {
  public static void main (String [] args) {
    System.out.println("Cassandra Calling!");
  }
}

compiling….

java/src/loader1$ javac test1.java -d ../../class/.

executing…

/java/class$ java test1
Cassandra Calling!

Environment confirmed for compiling loader code. With a model in mind…

ipstore[$ipkey][$popkey][$timestamp] = JSON:{}

..and IP data to load,

ipp < get_a_million.sql > a_million_ips.dta
cass:~$ ls -l
126180075 2012-03-13 13:06 a_million_ips.dta

cass:~$ wc -l a_million_ips.dta
1000001 a_million_ips.dta

...next it's designing the schema builder and loader.

REFERENCE: Setting up a Java build env to prepare for Cassandra development

With the environment confirmed, and a test file (test1.java) written, execute and verify function:

cass:~$ ant -DclassToRun=test1 run
Buildfile: ./build.xml

[...]

run:
     [java] This is Java.... drink up!

VERIFIED.

To get moving forward, I created a Utilities class and a DB connector Class. You can look at the source code for those at these two links:

Util Source Code

Cassandra DB Connector Source Code

With the code done, need to perform a couple of house keeping tasks to get it prepared for loading.

Adding the ks33 keyspace

[default@unknown] create keyspace ks33:
c7944700-6e2e-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster

[default@unknown] use ks33;
Authenticated to keyspace: ks33

Adding the cf33 ColumnFamily to ks33 Keyspace:

[default@ks33] create column family cf33 with comparator = UTF8Type; 
2501f8b0-6e2f-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster

Next, to load 100 trial rows. Here is a link to the source code:

Source for useMultiGet (tba)

hpcass@feed0:~/cassIP/java/cBuild$ host=10.1.0.123 port=9160 inserts=100 ks=ks33 cf=cf33 ant -DclassToRun=c01.useMultiGet run
Buildfile: /home/hpcass/cassIP/java/cBuild/build.xml

init:

compile:
    [javac] Compiling 1 source file to /home/hpcass/cassIP/java/cBuild/build/classes

dist:
      [jar] Building jar: /home/hpcass/cassIP/java/cBuild/dist/lib/cassIP.jar

run:
     [java] get time   89062577
     [java] mget time 494039096

BUILD SUCCESSFUL

Here are some results from multi-get tests. It's actually the inverse of my hope, the multi-get seems to rapidly lose it's benefit.

5 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  339041199  436440551  358115310
     [java] mget time 172484370  174690508  182833140

10 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  346512511  332820479  314136351
     [java] mget time 394049160  251152592  234719383

25 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  335286775  293802010  295948562
     [java] mget time 464933443  324505741  312226035

What I didn't expect to see, based on the information in the 'High Performance
Cookbook, was rapid fall-off in performance, and in face in all cases in the
slices of size 25 inverted the performance, showing that it became worse.

2 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  285509637  331970814  317512021
     [java] mget time 104567639   96477512  124040195

One thing I didn't think of testing was doing a slice of size 1, and see if maybe part of the perceived performance in the lower slices is really cache hits. AH! Look at this, it looks like the *test* is highly suspect at best. I think this shows some evidence the performance 'benefit' of the multi-get is really a cache hit artifact from extracting the exact same data a second time:

host=10.1.0.123 port=9160 inserts=1000 ks=ks33 cf=cf33 slice=1 ant -DclassToRun=c01.useMultiGet run
Buildfile: /home/hpcass/cassIP/java/cBuild/build.xml

1 Item Slices  (1000 item dataset)
=========================================================
run:                    RUN 1      RUN 2      RUN 3   
     [java] get time  295158535  298466321  283438099
     [java] mget time 109982545  103658894   98260286

This demonstrator failure to perform, is not a failure in and of itself. It's provided useful information regarding some concepts recommended in some documentation, but may not really be a true best practice. I long ago developed a healthy skepticism of expert advice in lieu of verification.

Re-Configuring an Empty Cassandra Cluster

PREV: Setting up a Java build env to prepare for Cassandra development

After doing more research, I decided the Ordered Partitioning was not going to buy me anything but a lop-sided distribution. Looking at this (it’s a case of IP distributions, not hostnames as originally envisioned, that will be a later evaluation).

I’d have 3 very heavy nodes and 3 very light nodes. This is a distribution of real world data.

Node:  Range:                             Dist:    
====== ================================== ======  
node00         0.0.0.0 to 42.170.170.171     6 %  
node01  42.170.170.172 to 85.85.85.87       32 %  
node02     85.85.85.88 to 128.0.0.3         34 %  
node03       128.0.0.4 to 170.170.170.175    2 %  
node04 170.170.170.176 to 213.85.85.91      21 %  
node05    213.85.85.92 to 255.255.255.255    3 %  

Goofing around with pseudo random key naming to get a better balance only does one thing, make the keys I wanted to use (IPs) basically worthless, so the ordering is wrecked regardless. Random partitioning is the default configuration for Cassandra, so, that’s what I plan to use. Problem is, I’d built out this specific node set with this setting first:

ByteOrderedPartitioner orders rows lexically by key bytes. BOP allows scanning rows in key order, but the ordering can generate hot spots for sequential insertion workloads.

I re-set the configurations to use the default instead:

RandomPartitioner distributes rows across the cluster evenly by md5. When in doubt, this is the best option.

After changing the configuration from ByteOrderedPartitioner to RandomPartitioner and restarting the first node.. I am greeted with this happy message:

ERROR 13:03:36,113 Fatal exception in thread Thread[SSTableBatchOpen:3,5,main]
java.lang.RuntimeException: Cannot open /home/hpcass/data/node00/system/Versions-hc-3 because partitioner does not match org.apache.cassandra.dht.RandomPartitioner

In fact I’m greeted with a lot of them. This is then followed by what looks like possibly.. normal startup messaging?

 INFO 13:03:36,166 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331586216166.log
 INFO 13:03:36,175 Couldn't detect any schema definitions in local storage.
 INFO 13:03:36,175 Found table data in data directories. Consider using the CLI to define your schema.
 INFO 13:03:36,197 Replaying /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
 INFO 13:03:36,222 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
 INFO 13:03:36,227 Enqueuing flush of Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
 INFO 13:03:36,228 Writing Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
 INFO 13:03:36,228 Enqueuing flush of Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
 INFO 13:03:36,277 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-16-Data.db (377 bytes)
 INFO 13:03:36,285 Writing Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
 INFO 13:03:36,357 Completed flushing /home/hpcass/data/node00/system/Versions-hc-4-Data.db (247 bytes)
 INFO 13:03:36,358 Log replay complete, 9 replayed mutations
 INFO 13:03:36,366 Cassandra version: 1.0.8
 INFO 13:03:36,366 Thrift API version: 19.20.0
 INFO 13:03:36,367 Loading persisted ring state
 INFO 13:03:36,384 Starting up server gossip
 INFO 13:03:36,386 Enqueuing flush of Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
 INFO 13:03:36,386 Writing Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
 INFO 13:03:36,440 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-17-Data.db (196 bytes)
 INFO 13:03:36,446 Starting Messaging Service on port 7000
 INFO 13:03:36,452 Using saved token 0
 INFO 13:03:36,453 Enqueuing flush of Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
 INFO 13:03:36,454 Writing Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
 INFO 13:03:36,556 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-18-Data.db (148 bytes)
 INFO 13:03:36,558 Node /10.1.0.23 state jump to normal
 INFO 13:03:36,558 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 13:03:36,559 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 13:03:36,587 Binding thrift service to /10.1.0.23:9160
 INFO 13:03:36,590 Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO 13:03:36,593 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
 INFO 13:03:36,593 Listening for thrift clients...

Despite the fatal errors, it does seem to have restarted the cluster with the new Partition engine:

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               7169015515630842424558524306038950250903273734
10.1.0.27      datacenter1 rack1       Down   Normal  ?               93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23      datacenter1 rack1       Up     Normal  15.79 KB        86.37%  0                                           
10.1.0.26      datacenter1 rack1       Down   Normal  ?               77.79%  896682280808232140910919391534960240163386913
10.1.0.24      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  15.79 KB        35.85%  6138493926725652010223830601932265434881918085
10.1.0.28      datacenter1 rack1       Down   Normal  ?               53.08%  716901551563084242455852430603895025090327373

Starting up the other three nodes (example:)

 INFO 14:10:06,663 Node /10.1.0.25 has restarted, now UP
 INFO 14:10:06,663 InetAddress /10.1.0.25 is now UP
 INFO 14:10:06,664 Node /10.1.0.25 state jump to normal
 INFO 14:10:06,664 Node /10.1.0.24 has restarted, now UP
 INFO 14:10:06,665 InetAddress /10.1.0.24 is now UP
 INFO 14:10:06,665 Node /10.1.0.24 state jump to normal
 INFO 14:10:06,666 Node /10.1.0.23 has restarted, now UP
 INFO 14:10:06,667 InetAddress /10.1.0.23 is now UP
 INFO 14:10:06,668 Node /10.1.0.23 state jump to normal
 INFO 14:10:06,760 Completed flushing /home/hpcass/data/node01/system/LocationInfo-hc-18-Data.db (166 bytes)
 INFO 14:10:06,762 Node /10.1.0.26 state jump to normal
 INFO 14:10:06,763 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 14:10:06,764 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 14:10:06,862 Binding thrift service to /10.1.0.26:9160

Re-checking the ring displays:

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               7169015515630842424558524306038950250903273734
10.1.0.27      datacenter1 rack1       Up     Normal  11.37 KB        93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23      datacenter1 rack1       Up     Normal  15.79 KB        86.37%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  18.38 KB        77.79%  896682280808232140910919391534960240163386913
10.1.0.24      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  15.79 KB        35.85%  6138493926725652010223830601932265434881918085
10.1.0.28      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  7169015515630842424558524306038950250903273734

Switching partition engine appears to be easy enough. What I suspect however (and I’ve not confirmed this, is that the data would have been compromised or likely destroyed in this process. The documentation I’ve read so far indicated that you could not do this. Once setup with a specific partitioning engine that cluster was bound to it.

My conclusion is that if you have not yet started to saturate your cluster with data, and you wish to change the partitioning engine, it would appear that the right time to do it is now.. before you start to load data.

I plan to test this theory later after the first trial data load to see if in fact it mangles the information. More to follow!

UPDATE!

Despite the information that I thought nodetool was telling me, my cluster was unusable because of the partitioner change. What is the last step required to change partition? NUKE THE DATA. Unfun.. but that is what I need to do.

Having 6 nodes means 6 times the fun. Here is the kicker though, I’ll just move the data aside and re-construct, and that will let me swap it back in if I decided to go back and forth testing the impacts of Random vs. Ordered for my needs. Will I get away with this? I don’t know. That won’t stop me from trying!

The data was stored in ~/data/node00 (node## etc.). This is all I did:

mv data/node00 data/node00-bop       # bop = btye order partition.

Restarted node00:

hpcass:~/nodes$ node00/bin/cassandra -f
 INFO 16:38:46,525 Logging initialized
 INFO 16:38:46,529 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_0
 INFO 16:38:46,529 Heap size: 6291456000/6291456000
 INFO 16:38:46,529 Classpath: node00/bin/../conf:node00/bin/../build/classes/main:node00/bin/../build/classes/thrift:node00/bin/../lib/antlr-3.2.jar:node00/bin/../lib/apache-cassandra-1.0.8.jar:node00/bin/../lib/apache-cassandra-clientutil-1.0.8.jar:node00/bin/../lib/apache-cassandra-thrift-1.0.8.jar:node00/bin/../lib/avro-1.4.0-fixes.jar:node00/bin/../lib/avro-1.4.0-sources-fixes.jar:node00/bin/../lib/commons-cli-1.1.jar:node00/bin/../lib/commons-codec-1.2.jar:node00/bin/../lib/commons-lang-2.4.jar:node00/bin/../lib/compress-lzf-0.8.4.jar:node00/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:node00/bin/../lib/guava-r08.jar:node00/bin/../lib/high-scale-lib-1.1.2.jar:node00/bin/../lib/jackson-core-asl-1.4.0.jar:node00/bin/../lib/jackson-mapper-asl-1.4.0.jar:node00/bin/../lib/jamm-0.2.5.jar:node00/bin/../lib/jline-0.9.94.jar:node00/bin/../lib/json-simple-1.1.jar:node00/bin/../lib/libthrift-0.6.jar:node00/bin/../lib/log4j-1.2.16.jar:node00/bin/../lib/servlet-api-2.5-20081211.jar:node00/bin/../lib/slf4j-api-1.6.1.jar:node00/bin/../lib/slf4j-log4j12-1.6.1.jar:node00/bin/../lib/snakeyaml-1.6.jar:node00/bin/../lib/snappy-java-1.0.4.1.jar
 INFO 16:38:46,531 JNA not found. Native methods will be disabled.
 INFO 16:38:46,538 Loading settings from file:/home/hpcass/nodes/node00/conf/cassandra.yaml
 INFO 16:38:46,635 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 16:38:46,645 Global memtable threshold is enabled at 2000MB
 INFO 16:38:46,839 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331599126839.log
 INFO 16:38:46,848 Couldn't detect any schema definitions in local storage.
 INFO 16:38:46,849 Found table data in data directories. Consider using the CLI to define your schema.
 INFO 16:38:46,863 Replaying /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
 INFO 16:38:46,887 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
 INFO 16:38:46,892 Enqueuing flush of Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
 INFO 16:38:46,893 Enqueuing flush of Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
 INFO 16:38:46,894 Writing Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
 INFO 16:38:47,001 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-1-Data.db (208 bytes)
 INFO 16:38:47,009 Writing Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
 INFO 16:38:47,057 Completed flushing /home/hpcass/data/node00/system/Versions-hc-1-Data.db (247 bytes)
 INFO 16:38:47,057 Log replay complete, 6 replayed mutations
 INFO 16:38:47,066 Cassandra version: 1.0.8
 INFO 16:38:47,066 Thrift API version: 19.20.0
 INFO 16:38:47,067 Loading persisted ring state
 INFO 16:38:47,070 Starting up server gossip
 INFO 16:38:47,091 Enqueuing flush of Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
 INFO 16:38:47,092 Writing Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
 INFO 16:38:47,141 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-2-Data.db (196 bytes)
 INFO 16:38:47,149 Starting Messaging Service on port 7000
 INFO 16:38:47,155 Using saved token 0
 INFO 16:38:47,157 Enqueuing flush of Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
 INFO 16:38:47,157 Writing Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
 INFO 16:38:47,237 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-3-Data.db (148 bytes)
 INFO 16:38:47,239 Node /10.1.0.23 state jump to normal
 INFO 16:38:47,240 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 16:38:47,241 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 16:38:47,269 Binding thrift service to /10.1.0.23:9160
 INFO 16:38:47,272 Using TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO 16:38:47,274 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
 INFO 16:38:47,275 Listening for thrift clients...

^Z
[1]+  Stopped                 node00/bin/cassandra -f
hpcass:~/nodes$ bg
[1]+ node00/bin/cassandra -f &

With the process backgrounded, checked the files in the new data directory for my node:

hpcass:~/data/node00$ ls -1 system
LocationInfo-hc-1-Data.db
LocationInfo-hc-1-Digest.sha1
LocationInfo-hc-1-Filter.db
LocationInfo-hc-1-Index.db
LocationInfo-hc-1-Statistics.db
LocationInfo-hc-2-Data.db
LocationInfo-hc-2-Digest.sha1
LocationInfo-hc-2-Filter.db
LocationInfo-hc-2-Index.db
LocationInfo-hc-2-Statistics.db
LocationInfo-hc-3-Data.db
LocationInfo-hc-3-Digest.sha1
LocationInfo-hc-3-Filter.db
LocationInfo-hc-3-Index.db
LocationInfo-hc-3-Statistics.db
Versions-hc-1-Data.db
Versions-hc-1-Digest.sha1
Versions-hc-1-Filter.db
Versions-hc-1-Index.db
Versions-hc-1-Statistics.db

Following that clearing and rebuild, I see the node tool results look a lot better:

hpcass@feed0:~/nodes$ cass00/bin/nodetool -h localhost ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               6138493926725652010223830601932265434881918085
10.1.0.23      datacenter1 rack1       Up     Normal  15.68 KB        33.29%  0                                           
10.1.0.24      datacenter1 rack1       Up     Normal  18.34 KB        30.87%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  18.34 KB        35.85%  6138493926725652010223830601932265434881918085

After resetting the old numerated nodes, I had a complete disaster! Negative node tokens? How did that happen? Restarts did nothing to fix this either.

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               7169015515630842424558524306038950250903273734
10.1.0.27      datacenter1 rack1       Up     Normal  15.79 KB        93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23      datacenter1 rack1       Up     Normal  15.79 KB        86.37%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  15.79 KB        77.79%  896682280808232140910919391534960240163386913
10.1.0.24      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  1927726543429020693034590137790785169819652674
10.1.0.25      datacenter1 rack1       Up     Normal  15.79 KB        35.85%  6138493926725652010223830601932265434881918085
10.1.0.28      datacenter1 rack1       Up     Normal  15.79 KB        53.08%  7169015515630842424558524306038950250903273734

To resolve this, I simply re-ran my token generator to get a new set of tokens:

node00	10.1.0.23  token: 0
node01	10.1.0.26  token: 28356863910078205288614550619314017621
node02	10.1.0.24  token: 56713727820156410577229101238628035242
node03	10.1.0.27  token: 85070591730234615865843651857942052863
node04	10.1.0.25  token: 113427455640312821154458202477256070485
node05	10.1.0.28  token: 141784319550391026443072753096570088106

Followed by manually setting the tokens in the ring:

bin/nodetool -h 10.1.0.24 move 56713727820156410577229101238628035242
bin/nodetool -h 10.1.0.25 move 113427455640312821154458202477256070485

bin/nodetool -h 10.1.0.26 move 28356863910078205288614550619314017621
bin/nodetool -h 10.1.0.27 move 85070591730234615865843651857942052863
bin/nodetool -h 10.1.0.28 move 141784319550391026443072753096570088106

This.. gave me the results I was expecting!

Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               141784319550391026443072753096570088106     
10.1.0.23      datacenter1 rack1       Up     Normal  24.95 KB        16.67%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  20.72 KB        16.67%  28356863910078205288614550619314017621      
10.1.0.24      datacenter1 rack1       Up     Normal  25.1 KB         16.67%  56713727820156410577229101238628035242      
10.1.0.27      datacenter1 rack1       Up     Normal  13.38 KB        16.67%  85070591730234615865843651857942052863      
10.1.0.25      datacenter1 rack1       Up     Normal  25.1 KB         16.67%  113427455640312821154458202477256070485     
10.1.0.28      datacenter1 rack1       Up     Normal  25.14 KB        16.67%  141784319550391026443072753096570088106   

Now, the question of actually connecting to the cluster can be answered. Pick one of the nodes and ports to connect too. I picked node00 on .23 (cli defaulted to port 9160 so I didn’t have to specify that):

node00/bin/cassandra-cli -h 10.1.0.23 
Connected to: "test-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

The big problem I had, was that the cli never did seem to respond. The trick is to end your command with a semi-colon. That might seem obvious to you, and generally obvious to me.. but I’d not seen the docs actually call out that little FACT.

[default@unknown] show cluster name;
test-ip

Created a test column family from the helpful Cassandra Wiki.

create keyspace Twissandra;
Keyspace names must be case-insensitively unique ("Twissandra" conflicts with "Twissandra")
[default@unknown] 
[default@unknown] 
[default@unknown] create column family User with comparator = UTF8Type;
Not authenticated to a working keyspace.
[default@unknown] use Twissandra;
Authenticated to keyspace: Twissandra
[default@Twissandra] create column family User with comparator = UTF8Type;
adf453a0-6cb0-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@Twissandra] 

AND WE’RE OFF!! Next article will cover actually finishing up this last test and then adding real data. MORE TO COME!!

NEXT: Cassandra – A Use case examined (IP data)

Cassandra and Big Data – building a single-node “cluster”

Cassandra – Getting off the ground.
Continuation of post: Apache Cassandra Project – processing “Big Data”

While researching a project on Big Data services, I knew that I’d need a multi-node cluster to experiment with, but a pile of hardware was not immediately available.

Using the VERY helpful book Cassandra High Performance Cookbook I was able to build a 3 node cluster on a single machine. This is how I did it:


For this cluster test example, I am using Ubunto 10, with following JVM

      JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_22

Downloaded Cassandra 1.0.8 package from here:
http://apache.mirrors.tds.net//cassandra/1.0.8/apache-cassandra-1.0.8-bin.tar.gz

Created new user on system: bigdata

Create the required base data directories

  $ mkdir commitlog,log,data,saved_caches

Moved that package there and started the build

$ cp /tmp/apache-cassandra-1.0.8-bin.tar.gz .

Unzipped and extracted the contents

$ gunzip apache-cassandra-1.0.8-bin.tar.gz
$ tar xvf apache-cassandra-1.0.8-bin.tar

Moved the long directory name to first instance cassA-1.0.8

$ mv apache-cassandra-1.0.8 cassA-1.0.8

Extracted again and renamed this to the other two planned instances:

$ tar xfv apache-cassandra-1.0.8-bin.tar
$ mv apache-cassandra-1.0.8 cassB-1.0.8  

$ tar xfv apache-cassandra-1.0.8-bin.tar
$ mv apache-cassandra-1.0.8 cassC-1.0.8  

This gave me three packages to build, and each with a unique IP

  cassA-1.0.8   10.1.1.101
  cassB-1.0.8   10.1.1.102
  cassC-1.0.8   10.1.1.103

Edit configuration files in each instance (casaA-1.0.8 used as example:)

$ vi cassA-1.0.8/conf/cassandra.yaml 

[...]

# directories where Cassandra should store data on disk.
data_file_directories: 
    - /home/bigdata/data/cassA

# commit log
commitlog_directory: /home/bigdata/commitlog/cassA

# saved caches
saved_caches_directory: /home/bigdata/saved_caches/cassA

[...]

# If blank, Cassandra will request a token bisecting the range of
# the heaviest-loaded existing node.  If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
initial_token: 0

[...]

# Setting this to 0.0.0.0 is always wrong.
listen_address: 10.1.1.101

[...]

rpc_address: 10.1.1.101

[...]

          # seeds is actually a comma-delimited list of addresses.
          # Ex: ",,"
          - seeds: "10.1.100.101,10.1.100.102,10.1.100.103"
[...]

Setting a separate logfile is recommended. Edit config to set separate log

vi cassA-1.0.8/conf/log4j-server.properties

[...]
log4j.appender.R.File=/home/bigdata/log/cassA.log
[...]

Repeat for instances cassB and cassC, setting the token value for B and C to appropriate values (see Extra Credit below if you need to know how to do *that* part):

#cassB
initial_token: 56713727820156410577229101238628035242

#cassC
initial_token: 113427455640312821154458202477256070485

To enable the JMX management console, each instance will require it’s own port. Edit the env file to set that up.

vi cassA-1.0.8/conf/cassandra-env.sh

[...]
# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="8001"
[...]

Repeated for the other two instances, defining 8002 and 8003 respectively.

Now, for the final trick, start up the instances:

  cassA-1.0.8/bin/cassandra
  cassB-1.0.8/bin/cassandra
  cassC-1.0.8/bin/cassandra

Cluster elements started up, and they can be seen active in the process table here:

$ ps -lf
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S bigdata   4554     1  2  80   0 - 226846 futex_ 12:13 pts/0   00:00:05 java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=
0 S bigdata   4593     1  2  80   0 - 210824 futex_ 12:13 pts/0   00:00:05 java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=
0 S bigdata   4632     1  2  80   0 - 226830 futex_ 12:13 pts/0   00:00:05 java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=
0 R bigdata   5047  3054  0  80   0 -  5483 -      12:16 pts/0    00:00:00 ps -lf

Finally, to check the status, connect to of the JMX node ports and check the ring. You only need to connect to one of the cluster’s nodes to check the complete cluster’s status:

$ bin/nodetool -h 10.1.100.101 -port 8001 ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               113427455640312821154458202477256070485     
10.1.100.101    datacenter1 rack1       Up     Normal  21.86 KB        33.33%  0                                           
10.1.100.102    datacenter1 rack1       Up     Normal  20.28 KB        33.33%  56713727820156410577229101238628035242      
10.1.100.103    datacenter1 rack1       Up     Normal  29.1 KB         33.33%  113427455640312821154458202477256070485      

Now, that’s a functional 3 instance cluster running on a single node. These are not in separate VMs, and if you wanted to experiment with this on a larger cluster, running multiple instances on multiple VM’s on a single hypervisor.. I don’t really see why you cannot!

In the next article, I’m going to start feeding data into the cluster. Stay tuned for that!


Extra Credit:

To create the token value I needed for this three ring cluster, I used the following PERL script. BTW, bignum is required unless you want PERL printing these big numbers in scientific notation:

#!/usr/bin/perl
use bignum;
my $nodes = shift;
print "Calculate tokens for $nodes nodes\n";
print "node 0\ttoken: 0\n" unless $nodes;
exit unless $nodes;
my $factor = 2**127;
print "factor = $factor\n";
for (my $i=0;$i<$nodes;$i++) {
	my $token = $i * ( $factor / $nodes);
	print "node $i\ttoken: $token\n";
}

Running the script for three nodes gave me the following results:

$ ./maketokens.pl  3

Calculate tokens for 3 nodes
factor = 170141183460469231731687303715884105728
node 0	token: 0
node 1	token: 56713727820156410577229101238628035242.67
node 2	token: 113427455640312821154458202477256070485.34

Additional Comments:

If you are setting up a standard mutli-box cluster, make sure you have the following ports opened up on any firewalls. If not, the cluster members wont' find each other:

# TCP port, for commands and data
storage_port: 7000

# SSL port, for encrypted communication.  Unused unless enabled in
# encryption_options
ssl_storage_port: 7001

NEXT: Setting up a Java build env to prepare for Cassandra development