PREV: Setting up a Java build env to prepare for Cassandra development
After doing more research, I decided that ordered partitioning was not going to buy me anything but a lop-sided distribution. Looking at the ranges below (this is a case of IP keys, not the hostnames originally envisioned; those will be a later evaluation), I'd have three very heavy nodes and three very light nodes. This is the distribution of real-world data:
Node:   Range:                                  Dist:
======  ======================================  =====
node00  0.0.0.0          to  42.170.170.171       6 %
node01  42.170.170.172   to  85.85.85.87         32 %
node02  85.85.85.88      to  128.0.0.3           34 %
node03  128.0.0.4        to  170.170.170.175      2 %
node04  170.170.170.176  to  213.85.85.91        21 %
node05  213.85.85.92     to  255.255.255.255      3 %
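To make the skew concrete, here is a rough sketch (mine, not part of the cluster setup) of how a dotted-quad key lands in one of six equal byte-ordered ranges. The ranges are even; real traffic is not:

# hypothetical helper: bucket an IPv4 key into one of six equal byte-ordered ranges
ip_to_node() {
  IFS=. read -r a b c d <<< "$1"
  n=$(( (a << 24) + (b << 16) + (c << 8) + d ))   # the key as a 32-bit integer
  echo "node0$(( n * 6 / 4294967296 ))"           # 4294967296 = 2^32
}

ip_to_node 10.20.30.40    # -> node00, per the table above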
Goofing around with pseudo-random key naming to get a better balance only does one thing: it makes the keys I wanted to use (IPs) basically worthless, so the ordering is wrecked regardless. Random partitioning is Cassandra's default configuration, so that's what I plan to use. The problem is, I'd built out this node set with this setting first:
– ByteOrderedPartitioner orders rows lexically by key bytes. BOP allows scanning rows in key order, but the ordering can generate hot spots for sequential insertion workloads.
I reset the configuration on each node to use the default instead:
– RandomPartitioner distributes rows across the cluster evenly by md5. When in doubt, this is the best option.
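For reference, the switch itself is a one-line change in each node's conf/cassandra.yaml (shown here as a sketch; the paths follow the per-node layout from the previous article):

# conf/cassandra.yaml, one copy per node in this setup
# partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner   <-- what I had
partitioner: org.apache.cassandra.dht.RandomPartitioner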
After changing the configuration from ByteOrderedPartitioner to RandomPartitioner and restarting the first node, I was greeted with this happy message:
ERROR 13:03:36,113 Fatal exception in thread Thread[SSTableBatchOpen:3,5,main]
java.lang.RuntimeException: Cannot open /home/hpcass/data/node00/system/Versions-hc-3 because partitioner does not match org.apache.cassandra.dht.RandomPartitioner
In fact, I was greeted with a lot of them. They were then followed by what looks like, possibly, normal startup messaging:
INFO 13:03:36,166 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331586216166.log
INFO 13:03:36,175 Couldn't detect any schema definitions in local storage.
INFO 13:03:36,175 Found table data in data directories. Consider using the CLI to define your schema.
INFO 13:03:36,197 Replaying /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
INFO 13:03:36,222 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331328557751.log
INFO 13:03:36,227 Enqueuing flush of Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
INFO 13:03:36,228 Writing Memtable-LocationInfo@1762056890(213/266 serialized/live bytes, 7 ops)
INFO 13:03:36,228 Enqueuing flush of Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
INFO 13:03:36,277 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-16-Data.db (377 bytes)
INFO 13:03:36,285 Writing Memtable-Versions@202783062(83/103 serialized/live bytes, 3 ops)
INFO 13:03:36,357 Completed flushing /home/hpcass/data/node00/system/Versions-hc-4-Data.db (247 bytes)
INFO 13:03:36,358 Log replay complete, 9 replayed mutations
INFO 13:03:36,366 Cassandra version: 1.0.8
INFO 13:03:36,366 Thrift API version: 19.20.0
INFO 13:03:36,367 Loading persisted ring state
INFO 13:03:36,384 Starting up server gossip
INFO 13:03:36,386 Enqueuing flush of Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
INFO 13:03:36,386 Writing Memtable-LocationInfo@846275759(88/110 serialized/live bytes, 2 ops)
INFO 13:03:36,440 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-17-Data.db (196 bytes)
INFO 13:03:36,446 Starting Messaging Service on port 7000
INFO 13:03:36,452 Using saved token 0
INFO 13:03:36,453 Enqueuing flush of Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
INFO 13:03:36,454 Writing Memtable-LocationInfo@59584763(38/47 serialized/live bytes, 2 ops)
INFO 13:03:36,556 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-18-Data.db (148 bytes)
INFO 13:03:36,558 Node /10.1.0.23 state jump to normal
INFO 13:03:36,558 Bootstrap/Replace/Move completed! Now serving reads.
INFO 13:03:36,559 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 13:03:36,587 Binding thrift service to /10.1.0.23:9160
INFO 13:03:36,590 Using TFastFramedTransport with a max frame size of 15728640 bytes.
INFO 13:03:36,593 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
INFO 13:03:36,593 Listening for thrift clients...
Despite the fatal errors, the node does seem to have restarted into the cluster with the new partitioner:
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 7169015515630842424558524306038950250903273734
10.1.0.27  datacenter1  rack1  Down    Normal  ?         93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23  datacenter1  rack1  Up      Normal  15.79 KB  86.37%  0
10.1.0.26  datacenter1  rack1  Down    Normal  ?         77.79%  896682280808232140910919391534960240163386913
10.1.0.24  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  15.79 KB  35.85%  6138493926725652010223830601932265434881918085
10.1.0.28  datacenter1  rack1  Down    Normal  ?         53.08%  7169015515630842424558524306038950250903273734
Starting up the other three nodes produced output like this (an example from one of them):
INFO 14:10:06,663 Node /10.1.0.25 has restarted, now UP
INFO 14:10:06,663 InetAddress /10.1.0.25 is now UP
INFO 14:10:06,664 Node /10.1.0.25 state jump to normal
INFO 14:10:06,664 Node /10.1.0.24 has restarted, now UP
INFO 14:10:06,665 InetAddress /10.1.0.24 is now UP
INFO 14:10:06,665 Node /10.1.0.24 state jump to normal
INFO 14:10:06,666 Node /10.1.0.23 has restarted, now UP
INFO 14:10:06,667 InetAddress /10.1.0.23 is now UP
INFO 14:10:06,668 Node /10.1.0.23 state jump to normal
INFO 14:10:06,760 Completed flushing /home/hpcass/data/node01/system/LocationInfo-hc-18-Data.db (166 bytes)
INFO 14:10:06,762 Node /10.1.0.26 state jump to normal
INFO 14:10:06,763 Bootstrap/Replace/Move completed! Now serving reads.
INFO 14:10:06,764 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 14:10:06,862 Binding thrift service to /10.1.0.26:9160
Re-checking the ring shows all six nodes up:
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 7169015515630842424558524306038950250903273734
10.1.0.27  datacenter1  rack1  Up      Normal  11.37 KB  93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23  datacenter1  rack1  Up      Normal  15.79 KB  86.37%  0
10.1.0.26  datacenter1  rack1  Up      Normal  18.38 KB  77.79%  896682280808232140910919391534960240163386913
10.1.0.24  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  15.79 KB  35.85%  6138493926725652010223830601932265434881918085
10.1.0.28  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  7169015515630842424558524306038950250903273734
Switching partitioners appears to be easy enough. What I suspect, however (and I've not confirmed this), is that the data would have been compromised, or more likely destroyed, in the process. The documentation I've read so far indicated that you could not do this at all: once set up with a specific partitioner, a cluster was bound to it.
My conclusion: if you wish to change partitioners and have not yet started to saturate your cluster with data, the right time to do it is now, before you load anything.
I plan to test this theory later, after the first trial data load, to see whether the switch does in fact mangle the information. More to follow!
UPDATE!
Despite what I thought nodetool was telling me, my cluster was unusable after the partitioner change. What is the last step required to change partitioners? NUKE THE DATA. Un-fun, but that is what I needed to do.
Having six nodes means six times the fun. Here is the kicker, though: instead of deleting, I'll just move the data aside and reconstruct, which should let me swap it back in if I decide to go back and forth testing the impact of random vs. ordered partitioning for my needs. Will I get away with this? I don't know, but that won't stop me from trying!
The data was stored in ~/data/node00 (node## etc.). This is all I did:
mv data/node00 data/node00-bop   # bop = byte-ordered partitioner
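With six node directories to stash, a loop along these lines covers the whole pass (a sketch; I actually did them one at a time, and the paths follow my layout):

for i in 0 1 2 3 4 5; do
    mv ~/data/node0$i ~/data/node0$i-bop   # stash the byte-ordered data for possible re-use
done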
Restarted node00:
hpcass:~/nodes$ node00/bin/cassandra -f
INFO 16:38:46,525 Logging initialized
INFO 16:38:46,529 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_0
INFO 16:38:46,529 Heap size: 6291456000/6291456000
INFO 16:38:46,529 Classpath: node00/bin/../conf:node00/bin/../build/classes/main:node00/bin/../build/classes/thrift:node00/bin/../lib/antlr-3.2.jar:node00/bin/../lib/apache-cassandra-1.0.8.jar:node00/bin/../lib/apache-cassandra-clientutil-1.0.8.jar:node00/bin/../lib/apache-cassandra-thrift-1.0.8.jar:node00/bin/../lib/avro-1.4.0-fixes.jar:node00/bin/../lib/avro-1.4.0-sources-fixes.jar:node00/bin/../lib/commons-cli-1.1.jar:node00/bin/../lib/commons-codec-1.2.jar:node00/bin/../lib/commons-lang-2.4.jar:node00/bin/../lib/compress-lzf-0.8.4.jar:node00/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:node00/bin/../lib/guava-r08.jar:node00/bin/../lib/high-scale-lib-1.1.2.jar:node00/bin/../lib/jackson-core-asl-1.4.0.jar:node00/bin/../lib/jackson-mapper-asl-1.4.0.jar:node00/bin/../lib/jamm-0.2.5.jar:node00/bin/../lib/jline-0.9.94.jar:node00/bin/../lib/json-simple-1.1.jar:node00/bin/../lib/libthrift-0.6.jar:node00/bin/../lib/log4j-1.2.16.jar:node00/bin/../lib/servlet-api-2.5-20081211.jar:node00/bin/../lib/slf4j-api-1.6.1.jar:node00/bin/../lib/slf4j-log4j12-1.6.1.jar:node00/bin/../lib/snakeyaml-1.6.jar:node00/bin/../lib/snappy-java-1.0.4.1.jar
INFO 16:38:46,531 JNA not found. Native methods will be disabled.
INFO 16:38:46,538 Loading settings from file:/home/hpcass/nodes/node00/conf/cassandra.yaml
INFO 16:38:46,635 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 16:38:46,645 Global memtable threshold is enabled at 2000MB
INFO 16:38:46,839 Creating new commitlog segment /home/hpcass/commitlog/node00/CommitLog-1331599126839.log
INFO 16:38:46,848 Couldn't detect any schema definitions in local storage.
INFO 16:38:46,849 Found table data in data directories. Consider using the CLI to define your schema.
INFO 16:38:46,863 Replaying /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
INFO 16:38:46,887 Finished reading /home/hpcass/commitlog/node00/CommitLog-1331597615041.log
INFO 16:38:46,892 Enqueuing flush of Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
INFO 16:38:46,893 Enqueuing flush of Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
INFO 16:38:46,894 Writing Memtable-LocationInfo@1834491520(98/122 serialized/live bytes, 4 ops)
INFO 16:38:47,001 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-1-Data.db (208 bytes)
INFO 16:38:47,009 Writing Memtable-Versions@875509103(83/103 serialized/live bytes, 3 ops)
INFO 16:38:47,057 Completed flushing /home/hpcass/data/node00/system/Versions-hc-1-Data.db (247 bytes)
INFO 16:38:47,057 Log replay complete, 6 replayed mutations
INFO 16:38:47,066 Cassandra version: 1.0.8
INFO 16:38:47,066 Thrift API version: 19.20.0
INFO 16:38:47,067 Loading persisted ring state
INFO 16:38:47,070 Starting up server gossip
INFO 16:38:47,091 Enqueuing flush of Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
INFO 16:38:47,092 Writing Memtable-LocationInfo@952443392(88/110 serialized/live bytes, 2 ops)
INFO 16:38:47,141 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-2-Data.db (196 bytes)
INFO 16:38:47,149 Starting Messaging Service on port 7000
INFO 16:38:47,155 Using saved token 0
INFO 16:38:47,157 Enqueuing flush of Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
INFO 16:38:47,157 Writing Memtable-LocationInfo@1623810826(38/47 serialized/live bytes, 2 ops)
INFO 16:38:47,237 Completed flushing /home/hpcass/data/node00/system/LocationInfo-hc-3-Data.db (148 bytes)
INFO 16:38:47,239 Node /10.1.0.23 state jump to normal
INFO 16:38:47,240 Bootstrap/Replace/Move completed! Now serving reads.
INFO 16:38:47,241 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 16:38:47,269 Binding thrift service to /10.1.0.23:9160
INFO 16:38:47,272 Using TFastFramedTransport with a max frame size of 15728640 bytes.
INFO 16:38:47,274 Using synchronous/threadpool thrift server on /10.1.0.23 : 9160
INFO 16:38:47,275 Listening for thrift clients...
^Z
[1]+  Stopped                 node00/bin/cassandra -f
hpcass:~/nodes$ bg
[1]+ node00/bin/cassandra -f &
With the process backgrounded, I checked the files in the node's fresh data directory:
hpcass:~/data/node00$ ls -1 system
LocationInfo-hc-1-Data.db
LocationInfo-hc-1-Digest.sha1
LocationInfo-hc-1-Filter.db
LocationInfo-hc-1-Index.db
LocationInfo-hc-1-Statistics.db
LocationInfo-hc-2-Data.db
LocationInfo-hc-2-Digest.sha1
LocationInfo-hc-2-Filter.db
LocationInfo-hc-2-Index.db
LocationInfo-hc-2-Statistics.db
LocationInfo-hc-3-Data.db
LocationInfo-hc-3-Digest.sha1
LocationInfo-hc-3-Filter.db
LocationInfo-hc-3-Index.db
LocationInfo-hc-3-Statistics.db
Versions-hc-1-Data.db
Versions-hc-1-Digest.sha1
Versions-hc-1-Filter.db
Versions-hc-1-Index.db
Versions-hc-1-Statistics.db
Following that clearing and rebuild, the nodetool results looked a lot better:
hpcass@feed0:~/nodes$ cass00/bin/nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 6138493926725652010223830601932265434881918085
10.1.0.23  datacenter1  rack1  Up      Normal  15.68 KB  33.29%  0
10.1.0.24  datacenter1  rack1  Up      Normal  18.34 KB  30.87%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  18.34 KB  35.85%  6138493926725652010223830601932265434881918085
After resetting the rest of the numbered nodes the same way, I had a complete disaster: negative node tokens? How did that happen? Restarts did nothing to fix it, either.
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 7169015515630842424558524306038950250903273734
10.1.0.27  datacenter1  rack1  Up      Normal  15.79 KB  93.84%  -2742379978670691477635174047251157095949195165
10.1.0.23  datacenter1  rack1  Up      Normal  15.79 KB  86.37%  0
10.1.0.26  datacenter1  rack1  Up      Normal  15.79 KB  77.79%  896682280808232140910919391534960240163386913
10.1.0.24  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  1927726543429020693034590137790785169819652674
10.1.0.25  datacenter1  rack1  Up      Normal  15.79 KB  35.85%  6138493926725652010223830601932265434881918085
10.1.0.28  datacenter1  rack1  Up      Normal  15.79 KB  53.08%  7169015515630842424558524306038950250903273734
To resolve this, I simply re-ran my token generator to get a new set of tokens:
node00 10.1.0.23 token: 0
node01 10.1.0.26 token: 28356863910078205288614550619314017621
node02 10.1.0.24 token: 56713727820156410577229101238628035242
node03 10.1.0.27 token: 85070591730234615865843651857942052863
node04 10.1.0.25 token: 113427455640312821154458202477256070485
node05 10.1.0.28 token: 141784319550391026443072753096570088106
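For the record, the math behind the generator is just the i-th sixth of the RandomPartitioner token space (0 to 2^127). A sketch along these lines reproduces the spacing above (my actual script may round the boundaries slightly differently):

for i in 0 1 2 3 4 5; do
    # evenly spaced initial tokens: i * 2^127 / 6
    echo "node0$i token: $(echo "$i * 2^127 / 6" | bc)"
done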
That was followed by manually setting the tokens in the ring:
bin/nodetool -h 10.1.0.24 move 56713727820156410577229101238628035242
bin/nodetool -h 10.1.0.25 move 113427455640312821154458202477256070485
bin/nodetool -h 10.1.0.26 move 28356863910078205288614550619314017621
bin/nodetool -h 10.1.0.27 move 85070591730234615865843651857942052863
bin/nodetool -h 10.1.0.28 move 141784319550391026443072753096570088106
This… gave me the results I was expecting!
Address    DC           Rack   Status  State   Load      Owns    Token
                                                                 141784319550391026443072753096570088106
10.1.0.23  datacenter1  rack1  Up      Normal  24.95 KB  16.67%  0
10.1.0.26  datacenter1  rack1  Up      Normal  20.72 KB  16.67%  28356863910078205288614550619314017621
10.1.0.24  datacenter1  rack1  Up      Normal  25.1 KB   16.67%  56713727820156410577229101238628035242
10.1.0.27  datacenter1  rack1  Up      Normal  13.38 KB  16.67%  85070591730234615865843651857942052863
10.1.0.25  datacenter1  rack1  Up      Normal  25.1 KB   16.67%  113427455640312821154458202477256070485
10.1.0.28  datacenter1  rack1  Up      Normal  25.14 KB  16.67%  141784319550391026443072753096570088106
Now the question of actually connecting to the cluster can be answered. Pick one of the nodes and a port to connect to. I picked node00 on .23 (the CLI defaults to port 9160, so I didn't have to specify it):
node00/bin/cassandra-cli -h 10.1.0.23
Connected to: "test-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
The big problem I had was that the CLI never seemed to respond. The trick is to end every command with a semicolon. That might seem obvious to you, and generally obvious to me, but I'd not seen the docs actually call out that little FACT.
[default@unknown] show cluster name;
test-ip
I then created a test keyspace and column family from the helpful Cassandra Wiki (the keyspace itself was already in place from an earlier attempt, hence the first error):
[default@unknown] create keyspace Twissandra;
Keyspace names must be case-insensitively unique ("Twissandra" conflicts with "Twissandra")
[default@unknown] create column family User with comparator = UTF8Type;
Not authenticated to a working keyspace.
[default@unknown] use Twissandra;
Authenticated to keyspace: Twissandra
[default@Twissandra] create column family User with comparator = UTF8Type;
adf453a0-6cb0-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@Twissandra]
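As a quick smoke test (not part of the session above; the utf8() wrappers are my addition, needed because the key and value validators default to BytesType), a set and a get against the new column family look like this:

[default@Twissandra] set User[utf8('jsmith')]['first'] = utf8('John');
[default@Twissandra] get User[utf8('jsmith')];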
AND WE’RE OFF!! Next article will cover actually finishing up this last test and then adding real data. MORE TO COME!!
NEXT: Cassandra – A Use case examined (IP data)