Tag Archives: cli

Inserting and Reading data from a Cassandra Cluster

Rubber meeting the road. Time to insert some column families, then some data and finally pull it back off the stack.

First off, the keyspace was already defined, so I’m going to simply list it’s structure:

[default@unknown] describe ip_store;

Keyspace: ip_store:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]

With a keyspace ready for some column families, those are created next. Here I’m establishing that there will be 4 families in this single keyspace. This is contrary to suggestions in the High Performance Cassandra Handbook, but follows all other documentation I’ve seen. Considering that this is NOT a production implementation, I’m going to go with a more conventional strategy of organizing related data in the same keyspace.

The first action is to assume the desired keyspace, then add the desired column families:

[default@unknown] use ip_store;
Authenticated to keyspace: ip_store

[default@ip_store] create column family warehouse with comparator = UTF8Type;
595945d0-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

[default@ip_store] create column family hourly with comparator = UTF8Type;
65ea2170-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

[default@ip_store] create column family daily with comparator = UTF8Type; 
6aaeae60-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

[default@ip_store] create column family 30day with comparator = UTF8Type;
7b85bf30-71ce-11e1-0000-13393ec611bf
Waiting for schema agreement...
... schemas agree across the cluster

OK, a basic schema has been established. Now.. to load the data. I’ll post the relevant sections of the loader code at a later date. At this point you only need to consider that the loader DOES work and it’s loading data. We’ll look at the extraction of the data following loading a very small set.

time host=10.1.0.23 port=9160 ks=ip_store cf=warehouse ttl=0 datafile=5.ips ant -DclassToRun=loader.bulkIpLoader run
Buildfile: cBuild/build.xml

init:

compile:
    [javac] Compiling 1 source file to cBuild/build/classes

dist:
      [jar] Building jar: cBuild/dist/lib/cass.jar

run:
     [java] ks       ip_store
     [java] cf       warehouse
     [java] ttl      0
     [java] datafile 5.ips

BUILD SUCCESSFUL
Total time: 1 second

Of the set, there are three unique IPs and 2 are duplicates of other data (IMPORTANT NOTE: The IP’s have been changed to protect the innocent and clueless):

2016468288	1011	suspicious	2012-03-13 18:40:01
2016468288	1011	suspicious	2012-03-13 18:40:02
3149138705	1011	suspicious	2012-03-13 18:40:00
3149138705	1011	suspicious	2012-03-13 18:40:01
2179293112	1011	suspicious	2012-03-13 18:39:59

Having loaded these, I re-launch the command line interface, authenticate to the desired keyspace, and then a VERY important command to set an assumption about how we’re going to reference the keys. If you get a strange error like this “cannot parse ‘187.180.11.17’ as hex bytes“, that means you likely forgot to issue the assumes command. Commands I issued are in bold.

cass
Connected to: "ak-ip" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

[default@unknown] use ip_store

[default@ip_store] assume warehouse keys as utf8;   
Assumption for column family 'warehouse' added successfully.

[default@ip_store]  get warehouse['3149138705'];

=> (column=2012-03-13 18:40:00, value=7b227265706f72746564223a22323031322d30332d31332031383a34383a3031222c22617474726962757465223a22737573706963696f7573222c2270726f705f6964223a2231303131222c2270726f7065727479223a22426f74204375747761696c222c226465746563746564223a22323031322d30332d31332031383a34303a3030222c226d65746164617461223a22222c226970223a223139302e3137342e3235312e313435227d, timestamp=1332168862658)
=> (column=2012-03-13 18:40:01, value=7b227265706f72746564223a22323031322d30332d31332031383a34383a3031222c22617474726962757465223a22737573706963696f7573222c2270726f705f6964223a2231303131222c2270726f7065727479223a22426f74204375747761696c222c226465746563746564223a22323031322d30332d31332031383a34303a3031222c226d65746164617461223a22222c226970223a223139302e3137342e3235312e313435227d, timestamp=1331689681)
Returned 2 results.
Elapsed time: 39 msec(s).

There we go. A single key row ip_store[‘warehouse’][‘3149138705’] containing to column records, each with a JSON blob within it. Now.. the next step, to set the assumption of utf8 when recalling the records and get output mere mortals such as yourselves can understand.

[default@ip_store] assume warehouse validator as ascii; 
Assumption for column family 'warehouse' added successfully.

[default@ip_store]  t warehouse['3149138705'];  
     
=> (column=2012-03-13 18:40:00, 
  value={
   "reported":"2012-03-13 18:48:01",
   "attribute":"suspicious",
   "prop_id":"1011",
   "detected":"2012-03-13 18:40:00",
   "ip":"187.180.11.17"
  }, timestamp=1331689680)

=> (column=2012-03-13 18:40:01, 
  value={
    "reported":"2012-03-13 18:48:01",
    "attribute":"suspicious",
    "prop_id":"1011",
    "detected":"2012-03-13 18:40:01",
     "ip":"187.180.11.17"
  }, timestamp=1331689681)

Returned 2 results.
Elapsed time: 2 msec(s).

There is it! Data written, data read. Now, it’s up to you to think about how you might use this simple, flexible and powerful storage engine to solve your business needs.

Drop keyspace using Cassandra Cli

Dropping a an entire keyspace using the cassandra-cli is exceptionally simple.

First, access your cluster using the cli. I have an alias in my .bash_profile so I only need to type ‘cass’ to access the clid. In an attempt to be helpful though, I shall show the full command syntax for my environment. Your host and port may vary.

  alias cass='cassandra-cli -h 10.1.0.26'

In this example, I am going to drop the keyspace I was loading with test data in previous posts, ks33.

hpcass: ~$ cass
Connected to: "Test1" on 10.1.0.23/9160
Welcome to Cassandra CLI version 1.0.8

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

DROP keyspace ks33;

07ad5e00-7120-11e1-0000-13393ec611bd
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] 

That’s all there was to it. Keyspace destroyed.

Previous Cassandra related articles