Category Archives: Software Development

Gearman — Starting the Java-Gearman-Service process

gearmanlogo
These notes apply to testing on a MAC OSX portable, and may or may not apply to your implementation. They are provided as an adjunct to my main Getting Gearman Going post elsewhere in this blog.

The project page for Java-Gearman-Service is located [HERE] on Google Code.

The full set of instructions for staring up Gearman’s Java-Gearman-Service were not clearly linked to the main Gearman project page, so I’m including the link [HERE] to save you the few Google dorkings I did to find it.

Starting up the Java service should be as simple as this:


java -jar java-gearman-service-0.6.6.jar

HOWEVER, I received this instead.. an error:


david$ java -jar java-gearman-service-0.6.6.jar
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/gearman/impl/Main : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

Checking my version shows that I am on 1.6 not 1.7 as I had thought;


david$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06-451-11M4406)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01-451, mixed mode)

To get the proper version, I navigated to the Oracle page located [HERE], agreed to their terms (do I really have a functional choice… not if I want/need to use Java…) and pushed forward.

If you are running the install for 1.7, you should see a dialog like this:
Screen Shot 2013-06-03 at 2.12.16 PM

That has at least resolved this part of the issue, and will attempt to restart the server.


david$ java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b12)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Now I’m going to restart, specific a custom port and get it kicked off in the background:


david$ java -jar java-gearman-service-0.6.6.jar -p6315 &

At this point the process is running on my Job Service box and next step will be to craft some code to see how it all works.

Deploying Java Gearman Job Server

gearmanlogo
OK, after more than a year it’s time to get down to really building out a Gearman system.

I’ve recently taken on a project that I believe to be the perfect fit for the pipelined distributed work manager of Gearman. I’m of course, not at liberty to discuss the various details of this project, but I can provide some high-level description for the purposes of justification.

The Project – distributed harvesting

The objective of the project is to provide a distributed method of web page scraping and parsing. This project requires that the scraping and profiling occur for 2,000,000+ websites in under 18 hours. No small feat for certain. The good news is that I’ve built a systems in the past (circa 2006) that did just this using MySQL as the task manager. It worked, but it had it’s issues, and almost every single one of them can be mitigated by using Gearman. The rest will be mitigated with the application of NoSQL solutions for site list management.

What is Gearman

Here is the synopsis from the Gearman.org main page.

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.

The Job Server — implementing in Amazon’s ES2 environment

For the project I’m working on, I’ve opted for the Java implementation of the Job Server. This implementation’s main page is located [HERE].

Information about the Java Job Server:

Java Gearman Service is an easy-to-use distributed network application framework implementing the gearman protocol used to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages.

04-23-2012 java-gearman-service v0.6 has been released. [DOWNLOAD]

  • The service now uses the slf4j logging facade, allowing the user to have better control over logging
  • Persistent background jobs are now supported though an application hook
  • The API has been updated to be more user friendly, and it makes it easier to create divide-and-conquer/mapreduce applications (breaks the code of previous versions)
  • A .properties file now may be used to set property values and fine-tune the application.
Requirements to deploy the Java job server:
  • Java SE 7
  • slf4j 1.6.4+

For my implementation, I’ve extracted the zip into a vendor directory form where I’ll plan to launch the .jar. Development is occurring on an Apple OSX portable, then deployed to the AWS EC2 cluster for production. It’s expected that some library pathing and configuration will be required to make this all work.

Starting up the Java Gearman Service

It took a little time to locate the instructions for Starting up the Java implementation of the Gearman Service. It is located [HERE].

Instructions on how I started the GearMan server on my OSX development machine are located [HERE].

Once started, you should be able to communicate with it with your Client and Worker code!!

Next steps:

Installing Gearman PHP components

Build a GearMan Client Demonstrator

Build a GearMan Worker Demonstrator

macports.conf points to a non-existing directory.

QUICK TIP:

If you receive the following error while attempting to install MacPorts, there is a quick and easy fix:

Warning: Your developer_dir setting in macports.conf points to a non-existing directory. Since this is known to cause problems, please correct the setting or comment it and let macports auto-discover the correct path.

The following commands will let you locate the current path, then set this into the macport.conf file, making installation of your MacPorts package more robust.

First, determine the actual path on your machine. Your path may vary. The path shown below is for my OSX 10.7 box.


xcode-select --print-path
/Applications/Xcode 4.3.3.app/Contents/Developer

Next, edit the macports.conf file, replacing the current entry with the one you just found above:


sudo vi /opt/local/etc/macports/macports.conf

[...]

# Directory containing Xcode Tools
#developer_dir /Developer
developer_dir /Applications/Xcode 4.3.3.app/Contents/Developer

[...]

Once you have done that, when you run the installation command, you’ll no longer see that annoying warning, and your MacPorts should be installed, ready for you to enjoy!


sudo port -c install coreutils +with_default_names

MS SQL – SSIS setting and using Package Parametes in a task

First off, I’m going to tell you that Microsoft SQL Server SSIS is THE MOST INANE, FRUSTRATING TOOL I’VE ENCOUNTERED IN OVER 2 DECADES OF SOFTWARE ENGINEERING! It’s designed for the point-and-click query builder, and that makes it really difficult to use for real engineers that program code, not click buttons. I really do detest it’s design and mish-mash of concepts, and most of all it’s horrible documentation.

THAT SAID… I’m here to offer a little bit of help.

TODAY’s TASK – Using SSIS Package Variables in a useful way.

What I needed to accomplish today was the dynamic setting of a database’s compatibility level, based on it’s current level. (clear enough?)

The reason for this? MS SQL server 2005 supports the PIVOT operation (why do I feel like the spreadsheet coders designed this crazy thing?), but SQL 2000 does not. No big deal, EXCEPT when it comes to using a database that once lived on an SQL 2000 platform and now lives on a 2005 platform. Buried deep in the bowls of the settings is a flag that says “I know I’m running on a more modern server, but I refuse to honor all of it’s capabilities.. I’m just going to be a whiny pain in the ass until you tell me to stop”.

So, that’s what this task is all about…

Dynamically Detecting an SQL Server run-level of 80, and bumping it up to 90 to support the features of the server it’s running on

Crazy concept, I realize that. HOWEVER, the hitch in this deal is that when I am done using these PIVOT command (I’m just fixing a kludge written by someone else, I’ve never have used PIVOT, but that’s for another day.. next thing you know someone’s going to be to try to sell me on the virtues of File Maker! GAK!), I need to set the DB run level back down to the 2000 setting so OTHER code can run with out blowing up.

Getting to the task, using an SSIS SQL Execute Task

First things first, you need to create an SQL Execute Task in your SQL package. I did this inside a sequence container to clearly define what the application of this was confined too within the broader scope of the package. Here’s a screen shot:

Defining my Task widget

Next is planning and writing the code to check the compatibility level, and store it in a package variable. However the only way I could find to really do this, outside of using a Script task and programming in C# (last thing I really want to dig into right now), was returning a row from the query, and setting that BACK into a pre-defined package variable.

So let’s move to defining that package variable next… this is an important step that took me several hour to find any useful information on, and to use. It’s important that use use the nutty Micro$oft syntax to define the variable (why isn’t that point-and-click too, right?).

You need to go to your SSIS menu item and select ‘Variables’

Now define your SSIS package variable. I used the name

    Compat2kReq

UPDATE: What I have later found is that you MUST set the data type to OBJECT for the value to be properly set. IF you do not do that, a wonderful, inexplicable and generally BS message is presented to you:

Getting back on task… EDIT the SQL Execute Task, click on the SQLStatement until you see the elipsis […] to your right of the cell, you’ll need to then click on that to open the SQL editor….

With the dialog open, I inserted the following code. I’ll go over what it all means next.

Editing the actual query

DECLARE @DbName varchar(128);
DECLARE @CompatLevel int;

SET @DbName = DB_NAME();
SELECT @CompatLevel = compatibility_level from sys.databases where name=@DbName

IF(@CompatLevel) < 90 BEGIN EXEC sp_dbcmptlevel @DbName, '90' END SELECT @CompatLevel AS 'CompatLevel'

First section is the declaration of two local variables.


DECLARE @DbName varchar(128); -- will store the name of database
DECLARE @CompatLevel int; -- will store the starting compatibility level

Set the values next. The first operation get's the current compatibility level. For an SQL2000 server, that value is '80'. To use the PIVOT function, it must be '90' or higher. We'll get the that, after this.


SET @DbName = DB_NAME(); -- dynamically store DB_NAME() into var, seems like a redundant step but some of the MS SQL functions won't actually execute the function, such as ALTER DATABASE, so you have to play fun tricks with MS SQL
SELECT @CompatLevel = compatibility_level from sys.databases where name=@DbName -- now I set the current compatibility level into a local var. I might have been able to do this another way, but mapping the variable, as you'll see later was the fastest way to solve this problem.

Next, test to see if the level is high enough, and if not, bump it up to the minimum required, which is 90. Level 90 indicates SQL2005.


IF(@CompatLevel) < 90 -- see if the level is less than required 90.
BEGIN
EXEC sp_dbcmptlevel @DbName, '90' -- set to required level, 90
END

The final step, and this is crucial to detecting this change, and resetting it later, is to save the ORIGINAL compatibility level in the package variable for later use. Returning it as a single row ResultSet is how I solved this for the Hot-Patch I was tasked with creating.


SELECT @CompatLevel AS 'CompatLevel' -- select variable as a result row.

Now, before heading into the next area, I'm going to direct you to the General pane again and point out a second important attribute to set, The ResultSet value. I'm only returning a single row so that is how I set it. You will need to do this, by default there is no return.

Now that part 1 is done, next is to define that the Package Variable we set up early in this task, is going to be an OUTPUT from the SQL Execute Task we're working on.

Click on the 'Parameter Mapping' item and you'll see an empty table (unless you already added something before).

Use the 'ADD' button to create an entry. There you'll select the Package variable to reference. You'll also want to define it's type (I used SHORT) and if it's INPUT or OUTPUT. I'm using OUTPUT, meaning the SQL Task will OUTPUT a value that will be stored into this variable.

I set a parameter sequence of 0 (meaning it's the first) and an initial value of '0'.

The final step in this phase, is to set your Result Set up. The GUI will help you by displaying your OUTPUT paramter automatically, all you need to do is define the row name returned by your final select (here is the code again, with the row name highlighted, you can use any name you want.. I just happened to use the same name as my variable to confuse you).


SELECT @CompatLevel AS 'CompatLevel' -- result row name

So, there is it.. pretty simple to do, isn't it? I think it's crazy, but I'm not an M$ fanboy so... just keep that in mind.

If you have questions, leave a comment. I can't guarantee I'll be of any help, but if I can, I'll do my best.

I hoe this was useful for you. I spent several hours pouring over docs and web pages and never really found a one-stop explanation, so I wrote this one up, because I sure home I don't forget!

JIRA – tracking projects in an Agile way

With the kick-off of my new Start-Up Company (this is #8 for me, since I started my first company in 1984, Bay Auto Electronics), after taking 10 years to pursue some potentially lucrative (only time will tell if those efforts ever pay off, I’m not holding my breath) employment opportunities in the Internet Security / Anti-Fraud sector.

The short term plan is for that work to continue on a project consulting basis for the remainder of the year (that is the plan.. always subject to change), however in addition to that I’ve taken on two additional clients with very diverse project needs. Those needs need to be carefully manged and time properly allocated to each of these clients and their projects.

In the past, I’ve had adequate success using Work Diary spreadsheets to call out time per project and how it was spent within each of these projects. I continue to do that now. However I want a more useful, powerful and visual tool to track efforts, tasks, sprints, milestones, etc. And in addition to that I want to expose that information to each of my clients so they can get a status update on their projects near-real time, any time, day or night, and also help project their expenses as the projects move forwards.

To make this goal a reality, I have decided to Trail out a tool recently implemented at one of my former employers. It’s name is JIRA. And so far, having only used it there for 30 days or so, I’m impressed. Here is a screen shot of my current JIRA Dashboard (projects, names etc changed to protect the innocent, etc. etc. etc.).

My Sample JIRA Project Dashboard

All that said, and after communicating with one of the helpful JIRA engineers to make sure this tool would do what I want, and provide information for my clients as well, all on one system I host, the decision was made to move forward to the project!

To get further feet-wet, I’m first downloading the distributions for both MAC and LINUX. Initially I will be installing this on a MAC workstation to get the project defines, users entered etc. To test out the waters and learn on a test environment before cutting it loose in the wild. Eventually this will roll out with a public facing (for those with the right credentials) interface for project tracking. One of the first projects that I’ll be defining in my private installation will my forthcoming programming book. After 20+ years as a professional developer, trainer, sales engineer, IT Director and Entrepreneur, there are unique perspectives I can bring to the practice of programming. Keep any eye out for announcements on this by September! 🙂

Getting JIRA – downloading distributions

The current distributions, as of this blog, are located here:
http://www.atlassian.com/software/jira/download

JIRA Download Page

Installation Instructions are found here, a Confluence site (another Atlassian product):

https://confluence.atlassian.com/display/JIRA/Installing+JIRA

Installing JIRA Instructions

NOTE: – regarding OSX
As noted in the pages, installing on OSX is only suitable for evaluation purposes. That’s OK, not a big issue, I’ll have hardware available to host it in the next two weeks. Until then, running a local evaluation will be just fine. Unfortunate that the product can support Windows, but it’s not a surprising point since Apple has shuttered it’s proper Server production lines and is no only shipping MacMini servers and those horrendous beasts know as MAC Pro workstations. There IS A LOT to be said for 19″ rack compatible system, when it comes to REAL CORPORATE operations

Installing on MAC (in this case a laptop of all things)

I selected this package named: JIRA 5.0.6 (TAR.GZ Archive).

Instead of just creating more muck in my Downloads directory, I created a dedicates Atlassian directory under Applications.

I moved the file there and ran the extraction:

First order of business was setting my JIRA Home Directory. The instructions are found here at this link:
https://confluence.atlassian.com/display/JIRA050/Setting+your+JIRA+Home+Directory.

I chose to use the LINUX configuration script located at bin/config.sh to get JIRA setup. This I ran from a console:

You must also setup an environmen var that points to the same directory you configured using the JAVA Config dialog. Since I use the ‘bash’ shell (please, no need to comment on the virtues of ksh, sh, bash.. whatever… I’m not going to listen), I edited my .bash_profile adding these two lines:


## Required Element for JIRA
export JIRA_HOME=/Applications/Atlassian/atlassian-jira-5.0.6-standalone

With that little step completed, I returned to the bin/ directory where I installed JIRA and lit up the night:

FotoCorsa-3:bin david$ ./start-jira.sh

To run JIRA in the foreground, start the server with start-jira.sh -fg
executing as current user
                .....
          .... .NMMMD.  ...
        .8MMM.  $MMN,..~MMMO.
        .?MMM.         .MMM?.

     OMMMMZ.           .,NMMMN~
     .IMMMMMM. .NMMMN. .MMMMMN,
       ,MMMMMM$..3MD..ZMMMMMM.
        =NMMMMMM,. .,MMMMMMD.
         .MMMMMMMM8MMMMMMM,
           .ONMMMMMMMMMMZ.
             ,NMMMMMMM8.
            .:,.$MMMMMMM
          .IMMMM..NMMMMMD.
         .8MMMMM:  :NMMMMN.
         .MMMMMM.   .MMMMM~.
         .MMMMMN    .MMMMM?.

      Atlassian JIRA
      Version : 5.0.6
                  
Detecting JVM PermGen support...
PermGen switch is supported. Setting to 256m

If you encounter issues starting or stopping JIRA, please see the Troubleshooting guide at http://confluence.atlassian.com/display/JIRA/Installation+Troubleshooting+Guide

Using JIRA_HOME:       /Applications/Atlassian/atlassian-jira-5.0.6-standalone

Server startup logs are located in /Applications/Atlassian/atlassian-jira-5.0.6-standalone/logs/catalina.out
Using CATALINA_BASE:   /Applications/Atlassian/atlassian-jira-5.0.6-standalone
Using CATALINA_HOME:   /Applications/Atlassian/atlassian-jira-5.0.6-standalone
Using CATALINA_TMPDIR: /Applications/Atlassian/atlassian-jira-5.0.6-standalone/temp
Using JRE_HOME:        /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
Using CLASSPATH:       /Applications/Atlassian/atlassian-jira-5.0.6-standalone/bin/bootstrap.jar
Using CATALINA_PID:    /Applications/Atlassian/atlassian-jira-5.0.6-standalone/work/catalina.pid

Opening up JIRA for the first time..

Having started JIRA on my localbox, I connected to port 8080 (the one I used as the default in the installation) and started to complete the setup:

It turns out I’ve made some sort of configuration/installation errors that was not called out in the documentation. Such is the story of software installation. I’ll have to get this one sorted out before continuing on.

JIRA startup error.. this might take a little time to sort out my installation error.

Creating a dedicated JIRA user

Performing a little re-wind, I decided to create separate user account, that can be the JIRA home. This was suggested in the docs but I just didn’t grok it at the time (it’s after midnight.. some slack should be afforded).

Created dedicated JIRA user.

Now.. back to the environment files… first I’m going to log into the new user and create a place for JIRA, copy it’s path, then update the configs.

I logged in to the new user, via the terminal window, then edited (creates actually) the .bash_profile for the user setting the following as the JIRA environment:


jira$ vi .bash_profile

## Required Element for JIRA
export JIRA_HOME=/Users/jira/jira-home

Next, I had to sort out one permissions issue in the Applications directory, and that had to do with the permissions to updates config files in the Altassian directory. To do this, I switched to my root user (su –), moved to the install directory and executed this command to allow group write at all the directory levels for the group user (in this case ‘staff’).


su
Password:
sh-3.2# pwd
/Applications/Atlassian/atlassian-jira-5.0.6-standalone
sh-3.2# chmod -R 775 *

I closed that terminal window, then logged my desktop into my new jira user and re-launched the configuration program (see above if you’ve forgotten how that is started up), and reset the home directory:

Re-Setting the home directory

Tested the connection:

Testing DB connection.

Set the ports I wanted to use for JIRA (defaults shown):

Checking / Setting ports

Then kicked off JIRA again, but this time as the jira user. This time it stuck, took and started:

Next step 2 of the installation is presented, and the requisite settings defined. I’m going to run in PRIVATE mode, as I don’t want to have people attempt to add users to my JIRA without my permissions. That sounds like a licensing seat disaster in the making….

Step 2 of Setup.

NOTE: You will need to sign up and get an evaluation license key to go any further. Since I intend to purchase the product in the new future, unless the evaluation determines another course of action is required, this is a non-issue for me. You may be hesitant to do so, for some reason, one I won’t guess, but if so, be aware of that before digging yourself too deep a hole.

Two more quick steps follow, such as setting up your primary Admin User (sorry, NOT going to show you my settings there), and one last step confirming the setup was successful, before being shuttled over to your new Dashboard!

Dashboard Login

And.. VIOLA!!! Notice the red warning at the far lowest left, the Evaluation DB attached is IN MEMORY only and most likely will be wrecked on a power fail or other shutdown. This could be a big issue on a laptop, wouldn’t you say? Regardless, this IS an evaluation after all…. so… next steps tomorrow will be to see how this all holds up over the next week when I’m back in CA and can install this on my office’s internal servers.

Running, living, breathing JIRA!

MORE TO COME….

Create a Self-Signed wildcard SSL Certificate

Justification

Are you a developer that commonly uses SSL / HTTPS communications on your websites? Do you have multiple development environments hosted on the same domain (such as separate client demo/eval/testing VirtualHosts?), then a wildcarded SSL cert might be for you.

Generating one is very simple process. You will need to have the OpenSSL libraries installed on your computer. All but the worst of Operating systems is likely to have this already installed. If not, you can always go here and get a package: [OpenSSL.org]

Enough reasoning and rationalization, time to get down to business.

Overview

First you must have a private key generated and installed. Second that key is then used to generate a simultaneous signing request and cert signing operation.

Once you have your files created, reference them in the Webserver of your choice (such as nginX or Apach2, if you are using IIS… my heart aches for your plight), using the documentation for that webserver. I’m not going to go into there here, because I’m just taking the time to share this simple process fore generating the CERT.

Step 1 – Generate your private key

If you do not have a private key generated, I’m going to show you have to do it. If you have one that you want to use already, and you know where it is, move onto the next step.

Open a termnal window and execute the following openssl command to generate a private key. For my own installations I never use a key shorter than 2048. Most of the time, I use one that is quite a bit longer. That said, 2048 should provide a sufficiently long key for any practical SSL purposes. Yes, SSL has security issues and a motivated hacker can likely piggy-back it, regardless of your key size… but for the sake of argument and getting through this post, we’ll pretend the Interwebs are a safe place.

Move to the location where you will store your private key (this is a typical location, you can use whatever you want):

cd /etc/ssl/private

Run the command to generate the key:

openssl genrsa 2048 > my.super-awesome.hostname.key

Generating RSA private key, 2048 bit long modulus
......................................+++
.........+++
e is 65537 (0x10001)

So, now we have a key:

ls -l
-rw-r--r-- 1 root wheel 1679 Jan 9 09:41 my.super-awesome.hostname.key

Step 2 – Generate your CERT

This is the fun part, and the 2nd of the super easy steps. To complete this you’ll want to know up front, some important pieces of data, such as the hostname for your site (I’m going to use super-awesome.net for this example). You want to have the address you want to use handy, including the country. Also want to have an e-mail address that will be published in the SSL cert to contact you, and a department and company name if so inclined. Below the actual command and responses will be in bold:


openssl req -new -x509 -nodes -sha1 -days 3650 -key my.super-awesome.hostname.key > my.super-awesome.hostname.cert

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]: US
State or Province Name (full name) [Some-State]: Kellyfornia
Locality Name (eg, city) []: Sac-of-Tomatoes
Organization Name (eg, company) [Internet Widgits Pty Ltd]: Crazy Assembly House
Organizational Unit Name (eg, section) []: Committee on wasting tax payer money
Common Name (eg, YOUR name) []: *.super-awesome.net
Email Address []: admin@super-awesome.net

Verify that you have the file:

ls -l
-rw-r--r-- 1 root wheel 1927 Jan 9 09:50 my.super-awesome.hostname.cert

That’s all there is to it! You’re done. Now you have a Self-Signed SSL wildcard sert for super-awesome.net. This would allow you to secure (and I always use the word secure with a certain degree of sarcasm) any sub-domain / hostname under super-awesome.net. Examples of what it would handle:

  https://www.super-awesome.net
  https://qa-server.super-awesome.net
  https://some-client.super-awesome.net
  https://another-client.super-awesome.net
  https://ya-client.super-awesome.net

Now, it’s important to note that this DOES NOT secure anything beyond that first level.. here are a couple more examples:

  https://www.super-awesome.net --  OK
  https://qa-server.super-awesome.net -- OK
  https://some.client.super-awesome.net -- FAILS
  https://another-client.super-awesome.net -- OK
  https://test.ya-client.super-awesome.net -- FAILS
Extra Credit – viewing the contents of your CERT

It’s all well and good to generate the cert, but what if you want to verify it’s properly setup? What if you find a cert on your system and you want to know what it covers, when it expires, whom might own it, etc. Well, that’s possible too. Running a simple command we’ll examine the SSL Cert just created. The important info is in the ‘Issuer’ and ‘Subject’ blocks.

  openssl x509 -noout -text -in my.super-awesome.hostname.cert

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            c4:3d:66:b4:e3:cc:61:86
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, OU=Committe on wasting tax payer money, CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
        Validity
            Not Before: Jan  9 17:50:56 2012 GMT
            Not After : Jan  6 17:50:56 2022 GMT
        Subject: C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, OU=Committe on wasting tax payer money, CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (2048 bit)
                Modulus (2048 bit):
[...]  /*  removed the modulus to keep the post short */
               Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Key Identifier: 
                9D:72:0C:A0:E6:EB:77:2C:77:EF:E8:9E:B7:BC:9F:53:81:1A:40:9D
            X509v3 Authority Key Identifier: 
                keyid:9D:72:0C:A0:E6:EB:77:2C:77:EF:E8:9E:B7:BC:9F:53:81:1A:40:9D
                DirName:/C=US/ST=Kellyfornia/L=Sac-of-Tomatoes/O=Crazy Assembly House/OU=Committe on wasting tax payer money/CN=*.super-awesome.net/emailAddress=admin@super-awesome.net
                serial:C4:3D:66:B4:E3:CC:61:86

            X509v3 Basic Constraints: 
                CA:TRUE
    Signature Algorithm: sha1WithRSAEncryption
[...]  /*  removed the signature to keep the post short */


Looking at the Subject breaks downs as follows:

  Subject: C=US, ST=Kellyfornia, L=Sac-of-Tomatoes, O=Crazy Assembly House, OU=Committe on wasting tax payer money, CN=*.super-awesome.net/emailAddress=admin@super-awesome.net

  C=US - Country code  'US'
  ST=Kellyfornia  -  State or Provence. 
  Sac-of-Tomatoes   -  City/Location
  O=Crazy Assembly House  -  Company or Organization name
  OU=Committe on wasting tax payer money Organizational Unit (department, etc.)
  CN=*.super-awesome.net  -  Canonical Name (hostname / domain) that
 the CERT services.  In this case it's a wildcard, signfied by the '*'

That's all there is to it. Now, secure those website communications!

Creating a Logo – birth of a brand

With the impending launch of my new enterprise, it’s time to get down to the business of creating a logo. This article will document the process, good or bad, success for fail, the steps I take will be detailed here for your amusement, edification or horror. You be the judge.

STEP 1 – Get the idea formulated.

Get get started, sketch out a general idea for the logo. Since this I am still a pre-launch state, that drawing will remain a work of the readers imagination. I will tell you though, that after a few iterations I knew what I wanted to go after, and what base information I needed.

First I decided that one of the components I needed for this logo project was an image of the crescent moon. This was of course very easy to locate on the web. There is no lack of such images.

Google search for some sample images

I decided on a sampling of the images and saved them off to a special directory on my computer. Now it was time to step away for a few minutes, clear my mind and and prejudices regarding the images, and then re-open the directory and look at each one. After some time, I selected one of the images that seemed to have the most promise.

Now, it’s important to NOT get locked in or bogged down on one image. Don’t shoot for perfection here, you must be comfortable with the concept that your first, seconds, or maybe even your fifth attempt will be abject failure. It’s a process, and if it will take you 5 attempts to get it right, you’ll never get there unless you get through those first four… so.. let’s get to it.

Step 2 – Open and adjust image to suit

Select your first best guess as a starting point and open the image in some photo editing software. My choice is Photoshop:

The goal I have in mind, requires the crescent to be on the other side of the logo, and the curve must extend over the top. So, this image as is, will not do. Opening up a variety of Photoshop tools (at this point I should point out this is NOT going to be tutorial on how to use photoshop, there are plenty of those done by people better than I at providing such help), I flipped the image and rotated it clockwise 22 degrees.

The next step was to then upon up the ‘Levels’ tool and start cranking in as much contrast, at both the white and black ends of the scale, to remove as much detail as I felt practical. This is needed to get the image close to something you can work with then we open this up again in vector editing suite. This will hopefully make sense, shortly.

This is when things start to get tricky. I know that I may need to go back to photoshop and crank in more contract, or maybe I need to change which end of the contrasting I apply to get just the right amount needed when I move to Illustrator and start my vector editing. Take a few moments to look at your image, if you are attempting something yourself, and apply any tweaks you feel you need now. You’ll notice the file name has changed. I like to keep a good clean copy of the original files aside in case I really destroy the current version. Hitting the “reset button” is more likely to happen than not. Don’t get discouraged if you have to start over. It’s better to have tried and failed than to have never tried at all, simply because you’re afraid to fail. The only way to completely fail is to never try at all.. Be a DOER… not a WISH I DIDer! Here I am making a few more adjustments.

STEP 3 – Open in vector editor

Now, it’s time to find out if I did enough contrast adjustment. This is also the point when I find out if I have a clue what I’m going in Illustrator. caveat emptor, you’re getting what you paid for here.

Now, before doing anything, I’m going to save the workspace. Again, it’s nice to be able to get back to the beginning if something goes wrong.

Next step I want to increase the size of the workspace into roughly a 2×9 ratio (height x width). This will give me room for the next parts of the logo, including text etc., and finally used one of the built in tools called “Live Trace” to convert the image into an Illustrator vector:

Here is what the resulting vector nodes look like once the trace is complete. I adjusted the minimum pixel and path size vars up and down until I had a trace I liked.

After inverting the image, using live trace and use the ‘Auto convert to Live Paint, I had the primary image I wanted. Following the conversion, I added a target and underline, then selected a text I wanted to use. Again, I’m not totally in love with this text and I way decide to change this later, but for now, to test this image, I have to start somewhere. Of course, Sample Company is *not* the name I’m going to be using, this is just that, a sample. After about 3 hours of work, mostly poking around in Illustrator for the options I really wanted to use I have this concept proof.

Step 4 – wrapping things up

Once the new image is saved, make sure cropping is properly set and export the image to the apps you need. For me, I needed it in a transparent .png for use with invoices, letterheads and publishing.

This is just the beginning. Once the new company is fully launched I’ll be posting the final logos. Keep an eye out for more news on Jun 18th!

California Dreamin’ — setting up the new California Office

It has been a long 8 months since the decisions was made to re-locate operations to Santa Cruz, California. On April 16th, the first of the equipment, furniture, and transient stuff arrived in California from the Washington office.

Despite all the best intentions and plans, construction of the office conference room is still underway, so equipment remains temporarily stacked until they can be placed in their designated locations.

Below is a gallery of photos taken yesterday as things got closer and closer to completion.

After 8 days of 10+ hours each moving equipment, negotiating with contractors, delivering two 16′ box trucks, a 20′ trailer and countless pickup truck loads, we have accomplished a lot. Much work remains to be done before the offices are fully functional, but we are already generating revenue of the new location.

When I return to California following the final contractors work, presentation monitors, servers and printers will be installed in their proper locations. I’m looking forward to getting this all wrapped up so we can concentrate on moving forward with day to day operations.

Cassandra – Getting Started – (deployment Part 2 – Installing Ops Center)

<< Previous: Cassandra – Going into Production – Part 2.

With an empty cluster running, the next step I’m going to take is to install and configure OpsCenter from DataStax. This is a fantastic tool for monitoring the health and performance of your cluster.

Installing Ops Center

The first order of business is to create a directory to store the Ops Center code on the server. I opted to do this within the user account used for Cassandra, as the directory datastax

:~$ mkdir datastax
:~$ 

Next, download and extract the OpsCenter package:

:~/datastax$ wget http://downloads.datastax.com/community/opscenter-1.4-free.tar.gz
--2012-03-26 08:25:30--  http://downloads.datastax.com/community/opscenter-1.4-free.tar.gz
Resolving downloads.datastax.com... 173.203.57.192
Connecting to downloads.datastax.com|173.203.57.192|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21539843 (21M) [application/octet-stream]
Saving to: `opscenter-1.4-free.tar.gz'

100%[=======================================================================>] 21,539,843  3.72M/s   in 7.5s    

2012-03-26 08:25:38 (2.74 MB/s) - `opscenter-1.4-free.tar.gz' saved [21539843/21539843]

:~/datastax$ tar -xvzf opscenter-1.4-free.tar.gz
opscenter-1.4/
opscenter-1.4/log/
opscenter-1.4/bin/
opscenter-1.4/bin/create-keystore.bat
opscenter-1.4/bin/create-key-pair.bat
[...]
opscenter-1.4/conf/event-plugins/email.conf
opscenter-1.4/conf/ssl.conf
opscenter-1.4/conf/opscenterd.conf

:~/datastax$

Next is the setup for OpsCenter. Setup is done via a Python script, located in the BIN directory. Have your listening IP ready and know which port you want to use for the Ops Center web portal. I’m going to use the default of port 8888. Make sure you have the port open on your machine. (click here to jump to the my section on ports).

:~/datastax$ ls
opscenter-1.4  opscenter-1.4-free.tar.gz
:~/datastax$ cd opscenter-1.4
:~/datastax/opscenter-1.4$ bin/setup.py
Generating a 1024 bit RSA private key
.........++++++
...++++++
writing new private key to 'ssl/opscenter.key'
-----
MAC verified OK
Certificate was added to keystore

:~/datastax/opscenter-1.4$ 

Configure the Ops Center deamon. Set the listening IP to an IP available on the system. I’m going to node’s interal IP address (10.1.0.23). The values I’ve changed are in bold.

:~/datastax/opscenter-1.4$ vi conf/opscenterd.conf
[...]
[jmx]
# The default jmx port for Cassandra >= 0.8.0 is 7199.  If you are using
# Cassandra 0.7.*, the default is 8080, and you should change this to
# reflect that.
port = 8001
[...] 
[webserver]
port = 8888
interface = 10.1.0.23
staticdir = ./content
log_path = ./log/http.log
[...]
[cassandra]
# a comma-separated list of places to try for a connection to your Cassandra
# cluster:
seed_hosts = 10.1.0.23,10.1.0.26
[...]

Installing the Ops Center Agents

Each node in the cluster must have a running Ops Center agent. The installation package for this was generated by the Ops Center setup process, and saves a compressed file. This file then needs to be copied and extracted on each node you plan to monitor with the Ops Center.

:~/datastax$ mkdir opscenter-agent
:~/datastax$ cp opscenter-1.4/agent.tar.gz opscenter-agent/
:~/datastax$ cd opscenter-agent/
:~/datastax/opscenter-agent$ tar -xvzf agent.tar.gz
agent/opscenter-agent-2.5-standalone.jar
agent/conf/log4j.properties
agent/bin/setup.bat
[...]
agent/bin/ssl/agentKeyStore.p12
agent/bin/ssl/opscenter.key
agent/doc/LICENSE

:~/datastax/opscenter-agent$

Now run the agent’s setup, assigning it’s IP and the Ops Center’s IP. 10.1.0.26 is this node’s IP address. 10.1.0.23 is the location of the Ops Center install (this may or may not be on the same system or even the same IP address):

:~/datastax/opscenter-agent$ agent/bin/setup 10.1.0.26 10.1.0.23

Make sure you copy the agent file to ALL your other nodes and follow the same setup procedure (this is an example of how I copied the file, your system, ports etc. may be different), and repeat the steps above, with the appropriate IPs.

:~/datastax/opscenter-agent$ scp -P41718 agent.tar.gz bigdata@10.1.0.26:.
RSA key fingerprint is 2b:5b:26:03:87:a4:b1:ea:90:b5:4e:42:60:88:cd:d1.
bigdata@10.1.0.26's password: 
agent.tar.gz                                                                   100%   10MB  10.3MB/s   00:01    
:~/datastax/opscenter-agent$ 

Start up Ops Center

On the Ops Center machine, move back to it’s installed directory and start the process.

:~/datastax$ cd opscenter-1.4
~/datastax/opscenter-1.4$ bin/opscenter &

Now connect to the IP address and port and you should see a base Ops Center instance. This is what you would typically see before starting up your agents:

DataStax Ops Center 1.4

Start up the Node Agents

The last step is to start up the Agent deamons so that the OpsCenter knows the status of each node.

:~/datastax/opscenter-1.4$ cd ../opscenter-agent/
:~/datastax/opscenter-agent$ agent/bin/opscenter-agent &
:~/datastax/opscenter-agent$  INFO [main] 2012-03-26 09:12:40,465 Loading conf files: conf/address.yaml
 INFO [main] 2012-03-26 09:12:40,505 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
 INFO [main] 2012-03-26 09:12:40,505 Waiting for the config from OpsCenter
 INFO [main] 2012-03-26 09:12:40,637 SSL communication is enabled
 INFO [main] 2012-03-26 09:12:40,637 Creating stomp connection to 10.1.0.23:61620

With the Agents fired up, you will see a nice dashboard, showing the current status of the cluster, and some metrics on performance.

Ops Center up and running.

Conclusion

This basically concludes the fast deployment steps required to download, install, configure and start up Cassandra, along with the DataStax Ops Center.

Total time required to deploy was under 4 hours.

Cassandra – Getting Started – (deployment Part 1 – Installing Cassandra)

It’s been almost a month since I started the Apache Cassandra investigation, and now it’s time to move into a production stance. Some of these steps will differ from the original steps documented here in my blog. Later this week I will go back and amend those posts to point at this post as the more recent information. Those old links are already being referenced by multiple sites, so deleting them would not be a kind thing to do. Thus.. onward we move!

Getting the right JVM/JDK/JRE

Originally, the OpenJDK was being used for this introduction and research into Cassandra. Being a proponent of Open Source, I was going to avoid the use of Oracle’s potentially proprietary JDK/JRE in this environment. I have since seen first had, that the JDK DOES IN FACT MATTER, and the one that supports the latest tools is the one from Oracle.

That is located here:

Downloading the JRE/JDK from Oracle has enabled the reliable use of DataStax’s OpsCenter management tool (more on that later).

These are the recommended minimums for Cassandra and OpsCenter from DataStax, a respected partner of the Apache Cassandra project.

Sun Java Runtime Environment 1.6.0_19 or later
Python 2.5, 2.6, or 2.7
OpenSSL version listed in Configuring SSL unless you disable SSL

I ended up selecting the JDK (linked here) and deposited it in the following location on my system as user root (create the directory path if you don’t already have it):

/opt/java/64/jdk-7u3-linux-x64.tar.gz

Extract the file:

:/opt/java/64# tar -xvzf jdk-7u3-linux-x64.tar.gz
jdk1.7.0_03/
jdk1.7.0_03/include/
jdk1.7.0_03/include/jvmti.h
jdk1.7.0_03/include/jawt.h
[...]
jdk1.7.0_03/jre/plugin/desktop/sun_java.desktop
jdk1.7.0_03/jre/COPYRIGHT
jdk1.7.0_03/LICENSE
jdk1.7.0_03/COPYRIGHT
:/opt/java/64# 

The Cassandra Build I decided to use is this one: apache-cassandra-1.1.0-beta1. I downloaded the file to the user I created for this using wget:

:~$ wget http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
--2012-03-25 22:52:27--  http://apache.deathculture.net/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
Resolving apache.deathculture.net... 173.236.158.254
Connecting to apache.deathculture.net|173.236.158.254|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12505037 (12M) [application/x-gzip]
Saving to: `apache-cassandra-1.1.0-beta1-bin.tar.gz'

100%[=======================================================================>] 12,505,037  8.84M/s   in 1.3s    

2012-03-25 22:52:29 (8.84 MB/s) - `apache-cassandra-1.1.0-beta1-bin.tar.gz' saved [12505037/12505037]

Next the file is extracted, moved to a shorter directory name:

:~$ tar -xvzf apache-cassandra-1.1.0-beta1-bin.tar.gz
:~$ mv apache-cassandra-1.1.0-beta1 cass-beta1

Configuring a Node

Now the configuration is edited to define the node ring. The first file to edit is the cassandra.yaml file.

This initially will be only a 2 node cluster, but the tokens must still be calculated. Here are the node tokens I generated using a PERL script I wrote (see: Cassandra and Big Data – building a single-node “cluster” – Extra Credit for the code):

:~/cass-beta1$ ./token.pl 2
Calculate tokens for 2 nodes
factor = 170141183460469231731687303715884105728
node 0	token: 0
node 1	token: 85070591730234615865843651857942052864
:~/cass-beta1$ 

Edit the cluster name. I’m not testing, so I changed the name to one descriptive of the data I was storing. ‘ip’. In the example below, I’m showing configs for the 2nd of the two nodes. Note: The first node would have a different IP address and also a different initial token, in this case ‘0’, as calculated by the tool.

:~$ cd cass-beta1/
:~/cass-beta1$ vi conf/cassandra.yaml

[...]

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'ip'

[...]

 If blank, Cassandra will request a token bisecting the range of
# the heaviest-loaded existing node.  If there is no load information
# available, such as is the case with a new cluster, it will pick
# a random token, which will lead to hot spots.
initial_token: 85070591730234615865843651857942052864

[...]

# directories where Cassandra should store data on disk.
data_file_directories:
    - /home/bigdata/data/

[...]

# commit log
commitlog_directory: /home/bigdata/commitlog/

[...]

# saved caches
saved_caches_directory: /home/bigdata/saved_caches/

[...]

          # seeds is actually a comma-delimited list of addresses.
          # Ex: ",,"
          - seeds: "10.1.100.101,10.1.100.102"
[...]

# Setting this to 0.0.0.0 is always wrong.
listen_address: 10.1.1.101

[...]

rpc_address: 10.1.1.101

[...]

# Time to wait for a reply from other nodes before failing the command (this was done to increase timeout to 30 seconds, sometimes the search I need to run is pretty nasty)
rpc_timeout_in_ms: 30000

Following that, the shell file needs to be modified to designate the JMX listening port:

:~/cass-beta1$ vi conf/cassandra-env.sh

[...]

# Specifies the default port over which Cassandra will be available for
# JMX connections.
JMX_PORT="8001"

[...]

Make sure your logfile is in the desired location. I decided to keep it within the account itself for now:

vi cassA-1.0.8/conf/log4j-server.properties
[...]

log4j.appender.R.File=/home/bigdata/log/cassA.log

[...]

Next I set the paths in the .bash configuration file for the account, using the following 3 environment variables (ANT_HOME is used by the ANT compiler, if you are not writing code, your JAVA_HOME will point at the JRE, not the JDK, and you won’t need the ANT_HOME path at all):

vi ~/.bash_profile
export JAVA_HOME=/opt/java/64/jdk1.7.0_03
export ANT_HOME=/usr/lib/ant/
export CASS_BIN=$HOME/cass-beta1/bin
export PATH=$PATH:$ANT_HOME/bin:$CASS_BIN

Systems Administration

Make sure there is a location for the cassandra server to write it’s log files. You’ll need your SysAdmin, or root privs, to do this. I set the ownership to root and the user under which I’m currently running cassandra (bigdata):

root:/data/feed/indata# cd /var/log
root:/var/log# mkdir cassandra
root:/var/log# chown root:bigdata cassandra
root:/var/log# chmod 775 cassandra

The following ports need to be opened up, if you are running a firewall on each system (you ARE, right!?!), to allow Cassandra nodes to communicate with each other. This is a snippet from my rules-based firewall control file.


Port Usage:

  • 9160 – Thrift port, where the API is serviced for Reads/Writes to Cassandra
  • 8001 – Individual node listening port. This is used for the command line (cli)
  • 7000 – Commands and Data TCP port, used nodes for communications
  • 7001 – SSL port used for storage communications
  • 8888 – Only used on systems that will host an Ops Center installation
  • 61620 – Required for Ops Center Agent Communications

## Cassandra
ACCEPT          loc             $FW             tcp     9160,8001,7000,7001
## OpsCenter
ACCEPT          loc             $FW             tcp     8888,61620


Starting up the Cluster

This is where the truth is told. The rubber meets the road. The money is placed where your mouth is. Light ’em up!

:~$ cassandra
:~$  INFO 23:52:54,232 Logging initialized
 INFO 23:52:54,236 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_03
 INFO 23:52:54,237 Heap size: 6291456000/6291456000
[...]
INFO 23:52:55,162 Node /10.1.0.23 state jump to normal
 INFO 23:52:55,163 Bootstrap/Replace/Move completed! Now serving reads.

IT LIVES!! Now start your other node(s), and check to verify you have a complete ring, properly configured. You should see something like this in subsequent nodes, I’ve highlighted the references to the other member node:

[...]
INFO 23:54:16,042 Node /10.1.0.23 has restarted, now UP
 INFO 23:54:16,043 InetAddress /10.1.0.23 is now UP
 INFO 23:54:16,043 Node /10.1.0.23 state jump to normal
 INFO 23:54:16,088 Compacted to [/home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-6-Data.db,].  544 to 413 (~75% of original) bytes for 4 keys at 0.003425MB/s.  Time: 115ms.
 INFO 23:54:16,109 Completed flushing /home/bigdata/data/system/LocationInfo/system-LocationInfo-hc-5-Data.db (163 bytes)
 INFO 23:54:16,110 Node /10.1.0.26 state jump to normal
 INFO 23:54:16,111 Bootstrap/Replace/Move completed! Now serving reads.

Run nodetool:

:~$ nodetool -h10.1.0.23 -p 8001 ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               85070591730234615865843651857942052864      
10.1.0.23      datacenter1 rack1       Up     Normal  17.77 KB        50.00%  0                                           
10.1.0.26      datacenter1 rack1       Up     Normal  17.66 KB        50.00%  85070591730234615865843651857942052864      
 

WE HAVE A RING!

NEXT: SETTING UP OPS CENTER