Update for the last month – Jeffrey Haskovec

Sorry for the lack of updates, but I was in the middle of an end-of-project march and then off on paternity leave. I am hoping to resume regular posting soon. The project was very successful: we made a big push, brought another 120 tables online in our new Cassandra cluster, and migrated that data from SQL Server. Along the way it gave us a few fun design challenges.

Initially we were working around some limitations on keys in Cassandra. In SQL Server you will often query on a column that may be null. In Cassandra none of the columns in your primary key can be null, so a nullable column cannot be part of the key, and since Cassandra doesn't let you run ad hoc queries against non-key columns the way SQL Server does, you can't query on that column directly. The obvious workaround, and the one we started with, is a secondary index. However, DataStax will tell you not to use them in general. Running them in production, we have had recurring issues: the indexes seem to get corrupted or fall out of sync with their tables very often, so we end up having to run a repair on the index to get the correct data. As a result we are moving away from secondary indexes completely. That has given us some interesting data design problems, but I think we will end up with a much more resilient system.
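To make the design problem concrete, here is a minimal sketch of one common alternative to a secondary index: a second "lookup" table keyed by the column you want to query, written by the application alongside the base table. This is an assumption for illustration, not necessarily the design we settled on. The class, the customer/email names, and the in-memory maps standing in for the two Cassandra tables are all hypothetical, and the sketch assumes each email belongs to at most one customer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for two Cassandra tables; no driver calls are made.
final class CustomerLookupSketch {
    // Plays the role of the base table: primary key is customerId, email may be null.
    private final Map<String, String> emailByCustomerId = new HashMap<>();
    // Plays the role of the lookup table: primary key is email, so it can never be null.
    private final Map<String, String> customerIdByEmail = new HashMap<>();

    // Writes the base row, and writes a lookup row only when the email is present.
    void save(String customerId, String email) {
        String previousEmail = emailByCustomerId.put(customerId, email);
        if (previousEmail != null) {
            // Remove the stale lookup row so the two "tables" stay in sync.
            customerIdByEmail.remove(previousEmail);
        }
        if (email != null) {
            customerIdByEmail.put(email, customerId);
        }
    }

    // Query by the nullable column goes to the lookup table, never to an index.
    Optional<String> findIdByEmail(String email) {
        return Optional.ofNullable(customerIdByEmail.get(email));
    }

    // Rows with a null email still exist in the base table.
    boolean exists(String customerId) {
        return emailByCustomerId.containsKey(customerId);
    }
}
```

The trade-off is that the application now owns the job of keeping the two tables consistent on every write, which is where the interesting design problems come from. Rows whose value is null simply never appear in the lookup table.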

Up next I will be changing how we package our application. Currently it is a Spring / Hibernate app deployed on JBoss as an EAR file. I plan to convert the project to a WAR file and remove our JBoss dependencies so that we can deploy on any web container. The end goal is to switch to Tomcat 8.