Cassandra Days in Dallas 2015

I may have mentioned this before, but I love going to software conferences. When I got the email mentioning that Cassandra Days was coming to Dallas with a free 1 day conference on all things Cassandra, I signed up immediately. The event was sponsored by Datastax who sells a commercial version of Cassandra called Datastax Enterprise. They had 2 tracks an introductory track for people who are just exploring Cassandra, but haven’t yet taken it to production, and track 2 which was a deeper dive for people with experience with Cassandra.

It was a great event. My team came over from the office as well and they attended a mix of track 1 and track 2. The main thing I wanted people to attend was the Data modeling sessions as that is one of the biggest changes for people who are used to SQL databases when they make the move to Cassandra. The CQL language is great to get SQL people up and running quickly, but then when they try to do things they are used to with a relational database it sort of falls apart for them. When we first signed up with Datastax Enterprise several of us got to attend their Data Modelling class on sight which was great and strongly recommended for anyone taking Cassandra into production, but my team mates had not attended those classes so this was a great event for them to attend.

I had 2 really big takeaways from the event. First was a discussion of Cassandra Light Weight Transactions. In my new Cassandra Data layer that I implemented as part of my current project I hadn’t exposed this concept yet as it is something that we haven’t used to date. It isn’t like a typical SQL transaction in that you aren’t getting the whole ACID concept or getting transactional rollback on a failure so the LWT terminology is a bit misleading. But what it does protect you is from a race condition that clobbers data. Let’s pretend you have a table where you are tracking login names. And they must be unique. If you read the table to see if a name is available and then do an insert in Cassandra there is a race condition where data can get clobbered. Imagine there are 2 users Jeffrey Haskovec and John Haskovec. They are both registering in our system at the same time. At time T1 Jeffrey Haskovec’s thread checks the table for the username of jhaskovec. There is no record so it is cleared to use it. At T2 John Haskovec’s thread then checks the table and sees that the username of jhaskovec isn’t used and so it proceeds. Then at time T3 Jeffrey Haskovec’s thread does an insert into the login table with a username of jhaskovec. The insert returns and Jeffrey thinks he has successfully registered his username. Then at T4 John Haskovec’s process inserts his username of jhaskovec which overwrites Jeffrey. At this point we aren’t aware that we just clobbered data in our datastore but when Jeffrey comes back and can’t login as his user account has been overwritten we will have a difficult to track down bug.

Enter light weight transactions. Now you change that insert statement to an insert if not exists statement in which case it starts up a transaction that guarantees it will only insert the hash if it isn’t already there. We have to check the return result of that insert statement to find out if our insert was successful. So if we go back to our previous example at T4, John Haskovec’s insert would have returned a false that it wasn’t applied and we would have avoided clobbering Jeffrey’s insert from T3. This alone is actually a problem a new table that we are modeling could have had in extremely rare situations so it was very timely that I attended this talk.

I also really liked Patrick McFadden’s advanced data modeling talk as he always gives some great ideas that are worth considering. Lots of just general things to consider there that are always helpful. One of the other talks I was at went into a bunch of cluster level configuration discussion which was also good to hear. I will dig into what we are doing and see how it compares to some of the options that were presented.

The other great thing about conferences is just networking with people. I chatted with a few other people from different companies and it is always interesting to hear how they are using Cassandra and just what things are like in their domain. All in all it was a great event and if one of them is coming to a city near you it is worth spending the day over there for a good free conference. Oh and one final cool thing was that Datastax gave us all USB drives with the latest versions of all of their software on it which is nice to play around with if you are considering rolling it out.