Update for the last month

Sorry for the lack of updates, but I was on the end of project march and then off on paternity leave. I am hoping to resume regular posting soon. The project was very successful, we had a big push and brought another 120 tables online in our new Cassandra Cluster and migrated that data from SQL Server. Along the way it has given us a few fun design challenges.

Initially we were working around some limitations in keys in Cassandra. In SQL server often you will query on a column that may be null. In Cassandra none of the columns in your primary key can be null, which means you can’t query on that column since Cassandra doesn’t allow you to do adhoc queries. One work around we started with the obvious solution is to use a secondary index. However Datastax will tell you in general not to use them. We found in playing around with them in production we have just had issues with them. They seem to either get corrupted or be out of sync in some way with the tables very often so end up having to run a repair on that index to get the correct data. As a result of this we are completely moving away from secondary indexes. In the end it gave us some interesting data design problems, but I think we will end up with a much more resilient system in the end.

Up next I will be changing how we package our application. Currently it is a Spring / Hibernate App that is deployed on JBoss as an ear file. I plan on converting the project to a war file and moving us away from JBoss dependencies so that we could deploy on any web container. The goal will be to switch to Tomcat 8 at the end.

Recap for 2015

At the start of the year I posted my Themes for 2015. I decided now is a good time to look at what I was thinking at the start of the year and see how my year turned out. I think it is sort of pointless to set out some ideas of things you want to accomplish if you never stop and assess what you actually did, so this is sort of an accountability post to myself to see how things played out for the year.

  • On the first idea of working on weekly updates. I didn’t accomplish that. On the other hand if we look at it from a post count standpoint I did average that, so I would call that a pretty successful year for my blog even if there were points that I went a whole month without putting some thoughts down.
  • The next theme was to do more reading. Overall I was pretty successful with this. I didn’t always meet the 10% of my kindle book a day, but I did actually read more books than the previous year. I would call that a success, I hope to keep the momentum going, but with another baby on the way this year, I will probably read fewer books this year. I find that when I am overtired I can’t read as it just puts me to sleep so depending on how soon my little one is sleeping will depend upon how much reading I end up doing.
  • The next thing on my list was spending more time doing stuff in Spring Boot. I did a little bit of work on the side playing with the framework. I am convinced that this is the only way to do development with Spring going forward, but migrating a very large legacy app in that direction is easier said that done. On the positive side though our organization did roll out a new microservice in spring boot this year, so we are beginning to bring it into production. The real key will be migrating the bulk of our code to the framework.
  • The next idea I had was to try to learn Angular and get more serious with that. I can say that did not come to pass. I did take Code School’s shaping up with Angular.js course, and I just recently noticed they have added the next level to the free plan. I may end up messing with that over the coming year, but I am not sure. I am sort of watching to see if I think people start moving over to Angular 2 or if the change in compatibility is going to hurt the popularity of the framework. Also I am hearing more buzz about react.js so that is always an option to look at as well. Either way 2015 was not a year of doing front end stuff for me.
  • For my architectural updates 2015 was a huge success. It started off slow with having issues aspectj-maven-plugin. I was forced to fork the project temporarily on github as the maintainer showed now interest in rolling a patch that someone contributed to solve the problem. Then codehaus shut down. After they moved the project into github one of the other maintainers rolled the fix in when I pointed out it was in the old Jira. So I was happy to be able to abandon my fork of the project and get back on the mainline branch. Then I ended up ripping the whole Hibernate Metamodel Generator out of the project anyway which probably wouldn’t have necessitated needing the aspectj plugin update, but that happened later in the year after we needed that upgrade. After getting the aspectj plugin situation resolved I was able to get us upgraded to Spring 4.1. Even better than that we closed the year running the 4.2 branch. So that was beyond what I was hoping to get done. JBoss finally released EAP 6.4 in April which is what I had been waiting on in order for us to upgrade to Java 8. We were able to get the container upgrade into production this fall. Then a few weeks later we switched our VM to Java 8 with that JBoss and ran successfully with that. Finally near the end of my work year in December I was able to switch our compiles to Java 8. So we are now at a point where we can use all the new lambdas and streams and I look forward to playing around with it in January.

All in all I was very happy with my year. I will spend some time in the near future to figure out my plan for 2016, I have some big ideas at work I want to deal with and I even have a few ideas for home projects to mess around with too. So here is to a great year that just ended and an even better one going forward.

Update

I was thinking I should probably have noted anything big that actually happened that I didn’t predict.

  • I got to learn a bunch more Cassandra since I reworked our whole Cassandra Data layer as part of a big project to hide some of the complexity of what we are doing from the rest of the developers.
  • I took a team lead position so I now lead a group of 3 developers. I didn’t expect that at the start of last year, but it has actually been a great thing.
  • I started mentoring 2 other developers under a mentorship program at work. This started towards the end of the year, but I think it is going to be a great program. I think it will be a valuable part of onboarding new hires in the future when we hire people right out of university.

Cassandra Days in Dallas 2015

I may have mentioned this before, but I love going to software conferences. When I got the email mentioning that Cassandra Days was coming to Dallas with a free 1 day conference on all things Cassandra, I signed up immediately. The event was sponsored by Datastax who sells a commercial version of Cassandra called Datastax Enterprise. They had 2 tracks an introductory track for people who are just exploring Cassandra, but haven’t yet taken it to production, and track 2 which was a deeper dive for people with experience with Cassandra.

It was a great event. My team came over from the office as well and they attended a mix of track 1 and track 2. The main thing I wanted people to attend was the Data modeling sessions as that is one of the biggest changes for people who are used to SQL databases when they make the move to Cassandra. The CQL language is great to get SQL people up and running quickly, but then when they try to do things they are used to with a relational database it sort of falls apart for them. When we first signed up with Datastax Enterprise several of us got to attend their Data Modelling class on sight which was great and strongly recommended for anyone taking Cassandra into production, but my team mates had not attended those classes so this was a great event for them to attend.

I had 2 really big takeaways from the event. First was a discussion of Cassandra Light Weight Transactions. In my new Cassandra Data layer that I implemented as part of my current project I hadn’t exposed this concept yet as it is something that we haven’t used to date. It isn’t like a typical SQL transaction in that you aren’t getting the whole ACID concept or getting transactional rollback on a failure so the LWT terminology is a bit misleading. But what it does protect you is from a race condition that clobbers data. Let’s pretend you have a table where you are tracking login names. And they must be unique. If you read the table to see if a name is available and then do an insert in Cassandra there is a race condition where data can get clobbered. Imagine there are 2 users Jeffrey Haskovec and John Haskovec. They are both registering in our system at the same time. At time T1 Jeffrey Haskovec’s thread checks the table for the username of jhaskovec. There is no record so it is cleared to use it. At T2 John Haskovec’s thread then checks the table and sees that the username of jhaskovec isn’t used and so it proceeds. Then at time T3 Jeffrey Haskovec’s thread does an insert into the login table with a username of jhaskovec. The insert returns and Jeffrey thinks he has successfully registered his username. Then at T4 John Haskovec’s process inserts his username of jhaskovec which overwrites Jeffrey. At this point we aren’t aware that we just clobbered data in our datastore but when Jeffrey comes back and can’t login as his user account has been overwritten we will have a difficult to track down bug.

Enter light weight transactions. Now you change that insert statement to an insert if not exists statement in which case it starts up a transaction that guarantees it will only insert the hash if it isn’t already there. We have to check the return result of that insert statement to find out if our insert was successful. So if we go back to our previous example at T4, John Haskovec’s insert would have returned a false that it wasn’t applied and we would have avoided clobbering Jeffrey’s insert from T3. This alone is actually a problem a new table that we are modeling could have had in extremely rare situations so it was very timely that I attended this talk.

I also really liked Patrick McFadden’s advanced data modeling talk as he always gives some great ideas that are worth considering. Lots of just general things to consider there that are always helpful. One of the other talks I was at went into a bunch of cluster level configuration discussion which was also good to hear. I will dig into what we are doing and see how it compares to some of the options that were presented.

The other great thing about conferences is just networking with people. I chatted with a few other people from different companies and it is always interesting to hear how they are using Cassandra and just what things are like in their domain. All in all it was a great event and if one of them is coming to a city near you it is worth spending the day over there for a good free conference. Oh and one final cool thing was that Datastax gave us all USB drives with the latest versions of all of their software on it which is nice to play around with if you are considering rolling it out.

Spring Boot for prototyping

I am on a new project at work that looks to be very interesting. I am redesigning our Cassandra layer. Currently we have a beautifully done layer that was designed and implemented by our former architect. It ends up making Cassandra look just like a JPA entity and we have Cassandra Repositories that look just like Spring Data JPA Repositories. After this was in place we discovered the Spring Data Cassandra project. We went to the talk on Spring Data Cassandra and it turns out they had implemented pretty much the system that our architect implemented.

Now my goal for this project is to create a higher level Cassandra abstraction for our system. Often in Cassandra we create multiple tables to represent the same data. The reason is depending on how you structure the data in CQL determines what queries you can run. We have the need to query some tables in many different ways so we need multiple tables to be able to answer the questions that we could in the SQL world. In our architecture we don’t want the developer to worry about which table they need to query for a given question, we would like to present this more like a standard JPA entity where the developer doesn’t need to worry about it and we abstract away which particular table is being queried or which tables are being saved to.

One of the initial thoughts of this project was to use Spring Data Cassandra. The advantage of that is we would have a third party library we could use so we wouldn’t have to maintain the low level Cassandra code. At one point I was considering ripping out our custom Cassandra Data layer and just using the Spring Data one (prior to this project) as ours is so close to what they are doing anyway why should we maintain that code when there is a perfectly good library we could use and we are already using Spring Data JPA. Right now I am not sure if I want to do that anymore though. If I go with Spring Data Cassandra each of the underlying tables would need to be modeled as a separate entity like our current framework. In that case then I would need a second layer on top if I am trying to present 1 domain object backing multiple tables so this seems like it isn’t ideal for what we are thinking. Another thing I am not 100% on with Spring Data Cassandra is that it isn’t backed by Pivotal. It is a community plugin. That isn’t necessarily a problem but I do have concerns about how up to date it will be. One way to look at this is maybe I should be involved in the plugin since clearly I could be working on that as well, but I get a little concerned about if it will be around otherwise. I noticed that the Spring Data Cassandra project is built off of the Datastax 2.0 driver. So if we are running off of Cassandra 2.1 (which we probably will be by the time this project goes live) we aren’t necessarily taking advantage of the new features. Again I could submit a pull request to update it, but given that I am not sure this is the right approach anyway at this point I haven’t decided to go this route (though that may change). Another thing I noticed is Spring’s initializr doesn’t list Cassandra as an option out of the box. Though I suspect if you checked JPA and then added the Spring Data Cassandra to your maven pom or Gradle file it would probably work.

So how does this tie into Spring Boot? Well I am not a big design everything up front kind of person. I tend to like to play around in code to come up with my design ideas. So I decided I want to prototype against what we are doing so I can see how this design would play out in code. I downloaded a new Spring project from the initializr with AOP, JAX-RS, Actuator, and Shell support. I then pulled our code for our Cassandra layer into there and added the Cassandra driver to the maven pom. I updated the application.properties file with my Cassandra database properties and low and behold I was up and running with a framework to test all of my changes in. Even though I have played around with boot, seeing myself able to build a sandbox to play around with this design in so quickly was very impressive to me. So now I can evolve this design without messing up our current code base and when I get to something I am happy with I can just bring those changes into back into our branch and we will be ready to go with our new design. So even though I am stuck with a monolith and I couldn’t easily bring boot into our main product the design of our project is such that it was very easy to pull that whole layer into this project and just be up and running with almost no configuration (aside from setting up the Cassandra connection properties).

So as usual I think Spring Boot is the bomb and everyone should be using it. I look forward to seeing where this design takes me. Once I have my high level sorted out I may try playing with dropping Spring Data Cassandra in the low level to see if we want to use that as our base, but my current guess is we won’t end up going that route.

Cassandra Data Modeling

I ended up having to miss the JHipster webinar last week as I was invited by my company to attend the Datastax DS220: Data Modeling with Datastax Enterprise class on Monday and Tuesday. The company came out and taught the class onsite. The instructor was Andrew Lenards and he did a great job.

I have been using Cassandra for a little while, but I hadn’t done anything serious with it. The CQL query language is all at once a great blessing and a curse. On the upside it is immediately familiar so anyone who has done SQL work can get comfortable creating tables and executing queries quickly. On the downside it sort of abstracts a few things about the data store away from you and I think at a certain point for performance you sort of need to understand what is going on under the hood. This class gave us that. It starts out presenting a data model like you might see in relational databases and then you work through the ways you might model that data in Cassandra and the trade offs of different models (which questions you can ask, which fields are required to ask those questions, etc). One of the biggest things I was missing prior to the class was the whole concept of partitions vs rows and what the partition key is vs the collating keys. I had been using the data store like a SQL database so that my partitions always had at most one row. We did a lot of looking at instead what if we model the data so the partitions have many rows and what are the advantages and disadvantages of doing so. On day two we got very deep in the technical aspects of what was going on under the hood, how data was stored on disk and how to do things like estimate partition sizes. We were also able to ask a lot of questions specific to how we have been using Cassandra in our organization and what the limitations are going to be as we expand its usage to even more areas of our product.

I think if anyone is going to roll out Cassandra at the enterprise level you really need to take this class. Based on what we covered it is going to save us a lot of future pain as well as making us much more productive with our current usage. Prior to this class we had one person on our team who had a pretty deep understanding so this was a way to get many of us senior engineers up to speed very quickly.