After years of building and deploying applications at scale, we fully understand why it is a common belief that this must be the way deployments are done. The stakes are high, and the risk of breaking things during business hours is very real. But we also know why it is not the right way to do it.
I’ll break down what usually happens before and during these kinds of ‘big bang’ late-night or weekend deployments:
Testing - UAT is the same as Production right? Probably not.
An application is rigorously tested during a UAT period running up to the release window, where everybody does their level best to ensure the application is tested thoroughly and all the expected functionality works as intended. This usually involves multiple teams across all the affected business units: testers, developers, product owners and so on.
Testing can often get extremely complex when third-party component or service integration is also involved, as there can be many versioning issues between the piece being deployed and the pieces already in place. For example:
- Was a UAT version of all the third party dependencies available during testing?
- Did the data in those systems represent production?
- Did data change in those systems over time as it will do in production?
- Was the load simulated correctly?
and so on... This complexity multiplies with each additional component.
Deployment - The Big Night / Big Weekend
And so it is upon us, the moment of truth. Everybody involved is either at the office or on close standby. Most are tired, some are missing a movie night with their family, some will have had to make excuses to miss a dinner with friends. Nobody is happy to be working, regardless of what their face tells you.
The necessary scripts are run and the deployment takes place. Fantastic, everyone breathes a sigh of relief.
But wait...the product owner just told us he is unable to add items to the shopping cart for some reason.
People’s tempers are fraying, managers and product owners are standing behind testers and developers, all of whom are furiously typing away trying to work out what went wrong. No amount of late night pizza offered up by management is going to help. Coding and workflow rules are being discarded left and right in order to ‘just fix it’ as quickly as possible.
Several hours later the cause is revealed to be an innocuous piece of data on a production system which caused an unexpected flow compared to the equivalent UAT system.
Eventually people trudge out of the offices as the sun rises. The product owners and managers (who themselves are feeling the same!) try to raise everybody’s spirits and tell them what a great job they did. All the team can do is think about a warm bed and all the nice things they missed last night with friends and family.
If this is a familiar story, keep reading and I will tell you why it doesn’t ever have to be this way again.
Deploy early, really early
If you are pushing out a new feature to your customers at the end of the month, then hey, why not put that code into the app at the start of the month and simply hide it away. Even better than this, why not release the feature at the start of the month, but only show it to a small test group of people who can give you great feedback on it with real data.
This gradual, controlled release of features is commonly termed Feature Flipping. You ‘flip’ a feature (usually through an administration screen) on or off for everybody, or perhaps just on for a few people at a time. Your shiny new feature is being used for real in your real production app with real data; your customers are none the wiser until you flip it on for them too.
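At its core, a feature-flipping mechanism just needs to answer one question: is this feature on for this user right now? Here is a minimal sketch in Python; the class and method names are illustrative, not taken from any particular library, and a real implementation would back this with a database so an admin screen can flip flags at runtime.

```python
class FeatureFlags:
    """Minimal feature-flipping store: flags can be on globally or per user."""

    def __init__(self):
        self._global_on = set()  # features flipped on for everybody
        self._per_user = {}      # feature name -> set of user ids it is on for

    def flip_on(self, feature, user_id=None):
        """Flip a feature on for everybody, or for one user if user_id is given."""
        if user_id is None:
            self._global_on.add(feature)
        else:
            self._per_user.setdefault(feature, set()).add(user_id)

    def flip_off(self, feature):
        """Flip a feature off for everybody, including any per-user grants."""
        self._global_on.discard(feature)
        self._per_user.pop(feature, None)

    def is_enabled(self, feature, user_id):
        """True if the feature is on globally or for this specific user."""
        return (feature in self._global_on
                or user_id in self._per_user.get(feature, set()))


# Usage: the new checkout is live in production, but only a beta tester sees it.
flags = FeatureFlags()
flags.flip_on("new_checkout", user_id="beta_tester_1")
```

In your application code, the new feature then sits behind a single `if flags.is_enabled(...)` check, which is what makes deploying it ahead of time safe.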
But wait, I still have to actually deploy the feature even if it is flipped off…
Absolutely correct, but given it has zero impact on the customer until you decide, you have significantly reduced your deployment risk and more importantly your choice of deployment schedule. Now you are free to deploy during office hours when all of your team are alert, focussed and available. Even this risk can be further mitigated by doing things like Rolling Deployments (for a later post!).
Deploy often, really often
Don’t let feature development build for months on end. By deploying with feature flags you are free to deploy fragments of features and just flip them on for beta testers. Again, you are mitigating risk by doing small deployments that can easily be rolled back if necessary.
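One common way to pick that small test group is deterministic percentage rollout: each user hashes into a stable bucket, so ‘on for 10% of users’ always means the same 10%, and ramping up to 20% only ever adds users. This is a sketch of the idea, not any specific library’s API:

```python
import hashlib


def in_rollout(feature: str, user_id: str, percent: int) -> bool:
    """True if user_id falls in the first `percent` of 100 stable buckets.

    Hashing feature and user together means each feature gets its own
    independent (but repeatable) assignment of users to buckets.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because the bucketing is deterministic, a user who saw the feature at 10% still sees it at 20%, which keeps the experience consistent as you widen the rollout.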
Once you have a Feature Flipping mechanism in your application, you can start to get really smart and schedule auto-flipping on certain features at certain times (remember, this code has been deployed for weeks and used by many people). Imagine your new feature becoming available to your customers exactly on time without anybody lifting a finger!
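Scheduled auto-flipping can be as simple as attaching a launch time to each flag: before the launch time only beta users see the feature, and once the clock passes it the feature is on for everybody, with no deployment involved. A hedged sketch, again with illustrative names:

```python
from datetime import datetime, timezone


class ScheduledFlags:
    """Flags that flip on for everybody automatically at a scheduled time."""

    def __init__(self):
        self._launch_at = {}   # feature -> datetime when it goes live for all
        self._beta_users = {}  # feature -> set of user ids with early access

    def schedule(self, feature, launch_at, beta_users=()):
        """Register a feature with a go-live time and optional beta group."""
        self._launch_at[feature] = launch_at
        self._beta_users[feature] = set(beta_users)

    def is_enabled(self, feature, user_id, now=None):
        """On for everyone after launch; before that, only for beta users."""
        now = now or datetime.now(timezone.utc)
        launch = self._launch_at.get(feature)
        if launch is not None and now >= launch:
            return True
        return user_id in self._beta_users.get(feature, set())
```

Note the `now` parameter: passing the clock in explicitly makes the flip behaviour easy to test without waiting for the real launch date.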
There are of course exceptions to the rules with all of this, particularly in the case of deploys that have a significant impact on existing features, but in our experience you can nearly always find a way to isolate the new features in a way that ensures they don’t impact customers that don’t see them.
The times, they are a changin’
The good news is that we speak to more and more forward-thinking C-level management, product owners and technical staff who understand the issues described here, having often lived through them first-hand. They are getting better at articulating the arguments to marketing teams and other stakeholders and changing the Big Bang culture.
Let OOZOU help your team to drive this change too.
"Let's do it Friday night," management says (after a long week of last-minute fixes). Friday night quickly slips into Saturday. QA finds an unexpected issue and the team is now forced to either fix the bug or perform a rollback. Saturday afternoon rolls around and most team members have now been awake for 36 hours. A fix is finally pushed Saturday night and everything appears to be working... nope, another critical bug pops up and the dev team is again asked to do whatever it takes to fix the issue. Sunday comes and at this point the 'team' is exhausted. Finally, mid-day Sunday, another patch is pushed and QA resumes. Phew, everything looks good by 3 pm on Sunday. Most of the team hasn't slept properly since Thursday night. Exhausted, stressed and delirious, the team is finally able to crawl out from under their desks and into their beds.
Monday morning the alarm goes off at 7 am and management expects the team to start working on new features for the next release as if nothing happened. The developers slowly start to cry into their keyboards and wonder how their lives got so messed up. Nobody cares... the team is told this is just normal because the software is so important. Software team members have a heart attack at 40 and are replaced by college graduates.
The scene from Pink Floyd's The Wall where the children are marching towards the meat grinder suddenly becomes your reality. Developers quit and decide to go work as hourly consultants, never to be abused by this nonsense again. If management was paying for those hours instead of getting free labor, this would never happen.