Interview: Why Applications Fail and What to Do About It

Lee Atchison is a recognized industry thought leader in cloud computing and has significant experience architecting and building high-scale, cloud-based, service-oriented SaaS applications. Formerly the Senior Director for Cloud Architecture at New Relic, Lee is now the owner of Atchison Technology LLC, a cloud consulting and advising firm. Lee is also the author of “Architecting for Scale,” a book published by O’Reilly Media. The second edition is now available here.

OpsRamp: You published the first edition of Architecting for Scale in 2016. What has changed the most since then regarding digital applications development and management?

LA: By far the biggest change is the acceptance of serverless in the cloud. Serverless was a novelty then, but now it's more accepted. Another change is the growth of non-AWS cloud services. In 2016, cloud meant AWS. But now we really have viable competitors. Azure is very viable and is more popular in the EU. I am particularly intrigued by IBM, which is poised to potentially do something well for high-end enterprise cloud needs. They have the infrastructure and the business framework; they just need a secret sauce in order to be successful. Also, in 2020, the idea that we are moving 100% to the cloud is becoming more viable. Cloud is the strategy now and even regulated industries are moving this way.

OpsRamp: What kind of lessons do you think IT teams & engineering VPs are learning now, during this crazy time?

LA: I think the whole philosophy that it's acceptable to work from home and that you can actually get business done remotely has been a large shift in perception. In tech groups, there has been a hidden message that you need to be at the office to participate in the culture. But working from home has worked better than expected, and we will see remote work become more accepted in the long term. This creates a support issue for IT. With so much more at stake, we need to standardize cooperative processes more completely. Workflow, team, and management processes need to be more formal and less ad hoc. Adding a remote connection changes the way you interact with people and requires a more formal process.

If the virus had hit us four years ago, we wouldn’t have been prepared for remote work, but now applications allow for distributed workloads in the cloud.

The work we have done to modernize apps for the cloud in the past few years has made it easier to support remote work. Think about the logistics industry. We were ready for that too. We didn’t have massive food shortages during this pandemic. Sure, there were some shortages of a few just-in-time manufactured products like toilet paper. But 20 years ago we would not have been able to respond fast enough with the systems, structural changes, transportation, and infrastructure needed for home delivery.

OpsRamp: I was intrigued by your “Five Causes of Poor Availability” on your Modern Digital Applications podcast. They are resource exhaustion, last-minute fixes, too many developers in the kitchen, third-party integrations, and technical debt. Which of these is the hardest to prevent, and do you have any advice there?

LA: The underlying driver that you have to deal with is this: most applications fail because of success. Organizations aren’t prepared to be successful. Having more customers creates higher expectations. So, what to do? Stress testing only helps to a certain degree. Rarely can you truly simulate what customers are doing. You became successful because of something that you weren’t expecting. The answer is to focus on the culture of availability. As you build systems, don't build ones that have specific limitations.

Always planning for availability and scalability in everything you do requires a specific mindset. Of course, doing this adds time and cost to your projects, so there is a trade-off and you need to find the right balance.

But if you have a culture of quality, scaling and high availability, a lot of this comes naturally.

OpsRamp: What are some tenets of that kind of culture?

LA: I am a big fan of chaos testing in production. Chaos testing was first popularized by Netflix, with Chaos Monkey. While a production application is running, you are also running code that tries to bring it down. You do this when customers are using the site, even during the highest peak times. You do this continuously, trying to break the system. This matures your application. When an issue does occur, the system is self-healing enough that it can resolve the issue on its own. You have to build that mindset into the application. The more you cause problems for the software and fix those things, the more likely the software is to be resilient to problems in the future. Yes, you may have a short outage, but it's small compared to having a bigger outage later. It’s about forcing smaller problems to occur that can be easily fixed, with the goal of preventing larger problems.
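
Editor's illustration: the loop described above, where failures are injected continuously and the system is left to heal itself, can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not Netflix's Chaos Monkey and not code from the book. The names `Service`, `run_chaos`, and `kill_probability` are hypothetical, and a real setup would terminate actual instances and rely on an orchestrator or auto-scaling group to replace them.

```python
import random


class Service:
    """Hypothetical pool of identical instances with a self-healing supervisor."""

    def __init__(self, desired):
        self.desired = desired   # number of instances we want running
        self.healthy = desired   # number of instances currently running

    def kill_one(self):
        """Simulate an injected failure; return True if an instance was killed."""
        if self.healthy > 0:
            self.healthy -= 1
            return True
        return False

    def heal(self):
        """Supervisor restores the pool to its desired size; return restart count."""
        restarted = self.desired - self.healthy
        self.healthy = self.desired
        return restarted


def run_chaos(service, rounds, kill_probability, rng):
    """Each round: maybe kill an instance, record remaining capacity, then heal."""
    kills = 0
    lowest = service.healthy
    for _ in range(rounds):
        if rng.random() < kill_probability and service.kill_one():
            kills += 1
        lowest = min(lowest, service.healthy)
        service.heal()
    return {
        "kills": kills,
        "lowest_healthy": lowest,
        "final_healthy": service.healthy,
    }


if __name__ == "__main__":
    report = run_chaos(Service(desired=3), rounds=100,
                       kill_probability=0.2, rng=random.Random())
    print(report)
```

In this sketch each injected failure removes only one instance before the supervisor restores the pool, which mirrors the point about forcing small, easily fixed problems rather than waiting for a large one.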

OpsRamp: What do you think about DevOps these days? Has this been an overall positive trend?

LA: The label DevOps can get a bad rap. However, its philosophies and processes are consistent with all we have been discussing, including chaos testing, serverless, cloud maturity and continuous integration. In my book, I talk about STOSA (Single Team Oriented Service Architecture), which is a model for how service ownership works across the organization to encourage better ownership. It’s based on the fundamental principles of DevOps. And now, we are seeing that DevOps really helps in times of crisis. Companies that have been more agile in the past have been able to be more responsive during Covid-19. What’s important to note is that DevOps can also change the way senior management thinks about apps and can make the entire organization more agile. This in turn allows you to develop a new business model quickly, which many companies have needed to do in recent months. What is impactful to me is that DevOps has been driving agile thinking in the corporate mindset, which is overall more important than the impact of DevOps on application development.

OpsRamp: What is the most exciting thing for you about your work, the thing that keeps you motivated through the tough times?

LA: I love it when somebody who has read my book or listened to me talk comes back and tells me that the techniques I discussed helped them solve a problem. These are often very basic concepts, and I sometimes forget that not everybody has deep knowledge about them. It is very gratifying to know that the things I am teaching have a positive effect on an organization.
