Industry Insights is OpsRamp's blog dedicated to the influencers who are shaping the world of IT Ops. Check back as we continue to talk with the personalities who know infrastructure, monitoring, management, and automation in an increasingly complex world.
This week we talk with Mike Julian, author of Practical Monitoring, the Editor of the Monitoring Weekly newsletter, and founder of Aster Labs, a monitoring consultancy based in San Francisco. In our interview, Mike shares common mistakes that he sees IT teams make when it comes to monitoring along with what's changed with monitoring public cloud.
You’re the author of the book “Practical Monitoring: Effective Strategies For The Real World”. Why did you write the book and is there a core message for IT operations folks?
MJ: Over the years, I continued to get the same sort of questions from people: “My monitoring sucks. What should I do? What tools should I use?” Everyone had similar issues and I came to realize that it’s really not about the tools. You can have great monitoring with really bad tools or excellent tools with awful monitoring. Most problems with monitoring are tool-agnostic, which is really the core premise of the book.
What are some common monitoring issues that occur in most organizations?
MJ: I would say an obsession with tools: people are looking for a magic tool that does everything but it’s not possible, or at least it won’t do anything very well. The other issue is that people will look at monitoring as a job, but it’s more than that: Improving monitoring is the duty of everyone on the team. Otherwise, you’re going to have that one person overwhelmed with maintaining a monitoring system, nobody else knows anything about it, and monitoring just kinda sucks all around.
From Mike's book, here are the five anti-patterns of Practical Monitoring:
Figure 1 - The Five Anti-Patterns of Practical Monitoring.
How have enterprise monitoring best practices evolved with the adoption of the public cloud?
MJ: Enterprise monitoring best practices haven’t changed with cloud but the capabilities have improved. Many enterprises are just taking their on-premises infrastructure, moving it to the cloud, and calling it a day. In effect, they are treating cloud providers as an extension of their data center and leaving so many new capabilities untouched.
With public cloud, you get so much more access and flexibility than you do with on-premise data centers. You can do a lot more. You aren’t locked into the old ways of thinking. And you have access to new tools that you didn’t have before. Basically, the public cloud is a game-changer for the enterprise if you only stop and ask yourself: “What else does this allow us to do?” Also, be open to architectural changes in the process.
Is monitoring in the cloud harder?
MJ: I would just say it’s different. Cloud providers give you a lot more automation capabilities, and there are some things that don’t really have an analog to the on-prem world. For example, I can have servers automatically scale up or down with AWS Auto-Scaling Groups (ASGs) and rely on totally ephemeral instances in those ASGs or with Lambda. All that means is, I now have to pay attention to metrics I didn’t have before because my infrastructure works in totally new ways. It can certainly increase complexity, but it’s an exciting and interesting challenge.
What challenges do IT Ops managers face when it comes to monitoring and management toolsets?
MJ: The biggest challenge I see IT Ops managers being concerned about is having too many tools. Imagine an incident happens and the first question in your mind is: “What tool do I check for this information?” I tell people that you only need enough tools to solve the problems that you have. That might be two or 12. I have had clients with 40 different monitoring-related tools and they needed every single one. Instead, worry about tooling overload only if you have a bunch of tools that all do the exact same thing for the same purpose.
Do you need monitoring tools to integrate?
MJ: In my view, there are few use cases where you need all that monitoring data in one place. If I have a bunch of teams running separate infrastructure, I don’t really need to correlate the data across infrastructures. If I have one or two apps on a shared infrastructure, it’s small-scale enough that I can easily keep the data in the same place. This concern about data aggregation only seems to pop up when you’re talking about disparate apps and infrastructures, and you’re not going to get any real value of aggregating that data anyways. That said, there are certain data sets which are very useful to pull together, like availability information, and you can pull that out and put it into a separate database.
Thanks for all the great insights, Mike! If you’re interested in learning more about monitoring, pick up a copy of Practical Monitoring. Also, listen to Mike’s recent podcast, The Exact Opposite of a Job Creator, for great observations on the current state of monitoring.
Next Steps
- If you’re looking for a practical monitoring and event management solution, be sure to check out OpsRamp’s Unified Service Intelligence.
- Read our related blog post, Unified Service Intelligence: Your Secret Weapon for Digital Transformation.
- Watch our recent webinar with 451 Research on Rethinking Monitoring in the Era of IT Operations as a Service.
- Check out our SlideShare on the subject, too.
- If you would like to schedule a demo to see OpsRamp in action, contact us today.