Top Weekly Reads in IT I&O 

The OpsRamp Monitor is OpsRamp’s top weekly review of interesting developments and emerging trends in IT operations. Subscribe to our blog for the latest and greatest in monitoring, DevOps, AIOps and cloud computing and stay on top of everything Ops. 

In this issue:

  • The large salaries of SREs
  • Why looking for full-stack engineers may not be smart
  • Day in the life of an IT ops pro
  • Solid arguments for multi-cloud

It’s been a week. Four U.S. presidential contenders dropped out of the race. Coronavirus got worse.  Several major tech conferences have been canceled. And the stock market rollercoaster ride continued.  Let’s refresh with some good news: your career.

SREs make coin. The Site Reliability Engineer role requires a diverse technical skillset, as explained by Google’s VP of Engineering, Ben Treynor, who is credited with inventing the role. According to the author of this Gremlin blog, who compiled data from several reliable sources, the median salary of an SRE is $236,000, while the high is $450,000. Starting compensation for a junior SRE averages $75,000, and the typical salary in places like Phoenix and Atlanta ranges from $120,000 to $160,000, according to Gremlin’s analysis.

The job of an SRE is no cakewalk: you’ve got to work with developers and Ops teams, and you don't neatly fit in with either of these groups. You need to use your top-notch software engineering skills to manage all kinds of performance challenges with finesse. But you’ll have a nice standard of living once you’ve got some experience.

To make sure this is the right pathway for you, read this Hackernoon article by an SRE. She’s got a handy checklist to see if you’re a good fit for the role: “Do you enjoy looking at a terminal for large amounts of time? Are you comfortable with the idea of being ‘on-call’ in which you are likely to be in a high-stakes scenario where something needs to be fixed?” She also offers several ideas for how to develop SRE skills.

DevOps teaming tips. We keep hearing that DevOps is a real pain to implement in the real world. And it's because people are difficult. There, I said it. But maybe we need to stop blaming others and stop looking for the magical DevOps wizard to save the day. Instead, maybe we need to think harder about teams and expectations.

This IT Revolution report, "Full Stack Teams, Not Engineers," authored by some really smart people from Disney and elsewhere, argues that the idea of hiring a full-stack engineer is flawed. That would be one person, if such a magical person exists, who is an expert in “front-end developer technologies, back-end middleware, databases, networking and storage configurations, and security concerns.” It's not about making one person responsible for everything (a true burn-out job) but about creating the right team of people with those requisite skills: “A full-stack team has the combined skills across its members to effectively design, build, deploy, and operate software throughout all development cycles of their deliverables. Moving to full-stack teams helps us deliver the full-stack advantage to our organizations without the challenges of recruiting, developing, and sustaining full-stack developers or engineers.” This report has a lot of examples of exactly how to do this without creating silos. I’m no engineer, but it makes sense to me.

Here’s someone who understands you, IT operations person. To round out the focus on very important roles in IT today, read this day-in-the-life tale written by Wael Altaqi, an OpsRamp solutions consultant. Altaqi takes us through the chaotic, 14-hour day of an L4 IT operator struggling to fix a critical application outage.

He leaves us with three ideas to improve this frustrating and slow process: 

  • Seriously consider machine learning platforms for alert and event correlation.
  • Restructure relic processes designed for mostly static infrastructure and applications.
  • Reconsider the traditional siloed approach for IT Ops monitoring and alerting.

Multi-cloud is not just about saving money. There are some solid business reasons for having a multi-cloud strategy, and the primary driver, according to a recent survey of more than 900 IT and dev people, is to leverage different application services like big data, AI, and IoT. That driver was cited by 31% of participants, followed closely by guaranteed availability (29%) and managing costs (23%). Avoiding vendor lock-in was cited by 54% of respondents as an important benefit of multi-cloud, more so than having truly abstracted infrastructure (46%). Despite all these positives, more than 50% said that they struggle with culture change due to new technologies and with the complexity of managing hybrid and multi-cloud environments. The survey also reports plans for containers and serverless computing, both of which are growing.

Having deployment issues? There are so many things to worry about and monitor in IT today, and deploying software in the cloud is certainly one of them. Now we’ve got Gandalf, not to be confused with the Lord of the Rings character. Gandalf is an intelligent service for making sure that you have a safe deployment in the cloud, specifically Azure. It appears that Microsoft Research, which creates many cool things that never see the light of day, created this nifty tool for speeding up deployments. Gandalf, as described in the blog of Adrian Colyer, a Venture Partner with Accel in London, is great for “observing a problem and then connecting it back to a given deployment.” He continues: “Gandalf analyses more than 20TB of data per day: 270K platform events on average (770K peak), 600 million API calls, with data on over 2,000 different fault types. If Gandalf doesn’t like what that data is telling it, it will pause a rollout and send an alert to the development team.” This seems like something worth checking out if you’re working in Azure.
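For the curious, here is a rough idea of what "pause a rollout if the data looks bad" could mean in practice. This is only a toy sketch in Python and not how Gandalf actually works: the names (FaultWindow, spiking_fault_types, rollout_decision), the sample fault types, and the thresholds are all made up for illustration. It compares fault counts from a window before a deployment with a window after it and flags fault types that jumped sharply.

```python
from dataclasses import dataclass


@dataclass
class FaultWindow:
    """Fault counts per fault type over one fixed time window (hypothetical)."""
    counts: dict[str, int]


def spiking_fault_types(before: FaultWindow, after: FaultWindow,
                        ratio_threshold: float = 3.0,
                        min_faults: int = 10) -> list[str]:
    """Return fault types whose count rose sharply after the deployment.

    Both thresholds are illustrative assumptions, not values from Gandalf.
    """
    suspicious = []
    for fault_type, after_count in after.counts.items():
        # Ignore fault types that are too rare to be meaningful.
        if after_count < min_faults:
            continue
        # Treat a missing or zero baseline as 1 to avoid dividing by zero.
        baseline = max(before.counts.get(fault_type, 0), 1)
        if after_count >= ratio_threshold * baseline:
            suspicious.append(fault_type)
    return sorted(suspicious)


def rollout_decision(before: FaultWindow, after: FaultWindow) -> str:
    """Return "pause" if any fault type spiked, otherwise "continue"."""
    return "pause" if spiking_fault_types(before, after) else "continue"


if __name__ == "__main__":
    before = FaultWindow({"vm_start_timeout": 4, "disk_io_error": 20})
    after = FaultWindow({"vm_start_timeout": 40, "disk_io_error": 22,
                         "new_fault": 3})
    print(spiking_fault_types(before, after))
    print(rollout_decision(before, after))
```

A real system would also have to work out whether the deployment actually caused the spike, which is the hard part Colyer's post describes. A simple before-and-after ratio like this one would raise plenty of false alarms.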
