IT metrics to ensure speed and quality for your business
“Power is nothing without control” was the slogan of a well-known company that I worked for almost 20 years ago. I saw it everywhere, liked it, and couldn’t get it out of my head. Perhaps for this reason, I gladly accepted the challenge of organizing MailUp’s implementation of a set of IT metrics that are well known among specialists and are based on the balance of speed and quality.
What are IT metrics, and what are they for?
IT metrics help control and improve processes by measuring significant variables in the information technology world. As with all complex processes, the skill lies in selecting a limited number of measurable quantities. These not only serve as good indicators of our processes and our product’s performance but also let us improve both through targeted interventions that change their values.
Complicated? A bit, since it’s not so easy to identify the best metrics for this type of active control. After trying out some internally defined metrics, we decided to adopt those outlined in “Accelerate: The Science of Lean Software and DevOps”. These have also been among the main topics in the State of DevOps Annual Reports since 2012.
Measure to improve: how IT metrics accelerate business performance
Following years of extensive research involving thousands of companies, the authors of the aforementioned publications affirm that there’s a direct correlation between excellent IT performance and equally good business results for companies in the digital sector. These studies identified 5 main metrics on which all of the top IT performers (the largest, most avant-garde companies) achieve excellent results. Companies that have adopted these same indicators and managed to match the values of the “elite performers” on all 5 metrics have grown far faster than the others, which confirms how effective and important it is to implement a set of IT KPIs. In short, excelling in technology outcomes becomes an enabling factor for accelerating performance at the business level.
IT metrics to ensure speed and quality for your business
The 5 IT metrics we’ve talked about so far are based on two really simple concepts: speed and quality. The message is as follows: deliver value to customers quickly, continuously, and frequently. However, the faster you go, the more you risk making mistakes, so you also need to keep an eye on the frequency of errors and the time required to restore service after a malfunction.
By translating this concept into numbers, we get the 5 measurable quantities as metrics:
- Lead Time For Changes (LTFC): average time from the start of work on a change to its availability to the customer (“in production”);
- Deployment frequency: how often new features or updates are released in production;
- Change Failure Rate (CFR): share of production updates that were made to recover from a problem caused by a recent update;
- Mean Time To Recover (MTTR): average recovery time after a serious production problem has impacted customers; and
- Availability: percentage of time the platform services were fully available (obtained by subtracting the sum of all recovery times from the total time in the period).
MailUp’s IT metrics
At MailUp, we’ve been using these metrics since September 2020. We’ve done so through automatic measurement tools and by setting up a continuous process where:
- we monitor progress;
- we set goals; and
- we identify actions to get there.
This requires attention since a sudden boost in speed can be linked to lower quality.
It’s common to forget that metrics are indirect indicators for improving a process. The key point is that measurement is not an end in itself. Rather, what matters is the impact we have on the process and on the product when we manage to make that measure change.
As for the figures, let’s see in more detail how we calculate these metrics at MailUp:
We calculate the Lead Time For Changes as the average, over the last three months, of the time a story (or task), corresponding to a Jira issue, takes to go from the start of work to publication. Here, the Atlassian Jira suite helps us out. It lets us measure the “time in status” of a story, i.e., how long an issue has stayed in each status of its workflow. The imported and aggregated data can then be viewed through a Jira Control Chart or ad hoc dashboards, like the one we made with Tibco Spotfire.
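To make the calculation concrete, here is a minimal Python sketch of a three-month average lead time computed from status transitions. It is not our actual tooling: the status names "In Progress" and "Released", the issue keys, the 90-day window, and the shape of the input data are all assumptions made for illustration, and a real Jira export will look different.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical status names; a real Jira workflow defines its own.
START_STATUS = "In Progress"
DONE_STATUS = "Released"


def lead_time(transitions):
    """Time from the first entry into START_STATUS to the last entry into
    DONE_STATUS, or None if the issue has not completed that path.

    `transitions` is a list of (timestamp, status) pairs.
    """
    starts = [ts for ts, status in transitions if status == START_STATUS]
    ends = [ts for ts, status in transitions if status == DONE_STATUS]
    if not starts or not ends:
        return None
    delta = max(ends) - min(starts)
    return delta if delta >= timedelta(0) else None


def average_lead_time_days(issues, now, window_days=90):
    """Average lead time, in days, of issues released in the last window_days."""
    cutoff = now - timedelta(days=window_days)
    durations = []
    for transitions in issues.values():
        lt = lead_time(transitions)
        if lt is None:
            continue
        released = max(ts for ts, status in transitions if status == DONE_STATUS)
        if cutoff <= released <= now:
            durations.append(lt.total_seconds() / 86400)
    return mean(durations) if durations else None


# Made-up sample data: two issues released inside the window,
# one released long before it, and one still in progress.
issues = {
    "DEMO-1": [(datetime(2021, 1, 4), "In Progress"), (datetime(2021, 1, 8), "Released")],
    "DEMO-2": [(datetime(2021, 2, 1), "In Progress"), (datetime(2021, 2, 11), "Released")],
    "DEMO-3": [(datetime(2020, 6, 1), "In Progress"), (datetime(2020, 6, 3), "Released")],
    "DEMO-4": [(datetime(2021, 2, 20), "In Progress")],
}

print(average_lead_time_days(issues, now=datetime(2021, 3, 1)))  # 7.0
```

Only DEMO-1 (4 days) and DEMO-2 (10 days) fall inside the window, so the average is 7 days. Measuring from the first start to the last release is one possible choice; issues that are reopened may call for a different rule.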
At MailUp, we also measure the Lead Time To Deploy, i.e., the time from the last change to the source code to its release in production.
For deployment frequency and CFR, we’ve integrated our automatic production release processes (deployment pipeline) with a flow that records the information relevant to the metrics in a database. A technician intervening in the production environment only needs to specify whether it’s an ordinary release, a hotfix (quick fix), or a rollback (restore of the previous version).
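As an illustration, the two figures can be derived from such a deployment log with a few lines of Python. This is a sketch under stated assumptions: the record layout, the sample dates, and the choice to count hotfixes and rollbacks against all deployments (rather than against ordinary releases only) are hypothetical and not a description of our pipeline.

```python
from collections import Counter
from datetime import date

# Hypothetical deployment log: (date, kind), where kind is the value
# the technician selects: "release", "hotfix", or "rollback".
deployments = [
    (date(2021, 3, 1), "release"),
    (date(2021, 3, 2), "release"),
    (date(2021, 3, 2), "hotfix"),
    (date(2021, 3, 5), "release"),
    (date(2021, 3, 9), "rollback"),
    (date(2021, 3, 10), "release"),
    (date(2021, 3, 12), "release"),
    (date(2021, 3, 12), "release"),
]

# Assumption: these kinds indicate recovery from a problem caused by a recent update.
RECOVERY_KINDS = {"hotfix", "rollback"}


def deployments_per_week(log, period_days):
    """Average number of production deployments per week over the period."""
    return len(log) / (period_days / 7)


def change_failure_rate(log):
    """Fraction of deployments that were hotfixes or rollbacks (None if no data)."""
    counts = Counter(kind for _, kind in log)
    total = sum(counts.values())
    if total == 0:
        return None
    recoveries = sum(counts[kind] for kind in RECOVERY_KINDS)
    return recoveries / total


print(deployments_per_week(deployments, period_days=14))  # 4.0
print(change_failure_rate(deployments))                   # 0.25
```

With 8 deployments in 14 days, of which 2 were recovery updates, the sketch reports 4 deployments per week and a CFR of 25%.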
The MTTR and the Availability are based on the definition of “incident”. Internally, we’ve defined this as a production disruption with an impact and severity above a certain threshold. For each incident, it’s mandatory to complete an “incident form” on Atlassian Jira. This records various elements like the duration, cause, effects, impacts, and type of resolution. Data from the incident forms is then extracted by an automatic process and represented graphically.
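The arithmetic behind the two figures can be sketched in Python as follows. The incident records here are invented, and the layout (a start and an end timestamp per incident) is an assumption about what could be extracted from the forms. The sketch also assumes that incidents do not overlap; overlapping incidents would be counted twice.

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (start, end) of each production disruption.
incidents = [
    (datetime(2021, 3, 3, 10, 0), datetime(2021, 3, 3, 10, 45)),     # 45 minutes
    (datetime(2021, 3, 17, 22, 30), datetime(2021, 3, 17, 23, 45)),  # 75 minutes
]


def mttr(incidents):
    """Mean Time To Recover: average incident duration (None if no incidents)."""
    if not incidents:
        return None
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)


def availability(incidents, period_start, period_end):
    """Percentage of the period not covered by incidents.

    Incident time outside the period is clipped away.
    Assumes incidents do not overlap each other.
    """
    period = period_end - period_start
    downtime = sum(
        (
            min(end, period_end) - max(start, period_start)
            for start, end in incidents
            if end > period_start and start < period_end
        ),
        timedelta(0),
    )
    return 100 * (1 - downtime / period)


print(mttr(incidents))  # 1:00:00
print(round(availability(incidents, datetime(2021, 3, 1), datetime(2021, 4, 1)), 3))
```

In this example the two incidents add up to 120 minutes of recovery time in a 31-day month (44,640 minutes), which gives an MTTR of one hour and an availability of roughly 99.73%.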
For each of these metrics, we track the trend over time. We’ve found it very useful to use the moving average over the last three or four months as a reference: it puts individual values in context and compensates for misleading elements like an isolated peak or the inertia of an evaluation window that’s too large.
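A small Python sketch shows the effect of such a moving average. The monthly values are invented, and the function name and the three-month window are illustrative choices only.

```python
def trailing_average(values, window=3):
    """Average each value with up to window-1 values that precede it."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Hypothetical monthly MTTR in minutes, with an isolated peak in the third month.
monthly_mttr_minutes = [60, 30, 210, 60, 30, 30]

print(trailing_average(monthly_mttr_minutes))
# [60.0, 45.0, 100.0, 100.0, 100.0, 40.0]
```

The peak of 210 is dampened to 100, but it keeps the average at 100 for three months before dropping out of the window. A wider window would smooth the peak further and hold on to it for longer, which is the inertia mentioned above.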
Are these metrics enough for us? They’re certainly an excellent starting point for understanding our technological state of health, although we already have two further steps on our horizon:
- integrating these metrics with other key measures like access speed to the platform pages; and
- introducing greater specificity into the existing indicators by detailing them through submetrics that allow “drill-down” (in-depth) analysis.
This brief overview pinpoints the meaning and benefits behind using a set of IT metrics to boost overall business performance.