Cloud cost optimisation has become one of the most important parts of managing your cloud infrastructure. But managing your cloud costs isn’t easy and often there is the drive to do something quickly which can lead to issues further down the line.
Having delivered many cloud cost optimisation projects over the years, we have seen this first-hand. In our experience, three mistakes are commonly made. These mistakes stop you achieving the results you want and may even cause additional problems, such as a slower application or stability issues. They could also be costing your organisation thousands of pounds or dollars a day.
So, what do you do about it? We have explored the mistakes and have suggested solutions based on our experience.
Mistake #1 – Relying on discounts
Discounts are good, right? Well, yes, but when it comes to cloud you need to plan your discounting carefully, because discounts typically deal with the symptoms of rising costs without addressing the cause.
One of the biggest mistakes we see is that when organisations start a cost optimisation initiative, they focus on quick results to show they are making progress. They do this by agreeing a discount programme with their cloud provider and then implementing other available discounts, such as Savings Plans and Reserved Instances, to reduce the monthly spend even more.
This seems like common sense and on the face of it delivers results with your daily spend dropping immediately. Job done, right? Not quite. As mentioned above, this approach is addressing the symptoms and not the cause.
In fact, you could have just cost your organisation tens or hundreds of thousands in lost savings. How, you may ask? Let me explain.
First: Let’s understand how your discount plan works. To be given a discount, you need to agree to a minimum annual spend, which will be based on your current spend and will be expected to grow over the period of the discount plan, usually 3-5 years. The agreed spend for the first year will usually be a little below your current anticipated spend. So, for instance, if your spend is $1.2M, the agreed spend might be $1M in year 1, $1.4M in year 2 and $2M in year 3, and for that agreed spend the cloud provider will apply a discount of, say, 10% across the majority of your spend.
The problem arises if you agree to a minimum spend just before you embark upon a cost optimisation exercise. We typically see cost savings of around 30%, so if your spend is $1.2M before the exercise, it will reduce to $840K once you have completed the optimisations.
As you can see, this presents a problem. You have already agreed to spend at least $1M, so rather than saving your organisation $120K (10% of $1.2M) as you originally thought, you have committed it to spending at least $160K more than the $840K it actually needs. And it becomes worse in years 2 and 3 as the minimum spend increases.
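The arithmetic in this example can be reworked as a short Python sketch. The function name and parameters are illustrative only, and whether a provider measures the commitment before or after the discount is an assumption that changes the result, so both cases are shown.

```python
def commitment_shortfall(current_spend, saving_rate, committed_spend, discount_rate=0.0):
    """Return how far the optimised spend falls below the committed minimum.

    Returns 0.0 if the optimised spend still meets the commitment.
    discount_rate is applied to the optimised spend; leave it at 0.0 if the
    commitment is compared against undiscounted spend (an assumption).
    """
    optimised = current_spend * (1 - saving_rate)
    billed = optimised * (1 - discount_rate)
    return max(0.0, committed_spend - billed)


# Figures from the example: $1.2M spend, 30% saving, $1M year-1 commitment.
before_discount = commitment_shortfall(1_200_000, 0.30, 1_000_000)
after_discount = commitment_shortfall(1_200_000, 0.30, 1_000_000, discount_rate=0.10)

print(f"Shortfall vs commitment (undiscounted): ${before_discount:,.0f}")
print(f"Shortfall vs commitment (10% discount applied): ${after_discount:,.0f}")
```

Either way, the committed minimum sits well above what the optimised environment needs, which is the point the example is making.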
Although initially counterintuitive, and most likely not recommended by the cloud providers, a much better approach is to enter into negotiations only once you understand the revised spend you expect to have after all your planned optimisations are made.
Second: Let’s look at other discount programmes such as Savings Plans and Reserved Instances. The big mistake here is to cover too much of your environment before you have undertaken reasonable optimisation work. In this case, you can reach the point where, once you start rightsizing, over 100% of your available infrastructure is covered by the plan, so any further savings are completely lost.
The best approach is the following:
- Get coverage for around 50% of the available infrastructure so that some savings are being made
- Go through the rightsizing process which will hopefully bring your coverage up to 70-80%
- Look to add further plans to reach 90-95% during the peak periods, which will be 100% or more at less busy times
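The way coverage rises as the environment shrinks can be sketched in Python. This is a minimal illustration, assuming the commitment stays fixed while rightsizing reduces the covered footprint by roughly the 30% saving mentioned earlier; the function name is hypothetical.

```python
def coverage_after_rightsizing(initial_coverage, footprint_reduction):
    """Coverage once usage shrinks while the committed amount stays fixed.

    initial_coverage:    fraction of usage covered before rightsizing (0.5 = 50%)
    footprint_reduction: fraction by which rightsizing shrinks usage (0.3 = 30%)
    """
    return initial_coverage / (1 - footprint_reduction)


# Starting at 50% coverage leaves headroom after a 30% reduction.
print(f"{coverage_after_rightsizing(0.50, 0.30):.0%}")

# Starting at 80% coverage ends above 100%, so part of the commitment is unused.
print(f"{coverage_after_rightsizing(0.80, 0.30):.0%}")
```

Under these assumptions, a 50% starting point lands in the 70-80% range, while a high starting point overshoots 100% and the further savings are lost.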
Mistake #2 – Using only cloud cost optimisation tools
Surely this is a no-brainer, isn’t it?! Using a tool that will save you money and that is free, in that CSPs (Cloud Service Providers) take their fee from the money saved, must surely be the way to have your cake and eat it?
Well, it is true that the tools will find some savings, but they only solve part of the problem. In addition, the savings the tools do find are usually the ones that are not too hard to find yourself. So, perhaps with a little bit of scripting and some automation, you could identify the main savings, do the rightsizing, and make 100% of the savings rather than, say, 80% of them. An example from our own experience: one of our clients used AWS Trusted Advisor for optimisation. The tool identified three areas of potential saving. However, when we applied our own methodology, we found 33 areas, which shows you cannot rely on tools alone.
Another area to consider is that cloud providers are shutting down loopholes that cost optimisation companies used to make substantial savings. A recent example is where AWS (Amazon Web Services) stopped you being able to switch Reserved Instances (RIs), so companies that bought a lot of RIs and then switched multiple companies' instances between them to obtain the highest discounts are no longer able to use that approach.
We are not saying that you should not use these tools. They are point solutions to deal with specific areas of opportunity. But you should not rely on them to provide the substantial savings. They are more likely to be useful in mopping up any savings you have missed. They are also not able to spot certain types of savings, e.g. when consolidation is possible. One such case is combining the workloads of multiple small instances with varying workload profiles onto one larger instance: because the peaks in usage do not coincide, the new instance type is much more cost effective.
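The consolidation point can be illustrated with a small Python sketch. The workload figures below are invented purely for illustration (hypothetical vCPU demand per time slot); the idea is to compare the capacity needed when each workload is sized for its own peak against the capacity needed for the combined load.

```python
def sum_of_peaks(workloads):
    """Capacity needed if every workload is sized separately for its own peak."""
    return sum(max(samples) for samples in workloads)


def peak_of_sum(workloads):
    """Capacity needed if the workloads share one instance.

    Assumes every workload has a sample for the same time slots.
    """
    return max(sum(slot) for slot in zip(*workloads))


# Hypothetical vCPU demand for three workloads over four time slots.
workloads = [
    [8, 2, 1, 1],
    [1, 7, 2, 1],
    [1, 1, 6, 2],
]

print("Sized separately:", sum_of_peaks(workloads))
print("Consolidated:    ", peak_of_sum(workloads))
```

Because the peaks fall in different slots, the combined peak is far lower than the sum of the individual peaks, which is the saving a per-instance tool would not surface.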
Mistake #3 – Using the metrics in the wrong way
This happens frequently. There are so many metrics collected by so many different tools that it is easy to collect the wrong metric in the wrong way. In fact, it is not just making sure the correct metrics are used but also ensuring they are analysed over the correct time periods.
What we see a lot of people do is extract CPU and memory stats for their busiest day of the month, then look at the peak of that day, which is often measured at 10 or 20 second intervals. This means that instances are being sized based on the busiest 10 or 20 seconds of usage in the month. This is often then compounded by using the same instance type for that workload across all applications and all regions where they are run.
When you start looking at the detail, it is obvious that this is overkill, and what it leads to is whole systems where average usage is below 20% and peak hour usage is 30 to 50%.
Whilst understanding what the very peak usage is can be useful, it is not a metric that should be used for sizing your instances. A far better approach is to look at both the hourly and 5-minute peak periods to determine the instance size that will be required. By using this approach for each application, and each region where it is run, you can be sure that you have defined the ideal size to accommodate the workloads it needs to support.
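As a rough sketch of the difference between these metrics, the Python below takes one hour of 10-second CPU samples and compares the raw peak with the peak 5-minute and hourly averages. The sample data is invented for illustration, and using non-overlapping windows is a simplifying assumption rather than a prescribed method.

```python
from statistics import mean


def windowed_peak(samples, samples_per_window):
    """Highest average over consecutive, non-overlapping windows.

    Assumes samples is non-empty and evenly spaced.
    """
    windows = [
        samples[i:i + samples_per_window]
        for i in range(0, len(samples), samples_per_window)
    ]
    return max(mean(window) for window in windows)


# Hypothetical hour of CPU % at 10-second intervals (360 samples):
# a 20% baseline, one 10-second spike to 95%, and a 5-minute burst at 50%.
samples = [20.0] * 360
samples[100] = 95.0
samples[180:210] = [50.0] * 30

print("Raw 10-second peak:", max(samples))
print("Peak 5-minute avg: ", windowed_peak(samples, 30))
print("Hourly avg:        ", round(windowed_peak(samples, 360), 1))
```

In this made-up data, sizing on the raw peak would provision for a load that lasted ten seconds, whereas the 5-minute and hourly figures give a much lower and more representative requirement.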
Moving forward
We recommend following these steps to get the most out of your cost optimisation efforts and avoid these common mistakes:
- Review all your infrastructure and rightsize using the correct metrics
- Once rightsized, set up your discount usage plans (such as Savings Plans and Reserved Instances), aiming to cover 90% of your peak usage
- Once all your usage plans are in place, negotiate a discount plan with your cloud provider based on your new annual spend and your expected growth over the coming years
- Only then look at using tools to mop up any other savings that you have missed
By doing this you can be sure you are not missing any easy savings and that you are in an advantageous position to implement an ongoing cost optimisation programme to ensure costs do not get out of control again.
To find out more about our approach to cloud cost optimisation, or to take a deeper dive into how to make it work for your organisation, download our latest whitepaper.