Why Quick Wins in the Cloud May Not Be the Long-Term Solution You’re Looking For

Overspending on cloud services runs into the billions each year. Quick wins, such as applying basic housekeeping practices and discounts, bring rapid results and secure smaller bills, at least for next month. But when teams go back to business as usual, the short-term benefit from picking the low-hanging fruit swiftly evaporates. There are three distinct levels of cloud optimisation, and while taming the cloud requires rigour, the rewards are significant enough to justify the effort.

The perils of the quick win

The journey to cloud optimisation usually begins when costs have already spiralled, and shock can set in when a full review reveals the current spend. When this happens, it’s important to regain control as soon as possible.

As mentioned, we think of cloud optimisation in terms of three levels. The first involves what might be described as the easy elements: applying discounts, turning off unused servers, removing unattached disks and unnecessary snapshots, and moving to the latest instance generation, for example. This is good housekeeping. Using reserved rather than on-demand instances wherever possible, tagging absolutely everything and keeping tight control of production processes make a great start.
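As a rough illustration of the housekeeping checks listed above, here is a minimal Python sketch that scans an exported resource inventory and flags candidates for review. The `Resource` fields, the 2% idle-CPU threshold and the 90-day snapshot age are assumptions made for this example, not values from any provider or from our own tooling; a real inventory would come from your cloud provider's APIs or billing export.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Resource:
    # Hypothetical inventory record; field names are illustrative only.
    resource_id: str
    kind: str                                # "server", "disk" or "snapshot"
    tags: dict
    attached_to: Optional[str] = None        # disks: server the disk is attached to
    avg_cpu_percent: Optional[float] = None  # servers: average CPU over the review period
    created: Optional[date] = None           # snapshots: creation date


def housekeeping_findings(resources, today, idle_cpu_percent=2.0,
                          snapshot_max_age_days=90):
    """Return (resource_id, reason) pairs worth a manual review."""
    findings = []
    for r in resources:
        if not r.tags:
            findings.append((r.resource_id, "untagged"))
        if r.kind == "disk" and r.attached_to is None:
            findings.append((r.resource_id, "unattached disk"))
        if (r.kind == "server" and r.avg_cpu_percent is not None
                and r.avg_cpu_percent < idle_cpu_percent):
            findings.append((r.resource_id, "possibly unused server"))
        if (r.kind == "snapshot" and r.created is not None
                and (today - r.created).days > snapshot_max_age_days):
            findings.append((r.resource_id, "old snapshot"))
    return findings
```

The function only reports candidates; deciding whether a quiet server is genuinely unused, or whether an old snapshot is still needed, remains a human judgement.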

But it takes discipline to maintain these savings. The battle is not won yet: because the business won’t stand still while the optimisation process is underway, it’s not immediately obvious whether additional growth in cloud usage is adding any real value.

Going beyond simple optimisation and fixing growth

Level 2 is the medium degree of optimisation. It involves looking at the use of spot instances, autoscaling, rightsizing, lower-performance disks and the rightsizing of oversized test environments, among other considerations. This level of optimisation is more complex than the first.

The third level goes even further and involves digging down to the level of the code, optimising the technical architecture and running performance vs cost trade-offs. At this level, the way the actual product consumes cloud resources also requires optimising.

But it’s a valuable process, and our own analysis shows that cost savings are cumulative. Typically, Level 1 brings a 30% cost saving, Level 2 takes an organisation to 50%, and Level 3 brings it to a 70% overall saving. Embarking on an optimisation path means getting close to the real picture of your cloud requirements and, consequently, paying for what you need rather than what you have ended up with.

To achieve optimisation goals, it’s important to measure what is currently happening in the cloud. Public clouds come with their own toolsets, such as AWS Cost Explorer for monitoring costs, and commercial tools such as Cloudability either complement the native ones or perform better than them. Cloud optimisation requires capturing specific metrics that are beyond the scope of inbuilt dashboards, so selecting the right tool for the right job is essential.
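To make the cumulative percentages quoted above concrete, the short Python sketch below applies them to a monthly bill. The 30%, 50% and 70% figures are the typical values stated in this article; the function names and the £100,000 baseline in the usage note are assumptions chosen purely for illustration.

```python
# Typical cumulative saving reached at each level, as a percentage of the
# original (pre-optimisation) bill. Figures are the ones quoted in the article.
CUMULATIVE_SAVING_PERCENT = {1: 30, 2: 50, 3: 70}


def remaining_bill(baseline, level):
    """Monthly bill left after reaching the given optimisation level."""
    return baseline * (100 - CUMULATIVE_SAVING_PERCENT[level]) / 100


def incremental_saving(baseline, level):
    """Extra monthly saving contributed by this level on top of the previous one."""
    previous = CUMULATIVE_SAVING_PERCENT.get(level - 1, 0)
    return baseline * (CUMULATIVE_SAVING_PERCENT[level] - previous) / 100
```

For a hypothetical baseline of £100,000 per month, `remaining_bill(100_000, 3)` gives £30,000, and `incremental_saving(100_000, 2)` shows that Level 2 contributes a further £20,000 on top of the £30,000 already saved at Level 1.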

Changing mindset is key to optimising cloud infrastructure 

Armed with the right tools, optimising a cloud infrastructure also means thinking in a joined-up way about cost, performance and capacity. Level 2 or 3 can only be reached by developing the accompanying mindset: thinking of cost in the same way as performance and reliability. Saving £1,000 per month on a service here and another £1,500 on a service there may not seem like much in isolation, but these savings add up when applied universally and accumulate over time.

Another concept understood by companies that successfully optimise their cloud is that extra capacity doesn’t translate into better performance and reliability. Rightsizing will typically provide the bulk of the benefits with no impact on performance.
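One simple way to picture rightsizing is to pick the smallest instance size that still covers observed peak demand plus a safety margin. The Python sketch below does this for CPU only. The size catalogue, the prices and the 30% headroom are hypothetical values invented for the example; a real exercise would also weigh memory, disk and network, and would use your provider's actual instance families.

```python
# Hypothetical catalogue, ordered from smallest to largest:
# (name, vCPUs, price per hour in pounds). Not real provider data.
SIZES = [
    ("small", 2, 0.10),
    ("medium", 4, 0.20),
    ("large", 8, 0.40),
    ("xlarge", 16, 0.80),
]


def rightsize(current_vcpus, peak_cpu_percent, headroom=0.30):
    """Return the name of the smallest size covering peak CPU demand plus headroom.

    peak_cpu_percent is the highest utilisation observed on the current
    instance over the review period, on a 0-100 scale.
    """
    needed_vcpus = current_vcpus * (peak_cpu_percent / 100) * (1 + headroom)
    for name, vcpus, _price in SIZES:
        if vcpus >= needed_vcpus:
            return name
    # Nothing in the catalogue is big enough: fall back to the largest size.
    return SIZES[-1][0]
```

For instance, a 16-vCPU server that never exceeds 20% CPU needs roughly 4.2 vCPUs including headroom, so the sketch recommends `"large"` (8 vCPUs), which halves the hourly price in this made-up catalogue.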

When incidents occur, and they often do, it’s fine to add capacity to resolve them. But once they are resolved, revisit the scene of the accident and remove the unneeded capacity. If one team is responsible for provisioning storage and compute while another is responsible for fixing incidents, capacity drift is likely to happen over time.

All of these things are part of the mindset required to achieve the highest levels of optimisation and, of course, when 70% cost savings are on the table, this isn’t something to ignore.

Optimisation isn’t only about the cost

Scalability is often assumed of the cloud, but in reality applications and services need to be designed to take advantage of the cloud’s inherent elasticity. While peak loads are for the most part predictable, developing solutions that can scale on demand when unexpected peaks hit is another important aspect that has to be decided and planned in advance. Taking advantage of this elasticity is part of the overall cloud optimisation process, and another tangible result of taming the unruly cloud.
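Scaling on demand can be sketched with a simple proportional rule, of the kind documented for the Kubernetes Horizontal Pod Autoscaler: grow or shrink the fleet in proportion to how far the observed load per instance is from a target, within fixed limits. The function below is a minimal illustration under those assumptions; the parameter names, the 60% target in the usage note and the minimum and maximum bounds are hypothetical, and production autoscalers add cooldowns and smoothing that are omitted here.

```python
import math


def desired_instances(current, observed_load, target_load, minimum=1, maximum=20):
    """Proportional scaling rule, clamped to [minimum, maximum].

    observed_load and target_load are per-instance averages of the same
    metric, for example CPU utilisation as a percentage.
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    wanted = math.ceil(current * observed_load / target_load)
    return max(minimum, min(maximum, wanted))
```

With four instances running at 90% CPU against a 60% target, the rule asks for six instances; when load falls to 30%, it scales back to two, which is where the cost benefit of elasticity comes from.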

Ultimately, cloud optimisation is a data-driven process. Specific data skills, such as being able to efficiently collect, analyse, model and predict demand and responses, are vital. Along with this, it’s important to have a clear picture of what an ideal cloud actually looks like: a blueprint that serves as a beacon for success, drawn from years of practice and experience of putting things right when they get a little out of hand.
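As a very small example of the "model and predict demand" step, the Python sketch below fits a straight-line trend to past demand with ordinary least squares and extrapolates it. This is a deliberately simple stand-in, not the modelling approach we use in practice: real demand usually has seasonality and step changes that a linear trend ignores. The function name and the sample figures in the usage note are assumptions for illustration.

```python
def linear_forecast(history, periods_ahead):
    """Extrapolate a least-squares straight line fitted to equally spaced history.

    history is a list of past demand values (oldest first); the result is the
    fitted value periods_ahead steps after the last observation.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Ordinary least squares slope and intercept.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)
```

Given monthly peaks of 100, 110, 120 and 130 requests per second, `linear_forecast([100, 110, 120, 130], 2)` projects about 150 two months out, a figure that can then feed capacity and cost planning.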

If you would like to talk about optimising your cloud bill, feel free to reach out to me for a no-commitment chat. You can contact me via the website at https://www.capacitas.co.uk/book-a-diagnostic-session or by email at contact@capacitas.co.uk.

It is also worth having a look at some of our recent case studies, including Cegid and JAGGAER, where we have saved our clients millions of pounds in cloud spend.
