The over-commitment of infrastructure capacity is a wasteful and costly problem that remains prevalent today. Whilst over-allocation provides confidence that unforeseen demands can be accommodated and performance protected, it is nonetheless undesirable in normal situations. Consider the costs that accrue where over-provisioning is standard policy. Capacity planners aspire to retain tight control over provisioning to ensure that the service capacity requirements, and no more, are accommodated. However, for various reasons it may not be viable to achieve accurate, right-sized provisioning. Typical reasons include:
- Capacity teams may have little visibility of provisioning configuration applied by the platform support teams.
- Capacity teams are unable to influence how capacity configurations are applied.
- Organisations lack the tools for capacity monitoring and analysis.
- Platform administrators have no faith in dynamic capacity configurations and tend to err on the safe side by allocating dedicated or ‘capped’ capacity configurations. This applies to shared platforms.
- There is an increasing tendency to rely on the infrastructure to automatically provision additional capacity.
- There is an increased tendency to use platform technology to move services around different parts of the platform fabric to deliver service performance. An example of this is the vMotion capability of VMware.
The term ‘Capacity Provisioning’ has its roots in the data storage world, where “thin provisioning” was used to allocate disk capacity automatically so that growth of SAN data was accommodated. I believe this is a somewhat narrow application. Extended more broadly, Capacity Provisioning ensures both effective and efficient capacity allocation, so that the infrastructure components of IT services get sufficient capacity at the right time.
To achieve tight control over capacity provisioning, it is necessary to extend the influence of the capacity team beyond the standard capacity planning processes of forecasting, modelling and reporting. This is particularly relevant in larger organisations where infrastructure is regarded as a commodity and is highly shared. Examples of technology platforms used include IBM I-Series & P-Series, Sun E25K, HP Integrity Superdome, Z/OS, SAN Disk, VMware ESX hardware, etc.
To achieve the best outcome, the process of capacity allocation at service implementation should be managed by the capacity team in partnership with the platform resources allocated to the project. New requirements and changes are mutually understood so that there is no ambiguity. Platform teams are then instructed to apply the agreed capacity configuration at the correct time. Active configuration data should preferably be populated automatically in the CMIS alongside the other capacity data facets. In this way it is possible to ensure that:
- Changes to configuration are understood and can be tracked and reported regularly by the capacity team.
- Workload resource consumption and service performance can be reported relative to the capacity configuration. This information is shared.
- The actual capacity consumption of the hosting platform is known and reportable. This information is shared.
- The balance of host resources can be tactically managed between guest services.
- The forecast requirements of workload, guest, and hosting environment can be modelled and future requirements determined.
- Reconfiguration of capacity specifications for guest services can be planned in the context of usage, forecast need and new requirements.
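As an illustration of reporting consumption relative to the capacity configuration, the following is a minimal sketch only. It assumes that configuration and usage data for each guest can be extracted from the CMIS into a simple record. The `GuestSample` structure, its field names and the 25% threshold are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GuestSample:
    """Hypothetical record joining configuration and usage for one guest."""
    guest: str
    entitled_cores: float      # capacity configured for the guest
    used_cores: list[float]    # measured consumption, one value per interval


def utilisation_report(guests, low_threshold=0.25):
    """Report peak and mean utilisation of each guest relative to its entitlement."""
    rows = []
    for g in guests:
        peak = max(g.used_cores) / g.entitled_cores
        avg = mean(g.used_cores) / g.entitled_cores
        rows.append({
            "guest": g.guest,
            "peak_util": peak,
            "mean_util": avg,
            # Assumed rule: a guest whose peak stays below the threshold
            # is a candidate for a smaller entitlement.
            "over_provisioned": peak < low_threshold,
        })
    return rows


def host_peak_consumption(guests):
    """Peak of the summed guest consumption, interval by interval."""
    totals = [sum(vals) for vals in zip(*(g.used_cores for g in guests))]
    return max(totals)


guests = [
    GuestSample("billing", 8.0, [0.5, 1.0, 0.75]),
    GuestSample("web", 4.0, [2.0, 3.0, 3.5]),
]
report = utilisation_report(guests)
host_peak = host_peak_consumption(guests)

for row in report:
    print(row)
print("host peak cores:", host_peak)
```

In this made-up data the `billing` guest peaks at one eighth of its entitlement and would be flagged for review, whereas `web` would not. A real report would use the organisation's own thresholds and measurement intervals.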
The other element of ensuring that capacity provisioning is viable is to ensure that all emerging capacity requirements are channelled through a single point of control. How this process operates is largely decided by the organisation concerned, as it impinges on change control. The characteristics of this process are as follows:
- The capacity requirements are identified and captured at an early point in the project lifecycle.
- The capacity planner approves the changes as part of process governance. It is not necessary for the capacity planner to implement these changes.
- The capacity planner is able to adjust the capacity requirements. For example, the number of dedicated CPU cores is reduced whilst ensuring that the guest can expand to high dynamic limits to accommodate peak requirements, thereby improving sharing of the hosting infrastructure overall.
- The changes feed into the capacity planning cycle.
- The process is understood by all parties impacted.
- The changes are reviewed regularly to assess if the capacity resources are provisioned correctly. Any excesses in provisioning may be reallocated to other services.
- The single point of control establishes the total aggregate resource demand for a given entity. This is not only important for capacity planning but also provides an opportunity to obtain better discounts from hardware suppliers.
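The adjustment and aggregation steps described above can be sketched as follows. This is an assumption-laden illustration rather than a prescribed method: the `CapacityRequest` record, the `right_size` rule (dedicated cores sized to the forecast mean, dynamic limit sized to the forecast peak plus headroom) and the 25% headroom figure are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityRequest:
    """Hypothetical capacity request raised by a project."""
    service: str
    requested_dedicated_cores: int
    forecast_mean_cores: float
    forecast_peak_cores: float


def right_size(req, headroom=1.25):
    """Trim the dedicated entitlement and cover the peak with a dynamic limit."""
    # Dedicated cores cover typical load, and never exceed what was requested.
    dedicated = min(req.requested_dedicated_cores,
                    max(1, math.ceil(req.forecast_mean_cores)))
    # The dynamic upper limit covers the forecast peak plus assumed headroom.
    dynamic_max = max(dedicated, math.ceil(req.forecast_peak_cores * headroom))
    return {"service": req.service,
            "dedicated": dedicated,
            "dynamic_max": dynamic_max}


def aggregate_demand(allocations):
    """Total dedicated cores across all approved requests."""
    return sum(a["dedicated"] for a in allocations)


requests = [
    CapacityRequest("payments", 8, 1.5, 5.0),
    CapacityRequest("reporting", 4, 0.5, 2.0),
]
approved = [right_size(r) for r in requests]
total_requested = sum(r.requested_dedicated_cores for r in requests)
total_dedicated = aggregate_demand(approved)

print(approved)
print("requested:", total_requested, "approved dedicated:", total_dedicated)
```

Because every request passes through one function, the aggregate figure falls out of the same data that drives approval. In practice the sizing rule would be agreed with the platform teams rather than fixed in code.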
A lack of configuration visibility, tools, change control and integration with the platform teams, in respect of both configuration control and capacity planning, undoubtedly leads to problems and wastage. All of these facets need to be in place, with capacity provisioning controls being a significant element.
Here are some real-world examples where the absence of capacity provisioning has been a significant problem:
- In one organisation the platform team insisted on allocating IBM P-Series LPARs with capped capacity entitlements, which resulted in a peak-average guest utilisation of approximately 13% across the P-Series AIX UNIX estate. The introduction of change control and tightly coupled collaboration between the capacity and platform teams resulted in the following benefits:
  - Full adoption of the capacity planning process, with the capacity team taking complete provisioning control of shared infrastructure.
  - Significantly increased ‘sweat’ of the processor cores, well above the baseline 13% peak-average.
  - Use of dynamic LPARs with extensible boundaries, and shared LPARs for small services.
  - Greater visibility of AIX capacity requirements in the procurement cycle and an improved discount per CPU core.
  - Increased confidence in the capacity process.
- An organisation with a large shared SAN fabric experienced proliferation of SAN disk space allocation as a consequence of over-sizing, over-allocation, excessive data replication, inappropriate RAID strategies and other factors. The following benefits were realised once capacity provisioning controls were introduced:
  - Honed project disk capacity sizings that were more in tune with the business requirements.
  - Significantly improved visibility of emerging project requirements, offering the opportunity to intervene at an early point before SAN designs were signed off.
  - Improved infrastructure standards for disk provisioning.
  - Capacity reporting and forecasting.
  - Significant disk reclamation from the SAN estate by identifying and removing redundant data.
  - Cost savings as a consequence of procurement deferral.