Designing and building an effective capacity management function has never been more challenging: competition for talented staff is intense, and so is the current workload of projects and 'business-as-usual'. At the same time, salary and training budgets are often constrained.
Follow our ten-step plan for designing and building an effective capacity management function. Whatever your starting point and whatever your challenges, you will find useful techniques in this article.
Overview
The Ten Steps:
- Capacity Management Team
- Capacity Planning and Modelling
- Align Resources with Service Importance
- Build a Single Capacity Management Information System
- Use the Web for Capacity Reporting
- Publish Reports on Key Systems
- Automate Processes Where Possible
- Define Scope of the Capacity Management Function
- Establish Authority of the Capacity Management Function
- Audit the Staff's Skills and Define Training Plan
1. Capacity Management Team
There should be one centralised capacity management team that covers the proactive capacity planning and reactive capacity management of all IT platforms within the enterprise. This team should be divided into two functions, with staff focused only on their own function: firstly, project-facing staff (the 'capacity consultants') who deal with all incoming capacity requirements for new and amended services; secondly, the platform experts (the 'capacity service-owners'). Each platform expert should be assigned responsibility for one or several end-to-end services and should be cross-trained on the other platforms that their services traverse.
Performance management (e.g. modelling and tuning) should also sit within the same team. If this is not possible organisationally then a strong inter-team relationship should be established.
2. Capacity Planning and Modelling
One of the difficulties with offering capacity management to all IT services within any medium to large organisation is finding the time to analyse a service or platform thoroughly enough to build an accurate capacity plan. Building a capacity/performance model is extremely useful in this situation, as it allows quick analysis of the impact of changes. The time spent modelling any high or medium priority service is repaid many times over, because each new scenario can be evaluated against the model instead of repeating the analysis from scratch.
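As an illustration of how a model turns repeated analysis into a quick calculation, the sketch below uses the textbook M/M/1 queuing formula, R = S / (1 - U), to estimate response time under several load scenarios. The choice of a single-queue model, the 20 ms service time and the scenario arrival rates are all assumptions made for the example; a real service would need a model fitted to its own measurements.

```python
def mm1_response_time(service_time_s: float, arrival_rate_per_s: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U), with U = arrival rate * S."""
    utilisation = arrival_rate_per_s * service_time_s
    if utilisation >= 1.0:
        # At or beyond 100% utilisation the queue grows without bound.
        raise ValueError(f"resource saturated (utilisation {utilisation:.0%})")
    return service_time_s / (1.0 - utilisation)


# Hypothetical figures: 20 ms of service time per request and three load scenarios.
SERVICE_TIME_S = 0.02
SCENARIOS = {"today": 20.0, "plus 50%": 30.0, "double": 40.0}

if __name__ == "__main__":
    for name, rate in SCENARIOS.items():
        response = mm1_response_time(SERVICE_TIME_S, rate)
        print(f"{name}: {rate:.0f} req/s -> {response * 1000:.1f} ms")
```

With these assumed figures, doubling the load from 20 to 40 requests per second raises utilisation from 40% to 80% and triples the estimated response time from about 33 ms to 100 ms. Once the model exists, each further scenario costs one function call.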
3. Align Resources with Service Importance
IT has become more mature and more closely aligned with 'the business' in recent years. With this as a driver, and with the move away from centralised computing platforms, capacity management functions have had to face a difficult change, moving from resource-based capacity management to service-based capacity management. However, no capacity planning function in any sizable organisation can plan for every service that the business uses, so some level of prioritisation-based rationing is needed. The level of resources that a capacity management function allocates to each service should be proportional to the importance of the service. A minimal amount of capacity management should be done at the operational level (e.g. alerting against component consumption) across all services and infrastructure, with value being added for more important services (e.g. forecasting, service mapping, resource and performance models).
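One way to make this rationing explicit is sketched below: every service receives the same small baseline for operational alerting, and the remaining effort is shared in proportion to an importance weight. The function name, the weights and the hour figures are hypothetical and exist only to illustrate the arithmetic.

```python
def allocate_hours(total_hours: float, baseline_hours: float,
                   importance: dict[str, float]) -> dict[str, float]:
    """Give each service a baseline, then split the remainder by importance weight."""
    remainder = total_hours - baseline_hours * len(importance)
    if remainder < 0:
        raise ValueError("total hours cannot cover the baseline for every service")
    total_weight = sum(importance.values())
    allocation = {}
    for service, weight in importance.items():
        # Services with zero weight receive only the baseline (alerting) effort.
        share = remainder * weight / total_weight if total_weight > 0 else 0.0
        allocation[service] = baseline_hours + share
    return allocation


if __name__ == "__main__":
    # Hypothetical services and weights: 3 = critical, 1 = standard, 0 = alerting only.
    weights = {"payments": 3, "intranet": 1, "test-lab": 0}
    for service, hours in allocate_hours(100, 5, weights).items():
        print(f"{service}: {hours:.2f} hours")
```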
4. Build a Single Capacity Management Information System (CMIS)
A factor which frequently prevents effective capacity and performance management is the inability to analyse data relating to a single cross-platform service. This is usually not because the data is not collected, but because it resides in disparate locations and is therefore difficult to collate and analyse. The objective is to be able to easily retrieve a consistent sub-set of data relating to a particular service or transaction. This objective does not require a single database as such; it only requires a single meta-database containing views which are linked to the underlying data: the Capacity Management Information System (CMIS). This immediately avoids some common problems that building a single database introduces:
- Building a large central database and the storage of duplicate data - this central database will be enormous
- Transferring the data to a single location - even if undertaken out of core business hours, may impact batch transfers, etc.
- Data integrity issues - if the central database contains incorrect data it is unlikely to be updated
By using a meta-DB it is possible to run live queries across several databases without transferring the entire data set from each platform. This reduces the amount of unnecessary data held, ensures the data is up to date and maintains data integrity.
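The idea of a view layered over separate platform databases can be sketched with Python's built-in sqlite3 module. Two in-memory databases stand in for a server-monitoring store and a network-monitoring store, and a temporary view presents their rows in one consistent shape without copying them. The database names, table layouts, service names and readings are invented for the example; a production CMIS would sit over whatever stores each platform already uses.

```python
import sqlite3


def build_cmis() -> sqlite3.Connection:
    """Attach two stand-in platform databases and expose them through one view."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate database, mimicking disparate stores.
    conn.execute("ATTACH DATABASE ':memory:' AS server_db")
    conn.execute("ATTACH DATABASE ':memory:' AS network_db")
    conn.execute("CREATE TABLE server_db.cpu (service TEXT, ts TEXT, cpu_pct REAL)")
    conn.execute("CREATE TABLE network_db.wan (service TEXT, ts TEXT, util_pct REAL)")
    # Made-up sample readings.
    conn.executemany("INSERT INTO server_db.cpu VALUES (?, ?, ?)", [
        ("payments", "2024-01-01T09:00", 41.0),
        ("payments", "2024-01-01T10:00", 63.0),
        ("payroll", "2024-01-01T09:00", 12.0),
    ])
    conn.executemany("INSERT INTO network_db.wan VALUES (?, ?, ?)", [
        ("payments", "2024-01-01T09:00", 18.0),
        ("payments", "2024-01-01T10:00", 27.0),
    ])
    conn.commit()
    # The view stores no data: every query reads the underlying tables live.
    # SQLite only lets TEMP views reference tables in other attached databases.
    conn.execute("""
        CREATE TEMP VIEW service_metrics AS
        SELECT service, ts, 'server_cpu_pct' AS metric, cpu_pct AS value
          FROM server_db.cpu
        UNION ALL
        SELECT service, ts, 'wan_util_pct' AS metric, util_pct AS value
          FROM network_db.wan
    """)
    return conn


def metrics_for(conn: sqlite3.Connection, service: str) -> list[tuple]:
    """Return every metric for one service, across all platforms, in time order."""
    query = ("SELECT ts, metric, value FROM service_metrics "
             "WHERE service = ? ORDER BY ts, metric")
    return conn.execute(query, (service,)).fetchall()


if __name__ == "__main__":
    for row in metrics_for(build_cmis(), "payments"):
        print(row)
```

Because the view holds no rows of its own, a correction made in a platform's own table shows up in the next query, which is the integrity benefit described above.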
5. Use the Web for Capacity Reporting
One area that consistently causes resourcing issues is the expectation that capacity and performance teams will supply ad-hoc analysis on systems and services, often at short notice, to resolve incidents and problems. This can weigh heavily on the capacity and performance function as it tends to supersede any other work. One method of off-loading this frequently onerous duty is to provide a wide range of pre-configured reports on the intranet, so that requesters can find the answer themselves. Ideally all reports should be automated and on-line to remove this burden from the capacity and performance staff.
6. Publish Reports on Key Systems
One method of raising the function's profile positively is to provide a regular report on a key cross-platform service. This should contain all the resource data that can be retrieved from as many platforms as possible (e.g. WAN, LAN, server, workstation, etc.) in one consolidated but readable report. Most importantly, in addition to demonstrating the collection of resource metrics, the report should contain any service metrics available, such as business transactions by type and their response times. This type of report, circulated to the appropriate senior management, raises the profile of the capacity and performance function because it demonstrates a comprehensive, proactive and professional approach to relating IT to the business.
7. Automate Processes Where Possible
Many capacity and performance processes and activities consist of repetitive work that can easily be automated, such as generation of graphs from data, running capacity models and pulling disparate data together into a single report. It is wasteful for expensive resources, such as capacity and performance staff, to undertake work that can be automated easily.
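As a small example of this kind of automation, the sketch below fits a straight-line trend to a series of utilisation readings by ordinary least squares and estimates how many further periods remain before a threshold is crossed. The linear trend, the sample readings and the 80% threshold are assumptions for illustration; real data may call for a seasonal or non-linear model.

```python
from typing import Optional, Sequence


def fit_line(values: Sequence[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for values observed at periods 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values: Sequence[float], threshold: float) -> Optional[float]:
    """Periods after the latest observation until the trend reaches the threshold.

    Returns None when the trend is flat or falling, i.e. no crossing is forecast.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


if __name__ == "__main__":
    # Hypothetical weekly peak CPU utilisation (%) for one server.
    weekly_peak_cpu = [50, 52, 54, 56, 58]
    remaining = periods_until(weekly_peak_cpu, threshold=80)
    print(f"weeks until 80% threshold: {remaining}")
```

Run on a schedule against each monitored component, a routine like this can feed the pre-configured reports without any analyst involvement.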
8. Define Scope of the Capacity Management Function
The scope of the capacity management function should be well defined and understood across the entire IT organisation. Examples of scope issues include:
- Are all technologies included, e.g. TCP/IP routers as well as mainframes?
- Are all resources included, e.g. data centre power and space?
- Are virtual resources included, e.g. VMware guest partitions?
- Is application performance included?
9. Establish Authority of the Capacity Management Function
An area where the capacity management function can add considerable value is in assessing new software and systems prior to implementation. While application sizing is invaluable as a proactive service, the capacity management function is not often given any power of sign-off on whether an application or service is 'fit-for-purpose'. Review and approval of new services and systems should be considered a primary role of the capacity management function, and so it should be given the power of sign-off. Any recommendation not to proceed may still be over-ridden by the business, but by providing accurate assessments of the impact of roll-out the capacity management function will gain respect and, in time, authority.
10. Audit the Staff's Skills and Define Training Plan
Capacity and performance staff require a very specific technical skill set and an appreciation of how the business relates to the IT services it uses. Examples of the technical skills include statistical analysis, queuing theory and trending. These skills are the mandatory minimum and take precedence over knowledge of individual platforms or technologies. Staff should be audited to ensure they have the skills relevant to their roles and level of seniority within the function, and training plans should be put in place where gaps exist. This should be coordinated with any defined career path or succession plan within the function.