1) Use more than one measure for SLA response time
There may be a scenario where the service provider can meet either an average or a percentile response time target, but not both. In this case, the customer may experience unacceptable response times.
The classic real-world example that most people can relate to is train punctuality. For example, a train operator may report that 90% of train journeys are on time; however, the 10% that are late tend to occur during the busy hour. The customers' perception of the service is based on those late journeys.
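As a rough illustration of why one measure is not enough, the sketch below checks a set of response times against both an average target and a percentile target. The function names, the sample data, and the target values are all hypothetical, and the nearest-rank method is just one of several common percentile definitions.

```python
import math
from statistics import mean

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def check_sla(samples, avg_target, pct, pct_target):
    """Evaluate the same samples against an average and a percentile target."""
    avg = mean(samples)
    p = percentile(samples, pct)
    return {
        "average": avg,
        "percentile": p,
        "average_ok": avg <= avg_target,
        "percentile_ok": p <= pct_target,
    }

# Hypothetical data: 90 fast responses (0.2 s) and 10 slow ones (5.0 s),
# mirroring the 90% on-time / 10% late train example.
samples = [0.2] * 90 + [5.0] * 10
result = check_sla(samples, avg_target=1.0, pct=95, pct_target=2.0)
print(result)
```

With this data the average target is met while the 95th percentile target is missed, which is the situation an SLA with a single measure would hide.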
2) Use a meaningful sample for measuring
In certain circumstances, too few measurements will be taken to give a meaningful average or percentile response time. Define the minimum number of samples needed for the result to be meaningful.
This must be balanced against the observer effect (often loosely attributed to Heisenberg’s uncertainty principle): the very act of observing something changes it. Excessive sampling may itself degrade service response time.
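One possible way to put a number on "minimum samples" is sketched below. It assumes the textbook sample-size formula for estimating a mean to within a given margin of error, plus a simple rule of thumb (an assumption, not a standard) that a percentile estimate needs some minimum count of observations beyond the percentile. The function names and the default of 10 tail observations are hypothetical.

```python
import math
from statistics import NormalDist

def min_samples_for_mean(stdev, margin, confidence=0.95):
    """Samples needed so the mean is within +/- margin at the given confidence.

    Uses n >= (z * stdev / margin)^2 and assumes stdev is a reasonable
    prior estimate of the response time standard deviation.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * stdev / margin) ** 2)

def min_samples_for_percentile(pct, tail_observations=10):
    """Rule of thumb (assumption): require some observations above the percentile."""
    return math.ceil(tail_observations * 100 / (100 - pct))

n_mean = min_samples_for_mean(stdev=0.5, margin=0.1)
n_p95 = min_samples_for_percentile(95)
print(n_mean, n_p95)
```

Whatever rule is chosen, the resulting sample count should stay small enough that the measurement traffic itself does not load the service.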
3) Determine the arrival rate distribution
If this isn’t defined, then it’s possible for the customer to batch requests and send them all at once to the service. The average arrival rate over the measurement period will be the same; however, the intensity of the arrivals has a significant impact on the service’s ability to meet SLA response times.
4) Use an appropriate distribution for calculating response time percentiles
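The effect of batching can be sketched with a very simple model: a single server that handles requests first come, first served, with a fixed service time. This is an illustrative assumption, not a model of any particular service, and the function name and figures are hypothetical.

```python
from statistics import mean

def response_times(arrivals, service_time):
    """Response time of each request on a single first-come, first-served server."""
    free_at = 0.0  # time at which the server next becomes idle
    times = []
    for arrived in sorted(arrivals):
        start = max(arrived, free_at)  # wait if the server is still busy
        free_at = start + service_time
        times.append(free_at - arrived)
    return times

# Both patterns deliver 60 requests in the same minute (same average rate).
even_arrivals = [float(i) for i in range(60)]  # one request per second
batch_arrivals = [0.0] * 60                    # all requests sent at once

even_rt = response_times(even_arrivals, service_time=0.5)
batch_rt = response_times(batch_arrivals, service_time=0.5)

print("even:  mean", mean(even_rt), "max", max(even_rt))
print("batch: mean", mean(batch_rt), "max", max(batch_rt))
```

In this model the evenly spaced requests never queue, while the batched requests wait behind each other, so the mean and maximum response times are far higher even though the average arrival rate is identical.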
If the average response time target is known, a percentile response time target may be derived using a probability distribution. Two different probability distributions are typically used, exponential and normal.
Typically, for a given percentile, the exponential distribution predicts a higher percentile response time than the normal distribution, provided the normal distribution does not have a large standard deviation. Thus an SLA derived from an exponential distribution sets a looser target and will favour the service provider over the client.
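The comparison can be sketched using the standard quantile formulas: for an exponential distribution with mean m, the p-quantile is -m ln(1 - p); for a normal distribution it is the mean plus a z-score multiple of the standard deviation. The mean of 1.0 second and the two standard deviations below are hypothetical values chosen only to show both cases.

```python
import math
from statistics import NormalDist

def exponential_percentile(mean_rt, p):
    """p-quantile of an exponential distribution with the given mean."""
    return -mean_rt * math.log(1 - p)

def normal_percentile(mean_rt, stdev, p):
    """p-quantile of a normal distribution with the given mean and standard deviation."""
    return NormalDist(mu=mean_rt, sigma=stdev).inv_cdf(p)

mean_rt = 1.0  # hypothetical average response time target, in seconds
p = 0.95

exp_p95 = exponential_percentile(mean_rt, p)
norm_small = normal_percentile(mean_rt, stdev=0.25, p=p)  # modest spread
norm_large = normal_percentile(mean_rt, stdev=1.5, p=p)   # large spread

print(round(exp_p95, 3), round(norm_small, 3), round(norm_large, 3))
```

With the modest standard deviation the exponential figure is the higher of the two; with the large standard deviation the normal figure overtakes it, which is why the choice of distribution and its parameters should be agreed explicitly.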
5) Determine your average and maximum throughput rates
Service performance cannot be measured by response time alone. Throughput is also a key performance measure. When defining an SLA, both average and maximum throughput should be stated. An SLA based only on average throughput will hinder the service provider in ensuring that appropriate capacity is in place to meet the SLA response times.
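As a minimal sketch of the difference between the two figures, the code below derives average and maximum throughput from a list of request timestamps by counting requests in fixed windows. The function name, the one-second window, and the sample timestamps are hypothetical.

```python
from collections import Counter

def throughput(timestamps, window=1.0):
    """Return (average, peak) throughput in requests per second.

    Requests are counted in fixed windows of `window` seconds; the average
    spans the first to the last window that contains a request.
    """
    buckets = Counter(int(t // window) for t in timestamps)
    n_windows = max(buckets) - min(buckets) + 1
    average = len(timestamps) / (n_windows * window)
    peak = max(buckets.values()) / window
    return average, peak

# Hypothetical trace: one request per second for nine seconds,
# then a burst of eleven requests in the tenth second.
timestamps = [i + 0.5 for i in range(9)] + [9.0 + j * 0.05 for j in range(11)]
average, peak = throughput(timestamps)
print(average, peak)
```

Here the average is 2 requests per second but the peak is 11, so capacity sized from the average alone would fall well short during the burst.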