Over the years we’ve built numerous batch capacity models. Some clients have problems with batch suites taking too long and want to know which jobs need optimisation. Others need to accurately forecast the impact that increasing business volumes will have on their batch SLAs. In general the technique is roughly the same: model the relationships and dependencies between jobs, identify the key jobs and produce forecasts for them as necessary. Then the whole thing comes together and you can use the model to simulate the batch run for a given date.
What we’ve found, though, is that the relationship modelling part isn’t as simple as it sounds. There is a huge variety of relationships and constraints among jobs in a batch suite, and the more there are, the more complex the suite is to model.
People also often aren’t aware of all the constraints in play until you start digging into the batch. So we thought we would share some of the constraints we’ve come across and modelled in the past – do post a comment if you’ve met any others:
- Scheduled start times – some jobs are set to run only at (or after) a particular time of day
- Job success dependencies – these are the basic building blocks of the batch: some jobs can only start after others have finished successfully
- Job not running dependencies – some jobs can only run if others are not running, but don’t need to wait for them if they haven’t even started
- Job groups or applications – whatever they are called, you probably have groups of jobs within your batch, or jobs which consist of other jobs, and these groups will probably also have dependencies between themselves
- Concurrency limitations (of jobs or job groups) – these may be a configuration setting or a limitation on the number of available threads, but can have a massive effect in extending the batch run time
- File locking – when two or more jobs want to write to the same file, the others have to wait while the first accesses it. This can be tricky to anticipate, as the outcome depends on which job gets there first
- Feeds from other systems – often these are the culprits of batch delays and SLA breaches, which is always a bit frustrating as often there’s not much you can do about it directly. In these cases there is a real need to tighten up upstream SLAs or OLAs
- Date and day of week conditions – some jobs or sets of jobs only run on certain days, so it’s important to take this into account when forecasting
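To make a few of these concrete, here is a minimal Python sketch of how scheduled start times, success dependencies, "not running" dependencies, a concurrency limit and day-of-week conditions might be combined in a toy simulation. It is an illustration rather than our production model: the `Job` fields, the `simulate` function and the job names are all hypothetical, times are whole minutes after the start of the batch window, jobs are started in list order (no priorities), and a dependency on a job that does not run that day is assumed to be satisfied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    duration: int                # forecast run time, in minutes
    earliest_start: int = 0      # scheduled start, minutes after batch start
    after: tuple = ()            # jobs that must finish before this one starts
    not_while: tuple = ()        # jobs that must not be running when this one starts
    run_days: frozenset = frozenset(range(7))  # 0 = Monday ... 6 = Sunday


def simulate(jobs, weekday, max_concurrent):
    """Return {job name: (start, end)} for one simulated batch run."""
    # Day-of-week condition: drop jobs that do not run today.
    todo = {j.name: j for j in jobs if weekday in j.run_days}
    active = set(todo)           # every job taking part in today's run
    running = {}                 # job name -> end time
    finished = set()
    schedule = {}
    now = 0
    while todo or running:
        # Retire jobs that have completed by the current time.
        for name in [n for n, end in running.items() if end <= now]:
            del running[name]
            finished.add(name)
        # Start every waiting job whose constraints are all met.
        for job in list(todo.values()):
            if len(running) >= max_concurrent:   # concurrency limit reached
                break
            # Assumption: predecessors that do not run today count as done.
            deps_done = all(d in finished for d in job.after if d in active)
            clash = any(n in running for n in job.not_while)
            if job.earliest_start <= now and deps_done and not clash:
                running[job.name] = now + job.duration
                schedule[job.name] = (now, now + job.duration)
                del todo[job.name]
        # Jump to the next event: a job finishing or a scheduled start time.
        events = list(running.values())
        events += [j.earliest_start for j in todo.values() if j.earliest_start > now]
        if not events:
            if todo:  # nothing running and nothing can ever start, e.g. a cycle
                raise ValueError(f"jobs can never start: {sorted(todo)}")
            break
        now = min(events)
    return schedule


# Hypothetical suite: an upstream feed arrives an hour into the window.
jobs = [
    Job("load_feed", 30, earliest_start=60),
    Job("extract", 40),
    Job("transform", 20, after=("load_feed", "extract")),
    Job("reindex", 25, not_while=("transform",)),
    Job("weekly_report", 15, after=("transform",), run_days=frozenset({4})),
]

monday = simulate(jobs, weekday=0, max_concurrent=2)
friday = simulate(jobs, weekday=4, max_concurrent=1)
print(max(end for _, end in monday.values()))  # 110
print(max(end for _, end in friday.values()))  # 130
```

Even in this toy suite, dropping the concurrency limit to one and adding the Friday-only report pushes the end of the batch from minute 110 to minute 130. Note that `not_while` is one-directional here; a file lock shared by two jobs could be approximated by declaring it on both, though that still would not capture the "whoever gets there first" behaviour of a real lock.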
Other things to consider might include the scheduling priority mechanism of the batch and, of course, any capacity bottlenecks in the underlying hardware.
Once all the constraints are modelled, all you need to do is forecast the run times of the individual jobs… maybe that merits a post of its own someday!