Centralised monitoring tools are very useful for collecting data from a large server estate.
However for the purposes of capacity management and performance analysis, you need easy access to the raw data. Often these tools have an interface which is both slow and convoluted - it takes a long time to work out how to get a raw data export, as long again to run it and when you finally get it out it's not in the format you wanted it. And if you want to automate analysis a csv export isn't what you want anyway - you want access to the raw data in a database.
A lot of tools (particularly opensource ones) seem to use an rrdtool database which isn't designed for anything except graphing. You can get the data out but SCOM (or Microsoft System Centre) is one tool that is better in this sense in that it has a fairly nice SQL server database underneath it.
We looked into the database structure for connecting our OCM analysis pack to it (see Intelligent Automated Alerting in Practice). I thought I'd share a few tips on how the performance data is stored and how to retrieve it.
The database which contains all the SCOM data is called OperationsManagerDW.
First of all it has a whole bunch of tables containing all sorts of information about alerts, 'ManagedEntity's (servers, etc) and then finishing with a collection of tables starting with Perf. This is where the performance data is stored, you can see that they are called PerfDaily, PerfHourly, PerfRaw depending on the granularity of the data. The data is split between tables so that the table sizes don't get too big - thankfully you don't need to work with the raw tables as the database comes with a predefined set of views for SCOM to retrieve the data from - and these are what we use as well.
The most important views are:
vManagedEntity - contains the collection of all the servers, instances, disks; basically anything for which you are measuring performance and capacity metrics. The important colums are
- The ManagedEntityRowId contains the id for the entity - this is important as this will provide the key into the actual performance data
- TopLevelHostManagedEntityRowId - if this particular entity isn't a host (eg. if it is a disk drive) this refers back to the entity which is the actual server
- FullName gives us a full reference name for what it is we are monitoring
vPerformanceRule - this view gives us all the different 'rules' or metrics which might be being monitored on any of the entities - for instance there is one for Process/Percent Processor Time. Key columns are
- RuleRowId: used to identify the performance metric
- ObjectName and CounterName - defining the metric to be collected, as with standard windows metrics you would find in perfmon etc
vPerformanceRuleInstance - this provides the Instance part of the metrics (e.g. for Logical Disk (C:)/Free Megabytes C: is the instance) and links the vPerformanceRule view to the tables containing the actual data. Columns:
- PerformanceRuleInstanceRowId - the Id for this particular instance related metric which will appear in the data tables
- RuleRowId - referring back to vPerformanceRule
- InstanceName - the name of the instance (e.g. C:)
Perf.vPerfDaily / vPerfHourly / vPerfRaw - these tables contain the actual time series data which you extract, graph, and analyse. The columns are fairly self explanatory and link back to the tables already mentioned:
- DateTime: The time the metric was sampled
- PerformanceRuleInstanceRowId - linking back to the vPerformanceRuleInstance, so identifying the metric and instance being measured
- ManagedEntityRowId - linking back to ManagedEntity, identifying the server or whatever it is
- AverageValue/MinValue/MaxValue - the summarised values
You can now construct a query to get back, for instance, CPU data for every server between two dates (knocked together quickly to show how it all links up, you could probably write one more elegantly):
SELECT FullName, InstanceName, DateTime, AverageValue
FROM OperationsManagerDW.dbo.vManagedEntity,
OperationsManagerDW.dbo.vPerformanceRule,
OperationsManagerDW.dbo.vPerformanceRuleInstance,
OperationsManagerDW.Perf.vPerfHourly
WHERE vPerfHourly.ManagedEntityRowId = vManagedEntity.ManagedEntityRowId
AND vPerfHourly.PerformanceRuleInstanceRowId = vPerformanceRuleInstance.PerformanceRuleInstanceRowId
AND vPerformanceRuleInstance.RuleRowId = vPerformanceRule.RuleRowId
AND vPerformanceRule.ObjectName = 'Processor'
AND vPerformanceRule.CounterName = '% Processor Time'
AND DateTime > '2012-11-01'
AND DateTime < '2012-11-08'
ORDER BY FullName, InstanceName, DateTime
Hope you found that useful.