Our methodology for managing Unix systems takes a top-down approach built around two reports. If more detail is required, the RGS product can be used to browse data on processes, processors, networks, devices and file systems.
The methodology starts with the All System Summary Report (please see page 6 for a full example), which contains the metrics we have selected as critical for ongoing management.
This report focuses on the percentage of capacity used across all systems, together with IO and disk information. By looking quickly down the list you can identify any systems that are running close to capacity or suffering at the IO level. To help explain the capacity calculation, the report also shows the percentage of CPU used, the percentage of headroom used and the percentage IO bound.
If the list of systems is extensive, it can be filtered or sorted to remove less significant systems from the report. Part of the beauty of the RGS product is that it is “soft”, so reports like this can easily be changed.
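The filtering and sorting described above can be illustrated outside the product with a short Python sketch. It is not RGS functionality: the field names, the sample figures and the 80% threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SystemSummary:
    """One row of a summary report (hypothetical field names)."""
    name: str
    pct_capacity_used: float
    pct_cpu_used: float
    pct_io_bound: float


def systems_near_capacity(rows, threshold=80.0):
    """Keep rows at or above the threshold, busiest system first."""
    hot = [r for r in rows if r.pct_capacity_used >= threshold]
    return sorted(hot, key=lambda r: r.pct_capacity_used, reverse=True)


# Made-up figures, for illustration only.
summary = [
    SystemSummary("alpha", 45.0, 40.0, 5.0),
    SystemSummary("bravo", 97.0, 85.0, 12.0),
    SystemSummary("charlie", 82.0, 60.0, 22.0),
]

for row in systems_near_capacity(summary):
    print(f"{row.name}: {row.pct_capacity_used:.0f}% capacity used")
```

Lowering the threshold widens the list; raising it leaves only the systems that most need attention.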
If further action is required, for example because a system is reporting a high % capacity used figure, a second report called the Single System History Report (please see page 7 for a full example) is provided.
The format of the report is identical to that of the All System Summary Report, but instead of looking at all systems for a given time period it looks at one specific system over time. This is very useful not only for establishing whether the system is truly out of capacity, but also for distinguishing trends from one-off occurrences.
If the Single System History Report highlights any issues that need to be analysed further, then it is time to go to the RGS product.
RGS On-Screen Analysis
The Single System History Report described above shows that there are a few days when the system in question was over capacity. To investigate this in more detail it is necessary to browse detailed data using the RGS product on the PC.
Having identified that the system was running over capacity, our recommended approach is to get a feel for the activity on the system over the course of 24 hours. By viewing the data “on-screen” you can see the capacity used each hour on Monday 18th October as follows:
This clearly shows that the percentage of used capacity is sporadic through most of the day, but at 16:00, something starts and from then on, the system is out of capacity.
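The same reading of the hourly data can be expressed as a small Python sketch that finds the hour from which the system stays out of capacity for the rest of the day. The hourly figures below are invented for illustration, and treating "out of capacity" as 100% capacity used is an assumption, not a definition taken from the product.

```python
def sustained_overload_start(hourly, limit=100.0):
    """Return the first hour from which every later reading is >= limit.

    `hourly` maps hour of day (0-23) to % capacity used.
    Returns None if the day does not end in a sustained overload.
    """
    start = None
    for hour in sorted(hourly):
        if hourly[hour] >= limit:
            if start is None:
                start = hour      # possible beginning of the overload
        else:
            start = None          # reading dropped back, so reset
    return start


# Invented readings: sporadic until 15:00, then pinned at 100% from 16:00.
readings = [30, 25, 20, 20, 35, 60, 40, 75,
            55, 90, 45, 70, 50, 85, 60, 80,
            100, 100, 100, 100, 100, 100, 100, 100]
hourly = dict(enumerate(readings))

start = sustained_overload_start(hourly)
if start is not None:
    print(f"Out of capacity from {start:02d}:00 onwards")
```

Resetting whenever a reading drops below the limit is what separates a sustained overload from an isolated spike earlier in the day.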
The next question is “what caused this?”. Opening the “Process” data shows all the processes that were running at the time. The following screen shows the Process data:
In the process data at 16:00, there are two copies of the process called “data server” using most of the processor.
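The step of spotting multiple copies of one process dominating the processor can be sketched in Python as a grouping by process name for a chosen hour. The record layout (`hour`, `name`, `pid`, `pct_cpu`) and the sample values are hypothetical and do not reflect the RGS data format.

```python
from collections import defaultdict


def cpu_by_process_name(samples, hour):
    """Total % CPU per process name at the given hour, highest first.

    Returns a list of (name, number_of_copies, total_pct_cpu) tuples.
    """
    totals = defaultdict(lambda: [0, 0.0])  # name -> [copies, total % CPU]
    for s in samples:
        if s["hour"] == hour:
            entry = totals[s["name"]]
            entry[0] += 1
            entry[1] += s["pct_cpu"]
    rows = [(name, copies, cpu) for name, (copies, cpu) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical process samples, for illustration only.
samples = [
    {"hour": 15, "name": "sshd", "pid": 310, "pct_cpu": 0.5},
    {"hour": 16, "name": "data server", "pid": 101, "pct_cpu": 48.0},
    {"hour": 16, "name": "data server", "pid": 102, "pct_cpu": 45.0},
    {"hour": 16, "name": "cron", "pid": 220, "pct_cpu": 1.5},
    {"hour": 16, "name": "sshd", "pid": 310, "pct_cpu": 0.5},
]

for name, copies, cpu in cpu_by_process_name(samples, hour=16):
    print(f"{name}: {copies} copies, {cpu:.1f}% CPU")
```

Grouping by name rather than by process ID is what makes several copies of the same program show up as a single large consumer.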
In summary, this methodology makes it possible to get a quick measure of capacity used across a large number of systems and, where necessary, to drill down through the information for more detailed analysis.