RG Solutions® for VMWare ESX: providing a comprehensive view of the VMWare ESX Server platform.

RG Solutions® incorporates its own Capacity Management database, which is structured to provide top-down analysis and trending that can cover many months or years of information from multiple physical servers. RG Solutions® for VMWare provides the management and technical information required to answer questions like these:

  • Are we making full use of our VMWare servers?
  • Can we add another VM, assuming that it is similar to an existing VM, and if not, why not?
  • We are experiencing performance issues in one of our VMs; is this a guest operating system problem or is it due to contention at the VMWare level?
  • On four occasions this month, we experienced response-time problems in several VMs; is this a VMWare issue?
  • Can we optimise usage and minimise costs by “juggling” VMs?

In addition to its powerful ad-hoc data analysis features, RG Solutions® has strong support for automatic analysis and report production. For example, the arrival of raw data can be used to trigger data import and post-import report production and distribution.

The objective now is to show how RG Solutions® for VMWare can be used to manage a VMWare installation. Here are some of the things we will illustrate:

  • How to get an overall view of usage of each physical platform and how this varies across the day.
  • How to compare performance on different days for the same system or different systems on the same day. This is most easily done using Summary data either directly in the RG Solutions® browser or in reports.
  • How to determine usage trends.
  • How to identify the peak period and quantify usage at that time.
  • How to “drill down” into the detailed data to investigate issues or to examine data from the perspective of specific Virtual Machines (VMs).
  • How to profile whole systems or individual VMs in terms of their resource usage and daily profile.

See the example Analysis and Reports for more detail.

Example Analysis and Reports

The following charts present example analyses for the main detailed data categories and provide an overview of how the data can be used to manage your VMWare ESX installation.

Physical CPU Chart

The following chart illustrates the primary information that is provided in this category – processor %Busy.

This is a quick and easy way to get an overall view of processor usage for the physical platform.  In this case, there are eight physical processors and we are using the equivalent of about two and a half of them at peak times on this day (so the system as a whole is about 32% busy). 

We note that CPU-0 is busiest at about 32% across the day, followed by CPUs 4, 1 and 5 at about 18% busy, then the remaining processors at about 15% busy.  This information might be useful when fine-tuning processor allocations and exploring anomalies.
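The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the per-CPU values are the approximate day-long figures quoted above, and the function name is an assumption, not part of RG Solutions®.

```python
def cpu_equivalents(pct_busy_per_cpu):
    """Return (equivalent fully busy CPUs, overall % busy) for one host."""
    total = sum(pct_busy_per_cpu)
    return total / 100.0, total / len(pct_busy_per_cpu)

# Approximate %Busy across the day for CPUs 0 to 7, as read from the text:
# CPU-0 at 32%, CPUs 1, 4 and 5 at 18%, the remaining four at 15%.
day_pct_busy = [32, 18, 15, 15, 18, 18, 15, 15]

equivalents, overall = cpu_equivalents(day_pct_busy)
print(f"{equivalents:.2f} CPU equivalents, {overall:.2f}% busy overall")
```

The same calculation applied to the peak (about two and a half of eight processors) gives 2.5 / 8, or roughly 31%, which is in line with the "about 32%" figure quoted for the system as a whole.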

 

Group CPU Data

The next chart provides a summary of processor usage across the day for a specified server (in this case it was VHost2 on 6 March).  Usage levels are given as percentages of one single physical CPU. 

Usage:

We can see at a glance the cumulative usage by all virtual machines:

  • The report quantifies the number of CPUs that are required to support the current workload.
  • The peak period (which dictates the required size of the system) has been identified.
  • The report highlights and quantifies “lost time” (Pct Ready) i.e. the percentage of time when a VM cannot run due to contention.
  • We can see how usage varies over the day.  Since we must size the system to cope with peak usage, we can potentially save costs if we can: 
    • Reduce the peak by swapping one or more VMs to another, more suitable server.
    • Add other VMs that can take advantage of the troughs.
  • The report can be used to establish the “normal” profile so that you can easily spot anomalies (which can be investigated by drilling down into the data) and track growth.

There are other items that could be included in the above chart (or companion charts), but %Used and %Ready are generally the most significant metrics.
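As a rough sketch of how the peak period and the number of CPUs required might be derived from hourly %Used and %Ready values, consider the Python below. The sample figures, the function names and the data layout are all hypothetical; only the 14:00 peak hour and the convention that usage is a percentage of one physical CPU come from this page.

```python
import math

# Hypothetical hourly samples for one host: (hour, Pct Used, Pct Ready),
# both expressed as percentages of a single physical CPU.
hourly = [
    (9, 140.0, 4.0),
    (11, 190.0, 7.5),
    (14, 250.0, 12.0),
    (17, 160.0, 5.0),
    (22, 60.0, 1.0),
]

def peak_sample(samples):
    """Return the sample with the highest Pct Used."""
    return max(samples, key=lambda s: s[1])

def cpus_required(pct_used):
    """Whole physical CPUs needed to carry a given Pct Used."""
    return math.ceil(pct_used / 100.0)

hour, used, ready = peak_sample(hourly)
needed = cpus_required(used)

# Gap between each hour and the peak: capacity that VMs running in the
# troughs could use without raising the peak.
headroom = {h: used - u for h, u, _ in hourly}

print(f"Peak at {hour}:00: {used:.0f}% used, {ready:.1f}% ready, "
      f"{needed} CPUs required")
```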

The following chart is similar to the chart immediately above but this time, we are focussed upon the peak hour and have quantified the usage of individual VMs.  Once again, we have included Ready time (as well as normal processor usage) to emphasise its importance in the context of virtual systems.

Usage:

  • Two VMs clearly stand out, because their usage is significantly higher than the rest.  This does not necessarily imply a problem, but it does suggest that these VMs might need to be treated differently (in terms of allocated resources and priorities) and perhaps moved to a different host.

  • Consider Ready time; at first glance, you might think that it is reasonable, but look a little closer: it seems fine for the two dominant VMs, but what about the smaller ones?  The amount of ready time accumulated by the smaller VMs is small in absolute terms, but in some cases it appears to represent a significant proportion of their total used time.  At this point, you might want to take a closer look. There are several ways of doing this, for example:

    • Using one or more list or chart reports.

    • By delving into the more detailed (VCPU) data.

But perhaps the quickest and most usual approach is to view the data directly in the browser.

 

In relative terms, the amount of Ready time is still very small for the two heaviest users (ACCOUNTS and PERFORMANCE), but for the other VMs, it represents about 30% of the amount of useful processing (Pct Used).  In practice, this is probably not a problem, since demand is very low in these VMs.  Nevertheless, we have illustrated how easy it is to track and investigate these kinds of issues with RG Solutions®.
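The comparison of Ready time with useful processing can be expressed as a simple ratio. The Python sketch below assumes a hypothetical table of peak-hour values; ACCOUNTS and PERFORMANCE are the VMs named above, while the other VM names, all of the numbers and the 25% threshold are invented for illustration.

```python
# Hypothetical peak-hour figures: VM name -> (Pct Used, Pct Ready).
peak_hour = {
    "ACCOUNTS": (95.0, 2.0),
    "PERFORMANCE": (80.0, 1.5),
    "VM-A": (3.0, 0.9),
    "VM-B": (2.0, 0.6),
}

def ready_ratio(pct_used, pct_ready):
    """Ready time as a fraction of useful processing time."""
    if pct_used == 0:
        return 0.0 if pct_ready == 0 else float("inf")
    return pct_ready / pct_used

def flag_contended(vms, threshold=0.25):
    """Return the names of VMs whose ready/used ratio reaches the threshold."""
    return sorted(name for name, (used, ready) in vms.items()
                  if ready_ratio(used, ready) >= threshold)

print(flag_contended(peak_hour))
```

With these sample values, the two heavy users fall well below the threshold while the two small VMs sit at about 30%, mirroring the pattern described above.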

 

 

 

In the next chart, we examine usage over a complete day.  The profile is similar, but not identical, to that of Figure 5.  The same two VMs are dominant, but this time, the difference in usage levels between them and the others is much less.

 

 

Virtual CPU

If we need to know more about the resource consumption of one or more VMs, we can turn to the VCPU data class.

In order to illustrate the kind of information that is available, let us continue to investigate the high-usage VMs that we referred to earlier.  Although we can use previously-prepared reports to examine this data, in practice, most of our work in this area is likely to be ad-hoc when there are more than just a few VMs.  RG Solutions® can help us to examine large volumes of data from multiple perspectives relatively easily.  Let’s start by looking at the two “big” VMs in the browser.

 

 

The following illustration shows the initial view of the VCPU data.  Information is summarised at hourly intervals, and we can use this to see how usage levels overall vary across the entire day. Assuming that we didn’t already know that the peak hour was at 14:00, we could just click twice on the “Pct Used” heading to sort the information into descending order. This simple “trick” can be a real time-saver when dealing with voluminous data.

 

 

 

We can now open up the 14:00 node and the two nodes associated with the VMs ACCOUNTS and PERFORMANCE as shown above. There are four items in each of these VMs: vmware-vmx, mks, vcpu-0 and vmm0 (vmware-vmx itself has two sub-items).  The names are the same as those that you will see at the VMWare console.  The major component in both VMs is vmm0, which is the single virtual CPU assigned to each of these VMs.

 

 

 

If we wanted to determine how the usage of the virtual CPU that is associated with the PERFORMANCE VM varied across the day, we could use the RG Solutions® search facility to display records associated with Group-Id 2225 (see the Group Id column).  Note that we can switch back and forth between the original data and the search results by clicking on the tab at the top of the figure.

 

 

 

 

 

Alternatively, if we wanted to view all of the processes associated with the PERFORMANCE VM, we could search for all items in that Group.  
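Outside the browser, the two searches described above amount to simple filters over the VCPU records. The following Python sketch assumes a hypothetical record layout; the field names and values are invented, apart from Group-Id 2225 and the VM and item names quoted in the text, and the `search` helper is not an RG Solutions® function.

```python
# Hypothetical VCPU records, one per item per hour.
records = [
    {"hour": 13, "group": "PERFORMANCE", "group_id": 2225, "item": "vmm0", "pct_used": 61.0},
    {"hour": 14, "group": "PERFORMANCE", "group_id": 2225, "item": "vmm0", "pct_used": 78.0},
    {"hour": 14, "group": "PERFORMANCE", "group_id": 2224, "item": "mks", "pct_used": 0.4},
    {"hour": 14, "group": "ACCOUNTS", "group_id": 2258, "item": "vmm0", "pct_used": 93.0},
]

def search(rows, **criteria):
    """Return the rows whose fields equal every keyword criterion."""
    return [r for r in rows
            if all(r.get(key) == value for key, value in criteria.items())]

# The virtual CPU of the PERFORMANCE VM across the day.
vcpu_over_day = search(records, group_id=2225)

# Every item that belongs to the PERFORMANCE VM.
all_performance = search(records, group="PERFORMANCE")

print([r["hour"] for r in vcpu_over_day], len(all_performance))
```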

 

 

Memory Data

The following illustration presents a subset of the information that describes memory usage.  We have omitted a number of metrics because, in our sample, most of the remaining data values were zero: the systems that we examined were not placing any strain on the memory management function.  All measurements are in megabytes.

The left hand chart shows Physical memory: a relatively modest amount of memory is allocated to the Console and Kernel; the rest is either free or allocated to VMs.

The middle chart shows memory as either Reserved (the amount of memory that is committed to Resource Pools at the time of the sample) or Unreserved (available for guaranteed allocations to new VMs as they are powered on).  It also shows the minimum amount of memory that the Kernel will try to keep free (MinFree).

The right hand chart shows memory usage from the perspective of memory page sharing: how much physical memory is being shared (Shared), how much is common to all VMWare “Worlds” (Common) and how much memory has been saved by just maintaining one copy of “common” memory.  Shared = Common + Savings.
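The relationship Shared = Common + Savings means that any one of the three values can be derived from the other two. A minimal Python sketch, with hypothetical sample values in megabytes and an illustrative function name:

```python
def page_sharing_savings(shared_mb, common_mb):
    """Memory saved by page sharing, from Shared = Common + Savings."""
    if common_mb > shared_mb:
        raise ValueError("Common cannot exceed Shared")
    return shared_mb - common_mb

# Hypothetical sample values, in megabytes.
savings = page_sharing_savings(shared_mb=1200.0, common_mb=450.0)
print(f"Savings: {savings:.0f} MB")
```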

 

The following illustration provides an alternative view of the information contained in the leftmost two panels of the chart above.

 

 

Group Memory Data

The Group Memory data describes memory usage from the perspective of VMWare “groups”, which in most cases refer to Virtual Machines and other Worlds associated with them.  A number of metrics are provided, but perhaps the most useful are Memory Size (the amount of memory configured for a given VM) and Target Size (the amount of memory to be allocated, based upon recent usage, including the memory overheads of VMWare itself).

In this example, the Target Size of every VM is less than its specified Memory Size, but this is not always the case on VMWare platforms.  It is quite common to find that the Target Size exceeds the Memory Size, because of the (generally small) additional memory overhead imposed by VMWare.

 

 

Network Port Data

 

 

VMWare Network statistics are organised in a tree structure.  Devices (typically virtual switches) sit at the top of the tree, and below that we have Ports and Port-users.  Ports may be linked to either a physical or a virtual network interface card (nic). Port-users include VMs.  The RG Solutions® Network Port Data Class reflects this hierarchy.

 

 

 

 

 

We can use a report or the product’s search capabilities to select information relating to a specific VM, but we have also provided an optional ordering (Data Set) for this data that sorts the information by User instead of Device.
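The idea of re-ordering the hierarchy by User rather than by Device can be sketched as a regrouping of a small tree. In the Python below, the device names, port names and packet rates are all invented for illustration, and the structure is only an assumption about how such data might be held outside the product.

```python
from collections import defaultdict

# Hypothetical slice of the Device -> Port -> Port-user tree.
by_device = {
    "vSwitch0": {
        "port-1": {"user": "vmnic0", "pkts_per_sec": 410.0},
        "port-2": {"user": "ACCOUNTS", "pkts_per_sec": 220.0},
    },
    "vSwitch1": {
        "port-7": {"user": "ACCOUNTS", "pkts_per_sec": 35.0},
        "port-8": {"user": "PERFORMANCE", "pkts_per_sec": 180.0},
    },
}

def order_by_user(tree):
    """Re-key the Device -> Port tree as User -> [(device, port, rate)]."""
    grouped = defaultdict(list)
    for device, ports in tree.items():
        for port, stats in ports.items():
            grouped[stats["user"]].append((device, port, stats["pkts_per_sec"]))
    return dict(grouped)

by_user = order_by_user(by_device)
print(by_user["ACCOUNTS"])
```

Grouping by User places all of the ports used by one VM side by side, even when they hang off different virtual switches.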

 

 


 

Physical Disk Data

 

 

Like Network Port information, Physical Disk information is organised according to the hierarchy that is used by VMWare: Adapter, Channel ID (CID), Target ID (TID), Logical Unit ID (LID), World ID (WID).

 

 

 

 

 

In the left hand screen shot, we observed that the highest read rate occurred in LID 15 (134.67 reads per second), so we opened up that node to find that all of the activity is associated with WID 2258.  We can easily find out which World this is, by searching in the Virtual CPU Data for a Group with a matching Group ID.

As we can see, the activity is associated with the ACCOUNTS VM.
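The two-step lookup described above (find the busiest entry, then match its World ID against the Group IDs in the Virtual CPU data) can be sketched in Python as follows. LID 15, WID 2258, the rate of 134.67 reads per second, Group-Id 2225 and the VM names come from this page; the adapter name, the other identifiers, the other rates and the data layout are hypothetical.

```python
# Read rates keyed by (adapter, CID, TID, LID, WID), in reads per second.
disk_reads = {
    ("vmhba0", 0, 0, 15, 2258): 134.67,
    ("vmhba0", 0, 0, 12, 2225): 21.40,
    ("vmhba0", 0, 1, 3, 2225): 8.05,
}

# Group ID -> VM name, as it might be read from the Virtual CPU data.
group_names = {2258: "ACCOUNTS", 2225: "PERFORMANCE"}

def busiest_reader(reads, names):
    """Return (LID, WID, VM name, reads/sec) for the highest read rate."""
    key, rate = max(reads.items(), key=lambda item: item[1])
    _adapter, _cid, _tid, lid, wid = key
    return lid, wid, names.get(wid, "unknown"), rate

print(busiest_reader(disk_reads, group_names))
```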

 

Summary

Use our new RG Solutions® VMWare ESX product to examine the feasibility of migrating servers to VMWare ESX and, subsequently, to manage your VMWare ESX environment.  It allows you to deliver fully optimised virtual environments and to monitor the ongoing performance of the hardware and virtual layers.
