VMware CPU-related Basics

When organizations move from a physical server infrastructure to a virtualization-based infrastructure, they find that one of the hardest things to do is troubleshoot performance issues. In the glory days of Intel servers, when each application ran on its own physical server, troubleshooting was easy. In today’s virtual environments, with complex hardware, ever-changing applications, and constantly updating operating systems and firmware, the task is daunting and not for the faint of heart. In this first post in a series, we begin tackling basic performance troubleshooting.

With 25+ years of experience in the x86 arena and 12 years of experience tuning VMware, the one thing I have learned is that nothing is as it seems. Every situation calls for working systematically to isolate the issue. Each engagement has to start with a broad view of the total environment, followed by tenacious, methodical work to narrow the scope of the investigation as possible sources of performance problems are eliminated.

I have worked with engineers who are experts in troubleshooting specific components such as storage or networks. Whenever they experience a performance problem, they like to begin by taking a narrow look at the components where their expertise resides. This expert knowledge of a narrow area often leads to them getting bogged down in detailed analysis of one component of the environment, while the root cause of the issue is actually somewhere else in the infrastructure. This is not to say that a detailed analysis of key components is not warranted, but in the vast majority of cases I find that the issue is resolved faster by starting broad and narrowing the scope rather than starting narrow and broadening it.

In this first post we will look at basic CPU troubleshooting.

 

Tools:

My primary troubleshooting tool is the vSphere client. In some situations I will use the esxtop and resxtop utilities if I need detailed performance data from a single ESX host. Operating-system-specific tools inside a VM are generally a bad idea for troubleshooting performance, especially in cases of over-commitment, because the guest cannot see the time its virtual CPUs spend waiting to be scheduled by the hypervisor.

 

Resource Pool CPU Saturation:

  • If you have resource pools configured in your virtual datacenter, check them for CPU saturation, especially if you have CPU limits set: is CPU usage close to the limit value? If it is not close to the limit, you do not have a Resource Pool CPU saturation issue. If it is, check for high CPU ready time.
  • If the performance problem is specific to one VM in the resource pool, check that VM for high CPU ready times. If you find high CPU ready times, you have a Resource Pool CPU saturation issue which needs to be corrected.
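
The checks above lean on CPU ready time, which the vSphere performance charts report as a summation in milliseconds per sampling interval rather than as a percentage. Below is a minimal Python sketch of the usual conversion, assuming the 20-second interval of the real-time charts. The function name and the optional per-vCPU normalization are illustrative choices of mine, and the normalization assumes the VM-level value is summed across vCPUs.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0, vcpus=1):
    """Convert a CPU ready summation (ms) into a percentage of the sampling interval.

    interval_s: chart sampling interval in seconds (20 s assumed for real-time charts).
    vcpus: divide by the vCPU count for a per-vCPU figure (assumes the
           VM-level value is summed across vCPUs).
    """
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus at least 1")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


if __name__ == "__main__":
    # 1,000 ms of ready time in a 20 s sample is 5% of the interval.
    print(cpu_ready_percent(1000))
    # The same reading spread over 4 vCPUs is 1.25% per vCPU.
    print(cpu_ready_percent(1000, vcpus=4))
```

What counts as "high" depends on the workload, so the percentage is best compared against the baseline for that VM or pool rather than a fixed number.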

 

Host CPU Saturation:

  • Check for high host CPU usage. If average usage or peaks are below the norm, you do not have host CPU saturation.
  • If average usage and peaks are above the norm and CPU ready times are high, then you have host CPU saturation which needs to be corrected.
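
The two checks above can be sketched as a single decision in Python. This is only an illustration of the logic: the function name is hypothetical, and the three thresholds are placeholders you would replace with the norms for your own environment.

```python
from statistics import mean


def host_cpu_saturated(usage_pct, ready_pct, avg_norm_pct, peak_norm_pct, ready_high_pct):
    """Return True when average usage, peak usage, and CPU ready all exceed their thresholds.

    usage_pct / ready_pct: lists of host CPU usage and CPU ready samples, in percent.
    The three thresholds are site-specific assumptions, not VMware-defined values.
    """
    if not usage_pct or not ready_pct:
        raise ValueError("need at least one usage sample and one ready sample")
    usage_above_norm = mean(usage_pct) > avg_norm_pct and max(usage_pct) > peak_norm_pct
    ready_is_high = max(ready_pct) > ready_high_pct
    return usage_above_norm and ready_is_high


if __name__ == "__main__":
    # Placeholder thresholds: 75% average norm, 90% peak norm, 5% ready.
    print(host_cpu_saturated([80, 95, 85], [2, 8], 75, 90, 5))  # busy host, high ready
    print(host_cpu_saturated([80, 95, 85], [1, 2], 75, 90, 5))  # busy host, low ready
```

Requiring both conditions matches the bullets: a busy host with low ready time is working hard but is not yet making VMs wait for CPU.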

The usual root cause of CPU saturation is simple: your host does not have enough CPU cycles to service the workloads of the virtual machines it is running. There are a few scenarios where this typically happens, each with its own solution.

  • Host with a large number of VMs, all with low to moderate CPU requirements
  • Host with few VMs, all with high CPU requirements
  • Host with a mix of VMs with high and low CPU requirements

 

Guest CPU Saturation:

  • Check for high guest CPU usage. If average usage and peaks are below the norm, you do not have guest CPU saturation.
  • If average usage and peaks are above the norm, you have a guest CPU saturation issue which needs to be corrected.

Guest CPU saturation happens when the operating system and applications on a VM use all of the CPU cycles that the host provides to it. Identifying guest CPU saturation does not necessarily mean there is a performance problem. Many CPU-intensive applications will use 100% of the CPU cycles they have access to, and many applications experience peaks under high workloads. However, if you find that your VM is experiencing CPU saturation while it is processing a peak load, then a performance issue is occurring which needs to be corrected.
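
One way to tell a brief peak from sustained saturation is to look at how much of the observation window the VM spent at or near 100%. The Python sketch below does that; the function name and the 95% ceiling are assumptions for illustration, not values from VMware.

```python
def saturated_share(usage_pct, ceiling_pct=95.0):
    """Return the fraction (0.0 to 1.0) of CPU usage samples at or above ceiling_pct."""
    if not usage_pct:
        raise ValueError("need at least one sample")
    hits = sum(1 for u in usage_pct if u >= ceiling_pct)
    return hits / len(usage_pct)


if __name__ == "__main__":
    # One spike in four samples: a quarter of the window was saturated.
    print(saturated_share([30, 100, 35, 40]))
    # Every sample near the ceiling: the VM was saturated for the whole window.
    print(saturated_share([98, 100, 99, 100]))
```

A low share during normal operation is usually harmless; a high share while the VM is handling its peak load is the case described above that needs correcting.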

 


Next post in the series:  Basic Memory Performance Troubleshooting
