Windows Server Monitoring Paper

"The system's really slow today!"

How often have you heard that? The obvious questions are why the system is running slowly and what you can do about it, but the answers are rarely easy to find. An even better question is how to tell that a server is approaching its limits early enough to act before system performance and business productivity start to suffer. Enter, stage left, server performance monitoring.

As with most things in life, computing resources are finite and there is a limit to how much can be done in any given period of time by any system. The key to understanding performance issues on servers is to know:

  • Which key resources might limit the overall performance of the system?
  • What should be measured in order to assess their utilization?
  • What should be done if there are signs of overload? Some things are easier to rectify than others, and knowing when and how to upgrade or replace systems or components is important to ensure that any investment actually solves the problem being experienced.

Unfortunately, there is no single solution that will address all performance issues. The right answer depends on several factors, including the application mix, the number of users, the hardware itself and external factors such as network topology. Of the many parts of a server, there are four key elements that in practice tend to influence the performance of the system as a whole:

  • Processor throughput
  • Memory capacity
  • Disk I/O throughput
  • Network I/O throughput

Each of these has its limits, and the overall performance of the server is determined by whichever of them is exhausted first. Table 1 below shows typical situations in which each resource type is likely to be in high demand, along with design and implementation factors that can cause problems even when the hardware is not under-specified.

| Resource Type | High demand usage pattern | Potential design and implementation issues |
|---|---|---|
| Processor | Mathematical computation, modelling, simulation | Poor application coding, inefficient algorithms |
| Memory | Heavy application load, high numbers of users | Too many applications sharing a server |
| Disk I/O | Large server databases, frequent copies between physical volumes | Poorly indexed databases, overloaded I/O channels, backups over-running |
| Network I/O | Streaming media, heavy file sharing load, file-based databases (e.g. Access) | Network topology that causes bottlenecks, poor network segmentation, can indicate malware infection |

Table 1 - Server Resource Demands
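
The idea that overall performance is set by whichever of the four resources is exhausted first can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ResourceSample` type, the 80% warning threshold and the sample figures are hypothetical and are not measurements or recommendations from this paper.

```python
from dataclasses import dataclass

# Hypothetical warning threshold (80% of capacity); an assumption for
# illustration only, not a figure taken from this paper.
WARN_THRESHOLD = 0.80


@dataclass
class ResourceSample:
    name: str
    used: float      # amount currently in use (any unit)
    capacity: float  # maximum available (same unit as `used`)

    @property
    def utilization(self) -> float:
        """Fraction of capacity in use, between 0.0 and 1.0."""
        return self.used / self.capacity


def limiting_resource(samples):
    """Return the sample with the highest utilization, i.e. the resource
    closest to being exhausted."""
    return max(samples, key=lambda s: s.utilization)


def overloaded(samples, threshold=WARN_THRESHOLD):
    """Return the names of resources at or above the threshold,
    in the order they were supplied."""
    return [s.name for s in samples if s.utilization >= threshold]


# Made-up readings for the four key resources.
samples = [
    ResourceSample("Processor", used=35.0, capacity=100.0),     # % busy
    ResourceSample("Memory", used=14.5, capacity=16.0),         # GB
    ResourceSample("Disk I/O", used=120.0, capacity=200.0),     # MB/s
    ResourceSample("Network I/O", used=300.0, capacity=1000.0), # Mbit/s
]

bottleneck = limiting_resource(samples)
print(bottleneck.name, round(bottleneck.utilization, 2))  # Memory 0.91
print(overloaded(samples))                                # ['Memory']
```

With these made-up figures, memory is the resource nearest its limit, so upgrading the processor or network would not be expected to relieve the slowdown. In practice a single snapshot is rarely enough; readings would normally be collected over time before drawing conclusions.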

So, enough of the generalities. Where should you look, and what should you be looking for, in each case?