Hard drives are among the slowest components in a server, as they rely on the physical movement of heads over platters to reach the positions at which data is read or written. As in most multi-tasking systems, several applications can access the disks simultaneously, and it is up to the operating system to manage the requests for access to disk resources. As with other shared resources, the operating system maintains a queue for these requests and handles them in sequence, routing the data between the disk controller and the applications requesting disk access.
The raw throughput of a disk drive or array and the associated disk controller(s) clearly have a significant effect on performance but these do not change as the system is used, so it is more useful to look at the amount of time that the disks spend servicing requests, reflected in the percentage disk time. If this value is consistently high it will usually indicate that the disk system is working flat out to process all the data transfers that are being requested.
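On Linux, the time a disk spends servicing requests is exposed as a cumulative busy-time counter in /proc/diskstats, and the percentage disk time can be derived from two snapshots. The sketch below illustrates the arithmetic on two made-up sample lines rather than a live system; the field layout follows the kernel's iostats documentation, where the thirteenth field is cumulative milliseconds spent doing I/O.

```python
# Sketch: deriving percentage disk time from two /proc/diskstats
# snapshots taken a known interval apart (Linux-specific; field layout
# per the kernel's iostats documentation). The two sample lines below
# are illustrative, not captured from a real machine.

SAMPLE_T0 = "8 0 sda 120 0 9600 300 450 20 36000 4200 2 4500 4500"
SAMPLE_T1 = "8 0 sda 150 0 12000 390 600 25 48000 5100 3 5350 5600"

def busy_ms(line: str) -> int:
    # The field at index 12 (after major, minor, device name) is the
    # cumulative milliseconds the device has spent doing I/O.
    return int(line.split()[12])

def percent_disk_time(t0_line: str, t1_line: str, interval_ms: int) -> float:
    # Busy time accrued during the interval, as a percentage of it.
    return 100.0 * (busy_ms(t1_line) - busy_ms(t0_line)) / interval_ms

print(percent_disk_time(SAMPLE_T0, SAMPLE_T1, 1000))  # 85.0
```

In practice the two lines would come from reading /proc/diskstats twice, a second apart; a value that stays near 100% over many intervals is the "consistently high" condition described above.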
Another useful indicator that disk performance is limiting overall system performance is the read and write queue lengths. As with processor queues, if these counters consistently show values significantly in excess of the number of devices (in this case drives) in the system, the disk subsystem is not able to process requests as fast as they are being made, and applications are therefore likely to be held up waiting for data to be delivered from the disks.
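The average queue length can be estimated from the same /proc/diskstats counters: the final, "weighted" I/O time field grows by (in-flight requests × elapsed milliseconds), so its delta divided by the sampling interval gives the average number of requests outstanding over that interval (the figure iostat reports as the average queue size). Again, the sample lines below are illustrative only.

```python
# Sketch: estimating average disk queue length from /proc/diskstats
# (Linux-specific). The last field is weighted milliseconds spent doing
# I/O; its growth rate over an interval equals the average number of
# requests in flight. Sample lines are made up for illustration.

SAMPLE_T0 = "8 0 sda 120 0 9600 300 450 20 36000 4200 2 4500 4500"
SAMPLE_T1 = "8 0 sda 150 0 12000 390 600 25 48000 5100 3 5350 7500"

def weighted_io_ms(line: str) -> int:
    # Field at index 13: cumulative weighted milliseconds doing I/O.
    return int(line.split()[13])

def avg_queue_length(t0_line: str, t1_line: str, interval_ms: int) -> float:
    return (weighted_io_ms(t1_line) - weighted_io_ms(t0_line)) / interval_ms

print(avg_queue_length(SAMPLE_T0, SAMPLE_T1, 1000))  # 3.0
```

An average of 3.0 against a single drive would fit the pattern described above: sustained values well above the number of drives suggest requests are arriving faster than they can be serviced.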
The solution to disk bottlenecks will depend on the underlying problem.
- If the system is running multiple different tasks, each of which is making heavy use of the disk subsystem, then there may be little that can be done other than installing faster disks, disk controllers or array configurations. The right answer to the question of which one to change will vary from one case to another. The classic answer is that the three most important things about disk performance are spindles, spindles and spindles. By spreading the activity across multiple physical devices it is possible to increase the overall throughput substantially and the right use of RAID configurations (preferably in hardware rather than software) can be very effective.
- Another, low-cost, approach is to spread applications across more than one server, if there are other machines available, making effective use of the aggregate performance across all the available hardware.
- The use of network disk devices (SAN or NAS) is a further, albeit significantly more expensive, solution, removing the disk I/O from the server to a device optimized for high throughput applications.
- However, hardware may not always be the answer: a disk bottleneck may be the result of poor design, in which case increasing the throughput of the disk subsystem may not be that effective. The most common case of this nature is a poorly indexed database. Performing searches on database tables that are not well indexed places a very heavy load on the disks, and the impact of correcting such a problem by adding the right indices can be dramatic: disk I/O falls sharply and overall system performance improves significantly.
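The indexing point in the last item can be made concrete with Python's built-in sqlite3 module: the query planner's output shows whether a search will scan the whole table (reading every row from disk) or use an index. The table and column names here are made up for illustration.

```python
# Sketch: how adding an index changes a database query plan, using
# Python's built-in sqlite3. Table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN reports how SQLite will execute the query;
    # the detail text is the last column of each plan row.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer = 'acme'"

plan_before = plan(query)   # without an index: a full table SCAN
con.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = plan(query)    # now: SEARCH ... USING INDEX

print(plan_before)
print(plan_after)
```

On a large table, the difference between the two plans translates directly into disk I/O: the scan touches every row, while the index search reads only the matching ones, which is why adding the right indices can relieve a disk bottleneck that faster hardware would only mask.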