1. Introduction

Accessing data on traditional hard disks might seem straightforward at first glance. We store files, and when we need them, we request the data from the corresponding disk. However, behind the scenes, a complex process determines the order in which these requests are serviced. This process is known as disk scheduling.

Disk scheduling is a fundamental concept in operating systems and storage management. It deals with arranging the order of read and write requests to a disk to improve overall performance. Since mechanical hard disks rely on a spinning platter and a moving read/write head, the time it takes to read or write data depends heavily on the positions of that data on the disk. Different scheduling algorithms have various goals, such as minimizing delays, reducing wear and tear, and enhancing throughput.

In this tutorial, we’ll explore disk scheduling, why it matters, the underlying principles behind it, the common algorithms used, and how these approaches relate to modern storage technologies.

2. Why Do We Need Disk Scheduling?

Hard disk drives (HDDs) are mechanical devices. They contain spinning platters coated with magnetic material and an actuator arm holding read/write heads that move over the surface of these platters.

When we request data from an HDD, the disk must perform three main actions:

  1. Seek: Move the read/write head to the correct track with the data
  2. Rotational latency: Wait until the spinning platter rotates the requested sector under the head
  3. Data transfer: Once the head is in the required position, read or write the data

Among these steps, seek time and rotational latency are the most time-consuming, often dominating the overall disk access time. Since physical movements are slow compared to electronic components, optimizing the order of requests can significantly improve performance.

If the disk receives multiple requests spread out across different areas, serving them in the order they arrive might cause the head to jump back and forth. This random movement leads to long delays for some requests and reduces the system’s throughput.

Disk scheduling tries to minimize these unnecessary movements. By reordering requests, we can reduce the average seek time, provide fairer service, and ensure that no single request waits indefinitely. Without careful scheduling, a queue of disk requests can become chaotic, leading to poor response times.

3. Key Metrics in Disk Scheduling

Before we dive into common scheduling algorithms, let’s consider what we aim to optimize. The typical goals of disk scheduling include:

  • Minimizing the average seek time: the time it takes for the read/write head to move from its current position to the position of the requested track
  • Minimizing the average waiting time: how long each request waits in the queue before it is serviced
  • Maximizing throughput: serving as many requests as possible in a given period
  • Fairness: ensuring that we process all requests eventually and leave none to wait indefinitely

Some algorithms focus on pure performance, while others try to maintain fairness, but we usually want to balance these goals. The best approach depends on the environment and context. Is the machine a single-user desktop or a multi-user server? Does it run specialized applications, such as database management systems, or not? We must factor in all those details when deciding which algorithm to choose.

4. Traditional Disk Scheduling Algorithms

Over the years, several disk scheduling algorithms have been developed. Although modern systems and solid-state drives (SSDs) have changed how we think about storage performance, it’s still valuable to understand these classic algorithms.

4.1. First-Come, First-Served (FCFS)

The simplest method is first-come, first-served. As its name suggests, we service disk requests in the order they arrive, without any reordering. While this is fair (no request gets priority over another), it can lead to poor performance:

FCFS

For example, say we have requests at track 10, then track 190, then track 20. If the head is currently on track 0, servicing the requests in the arrival order means we go from 0 to 10, then jump all the way to 190, only to return to 20. The overall seek time is large, and we might waste time crisscrossing the disk surface.

The advantage of FCFS is its simplicity. However, in practice, FCFS often leads to high average seek times and is generally inefficient.

4.2. Shortest Seek Time First (SSTF)

The shortest seek time first (SSTF) tries to reduce overall movement. At any given time, it chooses the requests closest to the current head position. By always processing the nearest request, this algorithm reduces the seek time:

SSTF

Continuing our earlier example, if the head is at track 0 and we have requests at 10, 190, and 20, SSTF first goes to track 10 (since 10 is closest to 0). Then, of the remaining two (20 and 190), the closest is 20. After processing the request at 20, only 190 remains. This saves significant seek time compared to FCFS.

However, SSTF can cause starvation. If there’s always another request closer to the head than a specific distant request, the distant request might wait indefinitely. So, while SSTF improves average performance, it can be unfair in certain workloads.

4.3. SCAN (Elevator Algorithm)

SCAN is often described as the elevator algorithm because it works similarly to how an elevator moves in a building. The head moves continuously across the disk in one direction, servicing all requests in its path. Upon reaching the end, it reverses its direction and services the requests on the way back. This pattern repeats, just like an elevator going up and down, picking up passengers along the way:

SCAN

For example, if our disk is laid out from track 0 to track 200, and the head starts at track 50, moving towards higher-numbered tracks, SCAN will service all the requests from the current position up to the maximum track number, then reverse and service requests on the way down.

SCAN reduces the worst-case waiting time and provides a better sense of fairness than SSTF. Requests are serviced in an orderly manner as the head passes by. However, requests near the edges might still wait longer since the head only reaches them once per full pass.

4.4. C-SCAN (Circular SCAN)

C-SCAN, or circular SCAN, modifies SCAN to provide more uniform wait times. Instead of reversing direction after reaching the end, the head jumps back to the start (like a circular buffer) and continues in the same direction.

This way, all requests wait, at most, one full pass of the disk. C-SCAN provides a more consistent performance profile and is often considered fairer than SCAN.

4.5. LOOK and C-LOOK

LOOK and C-LOOK are variants of SCAN and C-SCAN that don’t travel to the edges of the disk if there are no requests out there. The head goes only as far as the furthest outstanding request in the current direction, then reverses or jumps back. This optimization can save time if no requests are waiting at extreme ends.

So, LOOK behaves like SCAN but stops where the last request in a direction is located, while C-LOOK behaves like C-SCAN but only travels between the extremes of the actual requests. This makes them more efficient in many real-world scenarios.

5. Comparison

No single algorithm is suitable for all situations. Each approach has strengths and drawbacks, and the choice often depends on the workload, priorities, and specific environment.

Here, we summarize the key differences and trade-offs:

Algorithm

Key Advantages

Key Disadvantages

Best Use Cases

FCFS

Simple and fair in terms of the arrival order

Can lead to long wait times and inefficient head movement

Environments where simplicity and strict arrival order fairness are a priority

SSTF

Minimizes average seek time by serving closest request first

Starvation risk for distant requests

Systems where reducing seek time is crucial, and workloads are well-distributed

SCAN

Predictable pattern, reduced worst-case waiting time, more fairness than SSTF

Requests at extremes may still wait longer

General-purpose systems needing balanced performance and fairness

C-SCAN

More uniform wait times than SCAN, better fairness for all requests

May not always minimize seek times as efficiently as SSTF

Environments where fairness and uniform waiting are more important than raw performance

LOOK/C-LOOK

Avoids unnecessary travel to disk ends, and can be more efficient in many real situations

Complexity is slightly higher than SCAN/C-SCAN

Workloads where requests cluster in certain areas, improving performance by not scanning empty regions

6. Disk Scheduling in Modern Systems

Modern computers frequently use SSDs rather than HDDs, or at least a combination.

6.1. SSDs and Scheduling

SSDs don’t have moving parts. Instead, they store data in flash memory cells. This changes the nature of disk scheduling because seek time and rotational latency are essentially zero on SSDs. The position of data doesn’t matter as much as it does on a spinning disk.

However, disk scheduling can still play a role. Certain operations on SSDs, such as writes, can be more expensive and wear down the cells. Therefore, we need scheduling strategies to distribute writes evenly or batch them to reduce overhead.

6.2. Scheduling in Multi-User Environments

Additionally, in multi-user or server environments, even with SSDs, some form of scheduling may help balance load, ensure quality of service, or handle priorities.

Traditional scheduling algorithms were designed to solve problems of mechanical latency. While we may not need SCAN or SSTF in the same way for SSDs, the concept of managing I/O requests intelligently remains valuable.

7. Real-World Considerations

In real operating systems, disk scheduling isn’t just about the algorithm. Other factors come into play:

  • Caching and buffering: The operating system and hardware use caches to reduce the number of actual disk accesses. If data is already in memory, we might not need to wait for a disk operation at all
  • Request merging: If two requests are close or overlapping on disk, the OS might merge them into a single larger request, improving efficiency
  • Priority and deadlines: Some requests might be time-sensitive; the OS can assign priorities or deadlines, ensuring that certain critical operations are not delayed too long
  • Workload type: Random small requests from many sources might benefit from one algorithm, while large sequential reads (like streaming a video) might require another approach

Modern OS schedulers incorporate these factors to optimize disk performance dynamically. So, the disk scheduling algorithm is only part of a bigger puzzle that includes memory management, file system structure, and driver optimizations.

8. The Future of Disk Scheduling

As storage technology evolves, the role of traditional disk scheduling might diminish in importance for SSD-based systems. With no mechanical parts, most of the complexity that motivated these algorithms disappears. However, scheduling might still matter for other reasons:

  • Managing write operations to extend SSD lifespan
  • Ensuring fairness in multi-tenant cloud environments
  • Optimizing for energy usage, prioritizing requests to minimize power consumption
  • Handling large-scale storage clusters, where network latency and distributed file systems introduce new scheduling challenges

In technologies like NVMe-based SSDs, where multiple I/O queues are supported, scheduling is about managing queues and priorities rather than handling mechanical latencies. Some scheduling algorithms have been designed specifically for these devices. In this subfield, the focus is on quality of service, latency guarantees, and device lifetime instead of minimizing seek time.

9. The Impact of Disk Scheduling on System Performance

We might wonder how much difference disk scheduling can make. On a system with a fast SSD, the possible improvement might seem negligible, but on traditional HDDs or in large-scale systems where disks handle thousands of requests per second, a good disk scheduling strategy can significantly improve throughput and response times.

Disk scheduling can influence how quickly queries return results in database servers, for instance. Busy file servers can affect how long users wait to open files. Even on personal computers, a more efficient scheduler can mean smoother multitasking and less waiting when multiple applications access the disk simultaneously.

10. Conclusion

In this article, we discussed the concept of disk scheduling.

In short, disk scheduling is all about making our storage subsystems more efficient. By understanding what it means and how it’s done, we gain insight into the intricate interplay between hardware capabilities and software intelligence. Ultimately, disk scheduling helps us get data off the disk and into memory as fast and fairly as possible, improving the overall computing experience.


原始标题:What Does Disk Scheduling Mean?