On the Difference Between In-Order and Out-of-Order Execution in CPU Architecture
The CPU's Dance: In-Order, Out-of-Order, and Why Barriers Still Matter
Ever wondered how your computer's brain, the CPU, handles a flood of instructions? You might have heard terms like "in-order" and "out-of-order" execution. It's intuitive that "out-of-order" means instructions can change their dance steps for speed. But what happens when you find "barrier" commands – instructions designed to enforce strict order – in a processor that's supposed to be "in-order," like ARM's Cortex-A55? This can feel like a contradiction! Let's clear up this common point of confusion.
1. The Two Main Styles of Instruction Execution
Think of instructions as steps in a recipe. How the CPU follows these steps defines its execution style.
In-Order Execution: The Methodical Chef
- How it works: The CPU fetches, decodes, executes, and finalizes each instruction strictly in the sequence they appear in your program. It's like a chef following a recipe step-by-step, one after the other.
- Analogy: A single-lane road where cars must follow each other precisely.
- Pros: Simpler design, generally more power-efficient.
- Cons: Can lead to performance bottlenecks. If one step takes a long time (like waiting for data from memory), the entire sequence grinds to a halt, even if other steps could have been worked on.
- Used in: Many embedded systems, low-power devices, and older or simpler CPU designs.
Out-of-Order Execution (OoOE): The Multitasking Mastermind
- How it works: This is where things get exciting for performance. The CPU still reads instructions in order, but it then uses complex logic (like "reservation stations" and a "reorder buffer") to identify instructions that are ready to be executed, even if they are further down the line. It executes these ready instructions out of their original program sequence. However, critically, it ensures the results are committed back to the program's state in the original order, maintaining program correctness.
- Analogy: A bustling kitchen where multiple chefs can work on different parts of a complex meal simultaneously, as long as the final plating is done in the correct order. Or, a multi-lane highway where cars can overtake slower ones, but they still arrive at their destination in a logical sequence.
- Pros: Significantly boosts performance by hiding latency (e.g., waiting for memory).
- Cons: Much more complex hardware, higher power consumption, and heat generation.
- Used in: Most modern high-performance CPUs (e.g., Intel Core, AMD Ryzen, ARM Cortex-A72 and higher).
2. Cortex-A55: The Efficient In-Order Processor
The ARM Cortex-A55 is a prime example of an in-order superscalar processor. This means:
* It sticks to executing instructions in the order they appear in your code.
* However, it's "superscalar," meaning it can fetch and issue multiple instructions (two per cycle in the A55's case) in the same clock cycle, provided they are independent and ready. These instructions still proceed through the pipeline in program order; it does not let instructions complete out of order like a true OoOE processor. This design balances performance with efficiency, making it excellent for mobile and embedded applications.
3. The Purpose of "Barriers": Orchestrating Order When It Counts
This is where your confusion likely stems from. If a CPU is already "in-order," why would it need special instructions to enforce order? The key is that "in-order" execution refers to the CPU's internal processing pipeline, but modern systems are far more complex. They involve multiple CPU cores, caches, memory controllers, and external I/O devices, all interacting with each other.
ARM processors use memory and instruction barriers to manage these complex interactions and guarantee correctness. The most common ones are:
DMB (Data Memory Barrier)
- What it does: Ensures that all memory accesses (reads and writes) issued before the DMB are completed and visible to other processors or I/O devices before any memory accesses after the DMB can proceed.
- Analogy: Imagine a team collaborating on a document. A DMB is like saying, "Before I start writing my next section, make sure everyone has finished reading and understood the changes I've already made to the previous sections."
- Why it's needed: Crucial for cache coherency in multi-core systems. It ensures that when one core writes data, other cores see that updated data in a predictable order.

DSB (Data Synchronization Barrier)
- What it does: A stronger guarantee. It ensures that all memory accesses and other operations initiated before the DSB have fully completed (their side effects are finalized) before any instruction after the DSB can execute.
- Analogy: Building on the document analogy, a DSB is like saying, "Before anyone can move on to the next document, all the work on the current document, including final saving and distribution, must be completely finished."
- Why it's needed: Essential when the CPU must ensure that a system-level change has propagated completely. For example, after updating memory management unit (MMU) page tables or configuring a hardware peripheral, a DSB ensures those changes are settled before the CPU proceeds with instructions that might rely on them.

ISB (Instruction Synchronization Barrier)
- What it does: Flushes the CPU's instruction pipeline. It guarantees that all instructions before the ISB have completed and their effects are visible before any instructions after the ISB are fetched and executed.
- Analogy: Imagine you've just made significant edits to your recipe. An ISB is like stopping the cooking process, discarding any partially prepared dishes, and starting fresh from the revised recipe so no stale steps interfere.
- Why it's needed: Primarily used after changes that affect program flow or instruction fetch itself, such as modifying the MMU tables that dictate memory access permissions, or when dealing with self-modifying code (though this is rare and discouraged). It ensures the CPU's instruction fetch is synchronized with the updated system state.
4. Reconciling Barriers with In-Order Execution
The "conflict" arises from thinking barriers prevent reordering. Instead, they enforce specific, necessary ordering relationships that are critical for the system's overall correctness.
- Multicore Coordination: Even though Cortex-A55 cores execute instructions sequentially internally, they still need to communicate and synchronize with each other. DMB ensures that writes from one core become visible to others in a controlled fashion, preventing race conditions.
- Hardware Interaction: Processors interact with the outside world via peripherals. DSB ensures that commands sent to hardware complete before the CPU makes decisions based on their completion.
- System Stability: Operating systems manage memory, processes, and hardware. Operations like updating page tables (followed by DSB and ISB) must be precisely ordered to prevent the system from crashing or entering an inconsistent state.
In essence, barriers are not about forcing an in-order CPU to become out-of-order; they are about establishing defined synchronization points in a complex, often multi-threaded and hardware-interacting environment, ensuring that the sequence of observable effects (especially memory visibility and hardware state changes) is correct, regardless of the core's internal execution style. For an in-order processor, they help manage the interactions with the outside world and other cores, preventing subtle bugs that arise from assumptions about timing and visibility.
Conclusion:
The Cortex-A55's in-order execution offers efficiency, but the presence of barriers like DMB, DSB, and ISB highlights that CPU execution is just one piece of a larger system puzzle. These barriers are not contradictions; they are essential tools that provide critical ordering guarantees for memory visibility and synchronization, ensuring that even simple in-order processors can reliably operate within complex, concurrent computing environments.