AES (Advanced Encryption Standard): A Complete Technical Reference

From GF(2⁸) arithmetic and four-step round functions to SoC RTL architecture decisions — everything an IC engineer needs to design a production-grade AES accelerator

AES has been the de facto standard symmetric cipher for roughly 25 years, since NIST published it as FIPS 197 in 2001. TLS, VPN, full-disk encryption, mobile SE/TEE, and blockchain infrastructure all rely on AES as their core confidentiality primitive. For SoC designers, building a cryptographic IP block almost always means building an AES accelerator.

The Fall of DES and the Rise of Rijndael

To understand why AES exists, you need to understand what broke before it. DES (Data Encryption Standard), adopted as FIPS 46 in 1977, used a 56-bit key, a length that critics considered too short from the outset. By the late 1990s, that margin had evaporated: in 1998 the EFF's purpose-built Deep Crack machine recovered a DES key in about 56 hours, and in January 1999 it was combined with distributed.net to do so in under 24 hours. NIST had already announced a public competition for a successor in 1997. The process itself set a precedent: rather than designating an algorithm behind closed doors, NIST ran a multi-year open evaluation in which any team could submit, attack, or analyze candidates.

Year	Milestone
1977	DES standardized
1997	NIST competition opens
2000	Rijndael selected
2001	FIPS 197 published
2026	25 years as the standard

Belgian cryptographers Joan Daemen and Vincent Rijmen submitted the Rijndael algorithm, which survived several years of open analysis and was selected in October 2000. NIST published it as FIPS PUB 197 in November 2001. The complete algorithm has been publicly known for 25 years, and no practical cryptanalytic attack against the full cipher exists. That record makes AES the clearest illustration of a core principle of modern cryptography: open peer review produces stronger security than secrecy.

Core Concepts: Symmetric Key, Block Cipher, and Rounds

AES rests on five foundational concepts. The most critical property to internalize first: the block size is always fixed at 128 bits regardless of key length. Only the round count varies with key size. This clean separation simplifies hardware datapath design — the core computation is always 128-bit wide.

Concept	Definition	In AES
Symmetric key	Same key for encryption and decryption	Sender and receiver share one key
Block cipher	Processes fixed-size data chunks	Always 128-bit blocks
Key length	Number of bits in the secret key	128 / 192 / 256 bits
Round	One iteration of the cipher's operations	10 / 12 / 14 rounds
State	In-flight data being transformed	4×4 byte matrix (16 B)

AES Variants: Key Length vs. Round Count Trade-off

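AES comes in three variants distinguished by key length. Longer keys raise the brute-force security level, and the cost shows up mainly as extra rounds. In an iterative (round-reuse) design, AES-256 runs 14 rounds against 10 for AES-128, so per-block latency and energy rise by roughly 40%, while area grows more modestly, mostly in the key schedule and key storage. In a fully unrolled pipeline, area also scales with the round count. In a tight-budget IoT design, the key length should therefore be a deliberate choice, never a default.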

Variant	Key Length	Rounds	Expanded Key (32-bit words)	Typical Use
AES-128	128 bits	10	44	Mobile, general-purpose commercial
AES-192	192 bits	12	52	Enterprise and government classified
AES-256	256 bits	14	60	Military, national security, quantum-resistant margin
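
The round counts and expanded-key sizes in the table follow from two simple relations in FIPS 197: Nr = Nk + 6 and 4 × (Nr + 1) expanded-key words, where Nk is the key length in 32-bit words. A minimal sketch (the function name `aes_params` is illustrative, not taken from any library):

```python
def aes_params(key_bits):
    """Return (Nk, Nr, expanded_key_words) for a given AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    nk = key_bits // 32      # key length in 32-bit words
    nr = nk + 6              # round count: 10, 12, or 14
    words = 4 * (nr + 1)     # one 128-bit round key per AddRoundKey call
    return nk, nr, words


for bits in (128, 192, 256):
    nk, nr, words = aes_params(bits)
    print(f"AES-{bits}: Nk={nk}, Nr={nr}, expanded key = {words} words")
```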

Algorithm Internals: The Four Round Operations

Every AES operation is defined over the finite field GF(2⁸) — a Galois Field of 256 elements with reduction polynomial x⁸ + x⁴ + x³ + x + 1. The 16-byte plaintext is arranged as a 4×4 state matrix, and the following four operations iterate 10–14 times. These operations were chosen jointly to satisfy the wide trail strategy defined by Daemen and Rijmen: any differential characteristic covering the full cipher has exponentially low probability, making differential and linear cryptanalysis infeasible.
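
To make the field arithmetic concrete, here is a minimal Python sketch of multiplication in GF(2⁸) with the reduction polynomial above (0x11B in hexadecimal). The function names `xtime` and `gf_mul` are illustrative choices; the check value comes from the worked example in FIPS 197.

```python
def xtime(a):
    """Multiply a by x (0x02) in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:          # degree-8 term present: subtract (XOR) the polynomial
        a ^= 0x11B
    return a


def gf_mul(a, b):
    """Shift-and-add multiplication of two field elements."""
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# FIPS 197 worked example: {57} * {83} = {c1}
assert gf_mul(0x57, 0x83) == 0xC1
```

In hardware, `xtime` reduces to a one-bit shift plus a conditional XOR, which is why multiplication by the small MixColumns constants is cheap.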

SubBytes: A 256-entry S-Box applies a byte-for-byte substitution. Each input byte is first replaced by its multiplicative inverse in GF(2⁸) (with 0 mapped to 0), then passed through a fixed affine transform over GF(2). This step provides all the nonlinearity in AES: without it, the whole cipher would be an affine map over GF(2), and the key could be recovered by solving linear equations from a few known plaintext/ciphertext pairs. In hardware, the S-Box can be implemented as a ROM lookup (fast, but costs 256 bytes per instance) or as combinational composite-field logic over GF(2⁴)² (compact, preferred for area-critical designs).
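
The two-step construction (field inversion, then affine transform) can be checked by generating the table directly. The sketch below is a reference model, not a hardware implementation: it computes the inverse as a²⁵⁴ by repeated multiplication, which is slow but easy to verify. Helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254; maps 0 to 0, as AES requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(a):
    """Field inversion followed by the AES affine transform (constant 0x63)."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]

assert SBOX[0x00] == 0x63 and SBOX[0x53] == 0xED   # values from FIPS 197
```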

ShiftRows — The four rows of the 4×4 state matrix are left-rotated by 0, 1, 2, and 3 bytes respectively. This repositions bytes across columns so that after MixColumns, each output column depends on bytes from all four original columns — inter-column diffusion would not occur without this transposition step.
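
As a small illustration, ShiftRows is a pure byte permutation. The sketch below assumes the state is held as a flat list of 16 bytes in the column-major order FIPS 197 uses (byte r + 4c sits at row r, column c); the function name is illustrative.

```python
def shift_rows(state):
    """Rotate row r of the column-major 4x4 state left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


# Feeding in the byte indices 0..15 shows where each output byte comes from.
print(shift_rows(list(range(16))))
# [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
```

Each output column (a group of four consecutive entries) now draws one byte from every input column, which is exactly the property MixColumns relies on. In RTL this step costs no logic at all, only wiring.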

MixColumns: Each 4-byte column is multiplied by a fixed 4×4 MDS (maximum distance separable) matrix over GF(2⁸). The matrix guarantees that any 1-byte input change affects all 4 output bytes, providing the bulk of the diffusion. MixColumns is omitted in the final round: a key-independent linear step there would add no security, and leaving it out gives the cipher and its inverse a matching round structure. (The equivalent inverse cipher in FIPS 197 does require a modified decryption key schedule.)
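
The matrix multiplication can be sketched for a single column as follows. The matrix rows are the rotations of (02, 03, 01, 01) given in FIPS 197; the function names are illustrative, and the check value is a commonly used MixColumns test column.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def mix_column(col):
    """Multiply one 4-byte column by the fixed AES MDS matrix."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
        a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
        a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
        gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3),
    ]


assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]
```

Because the only constants are 01, 02, and 03, a hardware MixColumns needs nothing beyond shifts and XORs.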

AddRoundKey: A bitwise XOR of the current state with a 128-bit round subkey derived from the key schedule. This is the only step that involves the secret key. It is executed Nr + 1 times in total: once as an initial whitening step before round 1, then once at the end of each of the Nr rounds, including the final round.
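
Putting the four operations together, the following is a compact AES-128 encryption reference model of the kind often used as a golden model for RTL simulation. It is a sketch for functional checking only: it is slow, not constant-time, and handles a single block. It repeats the helper definitions so that it runs standalone; all function names are illustrative. The check vector is the AES-128 example from FIPS 197 Appendix C.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _sbox_entry(a):
    inv = 1
    for _ in range(254):                 # a^254 is the inverse; 0 maps to 0
        inv = gf_mul(inv, a)
    rot = lambda x, n: ((x << n) | (x >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [_sbox_entry(a) for a in range(256)]


def expand_key_128(key):
    """Return the 11 round keys (16 bytes each) for a 16-byte key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]   # RotWord, SubWord
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def add_round_key(state, rk):
    return [s ^ k for s, k in zip(state, rk)]


def sub_bytes(state):
    return [SBOX[b] for b in state]


def shift_rows(state):
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out += [gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
                a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
                a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
                gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3)]
    return out


def encrypt_block_128(plaintext, key):
    """Encrypt one 16-byte block with AES-128."""
    rks = expand_key_128(key)
    state = add_round_key(list(plaintext), rks[0])           # initial whitening
    for rnd in range(1, 10):                                 # rounds 1..9
        state = add_round_key(mix_columns(shift_rows(sub_bytes(state))), rks[rnd])
    state = add_round_key(shift_rows(sub_bytes(state)), rks[10])  # no MixColumns
    return bytes(state)


key = bytes(range(16))
pt = bytes.fromhex("00112233445566778899aabbccddeeff")
assert encrypt_block_128(pt, key).hex() == "69c4e0d86a7b0430d8cdb78070b4c55a"
```

Counting the `add_round_key` calls gives 11 for AES-128, matching the Nr + 1 figure above.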

Operating Modes: A Security Decision as Critical as the Cipher

The AES core processes exactly one 128-bit block. To handle arbitrary-length data securely, an operating mode defines how consecutive blocks chain together. Mode selection often carries a larger security impact than the underlying cipher — ECB and CBC have both produced serious real-world vulnerabilities despite using perfectly correct AES implementations.

Mode	Key Property	Parallelizable (encryption)	Current Status
ECB	Each block encrypted independently	Yes	Deprecated
CBC	XOR previous ciphertext, then encrypt	No	Declining
CFB / OFB	Operates as a stream cipher	No	Legacy
CTR	Counter + AES core → keystream	Yes	Widely used
GCM	CTR + GHASH authentication (AEAD)	Yes	De facto standard (TLS 1.3)
XTS	Dedicated to disk/storage encryption	Yes	Storage standard
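
To show why CTR parallelizes, the sketch below builds the keystream from independent counter blocks. Important assumption: `toy_block_encrypt` is a stand-in built from SHA-256 so that the example runs with only the standard library. It is NOT AES and must not be used for real encryption; in a real design the AES core takes its place. The 96-bit nonce plus 32-bit counter layout is also an assumption, chosen to mirror the layout GCM uses.

```python
import hashlib


def toy_block_encrypt(key, block):
    """Stand-in for the AES core (NOT AES): a keyed 16-byte pseudorandom function."""
    return hashlib.sha256(key + block).digest()[:16]


def ctr_xcrypt(key, nonce, data, block_fn=toy_block_encrypt):
    """CTR mode: XOR data with block_fn(nonce || counter). Same call decrypts."""
    if len(nonce) != 12:
        raise ValueError("this sketch assumes a 96-bit nonce")
    out = bytearray()
    for i in range(0, len(data), 16):
        counter_block = nonce + (i // 16).to_bytes(4, "big")
        keystream = block_fn(key, counter_block)   # independent of other blocks
        out += bytes(d ^ k for d, k in zip(data[i:i + 16], keystream))
    return bytes(out)


key, nonce = b"k" * 16, b"n" * 12
msg = b"A" * 32 + b"tail"                          # two identical blocks + partial
ct = ctr_xcrypt(key, nonce, msg)
assert ctr_xcrypt(key, nonce, ct) == msg           # decryption is the same operation
assert ct[:16] != ct[16:32]                        # unlike ECB, repeats are hidden
```

Each keystream block depends only on the key, the nonce, and its own counter value, so a pipelined core can process them back to back. The nonce must never repeat under the same key.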

GCM's authentication function, GHASH, is defined in NIST SP 800-38D as a chain of multiplications in GF(2¹²⁸), so a GCM accelerator needs more than an AES core: a dedicated GHASH unit (a 128-bit GF(2¹²⁸) multiplier) is required alongside it. Without hardware GHASH, the authentication tag computation becomes the throughput bottleneck rather than the AES core itself. This co-design requirement is a key driver of SoC area budgets for cryptographic blocks.

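The GF(2¹²⁸) multiplication at the heart of GHASH can be sketched bit-serially, following the multiplication algorithm in NIST SP 800-38D. Blocks are treated as 128-bit integers read big-endian, so the leftmost bit is bit 127 of the integer. Function names are illustrative; a hardware multiplier would typically process many bits per cycle rather than one.

```python
R = 0xE1 << 120   # reduction constant: bits 11100001 followed by 120 zeros


def gf128_mul(x, y):
    """Multiply two GCM field elements given as 128-bit integers."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:          # bit i of x, counted from the left
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash(h, blocks):
    """Chain the blocks through multiply-by-H: Y_i = (Y_{i-1} XOR X_i) * H."""
    y = 0
    for block in blocks:
        y = gf128_mul(y ^ block, h)
    return y


ONE = 1 << 127                            # multiplicative identity in GCM bit order
a, b = 0x0123456789ABCDEF0123456789ABCDEF, 0xFEDCBA9876543210FEDCBA9876543210
assert gf128_mul(a, ONE) == a
assert gf128_mul(a, b) == gf128_mul(b, a)
```

The loop-carried dependency in `ghash` is why a slow multiplier stalls the whole GCM datapath even when the AES core is fully pipelined.
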
SoC RTL Design: Area, Throughput, and Power Trade-offs

When implementing an AES accelerator in RTL, the first decision is the architecture class: fully pipelined, iterative, or serialized. For the same algorithm, varying the datapath width and pipeline depth can change gate count by more than 100×. There is no universally correct answer; the right choice depends on the target's throughput requirement, area budget, and power envelope.

Architecture	Gate Count (GE)	Throughput	Target Domain
Fully Pipelined	100k–500k	>100 Gbps	Datacenter, NVMe SSD
Iterative	5k–20k	100–500 Mbps	General-purpose SoC, security IP
Serialized (8-bit)	<3k	Tens of Mbps	IoT, smartcard

Key RTL Design Decisions

① S-Box implementation: A LUT (ROM-based) S-Box is fast but costs 256 bytes of storage per instance, and a 128-bit datapath needs sixteen parallel instances for SubBytes (plus four more if the key schedule runs concurrently), so the cost adds up quickly. Composite field logic, which decomposes GF(2⁸) inversion into operations over GF(2⁴)², is the standard area-reduction technique and is well documented in the literature (Canright's compact S-Box is a common reference point). The trade-off: combinational depth increases, which may constrain achievable clock frequency.

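② Key expansion (key schedule): On-the-fly generation computes each round key from the previous one during operation, saving the storage needed for all 11/13/15 round keys. For encryption this adds no latency, but decryption consumes round keys in reverse order, so the final round key must first be derived by running the schedule forward (or cached after a first pass), which costs extra cycles whenever the key changes. Pre-computed storage avoids that cost in exchange for (Nr+1) × 128 bits of SRAM or registers: 1,408 bits for AES-128, 1,664 for AES-192, and 1,920 for AES-256.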

③ Datapath width: 128-bit full-width (one round per clock), 32-bit (four cycles per round), or 8-bit serial (16 cycles per round, ultra-low area). Each 4× reduction in datapath width roughly quadruples per-block latency. Area and power shrink by much less than 4×, because the 128-bit state and key registers remain regardless of datapath width.

④ Mode-dependent parallelism constraints: CTR, GCM, and ECB have no inter-block data dependency, so a pipelined architecture can sustain one block per clock cycle. CBC and CFB encryption require the previous ciphertext block before the next can begin, so a deep pipeline sits mostly idle on a single stream; an iterative round-reuse architecture is more efficient there. CBC and CFB decryption do not have this dependency and can be parallelized.
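
The datapath-width and key-storage figures above can be turned into a first-order estimate. The sketch below is an idealized model under stated assumptions: one S-Box pass per datapath word, no load/unload or control overhead, and an assumed clock frequency supplied by the caller. Real cores will need more cycles; the function names are illustrative.

```python
def cycles_per_block(datapath_bits, rounds):
    """Idealized cycle count: each round takes 128 / width cycles."""
    return rounds * (128 // datapath_bits)


def throughput_mbps(datapath_bits, rounds, clock_mhz):
    """Sustained throughput of an iterative core at the given clock."""
    return 128 * clock_mhz / cycles_per_block(datapath_bits, rounds)


def round_key_storage_bits(rounds):
    """Storage for pre-computed round keys: (Nr + 1) x 128 bits."""
    return (rounds + 1) * 128


CLOCK_MHZ = 100   # assumed clock, for illustration only
for width in (128, 32, 8):
    cycles = cycles_per_block(width, 10)
    rate = throughput_mbps(width, 10, CLOCK_MHZ)
    print(f"AES-128, {width:>3}-bit datapath: {cycles:>3} cycles/block, {rate:.0f} Mbps")

for rounds in (10, 12, 14):
    print(f"Nr={rounds}: {round_key_storage_bits(rounds)} bits of round-key storage")
```

At the assumed 100 MHz, the 32-bit and 8-bit cases land at 320 Mbps and 80 Mbps, in line with the iterative and serialized rows of the architecture table.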

Side-Channel Attack (SCA) Countermeasures

Commercial SoCs targeting automotive, payment, or government certification face a critical threat class: even a functionally correct AES implementation can leak the key through power analysis, EM emanation, or timing side channels. The algorithm itself remains unbroken, but its physical implementation can still be attacked. OpenTitan is a well-documented open-source example of SCA hardening applied at the RTL level.

Masking: Splits every sensitive intermediate value (state and key) into random shares so that the power consumption associated with any single share is statistically independent of the secret value, defeating first-order DPA (differential power analysis). First-, second-, and higher-order masking schemes offer progressively stronger protection at progressively higher area cost. DOM (domain-oriented masking) is a common hardware-friendly variant; a masked S-Box roughly doubles or triples the S-Box gate count and also consumes fresh randomness.

Constant-time execution — All datapath operations run in a fixed, input-independent cycle count, eliminating timing side channels. This is a design constraint enforced through RTL coding guidelines and confirmed via formal analysis — not something an EDA tool guarantees automatically.

Key path isolation — Dedicated routing and register banks ensure raw key material never propagates to external buses, debug interfaces, or scan chains. Many secure-element designs use hardware key slots where the key value can be loaded and used but not read back.

Random delay insertion — Dummy rounds or stall cycles are injected at pseudorandom intervals to desynchronize power traces across repeated measurements, frustrating correlation-based attack alignment.
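
As a minimal illustration of the masking idea described above, the sketch below applies first-order Boolean masking to a single byte and shows that a linear step such as AddRoundKey can be computed share by share without ever recombining the secret. It is a conceptual model only: the nonlinear S-Box needs dedicated masked gadgets (such as DOM) that are not shown, and the function names are illustrative.

```python
import secrets


def mask(value):
    """Split a byte into two shares whose XOR equals the value."""
    r = secrets.randbits(8)              # fresh random mask
    return (value ^ r, r)


def unmask(shares):
    """Recombine shares (done only at the very end of the computation)."""
    return shares[0] ^ shares[1]


def masked_add_round_key(state_shares, key_shares):
    """XOR is linear, so each share is processed independently."""
    return (state_shares[0] ^ key_shares[0], state_shares[1] ^ key_shares[1])


state_byte, key_byte = 0x3C, 0xA5
result = masked_add_round_key(mask(state_byte), mask(key_byte))
assert unmask(result) == state_byte ^ key_byte
```

Each individual share is uniformly random on every run, so observing one share alone reveals nothing about the underlying byte.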

Open-Source AES RTL Cores: Baseline Comparison

When integrating an AES IP block into an SoC, starting from a well-characterized open-source baseline is standard practice. Select a baseline matched to your target domain, then layer in differentiating RTL — additional mode support, SCA countermeasures, or bus interface adapters — on top.

Core	Maintainer & License	Strengths	AES-128 Latency
OpenTitan AES	lowRISC / Google — Apache 2.0	SCA hardened (masking, DOM), formally verified security properties	Tens to hundreds of cycles
SecWorks AES	J. Strömbergson — BSD-style	High throughput, parameterized design, clean RTL coding style	11–44 cycles
TinyAES	OpenCores — LGPL	Minimal area footprint, iterative architecture	~160 cycles
NIST Reference	Public domain	Algorithmic correctness validation, golden-model reference	Variable

Practical recommendation — High-assurance ASICs (automotive, payment, government): start from OpenTitan. High-throughput networking or storage: start from SecWorks. Area-constrained IoT: start from TinyAES. In all three cases, drive the DUT with NIST Reference test vectors from day one of RTL simulation and diff the output before committing to any microarchitectural change.

What to Avoid: Deprecated Primitives

Cryptographic standards accumulate technical debt as weaknesses emerge. In new SoC designs, the following items should be disabled by default or retained only in a legacy-compatibility mode, never present on any security-critical path.

DES / 3DES: NIST announced its intent to retire 3DES in 2017, and SP 800-131A Rev. 2 deprecated 3DES encryption through 2023 and disallows it after December 31, 2023 (decryption of legacy data remains permitted). Any SoC shipping with DES/3DES enabled on a security path is a compliance liability from the moment of tape-out.

AES-ECB mode: ECB's deterministic mapping (identical plaintext blocks produce identical ciphertext blocks) leaks plaintext structure in any message longer than one block. The "ECB penguin" (the Linux Tux image, still visually recognizable after ECB encryption) is the canonical demonstration. As a data-encryption mode, ECB has been effectively eliminated from standard security protocols, although the raw single-block operation remains the building block inside CTR, GCM, and other modes.

Paired use of MD5 / SHA-1 as MAC — Hash functions that historically accompanied AES for message authentication are now deprecated. The shift to AEAD modes (GCM, CCM) eliminates the need for a separate MAC computation entirely, removing an entire attack surface.

CBC on new protocol designs — Padding-oracle attacks (POODLE, Lucky Thirteen) and the need for a separate MAC layer have pushed new protocol designs toward AES-GCM and ChaCha20-Poly1305. CBC is not broken in the way ECB is, but AEAD is strictly better — use GCM unless there is a hard compatibility requirement.

Where the Field Is Heading: Five Converging Trends

The core AES algorithm has been stable for 25 years. What continues to evolve rapidly is everything around it: operating mode selection, SCA hardening requirements, key-length policy, and system-level isolation architecture. Next-generation SoC security IP is converging on the following five directions.

Trend	Estimated adoption in new designs*
AES-GCM Consolidation	95%
AES-256 as Default	78%
PQC Hybrid Integration	55%
Masked AES Standardization	70%
Secure Enclave / TEE	88%

* Estimated adoption rates in new designs across major SoC IP vendors

① AES-GCM consolidation: AES-GCM is the mandatory-to-implement AEAD in TLS 1.3 and is widely used in IPsec and MACsec. A bundled GHASH unit is effectively mandatory in any SoC AES accelerator aimed at these protocols; shipping an AES-only core forces software to compute GHASH on the general-purpose CPU, causing a severe throughput regression at the protocol boundary.

② AES-256 as default: Grover's algorithm provides a quadratic speedup for brute-force key search on a quantum computer, which halves the effective key length, in bits, of any symmetric cipher. AES-128 degrades to roughly 64-bit equivalent post-quantum security, a margin that many threat models no longer accept for data with a multi-decade confidentiality requirement. AES-256 retains roughly 128-bit post-quantum security.

③ PQC hybrid integration — The emerging architecture separates responsibilities: key establishment uses a quantum-resistant KEM (e.g., ML-KEM/Kyber, now NIST FIPS 203), while bulk data encryption uses AES-256-GCM. This hybrid KEM + AEAD structure provides quantum resistance without sacrificing the performance of a battle-tested symmetric cipher.

④ Secure Enclave / TEE isolation — Architectures where raw key material exists only inside a hardware security boundary — ARM TrustZone, RISC-V Keystone, Apple SEP — are becoming the baseline design expectation. The key is never visible to normal-world software; it is loaded into a hardware key slot and accessed only through a controlled API.

⑤ Masked AES at silicon level — Automotive (ISO/SAE 21434), payment (EMVCo), and government (FIPS 140-3) certifications increasingly require SCA-hardened RTL — masked S-Box, constant-time execution, and often formal side-channel analysis — as a non-negotiable certification gate, driving masked AES from a niche capability to a default design requirement.

SoC Designer Decision Checklist

An RTL engineer's task is not merely to implement FIPS 197; it is to balance the three-way trade-off among area, throughput, and security assurance against the target SoC's threat model. Lock down the following items before committing to an IP architecture.

✓ Confirm target domain — IoT (area-first) / mobile (balanced) / datacenter (throughput-first) / secure SoC (SCA-hardened)

✓ Define mode coverage: minimum ECB (raw block primitive only, not for bulk data) + CBC + CTR; recommended CTR + GCM; add XTS for storage targets

✓ Key-length support: AES-128 and AES-256 minimum; include AES-192 when the target specification calls for it

✓ S-Box implementation — Composite field (GF(2⁴)²) for area-constrained designs; LUT for speed-critical paths where the area budget allows

✓ Key schedule policy — On-the-fly generation for area-constrained targets; pre-computed storage for throughput-critical designs

✓ SCA countermeasures — Masking and constant-time execution are required for any commercial security certification; confirm requirements with your certification body early

✓ Verification flow — NIST FIPS 197 test vectors + CAVP compliance + UVM-based per-mode sequence validation from the first RTL milestone

✓ Open-source baseline selection — SecWorks (throughput), OpenTitan (security assurance), or TinyAES (minimal area) as baseline; layer differentiating design on top

References

▶ NIST FIPS 197 (AES Standard)

▶ NIST SP 800-38D (GCM Mode)

▶ OpenTitan AES IP Spec

▶ SecWorks AES GitHub

Disclaimer — This article is provided for general informational purposes based on publicly available standards documents and open-source RTL analysis. It does not constitute a recommendation to adopt any specific product. Actual SoC IP integration, certification, and security validation must undergo review by qualified security engineers and accredited certification bodies.

Semiconductor & SoC Design Notes

Engineering notes on semiconductor and SoC design, curated from a verification perspective and reviewed before publication.

Based on publicly available data and primary sources. Last updated: June 8, 2026.
