SHA-2 vs. SHA-3: A Hardware Engineer's Guide to SoC Crypto Engine Design


SHA-2 · SHA-3 · SHAKE · HMAC — A PPA Optimization Guide for Hardware Engineers

Key Takeaways: In modern SoC (System-on-Chip) design, a dedicated Crypto Engine is not an optional IP block but a hard requirement. SHA-2 relies on an adder-heavy Merkle-Damgård construction and delivers broad legacy compatibility. SHA-3 pairs a Sponge construction with the Keccak-f permutation, which contains no adders at all and typically yields better area and power efficiency. HMAC augments either algorithm with a secret key to provide message authentication. This report unpacks the structural differences that RTL designers must understand when sharing infrastructure and optimizing datapaths across both algorithm families.

Why Every SoC Needs a Hash Engine

Digital signatures, Secure Boot, OTA firmware verification, DRM, and blockchain acceleration — nearly every security function in modern silicon ultimately rests on a cryptographic hash function. A hash function maps an arbitrary-length input to a fixed-length output (a "digital fingerprint") in a one-way, deterministic fashion: identical inputs always produce identical outputs, while distinct inputs produce distinct outputs with overwhelming probability.
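The fixed-length and deterministic properties are easy to observe in software. The sketch below uses only Python's standard `hashlib`. The inputs (`b"abc"`, a 1 MiB zero buffer, and the one-bit variant `b"abb"`) are arbitrary illustrative choices. The avalanche count is left unprinted as a fixed number because it depends on the inputs.

```python
import hashlib

# Fixed-length output: a 3-byte input and a 1 MiB input both give a 32-byte (256-bit) digest.
short_digest = hashlib.sha256(b"abc").digest()
long_digest = hashlib.sha256(b"\x00" * (1 << 20)).digest()
print(len(short_digest), len(long_digest))  # 32 32

# Determinism: hashing the same input again yields the identical digest.
print(hashlib.sha256(b"abc").digest() == short_digest)  # True

# Avalanche behaviour: "abb" differs from "abc" in a single input bit (0x62 vs 0x63),
# yet typically about half of the 256 output bits change.
other = hashlib.sha256(b"abb").digest()
flipped = bin(int.from_bytes(short_digest, "big") ^ int.from_bytes(other, "big")).count("1")
print(flipped)
```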

Software-only hashing is feasible for low-throughput use cases, but satisfying a milliwatt power budget on an IoT sensor node or a 100 Gbps throughput requirement in a 5G base station demands dedicated hardware acceleration. This report targets SoC design engineers and provides the structural insights needed to make well-grounded PPA (Power, Performance, Area) trade-offs when selecting and implementing hash algorithm IP.

Three Core Security Properties
Pre-image resistance — Given a digest, it is computationally infeasible to recover the original input.
Second pre-image resistance — Given an input, it is computationally infeasible to find a different input that maps to the same digest.
Collision resistance — It is computationally infeasible to find any two distinct inputs that produce the same digest.
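Collision resistance is the property an attacker can attack most cheaply. By the birthday bound, a generic search finds a collision on an n-bit digest after roughly 2^(n/2) attempts, not 2^n. The toy experiment below makes this visible by deliberately truncating SHA-256 to 24 bits. The helper names `truncated_hash` and `find_collision` are hypothetical and exist only for this illustration.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256: a deliberately weakened toy hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1

m1, m2, tries = find_collision(24)
# tries is typically on the order of 2**12 (a few thousand), far below 2**24
print(m1, m2, tries)
```

Scaling the same arithmetic to a full 256-bit digest gives about 2^128 work. That is why the collision strength of SHA-256 is quoted as 128 bits.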

SHA-2 — The Industry Workhorse

Background and Standardization

The NSA (National Security Agency) designed the SHA-2 family, and NIST published it in 2001. It was standardized in FIPS PUB 180-2 (2002), and the current specification is FIPS 180-4. Its predecessor, SHA-1, was deprecated after a theoretical collision attack was published in 2005, and Google's SHAttered project demonstrated a practical collision in 2017. SHA-2, by contrast, remains unbroken by any practical attack more than two decades after its publication, and it is the most widely deployed hash standard in industry today. TLS 1.3, IPsec, Bitcoin, and Secure Boot chains all rely on SHA-2, most often SHA-256, as their cryptographic foundation.

Merkle-Damgård Construction

SHA-2 uses the Merkle-Damgård (MD) construction. The padded input message is split into fixed-size blocks. Each block is fed into a compression function together with the current chaining value. The compression output becomes the chaining value for the next block, and the final chaining value is the digest (truncated for SHA-224 and SHA-384). Each compression call breaks down into two stages:

① Message Scheduling: The 512-bit (SHA-256) or 1024-bit (SHA-512) block is split into sixteen 32/64-bit words. These are expanded with bit rotations, shifts, XOR, and modular addition into 64 or 80 schedule words (W0 … W63 for SHA-256, W0 … W79 for SHA-512).

② Compression Function — Eight working registers (a, b, c, d, e, f, g, h) are updated over 64 or 80 rounds using logical operations, bit rotations, and 32/64-bit modular additions. It is these additions that create the primary hardware bottleneck.
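Both stages fit in a short software model. The sketch below is a minimal, unoptimized pure-Python SHA-256, intended only to show the data flow and checked against `hashlib`. Instead of listing the FIPS 180-4 constant tables, it computes them from their published definition: the first 32 fractional bits of the cube roots (K) and square roots (initial hash) of the first primes. That choice is made for compactness. Names such as `first_primes` and `icbrt` are hypothetical helpers.

```python
import hashlib
import math

MASK32 = 0xFFFFFFFF

def first_primes(n: int) -> list:
    """Return the first n primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK32 for p in PRIMES]              # fractional bits of cube roots
H0 = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]    # fractional bits of square roots

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK32

def sha256(msg: bytes) -> bytes:
    # MD padding: 0x80, zeros, then the 64-bit big-endian message length in bits
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + (8 * len(msg)).to_bytes(8, "big")
    h = list(H0)
    for off in range(0, len(padded), 64):
        block = padded[off:off + 64]
        # (1) Message schedule: 16 words -> 64 words (this stage also uses modular addition)
        w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)
        # (2) Compression: 64 rounds over the eight working registers a..h
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK32     # four additions
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK32                       # one addition
            # two more additions: the new e and the new a
            hh, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK32, c, b, a, (t1 + t2) & MASK32
        # Feed-forward: add the working registers into the chaining value
        h = [(x + y) & MASK32 for x, y in zip(h, [a, b, c, d, e, f, g, hh])]
    return b"".join(x.to_bytes(4, "big") for x in h)

print(sha256(b"abc").hex())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The round body shows where the hardware cost comes from. Each of the 64 rounds contains seven dependent 32-bit additions, while everything else is wiring and bitwise logic.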

SHA-2 Variant Specifications

Variant | Word Size | Block Size | Rounds | Digest Length
SHA-224 | 32-bit | 512-bit | 64 | 224-bit
SHA-256 | 32-bit | 512-bit | 64 | 256-bit
SHA-384 | 64-bit | 1024-bit | 80 | 384-bit
SHA-512 | 64-bit | 1024-bit | 80 | 512-bit

Hardware Design Challenges

The dominant bottleneck in SHA-2 RTL is carry propagation delay in the adders. Each round performs seven 32-bit (SHA-256) or 64-bit (SHA-512) modular additions: four to form T1, one for T2, and one each for the new a and e values. The longest chain of dependent additions sets the critical path of the hash engine and can limit timing closure for the clock domain it shares. Because a longer critical path lowers the achievable clock frequency, it directly limits throughput. Standard mitigation techniques are:

Carry-Save Adder (CSA) tree — Compresses multiple partial sums in parallel, deferring carry propagation to a single final CPA (Carry-Propagate Adder), which significantly shortens the critical path.
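Sub-pipelining: Splits each round into 2–4 pipeline stages, allowing a higher clock frequency at the cost of increased round latency. Every round depends on the previous round's a and e values, so the extra stages raise throughput only when independent messages are interleaved through the pipeline.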
Loop unrolling — Processes multiple rounds per cycle to reduce latency; trades area for throughput.
Message schedule precomputation — Generates Wt values ahead of the compression function so both units run in parallel, hiding schedule latency.
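The carry-save idea can be checked in a few lines of software. The sketch below models a 3:2 compressor as bitwise operations and reduces the five operands of SHA-256's T1 with CSA levels before a single carry-propagate add. The greedy grouping in the hypothetical helper `csa_tree_add` is an illustrative assumption; real trees (Wallace, Dadda, or hand-built) may group operands differently.

```python
MASK32 = 0xFFFFFFFF

def csa(x: int, y: int, z: int):
    """3:2 carry-save compressor: three 32-bit words in, a sum word and a carry word out.

    Each bit position is computed independently (no carry ripples across bits),
    so in hardware the delay is one full-adder cell regardless of word width.
    """
    s = x ^ y ^ z                                       # per-bit sum
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32   # per-bit majority, moved up one place
    return s, c

def csa_tree_add(operands):
    """Reduce operands with CSA levels until two remain, then do one carry-propagate add."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        nxt = []
        while len(ops) >= 3:
            nxt.extend(csa(ops.pop(), ops.pop(), ops.pop()))
        nxt.extend(ops)          # 0-2 leftover operands pass through to the next level
        ops = nxt
        levels += 1
    return (ops[0] + ops[1]) & MASK32, levels   # the single CPA

# T1 = h + Sigma1(e) + Ch(e, f, g) + K_t + W_t, with arbitrary illustrative operand values
h, sigma1, ch, k, w = 0x5BE0CD19, 0x12345678, 0x9ABCDEF0, 0x428A2F98, 0x61626380
t1, levels = csa_tree_add([h, sigma1, ch, k, w])
assert t1 == (h + sigma1 + ch + k + w) & MASK32
print(levels)  # 3 CSA levels, followed by one carry-propagate adder
```

The schedule precomputation above pairs well with this tree. If K_t + W_t is formed a cycle early, T1 drops to four operands at round time, which removes one CSA level from the critical path in this model.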

SHA-3 — A Paradigm Shift with the Sponge Construction

Keccak: Winner of the NIST Hash Competition

Recognizing the need for a backup standard with a structurally independent design, NIST announced an open competition in 2007. It received 64 submissions, 51 of which entered the first round. After five years of public analysis, NIST selected Keccak as the winner in 2012. Keccak was designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche (STMicroelectronics/NXP), and it was published as FIPS PUB 202 in 2015. Daemen is also a co-designer of AES, lending considerable cryptanalytic credibility to the submission.

How the Sponge Construction Works

The Sponge construction — named for its absorb-then-squeeze behavior — maintains a 1600-bit state (a 5×5 matrix of 64-bit lanes) and operates in two phases:

🔵 Absorbing Phase: The padded message is split into r-bit (Rate) blocks. Each block is XORed into the first r bits of the state, followed by a 24-round application of the Keccak-f[1600] permutation.

🔵 Squeezing Phase: r bits of output are extracted from the state per iteration. If more output is needed, Keccak-f is invoked again, which enables arbitrarily long output (the basis for SHAKE).

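The 1600-bit state is partitioned into Rate (r) and Capacity (c), where r + c = 1600. The capacity bounds security: the sponge's generic security strength is c/2 bits, further capped by the digest length, because collision resistance can never exceed half the digest size. For SHA3-256, c = 512 and r = 1088, giving 256-bit pre-image resistance and 128-bit collision resistance. Increasing c raises the security margin but shrinks r, so fewer message bits are absorbed per permutation call. Each standardized variant fixes this rate-capacity trade-off, with c = 2 × digest length for SHA3-224 through SHA3-512.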
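The rate-capacity arithmetic for the four fixed-length SHA-3 functions can be tabulated directly. The sketch assumes the FIPS 202 rule c = 2d for SHA3-d. The helper name `sha3_params` is hypothetical.

```python
STATE_BITS = 1600

def sha3_params(digest_bits: int) -> dict:
    """Rate/capacity split for SHA3-d, assuming c = 2d as in FIPS 202."""
    capacity = 2 * digest_bits
    rate = STATE_BITS - capacity
    return {
        "rate_bits": rate,
        "rate_bytes": rate // 8,                 # message bytes absorbed per Keccak-f call
        "capacity_bits": capacity,
        "preimage_security": digest_bits,        # min(d, c/2) = d because c = 2d
        "collision_security": digest_bits // 2,  # birthday bound on a d-bit digest
    }

for d in (224, 256, 384, 512):
    p = sha3_params(d)
    print(f"SHA3-{d}: r = {p['rate_bits']} bits ({p['rate_bytes']} bytes/block), c = {p['capacity_bits']} bits")
```

With the permutation cost fixed at 24 rounds, throughput per permutation scales with r. SHA3-512 (r = 576) therefore absorbs roughly half as much data per call as SHA3-256 (r = 1088).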

The Five Steps of Keccak-f[1600]

Step | Function | Key Operations
θ (Theta) | Column-parity diffusion | XOR + 1-bit rotation
ρ (Rho) | Intra-lane bit rotation | Fixed-offset rotation per lane position
π (Pi) | Lane-position permutation | Pure wiring (no logic gates)
χ (Chi) | Nonlinearity injection (the only nonlinear step) | AND, NOT, XOR along each row
ι (Iota) | Round symmetry breaking | XOR of round constant RC into lane (0, 0)

💎 The Hardware Insight: No addition appears anywhere in the five steps of Keccak-f[1600]. The entire permutation consists of XOR, AND, NOT, fixed-offset bit rotations, and wire permutations. This is a key reason SHA-3 often achieves better area and power efficiency than SHA-2 in FPGA and ASIC implementations. Carry-chain logic is simply absent, although the 1600-bit state register offsets part of the savings.
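The sponge and the five step mappings can be modeled end to end in software. The sketch below is a minimal, unoptimized pure-Python Keccak-f[1600] plus sponge, written only for readability and checked against `hashlib.sha3_256` and `hashlib.shake_128`. The round constants come from the standard LFSR, and the ρ offsets come from the triangular-number recurrence rather than from hard-coded tables. The function names are hypothetical. Note that the permutation body uses only XOR, AND, NOT, and rotations.

```python
import hashlib

MASK64 = (1 << 64) - 1

def rol64(a: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64

def keccak_f1600(lanes):
    """24-round Keccak-f[1600] on a 5x5 list of 64-bit lanes indexed lanes[x][y]."""
    lfsr = 1  # generates the iota round constants
    for _ in range(24):
        # theta: XOR every lane with the parities of two neighbouring columns
        col = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [col[(x - 1) % 5] ^ rol64(col[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate each lane by a fixed offset and move it to a new (x, y) position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only nonlinear step (NOT, AND, XOR along each row)
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR a round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def sponge(rate_bytes: int, suffix: int, msg: bytes, out_len: int) -> bytes:
    """Keccak sponge with r = 8 * rate_bytes and c = 1600 - r (state held as 200 bytes)."""
    state = bytearray(200)

    def permute() -> None:
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")

    # pad10*1 with a domain-separation suffix, up to a multiple of r
    padded = bytearray(msg) + bytes([suffix])
    padded += bytes(-len(padded) % rate_bytes)
    padded[-1] |= 0x80
    # Absorbing phase: XOR each block into the first r bits, then permute
    for off in range(0, len(padded), rate_bytes):
        for i in range(rate_bytes):
            state[i] ^= padded[off + i]
        permute()
    # Squeezing phase: read r bits per permutation call until enough output exists
    out = bytearray()
    while True:
        out += state[:rate_bytes]
        if len(out) >= out_len:
            return bytes(out[:out_len])
        permute()

def sha3_256(msg: bytes) -> bytes:
    return sponge(136, 0x06, msg, 32)        # r = 1088 bits, c = 512 bits

def shake128(msg: bytes, out_len: int) -> bytes:
    return sponge(168, 0x1F, msg, out_len)   # r = 1344 bits, c = 256 bits

assert sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()
```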

SHAKE: Extendable Output Functions (XOFs)

SHAKE128 and SHAKE256 are XOF (eXtendable Output Function) variants co-standardized with SHA-3. Unlike SHA-2 and the fixed-length SHA-3 variants, SHAKE allows the caller to specify an arbitrary output length — a property that opens up a broad class of cryptographic applications beyond conventional hashing.

Use Cases
Mask Generation Function (MGF) — Used in RSA-OAEP and RSA-PSS padding schemes.
Deterministic RBG (DRBG) — Produces arbitrarily long pseudorandom output from a seed.
Post-Quantum Cryptography (PQC): The NIST PQC standards ML-DSA (CRYSTALS-Dilithium, FIPS 204) and ML-KEM (CRYSTALS-Kyber, FIPS 203) rely on SHAKE128/256 as their core hash primitive. ML-KEM also uses SHA3-256 and SHA3-512.
Key Derivation Function (KDF) — Derives session keys of arbitrary length from a short secret.
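The extendable-output behavior behind all four use cases can be seen with Python's standard `hashlib.shake_256`. Here the caller picks the output length at squeeze time. The seed value is a hypothetical placeholder.

```python
import hashlib

seed = b"example seed"   # hypothetical input for illustration

# The caller chooses the output length: 16, 64, or 1000 bytes from the same XOF.
out16 = hashlib.shake_256(seed).digest(16)
out64 = hashlib.shake_256(seed).digest(64)
out1000 = hashlib.shake_256(seed).digest(1000)

# Squeezing continues one output stream, so a shorter output is a prefix of a longer one.
print(out64[:16] == out16, out1000[:64] == out64)  # True True
```

The prefix property has a design consequence. When outputs of different lengths must be unrelated, for example two derived keys, the length has to be encoded into the input. KMAC does exactly this by appending an encoding of the requested output length.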

HMAC — Keyed Message Authentication

Why a Plain Hash Is Not Enough for Authentication

A bare hash function can detect accidental corruption, but it cannot show who produced a message. Against an active attacker it does not even guarantee integrity, because an attacker who modifies a message can simply recompute the hash and pass verification. HMAC (Keyed-Hash Message Authentication Code) binds a secret key K into the hash computation so that any party without the key cannot forge a valid tag. This provides both integrity and authentication in a single primitive. HMAC-SHA-256 underlies the TLS 1.3 key schedule (via HKDF), JWT HS256 signing, and API request authentication across the industry.

HMAC Definition (RFC 2104)

HMAC(K, m) = H( (K ⊕ opad) ‖ H( (K ⊕ ipad) ‖ m ) )

ipad = 0x36 repeated to fill one block — inner padding constant
opad = 0x5C repeated to fill one block — outer padding constant
• If K is longer than the block size, it is first replaced by H(K). The key, hashed or not, is then zero-padded to the full block size.
• The underlying hash function is invoked twice — once for the inner hash, once for the outer hash
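The RFC 2104 definition maps almost line for line onto two plain hash calls. The sketch below builds HMAC-SHA-256 from `hashlib.sha256` and checks it against Python's standard `hmac` module. The function name `hmac_sha256` and the sample key and message are hypothetical.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC per RFC 2104, built from two plain SHA-256 invocations."""
    block_size = 64                               # SHA-256 block size in bytes (512 bits)
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()        # long keys are replaced by H(K)
    key = key.ljust(block_size, b"\x00")          # then zero-padded to one block
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(ipad + msg).digest()   # inner hash
    return hashlib.sha256(opad + inner).digest()  # outer hash

key, msg = b"device-secret", b"firmware image v1.2"
assert hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
```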

SoC Implementation Considerations

Hash core reuse — HMAC requires no separate datapath; it is implemented as an FSM wrapper that invokes the existing SHA-2/SHA-3 engine twice.
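Inner/outer state caching: When authenticating multiple messages under the same key, the engine can cache the mid-states, meaning the chaining values after compressing the K ⊕ ipad and K ⊕ opad blocks. This saves two compression-function calls per message, which roughly doubles throughput for short, single-block messages.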
Key storage isolation — Store HMAC keys in an ARM TrustZone Secure World or a dedicated Key Vault (e.g., Samsung Knox Vault, Apple Secure Enclave) to prevent key leakage from the normal world.
Side-channel countermeasures — Include masking and shuffling logic to defend against DPA (Differential Power Analysis) attacks targeting key bits.
SHA-3 special case: The Sponge construction never outputs its capacity bits, so SHA-3 is structurally resistant to length-extension attacks. A keyed MAC can therefore be computed safely as H(K ‖ m) without the HMAC double-wrap. KMAC (NIST SP 800-185) formalizes this prefix-key approach on top of cSHAKE and adds key padding and output-length encoding.
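Mid-state caching and constant-time tag checking can both be modeled in software. `hashlib` objects support `.copy()`, which clones the internal hashing state and so plays the role of a cached hardware mid-state. `hmac.compare_digest` provides a comparison whose timing does not depend on where the tags differ. The class name `CachedHmacSha256` is hypothetical, and this is a functional sketch rather than a performance model.

```python
import hashlib
import hmac

class CachedHmacSha256:
    """HMAC-SHA-256 with the key-dependent first blocks absorbed once per key."""

    BLOCK = 64  # SHA-256 block size in bytes

    def __init__(self, key: bytes):
        if len(key) > self.BLOCK:
            key = hashlib.sha256(key).digest()
        key = key.ljust(self.BLOCK, b"\x00")
        # Absorb K xor ipad and K xor opad once; these objects act as the cached mid-states
        self._inner = hashlib.sha256(bytes(k ^ 0x36 for k in key))
        self._outer = hashlib.sha256(bytes(k ^ 0x5C for k in key))

    def tag(self, msg: bytes) -> bytes:
        inner = self._inner.copy()    # resume from the cached inner mid-state
        inner.update(msg)
        outer = self._outer.copy()    # resume from the cached outer mid-state
        outer.update(inner.digest())
        return outer.digest()

    def verify(self, msg: bytes, tag: bytes) -> bool:
        # Constant-time comparison: does not reveal how many leading tag bytes matched
        return hmac.compare_digest(self.tag(msg), tag)

mac = CachedHmacSha256(b"session-key")
t = mac.tag(b"packet 1")
print(mac.verify(b"packet 1", t), mac.verify(b"packet 2", t))  # True False
```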

SHA-2 vs. SHA-3: Head-to-Head Comparison

Structural and Characteristic Comparison

Attribute | SHA-2 | SHA-3
Construction | Merkle-Damgård | Sponge Construction
Core Operations | Addition (+), XOR, AND, OR, NOT, Rotate, Shift | XOR, AND, NOT, Rotate, Permutation
Internal State | 256–512 bits | 1600 bits
Length-Extension Attack | Vulnerable in untruncated variants such as SHA-256 and SHA-512 (HMAC mitigates) | Structurally immune
H/W Area Efficiency | Moderate (adder overhead) | Excellent
S/W Performance (CPU) | Fast (SHA-NI and similar ISA extensions) | Slower in software (improving with AVX-512)
Variable Output Length | Not supported | Supported via SHAKE (XOF)
Industry Adoption | Ubiquitous (TLS, Bitcoin, Secure Boot) | Growing (PQC; Ethereum uses the pre-standard Keccak-256 variant)

Hardware Resource Efficiency (~1 Gbps Target, Same Process Node)

Metric | SHA-256 | SHA3-256
Area | ~22k GE | ~14k GE
Relative power | 1.0× | ~0.67×

† Figures are representative values drawn from published ASIC implementation papers (IEEE/IACR). Actual results vary with process node, microarchitecture, and frequency target. GE = Gate Equivalent.

Technology Trends Shaping Hash Engine Design

Hash Algorithms in the Post-Quantum Era

Grover's algorithm reduces the cost of a generic pre-image search from O(2^n) to O(2^{n/2}) on a quantum computer, which effectively halves the pre-image security strength of any n-bit hash function. SHA-256, for instance, drops to a 128-bit effective pre-image security level in a quantum-capable threat model. Conservative post-quantum designs therefore target 256-bit security strength; the NSA's CNSA 2.0 suite, for example, specifies SHA-384 or SHA-512. This points designers toward SHA-384, SHA-512, SHA3-384, SHA3-512, or SHAKE256. Notably, NIST's standardized PQC algorithms ML-DSA (CRYSTALS-Dilithium) and ML-KEM (CRYSTALS-Kyber) use SHAKE128/256 as their primary hash primitive throughout.
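The halving can be tabulated as back-of-the-envelope arithmetic. The sketch below ignores constant factors and the practical cost of running Grover's algorithm at scale, and it covers only the generic attacks named in the text. The helper name `generic_security_bits` is hypothetical.

```python
def generic_security_bits(n: int) -> dict:
    """log2 of the generic attack work against an ideal n-bit hash."""
    return {
        "preimage_classical": n,        # brute-force search over 2^n inputs
        "preimage_grover": n // 2,      # Grover: about 2^(n/2) quantum queries
        "collision_classical": n // 2,  # birthday bound
    }

for name, n in [("SHA-256", 256), ("SHA-384", 384), ("SHA-512", 512), ("SHA3-384", 384), ("SHA3-512", 512)]:
    s = generic_security_bits(n)
    print(f"{name}: pre-image {s['preimage_classical']} bits classically, "
          f"about {s['preimage_grover']} bits against Grover")
```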

High-Throughput Parallel Hash: KangarooTwelve and ParallelHash

Data centers, storage systems, and blockchain accelerators that demand tens of Gbps of hash throughput have spurred interest in Keccak-derived parallel constructions. KangarooTwelve (K12) reduces the Keccak rounds from 24 to 12 and processes long messages as a tree whose leaf chunks can be hashed independently. It reportedly delivers 4–6× higher throughput than SHA-3. NIST SP 800-185 formally standardizes ParallelHash128/256 for similar use cases, built on cSHAKE with the full 24-round permutation.
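The tree idea that makes these constructions parallel can be sketched with standard-library SHAKE. The two-level scheme below is a hypothetical illustration, not KangarooTwelve or ParallelHash. Those define their own leaf sizes, encodings, round counts, and domain separation. The chunk size and the `0x00`/`0x01` prefix bytes are arbitrary assumptions.

```python
import hashlib

CHUNK = 8192  # hypothetical leaf size in bytes

def toy_tree_hash(msg: bytes, out_len: int = 32) -> bytes:
    """Two-level tree hash: hash every chunk independently, then hash the leaf digests."""
    chunks = [msg[i:i + CHUNK] for i in range(0, len(msg), CHUNK)] or [b""]
    # Leaves have no data dependency on each other, so hardware could process them on
    # parallel Keccak cores; here they simply run one after another.
    leaves = [hashlib.shake_128(b"\x00" + c).digest(32) for c in chunks]
    # The root absorbs the leaf count and the concatenated leaf digests (0x01 = root domain).
    root_input = b"\x01" + len(leaves).to_bytes(8, "big") + b"".join(leaves)
    return hashlib.shake_128(root_input).digest(out_len)

print(toy_tree_hash(b"a" * 20000).hex())
```

Only the short root computation is serial in this model. The throughput of the leaf stage scales with the number of Keccak instances placed in parallel.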

Agile Crypto Engine Architecture

Modern high-end SoCs — including Apple A-series, Qualcomm Snapdragon 8 Gen 4, Samsung Exynos 2500, and NXP i.MX 9 — integrate an Agile Crypto Engine that supports both SHA-2 and SHA-3. Because their datapaths are fundamentally different, full datapath sharing is not possible. However, the following infrastructure elements can be shared across both algorithms:

✓ Common AXI/AHB bus interface and DMA engine
✓ Message padding logic (partially shareable: SHA-2 appends a 64/128-bit length field, while SHA-3 uses pad10*1 with domain-separation bits and no length field)
✓ Input FIFOs and message buffers (the message schedule itself is SHA-2-specific)
✓ Interrupt controller, register bank, and control FSM
✓ HMAC wrapper logic (usable with either core, although KMAC is the more natural keyed mode for SHA-3)
✓ Side-channel countermeasures (clock randomization, power gating)
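A shared padding unit needs per-algorithm modes, because the two padding rules differ in more than their constants. The sketch below shows both rules side by side in byte-oriented form. The function names are hypothetical, and the SHA-3 suffix values (0x06 for SHA3-d, 0x1F for SHAKE) follow FIPS 202.

```python
def sha256_pad(msg: bytes) -> bytes:
    """SHA-256 (MD) padding: 0x80, zeros, then the bit length as a 64-bit big-endian field."""
    zeros = (55 - len(msg)) % 64
    return msg + b"\x80" + b"\x00" * zeros + (8 * len(msg)).to_bytes(8, "big")

def sha3_pad(msg: bytes, rate_bytes: int, suffix: int = 0x06) -> bytes:
    """SHA-3 pad10*1: domain suffix byte, zeros, final 0x80 bit; no length field."""
    padded = bytearray(msg) + bytes([suffix])
    padded += bytes(-len(padded) % rate_bytes)
    padded[-1] |= 0x80
    return bytes(padded)

print(len(sha256_pad(b"abc")), sha256_pad(b"abc")[-1])   # 64 24  (length field = 24 bits)
print(len(sha3_pad(b"abc", 136)), sha3_pad(b"abc", 136)[-1])   # 136 128  (final 0x80)
print(len(sha256_pad(b"a" * 56)))   # 128: no room for the length field, so a second block
print(sha3_pad(b"a" * 135, 136)[-1] == 0x86)   # True: suffix and final bit share one byte
```

The differences that the hardware must mux between are visible here: block size (64 vs. 136 bytes for SHA3-256), the presence of a length field, and the edge case where SHA-2 spills into an extra block.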

Side-Channel and Fault-Injection Defense

A cryptographic IP block that is mathematically sound can still leak key material through physical side channels. The major threat classes and corresponding countermeasures are:

DPA (Differential Power Analysis) — Mitigated with Boolean/arithmetic masking and random delay insertion to decorrelate power traces from key-dependent operations.
Electromagnetic (EM) Attack — Countered with power-supply noise injection and metal shielding layers in the physical layout.
Fault Injection — Detected by running a duplicate computation and comparing results; round counters must also be protected against glitching.
Timing Attack: Prevented through constant-time implementation. Every operation must take the same number of cycles regardless of secret values, with no secret-dependent branches or early exits.
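Boolean masking, the DPA countermeasure listed first, can be illustrated on the Keccak χ step. The sketch below splits each 64-bit lane into two random shares. It applies linear operations share by share and uses an ISW-style two-share AND gadget with fresh randomness for the nonlinear term. This is a functional model only, with hypothetical helper names. Real hardware masking must also handle glitches and register placement (for example with DOM or threshold implementations), which this software sketch does not capture.

```python
import secrets

MASK64 = (1 << 64) - 1

def mask(x: int):
    """Split a secret 64-bit lane into two Boolean shares with x == s0 ^ s1."""
    m = secrets.randbits(64)
    return x ^ m, m

def unmask(shares) -> int:
    return shares[0] ^ shares[1]

def masked_xor(a, b):
    """Linear operations (XOR, rotations, wiring) are applied share by share."""
    return a[0] ^ b[0], a[1] ^ b[1]

def masked_not(a):
    """NOT touches one share only, since ~(s0 ^ s1) == (~s0) ^ s1."""
    return (~a[0]) & MASK64, a[1]

def masked_and(a, b):
    """Two-share ISW-style AND gadget; fresh randomness r re-masks the cross terms."""
    r = secrets.randbits(64)
    c0 = (a[0] & b[0]) ^ r
    c1 = (a[1] & b[1]) ^ ((r ^ (a[0] & b[1])) ^ (a[1] & b[0]))
    return c0, c1

# One chi-style term, x ^ (~y & z), computed without ever recombining the shares.
x, y, z = 0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x0F0F0F0F0F0F0F0F
result = masked_xor(mask(x), masked_and(masked_not(mask(y)), mask(z)))
assert unmask(result) == x ^ (~y & z & MASK64)
```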

Design Decision Guide by Use Case

Scenario-Based Recommendations

IoT Sensor Node (area and power constrained): SHA3-256 only. Eliminating the adder tree can reduce area by roughly 30–40% compared to SHA-256.

Mobile AP (TLS and Secure Boot compatibility required) — SHA-256/384 + SHA3-256 agile engine. Maintains compatibility with existing software stacks while enabling the SHA-3 path for new workloads.

Automotive / Avionics (ISO 21434, DO-326A) — SHA-384/512 + HMAC; side-channel and fault-injection countermeasures are mandatory; redundant (lockstep) implementation required.

Data Center AI Accelerator — SHA-512 + KangarooTwelve; targets tens of Gbps throughput via multiple parallel instances.

New SoC Targeting PQC Readiness (2026+) — SHAKE128/256 mandatory; integrated support for Dilithium and Kyber; retain SHA-2 for backward compatibility.

Engineering Principles and the Road Ahead

Hash algorithm selection for a SoC Crypto Engine is not a single choice — it is a portfolio decision. SHA-2 remains mandatory for broad industry compatibility and legacy interoperability. SHA-3/SHAKE is strongly recommended wherever power efficiency, structural security (length-extension immunity), or variable-length output is required. The overall direction is clear:

🧠 Five Engineering Principles

Hybrid design is the new normal — Integrate both SHA-2 and SHA-3; maximize shared infrastructure to minimize area overhead.

256-bit security is the floor — SHA-224 and SHA3-224 should be excluded from new designs that must account for quantum adversaries.

The HMAC scheduler determines system performance — Caching inner/outer mid-states and pipelining the two hash invocations are the primary levers for HMAC throughput.

Side-channel defense is non-negotiable — Automotive, financial, and medical SoCs routinely require CC EAL4+/EAL5+ certification, which mandates physical attack countermeasures.

PQC preparation belongs in the present: SHAKE integration and co-design with Dilithium/Kyber accelerators should be considered now, ahead of the commercial quantum systems anticipated in the 2030s.

For HMAC in particular, the latency overhead of two sequential hash invocations directly bounds achievable authentication throughput. A well-architected HMAC scheduler — featuring mid-state caching, parallel inner/outer padding processing, and tuned pipeline depth — is what separates a competitive crypto engine from a mediocre one. Looking further ahead, the SoC of the future is not simply a faster single-algorithm engine but a full cryptographic platform: a flexible substrate that can compose hash, signature, and encryption primitives on demand. Engineers must now treat crypto agility — the ability to swap or upgrade cryptographic primitives without re-spinning the hardware — as a first-class design dimension alongside the traditional PPA metrics.

References

NIST FIPS 180-4, Secure Hash Standard (SHA-2 Family)
NIST FIPS 202, SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions
NIST SP 800-185, SHA-3 Derived Functions: cSHAKE, KMAC, TupleHash and ParallelHash
RFC 2104, HMAC: Keyed-Hashing for Message Authentication
Keccak Team, keccak.team (official site of Bertoni, Daemen, Peeters, Van Assche)
NIST FIPS 203, Module-Lattice-Based Key-Encapsulation Mechanism Standard (ML-KEM, derived from CRYSTALS-Kyber)
NIST FIPS 204, Module-Lattice-Based Digital Signature Standard (ML-DSA, derived from CRYSTALS-Dilithium)

This report is a technical reference for semiconductor and security engineers. Actual implementations require additional verification against the target process node, EDA toolchain, and applicable security certification requirements. Verify relevant patents and export control regulations before commercial deployment.

SoC Design
Semiconductor & SoC Design Notes

Collecting and organizing materials from a semiconductor and SoC design/verification perspective, with a review before every post.

Written based on publicly available data and referenced sources. Last updated: June 8, 2026
