SoC Interconnect Evolution: Why Network-on-Chip Replaced the Crossbar

The paradigm shift in on-chip communication — from point-to-point crossbars to packet-switched NoC | Data flow, routing algorithms, and practical design considerations

💡 TL;DR: Once a SoC grows past roughly 10 IP blocks, a traditional crossbar runs into wiring, timing, and power limits. NoC (Network-on-Chip) addresses this by routing data as packets over shared links and routers, and as of 2026 it is the de facto interconnect standard in mobile APs, AI accelerators, and automotive SoCs.

Crossbar vs. NoC — Why the Paradigm Had to Change

A crossbar (multi-layer interconnect) connects every master and every slave with dedicated physical wires — think of it as laying a private road between every pair of endpoints. That approach is simple and low-latency when the IP count is small, but it runs into two fundamental problems as the design scales.

⚠️ Wiring Congestion

With N IP blocks, the number of dedicated connections grows as O(N²). Routing tens of thousands of wires across the die inflates area and congests the metal layers, which drives up cost at every process node.

⚡ Timing & Power

Longer wires mean larger RC delay. Maintaining target clock frequency requires inserting repeaters throughout the interconnect — each one burning dynamic power and consuming yet more area.

NoC replaces the private-road model with a highway-and-hub system. Data is segmented into packets and forwarded through shared links; intermediate routers determine the best path at each hop. Physical wiring footprint is kept to O(N) while logical connectivity scales freely.

🔄 Crossbar vs. NoC: Structural Comparison

Crossbar (Point-to-Point)

[Diagram: CPU, GPU, DSP, and DRAM, each wired directly to every other block]

Full mesh of direct connections → O(N²) wiring

NoC (Packet-Switched)

[Diagram: CPU, GPU, DRAM, and DSP attached to a grid of routers (R)]

Packets forwarded through routers (R) → O(N) wiring
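
The difference in growth rate can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the fully connected model from the diagram (one dedicated connection per pair of blocks) for the crossbar, and a square 2D mesh with one router per block for the NoC. The function names are made up for this example, and each "link" is really a bundle of wires, so the counts show the trend rather than actual wire totals.

```python
def crossbar_links(n_blocks: int) -> int:
    """Dedicated connections needed to join every pair of blocks directly."""
    return n_blocks * (n_blocks - 1) // 2


def mesh_links(rows: int, cols: int) -> int:
    """Links in a rows x cols 2D mesh: router-to-router plus one local link per block."""
    inter_router = rows * (cols - 1) + cols * (rows - 1)
    local = rows * cols
    return inter_router + local


if __name__ == "__main__":
    for side in (2, 4, 8):
        n = side * side
        print(f"{n:3d} blocks: crossbar={crossbar_links(n):5d}  mesh={mesh_links(side, side):4d}")
```

At 4 blocks the two counts are close, which matches the point that a crossbar is a reasonable choice for small designs. The gap opens quickly as the block count grows.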

SoC NoC vs. the Internet — Same Packet Switching, Different Constraints

Both SoC NoC and TCP/IP networks are built on packet switching, but their design philosophies diverge sharply. The on-chip environment demands guarantees that the Internet was never designed to provide.

| Property | 🖥️ SoC NoC | 🌍 Internet (TCP/IP) |
| --- | --- | --- |
| Transfer unit | Flit (Flow Control Unit) | Packet (up to 1,500 B MTU) |
| Latency | Single-digit ns | Tens to hundreds of ms |
| Packet loss | Never tolerated (lossless) | Recovered by retransmission (best-effort) |
| Buffer depth | Tens of flits (gate-count minimized) | Megabytes to gigabytes |
| Flow control | Credit-based / On-Off | TCP window / congestion control |

The defining constraint is lossless delivery. On the Internet, TCP retransmission recovers lost packets transparently. Inside a chip, a dropped transaction means system corruption or a hard crash — there is no recovery layer. SoC NoC therefore enforces losslessness in hardware through credit-based flow control: a sender can only inject a flit when it holds a credit token for the downstream buffer, making overflow structurally impossible.
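
A minimal sketch of the credit mechanism described above, under simplifying assumptions: one sender, one downstream buffer, and credits that return instantly when a flit is drained (real hardware has a round-trip delay on the credit wire). The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class CreditLink:
    """One sender feeding one downstream input buffer under credit-based flow control."""

    def __init__(self, buffer_depth: int) -> None:
        self.buffer_depth = buffer_depth
        self.credits = buffer_depth            # sender-side count of free downstream slots
        self.rx_buffer: Deque[str] = deque()   # downstream input buffer

    def try_send(self, flit: str) -> bool:
        """Inject a flit only if a credit is available; otherwise the sender stalls."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def drain_one(self) -> Optional[str]:
        """Downstream forwards one flit and returns a credit to the sender."""
        if not self.rx_buffer:
            return None
        flit = self.rx_buffer.popleft()
        self.credits += 1
        return flit


if __name__ == "__main__":
    link = CreditLink(buffer_depth=2)
    for name in ("f0", "f1", "f2"):
        print(name, "sent" if link.try_send(name) else "stalled")
```

Because a flit is accepted only after a credit is consumed, the buffer occupancy can never exceed `buffer_depth`. The sender stalls instead of dropping data.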

Flit Structure in Detail

A NoC packet is subdivided into a sequence of flits (Flow Control Units) — the smallest unit tracked by flow control. Each flit type carries a distinct responsibility.

🏷️ Head Flit

Destination address, packet ID, QoS class, transaction type

📦 Body Flit(s)

Actual data payload (read or write data)

🔚 Tail Flit

End-of-packet marker; triggers router resource release

The router allocates a virtual channel when it sees the head flit and holds it until the tail flit arrives, then releases the resource. This tail-flit mechanism is what makes wormhole switching — forwarding a packet before it is fully buffered — safe and efficient in hardware.
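
The head/body/tail split and the allocate-on-head, release-on-tail rule can be sketched as follows. This is an illustration, not a real NoC packet format: the field set, the 16-byte flit payload, and the names `Flit`, `packetize`, and `vc_hold_trace` are all assumptions, and single-flit packets (which some fabrics encode as a combined head-tail flit) are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class FlitType(Enum):
    HEAD = "head"
    BODY = "body"
    TAIL = "tail"


@dataclass(frozen=True)
class Flit:
    kind: FlitType
    packet_id: int
    dest: Optional[Tuple[int, int]] = None  # carried by the head flit only
    qos: Optional[int] = None               # carried by the head flit only
    payload: bytes = b""                    # carried by body flits only


def packetize(packet_id: int, dest: Tuple[int, int], qos: int,
              data: bytes, flit_bytes: int = 16) -> List[Flit]:
    """Split a payload into one head flit, N body flits, and one tail flit."""
    flits = [Flit(FlitType.HEAD, packet_id, dest=dest, qos=qos)]
    for offset in range(0, len(data), flit_bytes):
        flits.append(Flit(FlitType.BODY, packet_id, payload=data[offset:offset + flit_bytes]))
    flits.append(Flit(FlitType.TAIL, packet_id))
    return flits


def vc_hold_trace(flits: List[Flit]) -> List[bool]:
    """For each flit, report whether the virtual channel is still held after it passes."""
    held = False
    trace = []
    for flit in flits:
        if flit.kind is FlitType.HEAD:
            held = True       # head flit allocates the VC
        elif flit.kind is FlitType.TAIL:
            held = False      # tail flit releases it
        trace.append(held)
    return trace


if __name__ == "__main__":
    packet = packetize(packet_id=7, dest=(3, 2), qos=2, data=bytes(40))
    print([f.kind.value for f in packet])
    print(vc_hold_trace(packet))
```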

A Packet's Journey — Five Stages from Master to Slave

Consider a CPU issuing a read request to DRAM. Here is what happens at each stage as the transaction traverses the NoC.

1. NIU Packetization

The CPU issues a read request on its AXI or CHI interface. The source-side NIU (Network Interface Unit) translates this protocol transaction into a NoC-native flit sequence. This is the boundary between the bus-protocol world and the packet-switched fabric.

2. Routing Decision

When the head flit arrives at the first router, the router reads the destination address field and selects an output port. Under XY routing, the decision reduces to a simple comparison: route along X first, then Y — no lookup table required.

3. Arbitration & Switching

Multiple flits competing for the same output port are serialized by the arbiter (typically round-robin or priority-weighted). Virtual channels (VCs) partition buffer space logically, keeping different traffic classes isolated and breaking potential deadlock cycles without requiring complex deadlock-recovery hardware.

4. Multi-Hop Traversal

The packet advances one hop at a time, repeating the route–arbitrate–switch cycle at each intermediate router. Total latency is roughly proportional to hop count, so topology selection directly determines worst-case end-to-end latency.

5. NIU Depacketization

The destination-side NIU reassembles the incoming flit stream back into an AXI or CHI transaction and presents it to the target IP (e.g., DRAM controller). From the slave's perspective, the NoC is transparent — it sees a standard protocol transaction.
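
The arbitration stage in this walk-through mentions round-robin and priority-weighted schemes. The sketch below combines the two in one possible way: the highest QoS level among the requesters wins, and round-robin breaks ties within that level. It is a behavioral model with hypothetical names, it assumes that a larger QoS number means higher priority, and it does not represent any vendor's arbiter.

```python
from typing import List, Optional


class QosRoundRobinArbiter:
    """Grant one input per cycle: highest QoS first, round-robin among equals."""

    def __init__(self, n_inputs: int) -> None:
        self.n_inputs = n_inputs
        self.last_grant = n_inputs - 1  # so that input 0 is first in line initially

    def grant(self, qos: List[Optional[int]]) -> Optional[int]:
        """qos[i] is the QoS level of input i's request, or None if input i is idle."""
        active = [level for level in qos if level is not None]
        if not active:
            return None
        top = max(active)
        # Scan starting just after the previous winner so equal-priority inputs take turns.
        for offset in range(1, self.n_inputs + 1):
            idx = (self.last_grant + offset) % self.n_inputs
            if qos[idx] == top:
                self.last_grant = idx
                return idx
        return None


if __name__ == "__main__":
    arbiter = QosRoundRobinArbiter(n_inputs=3)
    print([arbiter.grant([1, 1, 1]) for _ in range(4)])  # equal QoS: inputs take turns
    print(arbiter.grant([1, 3, 1]))                       # input 1 carries the highest QoS
```

Strict priority like this can starve low-QoS traffic under sustained load, which is one reason practical arbiters tend to add weights or aging on top.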

Routing Algorithms — How the Path Is Chosen

The most widely deployed routing strategy in production SoCs is deterministic routing — and within that category, XY routing dominates.

XY Routing

On a 2D mesh topology, XY routing moves a packet first along the X axis until it is aligned with the destination column, then switches to the Y axis. The rule is applied at every router independently with a single coordinate comparison.

📐 XY Routing Example — 4×4 Mesh

[Diagram: 4×4 mesh with the path from S running along the X axis first, then along the Y axis to D]

S (Source) → traverse X axis → traverse Y axis → D (Destination)
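
The per-router rule can be written out directly. The sketch below assumes coordinates of the form (x, y) with x increasing to the east and y increasing to the south. The port names and the helper `trace_path` are illustrative choices, not taken from any particular NoC.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# Coordinate change caused by leaving a router through each port (assumed convention).
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "SOUTH": (0, 1), "NORTH": (0, -1)}


def xy_route(current: Coord, dest: Coord) -> str:
    """Pick the output port at one router: resolve X first, then Y."""
    dx = dest[0] - current[0]
    dy = dest[1] - current[1]
    if dx > 0:
        return "EAST"
    if dx < 0:
        return "WEST"
    if dy > 0:
        return "SOUTH"
    if dy < 0:
        return "NORTH"
    return "LOCAL"  # arrived: deliver to the attached NIU


def trace_path(src: Coord, dest: Coord) -> List[Coord]:
    """Apply xy_route hop by hop and record every router visited."""
    path = [src]
    current = src
    while (port := xy_route(current, dest)) != "LOCAL":
        dx, dy = STEP[port]
        current = (current[0] + dx, current[1] + dy)
        path.append(current)
    return path


if __name__ == "__main__":
    print(trace_path((0, 0), (3, 2)))
```

Each router makes its decision from its own coordinates and the destination field alone, so no per-packet state or routing table is needed.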

Why Keep It Simple?

Deadlock-free by construction: XY routing never forms a circular channel dependency, so no deadlock-recovery logic is needed — a meaningful area and verification saving.

In-order delivery: All packets between a given source–destination pair follow the identical path, so packets travelling in the same virtual channel arrive in issue order. This property simplifies the ordering logic in the protocol layers above the NoC.

Lightweight hardware: The routing function reduces to two subtractions and two comparisons per hop — no routing tables, no path-computation state machine.

💡 What about adaptive routing? Adaptive routing — rerouting packets around congested links — is an active research area, but production SoCs use it sparingly. Adaptive paths can re-order packets and introduce subtle deadlock risks that are hard to formally verify at tape-out. The preferred production strategy is to allocate sufficient bandwidth upfront and use virtual channels to isolate traffic classes, rather than dynamically rerouting under load.

NoC Topology Trends in 2026

The physical arrangement of routers and links — the topology — is chosen to match each SoC's traffic profile and die-size constraints. Here are the topologies appearing in major production chips as of 2026.

| Topology | Characteristics | Example Designs |
| --- | --- | --- |
| 2D Mesh | Excellent scalability, uniform bandwidth distribution | AI accelerators, many-core processors |
| Ring | Simple to implement, suited for smaller node counts | Intel Core series (Ring Bus) |
| Hierarchical | Local cluster NoC + global backbone NoC | Mobile APs (Arm DynamIQ) |
| Tree / Fat Tree | Low hop count; root can become a bottleneck | Datacenter chips, network processors |
| Chiplet Mesh | Die-to-die bridging over proprietary links or open standards such as UCIe and BoW | AMD EPYC, Intel Ponte Vecchio |
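
Since latency tracks hop count, the worst-case hop count (the diameter) is a quick first filter when comparing topologies. The sketch below compares a bidirectional ring with a square 2D mesh and feeds the result into a simple zero-load latency estimate. The per-router and per-link cycle counts and the packet length are placeholder assumptions, not measured values, and contention is ignored entirely.

```python
def ring_diameter(n_nodes: int) -> int:
    """Worst-case hop count on a bidirectional ring."""
    return n_nodes // 2


def mesh_diameter(rows: int, cols: int) -> int:
    """Worst-case hop count on a 2D mesh (corner to opposite corner)."""
    return (rows - 1) + (cols - 1)


def zero_load_latency(hops: int, router_cycles: int = 3,
                      link_cycles: int = 1, packet_flits: int = 4) -> int:
    """Cycles for a whole packet to arrive when the network is otherwise idle.

    The head flit crosses hops + 1 routers and hops links; the remaining
    flits follow one cycle apart (serialization).
    """
    routers = hops + 1
    return routers * router_cycles + hops * link_cycles + (packet_flits - 1)


if __name__ == "__main__":
    for side in (4, 8):
        n = side * side
        ring, mesh = ring_diameter(n), mesh_diameter(side, side)
        print(f"{n} nodes: ring {ring} hops / {zero_load_latency(ring)} cycles, "
              f"mesh {mesh} hops / {zero_load_latency(mesh)} cycles")
```

Under these assumptions the ring's worst case grows linearly with node count while the mesh's grows with the square root, which is consistent with rings being used mainly at smaller node counts.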

The standout trend in 2026 is chiplet architecture. As UCIe (Universal Chiplet Interconnect Express) gains industry-wide adoption, connecting multiple dies in a single package with a coherent NoC fabric has become a first-class design requirement. Die-to-die NoC bridge design — keeping latency, power, and protocol transparency acceptable across die boundaries — is now a key competitive differentiator for chip IP companies.

Why NoC Is Non-Negotiable in Modern SoC Design

🧱 IP Reuse

When the floorplan changes, only routers and links need to be repositioned. The NoC acts like a composable backplane — IP blocks plug in without requiring point-to-point rewiring, which meaningfully shortens tapeout schedules.

🔄 GALS Support

GALS (Globally Asynchronous, Locally Synchronous) lets each IP run at its optimal clock frequency. The NIU handles asynchronous domain crossing transparently, enabling per-IP DVFS without forcing a common clock tree.

📈 Scalability

A crossbar typically tops out around 10–16 master IPs before area and timing become unmanageable. In an era of 100+ core designs and chiplet disaggregation, NoC is the only interconnect architecture that scales to that size.

🎯 Crossbar or NoC — Which One Fits?

Crossbar is appropriate when: fewer than ~10 IP blocks, MCU-class design, area and cost minimization are the primary constraints.

NoC is required when: 16 or more IP blocks, heterogeneous core mix, chiplet packaging, or high-bandwidth requirements that a crossbar's O(N²) wiring cannot sustain.

Common NoC Design Pitfalls and How to Avoid Them

NoC design is a discipline where theory and silicon reality diverge quickly. The following are recurring issues seen in production tapeouts — along with the fixes that actually work.

Undersized bandwidth allocation: Sizing the fabric to average traffic means the design passes simulation but stalls under real workloads. Always model burst traffic scenarios — especially the worst-case combination of simultaneous high-bandwidth initiators — before locking down link widths.

Missing QoS assignment: When a CPU and a display controller share the same physical path with no QoS differentiation, the display controller can be starved long enough for its fetch buffer to underrun, which shows up as visible glitches on screen. Assign a high QoS level to every latency-critical IP before integration, not after.

Profile traffic before choosing topology: Analyze the real application's access patterns — read/write ratio, burst length, spatial locality — before committing to a topology. A mesh that works for a neural-network accelerator may be the wrong fit for a baseband modem with asymmetric traffic.

Align topology with power domains: Unused NoC regions can be power-gated to cut idle power dramatically, but only if the router topology was laid out to match power domain boundaries. Plan this at architecture phase — retrofitting power domains onto a finalized NoC layout is painful.
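
The first pitfall above, sizing for average traffic, is easy to demonstrate with a back-of-the-envelope check. All numbers below are hypothetical and chosen only to show the effect: three initiators share one link, and the same link is evaluated against their average demand and their simultaneous burst demand.

```python
from typing import Dict

# Hypothetical demand per initiator on one shared link, in GB/s.
AVERAGE_DEMAND: Dict[str, float] = {"cpu": 4.0, "gpu": 8.0, "display": 3.0}
BURST_DEMAND: Dict[str, float] = {"cpu": 12.0, "gpu": 25.0, "display": 3.0}
LINK_CAPACITY_GBPS = 32.0  # hypothetical link bandwidth


def utilization(demand: Dict[str, float], capacity: float) -> float:
    """Fraction of link capacity requested if all initiators are active together."""
    return sum(demand.values()) / capacity


if __name__ == "__main__":
    for label, demand in (("average", AVERAGE_DEMAND), ("burst", BURST_DEMAND)):
        u = utilization(demand, LINK_CAPACITY_GBPS)
        status = "OK" if u <= 1.0 else "OVERSUBSCRIBED"
        print(f"{label:8s} {u:.0%} {status}")
```

The link looks comfortably sized against average demand and is oversubscribed when the bursts coincide. A real sizing exercise would use traffic traces and cycle-level simulation rather than a static sum, but the static check is a cheap early warning.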

Major NoC IP Vendors (as of 2026)

| Vendor | Product | Key Strengths |
| --- | --- | --- |
| Arm | CMN (Coherent Mesh Network) | Native CHI protocol support; dominant in server and mobile AP platforms |
| Arteris | FlexNoC, Ncore | Automated NoC generation tool; strong footprint in automotive and IoT SoCs |
| Synopsys | DesignWare NoC | Tight EDA tool integration; optimized for high-performance computing workloads |
| Cadence | Interconnect IP | Deep Tensilica DSP integration; specialized for AI/ML inference workloads |

The Road Ahead

What started as a simple shared bus thirty years ago has evolved into something as layered and deliberate as a city's traffic infrastructure. Once a SoC crosses the complexity threshold, NoC stops being an option and becomes a survival requirement. With chiplets and UCIe opening the door to NoC fabrics that span die boundaries, the interconnect problem is only getting harder — and more interesting. This is one of the most consequential open problems in semiconductor architecture today.

This content is written for informational purposes based on publicly available technical sources and practitioner experience. It does not constitute investment advice.

SoC Design
Semiconductor & SoC Design Notes

Collecting and organizing resources from a semiconductor and SoC design and verification perspective — reviewed once more before publishing.

Written based on publicly available data and cited sources. Last updated: June 8, 2026
