SoC Interconnect Evolution: Why Network-on-Chip Replaced the Crossbar
The paradigm shift in on-chip communication — from point-to-point crossbars to packet-switched NoC | Data flow, routing algorithms, and practical design considerations
💡 TL;DR: The moment a SoC exceeds roughly 10 IP blocks, a traditional crossbar hits a hard physical wall. NoC (Network-on-Chip) solves this by routing data as packets through a shared mesh, and as of 2026 it is the de facto interconnect standard in mobile APs, AI accelerators, and automotive SoCs.
Crossbar vs. NoC — Why the Paradigm Had to Change
A crossbar (multi-layer interconnect) connects every master and every slave with dedicated physical wires — think of it as laying a private road between every pair of endpoints. That approach is simple and low-latency when the IP count is small, but it runs into two fundamental problems as the design scales.
⚠️ Wiring Congestion
With N IP blocks, a full crossbar needs a dedicated path between every master–slave pair, so routing complexity grows as O(N²). Packing tens of thousands of on-die wires inflates area dramatically and drives up manufacturing cost at every process node.
⚡ Timing & Power
Longer wires mean larger RC delay. Maintaining target clock frequency requires inserting repeaters throughout the interconnect — each one burning dynamic power and consuming yet more area.
NoC replaces the private-road model with a highway-and-hub system. Data is segmented into packets and forwarded through shared links; intermediate routers determine the best path at each hop. Physical wiring footprint is kept to O(N) while logical connectivity scales freely.
🔄 Crossbar vs. NoC: Structural Comparison
Crossbar (Point-to-Point)
Dedicated link for every master–slave pair → O(N²) wiring
NoC (Packet-Switched)
Packets forwarded through routers (R) → O(N) wiring
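A quick count makes the scaling difference concrete. This is a rough sketch under two simplifying assumptions: the crossbar has one crosspoint per master–slave pair, and the NoC is a square 2D mesh with one bidirectional link between neighboring routers. Link widths and partial crossbars are ignored, so read the output as a scaling trend, not an area estimate.

```python
import math


def crossbar_crosspoints(masters: int, slaves: int) -> int:
    """A full crossbar needs one switch point (and its wiring) per master-slave pair."""
    return masters * slaves


def mesh_links(nodes: int) -> int:
    """Bidirectional neighbor links in a square k x k 2D mesh."""
    k = math.isqrt(nodes)
    if k * k != nodes:
        raise ValueError("this sketch assumes a square k x k mesh")
    # k rows of (k - 1) horizontal links plus k columns of (k - 1) vertical links
    return 2 * k * (k - 1)


for n in (4, 16, 64):
    # n masters x n slaves for the crossbar; n routers for the mesh
    print(f"N={n:3d}  crossbar={crossbar_crosspoints(n, n):5d}  mesh links={mesh_links(n):4d}")
```

Going from 16 to 64 endpoints multiplies the crossbar count by 16 (256 to 4,096) but the mesh link count by less than 5 (24 to 112).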
SoC NoC vs. the Internet — Same Packet Switching, Different Constraints
Both SoC NoC and TCP/IP networks are built on packet switching, but their design philosophies diverge sharply. The on-chip environment demands guarantees that the Internet was never designed to provide.
| Property | 🖥️ SoC NoC | 🌍 Internet (TCP/IP) |
|---|---|---|
| Transfer unit | Flit (Flow Control Unit) | Packet (typically ≤ 1,500 B, the Ethernet MTU) |
| Latency | Nanoseconds (a few clock cycles per router hop) | Tens to hundreds of ms |
| Packet loss | Never tolerated (lossless) | Recovered by retransmission (best-effort) |
| Buffer depth | Tens of flits (gate-count minimized) | Megabytes to gigabytes |
| Flow control | Credit-based / On-Off | TCP window / congestion control |
The defining constraint is lossless delivery. On the Internet, TCP retransmission recovers lost packets transparently. Inside a chip, a dropped transaction means system corruption or a hard crash — there is no recovery layer. SoC NoC therefore enforces losslessness in hardware through credit-based flow control: a sender can only inject a flit when it holds a credit token for the downstream buffer, making overflow structurally impossible.
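The credit mechanism can be modeled in a few lines. This is a minimal sketch, not a hardware description: `CreditLink`, `try_send`, and `receiver_consume` are hypothetical names, and credits return instantly here, whereas real links have a credit round-trip delay that the buffer depth has to cover.

```python
from collections import deque


class CreditLink:
    """One sender feeding a fixed-depth receiver buffer under credit-based flow control."""

    def __init__(self, buffer_depth: int):
        self.depth = buffer_depth
        self.credits = buffer_depth   # sender starts with one credit per free downstream slot
        self.buffer = deque()         # receiver's input buffer

    def try_send(self, flit) -> bool:
        """Inject a flit only if a credit is held; otherwise stall (never drop)."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        # Every send consumed a credit, so occupancy can never exceed the depth.
        assert len(self.buffer) <= self.depth
        return True

    def receiver_consume(self):
        """Receiver drains one flit and returns a credit upstream (zero delay in this sketch)."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_depth=2)
print([link.try_send(f) for f in ("f0", "f1", "f2")])  # [True, True, False]
link.receiver_consume()                                  # frees a slot, returns a credit
print(link.try_send("f2"))                               # True
```

The third send does not overwrite anything; it simply waits until a credit comes back, which is the hardware analogue of "stall, never drop".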
Flit Structure in Detail
A NoC packet is subdivided into a sequence of flits (Flow Control Units) — the smallest unit tracked by flow control. Each flit type carries a distinct responsibility.
🏷️ Head Flit
Destination address, packet ID, QoS class, transaction type
📦 Body Flit(s)
Actual data payload (read or write data)
🔚 Tail Flit
End-of-packet marker; triggers router resource release
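The three flit roles map naturally onto a small data structure. This is a minimal sketch that assumes a separate, data-free tail flit to mirror the breakdown above (many implementations instead mark the last payload flit as the tail); the head-flit field names are illustrative, not taken from any vendor format.

```python
from dataclasses import dataclass
from enum import Enum


class FlitType(Enum):
    HEAD = "head"
    BODY = "body"
    TAIL = "tail"


@dataclass
class Flit:
    kind: FlitType
    payload: object


def packetize(dest: int, packet_id: int, qos: int, txn: str, data: list) -> list:
    """Split one transaction into HEAD + BODY* + TAIL flits (illustrative layout)."""
    # Head: everything a router needs to route and prioritize the packet
    head = Flit(FlitType.HEAD, {"dest": dest, "id": packet_id, "qos": qos, "txn": txn})
    # Body: one flit per payload word
    body = [Flit(FlitType.BODY, word) for word in data]
    # Tail: end-of-packet marker that lets each router release its resources
    tail = Flit(FlitType.TAIL, None)
    return [head, *body, tail]


flits = packetize(dest=5, packet_id=7, qos=2, txn="WRITE", data=[0xA, 0xB, 0xC])
print([f.kind.value for f in flits])  # ['head', 'body', 'body', 'body', 'tail']
```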
The router allocates a virtual channel when it sees the head flit and holds it until the tail flit arrives, then releases the resource. This tail-flit mechanism is what makes wormhole switching — forwarding a packet before it is fully buffered — safe and efficient in hardware.
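The head-to-tail ownership rule can be sketched as a lock on an output port or virtual channel. `WormholeOutputPort` is a hypothetical name, and the sketch models a single VC with no buffering; it captures only the allocate-on-head, release-on-tail behavior.

```python
class WormholeOutputPort:
    """An output port (or VC) owned by one packet from its HEAD flit to its TAIL flit."""

    def __init__(self):
        self.owner = None  # packet id currently holding this port, or None if free

    def forward(self, packet_id: int, kind: str) -> bool:
        if kind == "head":
            if self.owner is not None:
                return False       # another packet's worm is in flight: this head waits
            self.owner = packet_id  # allocate on the head flit
            return True
        if self.owner != packet_id:
            return False           # body/tail flits may only follow their own head
        if kind == "tail":
            self.owner = None      # release on the tail flit
        return True


port = WormholeOutputPort()
print(port.forward(1, "head"))  # True: packet 1 acquires the port
print(port.forward(2, "head"))  # False: packet 2 blocked until packet 1's tail passes
print(port.forward(1, "body"))  # True
print(port.forward(1, "tail"))  # True: port released
print(port.forward(2, "head"))  # True
```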
A Packet's Journey — Five Stages from Master to Slave
Consider a CPU issuing a read request to DRAM. Here is what happens at each stage as the transaction traverses the NoC.
NIU Packetization
The CPU issues a read request on the AXI or CHI bus. The source-side NIU (Network Interface Unit) translates this protocol transaction into a NoC-native flit sequence. This is the boundary between the bus-protocol world (AXI, or coherent CHI) and the packet-switched fabric.
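Packetization ultimately means packing request fields into a fixed-width head flit. The bit layout below is entirely hypothetical (field positions and widths are assumptions, not from AMBA or any NoC product), and the sketch assumes the NIU has already mapped the request address to a destination node ID through its address map.

```python
# Hypothetical head-flit layout (widths are assumptions, not from any spec):
#   [17:12] dest node | [11:4] packet id | [3:2] QoS | [1:0] transaction type
TXN_TYPES = {"READ": 0, "WRITE": 1}
FIELDS = (("dest", 12, 6), ("pkt_id", 4, 8), ("qos", 2, 2), ("txn", 0, 2))


def encode_head(dest: int, pkt_id: int, qos: int, txn: str) -> int:
    """Pack the routing and transaction fields into one integer head flit."""
    values = {"dest": dest, "pkt_id": pkt_id, "qos": qos, "txn": TXN_TYPES[txn]}
    word = 0
    for name, shift, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
    return word


def decode_head(word: int) -> dict:
    """What a router (or the destination NIU) extracts from the head flit."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, shift, width in FIELDS}


# A CPU read whose address the NIU maps to (hypothetical) node 13
head = encode_head(dest=13, pkt_id=0x2A, qos=1, txn="READ")
print(hex(head))          # 0xd2a4
print(decode_head(head))  # {'dest': 13, 'pkt_id': 42, 'qos': 1, 'txn': 0}
```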
Routing Decision
When the head flit arrives at the first router, the router reads the destination address field and selects an output port. Under XY routing, the decision reduces to a simple comparison: route along X first, then Y — no lookup table required.
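The per-router decision described above fits in one function. This sketch assumes (column, row) coordinates with the row index growing downward, and the port names `EAST`/`WEST`/`NORTH`/`SOUTH`/`LOCAL` are chosen for illustration.

```python
def xy_output_port(cur: tuple, dst: tuple) -> str:
    """Dimension-order (XY) routing: correct the X coordinate first, then Y."""
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    # X is aligned; only now does Y matter (row index grows downward in this sketch)
    if dy > cy:
        return "SOUTH"
    if dy < cy:
        return "NORTH"
    return "LOCAL"  # arrived: eject to the attached NIU


print(xy_output_port((0, 0), (2, 1)))  # EAST
print(xy_output_port((2, 0), (2, 1)))  # SOUTH
print(xy_output_port((2, 1), (2, 1)))  # LOCAL
```

The router only compares its own coordinates with the destination's, which is why no routing table is needed.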
Arbitration & Switching
Multiple flits competing for the same output port are serialized by the arbiter (typically round-robin or priority-weighted). Virtual channels (VCs) partition buffer space logically, keeping different traffic classes isolated and breaking potential deadlock cycles without requiring complex deadlock-recovery hardware.
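A minimal sketch of a priority-weighted arbiter that falls back to round-robin: the highest QoS level wins, and the grant rotates among inputs tied at that level. `QosRoundRobinArbiter` is a hypothetical name; with equal QoS on every input it behaves as a plain round-robin arbiter.

```python
from typing import Optional


class QosRoundRobinArbiter:
    """Grant the highest-QoS requester; rotate among requesters tied at that level."""

    def __init__(self, n_inputs: int):
        self.n = n_inputs
        self.last = n_inputs - 1  # so input 0 is checked first on the first grant

    def grant(self, qos: list) -> Optional[int]:
        """qos[i] is the QoS level of input i's waiting flit, or None if input i is idle."""
        waiting = [q for q in qos if q is not None]
        if not waiting:
            return None
        top = max(waiting)
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if qos[i] == top:
                self.last = i  # rotate priority past the winner
                return i
        return None  # unreachable: some input holds the top level


arb = QosRoundRobinArbiter(3)
print([arb.grant([1, 1, 1]) for _ in range(4)])  # equal QoS, plain round-robin: [0, 1, 2, 0]
print([arb.grant([1, 1, 3]) for _ in range(2)])  # input 2 has higher QoS: [2, 2]
```

Strict priority like this can starve low-QoS inputs indefinitely; production arbiters often add aging or bandwidth regulation, which this sketch omits.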
Multi-Hop Traversal
The packet advances one hop at a time, repeating the route–arbitrate–switch cycle at each intermediate router. Total latency is roughly proportional to hop count, so topology selection directly determines worst-case end-to-end latency.
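The hop-count relationship can be written as a simple zero-load model: the head flit pays a fixed pipeline cost per hop, and the remaining flits stream behind it one per cycle. This is a textbook-style approximation that ignores contention and NIU overhead; the 2-cycle-per-hop router pipeline and the 2 GHz clock are assumptions for illustration only.

```python
def manhattan_hops(src: tuple, dst: tuple) -> int:
    """Router-to-router hops on a 2D mesh under minimal routing such as XY."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def zero_load_latency_cycles(hops: int, flits: int, cycles_per_hop: int) -> int:
    """Head flit pays the per-hop pipeline; the other (flits - 1) flits follow one per cycle."""
    return hops * cycles_per_hop + (flits - 1)


hops = manhattan_hops((0, 0), (3, 3))  # opposite corners of a 4x4 mesh: 6 hops
cycles = zero_load_latency_cycles(hops, flits=5, cycles_per_hop=2)  # 6*2 + 4 = 16
print(hops, cycles, f"{cycles / 2.0:.1f} ns at 2 GHz")  # 6 16 8.0 ns at 2 GHz
```

For short packets the hop term dominates, so shrinking the topology's worst-case hop count is the main lever on worst-case latency.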
NIU Depacketization
The destination-side NIU reassembles the incoming flit stream back into an AXI or CHI transaction and presents it to the target IP (e.g., DRAM controller). From the slave's perspective, the NoC is transparent — it sees a standard protocol transaction.
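Reassembly is the inverse of packetization. A minimal sketch, assuming flits are represented as `(kind, payload)` tuples and that the flits of one packet arrive contiguously, which holds within a single virtual channel under wormhole switching; `depacketize` is a hypothetical name.

```python
def depacketize(flits: list) -> list:
    """Rebuild transactions from an in-order flit stream (HEAD, BODY*, TAIL)."""
    transactions, current = [], None
    for kind, payload in flits:
        if kind == "head":
            if current is not None:
                raise ValueError("HEAD received before the previous packet's TAIL")
            current = {**payload, "data": []}  # start a new transaction from the header fields
        elif kind == "body":
            if current is None:
                raise ValueError("BODY flit outside a packet")
            current["data"].append(payload)
        elif kind == "tail":
            if current is None:
                raise ValueError("TAIL flit outside a packet")
            transactions.append(current)  # complete: hand it to the target IP
            current = None
        else:
            raise ValueError(f"unknown flit kind {kind!r}")
    return transactions


stream = [("head", {"txn": "WRITE", "id": 7}), ("body", 0x11), ("body", 0x22), ("tail", None)]
print(depacketize(stream))  # [{'txn': 'WRITE', 'id': 7, 'data': [17, 34]}]
```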
Routing Algorithms — How the Path Is Chosen
The most widely deployed routing strategy in production SoCs is deterministic routing — and within that category, XY routing dominates.
XY Routing
On a 2D mesh topology, XY routing moves a packet first along the X axis until it is aligned with the destination column, then switches to the Y axis. The rule is applied at every router independently with a single coordinate comparison.
📐 XY Routing Example — 4×4 Mesh
S → → → ● — — — ●
● — — — ↓ — — — ●
● — — — D — — — ●
S (Source) → traverse X axis → traverse Y axis → D (Destination)
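The figure's path can be reproduced by applying the X-then-Y rule step by step. The sketch uses (column, row) coordinates with row 0 at the top, so S is (0, 0) and D is (1, 2). It also checks, over every source–destination pair in a 4×4 mesh, that XY routing never turns from the Y axis back into the X axis; removing those turns is what keeps channel dependencies acyclic. Function names are illustrative.

```python
def xy_path(src: tuple, dst: tuple) -> list:
    """Every router visited under XY routing: all X moves first, then all Y moves."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def has_y_to_x_turn(path: list) -> bool:
    """True if the path moves along X after having moved along Y."""
    moved_y = False
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if y1 != y0:
            moved_y = True
        elif x1 != x0 and moved_y:
            return True
    return False


print(xy_path((0, 0), (1, 2)))  # [(0, 0), (1, 0), (1, 1), (1, 2)]

N = 4
nodes = [(x, y) for x in range(N) for y in range(N)]
assert not any(has_y_to_x_turn(xy_path(s, d)) for s in nodes for d in nodes)
```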
Why Keep It Simple?
▶ Deadlock-free by construction: XY routing never forms a circular channel dependency, so no deadlock-recovery logic is needed — a meaningful area and verification saving.
▶ In-order delivery: All packets between a given source–destination pair follow the identical path, so arrival order matches issue order. This property significantly simplifies the coherency protocol above the NoC.
▶ Lightweight hardware: The routing function reduces to two subtractions and two comparisons per hop — no routing tables, no path-computation state machine.
💡 What about adaptive routing? Adaptive routing, which reroutes packets around congested links, is an active research area, but production SoCs use it sparingly. Adaptive paths can reorder packets and introduce subtle deadlock risks that are hard to verify formally before tape-out. The preferred production strategy is to allocate sufficient bandwidth upfront and use virtual channels to isolate traffic classes, rather than rerouting dynamically under load.
NoC Topology Trends in 2026
The physical arrangement of routers and links — the topology — is chosen to match each SoC's traffic profile and die-size constraints. Here are the topologies appearing in major production chips as of 2026.
| Topology | Characteristics | Example Designs |
|---|---|---|
| 2D Mesh | Excellent scalability, uniform bandwidth distribution | AI accelerators, many-core processors |
| Ring | Simple to implement, suited for smaller node counts | Intel Core series (Ring Bus) |
| Hierarchical | Local cluster NoC + global backbone NoC | Mobile APs (Arm DynamIQ) |
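| Tree / Fat Tree | Low hop count; a plain tree's root can become a bottleneck, which a fat tree mitigates by widening links toward the root | Datacenter chips, network processors |
| Chiplet Mesh | Die-to-die bridging via proprietary links (e.g., AMD Infinity Fabric) or open standards such as UCIe and BoW | AMD EPYC, Intel Ponte Vecchio |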
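Worst-case hop count (the topology's diameter) is one quick way to see why rings suit smaller node counts while meshes scale further. A sketch under simplifying assumptions: a square 2D mesh and a bidirectional ring, counting router-to-router hops only. Diameter ignores bandwidth and contention, so treat it as one input, not a verdict.

```python
import math


def mesh_diameter(nodes: int) -> int:
    """Worst-case router-to-router hops (corner to corner) in a square k x k mesh."""
    k = math.isqrt(nodes)
    if k * k != nodes:
        raise ValueError("sketch assumes a square k x k mesh")
    return 2 * (k - 1)


def ring_diameter(nodes: int) -> int:
    """Worst-case hops on a bidirectional ring: traffic takes the shorter direction."""
    return nodes // 2


for n in (16, 64, 256):
    print(f"N={n:3d}: mesh={mesh_diameter(n):2d} hops, ring={ring_diameter(n):3d} hops")
```

At 16 nodes the two are close (6 vs. 8 hops), but the ring's worst case grows linearly with N while the mesh's grows with √N (30 vs. 128 hops at 256 nodes).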
The standout trend in 2026 is chiplet architecture. As UCIe (Universal Chiplet Interconnect Express) gains industry-wide adoption, connecting multiple dies in a single package with a coherent NoC fabric has become a first-class design requirement. Die-to-die NoC bridge design — keeping latency, power, and protocol transparency acceptable across die boundaries — is now a key competitive differentiator for chip IP companies.
Why NoC Is Non-Negotiable in Modern SoC Design
🧱 IP Reuse
When the floorplan changes, only routers and links need to be repositioned. The NoC acts like a composable backplane — IP blocks plug in without requiring point-to-point rewiring, which meaningfully shortens tapeout schedules.
🔄 GALS Support
GALS (Globally Asynchronous, Locally Synchronous) lets each IP run at its optimal clock frequency. The NIU handles asynchronous domain crossing transparently, enabling per-IP DVFS without forcing a common clock tree.
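One common way to cross clock domains safely is an asynchronous FIFO whose read and write pointers are passed between domains in Gray code: consecutive values differ in exactly one bit, so a pointer sampled mid-transition by the other clock resolves to either the old or the new value, never a corrupted mix. This is a general illustration only; the text does not say which crossing scheme a given NIU uses, and the helper names are hypothetical.

```python
def bin_to_gray(b: int) -> int:
    """Binary to reflected Gray code."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code back to binary: XOR of all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


# A 4-bit pointer: every increment, including the wrap from 15 back to 0,
# flips exactly one bit in Gray code, and the encoding round-trips.
WIDTH = 4
for i in range(1 << WIDTH):
    nxt = (i + 1) % (1 << WIDTH)
    assert bin(bin_to_gray(i) ^ bin_to_gray(nxt)).count("1") == 1
    assert gray_to_bin(bin_to_gray(i)) == i

print([bin_to_gray(i) for i in range(8)])  # [0, 1, 3, 2, 6, 7, 5, 4]
```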
📈 Scalability
A crossbar tops out around 10–16 master IPs before area and timing become unmanageable. In an era of 100+ core designs and chiplet disaggregation, NoC is the only viable interconnect architecture.
🎯 Crossbar or NoC — Which One Fits?
▶ Crossbar is appropriate when: fewer than ~10 IP blocks, MCU-class design, area and cost minimization are the primary constraints.
▶ NoC is required when: 16 or more IP blocks, heterogeneous core mix, chiplet packaging, or high-bandwidth requirements that a crossbar's O(N²) wiring cannot sustain.
Common NoC Design Pitfalls and How to Avoid Them
NoC design is a discipline where theory and silicon reality diverge quickly. The following are recurring issues seen in production tapeouts — along with the fixes that actually work.
❌ Undersized bandwidth allocation: Sizing the fabric to average traffic means the design passes simulation but stalls under real workloads. Always model burst traffic scenarios — especially the worst-case combination of simultaneous high-bandwidth initiators — before locking down link widths.
❌ Missing QoS assignment: When a CPU and a display controller share the same physical path with no QoS differentiation, the display controller can be starved long enough to underrun its pixel buffer, producing visible on-screen glitches. Assign a high QoS level to every latency-critical IP before integration, not after.
✅ Profile traffic before choosing topology: Analyze the real application's access patterns — read/write ratio, burst length, spatial locality — before committing to a topology. A mesh that works for a neural-network accelerator may be the wrong fit for a baseband modem with asymmetric traffic.
✅ Align topology with power domains: Unused NoC regions can be power-gated to cut idle power dramatically, but only if the router topology was laid out to match power domain boundaries. Plan this at architecture phase — retrofitting power domains onto a finalized NoC layout is painful.
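The undersized-bandwidth pitfall above is mostly arithmetic. A minimal sketch: raw link bandwidth is width × clock, and the link is sized for the sum of simultaneous demands plus headroom. The 70% utilization target, the 2 GHz clock, and the per-initiator demand figures are hypothetical placeholders; real numbers should come from traffic profiling.

```python
def link_bandwidth_gbs(width_bits: int, freq_ghz: float) -> float:
    """Raw one-direction link bandwidth in GB/s, ignoring header-flit overhead."""
    return width_bits / 8 * freq_ghz


def min_link_width(demands_gbs: list, freq_ghz: float, utilization: float = 0.7) -> int:
    """Smallest power-of-two width (>= 32 bits) whose bandwidth covers the summed
    demands while staying under the target utilization."""
    required = sum(demands_gbs) / utilization
    width = 32
    while link_bandwidth_gbs(width, freq_ghz) < required:
        width *= 2
    return width


# Hypothetical initiators sharing one link: GPU, ISP, display controller (GB/s)
average = [12.0, 5.0, 3.0]
peak = [40.0, 12.0, 6.0]
print(min_link_width(average, freq_ghz=2.0))  # 128: looks fine when sized to averages
print(min_link_width(peak, freq_ghz=2.0))     # 512: needed when all three burst together
```

The 4× gap between the two answers is exactly the kind of shortfall that passes average-case simulation and then stalls on real workloads.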
Major NoC IP Vendors (as of 2026)
| Vendor | Product | Key Strengths |
|---|---|---|
| Arm | CMN (Coherent Mesh Network) | Native CHI protocol support; dominant in server and mobile AP platforms |
| Arteris | FlexNoC, Ncore | Automated NoC generation tool; strong footprint in automotive and IoT SoCs |
| Synopsys | DesignWare NoC | Tight EDA tool integration; optimized for high-performance computing workloads |
| Cadence | Interconnect IP | Deep Tensilica DSP integration; specialized for AI/ML inference workloads |
The Road Ahead
What started as a simple shared bus thirty years ago has evolved into something as layered and deliberate as a city's traffic infrastructure. Once a SoC crosses the complexity threshold, NoC stops being an option and becomes a survival requirement. With chiplets and UCIe opening the door to NoC fabrics that span die boundaries, the interconnect problem is only getting harder — and more interesting. This is one of the most consequential open problems in semiconductor architecture today.
📎 References
→ Arm AMBA Specification — developer.arm.com/architectures/system-architectures/amba
→ Arteris NoC Technology — arteris.com/noc-technology
→ Synopsys DesignWare NoC — synopsys.com/designware-ip/interconnect-ip.html
This content is written for informational purposes based on publicly available technical sources and practitioner experience. It does not constitute investment advice.
Resources collected and organized from a semiconductor and SoC design-and-verification perspective, and reviewed once more before publishing.
Written based on publicly available data and cited sources. Last updated: June 8, 2026