
4-Way Set Associative
VIPT Instruction Cache

for RISC-V Processor Architecture

A comprehensive guide from first principles to full implementation, covering cache theory, address translation, hardware design, and RISC-V integration.

RISC-V ISA · Cache Architecture · VIPT Design · Set Associative · Computer Architecture
May 2026  |  From Basics to Advanced

Table of Contents

  Part I – Foundations
    Chapter 1: Why Caches Exist – The Memory Hierarchy Problem
      1.1 The CPU–Memory Speed Gap
      1.2 Locality of Reference
      1.3 The Memory Hierarchy Pyramid
    Chapter 2: Cache Fundamentals – Terminology & Anatomy
      2.1 Cache Lines, Sets, and Ways
      2.2 Address Decomposition: Tag, Index, Offset
      2.3 Valid Bits, Dirty Bits, Metadata
    Chapter 3: Cache Organization Types
      3.1 Direct-Mapped Cache
      3.2 Fully Associative Cache
      3.3 Set Associative Cache (N-Way)
      3.4 4-Way Set Associative – Deep Dive
  Part II – Virtual vs Physical Addressing
    Chapter 4: Virtual Memory & Address Translation
      4.1 Virtual Addresses vs Physical Addresses
      4.2 Page Tables and the TLB
    Chapter 5: Cache Indexing Strategies
      5.1 VIVT – Virtually Indexed, Virtually Tagged
      5.2 PIPT – Physically Indexed, Physically Tagged
      5.3 VIPT – Virtually Indexed, Physically Tagged
      5.4 The VIPT Aliasing Condition and the Page Offset Rule
  Part III – VIPT Cache Design
    Chapter 6: VIPT Cache Architecture – Full Design
      6.1 Address Map for 32-bit RISC-V
      6.2 Cache Lookup Pipeline
      6.3 Tag Comparison and Hit Detection
      6.4 Replacement Policy: LRU and PLRU
    Chapter 7: Instruction Cache Specifics
      7.1 Why I-Cache is Read-Only
      7.2 Self-Modifying Code and Fence.I
      7.3 Cache Invalidation
  Part IV – RISC-V Integration
    Chapter 8: RISC-V Architecture Overview
      8.1 RISC-V Pipeline Stages
      8.2 Where I-Cache Sits in the Pipeline
      8.3 PC Alignment and 32-bit Instructions
    Chapter 9: Bigger Picture – Cache in a Real SoC
      9.1 L1 / L2 / L3 Hierarchy
      9.2 Cache Coherence Basics
      9.3 Critical Path and Timing
  Part V – Reference
    Chapter 10: Worked Examples & Parameter Tables
    Chapter 11: Glossary

Chapter 1
Why Caches Exist – The Memory Hierarchy Problem

1.1 The CPU–Memory Speed Gap

Modern processors execute instructions at gigahertz speeds; at roughly 3.3 GHz, one clock cycle lasts only about 0.3 nanoseconds. Main memory (DRAM), by contrast, needs 50–100 nanoseconds to return data after a request, so a single DRAM access costs on the order of 170–330 clock cycles. Without a solution, the CPU would sit idle waiting for every instruction fetch and data access.

Figure 1.1: CPU vs Memory Latency Gap
Latency, increasing from left to right: CPU ~0.3 ns | L1 Cache 1–4 ns | L2 Cache 5–15 ns | L3 Cache 20–40 ns | Main Memory (DRAM) 50–100 ns | SSD / Disk 100 µs+

1.2 Locality of Reference

Caches work because programs exhibit locality. There are two types:

Temporal Locality: If you accessed memory address X recently, you are likely to access it again soon. Example: a loop counter is read and incremented thousands of times.
Spatial Locality: If you accessed address X, you are likely to access nearby addresses (X+4, X+8, …) soon. Example: reading sequential instructions or array elements.

These two properties are the reason caches are effective. A cache exploits temporal locality by keeping recently-used data close to the CPU, and exploits spatial locality by loading an entire cache line (typically 64 bytes) instead of just the single byte or word requested.

1.3 The Memory Hierarchy Pyramid

Figure 1.2: Memory Hierarchy Pyramid
Level (top to bottom) | Typical size | Typical latency
Registers | (not given) | ~0.3 ns
L1 Cache | 32–64 KB | 1–4 ns
L2 Cache | 256 KB – 1 MB | 5–15 ns
L3 Cache | 8–64 MB, shared | 20–40 ns
Main Memory (DRAM) | GBs | 50–100 ns
Storage (SSD / HDD) | TBs | ~100 µs

The L1 instruction cache (I-Cache) is the very first level after the CPU's fetch unit. It stores recently-fetched instructions so the processor doesn't have to reach all the way to DRAM every cycle. This is exactly what we will design in this document.

Chapter 2
Cache Fundamentals – Terminology & Anatomy

2.1 Cache Lines, Sets, and Ways

A cache is organized into cache lines (also called blocks). Each line stores a contiguous chunk of memory, typically 64 bytes. We don't cache individual bytes because of spatial locality: if you need byte X, you'll probably need X+1 soon.

Term | Definition | Typical Value (L1 I-Cache)
Cache Line / Block | The smallest unit of transfer between cache and memory | 64 bytes (16 × 32-bit words)
Set | A group of N lines (ways) that a given address can map to | 32 or 64 sets
Way | One slot within a set; N-way = N choices per set | 4 ways (4-way)
Total Capacity | Sets × Ways × Line Size | 64 × 4 × 64 = 16 KB
Tag | Upper bits of address stored to verify identity | Remaining bits after index + offset
Index | Bits used to select which set to look in | log₂(#sets) bits
Offset | Bits to select a byte within the cache line | log₂(line size) bits
Valid Bit | 1-bit flag: is this line's data meaningful? | 1 bit per way

2.2 Address Decomposition: Tag, Index, Offset

Every memory access uses a physical or virtual address that is split into three fields. For a 32-bit address with a 16 KB, 4-way, 64-byte-line cache:

Cache Size = 16 KB = 16,384 bytes
Ways = 4
Line Size = 64 bytes → Offset bits = log₂(64) = 6 bits
#Sets = 16384 / (4 × 64) = 64 → Index bits = log₂(64) = 6 bits
Tag bits = 32 − 6 − 6 = 20 bits
Figure 2.1: 32-bit Address Decomposition
Bits [31:12]: TAG (20 bits) | Bits [11:6]: INDEX (6 bits) | Bits [5:2]: WORD OFFSET (4 bits) | Bits [1:0]: BYTE (2 bits)

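Note: for the instruction cache, bits [1:0] are always 00, because RV32I instructions are word-aligned (this assumes the compressed C extension, which allows 2-byte alignment, is not implemented).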
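
The field split above can be sketched in a few lines of Python. This is only an illustrative model of the bit slicing, assuming the 16 KB, 4-way, 64-byte-line geometry of this section; the names `split_address` and `CacheAddress` are hypothetical and not part of any RISC-V toolchain.

```python
from typing import NamedTuple

OFFSET_BITS = 6  # log2(64-byte line)
INDEX_BITS = 6   # log2(64 sets)


class CacheAddress(NamedTuple):
    tag: int    # bits [31:12]
    index: int  # bits [11:6]
    word: int   # bits [5:2], which 32-bit word inside the line
    byte: int   # bits [1:0], byte inside the word


def split_address(addr: int) -> CacheAddress:
    """Split a 32-bit address into the four fields of Figure 2.1."""
    addr &= 0xFFFF_FFFF                  # keep only 32 bits
    byte = addr & 0x3
    word = (addr >> 2) & 0xF
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return CacheAddress(tag, index, word, byte)


fields = split_address(0xC0008ABC)
print(hex(fields.tag), fields.index, fields.word, fields.byte)
```

For the arbitrary sample address `0xC0008ABC`, the low 12 bits are `0xABC`, which splits into set 42, word 15, byte 0, with tag `0xC0008`.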

2.3 Valid Bits and Metadata Per Way

Each way in each set stores:

Per-Way Storage: Valid(1) | Tag(20) | Data(64 × 8 = 512 bits) = 533 bits per way. For all 4 ways and 64 sets: 64 × 4 × 533 = 136,448 bits, about 136 Kb (roughly 16.7 KB) of raw SRAM. Add 3 bits of PLRU replacement state per set (Section 6.4), or 192 bits in total.
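
As a quick sanity check of the storage arithmetic, the bit budget can be recomputed directly. This is a sketch under the parameters used in this guide; the 3-bit tree-PLRU state per set is taken from Section 6.4, and all variable names are illustrative.

```python
SETS, WAYS = 64, 4
VALID_BITS, TAG_BITS, DATA_BITS = 1, 20, 64 * 8
PLRU_BITS_PER_SET = 3  # assumption: 3-bit tree PLRU, as in Section 6.4

bits_per_way = VALID_BITS + TAG_BITS + DATA_BITS   # one line plus its metadata
array_bits = SETS * WAYS * bits_per_way            # tag + data + valid arrays
total_bits = array_bits + SETS * PLRU_BITS_PER_SET

print(bits_per_way, array_bits, total_bits)
print(f"{total_bits / 8 / 1024:.2f} KiB of SRAM for 16 KiB of instructions")
```

The overhead of valid bits, tags, and replacement state comes to roughly 4% on top of the 16 KB of instruction data.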
Chapter 3
Cache Organization Types

3.1 Direct-Mapped Cache

Each memory block maps to exactly one location in cache. This is simple and fast, but it suffers from conflict misses when two heavily-used addresses share the same slot.

Figure 3.1: Direct-Mapped (1-Way) Cache
[Diagram: memory blocks 0, 1, 2, 3, 4, 5, … map onto a cache with 4 sets (Set 0 to Set 3) of 1 way each; blocks that map to the same set conflict.]

3.2 Fully Associative Cache

A block can be placed in any cache line: there is no index field, just a tag. No conflict misses, but requires comparing all tags simultaneously, which is expensive in hardware (many comparators).

3.3 Set Associative Cache

The practical middle ground. The cache is divided into S sets, each holding N ways. A block maps to one set (chosen by the index bits) but can go into any of the N ways within that set. This eliminates most conflict misses while keeping hardware complexity manageable.

Key Trade-off: More ways → fewer conflict misses → better hit rate, but more tag comparators, more power, and potentially longer hit latency. 4-way is a common sweet spot for L1 caches.
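
The effect of associativity on conflict misses can be seen with a tiny trace-driven model. The sketch below is an assumption-laden toy (true LRU within each set, no timing, a hypothetical `count_misses` helper), comparing a 16 KB direct-mapped cache with a 16 KB 4-way cache on a trace that alternates between two addresses 16 KB apart.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_size=64):
    """Count misses for an address trace, using true LRU inside each set."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return misses


# Two hot lines exactly 16 KB apart, fetched alternately 100 times each.
trace = [0x0000, 0x4000] * 100

direct_mapped = count_misses(trace, num_sets=256, ways=1)  # 16 KB, 1-way
four_way = count_misses(trace, num_sets=64, ways=4)        # 16 KB, 4-way
print(direct_mapped, four_way)
```

In the direct-mapped configuration both addresses land in the same slot and evict each other on every access; with four ways they coexist in one set, so only the two cold misses remain.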

3.4 4-Way Set Associative – The Full Picture

Figure 3.2: 4-Way Set Associative Cache Structure (64 sets)
Set | Way 0 | Way 1 | Way 2 | Way 3 | PLRU
Set 0 | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | 3b
Set 1 | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | 3b
⋮
Set 62 | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | 3b
Set 63 | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | V, Tag[19:0], Data[511:0] | 3b
Legend: V = Valid bit (1 bit); Tag = physical tag bits (20 bits); Data = 64 bytes = 512 bits; PLRU = 3-bit tree pseudo-LRU state per set (one bit per internal node of a 4-leaf binary tree).
Total: 64 sets × 4 ways × (1 + 20 + 512) bits + 64 × 3 PLRU bits = 136,640 bits (about 16.7 KB) of SRAM.
Chapter 4
Virtual Memory & Address Translation

4.1 Virtual Addresses vs Physical Addresses

Every process running on a CPU uses virtual addresses: a private address space that gives the illusion of having all of memory. The operating system and hardware MMU (Memory Management Unit) translate these into physical addresses, which are real locations in DRAM.

Virtual Address (VA): What the CPU / program sees. Can be up to 48-bit on modern 64-bit systems. Two different processes can have the same VA pointing to different physical data.
Physical Address (PA): The actual DRAM row/column. Unique globally. The cache tag stores physical bits to prevent false hits across processes.

4.2 Page Tables and the TLB

Translation happens at page granularity. A page is typically 4 KB. The page table maps the upper bits of VA (the Virtual Page Number, VPN) to the upper bits of PA (the Physical Page Number, PPN). The lower bits (the page offset, 12 bits for 4 KB pages) are identical in both VA and PA. This is the crucial VIPT insight.

Figure 4.1: VA to PA Translation via TLB
Virtual Address: VPN [31:12] | Offset [11:0] → TLB (Translation Lookaside Buffer) → Physical Address: PPN [31:12] | Offset [11:0]
Page Offset [11:0] is IDENTICAL in VA and PA. This is the key property that makes VIPT possible.
TLB miss? Walk the page table in memory.
The Golden Rule of VIPT: Because the page offset bits are identical in both VA and PA, you can use them as the cache index without waiting for TLB translation. The TLB lookup happens in parallel with the SRAM lookup, not sequentially after it. This is why VIPT is fast.
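
The page-offset invariant can be checked with a toy translation function. This is a minimal sketch assuming 4 KB pages and a single-level page table modeled as a Python dict; `translate` and the mapping `0x12345 → 0x00ABC` are made up for illustration.

```python
PAGE_OFFSET_BITS = 12                      # 4 KB pages
PAGE_MASK = (1 << PAGE_OFFSET_BITS) - 1


def translate(va: int, page_table: dict) -> int:
    """Replace the VPN with the PPN; the page offset passes through untouched."""
    vpn = va >> PAGE_OFFSET_BITS
    ppn = page_table[vpn]                  # a KeyError here would model a page fault
    return (ppn << PAGE_OFFSET_BITS) | (va & PAGE_MASK)


def cache_index(addr: int) -> int:
    """Index bits [11:6] of the 64-set cache used in this guide."""
    return (addr >> 6) & 0x3F


page_table = {0x12345: 0x00ABC}            # hypothetical VPN -> PPN mapping
va = 0x12345A7C
pa = translate(va, page_table)

print(hex(pa), cache_index(va), cache_index(pa))
```

Because the index lies entirely inside bits [11:0], indexing with the VA selects the same set that the PA would, so the SRAM read can start before the TLB has answered.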
Chapter 5
Cache Indexing Strategies

5.1 VIVT – Virtually Indexed, Virtually Tagged

Both the index and tag come from the virtual address. Fast (no TLB needed), but it suffers from aliasing, also called synonyms (two different virtual addresses map to the same PA, so the cache can hold duplicate copies of the same data), and from homonyms (the same VA maps to different PAs in different processes, so the cache may return wrong data). Unless lines are tagged with an address-space ID, it requires a full cache flush on context switch. Rare in modern CPUs.

5.2 PIPT – Physically Indexed, Physically Tagged

Both index and tag come from the physical address. Correct by design, with no aliasing or homonym problems. But the TLB must complete before the cache SRAM can even be addressed. This adds the full TLB latency to every cache hit, a critical-path penalty for L1.

5.3 VIPT – Virtually Indexed, Physically Tagged

Index comes from the virtual address; tag comes from the physical address. The key insight: use the TLB and SRAM in parallel. While the cache SRAM is being read (indexed by VA bits), the TLB translates the upper virtual bits to physical bits. Both complete at roughly the same time, then tag comparison uses the physical tag.

Figure 5.1: VIPT Parallel Lookup Pipeline
The Virtual Address (PC) splits into VA[11:0] and VA[31:12].
In PARALLEL: VA[11:0] → Cache SRAM (read 4 ways at set[index]); VA[31:12] → TLB lookup (VPN → PPN, the physical tag).
Both feed the tag comparators (×4 ways): stored tag == physical tag? → HIT or MISS

5.4 The VIPT Aliasing Condition and the Page Offset Rule

VIPT introduces a subtle problem called virtual aliasing: two virtual pages can map to the same physical page. If the two virtual addresses index different cache sets, the same data will be stored twice, and the copies can become inconsistent.

VIPT Aliasing Condition: Aliasing is ruled out in hardware when the cache index bits fall entirely within the page offset. For 4 KB pages, the page offset is 12 bits (bits [11:0]). The index bits must be a subset of [11:0].

Our design: Index = bits[11:6], Offset = bits[5:0]. Index high bit = bit 11. Page offset top = bit 11. ✓ Safe: no aliasing.
VIPT is alias-free if: (index_bits + offset_bits) ≤ page_offset_bits
Our example: index(6) + offset(6) = 12 ≤ 12 (for 4 KB pages) ✓
Maximum alias-free cache size = Ways × Page_Size = 4 × 4 KB = 16 KB
(This is why alias-free VIPT L1 caches with 4 KB pages are typically ≤ 32 KB with 8 ways or ≤ 16 KB with 4 ways.)
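
The alias-free condition is easy to express as a predicate. The following sketch assumes power-of-two cache, line, and page sizes; the function name `is_alias_free` is hypothetical.

```python
def is_alias_free(cache_bytes: int, ways: int, line_bytes: int,
                  page_bytes: int = 4096) -> bool:
    """True when index bits + offset bits fit inside the page offset.

    Assumes all sizes are powers of two, so log2(n) == n.bit_length() - 1.
    """
    sets = cache_bytes // (ways * line_bytes)
    index_bits = sets.bit_length() - 1
    offset_bits = line_bytes.bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    return index_bits + offset_bits <= page_offset_bits


print(is_alias_free(16 * 1024, ways=4, line_bytes=64))  # this guide's design
print(is_alias_free(32 * 1024, ways=8, line_bytes=64))  # larger, but more ways
print(is_alias_free(32 * 1024, ways=4, line_bytes=64))  # index would need bit 12
```

For power-of-two sizes the test is equivalent to checking that one way (cache size divided by associativity) is no larger than a page, which is why growing an alias-free VIPT cache means adding ways rather than sets.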
Strategy | Index Source | Tag Source | TLB on Critical Path? | Aliasing? | Used In
VIVT | Virtual | Virtual | No | Yes (severe) | Old ARM, rare
PIPT | Physical | Physical | Yes | No | D-Cache, L2/L3
VIPT | Virtual | Physical | No (parallel) | Conditional | L1 I-Cache (most modern CPUs)
Chapter 6
VIPT Cache Architecture – Full Design

6.1 Address Map for 32-bit RISC-V

Our design targets a 32-bit RISC-V core with a 16 KB, 4-way set associative, 64-byte line instruction cache. Here are all the parameters derived step by step:

Parameter | Value | Derivation
Address width | 32 bits | RISC-V RV32I
Cache capacity | 16 KB | Design choice
Associativity | 4 ways | Design choice
Cache line size | 64 bytes | Design choice (matches burst length)
Number of sets | 64 | 16384 / (4 × 64) = 64
Block offset bits | 6 | log₂(64) = 6, bits[5:0]
Index bits | 6 | log₂(64) = 6, bits[11:6]
Tag bits (physical) | 20 | 32 − 6 − 6 = 20, bits[31:12]
Page offset (4 KB page) | 12 bits | log₂(4096) = 12
Alias-free check | 6 + 6 = 12 ≤ 12 ✓ | VIPT safe
SRAM per way | (1 + 20 + 512) = 533 bits × 64 | V + Tag + Data per line

6.2 Cache Lookup Pipeline – Cycle by Cycle

Figure 6.1: Full VIPT I-Cache Lookup Datapath
PC[31:0] fields: bits [31:12] (20 bits, VA upper bits → TLB, which returns the physical tag) | Index [11:6] (6 bits, VA index bits) | Word offset [5:2] | Byte [1:0]
In PARALLEL (~1 cycle each):
  4-Way Cache SRAM: read Way0, Way1, Way2, Way3 from Set[index[11:6]]. Returns 4 × (Valid, Tag[19:0], Data[511:0]).
  TLB: VPN[31:12] → PPN[31:12]. Outputs Physical Tag[31:12].
Tag Comparators × 4: Way0: stored_tag == phys_tag && valid? → hit0. Way1 / Way2 / Way3: same logic in parallel.
HIT ✓: MUX selects the matching way's data → Instruction[31:0] → Decode; update LRU state.
MISS ✗: fetch from L2 / memory.

6.3 Tag Comparison and Hit Detection Logic

All four tag comparisons happen simultaneously in hardware; this is what associativity means in implementation. The logic for each way is:

hit_way_N = valid_N AND (stored_tag_N == physical_tag_from_TLB)
overall_hit = hit_way_0 OR hit_way_1 OR hit_way_2 OR hit_way_3
// Select the data from whichever way hit (one-hot select):
instr_out = MUX4(data_way_0, data_way_1, data_way_2, data_way_3, {hit_way_3, hit_way_2, hit_way_1, hit_way_0})
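
The same hit logic can be mirrored in software for experimentation. This is a behavioral sketch only, not RTL: each way is modeled as a `(valid, stored_tag, data)` tuple, the data is a placeholder string, and `lookup` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

Line = Tuple[bool, int, str]  # (valid, stored_tag, data); data is a placeholder


def lookup(ways: List[Line], phys_tag: int) -> Optional[Tuple[int, str]]:
    """Return (hit_way, data) on a hit, or None on a miss."""
    # One hit signal per way, mirroring hit_way_N = valid_N AND tags equal.
    hits = [valid and stored_tag == phys_tag for valid, stored_tag, _ in ways]
    if not any(hits):                 # overall_hit is the OR of all hit signals
        return None
    way = hits.index(True)            # at most one way can match in a correct cache
    return way, ways[way][2]


cache_set = [
    (True, 0x00001, "line A"),
    (False, 0xC0008, "stale"),        # tag matches but the line is invalid
    (True, 0xC0008, "line C"),
    (True, 0xA0001, "line D"),
]
print(lookup(cache_set, 0xC0008))
print(lookup(cache_set, 0xBEEF0))
```

Note how the invalid line in way 1 is ignored even though its tag matches; the valid bit gates every comparison.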

6.4 Replacement Policy: LRU and PLRU

On a miss, we must evict one of the 4 ways. The most accurate policy is true LRU (Least Recently Used), but tracking the full recency order of 4 ways takes ⌈log₂(4!)⌉ = ⌈4.58⌉ = 5 bits per set. A practical approximation is Pseudo-LRU (PLRU) using a 3-bit binary tree per set.

Figure 6.2: 4-Way PLRU Binary Tree (3 bits per set)
b0 is the root; b1 covers the left pair (b1=0 → Way 0, b1=1 → Way 1); b2 covers the right pair (b2=0 → Way 2, b2=1 → Way 3). 0 = left, 1 = right.
Evict victim: b0=0 → check b1, b0=1 → check b2. Update bits on every access.
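
A software model of the 3-bit tree makes the update rule concrete. This sketch assumes the common convention that each bit points toward the next victim (0 = left, 1 = right) and that an access flips the bits on its path to point away from the accessed way; the class name `TreePLRU4` is hypothetical.

```python
class TreePLRU4:
    """3-bit tree pseudo-LRU state for one 4-way set."""

    def __init__(self) -> None:
        self.b0 = 0  # root: 0 -> victim is in ways 0/1, 1 -> in ways 2/3
        self.b1 = 0  # left pair: 0 -> Way 0, 1 -> Way 1
        self.b2 = 0  # right pair: 0 -> Way 2, 1 -> Way 3

    def victim(self) -> int:
        """Follow the bits from the root to the way to evict."""
        if self.b0 == 0:
            return 0 if self.b1 == 0 else 1
        return 2 if self.b2 == 0 else 3

    def touch(self, way: int) -> None:
        """On an access, point the bits on the path away from the used way."""
        if way < 2:
            self.b0 = 1          # next victim should come from the right pair
            self.b1 = 1 - way    # and, within the left pair, from the other way
        else:
            self.b0 = 0          # next victim should come from the left pair
            self.b2 = 3 - way    # Way 2 -> b2 = 1, Way 3 -> b2 = 0


plru = TreePLRU4()
order = []
for _ in range(4):
    v = plru.victim()
    order.append(v)
    plru.touch(v)        # a refill counts as an access to the victim way
print(order)
```

Starting from all-zero state and repeatedly refilling the victim visits the ways in the order 0, 2, 1, 3, which shows that the tree alternates between the two pairs rather than tracking exact recency.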
Chapter 7
Instruction Cache Specifics

7.1 Why I-Cache is Read-Only

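The instruction cache is read-only from the CPU's perspective. The fetch unit never writes instructions; it only reads them. This has major design implications: lines never become dirty, so the cache needs no dirty bits and no write-back path, and a victim line can simply be overwritten on a refill.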

7.2 Self-Modifying Code and FENCE.I

A problem arises with JIT compilers or loaders that write instructions to memory via the D-cache, then try to execute them. The I-cache and D-cache are separate; writing through the D-cache doesn't automatically update the I-cache. The program may fetch stale instructions.

RISC-V FENCE.I Instruction: RISC-V provides the FENCE.I instruction (Zifencei extension) to synchronize the instruction stream. When executed, it guarantees that any prior stores on the same hart are visible to that hart's subsequent instruction fetches. Hardware implementations typically respond by flushing (invalidating) the entire I-cache.
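
The stale-fetch hazard can be illustrated with a deliberately simplified model. This is a toy, not a description of any real core: memory is a dict, stores are assumed to reach memory immediately (the D-cache is ignored), and `ToyICache.fence_i` models the common "invalidate everything" implementation.

```python
class ToyICache:
    """Caches instruction words by address; never observes stores by itself."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory
        self.lines = {}                    # addr -> cached instruction

    def fetch(self, addr: int) -> str:
        if addr not in self.lines:         # miss: refill from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def fence_i(self) -> None:
        """Model FENCE.I as a full invalidate (one possible implementation)."""
        self.lines.clear()


memory = {0x100: "addi x1, x0, 1"}
icache = ToyICache(memory)

first = icache.fetch(0x100)               # warms the cache
memory[0x100] = "addi x1, x0, 2"          # a JIT overwrites the instruction
stale = icache.fetch(0x100)               # still the old instruction
icache.fence_i()
fresh = icache.fetch(0x100)               # now the new instruction

print(first, "|", stale, "|", fresh)
```

The second fetch returns the old instruction because nothing told the I-cache that memory changed; only after the fence does the fetch see the new code.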

7.3 Cache Invalidation on Context Switch

With VIPT using physical tags, the cache is naturally process-safe: a line cached for one physical page cannot match a fetch that translates to a different physical page. However, if the OS reuses a physical page for a different virtual mapping, stale I-cache data with the old physical tag could match. This is solved by:

ASID (Address Space ID): Include the process ASID in the tag match. Lines from different processes never match even if PA is recycled. Avoids full flush on context switch.
Full Flush: Clear all valid bits on context switch. Simple but wastes the warm cache. Acceptable in embedded systems, costly in OS-heavy workloads.
Chapter 8
RISC-V Architecture & I-Cache Integration

8.1 RISC-V Pipeline Overview

A standard 5-stage RISC-V pipeline (the classic textbook organization, used for example by the in-order Rocket core; CVA6 uses a 6-stage variant) operates as:

Figure 8.1: 5-Stage RISC-V Pipeline with I-Cache
IF (Instruction Fetch) → ID (Instruction Decode) → EX (Execute: ALU / Branch) → MEM (Memory Access) → WB (Write Back)
The I-Cache lives in the IF stage; the D-Cache is accessed in the MEM stage. Branch/jump PC redirects come from the EX stage.

8.2 Where I-Cache Sits in the Pipeline

The IF (Instruction Fetch) stage drives the I-cache every cycle with the current PC. The cache must return the instruction within the same cycle (or signal a stall). For a typical 1 GHz embedded RISC-V core, the I-cache must deliver a result in ~1 ns, which means the SRAM access time and tag comparison must fit within one clock cycle.

8.3 PC Alignment in RISC-V

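RISC-V RV32I instructions are always 4 bytes (32 bits) wide and word-aligned (the optional C extension, not used here, adds 16-bit instructions). This means PC[1:0] is always 00, so the fetch unit selects an instruction within a line using only the word-offset bits [5:2].

Practical Impact: Each cache line holds 64 bytes ÷ 4 bytes/instr = 16 instructions. On a cache miss, 16 instructions are fetched in one burst from L2/memory, amortizing the miss penalty. This is why spatial locality matters so much for instruction caches.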
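
Selecting one instruction out of a 64-byte line amounts to slicing with PC bits [5:2]. The sketch below assumes 32-bit instructions only (no C extension) and relies on the fact that RISC-V instruction encodings are stored little-endian; `select_instruction` is a hypothetical helper, and the test line simply stores the word number in each slot.

```python
LINE_BYTES = 64
INSTR_BYTES = 4


def select_instruction(line: bytes, pc: int) -> int:
    """Return the 32-bit instruction that `pc` points at inside its cache line."""
    assert len(line) == LINE_BYTES
    word = (pc >> 2) & 0xF                      # PC bits [5:2]: one of 16 words
    start = word * INSTR_BYTES
    return int.from_bytes(line[start:start + INSTR_BYTES], "little")


# Synthetic line whose 16 words contain the values 0..15, for easy checking.
line = b"".join(i.to_bytes(INSTR_BYTES, "little") for i in range(16))

print(LINE_BYTES // INSTR_BYTES)               # instructions per line
print(select_instruction(line, 0x80001234))    # offset 0x34 -> word 13
```

A PC whose low six bits are `0x34` selects word 13 of the line, regardless of which set or way the line came from.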
Chapter 9
Bigger Picture – Cache in a Real SoC

9.1 L1 / L2 / L3 Hierarchy in Context

Figure 9.1: Full SoC Cache Hierarchy with RISC-V Core
RISC-V Core (IF | ID | EX | MEM | WB)
L1 I-Cache: 16 KB, VIPT, 4-way, 64 B lines, 64 sets
L1 D-Cache: 16–32 KB, VIPT, 4-way, write-back
Unified L2 Cache: 256 KB – 1 MB, PIPT, 8-way, shared by I+D, write-back
L3 Cache / Memory Bus (AXI4 / AHB)
DDR DRAM

9.2 Cache Coherence Basics

In a single-core RISC-V system, I-cache coherence is managed manually via FENCE.I. In multi-core systems, a hardware coherence protocol (MESI, MOESI) maintains consistency. The I-cache typically operates as a read-only invalid/valid state machine: it never produces modified data, so it participates in coherence as a snooper only.

9.3 Critical Path and Timing

The I-cache is on the critical path of the processor, so it helps determine the minimum clock period. The critical path constrains the cycle time as follows:

t_cycle ≥ t_SRAM_read + t_tag_compare + t_mux + t_setup
Typical L1 SRAM read: ~200–400 ps (TSMC 28nm)
Tag comparator (XOR + NOR): ~80 ps
4:1 MUX: ~60 ps
Flip-flop setup: ~50 ps
-----------------------------
Total: ~400–600 ps → cycle time ≥ 600 ps → max freq ~1.6 GHz

This is why L1 caches are kept small (16–32 KB): larger SRAMs have longer access times and would violate timing. L2 caches run slower and can be larger.
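
The timing budget is a simple sum, which the following sketch reproduces. The delay figures are the illustrative ones quoted in this section, not measured values, and the dictionary keys are made-up names.

```python
# (best case, worst case) delay of each stage on the hit path, in picoseconds.
stage_delays_ps = {
    "sram_read": (200, 400),
    "tag_compare": (80, 80),
    "way_mux": (60, 60),
    "ff_setup": (50, 50),
}

best_case_ps = sum(low for low, _ in stage_delays_ps.values())
worst_case_ps = sum(high for _, high in stage_delays_ps.values())
max_freq_ghz = 1000.0 / worst_case_ps   # 1000 ps per ns, so ps -> GHz

print(best_case_ps, worst_case_ps, round(max_freq_ghz, 2))
```

The worst-case sum of 590 ps rounds to the ~600 ps cycle time used above; the SRAM read dominates the budget, which is why the array size is the main lever on L1 hit latency.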

Chapter 10
Worked Examples & Parameter Tables

Example 1: Cache Hit Trace

PC = 0x80001234. Is there a hit?

Address: 0x80001234 = 1000_0000_0000_0000_0001_0010_0011_0100 (binary)
Offset [5:0] = 11_0100 = 0x34 (word-aligned: [5:2] = 1101 = 13, byte in word [1:0] = 00)
Index [11:6] = 00_1000 = Set 8
Tag [31:12] = 0x80001 → physical tag after TLB = 0x80001
1. Read Set 8 from SRAM: get Way0..Way3 tags and valid bits
2. TLB maps 0x80001 (VPN) → 0x80001 (PPN) [identity mapped in this example]
3. Compare physical tag 0x80001 with stored tags in Set 8:
   - Way0: valid=1, tag=0x80001 → MATCH → HIT in Way 0!
4. Select Way 0 data, extract word at offset 13 → instr[31:0]
5. Update PLRU: Way 0 most recently used

Example 2: Cache Miss Trace

PC = 0xC0008000
Tag = 0xC0008, Index = Set 0, Offset = 0
1. Read Set 0: Way0(tag=0x00001,V=1), Way1(tag=0x00002,V=1), Way2(tag=0x80001,V=1), Way3(tag=0xA0001,V=1)
2. TLB returns physical tag = 0xC0008
3. No tag matches → MISS
4. Stall pipeline (insert bubbles)
5. PLRU selects victim way (say Way 1)
6. Issue L2 read: fetch 64 bytes at PA 0xC0008000
7. Fill Way 1 with new data, set valid=1, tag=0xC0008
8. Resume pipeline
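
The miss-and-refill sequence of this example can be replayed with a small behavioral model. It is only a sketch: lines carry just a valid bit and a tag, the victim way is passed in by hand (Way 1, as in the trace) instead of coming from a PLRU tree, and `access` is a hypothetical helper.

```python
def access(cache_set: list, phys_tag: int, victim_way: int):
    """Look up `phys_tag` in one set; on a miss, refill `victim_way`."""
    for way, line in enumerate(cache_set):
        if line["valid"] and line["tag"] == phys_tag:
            return "hit", way
    # Miss: the fetched line replaces the victim (the I-cache has no write-back).
    cache_set[victim_way] = {"valid": True, "tag": phys_tag}
    return "miss", victim_way


# Set 0 as given in the trace, before the fetch of PC = 0xC0008000.
set0 = [
    {"valid": True, "tag": 0x00001},
    {"valid": True, "tag": 0x00002},
    {"valid": True, "tag": 0x80001},
    {"valid": True, "tag": 0xA0001},
]

first = access(set0, 0xC0008, victim_way=1)    # cold fetch of the new line
second = access(set0, 0xC0008, victim_way=1)   # refetch after the refill
print(first, second)
```

The first access misses and replaces Way 1; the next fetch from the same line hits in Way 1, which is the payoff of the refill.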

Complete Parameter Table

Parameter | Our Design | Alternative (32 KB, 8-way)
Total Size | 16 KB | 32 KB
Ways | 4 | 8
Sets | 64 | 64
Line size | 64 bytes | 64 bytes
Offset bits | 6 (bits[5:0]) | 6 (bits[5:0])
Index bits | 6 (bits[11:6]) | 6 (bits[11:6])
Tag bits | 20 (bits[31:12]) | 20 (bits[31:12])
VIPT alias-free? | ✓ 6 + 6 = 12 ≤ 12 | ✓ 6 + 6 = 12 ≤ 12
LRU bits/set | 3 (PLRU tree) | 7 (PLRU tree)
Tag comparators | 4 | 8
SRAM area | ~0.02 mm² (28nm) | ~0.04 mm² (28nm)
Typical hit rate | ~97–98% | ~98–99%
Access time | ~1 cycle | ~1–2 cycles
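
The structural rows of the table follow mechanically from capacity, associativity, and line size. The sketch below derives them for both configurations, assuming power-of-two sizes, a 32-bit address, and tree PLRU (which needs ways − 1 bits per set); `derive_parameters` is a hypothetical helper, and the area and hit-rate rows are not modeled.

```python
def derive_parameters(cache_bytes: int, ways: int,
                      line_bytes: int = 64, addr_bits: int = 32) -> dict:
    """Derive set count and address-field widths from the cache geometry."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": addr_bits - index_bits - offset_bits,
        "plru_bits_per_set": ways - 1,          # internal nodes of the PLRU tree
        "tag_comparators": ways,
    }


ours = derive_parameters(16 * 1024, ways=4)
alternative = derive_parameters(32 * 1024, ways=8)
print(ours)
print(alternative)
```

Doubling both capacity and associativity leaves the set count and all address fields unchanged, which is what keeps the 32 KB, 8-way alternative alias-free with 4 KB pages.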
Chapter 11
Glossary
Term | Definition
Associativity (N-way) | Number of cache lines within a set where a block can be placed
ASID | Address Space Identifier: process tag added to TLB/cache to avoid flushes on context switch
Cache Line / Block | The atomic unit of data moved between cache levels (typically 64 bytes)
Cold Miss | Miss because data has never been loaded (compulsory miss)
Conflict Miss | Miss because two lines compete for the same set (reduced by more ways)
Capacity Miss | Miss because cache is too small to hold the working set
Critical Path | Longest combinational logic path that limits maximum clock frequency
D-Cache | Data cache: serves load/store instructions in the MEM pipeline stage
DRAM | Dynamic RAM: main memory, slow (50–100 ns) but large (GBs)
FENCE.I | RISC-V instruction that synchronizes instruction fetches with prior stores; typically implemented by flushing the I-cache
Hit / Miss | Hit: requested data found in cache. Miss: not found, must fetch from lower level
I-Cache | Instruction cache: supplies instructions to the IF stage of the pipeline
Index | Address bits used to select which set to look in
LRU | Least Recently Used: eviction policy that removes the line accessed longest ago
MMU | Memory Management Unit: hardware that performs VA→PA translation
Offset | Address bits selecting a byte within a cache line
PA (Physical Address) | Real address in DRAM after MMU translation
Page | Fixed-size block of memory (typically 4 KB): the unit of VA→PA mapping
PIPT | Physically Indexed, Physically Tagged: correct but TLB is on critical path
PLRU | Pseudo-LRU: approximation of LRU using a binary tree, fewer bits
PPN | Physical Page Number: upper bits of a physical address
RISC-V | Open-standard reduced instruction set computer (RISC) architecture
Set | A group of N ways in a cache; an address maps to exactly one set
Spatial Locality | Tendency to access addresses near recently-used addresses
SRAM | Static RAM: fast (sub-nanosecond), used for cache storage
Tag | Upper address bits stored alongside data to verify a cache line's identity
Temporal Locality | Tendency to re-access recently-used memory locations
TLB | Translation Lookaside Buffer: fast cache of recent VA→PA translations
VA (Virtual Address) | Address as seen by the program; private per-process view
Valid Bit | 1-bit flag per cache way indicating whether stored data is meaningful
VIPT | Virtually Indexed, Physically Tagged: indexes via VA, tags via PA; fast and safe if alias condition met
VIVT | Virtually Indexed, Virtually Tagged: fast but prone to aliasing and homonym problems
VPN | Virtual Page Number: upper bits of a virtual address
Way | One slot within a set; an N-way cache has N ways per set
Write-Back | D-cache policy: write to cache only, flush to memory on eviction
Write-Through | D-cache policy: every write goes to both cache and memory immediately
