Figure 1 — Visual identity of the nand2cpu educational project (Part 1).
TL;DR: Educational, reproducible journey turning one universal logic primitive (NAND) into a parameterized gate library, an 8-bit ALU, then a 16-bit ALU plus a minimal Python assembler and Verilog testbenches (canonical demo: 7 + 8 = 15). First episode of the “From Bits to Chip” series.
People often hear “any digital system can be built from NAND”. This project makes that statement concrete: a reproducible path from a single universal gate to a working 16-bit Arithmetic Logic Unit with minimal tooling (Verilog + Python) and systematic verification.
2. Pedagogical Architecture (Bottom-Up)
Parametric universal primitive (nand_gate.v)
Derived gates by structural composition (AND, OR, etc.)
Bit to word level generalization (8-bit operands)
Introduce arithmetic & logical ops (8-bit ALU)
Scale to 16 bits (carry chaining)
Add a textual assembler → 16-bit machine code
Automated testbenches (Icarus Verilog)
3. Repository Structure (Snapshot)
src/rtl/ — Verilog modules (gates + ALUs)
src/testbenches/ — Focused testbenches
src/fpga/ — Future top-level for FPGA mapping
tools/assembler/ — Parser & encoder (Python)
tools/scripts/ — Build / simulation helpers
examples/ — Assembly examples
docs/ — Scripts and pedagogical slides
4. Core Logic Modules
4.1 Universality of NAND
NAND is functionally complete: both inversion (NOT) and conjunction (AND) can be built from it, and every other Boolean operator follows from those two. The educational value lies in explicit reconstruction rather than hand-waving: every subsequent gate in the library is expressed structurally in terms of nand_gate, keeping the dependency graph transparent and auditable.
NOT A = NAND(A, A)
AND(A,B) = NOT(NAND(A,B))
OR(A,B) = NAND(NOT A, NOT B)
Each primitive is parameterized via WIDTH, enabling reuse from 1 bit to N bits without duplicating logic. Parameterization keeps the pedagogical jump (bit → bus) incremental: learners reuse mental models instead of confronting an all-new abstraction.
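The identities above, together with the WIDTH idea, can be checked with a small software model. This is a minimal sketch in Python rather than the project's Verilog: integers stand in for WIDTH-bit buses, and every function name except the NAND concept itself is illustrative, not taken from the repository. The four-NAND XOR is the standard textbook construction, included because XOR appears among the ALU operations.

```python
# Software model of a width-parameterized NAND and gates derived only from it.
# Function names are illustrative; they are not the repository's module names.

def nand(a: int, b: int, width: int = 1) -> int:
    """Bitwise NAND of two width-bit operands."""
    mask = (1 << width) - 1
    return ~(a & b) & mask

def not_(a: int, width: int = 1) -> int:
    return nand(a, a, width)                              # NOT A = NAND(A, A)

def and_(a: int, b: int, width: int = 1) -> int:
    return not_(nand(a, b, width), width)                 # AND = NOT(NAND(A, B))

def or_(a: int, b: int, width: int = 1) -> int:
    return nand(not_(a, width), not_(b, width), width)    # OR = NAND(NOT A, NOT B)

def xor_(a: int, b: int, width: int = 1) -> int:
    n = nand(a, b, width)                                 # shared intermediate
    return nand(nand(a, n, width), nand(b, n, width), width)

# Exhaustive 1-bit truth-table check against Python's own operators.
for a in (0, 1):
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert xor_(a, b) == (a ^ b)
    assert not_(a) == 1 - a
```

The same functions work unchanged on buses: `or_(0xF0, 0x0F, 8)` evaluates to `0xFF`, which mirrors how one parameter carries the design from bit to word level.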
Figure 2 — Derivation path: each gate is expressed structurally from the single NAND primitive (no hidden abstractions).
4.2 8-bit ALU
Supported operations (3-bit opcode): 000 ADD, 001 SUB, 010 AND, 011 OR, 100 XOR, 101 SHL, 110 SHR, 111 NOT. Internally the ALU partitions functionality into three parallel functional units whose results enter a final result multiplexer selected by the opcode. This separation clarifies where to extend (e.g., future rotate or arithmetic shift).
The result value and the carry flag are kept as separate outputs.
Intentionally simple: no pipelining, and no Overflow or Zero flags exposed yet.
Figure 3 — 8-bit ALU organization: parallel functional units feeding a single opcode-controlled multiplexer.
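A behavioral reference model makes the opcode table easy to experiment with. The sketch below is Python, not the project's RTL, and it rests on several assumptions the text does not confirm: shifts move by exactly one position, the carry output of a shift is the bit shifted out, the carry output of SUB signals a borrow, and NOT operates on operand A. It mimics the "compute every candidate, then select" structure of the result multiplexer.

```python
from typing import Tuple

MASK8 = 0xFF

def alu8(a: int, b: int, op: int) -> Tuple[int, int]:
    """Return (result, carry) for 8-bit operands and a 3-bit opcode.

    Carry semantics for SUB and the shifts are assumptions of this sketch.
    """
    a &= MASK8
    b &= MASK8
    add = a + b
    sub = a - b
    # Every candidate result is computed, as parallel hardware units would do;
    # the opcode then picks one entry, playing the role of the multiplexer.
    candidates = {
        0b000: (add & MASK8, add >> 8),        # ADD, carry out of bit 7
        0b001: (sub & MASK8, int(a < b)),      # SUB, carry = borrow (assumed)
        0b010: (a & b, 0),                     # AND
        0b011: (a | b, 0),                     # OR
        0b100: (a ^ b, 0),                     # XOR
        0b101: ((a << 1) & MASK8, a >> 7),     # SHL by one (assumed)
        0b110: (a >> 1, a & 1),                # SHR by one (assumed)
        0b111: (~a & MASK8, 0),                # NOT of A (assumed operand)
    }
    return candidates[op & 0b111]

y, carry = alu8(7, 8, 0b000)
print(y, carry)  # prints: 15 0
```

A model like this can later double as the oracle for testbench comparisons, provided its assumptions are first aligned with the actual RTL.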
4.3 16-bit Extension
Strategy: chain two 8-bit ALUs with explicit carry propagation. The design deliberately avoids carry-lookahead so that the latency penalty of ripple chaining stays visible and measurable, which provides an on-ramp to performance discussions.
Figure 4 — 16-bit ALU: low byte produces carry forwarded into the high byte for arithmetic ops.
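The carry hand-off between the two bytes can be illustrated for addition with a few lines of Python. This is a sketch of the chaining idea only: it assumes each 8-bit slice accepts a carry-in, which the text implies but does not spell out, and the function names are hypothetical.

```python
from typing import Tuple

def add8(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """8-bit add with carry-in; returns (sum, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8

def add16(a: int, b: int) -> Tuple[int, int]:
    """16-bit add built from two chained 8-bit slices."""
    low, carry_low = add8(a & 0xFF, b & 0xFF)         # low byte first
    high, carry_high = add8(a >> 8, b >> 8, carry_low)  # high byte waits for carry
    return (high << 8) | low, carry_high

result, carry = add16(0x00FF, 0x0001)
print(hex(result), carry)  # prints: 0x100 0
```

The high byte cannot finish until the low byte has produced its carry. That dependency is the ripple latency the design leaves visible on purpose.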
5. Mini Python Assembler
5.1 Purpose
Bridge a readable symbolic format (mnemonics + registers) to a compact 16-bit encoding aligned with the ALU design.
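To make the idea of "mnemonics in, 16-bit words out" concrete, here is a minimal encoder sketch. Only the mnemonics and their 3-bit opcodes come from the ALU description. Everything else is hypothetical and chosen purely for illustration: the `R0` to `R7` register names, the `OP Rd, Ra, Rb` syntax, and the bit layout (opcode in bits 15:13, then three 3-bit register fields). The project's real instruction format may differ.

```python
# Opcode values follow the ALU's 3-bit opcode table.
OPCODES = {"ADD": 0, "SUB": 1, "AND": 2, "OR": 3,
           "XOR": 4, "SHL": 5, "SHR": 6, "NOT": 7}

def parse_register(token: str) -> int:
    """Parse a hypothetical register name R0..R7 into its index."""
    token = token.strip().upper()
    if not (token.startswith("R") and token[1:].isdigit()):
        raise ValueError(f"not a register: {token!r}")
    index = int(token[1:])
    if not 0 <= index <= 7:
        raise ValueError(f"register out of range: {token!r}")
    return index

def assemble_line(line: str) -> int:
    """Encode one 'OP Rd, Ra, Rb' line into a 16-bit word (assumed layout)."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic.upper() not in OPCODES:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    regs = [parse_register(t) for t in operands.split(",") if t.strip()]
    if len(regs) > 3:
        raise ValueError("too many operands")
    regs += [0] * (3 - len(regs))          # unused register fields encode as 0
    rd, ra, rb = regs
    # Assumed layout: [15:13] opcode, [12:10] rd, [9:7] ra, [6:4] rb, [3:0] zero
    return (OPCODES[mnemonic.upper()] << 13) | (rd << 10) | (ra << 7) | (rb << 4)

print(f"{assemble_line('ADD R1, R2, R3'):04X}")  # prints: 0530
```

Keeping the opcode field identical to the ALU's select input is one way to read "aligned with the ALU design": the top three bits could drive the ALU directly without a translation table.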
Each abstraction level has a focused testbench for fast isolation of regressions. The design philosophy: small deterministic stimuli first (unit tests for gates), then **representative functional cases** (ALU ops), finally **integrated toolchain proofs** (assembler + simulation), before contemplating randomized fuzzing.
6.2 Emblematic Example (7 + 8 = 15)
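// add7_plus_8.v (excerpt from the stimulus block)
A = 7;
B = 8;
Op = 3'b000; // ADD
#10; // let the combinational result settle
// Immediate assert is a SystemVerilog construct (with Icarus Verilog, compile using -g2012)
assert(Y == 15);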
6.3 Test Categories
Primitive logic validation (NAND)
16-bit ALU functional validation (chaining integrity)
Pedagogical scenario (7 + 8)
Assembler binary generation + hex inspection
6.4 Potential Extensions
Randomized (fuzz) ALU test vectors
Coverage metrics (Verilator + gcov)
CI non-regression (GitHub Actions)
Figure 7 — Layered validation strategy: isolate primitive correctness before system integration.
# macOS
brew install icarus-verilog python3 make
# Demo (7 + 8 = 15)
make sim-add7_plus_8
# 16-bit ALU test
make sim-tb_alu16
# Assemble example
make assembler
hexdump -C tools/assembler/test.bin
15. Pedagogical Extensions
Rewrite ALU structurally (half/full adders explicit)
Add Zero / Carry / Overflow flags
Enable labels & immediates in assembler
Fuzz script (random A,B,Op + Python oracle)
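The fuzz idea in the last item could start from something like the following sketch. It generates seeded random (A, B, Op) triples for the 8-bit ALU and pairs each with an expected result from a Python oracle. The oracle checks the result value only, assumes single-position shifts, and the hex line format is a hypothetical choice, not something the repository defines.

```python
import random
from typing import List, Tuple

def oracle(a: int, b: int, op: int) -> int:
    """Expected 8-bit result per opcode (value only; shifts by one assumed)."""
    table = {
        0b000: a + b,    # ADD
        0b001: a - b,    # SUB
        0b010: a & b,    # AND
        0b011: a | b,    # OR
        0b100: a ^ b,    # XOR
        0b101: a << 1,   # SHL
        0b110: a >> 1,   # SHR
        0b111: ~a,       # NOT
    }
    return table[op] & 0xFF

def make_vectors(count: int, seed: int = 0) -> List[Tuple[int, int, int, int]]:
    """Reproducible random test vectors: (a, b, op, expected)."""
    rng = random.Random(seed)              # fixed seed keeps failures replayable
    vectors = []
    for _ in range(count):
        a = rng.randrange(256)
        b = rng.randrange(256)
        op = rng.randrange(8)
        vectors.append((a, b, op, oracle(a, b, op)))
    return vectors

# One vector per line in hex, a format a testbench could read back.
for a, b, op, expected in make_vectors(4, seed=1):
    print(f"{a:02x} {b:02x} {op:x} {expected:02x}")
```

Seeding the generator matters more than the volume of vectors: a failing case can be regenerated exactly and then reduced to a small deterministic test.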
16. Conclusion
The project lays a clear foundation: a minimal logical core elevated to a working 16-bit ALU with light tooling. It paves the way for memory, a register file, a control path, pipelining, and the execution of multi-instruction programs. The explicit layering (primitive → composition → word ops → ALU → assembler → tests) makes future architectural steps (control unit, instruction fetch, branch handling) feel evolutionary rather than abrupt. In short: a transparent, inspectable learning artifact that turns a textbook aphorism into logic you can simulate and verify.
17. Resources (curated references & assets)
Source code repository: Verilog RTL, assembler, testbenches. Reproduce every example locally.