nand2cpu — From a Single NAND Gate to a 16-bit ALU

Verilog · Assembler · Open Source

Conceptual illustration of the nand2cpu project — Figure 1 — Visual identity of the nand2cpu educational project (Part 1).

TL;DR: Educational, reproducible journey turning one universal logic primitive (NAND) into a parameterized gate library, an 8-bit ALU, then a 16-bit ALU plus a minimal Python assembler and Verilog testbenches (canonical demo: 7 + 8 = 15). First episode of the “From Bits to Chip” series.

Contents

1. Context & Objective
2. Pedagogical Architecture
3. Repository Structure
4. Core Logic Modules
5. Mini Python Assembler
6. Tests & Validation
7. Findings & Insights
8. Constraints & Deliberate Choices
9. Technical Roadmap
10. Portfolio Relevance
11. Illustrative Snippets
12. Personal Lessons
13. Current Limitations
14. Quick Reproduction
15. Pedagogical Extensions
16. Conclusion
17. Resources

1. Context & Objective

People often hear “any digital system can be built from NAND”. This project makes that statement concrete: a reproducible path from a single universal gate to a working 16-bit Arithmetic Logic Unit with minimal tooling (Verilog + Python) and systematic verification.

2. Pedagogical Architecture (Bottom-Up)

Parametric universal primitive (nand_gate.v)
Derived gates by structural composition (AND, OR, etc.)
Bit to word level generalization (8-bit operands)
Introduce arithmetic & logical ops (8-bit ALU)
Scale to 16 bits (carry chaining)
Add a textual assembler → 16-bit machine code
Automated testbenches (Icarus Verilog)

3. Repository Structure (Snapshot)

src/rtl/ — Verilog modules (gates + ALUs)
src/testbenches/ — Focused testbenches
src/fpga/ — Future top-level for FPGA mapping
tools/assembler/ — Parser & encoder (Python)
tools/scripts/ — Build / simulation helpers
examples/ — Assembly examples
docs/ — Scripts and pedagogical slides

4. Core Logic Modules

4.1 Universality of NAND

NAND is functionally complete: NAND alone yields inversion and conjunction, and those two suffice to express every other Boolean operator. The educational value lies in explicit reconstruction rather than hand-waving: every subsequent gate in the library is expressed structurally in terms of nand_gate, keeping the dependency graph transparent and auditable.

NOT A = NAND(A, A)
AND(A,B) = NOT(NAND(A,B))
OR(A,B) = NAND(NOT A, NOT B)

Each primitive is parameterized via WIDTH, enabling reuse from 1 bit to N bits without duplicating logic. Parameterization keeps the pedagogical jump (bit → bus) incremental: learners reuse mental models instead of confronting an all-new abstraction.
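The identities above and the WIDTH idea can be checked exhaustively in a few lines of Python. This is a sketch of the concept, not the repository's Verilog: the function names and the mask-based width handling are assumptions made for illustration, and the four-NAND XOR is the well-known textbook construction rather than something taken from the project source.

```python
from itertools import product


def nand(a: int, b: int, width: int = 1) -> int:
    """Bitwise NAND on width-bit words (width plays the role of WIDTH)."""
    mask = (1 << width) - 1
    return ~(a & b) & mask


def not_(a: int, width: int = 1) -> int:
    return nand(a, a, width)                      # NOT A = NAND(A, A)


def and_(a: int, b: int, width: int = 1) -> int:
    return not_(nand(a, b, width), width)         # AND = NOT(NAND)


def or_(a: int, b: int, width: int = 1) -> int:
    return nand(not_(a, width), not_(b, width), width)


def xor_(a: int, b: int, width: int = 1) -> int:
    n = nand(a, b, width)                         # shared intermediate
    return nand(nand(a, n, width), nand(b, n, width), width)


def check(width: int) -> bool:
    """Compare every derived gate with Python's operators, exhaustively."""
    mask = (1 << width) - 1
    for a, b in product(range(1 << width), repeat=2):
        assert not_(a, width) == (~a & mask)
        assert and_(a, b, width) == (a & b)
        assert or_(a, b, width) == (a | b)
        assert xor_(a, b, width) == (a ^ b)
    return True
```

Calling check(1) walks the four-row truth table; check(4) runs the same functions unchanged on 4-bit buses, which is the point of parameterizing by width.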

Figure 2 — Derivation path: each gate is expressed structurally from the single NAND primitive (no hidden abstractions).

4.2 8-bit ALU

Supported operations (3-bit opcode): 000 ADD, 001 SUB, 010 AND, 011 OR, 100 XOR, 101 SHL, 110 SHR, 111 NOT. Internally the ALU partitions functionality into three parallel functional units whose results enter a final result multiplexer selected by the opcode. This separation clarifies where to extend (e.g., future rotate or arithmetic shift).

Extended internal register (9 bits) captures carry.
Semantic decoupling between output value and carry flag.
Intentionally simple: no pipelining, no exposed Overflow/Zero (yet).
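A small Python reference model can make the opcode table and the 9-bit intermediate result concrete. It is a sketch under several assumptions that the text does not settle: shifts move operand A by one position, NOT inverts operand A, and SUB reports a borrow in bit 8. The name alu8 is hypothetical.

```python
from typing import Tuple


def alu8(a: int, b: int, op: int) -> Tuple[int, int]:
    """Return (y, carry) for 8-bit operands and a 3-bit opcode."""
    a &= 0xFF
    b &= 0xFF
    # Every candidate result is computed, then one is selected by opcode,
    # mirroring parallel functional units feeding a result multiplexer.
    candidates = {
        0b000: a + b,              # ADD: bit 8 holds the carry
        0b001: (a - b) & 0x1FF,    # SUB: bit 8 set on borrow (assumption)
        0b010: a & b,              # AND
        0b011: a | b,              # OR
        0b100: a ^ b,              # XOR
        0b101: a << 1,             # SHL by one: bit 7 lands in bit 8 (assumption)
        0b110: a >> 1,             # SHR by one (assumption)
        0b111: ~a & 0xFF,          # NOT of A (assumption)
    }
    wide = candidates[op & 0b111]  # 9-bit intermediate result
    return wide & 0xFF, (wide >> 8) & 1
```

For example, alu8(7, 8, 0b000) gives (15, 0), while alu8(200, 100, 0b000) gives (44, 1): the output value wraps and the carry is reported separately, which is the decoupling described above.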

Figure 3 — 8-bit ALU organization: parallel functional units feeding a single opcode-controlled multiplexer.

4.3 16-bit Extension

Strategy: chain two 8-bit ALUs with explicit carry propagation. A deliberate choice was not to optimize prematurely with a carry-lookahead adder, so the latency penalty of ripple chaining remains visible and measurable, which makes it an on-ramp to performance discussions.
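The chaining strategy for addition can be modeled in Python as two byte-wide adds linked by a carry. This is only an illustration: add8, add16 and the carry_in argument are hypothetical names, and how the real modules expose their carry input is not specified here.

```python
def add8(a: int, b: int, carry_in: int = 0):
    """8-bit add with carry-in; returns (sum, carry_out)."""
    wide = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return wide & 0xFF, wide >> 8


def add16(a: int, b: int):
    """16-bit add built from two chained 8-bit adds."""
    a &= 0xFFFF
    b &= 0xFFFF
    lo, carry_lo = add8(a & 0xFF, b & 0xFF)       # low byte first
    hi, carry_hi = add8(a >> 8, b >> 8, carry_lo)  # high byte waits for the carry
    return (hi << 8) | lo, carry_hi
```

add16(0x00FF, 0x0001) returns (0x0100, 0): the low byte overflows and its carry is what increments the high byte. The data dependency on carry_lo is the ripple latency that the design keeps visible.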

Figure 4 — 16-bit ALU: low byte produces carry forwarded into the high byte for arithmetic ops.

5. Mini Python Assembler

5.1 Purpose

Bridge a readable symbolic format (mnemonics + registers) to a compact 16-bit encoding aligned with the ALU design.

5.2 Instruction Format (Current)

[15:13] Opcode (3 bits)
[12:10] Rd
[9:7]   Rs1
[6:4]   Rs2 (or single source depending on type)
[3:0]   Reserved (future immediates / flags)

Figure 5 — Fixed 16-bit instruction format with 4 LSBs reserved to avoid future encoding breakage.
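The bitfield layout above translates directly into shifts and ORs. The sketch below is not the project's encoder.py; the encode function and the OPCODES table are hypothetical, and the table assumes instruction opcodes reuse the ALU opcodes from section 4.2.

```python
# Assumed mapping: instruction opcodes equal the ALU opcodes of section 4.2.
OPCODES = {
    "ADD": 0b000, "SUB": 0b001, "AND": 0b010, "OR": 0b011,
    "XOR": 0b100, "SHL": 0b101, "SHR": 0b110, "NOT": 0b111,
}


def encode(mnemonic: str, rd: int, rs1: int, rs2: int = 0) -> int:
    """Pack one instruction into a 16-bit word; reserved bits [3:0] stay 0."""
    for name, reg in (("rd", rd), ("rs1", rs1), ("rs2", rs2)):
        if not 0 <= reg <= 7:
            raise ValueError(f"{name}={reg} does not fit in 3 bits")
    opcode = OPCODES[mnemonic.upper()]
    return (opcode << 13) | (rd << 10) | (rs1 << 7) | (rs2 << 4)
```

With this layout, encode("ADD", 0, 1, 2) yields 0x00A0: opcode 000, Rd 000, Rs1 001, Rs2 010, followed by four zero bits.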

5.3 Software Pipeline

parser.py: line cleanup, mnemonic & operand extraction
encoder.py: mnemonic → opcode mapping, bitfield packing
main.py: orchestration, emits .bin + human-readable .hex

Figure 6 — Assembler flow: parsing produces a lightweight IR enabling validation before packing bits.
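The parsing stage can be pictured as turning each source line into a small record that is validated before any bits are packed. The sketch below is an assumption-heavy illustration, not the repository's parser.py: the Instr record, the ";" comment marker and the R0 to R7 register syntax are guesses consistent with the ADD R0,R1,R2 example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

REGISTER = re.compile(r"^R([0-7])$", re.IGNORECASE)  # 3-bit register fields


@dataclass
class Instr:
    """Lightweight IR: one parsed instruction, not yet encoded."""
    mnemonic: str
    regs: List[int]
    line_no: int


def parse_line(text: str, line_no: int = 0) -> Optional[Instr]:
    """Clean one source line and extract mnemonic plus register operands."""
    code = text.split(";", 1)[0].strip()       # drop trailing comment (assumed marker)
    if not code:
        return None                            # blank or comment-only line
    parts = code.split(None, 1)                # mnemonic, then the operand list
    operands = parts[1] if len(parts) > 1 else ""
    regs = []
    for token in (t.strip() for t in operands.split(",")):
        if not token:
            continue
        match = REGISTER.match(token)
        if match is None:
            raise ValueError(f"line {line_no}: bad register {token!r}")
        regs.append(int(match.group(1)))
    return Instr(parts[0].upper(), regs, line_no)
```

Keeping the line number in the record means a later encoding error can still point at the offending source line.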

5.4 Current Limits (Opportunities)

No label resolution for branching (labels collected but unused)
No immediates or memory operations
No pseudo-instructions
Minimal error reporting (only basic range / register validation)

6. Tests & Validation

6.1 Philosophy

Each abstraction level has a focused testbench for fast isolation of regressions. The design philosophy: small deterministic stimuli first (unit tests for gates), then **representative functional cases** (ALU ops), finally **integrated toolchain proofs** (assembler + simulation), before contemplating randomized fuzzing.

6.2 Emblematic Example (7 + 8 = 15)

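// add7_plus_8.v (excerpt from the stimulus block)
A  = 7;           // first operand
B  = 8;           // second operand
Op = 3'b000;      // ADD
#10;              // let the combinational result settle
// Immediate assertion: plain Verilog-2001 has no assert statement,
// so this presumably relies on a SystemVerilog mode of the simulator.
assert(Y == 15);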

6.3 Test Categories

Primitive logic validation (NAND)
16-bit ALU functional validation (chaining integrity)
Pedagogical scenario (7 + 8)
Assembler binary generation + hex inspection

6.4 Potential Extensions

Randomized (fuzz) ALU test vectors
Coverage metrics (Verilator + gcov)
CI non-regression (GitHub Actions)
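The fuzzing idea could start from a seeded generator whose expected values come from a Python oracle. Everything below is hypothetical: the restriction to ADD, AND, OR and XOR (the operations whose 16-bit semantics are unambiguous from the text), the vector tuple, and the 13-digit hex line format intended for a testbench to load.

```python
import random

# Oracle for the 16-bit operations with unambiguous semantics.
ORACLE = {
    0b000: lambda a, b: (a + b) & 0xFFFF,  # ADD (result wraps to 16 bits)
    0b010: lambda a, b: a & b,             # AND
    0b011: lambda a, b: a | b,             # OR
    0b100: lambda a, b: a ^ b,             # XOR
}


def make_vectors(count: int, seed: int = 0):
    """Return a reproducible list of (op, a, b, expected) tuples."""
    rng = random.Random(seed)              # local RNG, so the seed fully decides output
    ops = sorted(ORACLE)
    vectors = []
    for _ in range(count):
        op = rng.choice(ops)
        a = rng.randrange(1 << 16)
        b = rng.randrange(1 << 16)
        vectors.append((op, a, b, ORACLE[op](a, b)))
    return vectors


def to_hex_line(vector) -> str:
    """Format one vector as op (1 digit) + a + b + expected (4 digits each)."""
    op, a, b, expected = vector
    return f"{op:01x}{a:04x}{b:04x}{expected:04x}"
```

A fixed seed keeps failures reproducible: the same seed regenerates the exact vector that exposed a mismatch.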

Figure 7 — Layered validation strategy: isolate primitive correctness before system integration.

7. Findings & Insights

| Theme | Observation | Implication |
| --- | --- | --- |
| Universality | NAND reconstructs everything | Deepens structural understanding |
| Parametrization | `WIDTH` cuts duplication | Scales to wider buses |
| Logic vs ISA | Early assembler clarifies contract | Preps pipeline integration |
| ALU Simplicity | Pure combinational core | Easy for timing exploration |
| 8→16 Chaining | Highlights carry cost | Motivates faster adders |
| Compact Encoding | Low 4 bits reserved | Forward compatible evolution |

8. Constraints & Deliberate Choices

No pipeline (clarity first)
No exposed status flags (episode scope)
Minimal cross-lane shift sophistication
Assembler intentionally minimal (no macros)

9. Technical Roadmap


Flags: Add Zero / Negative / Overflow signals.
Immediates: Literal fields (sign/zero extend) in ISA.
Memory & Branch: LOAD / STORE + conditional branches.
Register File: Multi-port module decoupling ALU.
Pipeline: 2–3 stage throughput increase.
CI & Coverage: Verilator + gcov + Actions.
Fast Adder: CLA / prefix structure swap.
Assembler UX: Diagnostics + pseudo-ops.

10. Portfolio Relevance

Vertical mastery: boolean logic → architecture
Complementary tooling (assembler)
Testing discipline (Make + scripts)
Readable, extensible code
Coherent educational documentation

11. Illustrative Snippets

// Universal NAND gate (conceptual)
assign y = ~(a & b);

// 16-bit ALU chaining (conceptual)
// ALU_low (A[7:0],  B[7:0]) → carry → ALU_high (A[15:8], B[15:8])

// Encoding (ADD R0,R1,R2)
// Opcode=000 Rd=000 Rs1=001 Rs2=010 xxxx

12. Personal Lessons (Neutral)

Early convention formalization removes ambiguity.
A minimal assembler forces ISA stabilization.
Fine-grained testbenches shorten diagnosis cycles.
Reserved bits prevent format rewrites later.

13. Current Limitations

No scripted timing measurements
No large-scale fuzzing
No multi-file assembly linking
No memory / PC / branching yet

14. Quick Reproduction

# macOS
brew install icarus-verilog python3 make

# Demo (7 + 8 = 15)
make sim-add7_plus_8

# 16-bit ALU test
make sim-tb_alu16

# Assemble example
make assembler
hexdump -C tools/assembler/test.bin
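Beyond hexdump, the emitted words can be decoded back into fields to check them against the format in section 5.2. This is a sketch, not part of the repository: decode_word and decode_bytes are hypothetical helpers, the mnemonic order assumes the ALU opcode table, and the big-endian default is a guess about how the .bin file stores each 16-bit word.

```python
# Index = opcode, assuming instruction opcodes match the ALU table.
MNEMONICS = ["ADD", "SUB", "AND", "OR", "XOR", "SHL", "SHR", "NOT"]


def decode_word(word: int) -> dict:
    """Split one 16-bit instruction word into its fields."""
    return {
        "op": MNEMONICS[(word >> 13) & 0b111],  # bits [15:13]
        "rd": (word >> 10) & 0b111,             # bits [12:10]
        "rs1": (word >> 7) & 0b111,             # bits [9:7]
        "rs2": (word >> 4) & 0b111,             # bits [6:4]
        "reserved": word & 0xF,                 # bits [3:0]
    }


def decode_bytes(blob: bytes, byteorder: str = "big") -> list:
    """Decode a byte string as consecutive 16-bit words (byte order assumed)."""
    if len(blob) % 2:
        raise ValueError("expected an even number of bytes")
    return [
        decode_word(int.from_bytes(blob[i:i + 2], byteorder))
        for i in range(0, len(blob), 2)
    ]
```

Passing the contents of tools/assembler/test.bin to decode_bytes would list one dictionary per instruction; if the fields look scrambled, trying byteorder="little" is the first thing to check.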

15. Pedagogical Extensions

Rewrite ALU structurally (half/full adders explicit)
Add Zero / Carry / Overflow flags
Enable labels & immediates in assembler
Fuzz script (random A,B,Op + Python oracle)
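As a preview of the first extension, a full adder can be written with NAND as the only operation and then rippled across a word. The nine-NAND arrangement below is the standard textbook circuit, modeled in Python rather than Verilog; the function names are hypothetical and nothing here is taken from the repository.

```python
def nand(a: int, b: int) -> int:
    """Single-bit NAND, the only primitive used below."""
    return 1 - (a & b)


def full_adder(a: int, b: int, cin: int):
    """Nine-NAND full adder; returns (sum_bit, carry_out)."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    half = nand(n2, n3)        # a XOR b
    n4 = nand(half, cin)
    n5 = nand(half, n4)
    n6 = nand(cin, n4)
    sum_bit = nand(n5, n6)     # (a XOR b) XOR cin
    carry_out = nand(n1, n4)   # (a AND b) OR ((a XOR b) AND cin)
    return sum_bit, carry_out


def ripple_add(a: int, b: int, width: int = 8):
    """Add two width-bit words one bit at a time; returns (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

ripple_add(7, 8) returns (15, 0), the same canonical demo, but now every bit of the sum is traceable to individual NAND evaluations.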

16. Conclusion

The project lays a clear foundation: a minimal logical core built up to a working 16-bit ALU with light tooling, paving the way for memory, a register file, a control path, pipelining, and the execution of multi-instruction programs. The explicit layering (primitive → composition → word ops → ALU → assembler → tests) makes future architectural steps (control unit, instruction fetch, branch handling) feel evolutionary rather than abrupt. In short: a transparent, inspectable learning artifact that turns a textbook aphorism into logic that can be verified in simulation.

17. Resources (curated references & assets)

Source Code (repo): Verilog RTL, assembler, testbenches. Reproduce every example locally. github.com/promaaa/nand2cpu
Slides & Scripts (docs): Teaching slides and helper automation stored under docs/.
Episode 1 Walkthrough (video): Commented build: gate derivation → ALU demonstration → assembler.
Inspirations (references): Nand2Tetris & MIT 6.004 shaped the bottom-up sequencing philosophy.
MIT License: Permissive; forks for learning & teaching extensions are encouraged.