
nand2cpu — From a Single NAND Gate to a 16-bit ALU

Verilog
Assembler
Open Source
Figure 1 — Visual identity of the nand2cpu educational project (Part 1).
TL;DR: Educational, reproducible journey turning one universal logic primitive (NAND) into a parameterized gate library, an 8-bit ALU, then a 16-bit ALU plus a minimal Python assembler and Verilog testbenches (canonical demo: 7 + 8 = 15). First episode of the “From Bits to Chip” series.
Contents
  1. Context & Objective
  2. Pedagogical Architecture
  3. Repository Structure
  4. Core Logic Modules
  5. Mini Python Assembler
  6. Tests & Validation
  7. Findings & Insights
  8. Constraints & Deliberate Choices
  9. Technical Roadmap
  10. Portfolio Relevance
  11. Illustrative Snippets
  12. Personal Lessons
  13. Current Limitations
  14. Quick Reproduction
  15. Pedagogical Extensions
  16. Conclusion
  17. Resources

1. Context & Objective

People often hear “any digital system can be built from NAND”. This project makes that statement concrete: a reproducible path from a single universal gate to a working 16-bit Arithmetic Logic Unit with minimal tooling (Verilog + Python) and systematic verification.

2. Pedagogical Architecture (Bottom-Up)

  1. Parametric universal primitive (nand_gate.v)
  2. Derived gates by structural composition (AND, OR, etc.)
  3. Bit to word level generalization (8-bit operands)
  4. Introduce arithmetic & logical ops (8-bit ALU)
  5. Scale to 16 bits (carry chaining)
  6. Add a textual assembler → 16-bit machine code
  7. Automated testbenches (Icarus Verilog)

3. Repository Structure (Snapshot)

4. Core Logic Modules

4.1 Universality of NAND

NAND is functionally complete: by combining inversion and conjunction you recover all other boolean operators. The educational value lies in explicit reconstruction rather than hand-waving: every subsequent gate in the library is expressed structurally in terms of nand_gate, keeping the dependency graph transparent and auditable.

Each primitive is parameterized via WIDTH, enabling reuse from 1 bit to N bits without duplicating logic. Parameterization keeps the pedagogical jump (bit → bus) incremental: learners reuse mental models instead of confronting an all-new abstraction.

Diagram: deriving logic from NAND. Primitive: NAND, y = ~(a & b). Unary / basic: NOT = NAND(a, a); AND = ¬NAND(a, b); OR = NAND(¬a, ¬b). Derived composite: XOR = (a NAND b) AND (¬a NAND ¬b), where (¬a NAND ¬b) is a OR b.
Figure 2 — Derivation path: each gate is expressed structurally from the single NAND primitive (no hidden abstractions).
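The derivations above can be checked quickly in software. The following is a minimal Python sketch, not the project's Verilog: the function names and the `width` argument (standing in for the WIDTH parameter) are illustrative assumptions, and XOR uses the classic four-NAND construction rather than whatever netlist the repository actually contains.

```python
def nand(a: int, b: int, width: int = 1) -> int:
    """Bitwise NAND on `width`-bit words (stand-in for a WIDTH-parameterized gate)."""
    mask = (1 << width) - 1
    return ~(a & b) & mask


def not_(a: int, width: int = 1) -> int:
    return nand(a, a, width)                      # NOT a = NAND(a, a)


def and_(a: int, b: int, width: int = 1) -> int:
    return not_(nand(a, b, width), width)         # AND = NOT(NAND(a, b))


def or_(a: int, b: int, width: int = 1) -> int:
    return nand(not_(a, width), not_(b, width), width)  # OR = NAND(NOT a, NOT b)


def xor_(a: int, b: int, width: int = 1) -> int:
    t = nand(a, b, width)                         # shared intermediate term
    return nand(nand(a, t, width), nand(b, t, width), width)


# Exhaustive 1-bit truth-table check against Python's own operators.
for a in (0, 1):
    for b in (0, 1):
        assert not_(a) == 1 - a
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
        assert xor_(a, b) == (a ^ b)

# The same definitions work unchanged on an 8-bit bus.
assert xor_(0b11001010, 0b10100110, width=8) == 0b11001010 ^ 0b10100110
```

Because every function bottoms out in `nand`, the call graph mirrors the structural dependency graph the library aims for.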

4.2 8-bit ALU

Supported operations (3-bit opcode): 000 ADD, 001 SUB, 010 AND, 011 OR, 100 XOR, 101 SHL, 110 SHR, 111 NOT. Internally the ALU splits its functionality across three parallel functional units (adder/subtractor, logic unit, shifter) whose results feed a final result multiplexer selected by the opcode. This separation makes it clear where to extend the design (e.g., a future rotate or arithmetic shift).

Diagram: 8-bit ALU. Inputs A[7:0] and B[7:0] feed three parallel units: Adder/Sub (ripple, 9 bits with carry), Logic Unit (AND, OR, XOR, NOT), and Shifter (SHL / SHR). A result MUX selected by Opcode[2:0] drives Y[7:0].
Figure 3 — 8-bit ALU organization: parallel functional units feeding a single opcode-controlled multiplexer.
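A behavioral reference model makes the opcode table easy to experiment with. The sketch below is a Python approximation, not the Verilog source; it assumes that SHL and SHR shift operand A by one position, that NOT inverts operand A, and that results wrap to 8 bits. None of these details is stated in the text, so treat them as placeholders.

```python
MASK8 = 0xFF


def alu8(op: int, a: int, b: int) -> int:
    """Reference model: compute every unit's result, then select by opcode."""
    a &= MASK8
    b &= MASK8
    # Each entry plays the role of one input of the result multiplexer.
    results = {
        0b000: a + b,    # ADD (adder/subtractor unit)
        0b001: a - b,    # SUB (wraps modulo 256 after masking)
        0b010: a & b,    # AND (logic unit)
        0b011: a | b,    # OR
        0b100: a ^ b,    # XOR
        0b101: a << 1,   # SHL (assumption: shift A left by one)
        0b110: a >> 1,   # SHR (assumption: logical shift of A right by one)
        0b111: ~a,       # NOT (assumption: inverts A, ignores B)
    }
    if op not in results:
        raise ValueError(f"opcode must be 0..7, got {op}")
    return results[op] & MASK8


assert alu8(0b000, 7, 8) == 15       # the canonical demo
assert alu8(0b001, 0, 1) == 0xFF     # 0 - 1 wraps to 255
assert alu8(0b111, 0x0F, 0) == 0xF0
```

Computing all candidate results before selecting one mirrors the hardware, where the three units run in parallel and the multiplexer simply picks an output.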

4.3 16-bit Extension

Strategy: chain two 8-bit ALUs with explicit carry propagation. The design deliberately avoids a carry-lookahead adder at this stage, so the latency penalty of ripple chaining stays visible and measurable, which provides an on-ramp to performance discussions.

Diagram: 16-bit ALU chaining. ALU_LOW takes A[7:0] and B[7:0] and produces Y[7:0] plus carry C8; ALU_HIGH takes A[15:8], B[15:8] and C8 and produces Y[15:8]. Both share Opcode[2:0], and the two outputs concatenate into Y[15:0].
Figure 4 — 16-bit ALU: low byte produces carry forwarded into the high byte for arithmetic ops.
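The carry hand-off between the two bytes can be illustrated with a small Python model. This sketch covers ADD only; `add8` and `add16` are hypothetical helper names, and how the real design forwards borrows or shifted bits between the halves is not described here, so those cases are left out.

```python
def add8(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """8-bit add returning (sum byte, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add16(a: int, b: int) -> tuple[int, int]:
    """16-bit add built from two 8-bit adds chained through the carry C8."""
    low, c8 = add8(a & 0xFF, b & 0xFF)                    # low byte, carry in = 0
    high, c16 = add8((a >> 8) & 0xFF, (b >> 8) & 0xFF, c8)  # high byte consumes C8
    return (high << 8) | low, c16


assert add16(7, 8) == (15, 0)
assert add16(0x00FF, 0x0001) == (0x0100, 0)   # carry crosses the byte boundary
assert add16(0xFFFF, 0x0001) == (0x0000, 1)   # carry out of bit 15
```

The high byte cannot finish until `c8` is known, which is exactly the serial dependency that makes ripple chaining slower than a lookahead scheme.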

5. Mini Python Assembler

5.1 Purpose

Bridge a readable symbolic format (mnemonics + registers) to a compact 16-bit encoding aligned with the ALU design.

5.2 Instruction Format (Current)

[15:13] Opcode (3 bits)
[12:10] Rd
[9:7]   Rs1
[6:4]   Rs2 (or single source depending on type)
[3:0]   Reserved (future immediates / flags)
Diagram: instruction bitfield layout (Opcode [15:13], Rd [12:10], Rs1 [9:7], Rs2 [6:4], Reserved [3:0]).
Figure 5 — Fixed 16-bit instruction format with 4 LSBs reserved to avoid future encoding breakage.
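The bitfield layout translates directly into shifts and masks. The following Python sketch packs and unpacks a word according to the format above; the function names `encode` and `decode` are illustrative and are not taken from the project's encoder.py, and the reserved bits are assumed to be written as zero.

```python
def encode(opcode: int, rd: int, rs1: int, rs2: int = 0) -> int:
    """Pack four 3-bit fields into a 16-bit word; reserved bits [3:0] stay 0."""
    for name, value in (("opcode", opcode), ("rd", rd), ("rs1", rs1), ("rs2", rs2)):
        if not 0 <= value <= 7:
            raise ValueError(f"{name} must fit in 3 bits, got {value}")
    return (opcode << 13) | (rd << 10) | (rs1 << 7) | (rs2 << 4)


def decode(word: int) -> dict[str, int]:
    """Split a 16-bit word back into its fields."""
    return {
        "opcode": (word >> 13) & 0b111,
        "rd": (word >> 10) & 0b111,
        "rs1": (word >> 7) & 0b111,
        "rs2": (word >> 4) & 0b111,
        "reserved": word & 0b1111,
    }


word = encode(0b000, rd=0, rs1=1, rs2=2)   # ADD R0, R1, R2
assert word == 0x00A0
assert f"{word:016b}" == "0000000010100000"
assert decode(word) == {"opcode": 0, "rd": 0, "rs1": 1, "rs2": 2, "reserved": 0}
```

Keeping a decoder next to the encoder gives a cheap round-trip check whenever the reserved bits are later assigned a meaning.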

5.3 Software Pipeline

  1. parser.py: line cleanup, mnemonic & operand extraction
  2. encoder.py: mnemonic → opcode mapping, bitfield packing
  3. main.py: orchestration, emits .bin + human-readable .hex
Diagram: assembler pipeline. Source .asm (mnemonics) → Parser (tokens) → Intermediate IR (opcode + regs) → Encoder (bitpack) → binary (.bin) and hex dump (.hex).
Figure 6 — Assembler flow: parsing produces a lightweight IR enabling validation before packing bits.
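The three stages can be sketched end to end in a few dozen lines of Python. This is an illustration of the flow, not the repository's code: the `;` comment character, the `R0`..`R7` register syntax, zero-filling Rs2 for two-operand forms, and big-endian byte order in the binary output are all assumptions made for the example.

```python
import re

# Mnemonic table taken from the ALU opcode list.
OPCODES = {"ADD": 0, "SUB": 1, "AND": 2, "OR": 3,
           "XOR": 4, "SHL": 5, "SHR": 6, "NOT": 7}


def parse_line(line: str):
    """Parser stage: strip comments, return (mnemonic, [register numbers]) or None."""
    text = line.split(";", 1)[0].strip()      # assumption: ';' starts a comment
    if not text:
        return None
    parts = text.split(None, 1)
    mnemonic = parts[0].upper()
    operands = parts[1] if len(parts) > 1 else ""
    regs = []
    for token in operands.split(","):
        match = re.fullmatch(r"[Rr]([0-7])", token.strip())
        if match is None:
            raise ValueError(f"bad register {token.strip()!r} in {line!r}")
        regs.append(int(match.group(1)))
    return mnemonic, regs


def encode_ir(mnemonic: str, regs: list[int]) -> int:
    """Encoder stage: validate the IR, then pack it into a 16-bit word."""
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic {mnemonic!r}")
    if not 2 <= len(regs) <= 3:
        raise ValueError(f"{mnemonic} expects 2 or 3 registers, got {len(regs)}")
    rd, rs1 = regs[0], regs[1]
    rs2 = regs[2] if len(regs) == 3 else 0    # assumption: unused Rs2 encodes as 0
    return (OPCODES[mnemonic] << 13) | (rd << 10) | (rs1 << 7) | (rs2 << 4)


def assemble(source: str) -> list[int]:
    """Driver stage: run every non-empty line through parser and encoder."""
    words = []
    for line in source.splitlines():
        ir = parse_line(line)
        if ir is not None:
            words.append(encode_ir(*ir))
    return words


program = """
; demo program
ADD R0, R1, R2
NOT R3, R4
"""
words = assemble(program)
binary = b"".join(w.to_bytes(2, "big") for w in words)   # assumption: big-endian
hex_dump = "\n".join(f"{w:04X}" for w in words)
assert words == [0x00A0, 0xEE00]
```

Validating the intermediate tuple before packing means a malformed line fails with a readable message instead of producing a silently wrong word.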

5.4 Current Limits (Opportunities)

6. Tests & Validation

6.1 Philosophy

Each abstraction level has a focused testbench so regressions can be isolated quickly. The approach starts with small deterministic stimuli (unit tests for gates), moves on to representative functional cases (ALU ops), and finishes with integrated toolchain proofs (assembler + simulation); randomized fuzzing comes only after these.

6.2 Emblematic Example (7 + 8 = 15)

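// add7_plus_8.v (stimulus excerpt; A, B, Op drive the ALU, Y is its result)
initial begin
  A  = 7;
  B  = 8;
  Op = 3'b000; // ADD
  #10;         // let the combinational result settle
  // Plain Verilog has no assert statement, so check and report explicitly.
  if (Y !== 15) $display("FAIL: 7 + 8 gave %0d", Y);
  else          $display("PASS: 7 + 8 = 15");
end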

6.3 Test Categories

6.4 Potential Extensions

Diagram: validation flow. Logic layer: gate testbenches (NAND) → ALU8 testbenches (op correctness) → ALU16 testbenches (carry chain). Toolchain layer: assembler tests (encoding) → integration demo (7 + 8 scenario) → random / fuzz testing (future, for coverage).
Figure 7 — Layered validation strategy: isolate primitive correctness before system integration.
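As an illustration of what the future random/fuzz stage might look like, the Python sketch below builds a ripple-carry adder purely from a NAND function and compares it against ordinary integer addition on seeded random inputs. It is a software model rather than an Icarus Verilog testbench, and all names (`full_adder`, `ripple_add`, `fuzz`) are hypothetical.

```python
import random


def nand(a: int, b: int) -> int:
    return 1 - (a & b)                       # single-bit NAND


def xor(a: int, b: int) -> int:
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))      # four-NAND XOR


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    partial = xor(a, b)
    total = xor(partial, cin)
    # carry = (a AND b) OR (partial AND cin), written as a NAND of NANDs
    cout = nand(nand(a, b), nand(partial, cin))
    return total, cout


def ripple_add(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Add bit by bit, passing the carry from each stage to the next."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry


def fuzz(trials: int = 1000, width: int = 16, seed: int = 0) -> int:
    """Compare the gate-level model with Python's + on random operands."""
    rng = random.Random(seed)                # fixed seed keeps failures reproducible
    for _ in range(trials):
        a, b = rng.getrandbits(width), rng.getrandbits(width)
        expected = a + b
        got, carry = ripple_add(a, b, width)
        assert got == expected & ((1 << width) - 1), (a, b)
        assert carry == expected >> width, (a, b)
    return trials


assert ripple_add(7, 8) == (15, 0)
fuzz()
```

Seeding the generator is the main design choice here: a failing operand pair can be replayed exactly, which keeps random testing compatible with the deterministic-first philosophy.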

7. Findings & Insights

| Theme | Observation | Implication |
|---|---|---|
| Universality | NAND reconstructs everything | Deepens structural understanding |
| Parametrization | WIDTH cuts duplication | Scales to wider buses |
| Logic vs ISA | Early assembler clarifies contract | Preps pipeline integration |
| ALU Simplicity | Pure combinational core | Easy for timing exploration |
| 8→16 Chaining | Highlights carry cost | Motivates faster adders |
| Compact Encoding | Low 4 bits reserved | Forward compatible evolution |

8. Constraints & Deliberate Choices

9. Technical Roadmap

  1. Flags: add Zero / Negative / Overflow signals.
  2. Immediates: literal fields (sign/zero extend) in the ISA.
  3. Memory & Branch: LOAD / STORE + conditional branches.
  4. Register File: multi-port module decoupling the ALU.
  5. Pipeline: 2–3 stage throughput increase.
  6. CI & Coverage: Verilator + gcov + Actions.
  7. Fast Adder: CLA / prefix structure swap.
  8. Assembler UX: diagnostics + pseudo-ops.

10. Portfolio Relevance

11. Illustrative Snippets

// Universal NAND gate (conceptual)
assign y = ~(a & b);

// 16-bit ALU chaining (conceptual)
// ALU_low (A[7:0],  B[7:0]) → carry → ALU_high (A[15:8], B[15:8])

// Encoding (ADD R0,R1,R2)
// Opcode=000 Rd=000 Rs1=001 Rs2=010 xxxx

12. Personal Lessons (Neutral)

13. Current Limitations

14. Quick Reproduction

# macOS
brew install icarus-verilog python3 make

# Demo (7 + 8 = 15)
make sim-add7_plus_8

# 16-bit ALU test
make sim-tb_alu16

# Assemble example
make assembler
hexdump -C tools/assembler/test.bin

15. Pedagogical Extensions

16. Conclusion

A clear foundation: a minimal logical core built up into a working 16-bit ALU with light tooling, paving the way for memory, pipelining, a register file, a control path, and execution of multi-instruction programs. The explicit layering (primitive → composition → word ops → ALU → assembler → tests) makes future architectural steps (control unit, instruction fetch, branch handling) feel evolutionary rather than abrupt. In short: a transparent, inspectable learning artifact that turns a textbook aphorism into logic that can be simulated and verified.

17. Resources (curated references & assets)