Applied model development
Practical guidance for teams that need a model tuned for their reality, not generic benchmark theater. This section focuses on decisions, data, eval gates, and safe rollout.
Start with decision quality, then move into data quality, then enforce ruthless evaluation before launch.
How to decide between prompting, RAG, fine-tuning, or full custom training without burning six weeks for a 4% gain. Read playbook →
Build a dataset that reflects real user behavior, edge cases, and failure modes instead of happy-path vanity examples. Read playbook →
A practical eval stack: offline evals, scenario tests, red-team checks, and hard release thresholds that block bad launches. Read playbook →
A plain-language decision matrix for selecting the lightest approach that achieves your target behavior and reliability. Read playbook →
Experiments and practical observations from live model-training runs.
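The "lightest approach" decision matrix above can be sketched as a simple escalation check. The ordering (prompting → RAG → fine-tuning → custom training) follows the common escalation path; the function name and the three criteria are illustrative assumptions, not the playbook's actual matrix.

```python
# Illustrative "lightest approach first" sketch. The criteria and
# ordering here are assumptions for demonstration, not the playbook's
# exact decision matrix.

def choose_approach(
    needs_private_or_fresh_knowledge: bool,
    needs_new_behavior_or_style: bool,
    needs_new_capability: bool,
) -> str:
    """Return the lightest technique that plausibly meets the need."""
    if needs_new_capability:
        # The model cannot do the task at all: train it.
        return "custom training"
    if needs_new_behavior_or_style:
        # The model can do it but not the way you need: tune it.
        return "fine-tuning"
    if needs_private_or_fresh_knowledge:
        # The model lacks your data, not the skill: retrieve it.
        return "RAG"
    # Default to the cheapest lever.
    return "prompting"

print(choose_approach(False, False, False))  # prompting
```

The point of encoding the matrix this way is the order of the branches: each escalation is only reached when every cheaper option has been ruled out.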
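The "hard release thresholds that block bad launches" idea can be sketched as a minimal gate: collect eval scores, compare each against a floor, and block the release if any metric misses. The metric names, threshold values, and `release_gate` function are assumptions for illustration, not the eval stack the playbook describes.

```python
# Minimal release-gate sketch: block a launch when any eval metric
# falls below its hard threshold. Metric names and numbers are
# illustrative assumptions.

THRESHOLDS = {
    "offline_accuracy": 0.90,    # scripted offline eval set
    "scenario_pass_rate": 0.95,  # end-to-end scenario tests
    "red_team_pass_rate": 0.99,  # adversarial / safety checks
}

def release_gate(scores: dict) -> tuple:
    """Return (ok, failures); a missing metric counts as a failure."""
    failures = [
        name
        for name, minimum in THRESHOLDS.items()
        if scores.get(name, 0.0) < minimum
    ]
    return (not failures, failures)

ok, failures = release_gate({
    "offline_accuracy": 0.93,
    "scenario_pass_rate": 0.96,
    "red_team_pass_rate": 0.97,  # below the 0.99 bar, so blocked
})
print(ok, failures)  # False ['red_team_pass_rate']
```

Treating a missing metric as a failure is the deliberate design choice here: a launch should never pass the gate simply because an eval was skipped.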