Signal-to-system playbooks
Practical guidance for teams deciding when a frontier signal deserves a trained model, a dataset, an eval gate, or a safer rollout path.
Start with decision quality, then move into data quality, then enforce ruthless evaluation before launch.
How to decide between prompting, RAG, fine-tuning, or full custom training without burning six weeks for a 4% gain.
Read playbook →
Build a dataset that reflects real user behavior, edge cases, and failure modes instead of happy-path vanity examples.
Read playbook →
A practical eval stack: offline evals, scenario tests, red-team checks, and hard release thresholds that block bad launches.
Read playbook →
A plain-language decision matrix for selecting the lightest approach that achieves your target behavior and reliability.
Read playbook →
Field notes and a four-part series connecting Eric's local-inference experiments, pico-LLM work, nanochat, small-language-model research, and practical ability training.
A project-writing field note on capturing lessons while the system is still warm, drafting ahead, and turning publication into a reusable learning loop.
Read field note →
A first look at the Snapdragon Elite X1E80100 Qualcomm Hexagon NPU as a local-inference path for low-cost agentic tokens.
Read field note →
A local-inference field note on Gemma, ONNX, battery-powered AI, and the harness needed to turn a small local model into a useful agent.
Read field note →
A short field note on Snapdragon NPU experiments, local Gemma sessions, and the slash commands that make a tiny-model harness usable.
Read field note →
The preserved 64 MB RPG-state experiment reframed around constrained domains, token contracts, failure-first evals, and capability-per-megabyte.
Read part 1 →
How nanochat's depth dial turns training into a comparable family of compute-optimal models instead of one-off checkpoint luck.
Read part 2 →
What changes when GPT-2-level capability becomes cheap enough to repeat, and why data and evals become the real constraint.
Read part 3 →
How synthetic data, token-visible task design, and identity tuning turn small models into useful narrow specialists.
Read part 4 →