Model Training Field Note

Experiments in Training Pico LLMs

A 64 MB model running RPG combat logic sounds ridiculous until it works — and then the failure modes teach you more than the wins.

Imported from Eric’s Substack note and adapted into the Custom Model Training section. The substance is preserved, with added structure and Kira commentary for implementation-minded readers.
Initial experiment snapshot: compact model output from the pico-LLM training loop.

Why train tiny models at all?

The core question in this experiment is sharp: how much useful behavior can you squeeze into a model so small it runs comfortably on CPU? In this case, the answer starts around 64 MB and points toward a different way to think about capability density.

Instead of asking one giant model to do everything, this line of work explores many tiny specialists. Each model can own a narrow domain and be loaded only when needed. That shifts the conversation from brute force scale to composable intelligence.
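The "loaded only when needed" idea can be sketched as a lazy registry that routes each domain to its own specialist. This is a minimal illustration of the composition pattern, not anything from the original note; `SpecialistRegistry` and the domain names are invented:

```python
# Sketch of the "many tiny specialists" idea: each narrow domain gets its own
# model, loaded lazily on first use and cached afterward. All names here are
# hypothetical stand-ins.

class SpecialistRegistry:
    def __init__(self):
        self._loaders = {}   # domain -> callable that loads the model
        self._loaded = {}    # domain -> loaded model instance

    def register(self, domain, loader):
        self._loaders[domain] = loader

    def get(self, domain):
        # Load on first request only; later calls reuse the cached model.
        if domain not in self._loaded:
            self._loaded[domain] = self._loaders[domain]()
        return self._loaded[domain]

registry = SpecialistRegistry()
registry.register("rpg_combat", lambda: "tiny-combat-model")  # stand-in loader
model = registry.get("rpg_combat")
```

The point of the pattern: a CPU-class host can keep many 64 MB-scale specialists on disk and pay memory cost only for the ones a session actually touches.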

From prompt to game engine behavior

The model is trained on a compact, template-driven format representing turn-based RPG state. Inputs encode turn counters, status effects, cooldown slots, and actions like poison_strike, ignite, heal, and guard. Outputs resolve toward a canonical next-state block.
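A template-driven format like the one described might look like the sketch below. The exact field names and delimiters are assumptions for illustration; the note does not publish its real template:

```python
# Illustrative encoder for a compact, template-driven combat-state prompt.
# Field names (T, HP, FX, CD, ACT) and the pipe-delimited layout are assumed,
# not taken from the note's actual format.

def encode_state(turn, hp, statuses, cooldowns, action):
    """Render one combat turn as a single compact prompt line."""
    status_str = ",".join(statuses) if statuses else "none"
    cd_str = ",".join(f"{k}:{v}" for k, v in sorted(cooldowns.items()))
    return f"T{turn}|HP:{hp}|FX:{status_str}|CD:{cd_str}|ACT:{action}"

prompt = encode_state(
    turn=3, hp=42,
    statuses=["poison"], cooldowns={"ignite": 2},
    action="poison_strike",
)
# The model's job is to emit the canonical next-state block for this line.
```

Keeping the format this rigid is what makes a tiny model viable: the tokens carry structure, so the model spends its limited capacity on transitions rather than on parsing free text.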

Template-driven prompt shape used for turn-by-turn combat state prediction.

That makes the model behave like a fuzzy state-transition engine. Not deterministic code, but learned transitions with enough structure to produce coherent combat outcomes under normal conditions.
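For contrast, here is what one transition looks like as deterministic code, the thing the tiny model learns a fuzzy approximation of. The rule values (damage, effect handling) are invented for the example:

```python
# A hard-coded version of a single transition rule, for comparison with the
# learned "fuzzy state-transition engine". Damage and effect values are assumed.

def apply_poison_strike(state):
    """Deterministic transition: poison_strike deals damage and applies poison."""
    nxt = dict(state)
    nxt["hp"] = max(0, state["hp"] - 6)  # flat hit damage (assumed value)
    if "poison" not in state["effects"]:
        nxt["effects"] = state["effects"] + ["poison"]
    nxt["turn"] = state["turn"] + 1
    return nxt

apply_poison_strike({"turn": 3, "hp": 42, "effects": []})
```

A deterministic function like this also makes a useful oracle: run it alongside the model during evaluation and count the turns where the learned transition disagrees with the hand-written one.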

Field legend and token semantics: the tiny format details that decide whether training behaves or drifts.

The useful failures

The best part of the note is not “look, it worked.” It is where behavior degrades: label collapse, token drift, and boundary confusion when malformed or overloaded labels are introduced. For example, substituting symbolic values with semantically messy variants can destabilize output shape quickly.
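A perturbation probe of that kind can be sketched in a few lines: substitute clean labels with messy variants and check whether the model's output shape survives. The variant lists below are illustrative, not the note's actual test set:

```python
import random

# Sketch of a label-perturbation probe: swap clean status labels for
# semantically messy variants to stress the model's token conventions.
# MESSY_VARIANTS is invented for illustration.

MESSY_VARIANTS = {
    "poison": ["poisn", "POISON!!", "poison_poison"],
    "regen":  ["re-gen", "regenn", "regen/heal"],
}

def perturb_labels(prompt, rng=None):
    """Replace each known clean label with a randomly chosen messy variant."""
    rng = rng or random.Random(0)  # seeded for reproducible probes
    for clean, variants in MESSY_VARIANTS.items():
        if clean in prompt:
            prompt = prompt.replace(clean, rng.choice(variants))
    return prompt
```

Feeding perturbed prompts through the model and diffing the outputs against the clean-prompt outputs is a cheap way to find the exact token conventions the model is leaning on.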

Failure case: regen-focused perturbation that destabilizes sequence tracking.

These failures are not just bugs. They map the model’s internal compression limits and expose where representation quality breaks down. In practical training terms, they tell you what to fix next in data format, token conventions, and eval coverage.
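Eval coverage for output-shape failures can start as a simple structural check: does the next-state block still parse at all? The expected pattern below is an assumption about the format, but the technique (regex-validate every model output and track the failure rate) is general:

```python
import re

# Minimal structural validator for model output, targeting the "output shape"
# failure modes discussed above. The pipe-delimited pattern is an assumed
# stand-in for the note's real next-state format.

NEXT_STATE = re.compile(
    r"^T(?P<turn>\d+)\|HP:(?P<hp>\d+)\|FX:(?P<fx>[\w,]+)\|CD:(?P<cd>[\w:,]*)$"
)

def output_shape_ok(text):
    """True if the next-state block still parses; False on drift or collapse."""
    return NEXT_STATE.match(text.strip()) is not None

output_shape_ok("T4|HP:39|FX:poison|CD:ignite:1")  # well-formed block
output_shape_ok("T4|HP:|FX poison poison|CD:???")  # collapsed shape
```

Shape-checking catches label collapse and token drift cheaply; semantic checks (did HP change by a legal amount, did the right effect land) can layer on top once the structural pass rate is stable.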

Label collapse in action: output structure starts to unravel once token semantics blur.
Poison-state perturbation: partial retention of effects with broken entity continuity.

“Why not just write normal code?”

That objection is fair and still misses the point. Deterministic engines are often the right answer. The experiment here is different: can you grow a tiny model that internalizes narrow behavioral rules and stays useful under constrained compute?

For edge contexts (old GPUs, Raspberry Pi-style deployments, CPU-first inference), this matters. A tiny trained model can become a flexible component where strict code paths are brittle or expensive to maintain across evolving behaviors.

Not anti-code — strategic composition: deterministic systems plus narrow learned specialists.

The practical takeaway is not “replace code with models.” It is “learn where tiny learned systems beat hard-coded complexity.”

What this means for custom model training

Closing snapshot from the experiment thread: tiny models, grounded constraints, useful lessons.