The order of operations

Rule: if you cannot prove the failure in a reproducible eval set, you are not ready to train. You are still debugging requirements.
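"Proving the failure" can be made concrete with a small harness. The sketch below is one way to do it, not a prescribed tool. It assumes a fixed list of eval cases with exact-match expected answers and a model call that has been made deterministic; `EvalCase`, `run_eval`, `failure_is_reproducible`, and `stub_model` are hypothetical names. The check passes only if the same non-empty set of cases fails on every run. An empty failure set means there is nothing to train against yet, and a set that changes between runs means the failure is not yet reproducible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # assumption: exact-match grading, kept simple for illustration


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Return the prompts of the cases the model answered incorrectly."""
    return [c.prompt for c in cases if model(c.prompt).strip() != c.expected]


def failure_is_reproducible(
    model: Callable[[str], str], cases: list[EvalCase], runs: int = 3
) -> bool:
    """True only if the same non-empty set of cases fails on every run."""
    results = [tuple(run_eval(model, cases)) for _ in range(runs)]
    return bool(results[0]) and all(r == results[0] for r in results)


# Hypothetical eval set and a deterministic stand-in for a real model call.
CASES = [
    EvalCase("capital of France?", "Paris"),
    EvalCase("2 + 2?", "4"),
]


def stub_model(prompt: str) -> str:
    return {"capital of France?": "Paris", "2 + 2?": "5"}.get(prompt, "")


print(run_eval(stub_model, CASES))                 # ['2 + 2?']
print(failure_is_reproducible(stub_model, CASES))  # True: same case fails every run
```

With a real model, the runs would call the same pinned model version with the same settings. Stochastic sampling would make the `all(...)` comparison fail even when the underlying problem is real.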

Decision matrix

| Situation | Best first move | Training needed? |
| --- | --- | --- |
| Wrong answers caused by missing source facts | RAG with source-quality controls | Usually no |
| Inconsistent output format across repeated tasks | Prompt contract + schema validation | Sometimes (fine-tune) |
| Missing domain-specific reasoning patterns | Task decomposition + eval harness | Possibly yes |
| Edge-case safety failures under pressure | Red-team evals + policy scaffolding | Maybe, after controls |
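For the format row, "prompt contract + schema validation" can be measured before any training decision is made. The sketch below is illustrative and uses only the standard library. It assumes outputs are meant to be JSON objects. The `CONTRACT` keys, the `violations` and `format_failure_rate` helpers, and the sample outputs are all hypothetical. The idea is to run the same task repeatedly and count how many outputs break the contract. A rate that stays high after the prompt contract has been tightened is the kind of evidence that would justify the "Sometimes (fine-tune)" answer.

```python
import json

# Hypothetical output contract: every response must be a JSON object with these keys and types.
CONTRACT = {"answer": str, "confidence": float, "sources": list}


def violations(raw: str) -> list[str]:
    """List every way a single raw model output breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected_type in CONTRACT.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


def format_failure_rate(outputs: list[str]) -> float:
    """Fraction of repeated outputs that violate the contract (0.0 for no outputs)."""
    bad = sum(1 for raw in outputs if violations(raw))
    return bad / len(outputs) if outputs else 0.0


# Hypothetical outputs from running the same task three times.
outputs = [
    '{"answer": "Paris", "confidence": 0.9, "sources": ["doc-1"]}',
    '{"answer": "Paris", "confidence": "high", "sources": []}',
    "Paris",
]
for raw in outputs:
    print(violations(raw))
print(format_failure_rate(outputs))  # 2 of 3 outputs violate the contract
```

This check is strict about types. An integer `confidence` such as `1` fails because it is not a `float`, so a real contract may need looser rules.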

What qualifies as “worth it”