The order of operations
- Prompting first: fastest iteration loop, cheapest experimentation, easiest rollback.
- RAG second: closes knowledge gaps without changing model weights.
- Fine-tuning third: good for style consistency, formatting reliability, and correcting recurring behavior errors.
- Custom training last: reserved for domain behavior that cannot be reached by orchestration layers.
Rule: if you cannot prove the failure in a reproducible eval set, you are not ready to train. You are still debugging requirements.
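One way to make the rule concrete is a small, fixed eval set with a pass/fail check per case. The sketch below is a minimal illustration, not a prescribed harness: `EvalCase`, `reproducible_failures`, and the `model` callable (any function from prompt string to output string) are hypothetical names, and "reproducible" is assumed here to mean "fails on every one of several repeated runs".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    """One fixed test case: an input prompt plus a pass/fail check on the output."""
    case_id: str
    prompt: str
    passes: Callable[[str], bool]


def reproducible_failures(
    cases: List[EvalCase],
    model: Callable[[str], str],  # hypothetical: any prompt -> output function
    runs: int = 3,
) -> List[str]:
    """Return IDs of cases that fail on every one of `runs` attempts."""
    failing = []
    for case in cases:
        # A case counts as a reproducible failure only if no attempt passes.
        if all(not case.passes(model(case.prompt)) for _ in range(runs)):
            failing.append(case.case_id)
    return failing


# Usage with a stand-in model that just upper-cases its input.
def stub_model(prompt: str) -> str:
    return prompt.upper()


cases = [
    EvalCase("greeting", "hello", lambda out: out == "HELLO"),
    EvalCase("keeps-case", "total: 5", lambda out: "total: 5" in out),
]
failed_ids = reproducible_failures(cases, stub_model)
```

If a list like `failed_ids` comes back empty for the failure you think you have, that is a sign the problem is still in the requirements, not the model.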
Decision matrix
| Situation | Best first move | Training needed? |
|---|---|---|
| Wrong answer because source facts are missing | RAG with source quality controls | Usually no |
| Output format inconsistency across repeated tasks | Prompt contract + schema validation | Sometimes fine-tune |
| Domain-specific reasoning patterns missing | Task decomposition + eval harness | Possibly yes |
| Edge-case safety failures under pressure | Red-team evals + policy scaffolding | Maybe, after controls |
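
The matrix can also live in code as a plain lookup, so triage notes and tooling share one source of truth. This is only a sketch: the `Failure` enum, the `Advice` tuple, and `advise` are hypothetical names, and the values are copied from the table rather than derived from anything.

```python
from enum import Enum
from typing import NamedTuple


class Failure(Enum):
    """Hypothetical labels for the four situations in the matrix."""
    MISSING_FACTS = "missing source facts"
    FORMAT_INCONSISTENCY = "output format inconsistency"
    MISSING_REASONING = "domain-specific reasoning patterns missing"
    SAFETY_EDGE_CASE = "edge-case safety failures under pressure"


class Advice(NamedTuple):
    first_move: str
    training_needed: str


# Values mirror the decision matrix rows one-to-one.
DECISION_MATRIX = {
    Failure.MISSING_FACTS: Advice(
        "RAG with source quality controls", "Usually no"),
    Failure.FORMAT_INCONSISTENCY: Advice(
        "Prompt contract + schema validation", "Sometimes fine-tune"),
    Failure.MISSING_REASONING: Advice(
        "Task decomposition + eval harness", "Possibly yes"),
    Failure.SAFETY_EDGE_CASE: Advice(
        "Red-team evals + policy scaffolding", "Maybe, after controls"),
}


def advise(failure: Failure) -> Advice:
    """Look up the recommended first move for a diagnosed failure type."""
    return DECISION_MATRIX[failure]


advice = advise(Failure.MISSING_FACTS)
```

The lookup deliberately returns the first move alongside the training verdict, since the verdict only applies once that first move has been tried.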
What qualifies as “worth it”
- Clear, repeated production failure with measurable cost.
- Baseline interventions have failed (prompting, retrieval, guardrails).
- You can define pass/fail criteria before touching data.
- You have the operational capacity to monitor drift after launch.
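
The checklist above can be treated as a gate that must pass in full before any training work starts. A minimal sketch, assuming each criterion can be answered yes or no; `TrainingReadiness`, `unmet_criteria`, and `worth_training` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class TrainingReadiness:
    """One boolean per checklist item; all must be True to proceed."""
    repeated_failure_with_measured_cost: bool
    baseline_interventions_failed: bool  # prompting, retrieval, guardrails
    pass_fail_criteria_defined: bool     # decided before touching data
    monitoring_capacity_after_launch: bool


def unmet_criteria(readiness: TrainingReadiness) -> List[str]:
    """Return the names of checklist items that are not yet satisfied."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


def worth_training(readiness: TrainingReadiness) -> bool:
    """Training is 'worth it' only when every criterion is met."""
    return not unmet_criteria(readiness)


# Usage: everything is in place except agreed pass/fail criteria.
status = TrainingReadiness(
    repeated_failure_with_measured_cost=True,
    baseline_interventions_failed=True,
    pass_fail_criteria_defined=False,
    monitoring_capacity_after_launch=True,
)
gaps = unmet_criteria(status)
ready = worth_training(status)
```

Returning the list of gaps rather than a bare boolean keeps the conversation on what is missing instead of whether training is allowed.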