## Comparison table
| Approach | Best for | Cost and complexity | Typical trap |
|---|---|---|---|
| Prompting | Fast behavior iteration, structured response contracts | Low | Prompt bloat and brittle orchestration logic |
| RAG | Knowledge freshness and factual grounding | Medium | Weak retrieval quietly wrecks answer quality |
| Fine-tuning | Consistent formatting, style, and repetitive patterns | Medium to high | Training on low-quality data and amplifying its noise |
| Custom training | Domain-specific capability shifts that lighter methods cannot achieve | High | Underestimating evaluation and post-launch operations |
## Practical selection flow
- Define the failure in measurable terms.
- Try prompt and orchestration fixes first.
- If the failure is factual, improve retrieval quality.
- If the failure is behavioral consistency, test a fine-tune.
- If capability is still missing, scope custom training with strict eval gates.
Default bias: stay out of training land until evidence says you cannot hit quality targets with lighter interventions.
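The selection flow above can be sketched as a small decision helper that always prefers the cheapest intervention. This is a minimal illustration, not a real library: the `FailureProfile` fields and `select_approach` function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass

@dataclass
class FailureProfile:
    """Hypothetical summary of an observed model failure.

    All field names are illustrative assumptions, not a real API.
    """
    prompt_fixes_tried: bool   # prompt/orchestration fixes already attempted
    is_factual: bool           # answers contain wrong or stale facts
    is_behavioral: bool        # inconsistent formatting, style, or structure
    fine_tune_tested: bool     # a fine-tune was evaluated and still fell short

def select_approach(f: FailureProfile) -> str:
    """Walk the selection flow: cheapest intervention first."""
    if not f.prompt_fixes_tried:
        return "prompting"        # try prompt and orchestration fixes first
    if f.is_factual:
        return "rag"              # factual failures -> improve retrieval quality
    if f.is_behavioral and not f.fine_tune_tested:
        return "fine-tuning"      # consistency failures -> test a fine-tune
    return "custom training"      # capability still missing -> strict eval gates

# Example: factual errors persist after prompt fixes were exhausted
print(select_approach(FailureProfile(True, True, False, False)))  # -> rag
```

Note that the ordering of the branches encodes the default bias: training-based options are only reachable once the lighter interventions have been tried and ruled out.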