Audiobook Agent
A multi-agent pipeline that evaluates audiobook narrators across voice quality, pacing, dialogue attribution, and emotional range. The pipeline is distilled into production CRF models using 83 books from the Gutenberg corpus.
Pipeline
- Compass (0.737 F1) — Navigational scoring: how well does the narrator guide the listener through the text?
- Beacon (0.806 F1) — Emotional signaling: does the narrator's performance illuminate the emotional landscape?
- Cluster (0.936 F1) — Voice consistency: are character voices distinct and maintained across the performance?
- Echo (0.799 F1) — Dialogue attribution: can you tell who's speaking from the performance alone?
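For reference, each F1 score above combines precision and recall into a single number. The sketch below shows the standard binary F1 computation. The README does not say whether the reported scores are binary, micro-averaged, or macro-averaged, so the function name `f1_score` and the toy labels are illustrative assumptions, not the project's evaluation code.

```python
def f1_score(gold, predicted, positive=1):
    """Binary F1 for one positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of gold positives that were found
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels): 2 true positives, 1 false positive, 1 false negative.
score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```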
Scalpel Distillation
The pipeline uses a technique I call "scalpel distillation": a large local LLM (qwen3-next-80b) generates high-quality training labels, and those labels are then distilled into lightweight production models (CRF, small classifiers) that run without a GPU. So far there have been two rounds of distillation across 83 public domain books.
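The teacher-then-student flow can be sketched in a few lines. This is a toy illustration under loud assumptions: `teacher_label` is a hypothetical rule standing in for the LLM call, and the student is a simple count-based voter swapped in for the CRF so the example needs only the standard library. None of the names below come from the project.

```python
from collections import Counter, defaultdict


def teacher_label(sentence):
    # Stand-in for the expensive LLM call (hypothetical rule, NOT the real teacher).
    return "dialogue" if '"' in sentence else "narration"


def features(sentence):
    # Cheap surface features the lightweight student can compute on a CPU.
    words = sentence.lower().split()
    return {
        "has_quote=" + str('"' in sentence),
        "has_said=" + str("said" in words),
        "first_word=" + (words[0].strip('"') if words else ""),
    }


def train_student(sentences):
    # Step 1: the teacher labels the unlabeled corpus.
    labeled = [(s, teacher_label(s)) for s in sentences]
    # Step 2: the student counts how often each feature co-occurs with each label.
    counts = defaultdict(Counter)
    for sentence, label in labeled:
        for feat in features(sentence):
            counts[feat][label] += 1
    return counts


def predict(counts, sentence):
    # The student predicts by summing label votes from the features it has seen.
    votes = Counter()
    for feat in features(sentence):
        votes.update(counts.get(feat, {}))
    return votes.most_common(1)[0][0] if votes else "narration"


corpus = [
    '"Come in," said the host.',
    "The door creaked open.",
    '"Who is there?" she asked.',
    "Rain fell on the roof.",
]
student = train_student(corpus)
```

After training, `predict(student, sentence)` runs without the teacher, which is the point of the distillation step: the expensive model is only needed to produce labels.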
Scale
- 83-book Gutenberg corpus for training data generation
- 117,988 labeled entries for dialogue attribution
- 4 production models shipped; a second-generation CRF is in training
- Runs on Ollama with qwen3-next-80b for distillation
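A label request to a local Ollama server might look like the sketch below. It uses Ollama's `/api/generate` REST endpoint on the default port. The prompt wording and the helper names `build_request` and `label_passage` are hypothetical, and the model name is copied as written above, so the tag on a given install may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3-next-80b"  # name as written in this README; the local tag may differ


def build_request(passage, model=MODEL, url=OLLAMA_URL):
    # Hypothetical prompt wording, for illustration only.
    prompt = (
        "Who speaks the quoted line in this passage? "
        "Answer with a name only.\n\n" + passage
    )
    # stream=False asks Ollama for one JSON object instead of a token stream.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def label_passage(passage):
    # Requires a running Ollama server with the model already pulled.
    with urllib.request.urlopen(build_request(passage)) as resp:
        return json.loads(resp.read())["response"].strip()
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before any generation happens.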