Audiobook Agent

A multi-agent pipeline that evaluates audiobook narrators across voice quality, pacing, dialogue attribution, and emotional range. The agents' judgments are distilled into production CRF models using 83 books from the Gutenberg corpus.

Python, Local LLMs, sklearn-crfsuite, Ollama

Pipeline

  • Compass (0.737 F1) — Navigational scoring: how well does the narrator guide the listener through the text?
  • Beacon (0.806 F1) — Emotional signaling: does the narrator's performance illuminate the emotional landscape?
  • Cluster (0.936 F1) — Voice consistency: are character voices distinct and maintained across the performance?
  • Echo (0.799 F1) — Dialogue attribution: can you tell who's speaking from the performance alone?
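
For readers unfamiliar with the metric, the F1 figures above combine precision and recall into a single number. The sketch below shows the standard computation from raw counts. The source does not say how the per-stage scores were averaged (per token, per span, micro, or macro), so the function name and the counts in the usage note are purely illustrative.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    predicted = true_pos + false_pos   # everything the model labeled positive
    actual = true_pos + false_neg      # everything that really is positive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so `f1_score(8, 2, 2)` is 0.8 up to floating-point rounding.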

Scalpel Distillation

The pipeline uses a technique I call "scalpel distillation": a large local LLM (qwen3-next-80b) generates high-quality training labels, and those labels are then distilled into lightweight production models (CRF, small classifiers) that run without a GPU. So far this has taken two rounds of distillation across 83 public domain books.
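
A minimal sketch of the distillation step, assuming the LLM has already produced per-token labels. The feature set, the `SPEAKER`/`O` label scheme, and the hyperparameters are assumptions for illustration, not the project's actual configuration. `sklearn_crfsuite` is imported lazily so the feature code runs without it installed.

```python
from typing import Dict, List, Tuple

Tokens = List[str]
Labels = List[str]  # hypothetical scheme, e.g. "SPEAKER" / "O"


def token_features(tokens: Tokens, i: int) -> Dict[str, object]:
    """Cheap, CPU-only features for one token and its neighbors."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_quote": tok in {'"', "\u201c", "\u201d"},
        "BOS": i == 0,
        "EOS": i == len(tokens) - 1,
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    return feats


def build_dataset(labeled: List[Tuple[Tokens, Labels]]):
    """Turn LLM-labeled sentences into the X/y lists a CRF expects."""
    X = [[token_features(toks, i) for i in range(len(toks))] for toks, _ in labeled]
    y = [list(labels) for _, labels in labeled]
    return X, y


def train_crf(X, y):
    """Fit a linear-chain CRF on the distilled labels (assumed hyperparameters)."""
    import sklearn_crfsuite  # requires: pip install sklearn-crfsuite

    crf = sklearn_crfsuite.CRF(
        algorithm="lbfgs",
        c1=0.1,
        c2=0.1,
        max_iterations=100,
        all_possible_transitions=True,
    )
    crf.fit(X, y)
    return crf
```

The point of the design is that the expensive model is only needed once, at labeling time. The CRF consumes plain dictionaries of string and boolean features, so inference runs on a CPU.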

Scale

  • 83-book Gutenberg corpus for training data generation
  • 117,988 labeled entries for dialogue attribution
  • 4 production models shipped; a second-generation CRF is in training
  • Runs on Ollama with qwen3-next-80b for distillation
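
A rough sketch of how the labeling side might call Ollama, using only the standard library. It assumes Ollama's default local endpoint and its `/api/generate` route. The prompt wording, the function names, and the expected JSON shape of the reply are hypothetical, and the model tag should match whatever name the model carries in the local Ollama install.

```python
import json
import urllib.request

# Ollama's default local address; change it if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_label_request(passage: str, model: str = "qwen3-next-80b") -> dict:
    """Build the JSON payload asking the LLM to attribute dialogue (hypothetical prompt)."""
    prompt = (
        "For each quoted line in the passage below, name the speaker. "
        'Reply with JSON only, shaped like {"attributions": [{"quote": "...", "speaker": "..."}]}.\n\n'
        + passage
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def request_labels(passage: str, url: str = OLLAMA_URL) -> dict:
    """Send one passage to the local model and parse its JSON reply."""
    payload = json.dumps(build_label_request(passage)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream=False, the generated text arrives in the "response" field.
    return json.loads(body["response"])
```

`request_labels` needs a running Ollama server with the model pulled, while `build_label_request` can be inspected offline to check the prompt.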