SpeechBrain is an open-source PyTorch toolkit built to make conversational AI research easier. Version 1.0 marks the stable milestone: a coherent API across dozens of speech and language tasks, production-ready recipes, and comprehensive documentation.

My contributions

  • ATC ASR recipes (Wav2Vec2, XLS-R fine-tuning on air traffic control data)
  • HyperConformer encoder (see separate project)
  • Accent classification benchmarks (CommonAccent)
  • Bug fixes, documentation, and review across the speech modules

Why it matters

Before SpeechBrain, building a speech system with PyTorch meant gluing together bits from ESPnet, Kaldi, Fairseq, and custom code. SpeechBrain unifies that into a single toolkit that covers ASR, TTS, speaker recognition, speech enhancement, and dialogue — all with reproducible recipes.
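To give a feel for that unified API, here is a minimal quickstart sketch for transcribing a single audio file with a pretrained model. It assumes `pip install speechbrain` and uses the `speechbrain.inference` interfaces; the model identifier, save directory, and file name below are illustrative assumptions, not part of this project's recipes.

```python
def transcribe(audio_path: str) -> str:
    """Load a pretrained SpeechBrain ASR model and transcribe one file.

    The import lives inside the function so the sketch can be read (and the
    function defined) without SpeechBrain installed; calling it requires the
    package plus network access to fetch the model on first use.
    """
    from speechbrain.inference.ASR import EncoderDecoderASR

    asr = EncoderDecoderASR.from_hparams(
        # Assumed Hugging Face model ID; swap in any EncoderDecoderASR model.
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
    )
    return asr.transcribe_file(audio_path)


if __name__ == "__main__":
    print(transcribe("example.wav"))  # hypothetical input file
```

Other task families (speaker recognition, enhancement, TTS) follow the same pattern: a task-specific interface class, `from_hparams` to fetch a pretrained model, and a one-line inference call.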

SpeechBrain is used by research groups at Idiap, EPFL, Mila, Meta, and dozens of startups.

Citation

If you use SpeechBrain in your work, please cite the JMLR 2024 paper.