We release Wav2Vec2 and XLS-R models fine-tuned on public ATC datasets (ATCOSIM, LDC-ATCC, UWB-ATCC) through HuggingFace, for anyone to benchmark or build on.
## Headline results
- 20–40% relative WER reduction vs. supervised Conformer baselines on in-domain ATC test sets
- Cross-accent generalization via XLS-R: a single model trained on mixed European ATC data
- ~6% WER on ATCOSIM with the Wav2Vec2-Large fine-tune
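The numbers above use the standard word error rate. As a refresher, WER is (substitutions + insertions + deletions) divided by the number of reference words; a minimal self-contained computation (word-level Levenshtein distance, assuming a non-empty reference):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance over words / reference length.

    Assumes the reference transcript is non-empty.
    """
    r, h = ref.split(), hyp.split()
    # One-row dynamic program for Levenshtein distance over words.
    d = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            # d[j] (old) = deletion, d[j-1] (new) = insertion,
            # prev = substitution / match on the diagonal.
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
    return d[len(h)] / len(r)
```

For example, `wer("turn left heading three two zero", "turn left heading tree two zero")` is one substitution over six reference words, i.e. about 16.7%.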
## What’s released
| Model | Training data | Link |
|---|---|---|
| Wav2Vec2-Large ATC | ATCOSIM | HuggingFace ↗ |
| Wav2Vec2-Base ATC | LDC-ATCC | HuggingFace ↗ |
| XLS-R ATC | ATCOSIM + LDC-ATCC + UWB-ATCC | HuggingFace ↗ |
## Try it
The Colab notebook loads any of the models and transcribes an audio sample in under a minute; no GPU is required for inference.
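If you prefer a local script, a minimal sketch using the HuggingFace `transformers` ASR pipeline. The model ID below is a placeholder, not a real repo name; substitute the actual checkpoint name from the table above.

```python
import wave


def is_16khz(path: str) -> bool:
    # Wav2Vec2/XLS-R checkpoints expect 16 kHz audio input.
    with wave.open(path, "rb") as f:
        return f.getframerate() == 16000


def transcribe(path: str, model_id: str = "<org>/wav2vec2-large-atc") -> str:
    # model_id is a hypothetical placeholder -- use the checkpoint
    # name from the "What's released" table instead.
    from transformers import pipeline  # pip install transformers torch

    if not is_16khz(path):
        raise ValueError("resample the clip to 16 kHz before transcribing")
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(path)["text"]
```

The sample-rate check matters in practice: ATC recordings are often distributed at 8 kHz, and feeding them to a 16 kHz model without resampling silently degrades WER.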
## Context
This work is part of my PhD at Idiap/EPFL and was presented at IEEE SLT 2022. The accompanying paper systematically studies how self-supervised representations transfer under heavy domain shift, a setting that was surprisingly under-studied before we published.