The idea

Traditional speaker diarization (“who spoke when”) relies on acoustic features, but for Air Traffic Control the audio signal is poor: VHF noise, short turns (about 2 s on average), and a single mono channel carrying both parties.

BERTraffic sidesteps that: take the ASR transcript, fine-tune BERT on a two-head classification task, and output both (a) turn boundaries and (b) the role of each turn (pilot/controller). Surprisingly, this text-only approach beats the strongest available audio-only baselines.
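A minimal sketch of what such a two-head setup could look like: one shared BERT encoder feeding two per-token classification heads. Head names, label sets, and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class TwoHeadBert(nn.Module):
    """Joint model: one BERT encoder feeding two per-token heads,
    one for turn-boundary detection and one for speaker role."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        hidden = encoder.config.hidden_size
        self.boundary_head = nn.Linear(hidden, 2)  # turn change: no / yes
        self.role_head = nn.Linear(hidden, 2)      # pilot / controller

    def forward(self, input_ids, attention_mask=None):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        return self.boundary_head(h), self.role_head(h)

# Tiny randomly initialised encoder so the sketch runs anywhere;
# in practice you would start from a pretrained checkpoint via
# BertModel.from_pretrained("bert-base-uncased").
cfg = BertConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                 num_attention_heads=2, intermediate_size=128)
model = TwoHeadBert(BertModel(cfg))

ids = torch.randint(0, 1000, (1, 12))  # one 12-token transcript chunk
boundary_logits, role_logits = model(ids)
# Joint objective: cross-entropy on each head, summed, e.g.
# loss = ce(boundary_logits.transpose(1, 2), boundary_labels) \
#      + ce(role_logits.transpose(1, 2), role_labels)
```

Because both heads share the encoder, a single backward pass trains them jointly, which is the setup the paper reports as stronger than a two-stage pipeline.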

Results

  • 27% relative DER reduction vs. the audio-only baseline on the ATCO2 test set
  • Works with noisy ASR transcripts, not just gold text; the model stays robust on input at roughly 15% WER
  • Joint training of the two heads outperforms pipelining them
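Once the two heads emit per-word predictions, turning them into diarization output is a small decoding step. A minimal sketch, assuming a `1` boundary flag marks the first word of a new turn and taking each turn's role by majority vote (the paper's exact aggregation may differ):

```python
def decode_turns(words, boundary, roles):
    """Group words into turns: a new turn starts where boundary[i] == 1.
    Each turn's role is the majority role among its words."""
    turns, cur_words, cur_roles = [], [], []
    for word, flag, role in zip(words, boundary, roles):
        if flag == 1 and cur_words:  # close the previous turn
            turns.append((max(set(cur_roles), key=cur_roles.count),
                          " ".join(cur_words)))
            cur_words, cur_roles = [], []
        cur_words.append(word)
        cur_roles.append(role)
    if cur_words:  # flush the final turn
        turns.append((max(set(cur_roles), key=cur_roles.count),
                      " ".join(cur_words)))
    return turns

# Toy mono-channel transcript split into two turns:
words = "lufthansa one two cleared to land cleared to land lufthansa one two".split()
boundary = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
roles = ["controller"] * 6 + ["pilot"] * 6
turns = decode_turns(words, boundary, roles)
# → [('controller', 'lufthansa one two cleared to land'),
#    ('pilot', 'cleared to land lufthansa one two')]
```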

Released

  • Full training and evaluation code on GitHub
  • Fine-tuned BERT models on HuggingFace (pilot/controller classifier, turn-change classifier)