# BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications

Published in arXiv, 2022

Recommended citation: Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke, 2022. BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. arXiv preprint arXiv:2110.05781. https://arxiv.org/abs/2110.05781

Abstract: Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are later used to extract ATC named entities, e.g., aircraft callsigns, command types, or values. One common challenge lies in the speech activity detection (SAD) and diarization systems. If one of them fails, two or more single-speaker segments remain in the same recording, jeopardizing the overall system’s performance. We propose a system that combines the segmentation of a SAD module with a BERT model that performs speaker change detection (SCD) and speaker role detection (SRD) by chunking ASR transcripts, i.e., diarization with a defined number of speakers together with SRD. The proposed model is evaluated on real-life ATC test sets. It reaches up to 0.90/0.95 F1-score on ATCO/pilot SRD, which means a 27% relative improvement in diarization error rate (DER) compared to standard acoustic-based diarization. Results are measured on ASR transcripts of challenging ATC test sets with ∼13% word error rate, and the robustness of the system is also validated on noisy ASR transcripts.
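
To make the idea of text-based diarization concrete, the following is a minimal sketch of how per-word role predictions could be turned into speaker segments. It is not the paper's implementation: the function name `split_by_role`, the plain `atco`/`pilot` label strings, and the example transcript are all assumptions made for illustration, and the role predictions are hard-coded rather than produced by a BERT model.

```python
from itertools import groupby


def split_by_role(words, roles):
    """Group consecutive words that share a predicted role into segments.

    `words` and `roles` are parallel lists. The role labels are hypothetical
    stand-ins for the output of a token-level classifier.
    """
    if len(words) != len(roles):
        raise ValueError("words and roles must have the same length")
    segments = []
    # groupby starts a new group whenever the role changes, so each group
    # boundary corresponds to a detected speaker change.
    for role, group in groupby(zip(words, roles), key=lambda pair: pair[1]):
        segments.append((role, " ".join(word for word, _ in group)))
    return segments


# Illustrative transcript in which an ATCO command and the pilot readback
# ended up in the same recording (invented example, not from the test sets).
words = (
    "lufthansa three two one descend flight level eight zero "
    "descending flight level eight zero lufthansa three two one"
).split()
roles = ["atco"] * 9 + ["pilot"] * 9  # assumed per-word predictions

segments = split_by_role(words, roles)
for role, text in segments:
    print(f"{role}: {text}")
```

Each boundary between adjacent segments marks a speaker change, and the label attached to a segment gives the speaker role. Note that this simple grouping cannot separate two consecutive segments spoken by the same role; handling that case would need an explicit segment-start label, which is left out of the sketch.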

@article{zuluaga2021bertraffic,