About me

I’m a Senior Research Engineer working at the intersection of natural language processing (NLP) and automatic speech recognition (ASR), with a strong focus on speech-and-audio LLMs. I’ve contributed to projects like HyperConformer for efficient ASR and STAC-ST for speech translation. My work spans spoken language understanding, LLMs, TTS/ASR model training, pseudo-labeling, and data generation/selection pipelines for model optimization.

Currently at Agigo AG, a Swiss AI company building next-generation autonomous AI agents, I focus on: 1) scaling and deploying large acoustic models and LLMs in production settings; 2) generating high-quality synthetic conversational data for model training; and 3) maximizing GPU efficiency for multi-client inference workloads.

I’m passionate about bridging speech and language technologies for real-world, multimodal, and multilingual applications. I have also explored AI applications in biomedical imaging and intelligent human-machine communication.

I obtained a Ph.D. at Idiap and École polytechnique fédérale de Lausanne (EPFL). I was also an intern at AWS (Amazon) in Seattle and Apple in Boston in 2023!

My PhD initially targeted the ATCO2 project, which was mainly dedicated to developing a robust automatic speech recognition and understanding system for air traffic communications. We aimed to build a unique platform to collect, organize, and pre-process air traffic control (voice communication) data from the airspace (and yes, air traffic communications are very tough, and the pilots talk super fast!).

I am originally from Baranoa, a tiny village in the north of Colombia (around 30 minutes from the Caribbean coast). I received a B.S. in Mechatronics Engineering from Universidad Autonoma del Caribe and an M.Sc., also in Mechatronics Engineering, from the University of Oviedo in Spain.

My latest research: click here

Resume / CV: click here

Publications

2025

Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering. ICASSP 2025. IEEE Xplore / PDF
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models. ICASSP 2025. IEEE Xplore / arXiv PDF
Fine-Tuning Pretrained Models with NVIB for Improved Generalisation. Workshop on Spurious Correlation and Shortcut Learning: Foundations and Applications, 2025. PDF / OpenReview

2024

TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR. arXiv e-prints, 2024. ADS Abstract
Open-source conversational AI with SpeechBrain 1.0. Journal of Machine Learning Research, 2024. JMLR / PDF
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper. arXiv preprint arXiv:2409.13499, 2024. arXiv / PDF
LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR. arXiv preprint arXiv:2409.13514, 2024. arXiv / PDF
Low-Resource Speech Recognition and Understanding for Challenging Applications. 2024. PDF

2023

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation. EMNLP main, 2023. Abstract / Paper / arXiv
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition. Proc. Interspeech 2023. Abstract / Paper
CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice. Proc. Interspeech 2023. Abstract / Paper. Nominated: Best Student Paper Award
Implementing contextual biasing in GPU decoder for online ASR. Proc. Interspeech 2023. Abstract / Paper
An Automatic Speaker Clustering Pipeline for Air Traffic Communication Domain. Aerospace 2023, 10(10), 876. Paper
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding. Aerospace 2023, 10(10), 898. Paper
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers. Aerospace 2023, 10(5), 490. Paper
Validating Automatic Speech Recognition and Understanding for Pre-Filling Radar Labels—Increasing Safety While Reducing Air Traffic Controllers’ Workload. Aerospace 2023, 10(6), 538. Paper
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks. ICASSP 2023. Paper / Paper
Automatic Speech Recognition and Understanding for Radar Label Maintenance Support Increases Safety and Reduces Air Traffic Controllers’ Workload. Air Traffic Management Research and Development Seminar. Paper
ATCO2 Corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications. Under review. Abstract / PDF / Code / Colab
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. IEEE Spoken Language Technology Workshop, SLT-2022. Abstract / PDF / Code
How Does Pre-trained Wav2Vec2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. IEEE Spoken Language Technology Workshop, SLT-2022. Abstract / PDF / Code / Colab / HuggingFace

2022

Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator. SESAR Innovation Days 2022. Accepted.
Readback Error Detection by Automatic Speech Recognition and Understanding – Results of HAAWAII Project for Isavia’s Enroute Airspace. SESAR Innovation Days 2022. Accepted
Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR. SESAR Innovation Days 2022. ArXiv preprint: Abstract / PDF
IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach. CASE Workshop@EMNLP 2022. Abstract / PDF / Code
IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model. CASE Workshop@EMNLP 2022. Abstract / PDF / Code
Legal and Ethical Challenges in Recording Air Traffic Control Speech. 13th Language Resources and Evaluation Conference. Abstract / PDF
A two-step approach to leverage contextual data: speech recognition in air-traffic communications. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022. Abstract / PDF

2021

Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition. Electronics Journal, MDPI. Abstract / PDF
Automatic Processing Pipeline for Collecting and Annotating Air-Traffic Voice Communication Data. Engineering Proceedings, MDPI. Abstract / PDF
Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems. Proc. Interspeech 2021. Abstract / PDF
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition. Proc. Interspeech 2021. Abstract / PDF
Improving callsign recognition with air-surveillance data in air-traffic communication. ArXiv preprint. Abstract / PDF

2020

Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models. ArXiv preprint. Abstract / PDF / Code
Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications. Proceedings, MDPI. Abstract / PDF
Automatic speech recognition benchmark for air-traffic communications. Proc. Interspeech 2020. Abstract / PDF
January: Started my PhD studies at the École polytechnique fédérale de Lausanne & Idiap Research Institute in Switzerland!

2019

January: Started my Master's thesis at the École nationale supérieure de mécanique et des microtechniques in Besançon, France! I will work on computer vision for breast cancer diagnosis!

2017

September: Arrived in Gijon, Spain, to start a Master’s degree in Mechatronics Engineering! Gijon is such a nice coastal town!
January: I got the exciting news that I was accepted to the Joint Master's Degree in Mechatronic Engineering, EU4M. This program allows students to choose 2 to 3 (out of 5) universities at which to study. I will start at the University of Oviedo in Spain, then I will go to ISPU (Ivanovo, Russia) and finally to ENSMM in France!

2010-2015

Started and finished a Bachelor’s degree in Mechatronics Engineering! I learned tons of things about electronics, mechanical engineering, robotics, and programming languages (C, Python, and MATLAB)!

Juan Pablo Zuluaga