About me
I’m a Senior Research Engineer working at the intersection of natural language processing (NLP) and automatic speech recognition (ASR), with a strong focus on speech-and-audio LLMs. I’ve contributed to projects like HyperConformer for efficient ASR and STAC-ST for speech translation. My work spans spoken language understanding, LLMs, TTS/ASR model training, pseudo-labeling, and data generation/selection pipelines for model optimization.
Currently at Agigo AG, a Swiss AI company building next-generation autonomous AI agents, I focus on: 1) scaling and deploying large acoustic models and LLMs in production settings; 2) generating high-quality synthetic conversational data for model training; 3) maximizing GPU efficiency for multi-client inference workloads.
I’m passionate about bridging speech and language technologies for real-world, multimodal, and multilingual applications. I have also explored AI applications in biomedical imaging and intelligent human-machine communication.
I obtained a Ph.D. at IDIAP and École polytechnique fédérale de Lausanne (EPFL). I was also an intern at AWS (Amazon) in Seattle and Apple in Boston in 2023!
My PhD was initially tied to the ATCO2 project, which was mainly dedicated to developing a robust automatic speech recognition and understanding system for air traffic communications. We aimed to build a unique platform to collect, organize, and pre-process air traffic control (voice communication) data from the airspace (and yes! air traffic communications are very tough and the pilots talk super fast!).
I’m originally from Baranoa, Colombia, a tiny village in the north of the country (around 30 minutes from the Caribbean coast). I received a B.S. in Mechatronics Engineering from Universidad Autonoma del Caribe and an M.Sc., also in Mechatronics Engineering, from the University of Oviedo in Spain.
My latest research: click here
Resume / CV: click here
Publications
2025
- Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering. ICASSP 2025. IEEE Xplore / PDF
- XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models. ICASSP 2025. IEEE Xplore / arXiv PDF
- Fine-Tuning Pretrained Models with NVIB for Improved Generalisation. Workshop on Spurious Correlation and Shortcut Learning: Foundations and Applications, 2025. PDF / OpenReview
2024
- TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR. arXiv e-prints, 2024. ADS Abstract
- Open-source conversational AI with SpeechBrain 1.0. Journal of Machine Learning Research, 2024. JMLR / PDF
- Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper. arXiv preprint arXiv:2409.13499, 2024. arXiv / PDF
- LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR. arXiv preprint arXiv:2409.13514, 2024. arXiv / PDF
- Low-Resource Speech Recognition and Understanding for Challenging Applications. 2024. PDF
2023
- End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation. EMNLP main, 2023. Abstract / Paper / arXiv
- HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition. Proc. Interspeech 2023. Abstract / Paper
- CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice. Proc. Interspeech 2023. Abstract / Paper. Nominated: Best Student Paper Award
- Implementing contextual biasing in GPU decoder for online ASR. Proc. Interspeech 2023. Abstract / Paper
- An Automatic Speaker Clustering Pipeline for Air Traffic Communication Domain. Aerospace 2023, 10(10), 876. Paper
- Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding. Aerospace 2023, 10(10), 898. Paper
- A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers. Aerospace 2023, 10(5), 490. Paper
- Validating Automatic Speech Recognition and Understanding for Pre-Filling Radar Labels—Increasing Safety While Reducing Air Traffic Controllers’ Workload. Aerospace 2023, 10(6), 538. Paper
- Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks. ICASSP 2023. Paper / Paper
- Automatic Speech Recognition and Understanding for Radar Label Maintenance Support Increases Safety and Reduces Air Traffic Controllers’ Workload. Air Traffic Management Research and Development Seminar. Paper
- ATCO2 Corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications. Under review. Abstract / PDF / Code / Colab
- BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. IEEE Spoken Language Technology Workshop, SLT-2022. Abstract / PDF / Code
- How Does Pre-trained Wav2Vec2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. IEEE Spoken Language Technology Workshop, SLT-2022. Abstract / PDF / Code / Colab / HuggingFace
2022
- Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator. SESAR Innovation Days 2022. Accepted.
- Readback Error Detection by Automatic Speech Recognition and Understanding – Results of HAAWAII Project for Isavia’s Enroute Airspace. SESAR Innovation Days 2022. Accepted
- Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR. SESAR Innovation Days 2022. ArXiv preprint: Abstract / PDF
- IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach. CASE Workshop@EMNLP 2022. Abstract / PDF / Code
- IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model. CASE Workshop@EMNLP 2022. Abstract / PDF / Code
- Legal and Ethical Challenges in Recording Air Traffic Control Speech. 13th Language Resources and Evaluation Conference. Abstract / PDF
- A two-step approach to leverage contextual data: speech recognition in air-traffic communications. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022. Abstract / PDF
2021
- Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition. Electronics Journal, MDPI. Abstract / PDF
- Automatic Processing Pipeline for Collecting and Annotating Air-Traffic Voice Communication Data. Engineering Proceedings, MDPI. Abstract / PDF
- Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems. Proc. Interspeech 2021. Abstract / PDF
- Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition. Proc. Interspeech 2021. Abstract / PDF
- Improving callsign recognition with air-surveillance data in air-traffic communication. ArXiv preprint. Abstract / PDF
2020
- Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models. ArXiv preprint. Abstract / PDF / Code
- Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications. Proceedings, MDPI. Abstract / PDF
- Automatic speech recognition benchmark for air-traffic communications. Proc. Interspeech 2020. Abstract / PDF
- January: Started my PhD studies at the École polytechnique fédérale de Lausanne & Idiap Research Institute in Switzerland!
2019
- January: Started my Master’s thesis at the École nationale supérieure de mécanique et des microtechniques in Besançon, France! I will work on computer vision for breast cancer diagnosis!
2017
- September: Arrived in Gijon, Spain to start a Master’s degree in Mechatronics Engineering! Gijon is such a nice coastal town!
- January: I got the exciting news that I was accepted into the Joint Master Degree in Mechatronic Engineering, EU4M. This program lets students choose 2 to 3 (out of 5) universities at which to study. I will start at the University of Oviedo in Spain, then I will go to ISPU (Ivanovo, Russia) and finally to ENSMM in France!
2010-2015
- Started and finished a Bachelor’s degree in Mechatronics Engineering! I learned tons of things about electronics, mechanical engineering, robotics, and programming languages (C, Python, and Matlab)!