Unifying Global and Near-Context Biasing in a Single Trie Pass
Single-pass trie unifies global vocabulary biasing with utterance-level context biasing for transducer ASR.
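To illustrate the core idea — merging global vocabulary entries and per-utterance context phrases into one trie that is walked once per decoded token — here is a minimal Python sketch. It is not the paper's implementation; the subtractive-bonus scheme, phrase lists, and `bonus` value are illustrative, and a completed phrase simply resets to the root for brevity.

```python
class TrieNode:
    __slots__ = ("children", "is_end")
    def __init__(self):
        self.children = {}
        self.is_end = False

def build_bias_trie(global_vocab, utterance_context):
    """Merge global and per-utterance phrases into one trie,
    so a single pass covers both biasing sources."""
    root = TrieNode()
    for phrase in list(global_vocab) + list(utterance_context):
        node = root
        for tok in phrase.split():
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True
    return root

def step(root, state, depth, token, bonus=1.5):
    """Advance the trie by one decoded token during beam search.

    Returns (new_state, new_depth, score_delta). Each matched token
    earns a provisional `bonus`; falling off the trie before a phrase
    completes retracts the accumulated bonuses, so partial matches are
    not rewarded. For simplicity a completed phrase resets to the root.
    """
    node = state or root
    child = node.children.get(token)
    if child is not None:
        if child.is_end:
            return root, 0, bonus           # phrase completed, keep bonuses
        return child, depth + 1, bonus      # provisional credit for prefix
    retract = -bonus * depth                # failure: take bonuses back
    restart = root.children.get(token)      # token may start a new phrase
    if restart is not None:
        if restart.is_end:
            return root, 0, retract + bonus
        return restart, 1, retract + bonus
    return root, 0, retract
```

In a real transducer decoder each beam hypothesis would carry its own `(state, depth)` pair and the `score_delta` would be added to the hypothesis log-probability during shallow fusion.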

I'm a Senior Research Engineer at Agigo AG, a Swiss AI company building autonomous AI agents. My work sits at the intersection of natural language processing and automatic speech recognition, with a strong focus on speech-and-audio LLMs.
I completed my PhD at EPFL and Idiap in 2024. My thesis tackled automatic speech recognition for air traffic control — one of the hardest real-world ASR domains. Along the way I built the ATCO2 corpus, fine-tuned self-supervised models for this domain, and published work on speaker diarization, speaker role detection, and contextual ASR.
Before Agigo, I interned at Apple (ML for ASR on tail named entities) and at AWS (speech translation and transcription). I hold master's and bachelor's degrees in Mechatronics Engineering from Universidad de Oviedo and Universidad Autónoma del Caribe.
I live in Zürich. Originally from Baranoa, Colombia.
A domain classifier plus pseudo-label filtering cuts ASR fine-tuning compute by ~40% at matched WER.
Streaming ASR atop a frozen self-supervised backbone, without sacrificing non-streaming accuracy.
Co-authored the 1.0 release of SpeechBrain — a PyTorch toolkit for conversational AI.
First end-to-end speech translation system that handles speaker turns and overlapped speech on a single channel.
Replaces Conformer attention with HyperMixer, matching accuracy at a fraction of the compute.
Accent classification benchmark on Common Voice using large self-supervised models — **Best Student Paper nominee**.
Systematic study of self-supervised pretraining under domain shift — 20–40% relative WER cut on Air Traffic Control.
5,000 hours of Air Traffic Control communications — the largest open ATC speech dataset.
A multilingual, semi-automatically labeled corpus built to advance ASR and natural language understanding on one of the hardest real-world speech domains. Includes audio, transcripts, speaker role annotations, and a preprocessing pipeline. Used as a benchmark by follow-up work across Europe.
Self-supervised ASR models fine-tuned for Air Traffic Control, available on HuggingFace.
A family of Wav2Vec2 models that achieve 20–40% relative WER reduction on ATC data compared to supervised baselines. Released with training recipes, evaluation scripts, and a Colab notebook for immediate inference. The benchmark paper at SLT 2022 studies self-supervised pretraining behavior under heavy domain shift.
Joint speaker-role and speaker-change detection from ATC transcripts — no audio required.
Most ATC diarization systems rely on the audio signal, which in this domain is noisy and the utterances short. BERTraffic reframes the problem as text classification: given a transcript, predict the speaker turns and whether each turn belongs to a pilot or a controller. It beats audio-only baselines by 27% diarization error rate (DER).
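A toy sketch of the reframing (not BERTraffic's BERT model, which learns this from data): segment a transcript into turns at callsign mentions, then tag each turn pilot or controller from lexical cues. The cue sets and the heuristic are invented for illustration only.

```python
# Illustrative cue sets — a real system learns these patterns.
CONTROLLER_CUES = {"cleared", "contact", "descend", "climb", "turn"}
PILOT_CUES = {"wilco", "roger", "request"}

def segment_and_tag(words, callsigns):
    """Start a new turn at each callsign mention, then label each turn
    by which cue set its words hit more often (ties -> controller)."""
    turns, current = [], []
    for w in words:
        if w in callsigns and current:
            turns.append(current)
            current = []
        current.append(w)
    if current:
        turns.append(current)
    tagged = []
    for turn in turns:
        c = sum(w in CONTROLLER_CUES for w in turn)
        p = sum(w in PILOT_CUES for w in turn)
        tagged.append(("controller" if c >= p else "pilot", " ".join(turn)))
    return tagged
```

The point of the framing is that both subtasks — turn segmentation and role assignment — become pure text problems, so no audio channel is needed at inference time.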
A Conformer variant where attention is replaced with HyperMixer — matched accuracy, less compute.
Attention is the expensive part of Conformer-based ASR models. HyperConformer swaps it for a multi-head HyperMixer, which scales linearly in sequence length rather than quadratically. Same WER as Conformer at a meaningful compute cut.
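The linear-in-length claim can be seen in a minimal numpy sketch of HyperMixer-style token mixing, where the token-mixing MLP's weights are generated from the tokens themselves. This is not the HyperConformer code: the hypernetworks here are fixed random linear maps (in the paper they are learned and also see position embeddings), and ReLU stands in for the actual nonlinearity.

```python
import numpy as np

def hypermixer_token_mixing(X, d_hidden=32):
    """Token mixing with input-generated weights.

    X: (N, d) token embeddings. Returns (N, d).
    Cost is O(N * d_hidden * d) — linear in sequence length N,
    versus the O(N^2 * d) of self-attention.
    """
    rng = np.random.default_rng(0)          # fixed seed: illustrative weights
    N, d = X.shape
    P1 = rng.standard_normal((d, d_hidden)) / np.sqrt(d)
    P2 = rng.standard_normal((d, d_hidden)) / np.sqrt(d)
    W1 = X @ P1                             # (N, d_hidden) generated weights
    W2 = X @ P2                             # (N, d_hidden)
    H = np.maximum(W1.T @ X, 0.0)           # (d_hidden, d) pooled summary
    return W2 @ H                           # (N, d): information crosses tokens
```

Because `W1` and `W2` have one row per token, the model handles variable-length sequences like attention does, but the two matrix products never form an N-by-N table.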
Co-authored the 1.0 release of the open-source conversational AI toolkit.
SpeechBrain is a PyTorch-based toolkit for speech and language tasks, used by dozens of research groups and startups. The 1.0 release (JMLR 2024) consolidates years of contributions into a stable API with comprehensive recipes for ASR, TTS, speaker recognition, and dialogue understanding.