Open-source projects, datasets, and research artifacts I’ve built or co-authored. Most are available on GitHub or HuggingFace.
Active contributor to vLLM-Omni — the inference engine for omni-modality models (text, speech, audio, vision).
Ongoing contributions to vLLM-Omni’s Qwen3-TTS and OmniVoice paths: streaming output, Code2Wav batched decoding, CUDA Graph + torch.compile, voice cloning, and throughput/latency optimization for high-concurrency TTS serving.
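The streaming-output idea above can be sketched in a few lines: decode codec frames in small batches and emit audio as soon as each batch is ready, instead of waiting for the full utterance. This is a toy illustration of the latency trade-off only; all names (`decode_frames`, `stream_tts`, `FRAME_SAMPLES`) are hypothetical and not vLLM-Omni's actual API.

```python
from typing import Iterator, List

# Hypothetical stand-ins for a Code2Wav-style codec decoder;
# none of these names come from vLLM-Omni itself.
FRAME_SAMPLES = 480  # assumed audio samples produced per codec frame

def decode_frames(frames: List[int]) -> List[float]:
    """Pretend codec decoder: maps each discrete code to FRAME_SAMPLES samples."""
    return [float(code) for code in frames for _ in range(FRAME_SAMPLES)]

def stream_tts(code_stream: Iterator[List[int]], chunk_frames: int = 4) -> Iterator[List[float]]:
    """Yield audio as soon as `chunk_frames` codec frames are buffered,
    rather than after the full utterance -- lower first-chunk latency."""
    buffer: List[int] = []
    for frames in code_stream:
        buffer.extend(frames)
        while len(buffer) >= chunk_frames:
            chunk, buffer = buffer[:chunk_frames], buffer[chunk_frames:]
            yield decode_frames(chunk)
    if buffer:  # flush any remaining frames at end of stream
        yield decode_frames(buffer)
```

Batched decoding in a real server would decode chunks from many concurrent requests together; the generator above only shows the per-request streaming contract.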
5,000 hours of Air Traffic Control communications — the largest open ATC speech dataset.
A multilingual, semi-automatically labeled corpus built to advance ASR and natural language understanding on one of the hardest real-world speech domains. Includes audio, transcripts, speaker role annotations, and a preprocessing pipeline. Used as a benchmark by follow-up work across Europe.
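To make the annotation layers concrete, here is what one record might look like. The field names and values below are purely illustrative, not the corpus's actual schema:

```python
# Hypothetical record layout -- field names are illustrative only,
# not the dataset's real schema.
example = {
    "audio_path": "recordings/sample_0001.wav",
    "transcript": "ryanair one two alfa descend flight level eight zero",
    "speaker_role": "atco",   # controller ("atco") or "pilot"
    "language": "en",
}

def words_per_segment(records):
    """Tiny preprocessing-style helper: word counts per transcript,
    e.g. for filtering out empty or truncated segments."""
    return [len(r["transcript"].split()) for r in records]
```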
Self-supervised ASR models fine-tuned for Air Traffic Control, available on HuggingFace.
A family of Wav2Vec2 models that achieve 20–40% relative WER reduction on ATC data compared to supervised baselines. Released with training recipes, evaluation scripts, and a Colab notebook for immediate inference. The benchmark paper at SLT 2022 studies self-supervised pretraining behavior under heavy domain shift.
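For readers unfamiliar with the metric, "20–40% relative WER reduction" is computed against the baseline's own error rate, not in absolute points. A minimal sketch of word error rate and the relative-reduction arithmetic:

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # min of deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(refs, hyps):
    """WER = total word errors / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words

def relative_wer_reduction(baseline, improved):
    """E.g. baseline WER 0.30 -> improved 0.21 is a 30% relative reduction,
    even though the absolute drop is only 9 points."""
    return (baseline - improved) / baseline
```

The numbers passed to `relative_wer_reduction` here are examples, not results from the released models.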
Joint speaker-role and speaker-change detection from ATC transcripts — no audio required.
Most ATC diarization systems rely on the audio signal, but ATC audio is noisy and utterances are short. BERTraffic reframes the problem as text classification: given a transcript, predict the speaker turns and whether each turn belongs to a pilot or a controller. It beats audio-only baselines by 27% DER.
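The text-only framing can be illustrated with a toy pipeline: take turns produced by speaker-change detection, then classify each turn's role from its words alone. The keyword heuristic below is a deliberately simple stand-in for the fine-tuned BERT classifiers used in the actual work; the cue lists are invented for illustration.

```python
# Toy stand-in for a text-based role classifier. Real BERTraffic fine-tunes
# BERT; these hand-picked cue words are illustrative only.
CONTROLLER_CUES = {"cleared", "contact", "descend", "climb", "turn"}
PILOT_CUES = {"wilco", "roger", "request", "with", "you"}

def classify_role(turn: str) -> str:
    """Label one turn as controller ('atco') or 'pilot' from its words."""
    words = set(turn.lower().split())
    atco_hits, pilot_hits = len(words & CONTROLLER_CUES), len(words & PILOT_CUES)
    return "atco" if atco_hits >= pilot_hits else "pilot"

def label_turns(turns):
    """Given pre-segmented turns (the speaker-change step's output), label roles."""
    return [(turn, classify_role(turn)) for turn in turns]
```

The point of the framing is that both subtasks, turn segmentation and role assignment, become standard sequence/text classification problems with no acoustic features at all.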
A Conformer variant where attention is replaced with HyperMixer — matched accuracy, less compute.
Attention is the expensive part of Conformer-based ASR models. HyperConformer swaps it for a multi-head HyperMixer, which scales linearly in sequence length rather than quadratically. Same WER as Conformer at a meaningful compute cut.
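The linear-in-length property comes from generating the token-mixing weights with a per-token hypernetwork instead of forming an N×N attention matrix. A minimal NumPy sketch of that idea, simplified from the paper (single matrices stand in for the small hypernetwork MLPs, and all shapes are illustrative):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def hypermixer_token_mixing(X, P1, P2):
    """HyperMixer-style token mixing. The (N, d_hidden) mixing weights W1, W2
    are generated from the tokens themselves, one row per token, so the cost
    is O(N * d * d_hidden) -- linear in sequence length N, unlike the O(N^2)
    pairwise scores of self-attention."""
    W1 = X @ P1                   # (N, d_hidden), generated per token
    W2 = X @ P2                   # (N, d_hidden)
    return W2 @ gelu(W1.T @ X)    # (N, d_hidden) @ (d_hidden, d) -> (N, d)

rng = np.random.default_rng(0)
N, d, d_hidden = 6, 8, 16         # illustrative sizes only
X = rng.normal(size=(N, d))
P1 = rng.normal(size=(d, d_hidden))
P2 = rng.normal(size=(d, d_hidden))
Y = hypermixer_token_mixing(X, P1, P2)
```

Note that no intermediate has an N×N shape: the largest mixing-related tensor is `W1.T @ X`, which is (d_hidden, d) regardless of sequence length.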
Co-authored the 1.0 release of the open-source conversational AI toolkit.
SpeechBrain is a PyTorch-based toolkit for speech and language tasks, used by dozens of research groups and startups. The 1.0 release (JMLR 2024) consolidates years of contributions into a stable API with comprehensive recipes for ASR, TTS, speaker recognition, and dialogue understanding.