Why
Conformer has become the default ASR encoder, but its self-attention scales quadratically with sequence length — a real bottleneck for streaming and long-form recognition. HyperMixer is a linear-complexity alternative to attention that has been shown to work well on NLP tasks; we asked whether it generalizes to speech.
What we did
Replaced Conformer’s self-attention with a multi-head HyperMixer block, keeping the convolution module and macaron-style feed-forward layers unchanged. Trained on LibriSpeech and Common Voice with SpeechBrain.
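The core idea of HyperMixer's token mixing can be sketched in a few lines: small hypernetworks generate the weights of a token-mixing MLP from the input itself, so the cost stays linear in sequence length. Below is a minimal NumPy sketch under simplifying assumptions — single-layer hypernetworks, no positional information, no multi-head split; all names (`hypermixer_token_mixing`, `a1`, `a2`) are hypothetical, not the SpeechBrain implementation.

```python
import numpy as np

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def hypermixer_token_mixing(x, a1, a2):
    """Sketch of HyperMixer token mixing (hypothetical names).

    x  : (n, d)        token sequence
    a1 : (d, d_hidden) hypernetwork weights generating W1 (assumed single layer)
    a2 : (d, d_hidden) hypernetwork weights generating W2

    Every product below is linear in n: O(n * d * d_hidden) total,
    versus O(n^2 * d) for self-attention.
    """
    w1 = x @ a1           # (n, d_hidden): input-dependent mixing weights
    w2 = x @ a2           # (n, d_hidden)
    h = gelu(w2.T @ x)    # (d_hidden, d): mixes information across the token axis
    return w1 @ h         # (n, d): project back to one output per token
```

Note that, unlike a fixed MLP-Mixer, `w1` and `w2` are functions of the input, so the block handles variable-length sequences naturally — a property that matters for speech.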
Results
- On par with Conformer on LibriSpeech (comparable WER, no regression)
- Better than Conformer in limited-data settings (Common Voice)
- Linear time complexity — meaningful wins for longer utterances
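The complexity claim above is easy to see with a back-of-the-envelope FLOP count. This sketch uses rough, hypothetical cost formulas (projection layers omitted on both sides), not measured numbers:

```python
def attention_flops(n, d):
    # Self-attention token mixing: QK^T and AV each cost ~n^2 * d multiply-adds
    return 2 * n * n * d

def hypermixer_flops(n, d, d_hidden):
    # Generating W1, W2 plus the two token-mixing products: all linear in n
    return 4 * n * d * d_hidden

# Doubling the sequence length quadruples attention cost
# but only doubles HyperMixer cost.
```

For a 30-second utterance at typical encoder frame rates, n is in the low thousands, which is where the quadratic term starts to dominate.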
Status
Merged into the SpeechBrain recipes; published at Interspeech 2023.