Why

Conformer has become the default ASR encoder, but its self-attention scales quadratically with sequence length — a real bottleneck for streaming and long-form recognition. HyperMixer is a linear-complexity alternative to attention that had been shown to work well on NLP tasks; we asked whether it generalizes to speech.

What we did

Replaced Conformer’s self-attention with a multi-head HyperMixer block, keeping the convolution module and macaron-style feed-forward layers unchanged. Trained on LibriSpeech and CommonVoice with SpeechBrain.
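The core idea of HyperMixer token mixing can be sketched in a few lines: a hypernetwork generates the token-mixing MLP weights from the tokens themselves, so every matrix product stays linear in sequence length. A minimal single-head numpy sketch, assuming a single linear map as the hypernetwork (the actual method uses small MLPs, positional embeddings, and multiple heads) and illustrative dimensions not taken from the paper:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16  # illustrative sizes, not the paper's

# Stand-in hypernetworks: one linear map per token. The real method uses
# small MLPs and adds positional embeddings to X first; this is a sketch.
Wh_in = 0.1 * rng.normal(size=(d_model, d_hidden))
Wh_out = 0.1 * rng.normal(size=(d_model, d_hidden))

def hypermixer_token_mixing(X):
    """X: (N, d_model) token features -> (N, d_model) mixed tokens."""
    W1 = X @ Wh_in   # (N, d_hidden): mixing weights generated from the tokens
    W2 = X @ Wh_out  # (N, d_hidden)
    # Token-mixing MLP with the generated weights. Each matmul costs
    # O(N * d_model * d_hidden) -- linear in sequence length N, unlike
    # self-attention's O(N^2 * d_model) score matrix.
    return W2 @ gelu(W1.T @ X)

X = rng.normal(size=(50, d_model))  # a 50-frame "utterance"
Y = hypermixer_token_mixing(X)
print(Y.shape)  # (50, 8)
```

In the model described above, this block simply takes the place of the multi-head self-attention inside each Conformer layer; everything else in the layer is left as-is.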

Results

  • On par with Conformer on LibriSpeech (comparable WER, no regression)
  • Better than Conformer in limited-data settings (CommonVoice)
  • Linear time complexity — meaningful wins for longer utterances

Status

Merged into the SpeechBrain recipes. Published at Interspeech 2023.