SUB-QUADRATIC TOKEN MIXING VIA SPECTRAL FILTERING AND POLYNOMIAL FUNCTIONAL CALCULUS
Keywords:
Polynomial Functional Calculus, Sub-Quadratic, State Space ModelsAbstract
The self-attention mechanism, the building block of the Transformer architecture, has a computational and memory complexity that grows quadratically with sequence lengths. This quadratic complexity makes it impractical for application on truly long-context problems. To overcome this challenge, we present the Spectral Filter Polynomial Calculus (SFPC) framework, a family of sub-quadratic mixing operators on tokens. SFPC takes a learned polynomial of an operator that is a function of a fixed base operator (e.g., discrete Laplacian or circulant operator with a rightward shift) and a polynomial of a given degree whose coefficients are learned. SFPC captures the token mixing problem by considering an application of an operator polynomial. We also formalized the token sequence space into a Hilbert space. We utilized the continuous functional calculus on self-adjoint operators. The key theoretical contribution is establishing a precise operator approximation error in terms of the familiar polynomial approximation error in the classical case. This straightforwardly connects expressiveness in deep models













