
Kolmogorov-Arnold Transformer (KAT): Is the MLP Headed for Retirement?

Exploring how the Kolmogorov-Arnold Transformer (KAT) challenges the MLP's dominance in modern deep learning

Salvatore Raieli
Published in Level Up Coding
7 min read · Sep 23, 2024


Kolmogorov-Arnold Transformer explained
image created by the author using AI

Wisdom and penetration are the fruit of experience, not the lessons of retirement and leisure. Great necessities call out great virtues. — Abigail Adams

Transformers are the most successful architecture in deep learning: they dominate text generation (and virtually every other NLP task) and are increasingly used in other fields such as computer vision. Despite this dominance, the transformer is not without flaws, which has motivated the search for new architectures. One of the biggest flaws is the quadratic cost of the attention mechanism, and much of today's effort goes into finding lighter alternatives.
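To see where that quadratic term comes from, here is a minimal single-head self-attention sketch in PyTorch (my own toy code, with identity Q/K/V projections for brevity, not anything from the paper): the score matrix has seq_len × seq_len entries, so doubling the sequence length quadruples its memory and compute.

```python
import torch

def naive_attention(x: torch.Tensor) -> torch.Tensor:
    """Single-head self-attention with the score matrix made explicit.

    x: (seq_len, d_model). The scores tensor is (seq_len, seq_len),
    which is exactly where the quadratic cost comes from.
    """
    d = x.shape[-1]
    q, k, v = x, x, x  # identity projections to keep the sketch minimal
    scores = q @ k.transpose(-2, -1) / d**0.5   # (seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

for n in (256, 512, 1024):
    out = naive_attention(torch.randn(n, 64))
    # the score matrix alone holds n * n floats: 4x more each time n doubles
    print(f"seq_len={n}: score matrix has {n * n:,} entries")
```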

The transformer consists not only of the attention mechanism but also of multi-layer perceptrons (MLPs). MLPs owe much of their popularity to the universal approximation theorem: in theory, given enough neurons, they can approximate any continuous function. In practice, however, MLPs have their own flaws:

  • An MLP struggles to approximate periodic functions.
  • In practice, MLPs converge slowly during training, especially on high-frequency functions (see the sketch after the figure below).

Kolmogorov-Arnold Transformer explained
image source: here
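As a toy illustration of the second point (my own sketch with arbitrary hyperparameters, not an experiment from the paper), here is the same small ReLU MLP trained with the same budget on a smooth target and on a high-frequency periodic one. The periodic target typically ends with a much higher error, the "spectral bias" effect: low-frequency components are learned first.

```python
import torch
import torch.nn as nn

def fit(y: torch.Tensor, steps: int = 3000) -> float:
    """Train a small ReLU MLP on (x, y) and return the final MSE."""
    torch.manual_seed(0)
    mlp = nn.Sequential(
        nn.Linear(1, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(mlp(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

x = torch.linspace(-1, 1, 512).unsqueeze(-1)
# Same training budget, two targets: a smooth function vs. a
# high-frequency periodic one. Spectral bias predicts the second
# converges far more slowly.
print("smooth target   y = x^2      -> mse", fit(x**2))
print("periodic target y = sin(20x) -> mse", fit(torch.sin(20 * x)))
```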

Despite these limitations, there have been few attempts to supplant MLPs, either in general or inside the transformer.
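The most prominent recent attempt, and the building block of KAT, is the Kolmogorov-Arnold Network (KAN), which puts learnable univariate functions on the edges of the network instead of fixed activations on the nodes. Purely as a rough sketch of that idea (with Gaussian radial basis functions standing in for the B-splines of the actual KAN paper, and not resembling the official implementation):

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Toy KAN-style layer: a learnable univariate function on every
    input-output edge, parameterized here as a weighted sum of fixed
    Gaussian radial basis functions (the KAN paper uses B-splines).
    """
    def __init__(self, in_dim: int, out_dim: int, n_basis: int = 8):
        super().__init__()
        # fixed basis centers on [-1, 1]; only the mixing weights are learned
        self.register_buffer("centers", torch.linspace(-1, 1, n_basis))
        self.weights = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis activations (batch, in_dim, n_basis)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.1)
        # sum the per-edge univariate functions into each output unit
        return torch.einsum("bik,oik->bo", phi, self.weights)

layer = ToyKANLayer(in_dim=4, out_dim=2)
print(layer(torch.randn(8, 4)).shape)  # torch.Size([8, 2])
```

Whereas an MLP learns only the linear weights between fixed nonlinearities, here the nonlinearities themselves are what gets learned.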
