
Beyond attention [& transformers]


Hyena is a convolutional layer for LLMs that can shrink the gap with attention, while scaling *subquadratically* in seq len (eg train a lot faster @ 64k + train 100k+ tokens!) 2/

blogs: https://t.co/DIeS1kfyte, https://t.co/FE8BgZYzTX
code: https://t.co/ss9n5bxtDP pic.twitter.com/4yCzbJWlLJ

— Michael Poli (@MichaelPoli6) March 7, 2023

Abstract of the paper linked above (truncated):

Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting ...
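To make the scaling claim concrete, here is a minimal sketch of the primitive such convolutional layers rest on: a long convolution over the whole sequence computed via FFT, which costs O(L log L) in sequence length L rather than attention's O(L²). This is an illustrative toy in PyTorch, not the authors' implementation; `FFTLongConv` is a hypothetical name, and the explicit per-channel filter below stands in for Hyena's implicitly parametrized long filters and omits its data-controlled gating.

```python
# Toy FFT-based long convolution: the subquadratic building block that
# layers like Hyena are built around. Illustrative sketch only, not the
# Hyena implementation (which generates filters implicitly and adds gating).

import torch
import torch.nn as nn


class FFTLongConv(nn.Module):
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        # One length-max_len filter per channel. This explicit parameter is
        # an assumption of the sketch; Hyena parametrizes filters implicitly.
        self.filter = nn.Parameter(torch.randn(d_model, max_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        L = x.size(1)
        # Zero-pad to length 2L so the circular FFT convolution becomes a
        # linear (causal) convolution over the sequence.
        x_f = torch.fft.rfft(x.transpose(1, 2), n=2 * L)   # (B, D, L + 1)
        k_f = torch.fft.rfft(self.filter[:, :L], n=2 * L)  # (D, L + 1)
        # Pointwise multiply in frequency space, then invert and truncate.
        y = torch.fft.irfft(x_f * k_f, n=2 * L)[..., :L]   # (B, D, L)
        return y.transpose(1, 2)


x = torch.randn(2, 4096, 64)       # a 4k-token batch
y = FFTLongConv(d_model=64, max_len=8192)(x)
print(y.shape)                     # torch.Size([2, 4096, 64])
```

Because the whole operation is two FFTs and a pointwise multiply, doubling the sequence length roughly doubles the cost (up to a log factor), which is what makes training at 64k and beyond tractable where quadratic attention is not.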













