
Beyond attention [& transformers]


Hyena is a convolutional layer for LLMs that can shrink the gap with attention, while scaling *subquadratically* in seq len (eg train a lot faster @ 64k + train 100k+ tokens!) 2/

blogs: https://t.co/DIeS1kfyte, https://t.co/FE8BgZYzTX
code: https://t.co/ss9n5bxtDP pic.twitter.com/4yCzbJWlLJ

— Michael Poli (@MichaelPoli6) March 7, 2023

Abstract of the paper linked above (truncated):

Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting ...
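To make the scaling claim concrete, here is a minimal sketch of the primitive such convolutional layers rest on: a long convolution over the whole sequence computed via FFT, which costs O(L log L) in sequence length L rather than attention's O(L²). This is an illustrative toy in PyTorch, not the authors' implementation; `FFTLongConv` is a hypothetical name, and the explicit per-channel filter below stands in for Hyena's implicitly parametrized long filters and omits its data-controlled gating.

```python
# Toy FFT-based long convolution: the subquadratic building block that
# layers like Hyena are built around. Illustrative sketch only, not the
# Hyena implementation (which generates filters implicitly and adds gating).

import torch
import torch.nn as nn


class FFTLongConv(nn.Module):
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        # One length-max_len filter per channel. This explicit parameter is
        # an assumption of the sketch; Hyena parametrizes filters implicitly.
        self.filter = nn.Parameter(torch.randn(d_model, max_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        L = x.size(1)
        # Zero-pad to length 2L so the circular FFT convolution becomes a
        # linear (causal) convolution over the sequence.
        x_f = torch.fft.rfft(x.transpose(1, 2), n=2 * L)   # (B, D, L + 1)
        k_f = torch.fft.rfft(self.filter[:, :L], n=2 * L)  # (D, L + 1)
        # Pointwise multiply in frequency space, then invert and truncate.
        y = torch.fft.irfft(x_f * k_f, n=2 * L)[..., :L]   # (B, D, L)
        return y.transpose(1, 2)


x = torch.randn(2, 4096, 64)       # a 4k-token batch
y = FFTLongConv(d_model=64, max_len=8192)(x)
print(y.shape)                     # torch.Size([2, 4096, 64])
```

Because the whole operation is two FFTs and a pointwise multiply, doubling the sequence length roughly doubles the cost (up to a log factor), which is what makes training at 64k and beyond tractable where quadratic attention is not.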













