Attention, Transformers, and LLMs: a hands-on introduction in PyTorch
In this 3-hour workshop, we will build a large language model from the ground up using PyTorch. The workshop focuses on the fundamentals of attention and the transformer architecture. We will assume a foundation in basic calculus, linear algebra, optimization, and Python programming. Prior experience with PyTorch will be helpful but is not strictly required.
Tentative agenda:
- Overview of the language modeling task
- Attention: Query, Key, and Value (sketched after this agenda)
- Self-Attention
- Positional Encodings
- The Transformer Architecture
- Autoregressive Models
- Fitting our LLM
- Hugging Face Transformers: a high-level API for LLMs
- Conclusions
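As a preview of the core building block on this agenda, here is a minimal sketch of scaled dot-product attention in PyTorch. The function name, tensor names, and dimensions are illustrative assumptions for this page, not the exact code we will write in the workshop.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(queries, keys, values):
    # queries, keys, values: (batch, seq_len, d_model)
    d_k = keys.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = queries @ keys.transpose(-2, -1) / d_k**0.5
    # Normalize scores into attention weights that sum to 1 over the keys
    weights = F.softmax(scores, dim=-1)
    # Each output is a weighted average of the value vectors
    return weights @ values

# Toy example: a batch of 2 sequences, 5 tokens each, 16-dimensional embeddings
x = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K, V from the same input
print(out.shape)  # torch.Size([2, 5, 16])
```

In the workshop we will build on this idea step by step, adding learned projections, positional encodings, and the rest of the transformer architecture.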
Live Workshop
No live sessions are currently planned for this workshop.