Mental models, paper breakdowns, and engineering notes — written for people who want to go deep.
A deep dive into how Transformers encode position — from the original sinusoidal scheme in “Attention Is All You Need” to modern alternatives like RoPE and ALiBi.