Veda Samhith

Breaking text down into tokens and understanding how tokenization works in language models.

Understanding token representations in large language models.

Alternatives to the traditional attention mechanism.

Understanding whether Transformers can recover token order when positional signals are removed.

A from-scratch walkthrough of training GPT-2 (124M) on FineWeb-Edu, with practical insights.