21 Parts · 34 Projects · End-to-End System

From Tokens to Agents

Rebuilding the Modern LLM Stack Through 34 Engineering Projects

A systematic journey through tokenization, embeddings, attention, transformers, training, inference, long-context systems, MoE architectures, post-training, serving, agents, multimodal AI, interpretability, and complete LLM systems.

21 Curriculum Parts · 34 Engineering Projects · 100M+ Parameters Trained · First Principles

Interactive Systems

Live visualizations of core LLM components. These are interactive demonstrations of the concepts explored in each project, not screenshots.

Tokenization Pipeline

Input string as continuous characters: "Transformers are powerful sequence models"

Attention Heatmap

Causal self-attention pattern visualization

Transformer Architecture

Interactive decoder block diagram

Multi-Head Attention

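Computes attention scores between pairs of positions; in a causal decoder, each position attends only to itself and earlier positions. Multiple heads run in parallel, and each head can specialize in a different aspect of the sequence (syntax, semantics, position).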

Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) V
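The formula above, combined with the causal pattern shown in the heatmap, can be sketched in a few lines of plain Python. This is a minimal single-head illustration, not the project's implementation: the function names `softmax` and `causal_attention` are made up for this example, and nested lists are used instead of tensors to keep it dependency-free.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(Q, K, V):
    """Single-head attention. Q, K: T vectors of size d_k; V: T vectors of size d_v."""
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        # Causal mask: position i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Pad with zeros so each row has length T, like a heatmap row.
        weights.append(w + [0.0] * (len(K) - i - 1))
        # Output is the attention-weighted sum of the visible value vectors.
        outputs.append([
            sum(w[j] * V[j][c] for j in range(i + 1))
            for c in range(len(V[0]))
        ])
    return outputs, weights
```

Each row of `weights` sums to 1 and is zero above the diagonal, which is the lower-triangular pattern a causal attention heatmap displays.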

The LLM Engineering Roadmap

Featured Engineering Case Studies

Deep dives into the most impactful projects, each representing a critical component of the modern LLM stack.

Research Interests

Areas I am actively exploring and contributing to through implementation, experimentation, and open-source work.

LLM Architecture

Designing more efficient, capable, and interpretable language model architectures. Exploring alternatives to the standard transformer stack including state-space models, linear attention, and mixture-of-experts systems.
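As one illustration of the mixture-of-experts idea, the sketch below routes a single token to its top-k experts and mixes their outputs by gate weight. It is a toy under stated assumptions: `top_k_route` and `moe_forward` are hypothetical names, experts are plain Python callables, and the gate is a softmax over only the selected logits, which is one common choice rather than the only one.

```python
import math

def top_k_route(router_logits, k=2):
    """Return [(expert_index, gate_weight), ...] for one token."""
    # Pick the k experts with the highest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the gates sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out
```

Only k experts run per token, so compute per token stays roughly constant while total parameter count grows with the number of experts.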

Key Papers

Attention Is All You Need
Mamba: Linear-Time Sequence Modeling
Mixtral of Experts

Efficient Inference

Optimizing LLM inference through algorithmic improvements (FlashAttention, speculative decoding), system optimizations (continuous batching, KV cache management), and hardware-aware design.
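Speculative decoding is the easiest of these to show in miniature. The sketch below covers the greedy variant only, with both models reduced to hypothetical callables (`draft_next`, `target_next`) that map a token list to the next token. A real system verifies all drafted positions in one batched forward pass of the target model and uses a probabilistic acceptance rule when sampling; neither is modeled here.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding with toy next-token functions."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target model checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix plus the target's
                # own token, and discard the rest of the draft.
                tokens.extend(proposal[:i] + [expected])
                break
        else:
            # Every drafted token matched the target's greedy choice.
            tokens.extend(proposal)
    return tokens
```

Because a drafted token is kept only when it equals the target's greedy choice, the output is identical to decoding with the target alone; the speedup comes from accepting several tokens per verification step when the draft is accurate.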

Key Papers

FlashAttention-2
Speculative Decoding
vLLM: Easy, Fast, and Cheap LLM Serving

Model Compression

Reducing model size and inference cost through quantization, pruning, knowledge distillation, and architecture search. Focus on maintaining quality while cutting memory footprint and latency.
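For a concrete sense of what quantization does to a weight vector, here is a minimal sketch of symmetric round-to-nearest quantization with one scale per tensor. This is the simple baseline, not GPTQ or AWQ, which add error compensation and activation-aware scaling on top; the function names are illustrative.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] with a single scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]
```

The round-trip error for each weight is bounded by half the scale, so a single large outlier inflates the error for every other weight in the tensor. That is one reason practical schemes quantize per channel or per group instead of per tensor.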

Key Papers

GPTQ: Accurate Post-Training Quantization
AWQ: Activation-aware Weight Quantization
The Case for 4-bit Precision

Long Context Systems

Extending transformer context windows to millions of tokens through architectural innovations, memory systems, and position encoding advances. Applications in document analysis, code understanding, and multi-turn conversation.
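On the position-encoding side, many context-extension methods start from rotary position embeddings (RoPE). The sketch below is a minimal, assumption-laden version: `rope` is a hypothetical helper, and the `scale` parameter implements plain linear position interpolation, which is simpler than the frequency-dependent scaling YaRN uses.

```python
import math

def rope(vec, pos, base=10000.0, scale=1.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    `vec` must have even length. `scale` > 1 compresses positions
    (linear position interpolation) so that longer sequences reuse
    the angle range seen during training.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        # Pair i/2 rotates with frequency base^(-i/d).
        theta = (pos / scale) * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out
```

The useful property is that the dot product of a rotated query and a rotated key depends only on the distance between their positions, which is what allows positions to be rescaled or extrapolated after training.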

Key Papers

Longformer
Ring Attention
YaRN: Efficient Context Window Extension

Agentic AI

Building autonomous systems that can plan, reason, use tools, and interact with environments. Focus on reliability, safety, and capability in multi-step reasoning tasks.
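The plan, act, observe cycle behind tool-using agents can be sketched as a small loop. Everything here is illustrative: `run_agent` is a hypothetical name, `policy` stands in for the language model and returns an `(action, argument)` pair, and `tools` is a plain dictionary of Python callables. The step limit and error handling reflect the reliability concerns mentioned above.

```python
def run_agent(policy, tools, question, max_steps=5):
    """Alternate between choosing an action and observing its result."""
    transcript = [("question", question)]
    for _ in range(max_steps):
        action, arg = policy(transcript)
        if action == "finish":
            return arg, transcript
        if action not in tools:
            observation = f"error: unknown tool {action!r}"
        else:
            try:
                observation = tools[action](arg)
            except Exception as exc:
                # Feed failures back to the policy instead of crashing.
                observation = f"error: {exc}"
        transcript.append((action, arg))
        transcript.append(("observation", observation))
    # Step budget exhausted without a final answer.
    return None, transcript
```

A scripted policy is enough to exercise the loop: for example, one that calls an `add` tool once and then returns `("finish", last_observation)`.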

Key Papers

ReAct: Synergizing Reasoning and Acting
Reflexion: Self-Reflective Agents
Toolformer

Multimodal Models

Extending language models to understand and generate across vision, audio, and other modalities. Focus on efficient alignment and unified representation learning.

Key Papers

CLIP
LLaVA
Flamingo

Interpretability

Understanding the internal mechanisms of language models through mechanistic interpretability, feature visualization, and circuit tracing. Goal: making AI systems understandable and auditable.

Key Papers

A Mathematical Framework for Transformer Circuits
Sparse Autoencoders Find Highly Interpretable Features
Mechanistic Interpretability

Open Source AI

Contributing to open, reproducible, and accessible AI research. Building tools, datasets, and models that democratize access to state-of-the-art AI capabilities.

Key Papers

The Pile
OpenLM
OLMo