
So You Want to Learn LLMs? Here's the Roadmap
A Real-World, No-Bloat Guide to Building, Training, and Shipping LLMs

This blogpost was published on my X/Twitter account on June 23rd, 2025.
Welcome to the “how do I actually learn how LLMs work” guide. If you’ve got a CS background and you’re tired of the endless machine learning prerequisites, this is for you. I built this with past me in mind; I wish I’d had it all drawn out like this. By the end of the roadmap, you should be comfortable building, training, exploring, and researching LLMs.
The links at the end let you go as deep as you want. If you’re stuck, rewatch or reread. If you already know something, skip ahead. The phases are your guardrails, not handcuffs. By the end, you’ll have actually built the skills. Every resource, every project, every link is there for a reason. Use it, adapt it, and make it your own. I hope you don’t just use this as a collection of bookmarks.
Remember, you can always use DeepResearch when you’re stuck, need something broken down to first principles, want material tailored to your level, need to identify gaps, or just want to explore deeper.
This is blogpost #4 in my 101 Days of Blogging. If it sparks anything, whether ideas, questions, or critique, my DMs are open. Hope it gives you something useful to walk away with.
TL;DR – What Are We Doing?
The short version:
- 5 phases.
- No detours into generic ML unless it’s absolutely necessary.
- Focused on the fundamentals that get you comfortable building, fine-tuning, and shipping LLMs.
You will:
- Build an autograd engine by hand
- Build a mini-GPT from scratch
- Fine-tune a model using PEFT methods like LoRA/QLoRA
How This Works
The approach here is simple.
Learn by Layering: Build Intuition ➡️ Strengthen Theory ➡️ More Hands-on ➡️ Paper Deep Dives ➡️ Build Something Real.
You’re going to use four kinds of resources:
- Visual Intuition (3Blue1Brown, Karpathy) – really get the why and the how.
- Formal Theory (Stanford/MIT lectures, open courseware) – unfortunately, sometimes you do need the math.
- Papers (“Attention Is All You Need”, BERT, LoRA, etc.) – get used to reading papers.
- Coding Projects.
The Roadmap Overview section gives you the conceptual big picture: it tells you what you’ll need to understand, at a high level. After that, the How To Actually Learn section breaks those concepts down into actual learning phases: what to study, how to build intuition, which projects to complete, and in what order. Finally, the Where To Learn Them section links out to the exact videos, lectures, papers, and codebases that’ll help you execute this roadmap. So: concepts first, then the breakdown, then the tools to go do it.
Roadmap Overview & Topics
Foundations Refresher
- Linear Algebra and Probability that actually matter for DL
- Python/PyTorch for the dirty work
- Project: Build Micrograd. Afterwards you’ll build an MLP and train it
Transformers
- Tokenization, embeddings, self-attention, all the block diagram stuff
- Pre-training paradigms: BERT/MLM vs GPT/CLM, and the why, how, and when
- Project: Build a working mini-GPT from scratch
Scaling and Training
- How “scaling laws” actually predict performance (math)
- Distributed training: Data, Tensor, Pipeline parallelism
- Project: Spin up multi-GPU training with HuggingFace Accelerate. Make it run, see why things break, fix it
Alignment + Fine-Tuning
- RLHF/Constitutional AI
- LoRA/QLoRA: parameter-efficient fine-tuning
- Project: Implement LoRA from scratch. Plug it into a HuggingFace model and actually fine-tune on a use case
Inference Optimizations
- Inference optimization: FlashAttention, quantization, getting sub-second responses
How To Actually Learn (The Real Plan)
Phase 0: Foundations Refresher
You do not need a PhD in math to understand LLMs. But if you can’t follow a simple PyTorch training loop, or you have zero intuition for matrix multiplication, things will seem very confusing (they really aren’t once you get your head around them).
- Linear Algebra/Probability: 3Blue1Brown’s videos. It’s helpful to be able to “see” a matrix transform; rewatch if needed.
- Formal theory: MIT 18.06 Linear Algebra (Strang, of course).
- Coding: Karpathy’s Micrograd series. In my experience, the only “autograd engine from scratch” tutorial that isn’t boring.
- PyTorch: Do the official basics, but spend most of your time translating math into code.
- Mini-project: Build Micrograd. Build and train a basic MLP on MNIST. No shortcuts. (A minimal autograd sketch follows this list.)
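To make the Micrograd project concrete, here is a minimal sketch of the kind of scalar autograd `Value` class you’ll end up with. This is not Karpathy’s exact code, just an illustration of the core idea: every operation records its inputs and a local backward rule, and `backward()` walks the graph in reverse topological order applying the chain rule.

```python
# A minimal scalar autograd sketch (micrograd-style); illustration only.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None      # local backward rule, set by each op
        self._prev = set(_children)        # parents in the computation graph

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(out)/d(self) = 1
            other.grad += out.grad         # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Usage: gradients of loss = a*b + a with respect to a and b.
a, b = Value(2.0), Value(-3.0)
loss = a * b + a
loss.backward()
print(a.grad, b.grad)  # -2.0, 2.0
```

Once this clicks, a full MLP is just these same `Value` objects composed into layers, with a loss at the end and `backward()` driving the weight updates.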
Phase 1: Transformers
I have this meme about how the words are the scariest part of LLMs. “Transformer” is the very first word that should make your brain think “easy” when you hear it. Transformers are just stacks of matrix multiplications and attention blocks, with some really clever engineering.
- Intuition: 3Blue1Brown on Transformers/Attention. Jay Alammar’s Illustrated Transformer. Watch, take notes, and re-watch if you need to.
- Formal: Stanford CS224N Natural Language Processing with Deep Learning (the lectures, not just the slides).
- Paper: “Attention Is All You Need”. Don’t read it yet if you haven’t built the mental model above. Otherwise, you’ll drown. READ ONLY ONCE COMFORTABLE WITH ALL THE ABOVE.
- Hands-on: Karpathy’s “Let’s Build GPT” (eureka moment, you’ll realize how simple all of it is).
- Project: Reimplement a decoder-only GPT from scratch. Bonus points: swap in your own tokenizer, try BPE/SentencePiece. (See the attention sketch after this list for the core building block.)
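As a reference point for the mini-GPT project, here is a minimal single-head causal self-attention module, roughly the shape of what you build in “Let’s Build GPT”. The names and sizes are illustrative, not tied to any particular codebase.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """One attention head with a causal mask: each position only attends to the past."""
    def __init__(self, n_embd: int, head_size: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so token t cannot see tokens > t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                       # x: (batch, time, n_embd)
        B, T, _ = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled dot-product attention scores: (B, T, T)
        att = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                          # weighted sum of values: (B, T, head_size)

# Quick shape check with made-up sizes.
x = torch.randn(2, 8, 32)                       # batch=2, seq_len=8, n_embd=32
head = CausalSelfAttention(n_embd=32, head_size=16, block_size=8)
print(head(x).shape)                            # torch.Size([2, 8, 16])
```

A full GPT block is this (multi-headed), plus a feed-forward layer, residual connections, and layer norm, stacked N times.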
Phase 2: Scaling Laws & Training for Scale
LLMs got good by figuring out what to scale, how to scale it, and then proving that it actually works.
- Papers: “Scaling Laws for Neural Language Models” (Kaplan et al.), then “Chinchilla” (Hoffmann et al.). Learn the difference; the formula sketch after this list shows the kind of fit both papers are making.
- Distributed Training: Learn what Data, Tensor, and Pipeline Parallelism actually do. Then set up multi-GPU training with HuggingFace Accelerate. Yes, you’ll hate CUDA at some point. Such is life.
- Project: Pick a model, run a small distributed job. Play with batch sizes, gradient accumulation. Notice how easy it is to run out of VRAM? Good. Welcome to my world. (A minimal Accelerate loop follows below.)
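For orientation on the “math” part: both papers fit loss as a power law in model size and data. The Chinchilla paper, for example, fits a parametric form along these lines (symbols as in Hoffmann et al.: N is parameter count, D is training tokens, and E, A, B, α, β are fitted constants):

$$
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

The practical takeaway: for a fixed compute budget, loss drops fastest when you grow parameters and training tokens together, not parameters alone.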
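And for the distributed-training project, the core of a HuggingFace Accelerate loop is small. A hedged sketch follows; the model, dataloader, and hyperparameters are placeholders you’d swap for your own, and it assumes an HF-style model whose forward pass returns a `.loss`.

```python
import torch
from accelerate import Accelerator

def train(model, train_loader, epochs=1, lr=3e-4, accum_steps=4):
    # gradient_accumulation_steps lets you simulate a larger batch when VRAM is tight.
    accelerator = Accelerator(gradient_accumulation_steps=accum_steps)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # prepare() wraps everything for whatever setup you launched with: 1 GPU, DDP, etc.
    model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            with accelerator.accumulate(model):
                loss = model(**batch).loss      # assumes an HF-style model returning .loss
                accelerator.backward(loss)      # replaces loss.backward()
                optimizer.step()
                optimizer.zero_grad()
    return model

# Launch with: accelerate launch train.py   (after running `accelerate config` once)
```

The same script runs on one GPU or eight; Accelerate decides how to shard the data and sync gradients based on how you launch it.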
Phase 3: Alignment & PEFT
Fine-tuning is not just a cheap trick. RLHF and PEFT are the reason you can actually use LLMs for real-world use cases.
- RLHF: OpenAI’s “Aligning language models to follow instructions” blog post, then Ouyang et al.’s paper. Grasp the SFT ➡️ Reward Model ➡️ RL pipeline (the reward-model loss below is the one equation worth internalizing). Don’t get lost in the PPO math too much.
- CAI/RLAIF: Read Anthropic’s “Constitutional AI”.
- LoRA/QLoRA: Read both papers, then actually implement LoRA in PyTorch. If you can’t replace a Linear layer with a LoRA-adapted version (see the sketch after this list), try again.
- Project: Fine-tune an open model (e.g. gpt2, distilbert) with your own LoRA adapters. Do it for a real dataset, not toy text.
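If you want one equation to anchor the RLHF reading, it’s the pairwise reward-model loss from the InstructGPT paper: given a prompt x with a preferred completion y_w and a rejected one y_l, the reward model r_θ is trained to rank them correctly:

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
$$

Everything else in the pipeline is plumbing around producing those preference pairs and then optimizing the policy against r_θ.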
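For the “replace a Linear layer” test, here is a minimal LoRA-style wrapper to aim for. It’s a sketch of the idea from the paper (frozen base weight W plus a trainable low-rank update BA scaled by α/r), not the PEFT library’s implementation; names like `LoRALinear` are mine.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)              # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Swap it in for any nn.Linear you want to adapt; only A and B get gradients.
layer = nn.Linear(768, 768)
lora_layer = LoRALinear(layer, r=8, alpha=16)
out = lora_layer(torch.randn(2, 10, 768))
trainable = sum(p.numel() for p in lora_layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # torch.Size([2, 10, 768]) 12288
```

Once this works, the fine-tuning project is mostly about walking a HuggingFace model, swapping its attention projections for wrapped versions, and training only the LoRA parameters.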
Phase 4: Production
You made it to the only part that most people ever see: the actual app.
- Inference Optimization: Read the FlashAttention paper. Understand why it works, then try it with a quantized model (a sketch follows below).
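A hedged sketch of what “try it with a quantized model” can look like with the HuggingFace stack. The exact flags and argument names depend on your transformers/bitsandbytes versions and on having a compatible CUDA GPU, so treat this as a starting point, not gospel; `gpt2` is just a placeholder model id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "gpt2"  # placeholder; swap in the model you actually care about

# 4-bit weight quantization via bitsandbytes (needs a CUDA GPU).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # attn_implementation="flash_attention_2",  # enable if flash-attn is installed and the model supports it
    device_map="auto",
)

inputs = tokenizer("Scaling laws tell us", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Benchmark latency and memory with and without quantization (and with FlashAttention on and off) so you can see exactly where the speedups come from.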
Where To Learn Them
Below is what to read/watch for this learning plan.
Math/CS Pre-Reqs
- 3Blue1Brown: Essence of Linear Algebra (YouTube)
- MIT 18.06: Linear Algebra (Strang, OCW)
- Deep Learning Book (Goodfellow)
PyTorch Fundamentals
Transformers & LLMs
- Attention Is All You Need (Vaswani et al.)
- 3Blue1Brown: What is a GPT? (YouTube)
- Jay Alammar: The Illustrated Transformer
- Karpathy: Let’s Build GPT
- Stanford CS224N (YouTube Lectures)
Scaling & Distributed Training
Alignment & PEFT
- OpenAI: Aligning LMs to Follow Instructions
- Anthropic: Constitutional AI
- LoRA: Low-Rank Adaptation
- QLoRA
- LightningAI: LoRA from Scratch
Inference
The Endgame
If you actually do the roadmap above, build the projects, and push past the YouTube tutorial hell, you’ll understand LLMs extremely well. You’ll see through the hype, spot nonsense at a glance, and build your own models and tooling.
If you make it through this plan and actually ship something, DM me, I wanna see it.
Happy hacking.