
So You Want to Learn LLMs? Here's the Roadmap
A Real-World, No-Bloat Guide to Building, Training, and Shipping LLMs

This blogpost was published on my X/Twitter account on June 23rd, 2025.
Welcome to the “how do I actually learn how LLMs work” guide. If you’ve got a CS background and you’re tired of the endless machine learning prerequisites, this is for you. I built this with past me in mind; I wish I’d had it all drawn out like this. By the end of the roadmap, you should be comfortable building, training, exploring, and researching LLMs.
The links at the end let you go as deep as you want. If you’re stuck, rewatch or reread. If you already know something, skip ahead. The phases are your guardrails, not handcuffs. By the end, you’ll have actually built the skills. Every resource, every project, every link is there for a reason. Use it, adapt it, and make it your own. I hope you don’t just use this as a collection of bookmarks.
Remember, you can always use DeepResearch when you’re stuck, need something broken down to first principles, want material tailored to your level, need to identify gaps, or just want to explore deeper.
This is blogpost #4 in my 101 Days of Blogging. If it sparks anything, whether ideas, questions, or critique, my DMs are open. Hope it gives you something useful to walk away with.
TL;DR – What Are We Doing?
The short version:
- 5 phases.
- No detours into generic ML unless it’s absolutely necessary.
- Focused on the fundamentals that get you comfortable building, fine-tuning, and shipping LLMs.
You will:
- Build an autograd engine by hand
- Build a mini-GPT from scratch
- Fine-tune a model using PEFT methods like LoRA/QLoRA
How This Works
The approach here is simple.
Learn by Layering: Build Intuition ➡️ Strengthen Theory ➡️ More Hands-on ➡️ Paper Deep Dives ➡️ Build Something Real.
You’re going to use four kinds of resources:
- Visual Intuition (3Blue1Brown, Karpathy) – really get the why and the how.
- Formal Theory (Stanford/MIT lectures, open courseware) – unfortunately, sometimes you do need the math.
- Papers (“Attention Is All You Need”, BERT, LoRA, etc.) – get used to reading papers.
- Coding Projects.
The Roadmap Overview section gives you the conceptual big picture: it tells you what you’ll need to understand, at a high level. After that, the How To Actually Learn section breaks those concepts down into actual learning phases: what to study, how to build intuition, which projects to complete, and in what order. Finally, the Where To Learn Them section links out to the exact videos, lectures, papers, and codebases that’ll help you execute this roadmap. So: concepts first, then the breakdown, then the tools to go do it.
Roadmap Overview & Topics
Foundations Refresher
- Linear Algebra and Probability that actually matter for DL
- Python/PyTorch for the dirty work
- Project: Build Micrograd. Afterwards you’ll build an MLP and train it
Transformers
- Tokenization, embeddings, self-attention, all the block diagram stuff
- Pre-training paradigms: BERT/MLM vs GPT/CLM, and the why, how, and when
- Project: Build a working mini-GPT from scratch
Scaling and Training
- How “scaling laws” actually predict performance (math)
- Distributed training: Data, Tensor, Pipeline parallelism
- Project: Spin up multi-GPU training with HuggingFace Accelerate. Make it run, see why things break, fix it
Alignment + Fine-Tuning
- RLHF/Constitutional AI
- LoRA/QLoRA: parameter-efficient fine-tuning
- Project: Implement LoRA from scratch. Plug it into a HuggingFace model and actually fine-tune on a use case
Inference Optimizations
- Inference optimization: FlashAttention, quantization, getting sub-second responses
How To Actually Learn (The Real Plan)
Phase 0: Foundations Refresher
You do not need a PhD in math to understand LLMs. But if you can’t follow a simple PyTorch training loop, or you have zero intuition for matrix multiplication, things will seem very confusing (they really aren’t once you get your head around them).
- Linear Algebra/Probability: 3Blue1Brown’s videos. It’s helpful to be able to “see” a matrix transform; rewatch if needed.
- Formal theory: MIT 18.06 Linear Algebra (Strang, of course).
- Coding: Karpathy’s Micrograd series. In my experience, the only “autograd engine from scratch” tutorial that isn’t boring.
- PyTorch: Do the official basics, but spend most of your time translating math into code.
- Mini-project: Build Micrograd. Build and train a basic MLP on MNIST. No shortcuts. (A minimal autograd sketch follows this list.)
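To make the Micrograd project concrete, here is a minimal sketch of the kind of scalar autograd `Value` class you’ll end up with. This is not Karpathy’s exact code, just an illustration of the core idea: every operation records its inputs and a local backward rule, and `backward()` walks the graph in reverse topological order applying the chain rule.

```python
# A minimal scalar autograd sketch (micrograd-style); illustration only.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None      # local backward rule, set by each op
        self._prev = set(_children)        # parents in the computation graph

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(out)/d(self) = 1
            other.grad += out.grad         # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Usage: gradients of loss = a*b + a with respect to a and b.
a, b = Value(2.0), Value(-3.0)
loss = a * b + a
loss.backward()
print(a.grad, b.grad)  # -2.0, 2.0
```

Once this clicks, a full MLP is just these same `Value` objects composed into layers, with a loss at the end and `backward()` driving the weight updates.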
Phase 1: Transformers
I have this meme about how the words are the scariest part of LLMs. “Transformer” is the very first word that should make your brain think “easy” when you hear it. Transformers are just stacks of matrix multiplications and attention blocks, with some really clever engineering.
- Intuition: 3Blue1Brown on Transformers/Attention. Jay Alammar’s Illustrated Transformer. Watch, take notes, and re-watch if you need to.
- Formal: Stanford CS224N Natural Language Processing with Deep Learning (the lectures, not just the slides).
- Paper: “Attention Is All You Need”. Don’t read it yet if you haven’t built the mental model above. Otherwise, you’ll drown. READ ONLY ONCE COMFORTABLE WITH ALL THE ABOVE.
- Hands-on: Karpathy’s “Let’s Build GPT” (eureka moment, you’ll realize how simple all of it is).
- Project: Reimplement a decoder-only GPT from scratch. Bonus points: swap in your own tokenizer, try BPE/SentencePiece. (See the attention sketch after this list for the core building block.)
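As a reference point for the mini-GPT project, here is a minimal single-head causal self-attention module, roughly the shape of what you build in “Let’s Build GPT”. The names and sizes are illustrative, not tied to any particular codebase.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """One attention head with a causal mask: each position only attends to the past."""
    def __init__(self, n_embd: int, head_size: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so token t cannot see tokens > t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                       # x: (batch, time, n_embd)
        B, T, _ = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled dot-product attention scores: (B, T, T)
        att = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                          # weighted sum of values: (B, T, head_size)

# Quick shape check with made-up sizes.
x = torch.randn(2, 8, 32)                       # batch=2, seq_len=8, n_embd=32
head = CausalSelfAttention(n_embd=32, head_size=16, block_size=8)
print(head(x).shape)                            # torch.Size([2, 8, 16])
```

A full GPT block is this (multi-headed), plus a feed-forward layer, residual connections, and layer norm, stacked N times.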
Phase 2: Scaling Laws & Training for Scale
LLMs got good by figuring out what to scale, how to scale it, and then proving that it actually works.
- Papers: “Scaling Laws for Neural Language Models” (Kaplan et al.), then “Chinchilla” (Hoffmann et al.). Learn the difference; the formula sketch after this list shows the kind of fit both papers are making.
- Distributed Training: Learn what Data, Tensor, and Pipeline Parallelism actually do. Then set up multi-GPU training with HuggingFace Accelerate. Yes, you’ll hate CUDA at some point. Such is life.
- Project: Pick a model, run a small distributed job. Play with batch sizes, gradient accumulation. Notice how easy it is to run out of VRAM? Good. Welcome to my world. (A minimal Accelerate loop follows below.)
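For orientation on the “math” part: both papers fit loss as a power law in model size and data. The Chinchilla paper, for example, fits a parametric form along these lines (symbols as in Hoffmann et al.: N is parameter count, D is training tokens, and E, A, B, α, β are fitted constants):

$$
L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

The practical takeaway: for a fixed compute budget, loss drops fastest when you grow parameters and training tokens together, not parameters alone.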
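And for the distributed-training project, the core of a HuggingFace Accelerate loop is small. A hedged sketch follows; the model, dataloader, and hyperparameters are placeholders you’d swap for your own, and it assumes an HF-style model whose forward pass returns a `.loss`.

```python
import torch
from accelerate import Accelerator

def train(model, train_loader, epochs=1, lr=3e-4, accum_steps=4):
    # gradient_accumulation_steps lets you simulate a larger batch when VRAM is tight.
    accelerator = Accelerator(gradient_accumulation_steps=accum_steps)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    # prepare() wraps everything for whatever setup you launched with: 1 GPU, DDP, etc.
    model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            with accelerator.accumulate(model):
                loss = model(**batch).loss      # assumes an HF-style model returning .loss
                accelerator.backward(loss)      # replaces loss.backward()
                optimizer.step()
                optimizer.zero_grad()
    return model

# Launch with: accelerate launch train.py   (after running `accelerate config` once)
```

The same script runs on one GPU or eight; Accelerate decides how to shard the data and sync gradients based on how you launch it.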
Phase 3: Alignment & PEFT
Fine-tuning is not just a cheap trick. RLHF and PEFT are the reason you can actually use LLMs for real-world use cases.
- RLHF: OpenAI’s “Aligning language models to follow instructions” blog post, then Ouyang et al.’s paper. Grasp the SFT ➡️ Reward Model ➡️ RL pipeline (the reward-model loss below is the one equation worth internalizing). Don’t get lost in the PPO math too much.
- CAI/RLAIF: Read Anthropic’s “Constitutional AI”.
- LoRA/QLoRA: Read both papers, then actually implement LoRA in PyTorch. If you can’t replace a Linear layer with a LoRA-adapted version (see the sketch after this list), try again.
- Project: Fine-tune an open model (e.g. gpt2, distilbert) with your own LoRA adapters. Do it for a real dataset, not toy text.
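If you want one equation to anchor the RLHF reading, it’s the pairwise reward-model loss from the InstructGPT paper: given a prompt x with a preferred completion y_w and a rejected one y_l, the reward model r_θ is trained to rank them correctly:

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
$$

Everything else in the pipeline is plumbing around producing those preference pairs and then optimizing the policy against r_θ.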
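For the “replace a Linear layer” test, here is a minimal LoRA-style wrapper to aim for. It’s a sketch of the idea from the paper (frozen base weight W plus a trainable low-rank update BA scaled by α/r), not the PEFT library’s implementation; names like `LoRALinear` are mine.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)              # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Swap it in for any nn.Linear you want to adapt; only A and B get gradients.
layer = nn.Linear(768, 768)
lora_layer = LoRALinear(layer, r=8, alpha=16)
out = lora_layer(torch.randn(2, 10, 768))
trainable = sum(p.numel() for p in lora_layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # torch.Size([2, 10, 768]) 12288
```

Once this works, the fine-tuning project is mostly about walking a HuggingFace model, swapping its attention projections for wrapped versions, and training only the LoRA parameters.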
Phase 4: Production
You made it to the only part that most people ever see: the actual app.
- Inference Optimization: Read the FlashAttention paper. Understand why it works, then try it with a quantized model (a sketch follows below).
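A hedged sketch of what “try it with a quantized model” can look like with the HuggingFace stack. The exact flags and argument names depend on your transformers/bitsandbytes versions and on having a compatible CUDA GPU, so treat this as a starting point, not gospel; `gpt2` is just a placeholder model id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "gpt2"  # placeholder; swap in the model you actually care about

# 4-bit weight quantization via bitsandbytes (needs a CUDA GPU).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # attn_implementation="flash_attention_2",  # enable if flash-attn is installed and the model supports it
    device_map="auto",
)

inputs = tokenizer("Scaling laws tell us", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Benchmark latency and memory with and without quantization (and with FlashAttention on and off) so you can see exactly where the speedups come from.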
Where To Learn Them
Below is what to read/watch for this learning plan.
Math/CS Pre-Reqs
- 3Blue1Brown: Essence of Linear Algebra (YouTube)
- MIT 18.06: Linear Algebra (Strang, OCW)
- Deep Learning Book (Goodfellow)
PyTorch Fundamentals
Transformers & LLMs
- Attention Is All You Need (Vaswani et al.)
- 3Blue1Brown: What is a GPT? (YouTube)
- Jay Alammar: The Illustrated Transformer
- Karpathy: Let’s Build GPT
- Stanford CS224N (YouTube Lectures)
Scaling & Distributed Training
Alignment & PEFT
- OpenAI: Aligning LMs to Follow Instructions
- Anthropic: Constitutional AI
- LoRA: Low-Rank Adaptation
- QLoRA
- LightningAI: LoRA from Scratch
Inference
The Endgame
If you actually do the roadmap above, build the projects, and push past the YouTube tutorial hell, you’ll understand LLMs extremely well. You’ll see through the hype, spot nonsense at a glance, and build your own models and tooling.
If you make it through this plan and actually ship something, DM me, I wanna see it.
Happy hacking.