My Lessons with Andrej
A PM's experiments with machine learning - from the ground up
I'm working through Andrej Karpathy's Neural Networks: Zero to Hero series.
This is my learning journal with all my mistakes, frustrations and learnings.
Code and notes at my GitHub repo.
Progress so far
- ✅ micrograd
- ✅ makemore
- ⬜ makemore Part 2: MLP
- ⬜ makemore Part 3: BatchNorm
- ⬜ makemore Part 4: Backprop
- ⬜ makemore Part 5: WaveNet
- ⬜ GPT Implementation
- ⬜ Tokenizer Implementation
Video 1: Micrograd
Multiple false starts, forgotten and picked back up again.
I had built out a neural net from scratch using raw Python before, so this was fairly straightforward. I knew quite well how a neuron functioned.
But this was also my first use of PyTorch. It took me some time to get used to zips. Also weirdly, and kinda sheepishly → it took me a while to grok the neuron → layer → multi-layer perceptron code. I wanted to make sure I got it, so I ran through it a couple of times.
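For my own reference, this is roughly the shape of that nesting — a stripped-down sketch with plain floats instead of micrograd's autograd Value objects (so no gradients here, just the structure):

```python
import random, math

# Simplified sketch of the neuron -> layer -> MLP nesting (plain floats,
# no autograd). The real micrograd version wraps every number in a Value
# object so gradients can flow back through the whole graph.
class Neuron:
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]
        self.b = random.uniform(-1, 1)

    def __call__(self, x):
        act = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return math.tanh(act)

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs

class MLP:
    def __init__(self, nin, nouts):            # e.g. MLP(3, [4, 4, 1])
        sizes = [nin] + nouts
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

net = MLP(3, [4, 4, 1])
print(net([2.0, 3.0, -1.0]))   # a single scalar output
```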
LLMs weren't too useful for me here, at least out of the box. They were good at answering questions and explaining, but they tended to give me an answer immediately. I found this prompt from Dwarkesh that talked about the Socratic method, and that was slightly better, though it wasn't perfect. It still seemed to focus the Socratic method of questioning on things I was already grokking, not the stuff I needed help with. But that might have been a skill issue.
Generally speaking though, it is absolutely amazing that a person like Karpathy shares something like this: simple, straightforward and detailed, an introduction for anyone looking to learn. A massive, pure-hearted win for the internet as a whole. Information does want to be free!
While trying to minimize the loss, I ended up with a stupid error in the code that took me a bunch of time to track down: the loss wasn't backpropagating correctly.
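Whatever my specific mistake was, the useful takeaway was re-internalising what every training iteration has to do. A bare-bones sketch (PyTorch here rather than micrograd, with made-up toy data), with the step I now always double-check called out:

```python
import torch

# Minimal gradient-descent loop on toy data - a reminder of the steps
# that have to happen every iteration for the loss to keep going down.
xs = torch.tensor([[2.0, 3.0, -1.0], [3.0, -1.0, 0.5], [0.5, 1.0, 1.0]])
ys = torch.tensor([1.0, -1.0, -1.0])

w = torch.randn(3, requires_grad=True)
b = torch.randn(1, requires_grad=True)

for step in range(50):
    ypred = torch.tanh(xs @ w + b)          # forward pass
    loss = ((ypred - ys) ** 2).mean()       # mean squared error

    # zero stale gradients BEFORE backprop - forgetting this is a classic
    # way for the loss to stop improving
    for p in (w, b):
        if p.grad is not None:
            p.grad.zero_()

    loss.backward()                         # backward pass

    with torch.no_grad():                   # gradient descent update
        for p in (w, b):
            p -= 0.1 * p.grad

print(loss.item())
```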
Notes to self:
- Try out DSPy
- Target: finetune my own LLM and write a blog post on that
Always fun to spend the time thinking through every piece I'm writing out, create a tiny network, and watch it become a dutiful little overachiever that isn't good at much outside the tiny specialisation it spent its whole life optimising for. David Foster Wallace would be proud.
Video 2: Makemore
Makemore is about having a list of things, and us trying to make more like them. The goal is, given a list of around 35k names (I mixed some Indian ones in with the Western ones), can we build something that generates a realistic sounding fake name? After this session, I have a newfound respect for all those fantasy name generator websites I frequented when I was 10. It is hard to create good, fake names.
This video took me the better part of a day to work through. Bigrams are conceptually simple, but I decided to do them by myself first, before starting the video. This led to a bunch of wrestling with dictionaries and list manipulation in Python. Given the "handwritten code, read docs and understand" limit I'd set for myself, it was more frictional than I've been used to for the last year and a half. Felt good though.
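For posterity, my dictionary-wrangling attempt boiled down to something like this (toy name list here; the real set had ~35k names):

```python
# Count bigrams with a plain dict, then sample the next character in
# proportion to the counts. Placeholder names; the real list was ~35k.
import random
from collections import defaultdict

names = ["emma", "olivia", "arjun", "priya", "ava"]

counts = defaultdict(int)
for name in names:
    chars = ["."] + list(name) + ["."]      # '.' marks start/end of a name
    for ch1, ch2 in zip(chars, chars[1:]):
        counts[(ch1, ch2)] += 1

def sample_next(ch):
    # all characters seen after ch, weighted by how often they followed it
    followers = {c2: n for (c1, c2), n in counts.items() if c1 == ch}
    return random.choices(list(followers), weights=followers.values())[0]

def make_name():
    out, ch = [], "."
    while True:
        ch = sample_next(ch)
        if ch == ".":
            return "".join(out)
        out.append(ch)

print([make_name() for _ in range(5)])
```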
After a lot of work, my bigram model generated the following masterpieces. Indian name websites would bow their heads in shame:
- Araida
- Jrlaitviana
- Meniraliaba
- Diesqud
- Dan
Sounds obvious in retrospect, but it was crushing to see that all that effort produced something with no sense of structure beyond the previous character, let alone any understanding of words. It was a perfect Markov chain producing perfect garbage.
But all in all, it was a good day. Built a decent bigram model myself, used the tutorial to learn how to do it with basic PyTorch, and then tied it all together by training a toy neural net that approximated the direct calculation. Felt a bit stretched by the end, and capped it off with a nice kombucha and a late night football match.
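A compressed sketch of that neural-net version, simplified from what's in the repo (toy data and an arbitrary learning rate, but the same one-hot → single matrix → softmax idea):

```python
import torch
import torch.nn.functional as F

# Neural-net bigram model: one-hot the previous character, multiply by a
# single weight matrix, softmax, and minimise negative log-likelihood.
# With enough steps the weights roughly converge toward (the log of) the
# table the counting approach gives. Toy names; the real run used the
# full list.
names = ["emma", "olivia", "arjun", "priya", "ava"]
chars = sorted(set("".join(names)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0                                # '.' marks start/end

xs, ys = [], []
for name in names:
    seq = ["."] + list(name) + ["."]
    for ch1, ch2 in zip(seq, seq[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

V = len(stoi)
W = torch.randn((V, V), requires_grad=True)  # the only parameters

for step in range(200):
    xenc = F.one_hot(xs, num_classes=V).float()
    logits = xenc @ W                        # effectively log-counts
    loss = F.cross_entropy(logits, ys)       # softmax + NLL in one call

    W.grad = None                            # zero grads
    loss.backward()
    with torch.no_grad():
        W -= 10.0 * W.grad                   # learning rate picked for this toy set

print(loss.item())
```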
Notes to self:
- Print intermediate values, a lot.
- Test on tiny inputs first; train small, then go big.
- When it feels like it should be easier in a spreadsheet, use pandas.
Next: Multi-layer perceptrons. Time to add actual depth.
Last updated: November 2025