There are two bad reasons to implement a machine learning model from scratch.
The first is pride. The second is the vague feeling that using a framework is somehow cheating.
Use the framework for real work. PyTorch is not embarrassed about knowing more numerical tricks than you do. It should know more. That is why it exists.
The good reason to build a toy model is different: you want the bookkeeping to become annoying.
A tiny language model is not impressive. A basic GAN is not going to produce anything useful. An evolutionary training loop will probably spend most of its time teaching humility. But these small programs force the dull questions into the open. What shape is this tensor? Am I sampling from the distribution I think I am sampling from? Did I separate training and evaluation, or did I build a machine for lying to myself?
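Those dull questions can be made literal. The sketch below is a minimal illustration, not anyone's real training code: all the names (`X`, `W`, the 80/20 split) are hypothetical, and the point is only that each bookkeeping question can be turned into an assertion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 100 samples, 8 features.
X = rng.normal(size=(100, 8))
y = (X[:, 0] > 0).astype(int)

# Question 1: what shape is this tensor? Assert it instead of assuming.
W = rng.normal(size=(8, 2))
logits = X @ W
assert logits.shape == (100, 2), logits.shape

# Question 2: am I sampling from the distribution I think I am?
# Softmax rows must sum to 1 before they can be treated as probabilities.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
assert np.allclose(probs.sum(axis=1), 1.0)

# Question 3: did I separate training and evaluation? A disjoint split
# is the cheapest defense against lying to yourself.
idx = rng.permutation(len(X))
train_idx, eval_idx = idx[:80], idx[80:]
assert set(train_idx).isdisjoint(eval_idx)
```

None of this is clever, which is the point: in a toy model the checks are small enough that you actually write them.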
They also make failure less mystical. A model that does not learn is not "AI being weird". Maybe the objective is weak. Maybe the data is too thin. Maybe the initialization is bad. Maybe the experiment is too small for the claim being made. You can still be wrong about all of this, but you are wrong closer to the mechanism.
Then, having learned the lesson, go back to the library. The goal was never to become a worse framework author in your spare time.