Note
CS Degree Day 81
What I did today:
- Lecture 11: Convexity - convex sets, convex functions
- Lecture 12: Gradient descent - derivation, step size, convergence
- Lecture 13: Stochastic gradient descent
This is the connection I was waiting for. Gradient descent is how neural networks learn. The loss function is a surface over a high-dimensional parameter space. The gradient points in the direction of steepest ascent, so you step the opposite way: downhill, along the negative gradient. The learning rate sets the step size. Too large and you overshoot the minimum; too small and you crawl toward it forever. I had been using gradient descent in PyTorch without understanding what it was doing. Now I do.
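To pin the update rule down for myself: w ← w − η·∇L(w). A minimal sketch of that loop in PyTorch, on a made-up one-parameter loss f(w) = (w − 3)² with its minimum at w = 3; the loss, the learning rate, and the step count are just illustrative, not from the lecture:

```python
import torch

# Toy loss surface: f(w) = (w - 3)^2, minimized at w = 3.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # learning rate: the step size

for step in range(50):
    loss = (w - 3) ** 2
    loss.backward()           # compute dloss/dw, the gradient
    with torch.no_grad():
        w -= lr * w.grad      # step along the negative gradient: downhill
        w.grad.zero_()        # PyTorch accumulates gradients, so reset

print(w.item())  # approaches 3.0
```

Each iteration shrinks the distance to the minimum by a factor of (1 − 2·lr) here, which makes the overshoot/crawl trade-off concrete: lr above 1 diverges, lr near 0 barely moves.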