Andrej Karpathy
865K subscribers
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...
77 Comments
I still can't understand why BatchNorm helps against vanishing/exploding gradients. Are there any ideas?
Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏
Amazing! Knowledge that is extremely hard to find in other videos, and you also have an AMAZING skill for clearly explaining complex stuff.
This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.
The amount of useful information in this video is impressive. Thanks for such good content.
Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!
🐐
🎯 Course outline for quick navigation:
[00:00-03:21]1. Continuing and refactoring neural networks for language modeling
-[00:00-00:30]Continuing makemore implementation with multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients in training is crucial for optimizing architectures.
-[02:06-02:46]Refactored code to optimize neural net with 11,000 parameters over 200,000 steps, achieving train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to prevent gradient computation (sketch after the outline).
[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor network initialization causes a high initial loss of 27 that rapidly decreases to 1 or 2 (init-loss check after the outline).
-[05:00-05:32]At initialization, the model aims for a uniform distribution among 27 characters, with roughly 1/27 probability for each.
-[05:49-06:19]Neural net creates skewed probability distributions leading to high loss.
-[12:08-12:36]Loss at initialization is now as expected; final loss improved to 2.12-2.16.
[14:24-36:39]3. Neural network initialization
-[16:03-16:31]The local gradient of tanh vanishes when its outputs are close to -1 or 1, so the chain rule halts backpropagation through those units.
-[18:09-18:38]Concern over vanishing gradients in the flat regions of the tanh outputs h, diagnosed by analyzing their absolute values.
-[26:03-26:31]Optimization led to an improved validation loss, from 2.17 to 2.10, by fixing the softmax and tanh layer issues.
-[29:28-30:02]Multiplying by randomly initialized weights expands the standard deviation to three; the aim is a unit gaussian distribution throughout the net.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the gaussian to a standard deviation of 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations, following Kaiming He et al. (init sketch after the outline).
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets, ensuring roughly gaussian hidden states for improved performance.
-[40:51-41:13]Batch normalization from 2015 enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit gaussian is a perfectly differentiable operation, a key insight in the paper.
-[43:20-43:50]Calculating the mean and standard deviation of the activations; the mean is the average value of each neuron's activation.
-[45:45-46:16]Backpropagation guides how the distribution moves; a learned scale and shift produce the final output.
[51:52-01:01:35]5. Jittering and batch normalization in neural network training
-[52:10-52:37]Jittering input examples adds entropy, augments data, and regularizes neural nets.
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]The batch normalization paper introduces running mean and standard deviation estimates maintained during training (batch-norm sketch after the outline).
-[01:00:46-01:01:10]Running estimates eliminate the explicit calibration stage; the epsilon term prevents division by zero.
[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases in the layer preceding batch normalization are subtracted out, so their effect is reduced to zero.
-[01:03:13-01:03:53]Using batch normalization to control activations in neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks with weight layers, normalization, and non-linearity, as exemplified in the provided code.
[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes weights from a uniform distribution scaled by 1/sqrt(fan_in) (PyTorch-defaults sketch after the outline).
-[01:11:11-01:11:40]Scaling weights by 1/sqrt(fan_in) and using PyTorch's batch normalization layer with 200 features.
-[01:14:02-01:14:35]Importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers data for gaussian activations in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential in 2015, enabled reliable training of much deeper neural nets.
[01:23:39-01:55:56]8. Custom PyTorch layers and network analysis
-[01:24:01-01:24:32]Updating the running-statistics buffers with an exponential moving average inside the torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for the forward and backward passes, with visualizations of the forward-pass activations.
-[01:28:04-01:28:30]Saturation starts around 20%, then stabilizes at about 5% with a standard deviation of 0.65, thanks to the gain set at 5/3.
-[01:33:19-01:33:50]Setting gain correctly at 1 prevents shrinking and diffusion in batch normalization.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but it self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio of the parameters to ensure efficient training, aiming for about -3 on a log10 plot (diagnostics sketch after the outline).
-[01:51:36-01:52:04]Recap of batch normalization and PyTorch modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Introduction to diagnostic tools in neural networks; initialization and backpropagation remain areas of active research and ongoing progress.
offered by Coursnap
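The torch.no_grad point in sections 1-2 of the outline can be illustrated with a minimal sketch; eval_loss and its arguments are illustrative names, not the exact ones used in the lecture.

import torch
import torch.nn.functional as F

@torch.no_grad()              # no computation graph is built inside this function
def eval_loss(model, X, Y):
    logits = model(X)
    return F.cross_entropy(logits, Y).item()

# the context-manager form is equivalent:
# with torch.no_grad():
#     logits = model(X)

Disabling gradient tracking here saves memory and time because PyTorch does not record operations for a backward pass that will never happen.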
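Section 2's observation about the loss at initialization follows from the vocabulary size: with 27 characters and a roughly uniform prediction, the cross-entropy should be about -log(1/27) ≈ 3.29, so an initial loss near 27 signals badly scaled logits. A small check (the batch size of 32 is arbitrary):

import torch
import torch.nn.functional as F

vocab_size = 27
# uniform prediction over 27 characters -> cross-entropy of -log(1/27) ≈ 3.29
print(-torch.log(torch.tensor(1.0 / vocab_size)))        # tensor(3.2958)

# badly scaled logits produce confidently wrong predictions and a much larger loss
logits = torch.randn(32, vocab_size) * 10
targets = torch.randint(0, vocab_size, (32,))
print(F.cross_entropy(logits, targets))                   # typically far above 3.29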
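Section 3's fix is to scale the hidden-layer weights by gain / sqrt(fan_in), with gain 5/3 for tanh as in Kaiming He et al., which keeps the pre-activations roughly unit gaussian and the tanh outputs out of the saturated ±1 region. A sketch with illustrative layer sizes (30 inputs, 200 hidden units):

import torch

fan_in, n_hidden = 30, 200
x = torch.randn(1000, fan_in)                              # roughly unit-gaussian inputs

W_naive = torch.randn(fan_in, n_hidden)                    # pre-activation std grows with sqrt(fan_in)
W_kaiming = torch.randn(fan_in, n_hidden) * (5/3) / fan_in**0.5   # tanh gain 5/3 over sqrt(fan_in)

for name, W in [("naive", W_naive), ("kaiming", W_kaiming)]:
    h = torch.tanh(x @ W)
    saturated = (h.abs() > 0.97).float().mean().item()     # fraction of outputs in the flat tails of tanh
    print(f"{name}: std={h.std().item():.2f}  saturated={saturated:.1%}")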
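A condensed sketch of the batch-normalization step described in sections 4-6: normalize each batch to zero mean and unit standard deviation, apply a learned gain and bias, and maintain running estimates (the bnmean_running / bnstd_running buffers a commenter asks about below) for use at inference time. The 0.999/0.001 momentum and the 1e-5 epsilon are illustrative choices, not prescribed values.

import torch

n_hidden = 200
bngain = torch.ones((1, n_hidden))             # learned scale (gamma)
bnbias = torch.zeros((1, n_hidden))            # learned shift (beta)
bnmean_running = torch.zeros((1, n_hidden))    # inference-time estimates, not trained by backprop
bnstd_running = torch.ones((1, n_hidden))

# --- inside the training loop, given pre-activations hpreact of shape (batch, n_hidden) ---
hpreact = torch.randn(32, n_hidden)            # placeholder batch for the sketch
bnmeani = hpreact.mean(0, keepdim=True)
bnstdi = hpreact.std(0, keepdim=True)
hpreact = bngain * (hpreact - bnmeani) / (bnstdi + 1e-5) + bnbias

with torch.no_grad():                          # buffer updates stay outside the computation graph
    bnmean_running = 0.999 * bnmean_running + 0.001 * bnmeani
    bnstd_running = 0.999 * bnstd_running + 0.001 * bnstdi

# --- at evaluation time, use the running estimates instead of batch statistics ---
# hpreact = bngain * (hpreact - bnmean_running) / (bnstd_running + 1e-5) + bnbias

Because the running estimates are updated slowly, they converge toward the dataset-wide mean and std, which is what makes single-example inference possible without an explicit calibration stage.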
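Section 7's points about PyTorch defaults can be checked directly: nn.Linear draws weights from a uniform distribution whose bound is roughly 1/sqrt(fan_in), and nn.BatchNorm1d(200) normalizes 200 hidden features. The 30→200 layer and the batch size below are illustrative:

import torch
import torch.nn as nn

lin = nn.Linear(30, 200)
bound = 1 / 30 ** 0.5                          # default uniform init bound is about 1/sqrt(fan_in)
print(lin.weight.abs().max().item(), "<=", bound)

bn = nn.BatchNorm1d(200)                       # batch norm over 200 hidden features
h = bn(lin(torch.randn(64, 30)))
print(h.mean().item(), h.std().item())         # roughly 0 and 1 across the batch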
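The diagnostics in section 8 reduce to a few statistics tracked during training, most notably the update-to-data ratio, which should sit near -3 on a log10 plot. A self-contained sketch with dummy parameters and a placeholder loss standing in for the real model, so the printed numbers are only there to show the computation:

import torch

# dummy stand-ins so the sketch runs on its own; in the real training loop these come from the model
parameters = [torch.randn(30, 200, requires_grad=True), torch.randn(200, 27, requires_grad=True)]
lr = 0.1
loss = sum((p ** 2).sum() for p in parameters)   # placeholder loss, only here to populate .grad
loss.backward()

ud = []   # update-to-data ratios, one list per training step
with torch.no_grad():
    ud.append([((lr * p.grad).std() / p.data.std()).log10().item() for p in parameters])
print(ud[-1])   # in a real run you aim for values near -3: updates roughly 1000x smaller than the weights

# companion check for tanh saturation, given hidden activations h:
# saturated_frac = (h.abs() > 0.97).float().mean()   # should stay in the low single-digit percents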
I keep coming back to these videos again and again. Andrej is a legend!
What is the purpose of bnmean_running and bnstd_running?