Andrej Karpathy
1.3M subscribers
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...
111 Comments
🐐
I still can't understand why BatchNorm helps against vanishing/exploding gradients. Are there any ideas?
Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏
🎯Course outline for quick navigation:
[00:00-03:21]1. [...]ng and refactoring neural networks for language modeling
-[00:00-00:30]Continuing makemore implementation with multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients in training is crucial for optimizing architectures.
-[02:06-02:46]Refactored code to optimize neural net with 11,000 parameters over 200,000 steps, achieving train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to prevent gradient computation.
[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor initialization gives a high initial loss of 27, which rapidly decreases to 1 or 2.
-[05:00-05:32]At initialization, the model aims for a uniform distribution among 27 characters, with roughly 1/27 probability for each.
-[05:49-06:19]Neural net creates skewed probability distributions leading to high loss.
-[12:08-12:36]Loss at initialization as expected, improved to 2.12-2.16
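
The "expected" loss at initialization follows from the uniform 1/27 prediction mentioned above. A minimal sketch in plain Python (no PyTorch, to stay dependency-free); the `cross_entropy` helper and the extreme logit value are illustrative assumptions, not code from the lecture:

```python
import math

vocab_size = 27  # number of characters, as stated in the outline

# Cross-entropy of a uniform prediction: -log(1/27) = log(27)
expected_loss = -math.log(1.0 / vocab_size)
print(f"expected initial loss: {expected_loss:.4f}")

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

# All-zero logits give the uniform distribution, hence the expected loss.
uniform_logits = [0.0] * vocab_size

# A confidently wrong prediction produces a much larger loss.
skewed_logits = [0.0] * vocab_size
skewed_logits[3] = 25.0  # hypothetical extreme logit on the wrong class

loss_uniform = cross_entropy(uniform_logits, target=0)
loss_skewed = cross_entropy(skewed_logits, target=0)
print(loss_uniform, loss_skewed)
```

Logits that are all equal (for example all zero) reproduce the uniform case, which is why shrinking the output layer's weights and bias at initialization removes the large initial loss.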
[14:24-36:39]3. Neural network initialization
-[16:03-16:31]The chain rule with local gradient is affected when outputs of tanh are close to -1 or 1, leading to a halt in back propagation.
-[18:09-18:38]Concern over destructive gradients in flat regions of h outputs, tackled by analyzing absolute values.
-[26:03-26:31]Fixing the softmax and tanh layer issues at initialization improved validation loss from 2.17 to 2.10.
-[29:28-30:02]The standard deviation expands to about three after the matrix multiply; the aim is roughly unit gaussian activations throughout the net.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the gaussian to a standard deviation of 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations, following Kaiming He et al.
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
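
The standard deviations quoted in this section (about 3, then 0.6) are consistent with a fan-in of 10, since sqrt(10) is about 3.16. The sketch below assumes that fan-in and uses plain Python rather than PyTorch; `preactivation_std` is a hypothetical helper. It estimates the spread of one pre-activation under different weight scales, including the 1/sqrt(fan_in) scaling and the 5/3 tanh gain mentioned later in the outline:

```python
import math
import random
import statistics

random.seed(0)
fan_in = 10        # assumed input width; sqrt(10) ~ 3.16 matches "about three"
n_samples = 20000

def preactivation_std(weight_scale):
    """Std of y = sum_i x_i * w_i with x_i ~ N(0, 1) and w_i ~ N(0, weight_scale^2)."""
    ys = []
    for _ in range(n_samples):
        y = sum(random.gauss(0, 1) * random.gauss(0, weight_scale)
                for _ in range(fan_in))
        ys.append(y)
    return statistics.pstdev(ys)

std_raw = preactivation_std(1.0)                            # about sqrt(fan_in)
std_small = preactivation_std(0.2)                          # about 0.2 * sqrt(fan_in)
std_scaled = preactivation_std(1.0 / math.sqrt(fan_in))     # about 1
std_tanh = preactivation_std((5 / 3) / math.sqrt(fan_in))   # about 5/3

print(std_raw, std_small, std_scaled, std_tanh)
```

Because the variance of the sum grows linearly with fan-in, dividing the weights by sqrt(fan_in) keeps the pre-activation close to unit standard deviation regardless of layer width.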
[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets, ensuring roughly gaussian hidden states for improved performance.
-[40:51-41:13]Batch normalization from 2015 enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit gaussian is a perfectly differentiable operation, a key insight in the paper.
-[43:20-43:50]Calculating standard deviation of activations, mean is average value of neuron's activation.
-[45:45-46:16]Back propagation guides distribution movement, adding scale and shift for final output
[51:52-01:01:35]5. Jittering and batch normalization in neural network training
-[52:10-52:37]Batch-dependent jitter effectively pads out each input example, adding a little entropy; it acts like data augmentation and regularizes the neural net.
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]Batch normalization paper introduces running mean and standard deviation estimation during training.
-[01:00:46-01:01:10]Eliminated explicit calibration stage, almost done with batch normalization, epsilon prevents division by zero.
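
To illustrate why running statistics are kept: during training each batch supplies its own mean and variance, but at inference a single example has no batch to normalize against, so the running estimates stand in. The following is a sketch for a single neuron in plain Python; the class name, the momentum value, and the choice to track variance rather than standard deviation are assumptions made for illustration, not the lecture's code:

```python
import math
import random

class BatchNorm1Neuron:
    """Batch normalization for one neuron (hypothetical helper for illustration)."""

    def __init__(self, eps=1e-5, momentum=0.05):
        self.eps = eps            # keeps the division safe when batch variance is zero
        self.momentum = momentum  # assumed value for the moving average
        self.gain = 1.0           # learnable scale
        self.bias = 0.0           # learnable shift
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            mean = sum(batch) / len(batch)
            var = sum((v - mean) ** 2 for v in batch) / len(batch)
            # Exponential moving average of the batch statistics
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Inference: use the estimates collected during training
            mean, var = self.running_mean, self.running_var
        return [self.gain * (v - mean) / math.sqrt(var + self.eps) + self.bias
                for v in batch]

random.seed(0)
bn = BatchNorm1Neuron()
for _ in range(300):
    batch = [random.gauss(3.0, 2.0) for _ in range(64)]  # activations with mean 3, std 2
    bn(batch)

bn.training = False
single = bn([5.0])[0]  # roughly (5 - 3) / 2, using the running estimates
print(bn.running_mean, bn.running_var, single)
```

Updating the running estimates on the side during training is what removes the need for a separate calibration pass over the training set afterwards.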
[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases are subtracted out in batch normalization, reducing their impact to zero.
-[01:03:13-01:03:53]Using batch normalization to control activations in neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks with weight layers, normalization, and non-linearity, as exemplified in the provided code.
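
The point about biases being subtracted out can be checked numerically: adding a constant to every pre-activation shifts the batch mean by the same constant, so the normalized output is unchanged. A small sketch in plain Python; the values and the `batchnorm` helper are made up for illustration:

```python
import math

def batchnorm(values, eps=1e-5):
    """Normalize one neuron's pre-activations across a batch (no gain or shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

preacts = [0.5, -1.2, 2.0, 0.3]  # hypothetical pre-activations for one neuron
b = 7.0                          # hypothetical bias of the preceding linear layer

without_bias = batchnorm(preacts)
with_bias = batchnorm([p + b for p in preacts])

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(u - v) for u, v in zip(without_bias, with_bias))
print(max_diff)
```

This is why a linear layer that feeds directly into batch normalization can drop its own bias; the batch normalization shift parameter takes over that role.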
[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes linear-layer weights from a uniform distribution bounded by ±1/sqrt(fan_in).
-[01:11:11-01:11:40]Scaling weights by 1/sqrt(fan_in); using PyTorch's batch normalization layer with 200 features.
-[01:14:02-01:14:35]Importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers data for gaussian activations in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential in 2015, enabled reliable training of much deeper neural nets.
[01:23:39-01:55:56]8. Custom PyTorch layer and network analysis
-[01:24:01-01:24:32]Updating buffers using an exponential moving average inside the torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for forward and backward passes, with visualizations of forward pass activations.
-[01:28:04-01:28:30]Saturation is about 20% initially, then settles at 5% with a standard deviation of 0.65 because the gain is set to 5/3.
-[01:33:19-01:33:50]Setting gain correctly at 1 prevents shrinking and diffusion in batch normalization.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but it self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio of each parameter to check that training is efficient, aiming for about -3 on a log10 plot.
-[01:51:36-01:52:04]Introduce batch normalization and PyTorch modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Introduction to diagnostic tools in neural networks, active research in initialization and backpropagation, ongoing progress
offered by Coursnap
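
The update-ratio diagnostic from section 8 of the outline can be sketched as follows: compare the spread of one step's update (learning rate times gradient) with the spread of the parameter values, on a log10 scale. This is plain Python with synthetic numbers; the function name, the learning rates, and the gradient scale are assumptions chosen so the ratio lands near the -3 target:

```python
import math
import random
import statistics

def log10_update_ratio(params, grads, lr):
    """log10 of std(lr * grad) / std(param) for one flattened parameter tensor."""
    update_std = statistics.pstdev([lr * g for g in grads])
    param_std = statistics.pstdev(params)
    return math.log10(update_std / param_std)

random.seed(0)
params = [random.gauss(0, 1.0) for _ in range(5000)]   # synthetic weights, std ~ 1
grads = [random.gauss(0, 0.01) for _ in range(5000)]   # synthetic gradients, std ~ 0.01

ratio_ok = log10_update_ratio(params, grads, lr=0.1)   # near -3
ratio_hot = log10_update_ratio(params, grads, lr=1.0)  # near -2, updates are large
print(ratio_ok, ratio_hot)
```

Values well above -3 suggest the updates are large relative to the parameters, while values well below it suggest the parameters are barely moving.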
Amazing, knowledge that is hell hard to find in other videos and also, you have AMAZING skill in clearly explaining complex stuff.
This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.
The amount of useful information in this video is impressive. Thanks for such good content.
I keep coming back to these videos again and again. Andrej is legend!
Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!
what is the purpose of bnmean_running and bnstd_running?