🐐
Andrej Karpathy
1.3M subscribers
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...
111 Comments
I still can't understand why BatchNorm helps against vanishing/exploding gradients. Are there any ideas?
Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏
Amazing, knowledge that is hell hard to find in other videos and also, you have AMAZING skill in clearly explaining complex stuff.
This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.
The amount of useful information in this video is impressive. Thanks for such good content.
Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!
🎯Course outline for quick navigation:
[00:00-03:21]1. […]ng and refactoring neural networks for language modeling
-[00:00-00:30]Continuing makemore implementation with multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients in training is crucial for optimizing architectures.
-[02:06-02:46]Refactored code to optimize neural net with 11,000 parameters over 200,000 steps, achieving train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to prevent gradient computation.
[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor network initialization gives a starting loss of about 27, which rapidly decreases to around 1 or 2.
-[05:00-05:32]At initialization, the model should predict a roughly uniform distribution over the 27 characters, with about 1/27 probability for each.
-[05:49-06:19]Neural net creates skewed probability distributions leading to high loss.
-[12:08-12:36]Loss at initialization as expected, improved to 2.12-2.16
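The 1/27 figure above fixes the loss one would expect at initialization. A minimal sketch, assuming a 27-character vocabulary and a cross-entropy loss as described in the outline; the variable names are illustrative and not taken from the video's code:

```python
import math

vocab_size = 27  # number of characters, per the outline

# With uniform predictions every character gets probability 1/27,
# so the cross-entropy (negative log-likelihood) is -log(1/27).
expected_initial_loss = -math.log(1.0 / vocab_size)

print(f"{expected_initial_loss:.4f}")  # prints 3.2958
```

A starting loss of 27 is far above this value, which is the sign that the initial logits are too extreme.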
[14:24-36:39]3. Neural network initialization
-[16:03-16:31]The chain rule with local gradient is affected when outputs of tanh are close to -1 or 1, leading to a halt in back propagation.
-[18:09-18:38]Concern over destructive gradients in flat regions of h outputs, tackled by analyzing absolute values.
-[26:03-26:31]Validation loss improved from 2.17 to 2.10 after fixing the softmax and tanh layer issues.
-[29:28-30:02]Standard deviation expanded to three, aiming for unit gaussian distribution in neural nets.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the gaussian to a standard deviation of about 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations, following Kaiming He et al.
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
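The scaling argument in this section can be checked numerically. The sketch below is a rough simulation using only the standard library, not the video's PyTorch code: `fan_in = 10` is an assumed value chosen so the unscaled spread lands near the "three" mentioned above, and `preactivation_std` is a hypothetical helper. It compares unscaled weights, weights scaled by 1/sqrt(fan_in), and weights scaled by the tanh gain 5/3 divided by sqrt(fan_in).

```python
import math
import random
import statistics

random.seed(0)
fan_in = 10        # assumed number of inputs per neuron
n_samples = 20000  # number of simulated pre-activations

def preactivation_std(weight_scale):
    """Std of y = sum_i x_i * w_i with x_i ~ N(0, 1) and w_i ~ N(0, weight_scale^2)."""
    ys = []
    for _ in range(n_samples):
        y = sum(random.gauss(0, 1) * random.gauss(0, weight_scale)
                for _ in range(fan_in))
        ys.append(y)
    return statistics.pstdev(ys)

std_unscaled = preactivation_std(1.0)                       # close to sqrt(fan_in)
std_scaled = preactivation_std(1.0 / math.sqrt(fan_in))     # close to 1
std_tanh = preactivation_std((5 / 3) / math.sqrt(fan_in))   # close to 5/3

print(std_unscaled, std_scaled, std_tanh)
```

Dividing by sqrt(fan_in) keeps the pre-activations near unit standard deviation regardless of layer width, which is the point of the fan-in scaling.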
[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets, ensuring roughly gaussian hidden states for improved performance.
-[40:51-41:13]Batch normalization from 2015 enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit gaussian is a perfectly differentiable operation, a key insight in the paper.
-[43:20-43:50]Calculating standard deviation of activations, mean is average value of neuron's activation.
-[45:45-46:16]Back propagation guides distribution movement, adding scale and shift for final output
[51:52-01:01:35]5. Jittering and batch normalization in neural network training
-[52:10-52:37]Padding input examples adds entropy, augments data, and regularizes neural nets.
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]Batch normalization paper introduces running mean and standard deviation estimation during training.
-[01:00:46-01:01:10]Eliminated explicit calibration stage, almost done with batch normalization, epsilon prevents division by zero.
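The running mean and standard deviation described in this section can be sketched for a single neuron in plain Python. This is an illustrative toy, not the video's implementation: the class name `BatchNormSketch`, the default `momentum=0.1`, and the use of Python lists instead of tensors are all assumptions.

```python
import math

class BatchNormSketch:
    """Batch normalization for one feature, using plain Python lists."""

    def __init__(self, eps=1e-5, momentum=0.1):
        self.eps = eps            # guards against division by zero
        self.momentum = momentum  # assumed value for the moving average
        self.gain = 1.0           # learnable scale
        self.bias = 0.0           # learnable shift
        self.running_mean = 0.0   # estimates tracked during training
        self.running_std = 1.0
        self.training = True

    def __call__(self, xs):
        if self.training:
            # Statistics of the current batch.
            mean = sum(xs) / len(xs)
            std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
            # Exponential moving averages, updated outside of backpropagation.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_std = (1 - m) * self.running_std + m * std
        else:
            # At inference, reuse the estimates gathered during training.
            mean, std = self.running_mean, self.running_std
        denom = math.sqrt(std * std + self.eps)  # same as sqrt(var + eps)
        return [self.gain * (x - mean) / denom + self.bias for x in xs]

bn = BatchNormSketch()
out = bn([2.0, 4.0, 6.0, 8.0])  # training mode: batch statistics
bn.training = False
single = bn([5.0])              # inference mode: running statistics
```

The running estimates are what allow a single example to be normalized at inference time, when no batch statistics are available.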
[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases are subtracted out in batch normalization, reducing their impact to zero.
-[01:03:13-01:03:53]Using batch normalization to control activations in neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks with weight layers, normalization, and non-linearity, as exemplified in the provided code.
[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes linear-layer weights from a uniform distribution bounded by ±1/sqrt(fan_in).
-[01:11:11-01:11:40]Scaling weights by 1 over the square root of fan-in, and using the batch normalization layer in PyTorch with 200 features.
-[01:14:02-01:14:35]Importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers data for gaussian activations in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential in 2015, enabled reliable training of much deeper neural nets.
[01:23:39-01:55:56]8. Custom PyTorch layer and network analysis
-[01:24:01-01:24:32]Updating buffers using an exponential moving average inside the torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for forward and backward passes, with visualizations of forward pass activations.
-[01:28:04-01:28:30]Saturation starts at about 20%, then stabilizes at about 5% with a standard deviation of 0.65, due to the gain being set to 5/3.
-[01:33:19-01:33:50]Setting gain correctly at 1 prevents shrinking and diffusion in batch normalization.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but it self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio for each parameter to ensure efficient training, aiming for about -3 on the log10 plot (updates roughly 1/1000 of the parameter scale).
-[01:51:36-01:52:04]Introduce batch normalization and PyTorch modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Introduction to diagnostic tools in neural networks, active research in initialization and backpropagation, ongoing progress
offered by Coursnap
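The update ratio mentioned at 01:43:18 in the outline can be written down in a few lines. A minimal sketch, assuming plain SGD (update = learning rate × gradient) and plain Python lists in place of tensors; `log_update_ratio` and the sample numbers are hypothetical:

```python
import math
import statistics

def log_update_ratio(lr, grads, params):
    """log10 of (std of the SGD update) / (std of the parameter values)."""
    update_std = lr * statistics.stdev(grads)
    return math.log10(update_std / statistics.stdev(params))

# Toy numbers: gradients are 0.1x the parameters and lr is 0.01,
# so each update is about 1/1000 of the parameter scale.
params = [0.5, -0.5, 1.0, -1.0]
grads = [0.05, -0.05, 0.1, -0.1]
ratio = log_update_ratio(0.01, grads, params)
print(round(ratio, 6))  # prints -3.0
```

Values well above -3 suggest the updates are large relative to the parameters, and values well below it suggest training is slower than it could be.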
I keep coming back to these videos again and again. Andrej is legend!
what is the purpose of bnmean_running and bnstd_running?