🐐
Andrej Karpathy
1.2M subscribers
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...
103 Comments
I still can't understand why BatchNorm helps against vanishing/exploding gradients. Are there any ideas?
Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏
Amazing, knowledge that is hell hard to find in other videos and also, you have AMAZING skill in clearly explaining complex stuff.
This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.
The amount of useful information in this video is impressive. Thanks for such good content.
Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!
🎯Course outline for quick navigation:
[00:00-03:21]1. ...ng and refactoring neural networks for language modeling
-[00:00-00:30]Continuing makemore implementation with multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients in training is crucial for optimizing architectures.
-[02:06-02:46]Refactored code to optimize neural net with 11,000 parameters over 200,000 steps, achieving train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to prevent gradient computation.
[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor initialization causes a high initial loss of 27, which rapidly decreases to 1 or 2.
-[05:00-05:32]At initialization, the model should output a roughly uniform distribution over the 27 characters, with about 1/27 probability for each (expected loss -ln(1/27) ≈ 3.29).
-[05:49-06:19]Instead, the untrained neural net produces skewed, confidently wrong probability distributions, leading to a high loss.
-[12:08-12:36]Loss at initialization is now as expected, and the final loss improves to 2.12-2.16.
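
The expected-loss arithmetic in section 2 can be checked with a few lines of plain Python. This is a minimal sketch, not code from the lecture: the helper names `expected_initial_loss` and `cross_entropy` and the example logits are hypothetical, chosen only to illustrate why flat logits give about 3.29 while confidently wrong logits give a much larger loss.

```python
import math

def expected_initial_loss(num_classes: int) -> float:
    """Cross-entropy when every class receives probability 1/num_classes."""
    return -math.log(1.0 / num_classes)

def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Uniform prediction over 27 characters.
uniform_loss = expected_initial_loss(27)

# All-zero logits reproduce the uniform case exactly.
flat_loss = cross_entropy([0.0] * 27, target=3)

# Hypothetical skewed logits: very confident in class 0, but the target is 3.
skewed_logits = [10.0 if i == 0 else 0.0 for i in range(27)]
skewed_loss = cross_entropy(skewed_logits, target=3)

print(f"uniform: {uniform_loss:.4f}, flat: {flat_loss:.4f}, skewed: {skewed_loss:.4f}")
```

The skewed case is why shrinking the output-layer weights and bias at initialization brings the starting loss down toward the uniform value.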
[14:24-36:39]3. Neural network initialization
-[16:03-16:31]When tanh outputs are close to -1 or 1, the local gradient (1 - t^2) in the chain rule goes to zero, which halts backpropagation through that neuron.
-[18:09-18:38]Gradients are destroyed in the flat regions of tanh; this is diagnosed by checking how often the absolute value of h is close to 1.
-[26:03-26:31]Validation loss improved from 2.17 to 2.10 by fixing the softmax and tanh layer issues.
-[29:28-30:02]After the matrix multiply the standard deviation expands to about three; the goal is to keep activations roughly unit gaussian throughout the net.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the output gaussian to a standard deviation of about 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations, following Kaiming He et al.
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
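
The scaling effect described in section 3 can be reproduced without any libraries. The sketch below is an illustration under assumptions, not the lecture's code: the sizes (200 examples, fan-in 10, 50 outputs) and the helper names are hypothetical, and plain Python lists stand in for tensors. It multiplies unit-gaussian inputs by unit-gaussian weights, then repeats with the weights scaled by 1/sqrt(fan_in).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def gaussian_matrix(rows, cols):
    """rows x cols matrix of unit-gaussian samples."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Naive matrix product of two list-of-lists matrices."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def flat_std(m):
    """Standard deviation over every entry of a matrix."""
    return statistics.pstdev([v for row in m for v in row])

fan_in, fan_out, n_examples = 10, 50, 200
x = gaussian_matrix(n_examples, fan_in)
w = gaussian_matrix(fan_in, fan_out)

# Unscaled weights: the output std grows to roughly sqrt(fan_in), about 3.
raw_std = flat_std(matmul(x, w))

# Weights scaled by 1/sqrt(fan_in): the output std returns to roughly 1.
scale = 1.0 / math.sqrt(fan_in)
w_scaled = [[v * scale for v in row] for row in w]
scaled_std = flat_std(matmul(x, w_scaled))

print(f"raw std: {raw_std:.2f}, scaled std: {scaled_std:.2f}")
```

For a tanh layer the lecture additionally multiplies by a gain of 5/3, which this sketch leaves out to keep the fan-in effect isolated.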
[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets, ensuring roughly gaussian hidden states for improved performance.
-[40:51-41:13]Batch normalization from 2015 enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit gaussian is a perfectly differentiable operation, a key insight in the paper.
-[43:20-43:50]Calculating the mean and standard deviation of each neuron's activation over the batch.
-[45:45-46:16]A learnable scale and shift are added to the normalized output so that backpropagation can move the distribution around as needed.
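
The normalize-then-scale-and-shift step from section 4 can be sketched as follows. This is a dependency-free illustration, not the lecture's PyTorch code: the function name `batchnorm_forward`, the biased variance, and the default `eps` are assumptions made for the example.

```python
import math

def batchnorm_forward(batch, gain, bias, eps=1e-5):
    """Training-mode batch norm over a list of rows (one row per example).

    Each feature (column) is standardized with its own batch mean and
    variance, then scaled by `gain` and shifted by `bias`.
    """
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    # Biased (divide-by-n) variance; an assumption for this sketch.
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    out = [
        [g * (v - m) / math.sqrt(var + eps) + b
         for v, m, var, g, b in zip(row, means, variances, gain, bias)]
        for row in batch
    ]
    return out, means, variances

# Hypothetical pre-activations: 4 examples, 2 neurons with very different scales.
hpreact = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]

# Gain 1 and bias 0 give unit-gaussian-like columns.
out, means, variances = batchnorm_forward(hpreact, gain=[1.0, 1.0], bias=[0.0, 0.0])

# A different gain and bias move the distribution, which is what training can learn.
shifted, _, _ = batchnorm_forward(hpreact, gain=[2.0, 2.0], bias=[1.0, 1.0])
```

Both columns come out with the same standardized values even though the second neuron's inputs are ten times larger, which is the point of the normalization.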
[51:52-01:01:35]5. Jittering and batch normalization in neural network training
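-[52:10-52:37]Because each example is normalized with the statistics of its batch, its activations are jittered slightly; this adds entropy, acts like data augmentation, and regularizes the neural net.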
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]Batch normalization paper introduces running mean and standard deviation estimation during training.
-[01:00:46-01:01:10]The running estimates eliminate the explicit calibration stage; a small epsilon in the denominator prevents division by zero.
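
The running estimates mentioned in section 5 can be sketched as an exponential moving average that is updated on the side during training and then used in place of batch statistics at inference. The names `bnmean_running` and `bnstd_running` come from the comments on this video; everything else here (the helper functions, the momentum of 0.1, the constant batch statistics, and adding `eps` to the standard deviation) is an assumption made for illustration.

```python
def ema_update(running, batch_value, momentum):
    """Move each running estimate a small step toward the current batch value."""
    return [(1.0 - momentum) * r + momentum * b for r, b in zip(running, batch_value)]

def batchnorm_inference(x, bnmean_running, bnstd_running, gain, bias, eps=1e-5):
    """Normalize a single example with the running statistics (no batch needed)."""
    return [
        g * (v - m) / (s + eps) + b
        for v, m, s, g, b in zip(x, bnmean_running, bnstd_running, gain, bias)
    ]

# Start from the usual neutral values: mean 0, std 1.
bnmean_running = [0.0]
bnstd_running = [1.0]

# Pretend every training batch reports the same statistics (hypothetical values).
for _ in range(200):
    bnmean_running = ema_update(bnmean_running, [2.0], momentum=0.1)
    bnstd_running = ema_update(bnstd_running, [0.5], momentum=0.1)

# At inference a single example can be normalized without any batch.
y = batchnorm_inference([2.5], bnmean_running, bnstd_running, gain=[1.0], bias=[0.0])
```

This is why no separate calibration pass over the training set is needed after training: the running values already approximate the dataset-wide mean and standard deviation.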
[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]A bias added before batch normalization is subtracted out along with the mean, so it has no effect.
-[01:03:13-01:03:53]Using batch normalization to control activations in neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks with weight layers, normalization, and non-linearity, as exemplified in the provided code.
[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes weights from a uniform distribution bounded by ±1/sqrt(fan_in).
-[01:11:11-01:11:40]Weights are scaled by 1 over the square root of fan-in; the batch normalization layer in PyTorch is constructed with 200 features.
-[01:14:02-01:14:35]Importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers data for gaussian activations in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential in 2015, enabled reliable training of much deeper neural nets.
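
The uniform initialization described in section 7 can be imitated in plain Python to see what standard deviation it implies. This sketch does not call PyTorch; `uniform_fan_in_init` is a hypothetical helper, and the layer size (fan-in 200, 100 outputs) is an arbitrary choice. A uniform distribution on [-b, b] has standard deviation b/sqrt(3), so the weights end up with a std of about 1/sqrt(3 * fan_in).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def uniform_fan_in_init(fan_in, fan_out):
    """Sample a fan_out x fan_in weight matrix uniformly in +/- 1/sqrt(fan_in)."""
    bound = 1.0 / math.sqrt(fan_in)
    weights = [
        [random.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]
    return weights, bound

fan_in, fan_out = 200, 100
weights, bound = uniform_fan_in_init(fan_in, fan_out)

flat_weights = [v for row in weights for v in row]
empirical_std = statistics.pstdev(flat_weights)
expected_std = bound / math.sqrt(3.0)  # std of a uniform on [-bound, bound]

print(f"bound: {bound:.4f}, empirical std: {empirical_std:.4f}, expected: {expected_std:.4f}")
```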
[01:23:39-01:55:56]8. Custom PyTorch layers and network analysis
-[01:24:01-01:24:32]Updating buffers using an exponential moving average inside the torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for the forward and backward passes, with visualizations of forward pass activations.
-[01:28:04-01:28:30]Saturation starts at about 20% in the first layer, then stabilizes around 5% with a standard deviation of about 0.65, because the gain is set to 5/3.
-[01:33:19-01:33:50]For a stack of linear layers with no non-linearity, a gain of 1 is the correct setting and keeps the activations from shrinking or diffusing.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but it self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio of the parameters to ensure efficient training, aiming for roughly 1e-3 (that is, -3 on the log10 plot).
-[01:51:36-01:52:04]Recap: introduced batch normalization and PyTorch modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Closing notes on the diagnostic tools; initialization and backpropagation remain areas of active research with ongoing progress.
offered by Coursnap
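
The update-ratio diagnostic from section 8 of the outline above compares the size of one gradient step with the size of the parameters themselves, on a log10 scale. The following is a minimal sketch with hypothetical values, not the lecture's plotting code: `log_update_ratio` is an illustrative helper, and the parameter and gradient lists are made up so that the result lands exactly on the -3 target.

```python
import math
import statistics

def log_update_ratio(params, grads, lr):
    """log10 of std(lr * grad) / std(param) for one parameter tensor."""
    update_std = statistics.pstdev([lr * g for g in grads])
    param_std = statistics.pstdev(params)
    return math.log10(update_std / param_std)

# Hypothetical flattened parameter tensor and its gradient.
params = [-1.0, 1.0, -1.0, 1.0]
grads = [-0.01, 0.01, -0.01, 0.01]
lr = 0.1

ratio = log_update_ratio(params, grads, lr)
print(f"log10 update ratio: {ratio:.2f}")
```

Values well above -3 suggest the learning rate is too high for that parameter, and values well below it suggest the parameter is training too slowly.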
I keep coming back to these videos again and again. Andrej is legend!
what is the purpose of bnmean_running and bnstd_running?