Andrej Karpathy
1.1M subscribers
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...
99 Comments
I still can't understand why BatchNorm helps against vanishing/exploding gradients. Are there any ideas?
Thank you @Andrej for bringing us this series. You are a great teacher; the way you have simplified such seemingly complex topics is valuable to all students like me. 🙏
Amazing knowledge that is hard as hell to find in other videos, and you also have an AMAZING skill for clearly explaining complex stuff.
This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.
The amount of useful information in this video is impressive. Thanks for such good content.
Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!
🐐
🎯 Course outline for quick navigation:
[00:00-03:21]1. …ng and refactoring neural networks for language modeling
-[00:00-00:30]Continuing the makemore implementation of a multilayer perceptron for character-level language modeling, with plans to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients during training is crucial for optimizing architectures.
-[02:06-02:46]The code is refactored to optimize a neural net with 11,000 parameters over 200,000 steps, achieving train and val losses of about 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to disable gradient computation.
[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch.no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]The poor network initialization produces a high initial loss of 27, which rapidly decreases to 1 or 2.
-[05:00-05:32]At initialization, we want the model to predict a uniform distribution over the 27 characters, with roughly 1/27 probability for each.
-[05:49-06:19]Instead, the neural net produces skewed probability distributions, which leads to the high loss.
-[12:08-12:36]After the fix, the loss at initialization is as expected, and the loss improves to 2.12-2.16.
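As a quick check of the numbers above: if each of the 27 characters gets probability 1/27, the expected loss at initialization is -ln(1/27) ≈ 3.30, far below the initial loss of 27. The minimal pure-Python sketch below is not the lecture's code, and the skewed logit value is hypothetical; it computes the cross-entropy for a uniform prediction and for a confidently wrong, skewed one.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[target] / sum(exps))

VOCAB = 27  # 26 letters plus the '.' boundary token used by makemore

# Uniform prediction: all logits equal, so every character gets probability 1/27.
uniform_loss = cross_entropy([0.0] * VOCAB, target=0)
print(f"uniform loss: {uniform_loss:.4f}")  # equals -ln(1/27) = ln(27)

# Skewed prediction: one large logit on a wrong character (hypothetical value).
skewed = [0.0] * VOCAB
skewed[5] = 25.0
print(f"skewed loss:  {cross_entropy(skewed, target=0):.4f}")
```

Setting the output layer's weights and biases close to zero pushes the logits toward the uniform case, which is the kind of fix the outline describes.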
[14:24-36:39]3. Neural network initialization
-[16:03-16:31]When tanh outputs are close to -1 or 1, the local gradient of tanh in the chain rule, 1 - t², is nearly zero, so backpropagation through that neuron effectively halts.
-[18:09-18:38]Concern that gradients are destroyed in the flat regions of the tanh outputs h, examined by looking at their absolute values.
-[26:03-26:31]The optimizations improved the validation loss from 2.17 to 2.10 by fixing the softmax and tanh layer issues.
-[29:28-30:02]The standard deviation of the output expands to about three, whereas the goal is a roughly unit Gaussian distribution throughout the neural net.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the Gaussian to a standard deviation of about 0.6.
-[31:03-31:46]Initializing neural network weights so that activations are well-behaved, following Kaiming He et al.
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
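The sketch below illustrates the effects from this section in plain Python. It uses assumed sizes (fan-in 10, fan-out 50) rather than the lecture's exact tensors. Without scaling, the standard deviation of x @ W grows to about sqrt(fan_in). Multiplying the weights by 0.2, or by 1/sqrt(fan_in), brings it back down. The tanh local gradient 1 - tanh(x)² collapses once |x| is large. The 5/3 value is the tanh gain that PyTorch's torch.nn.init.calculate_gain('tanh') returns, used here in a Kaiming-style weight std of gain / sqrt(fan_in).

```python
import math
import random
import statistics

random.seed(42)

def output_std(fan_in, fan_out, n_samples, weight_std):
    """Empirical std of y = x @ W, with x ~ N(0, 1) and W ~ N(0, weight_std**2)."""
    W = [[random.gauss(0.0, weight_std) for _ in range(fan_out)] for _ in range(fan_in)]
    ys = []
    for _ in range(n_samples):
        x = [random.gauss(0.0, 1.0) for _ in range(fan_in)]
        for j in range(fan_out):
            ys.append(sum(x[i] * W[i][j] for i in range(fan_in)))
    return statistics.pstdev(ys)

def tanh_local_grad(pre_activation):
    """Local derivative of tanh: d tanh(x)/dx = 1 - tanh(x)**2."""
    t = math.tanh(pre_activation)
    return 1.0 - t * t

FAN_IN, FAN_OUT, N = 10, 50, 400  # illustrative sizes (assumed)
TANH_GAIN = 5 / 3  # value returned by torch.nn.init.calculate_gain('tanh')

unit = output_std(FAN_IN, FAN_OUT, N, 1.0)                        # about sqrt(10), i.e. ~3.16
shrunk = output_std(FAN_IN, FAN_OUT, N, 0.2)                      # about 0.2 * 3.16, i.e. ~0.63
kaiming = output_std(FAN_IN, FAN_OUT, N, 1 / math.sqrt(FAN_IN))   # about 1.0
kaiming_tanh_std = TANH_GAIN / math.sqrt(FAN_IN)                  # weight std to use before a tanh

print(f"W ~ N(0, 1):        y std = {unit:.2f}")
print(f"W ~ N(0, 0.2^2):    y std = {shrunk:.2f}")
print(f"W ~ N(0, 1/fan_in): y std = {kaiming:.2f}")
print(f"tanh weight std, (5/3) / sqrt(fan_in) = {kaiming_tanh_std:.3f}")

# Saturated tanh units pass almost no gradient back through the chain rule.
for x in (0.0, 1.0, 3.0, 5.0):
    print(f"x = {x}: tanh = {math.tanh(x):+.4f}, local grad = {tanh_local_grad(x):.6f}")
```

In a real model the scaled std would be applied to the weight matrix feeding each tanh layer; the explicit loops here only make the arithmetic visible.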
[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets by keeping the hidden states roughly Gaussian, which improves performance.
-[40:51-41:13]Batch normalization, introduced in 2015, enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing the hidden states to be unit Gaussian is a perfectly differentiable operation, a key insight of the paper.
-[43:20-43:50]Computing the mean (the average value of each neuron's activation over the batch) and the standard deviation of the activations.
-[45:45-46:16]A learnable scale and shift are added to produce the final output, and backpropagation decides how to move the distribution.
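A minimal, framework-free sketch of the batch normalization forward pass described above. It assumes the standard formulation gain * (x - mean) / sqrt(var + eps) + bias, with per-feature statistics taken over the batch. The function name and the eps default are illustrative choices, not the lecture's code. Every step is ordinary arithmetic, which is why the whole operation is differentiable.

```python
import math

def batchnorm_forward(batch, gain, bias, eps=1e-5):
    """Normalize each feature over the batch, then apply a learnable scale and shift.

    batch: list of examples (rows), each a list of feature values.
    gain, bias: per-feature lists holding the learnable scale and shift.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n  # biased (population) variance
        std = math.sqrt(var + eps)  # eps guards against division by zero
        for i in range(n):
            out[i][j] = gain[j] * (column[i] - mean) / std + bias[j]
    return out

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed = batchnorm_forward(batch, gain=[1.0, 1.0], bias=[0.0, 0.0])   # ~unit Gaussian columns
shifted = batchnorm_forward(batch, gain=[2.0, 2.0], bias=[0.5, 0.5])  # mean 0.5, std ~2
print(normed)
print(shifted)
```

With gain 1 and bias 0 each column comes out with mean 0 and standard deviation close to 1; training can then move gain and bias away from those values if that lowers the loss.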
[51:52-01:01:35]5. Jittering and batch normalization in neural network training
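-[52:10-52:37]Because each example is normalized with statistics from its batch, its activations jitter slightly; this pads out the input examples with a little entropy, acting like data augmentation and regularizing the net.
-[53:44-54:09]Batch normalization effectively controls the activations and their distributions.
-[56:05-56:33]The batch normalization paper introduces estimating a running mean and standard deviation during training.
-[01:00:46-01:01:10]Estimating these statistics during training eliminates the explicit calibration stage; with that, batch normalization is almost done. An epsilon in the denominator prevents division by zero.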
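The running statistics exist so that, at inference time, a single example can be normalized without a batch; this is also the role of the lecture's bnmean_running and bnstd_running. The sketch below tracks them for one feature with an exponential moving average. The class name RunningBatchStats and the momentum of 0.001 are assumptions for illustration.

```python
import math

class RunningBatchStats:
    """Hypothetical helper: EMA estimates of one feature's mean and std."""

    def __init__(self, momentum=0.001, eps=1e-5):
        self.momentum = momentum   # assumed value; small, so the estimate moves slowly
        self.eps = eps             # guards against division by zero
        self.running_mean = 0.0    # start from a unit Gaussian guess
        self.running_std = 1.0

    def update(self, batch_values):
        """Call once per training batch (in PyTorch this would sit under torch.no_grad())."""
        n = len(batch_values)
        mean = sum(batch_values) / n
        var = sum((v - mean) ** 2 for v in batch_values) / n
        std = math.sqrt(var + self.eps)
        m = self.momentum
        self.running_mean = (1 - m) * self.running_mean + m * mean
        self.running_std = (1 - m) * self.running_std + m * std
        return mean, std  # the batch statistics actually used during training

    def normalize_at_inference(self, x):
        """Normalize a single example using only the running estimates."""
        return (x - self.running_mean) / self.running_std


stats = RunningBatchStats()
for _ in range(10_000):
    stats.update([3.0, 7.0])  # every toy batch has mean 5 and std 2
print(f"running mean = {stats.running_mean:.3f}, running std = {stats.running_std:.3f}")
print(f"normalized single example 9.0 -> {stats.normalize_at_inference(9.0):.3f}")
```

A small momentum makes the estimate smooth but slow to converge, which is why many batches are needed before the running values settle near the true mean and std.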
[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases of the layer before batch normalization are subtracted out by the mean subtraction, so they have no effect.
-[01:03:13-01:03:53]Using batch normalization to control the activations in the neural net, with gain and bias parameters plus running mean and standard deviation estimates.
-[01:07:25-01:07:53]Deep neural networks are built by repeating weight layers, normalization, and non-linearities, as in the code shown.
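A small check of the bias claim, assuming a single hypothetical neuron and a toy batch: adding any constant bias before standardization shifts each pre-activation and the batch mean by the same amount, so the normalized outputs do not change. This is why the bias of a layer that feeds into batch normalization can be omitted; the batch norm's own bias takes over that role.

```python
import math

def standardize(column, eps=1e-5):
    """Batch-normalize one feature: subtract the batch mean, divide by the batch std."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    return [(v - mean) / math.sqrt(var + eps) for v in column]

weights = [0.5, -1.0, 2.0]  # one hypothetical neuron with 3 inputs
inputs = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0], [2.0, 0.5, 0.0], [-1.0, 1.0, 1.0]]

def preacts(bias):
    """Pre-activations w . x + bias for every example in the toy batch."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in inputs]

without_bias = standardize(preacts(0.0))
with_bias = standardize(preacts(3.7))  # any bias value gives the same result

# The difference is only floating-point noise.
print(max(abs(a - b) for a, b in zip(without_bias, with_bias)))
```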
[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes weights from a uniform distribution bounded by ±1/sqrt(fan_in).
-[01:11:11-01:11:40]Scaling weights by 1/sqrt(fan_in), and using PyTorch's batch normalization layer with 200 features.
-[01:14:02-01:14:35]Understanding activations and gradients in neural networks is important, especially as networks get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers and scales the data so that activations in deep neural networks stay roughly Gaussian.
-[01:17:32-01:18:02]Batch normalization, introduced in 2015, was influential in enabling reliable training of much deeper neural nets.
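PyTorch's documentation for nn.Linear states that weights are initialized from U(-sqrt(k), sqrt(k)) with k = 1/in_features. The pure-Python sketch below mimics that rule without calling PyTorch. It checks one consequence: the weight std is 1/sqrt(3 * fan_in), since a uniform distribution on (-a, a) has std a/sqrt(3). The sizes are assumed for illustration.

```python
import math
import random
import statistics

random.seed(0)

def linear_default_init(fan_in, fan_out):
    """Mimic nn.Linear's documented default: U(-sqrt(k), sqrt(k)) with k = 1/fan_in."""
    bound = 1.0 / math.sqrt(fan_in)
    # Stored as (out_features, in_features), the same layout nn.Linear uses.
    return [[random.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]

FAN_IN, FAN_OUT = 30, 200  # illustrative sizes (assumed)
W = linear_default_init(FAN_IN, FAN_OUT)
flat = [w for row in W for w in row]

print(f"max |w| = {max(abs(w) for w in flat):.4f}, bound = {1 / math.sqrt(FAN_IN):.4f}")
print(f"empirical std = {statistics.pstdev(flat):.4f}, "
      f"theory = {1 / math.sqrt(3 * FAN_IN):.4f}")
```

Note that this default does not include the 5/3 tanh gain, so it yields a smaller weight std than the Kaiming-style scaling discussed in section 3.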
[01:23:39-01:55:56]8. Custom PyTorch layer and network analysis
-[01:24:01-01:24:32]Updating the buffers with an exponential moving average inside a torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for the forward and backward passes, with visualizations of the forward-pass activations.
-[01:28:04-01:28:30]Saturation starts at about 20% and then stabilizes at about 5%, with a standard deviation of about 0.65, because the gain is set to 5/3.
-[01:33:19-01:33:50]Setting the gain correctly at 1 prevents the activations from shrinking or diffusing in batch normalization.
-[01:38:41-01:39:11]The last layer's gradients are about 100 times greater, so it trains faster at first, but this self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio of each parameter to ensure efficient training, aiming for about -3 on a log10 plot (updates roughly 1/1000 the size of the parameter values).
-[01:51:36-01:52:04]Introducing batch normalization and PyTorch modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Diagnostic tools for neural networks; initialization and backpropagation remain areas of active research with ongoing progress.
offered by Coursnap
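For the update-to-data diagnostic in section 8, one way to compute it is log10(std(lr * grad) / std(param)) per parameter tensor for a plain SGD step. A value near -3 means each step changes the parameters by roughly a thousandth of their typical magnitude. The parameter and gradient values below are hypothetical, and the helper name is illustrative.

```python
import math
import statistics

def update_to_data_log10(param, grad, lr):
    """log10 of std(lr * grad) / std(param): SGD step size relative to the values."""
    update_std = statistics.pstdev([lr * g for g in grad])
    data_std = statistics.pstdev(param)
    return math.log10(update_std / data_std)

# Hypothetical values for one flattened parameter tensor and its gradient.
param = [0.5, -0.3, 0.8, -0.6, 0.1, -0.2]
grad = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]

for lr in (0.1, 0.01, 0.001):
    print(f"lr={lr}: log10(update/data) = {update_to_data_log10(param, grad, lr):.2f}")
```

Because the ratio is linear in the learning rate, a tenfold change in lr moves the log10 value by exactly 1, which makes the plot a convenient way to judge whether the learning rate is roughly right.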
I keep coming back to these videos again and again. Andrej is a legend!
What is the purpose of bnmean_running and bnstd_running?