Building makemore Part 3: Activations & Gradients, BatchNorm

Andrej Karpathy

1.2M subscribers

457.4k views  •  3 years ago

We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...

103 Comments

@EricRubio7  1 year ago

🐐

@adosar7261  1 year ago

I still can't understand why BatchNorm helps against vanishing/exploding gradients. Are there any ideas?

@theusualcouple  1 year ago

Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏

@zlsj861  1 year ago

🎯Course outline for quick navigation:

[00:00-03:21]1. Continuing and refactoring neural networks for language modeling
-[00:00-00:30]Continuing the makemore implementation with a multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients during training is crucial for optimizing architectures.
-[02:06-02:46]Refactored the code to optimize a neural net with 11,000 parameters over 200,000 steps, achieving a train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to prevent gradient computation (see the sketch below).
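
To illustrate the torch.no_grad decorator from the last bullet, here is a minimal sketch; the eval_loss helper and the tensor shapes are placeholders, not the lecture's code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()  # everything inside runs without gradient tracking
def eval_loss(X, Y, W, b):
    # hypothetical evaluation helper: one linear layer plus cross-entropy
    logits = X @ W + b
    return F.cross_entropy(logits, Y)

X, Y = torch.randn(32, 10), torch.randint(0, 27, (32,))
W = torch.randn(10, 27, requires_grad=True)
b = torch.zeros(27, requires_grad=True)
print(eval_loss(X, Y, W, b))  # the returned loss carries no grad history
```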

[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor network initialization causes a high initial loss of 27, which rapidly decreases to 1 or 2.
-[05:00-05:32]At initialization, the model should predict a roughly uniform distribution over the 27 characters, with probability of about 1/27 for each (see the sketch below).
-[05:49-06:19]Instead, the neural net produces skewed probability distributions, leading to high loss.
-[12:08-12:36]With the fix, the loss at initialization is as expected, and the loss improves to 2.12-2.16.
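
The uniform-prediction bullet above can be made concrete with a short sketch (placeholder tensors, not the lecture's code): the expected initial loss is -log(1/27) ≈ 3.29, so an observed loss of 27 signals confidently wrong logits.

```python
import torch
import torch.nn.functional as F

vocab_size = 27
# a uniform prediction over 27 characters gives the expected loss at initialization
print(-torch.log(torch.tensor(1.0 / vocab_size)))  # ~3.2958

# confidently wrong logits (large random values) push the loss far above 3.29
logits = torch.randn(32, vocab_size) * 10
targets = torch.randint(0, vocab_size, (32,))
print(F.cross_entropy(logits, targets))
```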

[14:24-36:39]3. Neural network initialization
-[16:03-16:31]In the chain rule, the local gradient of tanh vanishes when its outputs are close to -1 or 1, halting backpropagation.
-[18:09-18:38]Concern over gradients being destroyed in the flat regions of the h outputs, examined by analyzing their absolute values.
-[26:03-26:31]Fixing the softmax and tanh layer issues improved the validation loss from 2.17 to 2.10.
-[29:28-30:02]The standard deviation of the pre-activations expands to about three; the aim is a unit Gaussian distribution throughout the net.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the Gaussian to a standard deviation of about 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations, following Kaiming He et al. (see the sketch after this list).
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
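
A minimal sketch of the fan-in scaling described above, using the tanh gain of 5/3; the layer sizes are placeholders, not the lecture's exact values.

```python
import torch

fan_in, fan_out = 30, 200                      # placeholder layer sizes
gain = torch.nn.init.calculate_gain('tanh')    # 5/3 for tanh

# Kaiming-style init: a standard normal scaled by gain / sqrt(fan_in)
W = torch.randn(fan_in, fan_out) * gain / fan_in**0.5

x = torch.randn(1000, fan_in)                  # roughly unit-Gaussian inputs
h = torch.tanh(x @ W)
print(h.std().item())                          # stays near 1 instead of shrinking or exploding
print((h.abs() > 0.97).float().mean().item())  # fraction of saturated tanh units
```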

[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets by keeping hidden states roughly Gaussian, improving performance.
-[40:51-41:13]Batch normalization, from 2015, enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit Gaussian is a perfectly differentiable operation, a key insight of the paper.
-[43:20-43:50]Calculating the standard deviation of the activations; the mean is the average value of each neuron's activation.
-[45:45-46:16]Backpropagation guides how the distribution moves; a learnable scale and shift produce the final output (see the sketch below).
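
A minimal sketch of the standardize-then-scale-and-shift operation described in this section; the batch size, hidden width, and parameter names are assumptions in the spirit of the lecture.

```python
import torch

B, H = 32, 200                       # placeholder batch size and hidden width
hpreact = torch.randn(B, H) * 3      # pre-activations with too wide a distribution

bngain = torch.ones(1, H)            # learnable scale (gamma)
bnbias = torch.zeros(1, H)           # learnable shift (beta)

mean = hpreact.mean(0, keepdim=True)
std = hpreact.std(0, keepdim=True)
hpreact = bngain * (hpreact - mean) / std + bnbias  # unit Gaussian, then scale and shift

print(hpreact.mean().item(), hpreact.std().item())  # ~0 and ~1 at initialization
```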

[51:52-01:01:35]5. Jittering and batch normalization in neural network training
-[52:10-52:37]Jittering input examples (through their coupling in a batch) adds entropy, augments the data, and regularizes the neural net.
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]The batch normalization paper introduces running mean and standard deviation estimates during training (see the sketch below).
-[01:00:46-01:01:10]The running estimates eliminate the explicit calibration stage; the epsilon prevents division by zero.
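
A hedged sketch of the running-estimate update mentioned above; the buffer names, momentum of 0.001, and epsilon of 1e-5 are assumptions in the spirit of the lecture.

```python
import torch

H = 200
bnmean_running = torch.zeros(1, H)   # running estimates, used at inference time
bnstd_running = torch.ones(1, H)
momentum, eps = 0.001, 1e-5          # assumed hyperparameters

hpreact = torch.randn(32, H)         # placeholder batch of pre-activations
bnmeani = hpreact.mean(0, keepdim=True)
bnstdi = hpreact.std(0, keepdim=True)

# normalize with the batch statistics; eps guards against division by zero
hnorm = (hpreact - bnmeani) / (bnstdi + eps)

# update the running estimates outside the computation graph
with torch.no_grad():
    bnmean_running = (1 - momentum) * bnmean_running + momentum * bnmeani
    bnstd_running = (1 - momentum) * bnstd_running + momentum * bnstdi
```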

[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases preceding batch normalization are subtracted out by the mean, reducing their impact to zero.
-[01:03:13-01:03:53]Using batch normalization to control activations in the neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks from repeated weight layer, normalization, and non-linearity blocks, as exemplified in the provided code (see the sketch below).
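
A minimal sketch of one such weight/normalization/non-linearity block; the width is a placeholder, and bias=False reflects the point above that the norm would subtract the bias out anyway.

```python
import torch
import torch.nn as nn

hidden = 200  # placeholder width
block = nn.Sequential(
    nn.Linear(hidden, hidden, bias=False),  # bias would be removed by the norm anyway
    nn.BatchNorm1d(hidden),                 # normalization layer
    nn.Tanh(),                              # non-linearity
)
print(block(torch.randn(32, hidden)).shape)  # torch.Size([32, 200])
```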

[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes linear-layer weights from a uniform distribution bounded by 1/sqrt(fan_in) (see the sketch after this list).
-[01:11:11-01:11:40]Scaling weights by 1/sqrt(fan_in); using PyTorch's batch normalization layer with 200 features.
-[01:14:02-01:14:35]The importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers the data, keeping activations Gaussian in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential since 2015, enabled reliable training of much deeper neural nets.
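
A quick check of the 1/sqrt(fan_in) uniform bound described above for a default nn.Linear; the layer sizes are placeholders.

```python
import math
import torch.nn as nn

fan_in = 200
layer = nn.Linear(fan_in, 100)
bound = 1 / math.sqrt(fan_in)

# default weights are drawn from a uniform distribution within (-bound, bound)
print(layer.weight.abs().max().item() <= bound)  # True
```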

[01:23:39-01:55:56]8. Custom PyTorch layers and network analysis
-[01:24:01-01:24:32]Updating the buffers with an exponential moving average inside a torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for the forward and backward passes, with visualizations of the forward-pass activations.
-[01:28:04-01:28:30]Saturation starts around 20%, then stabilizes at about 5% with a standard deviation of 0.65, due to the gain being set at 5/3.
-[01:33:19-01:33:50]Setting the gain correctly at 1 prevents shrinking and diffusion with batch normalization.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but this self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio of the parameters to ensure efficient training, aiming for about -3 on a log10 plot (see the sketch below).
-[01:51:36-01:52:04]Introduced batch normalization and PyTorch-style modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Diagnostic tools for neural networks; initialization and backpropagation remain areas of active research and ongoing progress.
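
A minimal sketch of the update-to-data ratio diagnostic mentioned above; the parameter, loss, and learning rate are placeholders standing in for a real model.

```python
import torch

p = torch.randn(200, 27, requires_grad=True)  # placeholder parameter
loss = (p ** 2).mean()                        # placeholder loss
loss.backward()

lr = 0.1
with torch.no_grad():
    # ratio of the update's scale to the data's scale, on a log10 axis;
    # values around -3 suggest the learning rate is in a healthy range
    ratio = (lr * p.grad.std() / p.data.std()).log10()
    print(ratio.item())
```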

offered by Coursnap

@adamskrodzki6152  1 year ago

Amazing, knowledge that is hell hard to find in other videos and also, you have AMAZING skill in clearly explaining complex stuff.

@styssine  1 year ago

This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.

@lucianovidal8721  1 year ago

The amount of useful information in this video is impressive. Thanks for such good content.

@sanjaybhatikar  1 year ago

I keep coming back to these videos again and again. Andrej is legend!

@JuliusSmith  1 year ago

Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!

@pravingaikwad1337  1 year ago

what is the purpose of bnmean_running and bnstd_running?
