Building makemore Part 3: Activations & Gradients, BatchNorm

Andrej Karpathy

1.3M subscribers

475.4k views  •  3 years ago

We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...

111 Comments

@EricRubio7  2 years ago

🐐

@adosar7261  2 years ago

I still can't understand why BatchNorm helps against vanishing/exploding gradients. Any ideas?

@theusualcouple  2 years ago

Thank you @Andrej for bringing this series. You are a great teacher; the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏

@zlsj861  2 years ago

🎯Course outline for quick navigation:

[00:00-03:21]1. …ng and refactoring neural networks for language modeling
-[00:00-00:30]Continuing makemore implementation with multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients in training is crucial for optimizing architectures.
-[02:06-02:46]Refactored code to optimize neural net with 11,000 parameters over 200,000 steps, achieving train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to prevent gradient computation.

[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor network initialization causes a high initial loss of 27, which rapidly decreases to 1 or 2.
-[05:00-05:32]At initialization, the model should produce a uniform distribution over the 27 characters, with roughly 1/27 probability for each.
-[05:49-06:19]Instead, the neural net produces skewed probability distributions, leading to high loss.
-[12:08-12:36]With the fix, loss at initialization is as expected, and the final loss improves to 2.12-2.16.
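
The initialization claims in the bullets above can be sanity-checked with a little arithmetic: if the net started from a uniform distribution over the 27 characters, the cross-entropy loss would be about 3.3, far below the observed 27. A minimal sketch in plain Python (the 27-character vocabulary is from the lecture; the snippet itself is just illustrative):

```python
import math

# With 27 possible characters, a well-initialized network should start
# near a uniform distribution, so each character gets probability 1/27.
vocab_size = 27
uniform_prob = 1 / vocab_size

# Cross-entropy loss is the negative log-probability of the correct character.
expected_initial_loss = -math.log(uniform_prob)
print(round(expected_initial_loss, 2))  # ≈ 3.3
```

A starting loss of 27 therefore signals that the softmax logits are far too confident at initialization.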

[14:24-36:39]3. Neural network initialization
-[16:03-16:31]The chain rule with local gradient is affected when outputs of tanh are close to -1 or 1, leading to a halt in back propagation.
-[18:09-18:38]Concern over destructive gradients in flat regions of h outputs, tackled by analyzing absolute values.
-[26:03-26:31]Optimization improved validation loss from 2.17 to 2.10 by fixing the softmax and tanh layer issues.
-[29:28-30:02]Multiplying by the weights expands the standard deviation to 3; the aim is a unit gaussian distribution throughout the net.
-[30:17-30:47]Scaling the weights down by 0.2 shrinks the gaussian to a standard deviation of 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations, following Kaiming He et al.
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
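
The Kaiming-style scaling described above can be sketched with stdlib Python only (the lecture uses PyTorch): drawing weights with standard deviation gain/sqrt(fan_in) keeps a neuron's output spread near the gain instead of growing with fan-in. The layer sizes here are illustrative, not the lecture's.

```python
import math
import random

random.seed(0)
fan_in, n_samples = 200, 2000  # illustrative sizes, not from the lecture
gain = 5 / 3                   # the tanh gain discussed in the lecture

# Kaiming-style scaling: weight std = gain / sqrt(fan_in).
w = [random.gauss(0, gain / math.sqrt(fan_in)) for _ in range(fan_in)]

# Push unit-gaussian inputs through one linear neuron and measure the spread.
outs = []
for _ in range(n_samples):
    x = [random.gauss(0, 1) for _ in range(fan_in)]
    outs.append(sum(wi * xi for wi, xi in zip(w, x)))

mean = sum(outs) / n_samples
std = (sum((o - mean) ** 2 for o in outs) / n_samples) ** 0.5
# std lands near the gain (~1.67) regardless of fan_in,
# instead of blowing up like sqrt(fan_in) would with unscaled weights.
```

Without the 1/sqrt(fan_in) factor, the output std would be about sqrt(200) ≈ 14 here, pushing tanh deep into saturation.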

[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets, ensuring roughly gaussian hidden states for improved performance.
-[40:51-41:13]Batch normalization from 2015 enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit gaussian is a perfectly differentiable operation, a key insight in the paper.
-[43:20-43:50]Calculating the mean and standard deviation of the activations; the mean is the average value of each neuron's activation.
-[45:45-46:16]Backpropagation guides the distribution's movement, with a learnable scale and shift added for the final output.
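
The standardization step in the bullets above is simple enough to sketch in plain Python for a single neuron over one batch (the lecture works with PyTorch tensors; values here are illustrative):

```python
eps = 1e-5             # prevents division by zero, as noted later in the outline
gain, bias = 1.0, 0.0  # the learnable scale and shift, at their initial values

batch = [2.0, 4.0, 6.0, 8.0]   # one neuron's pre-activations across a batch
mean = sum(batch) / len(batch)                            # batch mean
var = sum((x - mean) ** 2 for x in batch) / len(batch)    # batch variance
normed = [gain * (x - mean) / (var + eps) ** 0.5 + bias for x in batch]
# After standardization the batch is roughly unit gaussian: mean 0, std 1.
```

Because every operation here is differentiable, gradients flow through the normalization, which is the key insight the outline attributes to the paper.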

[51:52-01:01:35]5. Jittering and batch normalization in neural network training
-[52:10-52:37]Jittering input examples adds entropy, augments the data, and regularizes the neural net.
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]Batch normalization paper introduces running mean and standard deviation estimation during training.
-[01:00:46-01:01:10]Eliminated explicit calibration stage, almost done with batch normalization, epsilon prevents division by zero.
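
The running-statistics idea above (which also answers a question in this thread about bnmean_running and bnstd_running) can be sketched as an exponential moving average updated during training, so no separate calibration pass is needed at inference. The 0.999/0.001 momentum follows the lecture's convention; the batch statistics are made-up numbers.

```python
momentum = 0.001  # the update rate used in the lecture
bnmean_running, bnstd_running = 0.0, 1.0  # initialized to the identity transform

# During training, each batch's mean/std nudges the running estimates.
for batch_mean, batch_std in [(0.5, 1.2), (0.4, 1.1), (0.6, 1.3)]:
    bnmean_running = (1 - momentum) * bnmean_running + momentum * batch_mean
    bnstd_running = (1 - momentum) * bnstd_running + momentum * batch_std

# At inference time, bnmean_running / bnstd_running replace the batch
# statistics, so a single example can be normalized without a batch.
```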

[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases are subtracted out by the mean in batch normalization, so they have no effect and can be removed.
-[01:03:13-01:03:53]Using batch normalization to control activations in neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks with weight layers, normalization, and non-linearity, as exemplified in the provided code.

[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes weights from a uniform distribution scaled by 1/sqrt(fan_in).
-[01:11:11-01:11:40]Scaling weights by 1/sqrt(fan_in); using the batch normalization layer in PyTorch with 200 features.
-[01:14:02-01:14:35]Importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers data for gaussian activations in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential in 2015, enabled reliable training of much deeper neural nets.

[01:23:39-01:55:56]8. Custom PyTorch layer and network analysis
-[01:24:01-01:24:32]Updating buffers using an exponential moving average inside the torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for the forward and backward passes, with visualizations of the forward-pass activations.
-[01:28:04-01:28:30]Saturation starts around 20%, then settles at 5% with a standard deviation of 0.65, due to the gain being set at 5/3.
-[01:33:19-01:33:50]Setting gain correctly at 1 prevents shrinking and diffusion in batch normalization.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but it self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio of the parameters to ensure efficient training, aiming for about -3 on a log10 plot.
-[01:51:36-01:52:04]Introduced batch normalization and PyTorch-style modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Diagnostics, initialization, and backpropagation remain areas of active research, with ongoing progress.
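
The update-ratio diagnostic mentioned above compares the size of each SGD step to the size of the parameter it updates; log10 of that ratio should hover around -3. A minimal sketch with made-up, illustrative magnitudes (the lecture computes this per parameter tensor in PyTorch):

```python
import math

lr = 0.1          # learning rate
param_std = 0.8   # spread of a parameter tensor (illustrative)
grad_std = 0.008  # spread of its gradient (illustrative)

# Ratio of the update magnitude to the parameter magnitude.
update_ratio = (lr * grad_std) / param_std
print(math.log10(update_ratio))  # ≈ -3, i.e. updates ~1/1000 the data scale
```

Much above -3 suggests the learning rate is too high; much below suggests the parameters are barely training.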

offered by Coursnap

@adamskrodzki6152  2 years ago

Amazing, knowledge that is really hard to find in other videos. You also have an AMAZING skill for clearly explaining complex stuff.

@styssine  2 years ago

This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.

@lucianovidal8721  2 years ago

The amount of useful information in this video is impressive. Thanks for such good content.

@sanjaybhatikar  2 years ago

I keep coming back to these videos again and again. Andrej is legend!

@JuliusSmith  2 years ago

Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!

@pravingaikwad1337  2 years ago

What is the purpose of bnmean_running and bnstd_running?
