Building makemore Part 3: Activations & Gradients, BatchNorm

Andrej Karpathy

446K subscribers

240.3k views  •  1 year ago

We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, ...


41 Comments

@EricRubio7  2 months ago

🐐

@adosar7261  2 months ago

I still can't understand why BatchNorm helps against vanishing/exploding gradients. Any ideas?

@theusualcouple  3 months ago

Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏

@zlsj861  3 months ago

🎯Course outline for quick navigation:

[00:00-03:21]1. Continuing and refactoring neural networks for language modeling
-[00:00-00:30]Continuing makemore implementation with multilayer perceptron for character-level language modeling, planning to move to larger neural networks.
-[00:31-01:03]Understanding neural net activations and gradients in training is crucial for optimizing architectures.
-[02:06-02:46]Refactored code to optimize neural net with 11,000 parameters over 200,000 steps, achieving train and val loss of 2.16.
-[03:03-03:28]Using the torch.no_grad decorator to disable gradient computation (see the sketch below).
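
A minimal sketch of what the @torch.no_grad() decorator does; the function name and tensors here are illustrative, not the video's exact code. Inside the decorated function PyTorch skips building the computation graph, which is why evaluation becomes cheaper.

import torch
import torch.nn.functional as F

@torch.no_grad()                           # no computation graph is recorded inside this function
def split_loss(X, Y, W, b):
    logits = X @ W + b                     # forward pass only, no gradient tracking
    return F.cross_entropy(logits, Y)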

[03:22-14:22]2. Efficiency of torch.no_grad and neural net initialization issues
-[03:22-04:00]Using torch's no_grad makes computation more efficient by eliminating gradient tracking.
-[04:22-04:50]Poor network initialization causes a high initial loss of 27, which rapidly decreases to 1 or 2.
-[05:00-05:32]At initialization, the model should produce a roughly uniform distribution over the 27 characters, with about 1/27 probability for each (see the check below).
-[05:49-06:19]Instead, the neural net produces skewed probability distributions, leading to a high loss.
-[12:08-12:36]Loss at initialization is as expected, and the loss improved to 2.12-2.16.
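
A quick sanity check of the numbers above, assuming the 27-character vocabulary of makemore: the loss we expect from a uniform prediction is -log(1/27).

import torch

# expected negative log-likelihood of a uniform guess over 27 characters
expected = -torch.log(torch.tensor(1.0 / 27))
print(expected.item())   # ~3.29, so an initial loss near 27 signals a badly scaled output layer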

[14:24-36:39]3. Neural network initialization
-[16:03-16:31]In the chain rule, the local gradient of tanh vanishes when its outputs are close to -1 or 1, halting backpropagation through those units.
-[18:09-18:38]Concern over destructive gradients in flat regions of h outputs, tackled by analyzing absolute values.
-[26:03-26:31]Optimization led to improved validation loss, from 2.17 to 2.10, by fixing the softmax and tanh layer issues.
-[29:28-30:02]The standard deviation of the pre-activations has expanded to three, when the aim is a unit gaussian distribution in the net.
-[30:17-30:47]Scaling down by 0.2 shrinks the gaussian to a standard deviation of 0.6.
-[31:03-31:46]Initializing neural network weights for well-behaved activations (Kaiming He et al.), sketched below.
-[36:24-36:55]Modern innovations have improved network stability and behavior, including residual connections, normalization layers, and better optimizers.
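
A rough sketch of the Kaiming-style scaling discussed above; the layer sizes are assumptions, and 5/3 is the standard gain for tanh.

import torch

fan_in, fan_out = 30, 200                  # assumed layer sizes
gain = 5 / 3                               # recommended gain for tanh non-linearities
W = torch.randn(fan_in, fan_out) * gain / fan_in**0.5
# torch.nn.init.kaiming_normal_ implements the same idea for existing tensors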

[36:39-51:52]4. Neural net initialization and batch normalization
-[36:39-37:05]Modern innovations like normalization layers and better optimizers reduce the need for precise neural net initialization.
-[40:32-43:04]Batch normalization enables reliable training of deep neural nets, ensuring roughly gaussian hidden states for improved performance.
-[40:51-41:13]Batch normalization from 2015 enabled reliable training of deep neural nets.
-[41:39-42:09]Standardizing hidden states to be unit gaussian is a perfectly differentiable operation, a key insight in the paper.
-[43:20-43:50]Calculating the standard deviation of activations; the mean is the average value of a neuron's activation.
-[45:45-46:16]Backpropagation guides how the distribution moves; a learnable scale and shift produce the final output (see the sketch below).
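
A minimal batch-norm forward pass in the spirit of the lecture; the batch size and hidden size are assumptions, and bngain/bnbias play the role of the learnable scale and shift.

import torch

hpreact = torch.randn(32, 200)             # a batch of pre-activations (assumed shape)
bngain = torch.ones(1, 200)                # learnable scale (gamma)
bnbias = torch.zeros(1, 200)               # learnable shift (beta)
bnmean = hpreact.mean(0, keepdim=True)     # per-neuron mean over the batch
bnstd = hpreact.std(0, keepdim=True)       # per-neuron standard deviation over the batch
hpreact = bngain * (hpreact - bnmean) / (bnstd + 1e-5) + bnbias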

[51:52-01:01:35]5. Jittering and batch normalization in neural network training
-[52:10-52:37]Jittering input examples adds entropy, augments the data, and regularizes neural nets.
-[53:44-54:09]Batch normalization effectively controls activations and their distributions.
-[56:05-56:33]Batch normalization paper introduces running mean and standard deviation estimation during training.
-[01:00:46-01:01:10]Running estimates eliminate the explicit calibration stage (sketched below), nearly completing batch normalization; epsilon prevents division by zero.
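
A sketch of the running estimates mentioned above; the 0.999 momentum and tensor shapes are assumptions. These buffers are what allow inference without a separate calibration pass.

import torch

hpreact = torch.randn(32, 200)                   # stand-in batch of pre-activations
bnmean_running = torch.zeros(1, 200)             # buffers, not trained by gradient descent
bnstd_running = torch.ones(1, 200)
with torch.no_grad():                            # buffer updates stay outside the graph
    bnmeani = hpreact.mean(0, keepdim=True)
    bnstdi = hpreact.std(0, keepdim=True)
    bnmean_running = 0.999 * bnmean_running + 0.001 * bnmeani
    bnstd_running = 0.999 * bnstd_running + 0.001 * bnstdi
# at inference time, normalize with the running estimates instead of batch statistics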

[01:01:36-01:09:21]6. Batch normalization and ResNet in PyTorch
-[01:02:00-01:02:30]Biases are subtracted out in batch normalization, reducing their impact to zero.
-[01:03:13-01:03:53]Using batch normalization to control activations in neural net, with gain, bias, mean, and standard deviation parameters.
-[01:07:25-01:07:53]Creating deep neural networks with weight layers, normalization, and non-linearity, as exemplified in the provided code.
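
An illustrative PyTorch stack of the pattern described above (layer sizes are assumptions); the Linear bias is dropped because batch norm's mean subtraction cancels it anyway.

import torch.nn as nn

block = nn.Sequential(
    nn.Linear(200, 200, bias=False),   # weight layer; bias is redundant before batch norm
    nn.BatchNorm1d(200),               # normalization layer
    nn.Tanh(),                         # non-linearity
)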

[01:09:21-01:23:37]7. PyTorch weight initialization and batch normalization
-[01:10:05-01:10:32]PyTorch initializes weights from a uniform distribution scaled by 1/sqrt(fan-in) (see the check below).
-[01:11:11-01:11:40]Scaling weights by 1/sqrt(fan-in); using PyTorch's batch normalization layer with 200 features.
-[01:14:02-01:14:35]Importance of understanding activations and gradients in neural networks, especially as they get bigger and deeper.
-[01:16:00-01:16:30]Batch normalization centers data for gaussian activations in deep neural networks.
-[01:17:32-01:18:02]Batch normalization, influential in 2015, enabled reliable training of much deeper neural nets.
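
A quick check of the default initialization described above (the layer sizes are arbitrary): nn.Linear draws its weights uniformly from roughly (-1/sqrt(fan_in), +1/sqrt(fan_in)).

import torch.nn as nn

layer = nn.Linear(200, 100)                           # fan_in = 200
print(layer.weight.min().item(), layer.weight.max().item())
# both values land near ±1/200**0.5 ≈ ±0.0707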

[01:23:39-01:55:56]8. Custom pytorch layer and network analysis
-[01:24:01-01:24:32]Updating buffers using an exponential moving average inside the torch.no_grad context manager.
-[01:25:47-01:27:11]The model has 46,000 parameters and uses PyTorch for the forward and backward passes, with visualizations of the forward-pass activations.
-[01:28:04-01:28:30]Saturation starts around 20%, then stabilizes at 5% with a standard deviation of 0.65, due to the gain being set at 5/3.
-[01:33:19-01:33:50]Setting gain correctly at 1 prevents shrinking and diffusion in batch normalization.
-[01:38:41-01:39:11]The last layer has gradients 100 times greater, causing faster training, but it self-corrects with longer training.
-[01:43:18-01:43:42]Monitoring the update-to-data ratio for parameters to ensure efficient training, aiming for about -3 on a log10 plot (see the sketch below).
-[01:51:36-01:52:04]Introduced batch normalization and PyTorch-style modules for neural networks.
-[01:52:39-01:53:06]Introduction to diagnostic tools for neural network analysis.
-[01:54:45-01:55:50]Introduction to diagnostic tools in neural networks; initialization and backpropagation remain areas of active research and ongoing progress.
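
A sketch of the update-to-data ratio diagnostic mentioned above (the learning rate and tensors are stand-ins): compare the size of one SGD step to the size of the parameter it updates, and look for roughly -3 on a log10 scale.

import torch

lr = 0.1                                   # assumed learning rate
p = torch.randn(200, 100)                  # stand-in parameter tensor
grad = torch.randn_like(p) * 1e-2          # stand-in gradient
ratio = (lr * grad).std() / p.std()        # update size relative to data size
print(torch.log10(ratio).item())           # ~-3 here; much larger or smaller suggests a bad learning rate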

offered by Coursnap

@adamskrodzki6152  3 months ago

Amazing, knowledge that is hell hard to find in other videos and also, you have AMAZING skill in clearly explaining complex stuff.

@styssine  3 months ago

This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.

@lucianovidal8721  3 months ago

The amount of useful information in this video is impressive. Thanks for such good content.

@sanjaybhatikar  3 months ago

I keep coming back to these videos again and again. Andrej is legend!

@JuliusSmith  3 months ago

Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!

@pravingaikwad1337  3 months ago

what is the purpose of bnmean_running and bnstd_running?
