Andrej Karpathy
458K subscribers
We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we ...
18 Comments
Smash the like button if you're also excited to see a breakdown of how to build DeepSeek R1 from scratch! 🚀
I am following your work in C++ now in addition to Raff K. I think you would love to render float space, I do this.
Thank you, thanks a lot Andrej, for providing such a course for free. I learnt a lot from this course and am very thankful to you :)
I'm training my GPT-2 model on an old 730 card with 2 GB of RAM.
It will also run a pre-trained GPT-2 LLM out of the box, but you have to install Python 3.7 and CUDA 9.2 to support it.
If you've ever taken a backprop class in college, you can use "gradient accumulation" to get an effective minibatch of any size you want, with the small memory footprint of a single sample.
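The gradient accumulation idea in the comment above can be sketched as follows. This is a minimal illustration, not the commenter's training code: the tiny `nn.Linear` model, the random data, and the names `accum_steps`, `full_grad`, and `accum_grad` are all assumptions made for the example. It checks that summing scaled gradients over single-sample micro-batches gives the same gradient as one full batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: a tiny linear model and random data stand in for GPT-2.
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_fn = nn.MSELoss()  # averages over all elements in the batch

# Reference: gradient from one full batch of 8 samples.
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Gradient accumulation: 8 micro-batches of a single sample each.
accum_steps = 8
model.zero_grad()
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    # Divide by accum_steps so the summed gradients match the full-batch mean.
    loss = loss_fn(model(xb), yb) / accum_steps
    loss.backward()  # backward() adds into .grad instead of overwriting it
accum_grad = model.weight.grad.clone()

# An optimizer step would normally follow here, once per accumulated batch.
print(torch.allclose(full_grad, accum_grad, atol=1e-6))
```

Only one micro-batch of activations is alive at a time, which is where the memory saving comes from; the cost is more forward and backward passes per optimizer step.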
Thanks!
Hello Andrej, thank you so much for creating and sharing such high quality resources. It's really impressive and super helpful.
Andrej! Multimodal. RAG! 🎉 🙏🙏🤗
If anyone is following along from the nn-zero-to-hero implementation and is confused about why these OpenAI weight matrices ['attn.c_attn.weight', 'attn.c_proj.weight', 'mlp.c_fc.weight'] are transposed, here's an explanation.
In previous videos, we simply defined our own Linear class as
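    import torch

    g = torch.Generator().manual_seed(2147483647)  # seeded generator; the seed value is arbitrary

    class Layer:
        def __init__(self, fan_in, fan_out, bias=False):
            # the weight is stored with shape (fan_in, fan_out)
            self.w = torch.randn((fan_in, fan_out), generator=g)  # / fan_in**0.5  # Kaiming init (disabled here)
            self.bias = bias
            if bias:
                self.b = torch.zeros(fan_out)

        def __call__(self, x):
            y = x @ self.w  # no transpose needed with this layout
            self.out = y + self.b if self.bias else y
            return self.out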
But this is a slight deviation from PyTorch's implementation. If you go to PyTorch's nn.Linear documentation, the weight matrix is constructed with shape (out_features, in_features), and when it is applied it is simply transposed: y = x @ W.T + b. That is exactly the reverse of what we did in our previous implementation.
So the proper way is to adhere to PyTorch's convention, i.e. if we define a linear layer nn.Linear(fan_in, fan_out), our weight will have shape (fan_out, fan_in).
The OpenAI checkpoint stores these particular weights in the (fan_in, fan_out) layout, like our Layer class (the Hugging Face GPT-2 port uses a Conv1D module for them), so they have to be transposed before being copied into nn.Linear.
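To make the shape convention concrete, here is a small sketch. The sizes and the names `w_in_out` and `same` are made up for illustration; the random (fan_in, fan_out) matrix only stands in for a checkpoint weight. It shows that such a weight must be transposed before it is copied into `nn.Linear`, and that the result then matches the plain `x @ w` product.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fan_in, fan_out = 3, 5

# A weight stored as (fan_in, fan_out), like the Layer class from the earlier videos.
w_in_out = torch.randn(fan_in, fan_out)

# nn.Linear stores its weight as (out_features, in_features).
linear = nn.Linear(fan_in, fan_out, bias=False)
print(linear.weight.shape)  # torch.Size([5, 3])

# Copy the transposed weight in, the same way a (fan_in, fan_out) checkpoint tensor would be loaded.
with torch.no_grad():
    linear.weight.copy_(w_in_out.t())

x = torch.randn(2, fan_in)
# nn.Linear computes x @ W.T, which is now identical to x @ w_in_out.
same = torch.allclose(x @ w_in_out, linear(x), atol=1e-6)
print(same)
```

Copying without the transpose would fail with a shape mismatch here, since (3, 5) does not match (5, 3). With square matrices it would silently load the wrong weights, which is why the transposed keys are listed explicitly.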
The fact that the number of current subs and views are pretty close is driving me crazy!
God bless you