CAG intro + Build a MCP server that read API docs
AI Jason
171K subscribers
Set up Helicone to monitor your LLM app cost now: https://www.helicone.ai/?utm_source=ai-jason
Join AI builder club to access [...]ple & Doc MCP: http://aibuilderclub.com/
32 Comments
More context = less attention, more latency, less repeatability in answer generation, more tokens, less concurrency, more cost
It's true, but CAG is limited by TPM (tokens per minute) and tokens-per-request rate limits. With these limits you need to play around with LLM models to implement CAG systems, so if you have an enormous [...] of context, CAG is not the way.
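The rate-limit point in the comment above can be made concrete with a little arithmetic. This is a minimal sketch, not a real provider's accounting: the function names, the 4-characters-per-token heuristic, and the limit values are all assumptions for illustration, and output tokens are ignored.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed heuristic; real tokenizers differ)."""
    return max(1, round(len(text) / chars_per_token))


def cag_budget(knowledge: str, question: str,
               max_tokens_per_request: int, tokens_per_minute: int) -> dict:
    """Check whether preloading the whole knowledge base fits the limits.

    With CAG-style preloading, every request carries the full knowledge
    base plus the question, so the per-request cost is roughly constant.
    """
    per_request = estimate_tokens(knowledge) + estimate_tokens(question)
    fits = per_request <= max_tokens_per_request
    # If a single request does not fit, no requests can be sent at all.
    per_minute = tokens_per_minute // per_request if fits else 0
    return {
        "tokens_per_request": per_request,
        "fits_in_one_request": fits,
        "max_requests_per_minute": per_minute,
    }


# Hypothetical limits: 128k tokens per request, 450k tokens per minute.
budget = cag_budget("x" * 400_000, "y" * 40,
                    max_tokens_per_request=128_000,
                    tokens_per_minute=450_000)
print(budget)
```

With these made-up numbers the knowledge base fits in one request, but the per-minute limit allows only a handful of questions per minute, which is the concurrency cost the commenters are pointing at.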
This is just a worse version of RAG, why are we going backwards?
I don't think this is CAG, this is just a BFP (big f!cking prompt). I know there was a research paper about CAG, and I'm pretty sure it requires manipulating the internals of the model.
Helicone is honestly awful; I would recommend using Langfuse instead. We used to be paying users of Helicone, but their software is so slow and sluggish.
Strongly agree with this approach, which I am experimenting with as well: with RAG frameworks, even with multi-hop queries etc., retrieval was really complicated. With CAG it destroys every query I do. [...] even more clever, I think, for large datasets is something like a simplified GraphRAG: label the docs with tags, put every document that has relevant tags into the cache, and still perform a RAG query over all the documents for precise, local requests (for example, a query like "who is xxx", where RAG works well on proper names) to find which document the info is in. Then load into the cache all the documents with the same tags as the one found by RAG and boom, knowledge issue is done.
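The tag idea in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Doc`, `retrieve`, and `docs_for_cache` are hypothetical names, the sample documents are made up, and the word-overlap lookup is a stand-in for a real RAG retriever (embeddings, vector store, etc.).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    text: str
    tags: frozenset


def words(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(docs: list, query: str) -> Doc:
    """Stand-in for the RAG step: the doc sharing the most words with the query."""
    q = words(query)
    return max(docs, key=lambda d: len(q & words(d.text)))


def docs_for_cache(docs: list, query: str) -> list:
    """Find one doc via retrieval, then select every doc sharing a tag with it."""
    hit = retrieve(docs, query)
    return [d for d in docs if d.tags & hit.tags]


docs = [
    Doc("team.md", "Alice leads the billing team", frozenset({"people", "billing"})),
    Doc("billing_api.md", "The invoices endpoint returns paged results", frozenset({"billing", "api"})),
    Doc("deploy.md", "Deploys run from the main branch", frozenset({"ops"})),
]

selected = docs_for_cache(docs, "Who is Alice?")
print([d.name for d in selected])
```

The selected documents would then be concatenated into the preloaded context, so the model sees the retrieved document plus its tagged neighbours instead of the whole corpus.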
Thank you very much for the information. This is definitely better than RAG.
How could we implement this in n8n?
Preloading whole database into context? 🤣
I don't feel like you described actual CAG. You left out the mechanism of the 'C' (cache). CAG actually caches the KV computed values of your static knowledge base in the first l[...] he model. Then, any incoming prompts are added as tokens AFTER that precomputed data. The model has to do far fewer computations to begin outputting the first tokens. This maximizes speed to first token, which is a huge part of building production chat/agents. However, this approach will require some model interface/API changes and is currently only supported in Gemini's latest offering as far as I know. There is another video you can search for, "rag vs cag solving knowledge gaps" by IBM, which goes into the caching mechanism.
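To illustrate the caching mechanism the comment above describes, here is a toy sketch in pure Python. It is not how any real model or provider API works internally: it uses a single attention step with no positional encoding, and `embed`, `project`, `kv`, and `attend` are made-up stand-ins for learned components. The only point is that keys and values for the static knowledge can be computed once and reused, with new prompt tokens appended after them.

```python
import math

DIM = 4


def embed(token: str) -> list[float]:
    """Toy deterministic embedding (hypothetical; real models learn these)."""
    return [((sum(map(ord, token)) * (i + 3)) % 17) / 17.0 for i in range(DIM)]


def project(x: list[float], scale: float) -> list[float]:
    """Stand-in for a learned linear projection."""
    return [scale * v for v in x]


def kv(tokens: list[str]):
    """Compute key and value vectors per token (the part worth caching)."""
    keys = [project(embed(t), 0.5) for t in tokens]
    values = [project(embed(t), 2.0) for t in tokens]
    return keys, values


def attend(query_token: str, keys, values) -> list[float]:
    """Softmax attention of one query token over all keys."""
    q = project(embed(query_token), 1.0)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(DIM)]


knowledge = "the refund window is 30 days".split()

# Computed once, before any question arrives.
cached_keys, cached_values = kv(knowledge)


def answer_with_cache(question: list[str]) -> list[float]:
    """Only the question tokens need fresh keys/values."""
    q_keys, q_values = kv(question)
    return attend(question[-1], cached_keys + q_keys, cached_values + q_values)


def answer_from_scratch(question: list[str]) -> list[float]:
    """Recompute keys/values for knowledge + question every time."""
    keys, values = kv(knowledge + question)
    return attend(question[-1], keys, values)


question = "how long is the refund window".split()
print(answer_with_cache(question) == answer_from_scratch(question))
```

Both paths give the same attention output; the cached path simply skips recomputing keys and values for the knowledge tokens, which is where the time-to-first-token saving would come from in a real model.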