CAG intro + Build a MCP server that read API docs
AI Jason
186K subscribers
Set up Helicone to monitor your LLM app cost now: https://www.helicone.ai/?utm_source=ai-jason
Join AI builder club to access […]ple & Doc MCP: http://aibuilderclub.com/
50 Comments
More context = less attention, more latency, less repeatability in answer generation, more tokens, less concurrency, more cost
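The cost term in that tradeoff is easy to put numbers on. A rough back-of-envelope sketch, where the per-token price, context size, and request volume are all invented for illustration:

```python
# Back-of-envelope cost of resending a large static context with every request.
# The $3/1M-token price and the volumes below are hypothetical assumptions.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed

def monthly_input_cost(context_tokens: int, query_tokens: int, requests_per_month: int) -> float:
    """Input-token cost when the full context is resent on every request."""
    tokens = (context_tokens + query_tokens) * requests_per_month
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

# 100k-token knowledge base stuffed into each of 10,000 monthly requests:
full_context = monthly_input_cost(100_000, 200, 10_000)
# versus a retrieval-style setup that sends only ~2k relevant tokens:
retrieved = monthly_input_cost(2_000, 200, 10_000)
print(f"full context: ${full_context:,.2f}/mo, retrieval: ${retrieved:,.2f}/mo")
```

Provider-side prompt caching typically discounts (rather than eliminates) the repeated-prefix cost, so the real gap sits somewhere between these two figures.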
It's true, but CAG runs into TPM (tokens-per-minute) and tokens-per-request rate limits. Within those limits you have to experiment with which LLM models can support a CAG system, so if you have an enormous amount of context, CAG is not the way.
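The rate-limit concern can be made concrete with a small client-side budget check. A minimal sliding-window sketch; the 200k TPM and 128k tokens-per-request limits below are made-up numbers, not any particular provider's real quotas:

```python
import time
from collections import deque
from typing import Optional

class TPMLimiter:
    """Sliding-window tokens-per-minute budget check.
    The limits passed in are assumptions; real values come from your
    provider's rate-limit documentation."""

    def __init__(self, tokens_per_minute: int, tokens_per_request: int):
        self.tpm = tokens_per_minute
        self.tpr = tokens_per_request
        self.window = deque()  # (timestamp, tokens) pairs inside the last 60s

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if tokens > self.tpr:
            return False  # a huge CAG prompt can exceed the per-request cap outright
        # Drop usage that has aged out of the 60-second window.
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()
        used = sum(t for _, t in self.window)
        if used + tokens > self.tpm:
            return False
        self.window.append((now, tokens))
        return True

limiter = TPMLimiter(tokens_per_minute=200_000, tokens_per_request=128_000)
print(limiter.try_acquire(100_000, now=0.0))  # True: first big CAG prompt fits
print(limiter.try_acquire(100_000, now=1.0))  # True: 200k total still within TPM
print(limiter.try_acquire(100_000, now=2.0))  # False: third prompt blows the minute budget
```

This is exactly why large static contexts and high concurrency fight each other: two 100k-token prompts can exhaust an entire minute's budget.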
This is just a worse version of RAG; why are we going backwards?
I don't think this is CAG; this is just a BFP (big f!cking prompt). I know there was a research paper about CAG, and I'm pretty sure it requires manipulating the internals of the model.
Helicone is honestly awful; I would recommend using LangFuse instead. We used to be paying users of Helicone, but their software is so slow and sluggish.
I strongly agree with this approach, which I am experimenting with as well: with RAG frameworks, even with multi-hop queries etc., retrieval was really complicated. With CAG it crushes every query I throw at it. Even more clever, I think, for large datasets is a simplified GraphRAG: label the docs with tags, put every document that has relevant tags into the cache, and still perform a RAG query over every document for precise, local requests (for example, queries like "who is xxx", where RAG works well on proper names) to find which document holds the info, then load into the cache all the documents with the same tags as the one RAG found, and boom, the knowledge issue is done.
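The tag-based hybrid this comment describes can be sketched in a few lines: run a cheap retrieval step to find the single best document, then load every document sharing a tag with it into the cached context. The corpus, tags, and keyword scorer below are toy stand-ins; a real setup would use an embedding retriever:

```python
# Toy sketch of tag-based context loading (illustrative data, not a real corpus).
DOCS = {
    "auth-guide":    {"tags": {"auth", "api"}, "text": "How to authenticate with api keys"},
    "auth-faq":      {"tags": {"auth"},        "text": "Common login problems"},
    "billing-guide": {"tags": {"billing"},     "text": "Invoices and payment methods"},
    "rate-limits":   {"tags": {"api"},         "text": "Requests per minute caps"},
}

def retrieve_best(query: str) -> str:
    """Stand-in retriever: keyword overlap instead of a real embedding search."""
    words = set(query.lower().split())
    return max(DOCS, key=lambda d: len(words & set(DOCS[d]["text"].lower().split())))

def build_cache_context(query: str) -> list:
    """Find the best-matching doc, then pull in every doc sharing one of its tags."""
    best = retrieve_best(query)
    tags = DOCS[best]["tags"]
    return sorted(d for d, doc in DOCS.items() if doc["tags"] & tags)

print(build_cache_context("how do I authenticate with api keys"))
# → ['auth-faq', 'auth-guide', 'rate-limits']
```

The retrieval step only has to land on one relevant document; the tags then drag its neighbors into the cache, which is what papers over RAG's multi-hop weakness.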
Thank you very much for the information. This is definitely better than RAG.
How could we implement this in N8N?
Preloading the whole database into context? 🤣
I don't feel like you described actual CAG; you left out the mechanism of the 'C' (cache). CAG actually caches the computed KV values of your static knowledge base in the first l[…]he model. Then any incoming prompts are added as tokens AFTER that precomputed data, so the model has to do far fewer computations to begin outputting the first tokens. This maximizes speed to first token, which is a huge part of building production chat/agents. However, this approach will require some model interface/API changes and is currently only supported in Gemini's latest offering, as far as I know. There is another video you can search for, "rag vs cag solving knowledge gaps" by IBM, which goes into the caching mechanism.
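The KV-cache mechanism this comment describes can be illustrated with a model-free toy: treat each new token's attention cost as the number of positions it attends to, and compare reprocessing the knowledge base on every query against reusing a precomputed prefix. The token counts are made up, and a real implementation would reuse the model's actual KV cache (or provider-side context caching) rather than this pure-Python stand-in:

```python
# Toy illustration of why caching a static prefix cuts time to first token
# (not a real transformer; just counting pairwise attention operations).
def attention_ops(prefix_len: int, new_tokens: int) -> int:
    """Ops to process `new_tokens` given `prefix_len` already-cached positions:
    each new token attends to every position before it, plus itself."""
    return sum(prefix_len + i + 1 for i in range(new_tokens))

KB_TOKENS, QUERY_TOKENS = 50_000, 100  # assumed sizes

# Without caching: every query reprocesses the knowledge base from scratch.
cold_per_query = attention_ops(0, KB_TOKENS + QUERY_TOKENS)
# With CAG: the KB's keys/values are prefilled once, then every query
# only pays for its own 100 tokens (attending into the cached prefix).
prefill_once = attention_ops(0, KB_TOKENS)
cached_per_query = attention_ops(KB_TOKENS, QUERY_TOKENS)

print(f"per-query ops without cache: {cold_per_query:,}")
print(f"per-query ops with cached KB prefix: {cached_per_query:,}")
```

The total work for one query is identical either way (prefill + query ops equals the cold run); the win is that the prefill happens once and is amortized over every subsequent query, which is exactly the speed-to-first-token effect the comment points at.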