
Don't do RAG - This method is way faster & accurate...

AI Jason

171K subscribers

48.0k views • 1 week ago

CAG intro + Build an MCP server that reads API docs

Set up Helicone to monitor your LLM app cost now: ...

32 Comments

@AIJasonZ 1 week ago

Set up Helicone to monitor your LLM app cost now: https://www.helicone.ai/?utm_source=ai-jason

Join AI builder club to access …ple & Doc MCP: http://aibuilderclub.com/

@EDA.architect 6 days ago

More context = less attention, more latency, less repeatability in answer generation, more tokens, less concurrency, more cost
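To put the "more tokens, more cost" point into numbers, here is a back-of-the-envelope sketch. The per-million-token price, the context sizes, and the helper name `input_cost_usd` are hypothetical placeholders, not figures from the video or from any provider's price list.

```python
def input_cost_usd(context_tokens: int, question_tokens: int,
                   usd_per_million_input_tokens: float) -> float:
    """Input-side cost of one request; without prefix caching, the full context is re-sent every time."""
    total_tokens = context_tokens + question_tokens
    return total_tokens * usd_per_million_input_tokens / 1_000_000


# Hypothetical figures: a 200k-token preloaded corpus vs. ~4k tokens of retrieved chunks.
PRICE_PER_MILLION = 3.0  # hypothetical USD per 1M input tokens, not a real provider price
QUESTION_TOKENS = 100

full_context = input_cost_usd(200_000, QUESTION_TOKENS, PRICE_PER_MILLION)
retrieved_only = input_cost_usd(4_000, QUESTION_TOKENS, PRICE_PER_MILLION)

print(f"full context:   ${full_context:.4f} per query")
print(f"retrieved only: ${retrieved_only:.4f} per query")
print(f"ratio: {full_context / retrieved_only:.1f}x")
```

This ignores output tokens and any discount a provider may apply to repeated (cached) prompt prefixes, so treat it only as an illustration of how input cost scales linearly with context size.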

@christianmagnus1003 6 days ago

It's true, but CAG runs into rate limits on TPM (tokens per minute) and on tokens per request. With these limits you need to experiment with different LLMs to implement CAG systems, so if you have an enormous … of context, CAG is not the way.
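A rough way to check this: given a per-request token cap and a tokens-per-minute budget, how many full-context requests fit in a minute? The limit values below are hypothetical, and `max_requests_per_minute` is an illustrative helper, not part of any provider's API.

```python
def max_requests_per_minute(context_tokens: int, question_tokens: int,
                            max_output_tokens: int,
                            tokens_per_request_limit: int,
                            tokens_per_minute_limit: int) -> int:
    """Return how many requests fit in one minute, or 0 if a single request is too big.

    Assumes output tokens count toward both limits (providers differ on this).
    """
    per_request = context_tokens + question_tokens + max_output_tokens
    if per_request > tokens_per_request_limit:
        return 0  # the preloaded context does not even fit in one request
    return tokens_per_minute_limit // per_request


# Hypothetical limits: 128k tokens per request, 400k tokens per minute.
print(max_requests_per_minute(100_000, 200, 1_000, 128_000, 400_000))  # 3
print(max_requests_per_minute(150_000, 200, 1_000, 128_000, 400_000))  # 0
print(max_requests_per_minute(4_000, 200, 1_000, 128_000, 400_000))    # 76
```

With a 100k-token preloaded context, the same TPM budget that would serve 76 small retrieval-sized requests serves only 3, which is the concurrency trade-off raised in the comment.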

@mojajojajo 1 week ago

This is just a worse version of RAG, why are we going backwards?

@EternalKernel 1 week ago

I don't think this is CAG, this is... just a BFP (big f!cking prompt). I know there was a research paper about CAG and I'm pretty sure it requires manipulating the internals of the model.

@eitomiyamura 1 week ago

Helicone is honestly awful; I would recommend using LangFuse instead. We used to be paying users of Helicone, but their software is so slow and sluggish.

@quentinnnnn 1 week ago

Strongly agree with this approach, which I am experimenting with as well: with RAG frameworks, even with multi-hop queries etc., retrieval was really complicated. With CAG it destroys every query I do. … even more clever, I think, for a large dataset is something like a simplified GraphRAG: label the docs with tags, put every document that has relevant tags into the cache, and still run a RAG query over all documents for precise, local requests (for example, a query like « who is xxx », where RAG works well on proper names) to find which document the info is in, then load into the cache all the documents with the same tags as the one found by RAG, and boom, the knowledge issue is solved.
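The hybrid described in this comment (retrieve to locate one document, then preload every document that shares its tags into the cached context) could be sketched roughly as follows. The keyword-overlap scorer is a deliberately crude stand-in for a real RAG retriever, and the corpus, tags, and function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    tags: set[str] = field(default_factory=set)


def locate(query: str, docs: list[Doc]) -> Doc:
    """Crude stand-in for a RAG retriever: pick the doc sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.text.lower().split())))


def build_cache_context(query: str, docs: list[Doc]) -> list[Doc]:
    """Locate the best-matching doc, then preload it plus every doc sharing at least one of its tags."""
    hit = locate(query, docs)
    return [d for d in docs if d is hit or d.tags & hit.tags]


# Hypothetical tagged corpus.
corpus = [
    Doc("a", "Alice Martin leads the billing team", {"billing", "people"}),
    Doc("b", "Invoices are generated on the first day of each month", {"billing"}),
    Doc("c", "The mobile app supports offline mode", {"mobile"}),
]

context = build_cache_context("who is Alice Martin", corpus)
print([d.doc_id for d in context])  # ['a', 'b']
```

The retrieval step only has to find one good anchor document (where name matching works well), while the tag expansion pulls in related material that a pure top-k retriever might miss.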

@AvatarsGeek 1 week ago

Thank you very much for the information. This is definitely better than RAG.

How could we implement this in N8N?

@beckbeckend7297 1 week ago

Preloading whole database into context? 🤣

@tshock22 1 week ago

I don't feel like you described actual CAG... you left out the mechanism of the 'C' (cache). CAG actually caches the computed KV values of your static knowledge base in the first l… the model. Then, any incoming prompts are added as tokens AFTER that precomputed data, so the model has to do far fewer computations before it starts outputting the first tokens. This maximizes speed to first token, which is a huge part of building production chat apps and agents. However, this approach requires some model interface/API changes and, as far as I know, is currently only supported in Gemini's latest offering. There is another video by IBM you can search for, "rag vs cag solving knowledge gaps", which goes into the caching mechanism.
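The mechanism this comment describes (compute the key/value vectors for the static knowledge once, then append each new prompt after them) can be illustrated with a single-layer, single-head attention toy. Everything here is hypothetical: the embedding, the projection matrices, the token lists, and helper names like `build_prefix_cache`. It is not how Gemini or any particular model implements caching; it only shows that reusing the prefix's K/V gives the same attention output while skipping recomputation.

```python
import math

D = 4  # toy hidden size (hypothetical)
STATS = {"kv_projections": 0}  # counts how many tokens had K/V computed


def embed(token: str) -> list[float]:
    """Toy deterministic embedding (real models use learned tables)."""
    s = sum(ord(c) for c in token)
    return [math.sin(s + i) for i in range(D)]


def make_matrix(offset: int) -> list[list[float]]:
    return [[math.cos(offset + i * D + j) for j in range(D)] for i in range(D)]


W_Q, W_K, W_V = make_matrix(1), make_matrix(50), make_matrix(99)


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(m[i][j] * v[j] for j in range(D)) for i in range(D)]


def kv(token: str) -> tuple[list[float], list[float]]:
    """Key and value vectors for one token: the work the cache lets us skip."""
    STATS["kv_projections"] += 1
    x = embed(token)
    return matvec(W_K, x), matvec(W_V, x)


def attend(query_token: str, keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention for the last token over all keys/values."""
    q = matvec(W_Q, embed(query_token))
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]


def answer_without_cache(prefix: list[str], question: list[str]) -> list[float]:
    """Recompute K/V for the whole knowledge base on every question."""
    pairs = [kv(t) for t in prefix + question]
    return attend(question[-1], [p[0] for p in pairs], [p[1] for p in pairs])


def build_prefix_cache(prefix: list[str]) -> tuple[list, list]:
    """Compute K/V for the static knowledge once."""
    pairs = [kv(t) for t in prefix]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def answer_with_cache(cache: tuple[list, list], question: list[str]) -> list[float]:
    """Append only the question's K/V after the cached prefix."""
    keys, values = list(cache[0]), list(cache[1])  # copy so the cache stays reusable
    for t in question:
        k, v = kv(t)
        keys.append(k)
        values.append(v)
    return attend(question[-1], keys, values)


knowledge = "the api returns json and requires a bearer token".split()  # 9 tokens
questions = [["how", "do", "i", "auth"], ["what", "format"]]

STATS["kv_projections"] = 0
plain = [answer_without_cache(knowledge, q) for q in questions]
cost_plain = STATS["kv_projections"]

STATS["kv_projections"] = 0
cache = build_prefix_cache(knowledge)
cached = [answer_with_cache(cache, q) for q in questions]
cost_cached = STATS["kv_projections"]

print(cost_plain, cost_cached)  # 24 15
```

Both paths give identical outputs, but the cached path computes the knowledge-base K/V once instead of once per question. In a real transformer this happens per layer and per head, which is why skipping the prefix computation shortens time to first token as the knowledge base grows.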
