AI Jason
100K subscribers
It's hard to get an LLM to generate a large amount of content and take in large inputs. To solve this, introducing StreamingLLM: Extend ...
22 Comments
Ask the LLM to summarize stuff when it's running low; that should be better than just remembering the first bit and a window.
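For what it's worth, that idea is easy to prototype. Below is a minimal Python sketch of summarize-on-overflow; the `count_tokens` and `summarize` helpers and the four-turn cutoff are illustrative assumptions, not anything from the video:

```python
# Sketch of summarize-on-overflow: when the history nears the context
# limit, fold the older turns into an LLM-written summary instead of
# silently dropping them. `count_tokens` and `summarize` are assumed
# helpers wrapping whatever tokenizer and model call you use.

def compress_history(messages, count_tokens, limit, summarize, keep=4):
    """Replace older messages with a summary once the limit is near."""
    if count_tokens(messages) < limit:
        return messages                      # still fits, nothing to do
    head, tail = messages[:-keep], messages[-keep:]
    summary = summarize("\n".join(head))     # one LLM call over old turns
    return ["Summary of earlier conversation: " + summary] + tail
```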
🎯 Key Takeaways for quick navigation:
00:00 📚 Increasing Data Input for Large Language Models
- Large language models face challenges with increased data input.
- GPU memory limitations and computational time impact performance.
- The concept of "window attention" has been used to mitigate these issues.
01:09 🔄 Introducing StreamingLLM and Attention Sinks
- StreamingLLM is a research project to enhance data input for large language models.
- The attention-sink observation is that the initial tokens matter disproportionately for context.
- StreamingLLM combines the initial tokens with a rolling cache for effective context (see the sketch below).
02:49 🔓 Unlocking Possibilities with StreamingLLM
- StreamingLLM enables handling long-form content generation and movie transcripts.
- It works well for scenarios that require generating a large amount of content.
- However, it may not handle extremely complex tasks, since the context between the sink tokens and the recent window is lost.
Made with HARPA AI
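To make the "initial tokens plus rolling cache" point concrete, here is a minimal Python sketch of the eviction policy. The class name and sizes are invented for illustration, and a real implementation would evict entries from the model's KV cache rather than raw token IDs:

```python
# StreamingLLM-style cache policy: keep the first few "attention sink"
# tokens forever, plus a rolling window of the most recent tokens, and
# evict everything in between.

from collections import deque

class StreamingCache:
    def __init__(self, num_sinks=4, window_size=1020):
        self.num_sinks = num_sinks                 # initial tokens kept forever
        self.sinks = []                            # the "attention sink" tokens
        self.window = deque(maxlen=window_size)    # rolling recent-token cache

    def append(self, token):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token)               # first tokens become sinks
        else:
            self.window.append(token)              # deque evicts the oldest

    def context(self):
        # Tokens actually attended to: sinks + recent window.
        return self.sinks + list(self.window)

cache = StreamingCache(num_sinks=4, window_size=8)
for t in range(20):
    cache.append(t)
print(cache.context())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```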
Is there still no solution for extending data input?
Wait wait! You just finished too quickly. I wish you had elaborated more on what can't be accomplished; I didn't quite get why, or the difference between what we can achieve with long […]ents.
As usual: inspiring, accurate, and up to date. Ty sir.
Which LLM has the largest token limit to expand the context length of the chat?
This is my third video I've seen today from you and you are so consistent with providing value with your words. Thank you my new AI guru🙏
i luv u
I don't understand how it can help, even for books. Will it forget everything in the middle of the book? I try to think about how it works in the human brain. When we read a book we (usually) don't remember each word. What we do is create visual images inside, and that compresses the book very effectively. Supposedly, these images are like tokens or maybe like embeddings, and they don't occupy much space in memory. Is it possible to implement something like this for LLMs? They should kind of learn during "reading the book": convert texts to multimodal embeddings, or even find (create) an approximate path in embedding space, and later have the ability to analyze this path. Not sure how it should be implemented.
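One standard way to approximate that idea today is retrieval over chunk embeddings: split the book into pieces, embed each piece, and look relevant pieces back up later. A rough sketch, assuming some `embed` function that returns a vector (e.g. any sentence-embedding model); the chunk size and top-k are arbitrary:

```python
# Compress a book into per-chunk embeddings, then recall the chunks
# most similar to a query by cosine similarity.

import numpy as np

def build_memory(book_text, embed, chunk_size=500):
    chunks = [book_text[i:i + chunk_size]
              for i in range(0, len(book_text), chunk_size)]
    vectors = np.stack([embed(c) for c in chunks])   # one vector per chunk
    return chunks, vectors

def recall(query, chunks, vectors, embed, k=3):
    q = embed(query)
    # Cosine similarity between the query and every stored chunk.
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```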
Great job at providing information about new developments, Jason! Thanks!