What is Q* | Reinforcement learning 101 & Hypothesis

AI Jason

118K subscribers

33.7k views • 1 year ago

Links - Jim Fan's tweet: https://twitter.com/DrJimFan/status/1728100123862004105 - Reinforcement learning deep dive: ...

22 Comments

@AIJasonZ 1 year ago

Anything else I missed about Q*? Leave a comment and let me know!

@dancingdudezz 1 year ago

Hey, can you please make a video on detecting significant insights using reinforcement learning?
I was curious about making the model learn by itself about the irregular pat [...] needs to be classified using reinforcement learning.

@HarpaAI 1 year ago

Great overview! Jason, your videos on the AI topic are the best!

00:00 🤖 "Q Star" is generating a lot of discussion in the AI community, and it's associated with OpenAI's recent actions, but its exact nature remains speculative.
01:08 🎮 Reinforcement learning is a machine learning framework where an agent learns from trial and error, aiming to maximize future rewards. It involves policy networks and value networks.
03:25 🧠 Reinforcement learning allows AI agents to self-play and discover new strategies, as demonstrated by DeepMind's achievements in games like Breakout and AlphaGo.
08:01 📚 There's speculation that "Q Star" could involve using policy networks and value networks, similar to AlphaGo, to improve reasoning and logic in large language models like GPT.
11:14 🐍 You can experiment with reinforcement learning in simple games with open-source projects, even if you're new to the field.
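
As a rough illustration of the trial-and-error loop described at 01:08, here is a minimal tabular Q-learning sketch. The five-state corridor, the reward of 1.0 at the right end, and the hyperparameters are all assumptions made up for this example and do not come from the video. It uses a plain lookup table rather than the policy and value networks mentioned above.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters


def step(state, action):
    """Apply an action; reward 1.0 only when the goal state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore (ties are random).
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned table."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy should move right from every state, since that is the only way to reach the reward.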

@Laurie-eg8ct 1 year ago

How does the reward system work for reinforcing behavior beyond Pavlovian bell sounds that signal approval?

@user-dt7px5xp6z 1 year ago

Can't wait for it to be open sourced 😂

@abdelkaioumbouaicha 1 year ago

📝 Summary of Key Points:

📌 Reinforcement learning is a machine learning framework that allows AI to learn from its own trials and errors by receiving rewards or penalties based on its actions.
🧐 AI systems like DeepMind's AlphaGo have achieved superhuman performance in tasks through reinforcement learning, discovering new strategies in the process.
🚀 Reinforcement learning could be applied to large language models like GPT, improving reasoning and logic capabilities by proposing multiple solutions and evaluating their value.
📌 OpenAI's research paper "Let's Verify Step by Step" explores a reward model for large language models, involving another model critiquing the reasoning process for better results.
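
The idea of proposing several solutions and having a second model rate the reasoning can be sketched as best-of-N selection with a step-level scorer. Everything here is a toy assumption for illustration: `toy_step_score` is a hypothetical stand-in for a learned reward model, steps are simple additions, and multiplying per-step scores is just one common way to aggregate them. It is not a description of how Q* works.

```python
from math import prod
from typing import Callable, List, Sequence, Tuple

Step = Tuple[int, int, int]   # toy reasoning step: "a + b = claimed"
Solution = List[Step]


def toy_step_score(step: Step) -> float:
    """Hypothetical stand-in for a learned step-level reward model."""
    a, b, claimed = step
    return 0.9 if a + b == claimed else 0.1


def solution_score(solution: Solution, step_score: Callable[[Step], float]) -> float:
    """Score a whole solution as the product of its per-step scores."""
    return prod(step_score(s) for s in solution)


def best_of_n(candidates: Sequence[Solution], step_score: Callable[[Step], float]) -> Solution:
    """Pick the candidate whose reasoning chain the scorer rates highest."""
    return max(candidates, key=lambda sol: solution_score(sol, step_score))


if __name__ == "__main__":
    candidates = [
        [(2, 3, 5), (5, 4, 10)],   # second step is wrong
        [(2, 3, 5), (5, 4, 9)],    # every step checks out
        [(2, 3, 6), (6, 4, 10)],   # first step is wrong
    ]
    print(best_of_n(candidates, toy_step_score))
```

Because one bad step drags the whole product down, a chain with a single wrong step loses to a chain where every step checks out.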

💡 Additional Insights and Observations:

💬 "The ability of AI to explore different paths and uncover novel solutions is seen as a promising development."
📊 No specific data or statistics were mentioned in the video.
🌐 OpenAI's research paper "Let's Verify Step by Step" can be referenced for further information on the reward model for large language models.

📣 Concluding Remarks:

Reinforcement learning is a powerful framework in AI that allows machines to learn from their own experiences. It has shown remarkable success in tasks like playing games and could potentially enhance the reasoning and logic capabilities of large language models. OpenAI's recent breakthrough, Q*, has sparked excitement and speculation within the AI community, and further research, like the "Let's Verify Step by Step" paper, is exploring new ways to improve language models through reinforcement learning.
Generated using Talkbud (Browser Extension)

@nickstaresinic4031 1 year ago

Very well organized and informative presentation.

@jayhu6075 1 year ago

I think Q* must be OPEN SOURCE for the benefit of humanity, not only for big companies.

@lucamatteobarbieri2493 1 year ago

Open*AI

@csabaczcsomps7655 1 year ago

Q is question and * is repeat, so if you make a synthesis of a lot of answers, you get a generally intelligent answer. My noob opinion.
