o3 and o4-mini - they’re great, but easy to over-hype

AI Explained

388K subscribers

94.2k views • 7 months ago

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, ...

76 Comments

@soulspawn 7 months ago

🖤 🔥

@snucke123 7 months ago

Something I think would be great to add to SimpleBench is the date when a given model was run and tested on the benchmark. It's a lot to try and keep up with, so to see a date w[…]nk at least be great! Maybe just add a date column after "Organization".

@MetaverseAdventures 7 months ago

The o3 issues you uncovered all came back to logic for me, and its lack thereof. This has been a trend since I started using GPT-3.5, and if there has been logic progress, it has been much mor[…] than all other metrics. In fact, I am certain that if all other metrics were the same, BUT logic significantly advanced, we would have AGI now. Without logic, it will never beat humans and all the things we would love AI to do, and for sure it will NOT replace a developer and so many other intelligent careers, as logic really is a key to them all.

A lack of logic = hallucinations

I am not sure why OpenAI said it was hallucination-free when it clearly is not. Perhaps for investors to hear? Seems foolish unless those investing really have no clue about the reality of the model, which may very well be.

@shApYT 7 months ago

No. AGI is when it can generalize out of its training domain. Tell it to DM a D&D campaign and let's see it try to do anything impressive.

@MatthewKelley-mq4ce 7 months ago

AGI for me is when it is continually learning. But that's not the standard path at the moment.

@CasualTortoise 7 months ago

Thank you, this kind of more level-headed analysis is desperately needed!

Since it was released to the Plus tier (nice change by OpenAI!) I did a fair amount of testing myself. Overal[…]ry, very impressed. The image reasoning is a step change compared to previous OpenAI models. It's also super fun to watch it reason in images. It is sort of clumsy, in the same way robots are clumsy when they walk. It's kind of cute when it keeps zooming in on various parts of the image, talking to itself 😊.

It did hallucinate for me too. For example it kept insisting that clockwise is 9 -> 8 -> 7 -> 6, as part of its solution to a more complex problem. Finally it admitted that time moves forward 😅.

I feel like this video gave a slightly too negative impression, but as a counterbalance it was great!

@Kleddamag 7 months ago

For the algorithm, I hope you had a wonderful flight.

@SirQuantization 7 months ago

It's somewhat reassuring that OpenAI are saying they won't release a model if it can help people create bioweapons, but I'm also not convinced they won't renege on this obligat[…] future. The money they can make from it will likely change their minds. They've backtracked on most of the promises they've made about AI safety; I doubt this will be any different. Hope to be wrong.

@GrindThisGame 7 months ago

You will need to spin up a few digital clones to keep up with the advancing rates of model hype.

@isajoha9962 7 months ago

OpenAI models are not most famous for their descriptions approximating physical reality. 🙂 When they catch up, I guess e.g. Sora might improve.
