o3 and o4-mini - they’re great, but easy to over-hype

AI Explained

388K subscribers

94.2k views  •  7 months ago

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, ...

Comments: 407


@soulspawn  7 months ago

🖤 🔥

@snucke123  7 months ago

Something I think would be great to add to SimpleBench is the date when a given model was run and tested on the benchmark. It's a lot to try and keep up with, so to see a date w[…]nk at least be great! Maybe just add a date column after "Organization".

@MetaverseAdventures  7 months ago

The o3 issues you uncovered all came back to logic for me, and its lack thereof. This has been a trend since I started using GPT-3.5, and if there has been logic progress, it has been much mor[…] than all other metrics. In fact, I am certain that if all other metrics were the same BUT logic significantly advanced, we would have AGI now. Without logic, it will never beat humans or do all the things we would love AI to do, and it will certainly NOT replace developers and so many other intelligent careers, as logic really is the key to them all.


A lack of logic = hallucinations


I am not sure why OpenAI said it was hallucination-free when it clearly is not. Perhaps for investors to hear? It seems foolish, unless those investing really have no clue about the reality of the model, which may very well be the case.

@shApYT  7 months ago

No. AGI is when it can generalize outside its training domain. Tell it to DM a D&D campaign and let's see it try to do anything impressive.

@MatthewKelley-mq4ce  7 months ago

AGI for me is when it is continually learning. But that's not the standard path at the moment.

@CasualTortoise  7 months ago

Thank you, this kind of more level headed analysis is desperately needed!

Since it was released to the Plus tier (nice change by OpenAI!), I did a fair amount of testing myself. Overal[…]ry, very impressed. The image reasoning is a step change compared to previous OpenAI models. It's also super fun to watch it reason in images. It is sort of clumsy, in the same way robots are clumsy when they walk. It's kind of cute when it keeps zooming in on various parts of the image, talking to itself 😊.

It did hallucinate for me too. For example it kept insisting that clockwise is 9 -> 8 -> 7 -> 6, as part of its solution to a more complex problem. Finally it admitted that time moves forward 😅.

I feel like this video gave a slightly too negative impression, but as a counterbalance it was great!

@Kleddamag  7 months ago

For the algorithm, I hope you had a wonderful flight.

@SirQuantization  7 months ago

It's somewhat reassuring that OpenAI says it won't release a model if it can help people create bioweapons, but I'm also not convinced they won't renege on this obligat[…] future. The money they can make from it will likely change their minds. They've backtracked on most of the promises they've made about AI safety; I doubt this will be any different. Hope to be wrong.

@GrindThisGame  7 months ago

You will need to spin up a few digital clones to keep up with the advancing rates of model hype.

@isajoha9962  7 months ago

OpenAI models are not the most famous for how closely their descriptions approximate physical reality. 🙂 When they catch up, I guess Sora, for example, might improve.

