
The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think

AI Explained

339K subscribers

88.6k views • 5 months ago

A new state-of-the-art LLM (at least for creative writing and basic reasoning), but what lies behind the numbers that were put out?


20 Comments

@goldeternal 3 months ago

I'm ready when you are ~
Well, that could've been weirder, dare I say scarier.

@maciejbala477 4 months ago

100% spot on about reliability; that is always the one thing I focus on when people hype up AI. Yes, it's absolutely great, BUT it won't be consistently useful for now, and it won't be able to be left alone, because of the risk of minor or even major mistakes, especially as, e.g., the context grows.

@Shadowclaw25 5 months ago

My biggest issues with Claude at the moment: its usage limit is only 1% of GPT Canvas's.
It's still impossible for it to keep following a story that evolves over 10 chapters.

Shouldn't be too hard […] keep track, behind the scenes, of the red lines of how the story progresses if I told it we're writing a story over many chapters... but nothing like that exists so far.

@sebastianbauer4768 5 months ago

Guys, guys… does it click on that "I'm not a robot" checkbox or not? Can it solve CAPTCHAs? 'Cause I struggle with them. Also, wouldn't it kind of set a bad precedent if the sma[…]s start lying to the stupid programs to do their job?

@alectoireneperez8444 5 months ago

The AI Zoom call was the most soulless thing I've ever seen.

@mbrochh82 5 months ago

Here's a ChatGPT summary:

- The new Claude 3.5 Sonnet from Anthropic is a significant advancement, particularly in reasoning, coding, and visual processing abilities.
- The model's ability to use a computer via an API is limited due to unreliability and an inability to perform tasks like sending emails or making purchases.
- Claude 3.5 Sonnet has knowledge of world events up until April 2024.
- In the OSWorld benchmark, Claude 3.5 Sonnet achieved 22% accuracy, compared to 72% for computer science majors.
- In the SWE-bench software engineering benchmark, Claude 3.5 Sonnet scored 49%, outperforming the o1-preview model.
- The new Claude 3.5 Sonnet performs better in challenging science questions, general knowledge, coding, mathematics, and visual question answering compared to its predecessor.
- The model's performance in creative writing is superior to the original Claude 3.5 Sonnet.
- In multilingual challenges, the new Claude 3.5 Sonnet is slightly worse than the previous version.
- The new model shows a reverse scaling law in reliability: its success rate drops as the number of attempts it must get right in a row increases.
- Claude 3.5 Sonnet is slightly worse at correctly refusing toxic requests and incorrectly refusing innocent requests compared to the previous model.
- The new model's performance in the retail and airline tasks is not outstanding, with a 46% success rate in airline tasks given one try.
- The Simple Bench test showed a significant improvement in the new Claude 3.5 Sonnet compared to the previous version.
- The new model's performance in reasoning and creative writing is impressive, though it struggles with computation-heavy tasks.
- The new Claude 3.5 Sonnet is better at reasoning and creative writing but still faces challenges in reliability and multilingual tasks.
- Main message: The new Claude 3.5 Sonnet represents a significant step forward in reasoning and processing abilities, though it still faces challenges in reliability and certain tasks.

@santoshpss 5 months ago

It could also be malaria. Or maybe dengue. What about the African illness that they came up with? What about Delta? Alpha? ABCD+? Or just... Occam's razor.

@WillyJunior 5 months ago

19:16 Open bobs

@AngeloWakstein-b7e 5 months ago

Love your videos and can't wait for the next one. Super informative and good fact-checking.

@TheTruthOfAI 5 months ago

The new model truly sucks XD. The accuracy went down big time; simple stuff makes it lose its shit easily, to the point where the problem is not just visible, it makes the model unusable.
