Claude 3.7 Sonnet outperforms o3-mini and R1 on industry-standard coding benchmarks. Source: Anthropic

Only a week after Grok 3 walked away with the title of “world’s smartest” model, Anthropic is here to snatch it away. It just released Claude 3.7 Sonnet, which, by Anthropic’s own benchmarks, is now the top LLM for both coding and agentic tasks.

Coding: 

  • Claude 3.7 Sonnet scores 62.3% (or 70.3% when it’s paired with other tools) on the industry-standard SWE-bench Verified benchmark, compared to 49.3% for OpenAI’s o3-mini and 49.2% for DeepSeek’s R1.

  • It’s already available in the popular AI coding platform Cursor. Box is also using Claude 3.7 for enhanced reasoning and complex tasks.

  • Meanwhile, a new research preview called Claude Code lets users hand “substantial engineering tasks to Claude directly from their terminal.”

Reasoning: 

  • Claude users can now choose “extended thinking mode” for more in-depth tasks. What’s unique is that instead of toggling between two different models, the same LLM can simply think for longer, and with more intensity, to tackle trickier, multi-step prompts. You’ll also get to see Claude’s thinking process for the first time.

What’s next: 

  • Anthropic expects that by the end of the year, Claude will be able to tackle hours of in-depth work for you at an expert level. And in just two years, it predicts that Claude will solve problems “that would have taken teams years to achieve.”

  • With OpenAI’s GPT-4.5 right around the corner, it’ll be interesting to see how long Anthropic can hold onto its lead.
