What AI models are leading the intelligence race in 2025?

As of 2025, the leading commercial models are GPT-5.2 (OpenAI), Gemini 3 Pro (Google), and Claude Opus 4.5 (Anthropic). In the open-source sector, DeepSeek R1, Llama 4, and Qwen 3 lead the rankings.

What are Erdős problems and why do they matter for AI?

Erdős problems are mathematical conjectures posed by legendary mathematician Paul Erdős that have remained unsolved for decades. AI systems like AlphaEvolve have now solved 17+ of these problems, demonstrating genuine reasoning capability rather than pattern matching.

AlphaEvolve is an AI system that has generated solutions to previously open Erdős mathematical problems. It represents a breakthrough in AI reasoning, producing proofs that have been verified through formal proof assistants like Lean.


The AI Race

A Race Towards The Greatest Invention Of Mankind

Road to AGI

Real-time Trajectory Index // Frontier Telemetry v5.0


This real-time index tracks humanity's unprecedented velocity toward Artificial General Intelligence (AGI)—systems that equal human performance across all economically valuable tasks—and the subsequent, inevitable leap to Artificial Superintelligence (ASI).

Current frontier models are no longer just tools; they are embryonic agents demonstrating advanced reasoning, long-horizon planning, and autonomous code execution. As we scale compute by orders of magnitude and refine architecture through reinforcement learning, the distinction between silicon and sentient logic blurs. We are witnessing the final phase of the "narrow AI" era, rapidly transitioning into a future where digital minds will not only match our cognitive capabilities but vastly exceed them, reshaping science, economics, and the very definition of intelligence itself.

Frontier Elite Analysis

Real-time Rankings across Commercial and Open Weights

Commercial Sector

Scale: Proprietary

Top Tier Analysis

GPT-5.2
"OpenAI's most advanced model. Perfect AIME score, dominant in reasoning and coding benchmarks."
Score: 97.2%

Gemini 3 Pro
"Google's flagship model. 1M token context window, excels in multimodal understanding."
Score: 96.0%

Claude Opus 4.5
"Anthropic's most capable model. Superior at nuanced reasoning and extended context tasks."
Score: 93.0%

Gemini 3 Flash
"Google’s ultra-fast frontier model. Beats Gemini 3 Pro on Toolathlon and coding efficiency, ties or wins on MMMU‑Pro, while staying much cheaper and faster."
Score: 92.2%

Gemini 2.5 Flash
"High-velocity intelligence. Optimized for sub-second inference with frontier accuracy."
Score: 84.0%

Last System Sync: 7/13/2026

Sources: AIME, HLE, GPQA, Chatbot Arena. Verified by Otherworlds Lab.

Open Source Sector

Scale: Weights Available

Weights Evolution

Llama 3.1 405B
"Proven reliability and massive ecosystem support for fine-tuning."
Score: 94.4%

Kimi K2
"Moonshot AI's flagship thinking and reasoning model, for long context and complex reasoning tasks."
Score: 92.0%

Qwen 3-72B
"Leading multilingual and mathematical performance in the public weights sector."
Score: 91.0%

DeepSeek-V3
"China's open-weights champion. Remarkable efficiency and reasoning at a fraction of the compute."
Score: 88.5%

Llama 4-405B
"Meta's open foundation model. Industry standard for fine-tuning and deployment."
Score: 86.2%

DeepSeek-V2.5
"China's open-weights champion. Remarkable efficiency and reasoning at a fraction of the compute."
Score: 82.0%


Industry Analysis // 01

Meta's Next Big Bet: This New App Lets You Build Games Simply by Typing a Prompt

Move Over Nvidia: Anthropic and Samsung Team Up to Break the AI Chip Monopoly

The Tech Industry is Shifting: Venice AI Raises $65M to Give Users 'Uncensored' AI

Anthropic’s Surprise Double Launch Directly Targets OpenAI and Google’s AI Dominance

The Harvard Dropouts Who Built a $5B Nvidia Competitor From Scratch

Historical Pulse

Intelligence Trajectories

Historical Intelligence Mapping / v4.2

[Figure: intelligence trajectory across three eras. Symbolic Era: 1990-2012; Deep Learning: 2012-2017; Transformer Era: 2017-present]

The trajectory visualized above represents the "Great Acceleration," tracing the evolution of artificial intelligence from the publication of the Transformer architecture in 2017 to the maturation of autonomous reasoning agents in late 2025. This historical curve is not merely a record of parameter scaling but a visualization of the collapse of cognitive barriers. It captures three distinct phases of the AI race: the Scale Era (2017–2022), dominated by the pursuit of raw parameter count, which validated the scaling laws; the Productization Era (2023–2024), marked by the Cambrian explosion of commercial interfaces like ChatGPT and the rise of a potent open-source insurgency led by Meta’s Llama series, Mistral, and Alibaba’s Qwen; and finally, the Reasoning Era (2025).

The graph highlights pivotal inflection points, such as the democratization of high-performance inference via the LLaMA leak, and the emergence of "Test-Time Compute" in 2025 with models like DeepSeek R1 and OpenAI's o-series. These milestones signify the shift from static token prediction to dynamic, self-correcting logic chains—systems that "think" before they speak. Most strikingly, this era witnessed AlphaEvolve and similar reasoning agents generating solutions to previously open Erdős problems—mathematical challenges that had resisted human intellect for over 50 years. This trajectory maps humanity's transition from utilizing silicon tools to collaborating with digital agents capable of independent scientific discovery.


The Reasoning Frontier

The Fall of the Unsolvable

AI systems are now generating solutions to Erdős problems—mathematical challenges that have resisted human intellect for over 50 years. This isn't memorization. This is de novo reasoning.

17+ Problems Solved: Full AI-generated solutions (Verified)

Partial Breakthroughs: Significant progress made (Verified)

31+ Proofs Formalized: Verified in Lean/Isabelle (Verified)

70+ Literature Reviews: AI-powered research synthesis (Verified)

Key Breakthroughs

AI-Generated Solutions to Previously Open Problems

"The purest test of reasoning isn't a standardized exam—it's the unsolved."

Our Benchmarking Philosophy

"Static benchmarks are dead. We measure the soul of the machine through dynamic friction and recursive logic."

The Core Directive

Contamination-Proof Intelligence Verification

In an era where "state-of-the-art" models are released weekly, traditional static benchmarks like MMLU or GSM8K have become trivialized by contamination—models are frequently trained on the very questions they are tested against. Our methodology rejects this "memorization contest." Instead, we employ a dynamic, adversarial testing framework designed to probe the frontier of reasoning, not recall. We focus on "novelty generalization"—the ability of a system to synthesize disparate concepts into coherent, actionable strategies in scenarios it has never encountered. By prioritizing "test-time compute" efficiency and agentic reliability over raw parameter counts, we reveal the true cognitive density of a model. We don't just ask, "What does it know?" We ask, "How does it think?"
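To make the contamination concern concrete, here is a minimal Python sketch of one common screening heuristic: measuring how many word n-grams of a benchmark question also appear in a training document. This is an illustration under stated assumptions, not the methodology used by this index; the function names `ngrams` and `overlap_ratio`, the whitespace tokenization, and the default of n = 5 are all hypothetical choices.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(question: str, corpus_doc: str, n: int = 5) -> float:
    """Fraction of the question's n-grams that also occur in corpus_doc.

    Hypothetical heuristic: a ratio near 1.0 suggests the question text
    may have leaked into the training data; 0.0 means no shared n-grams.
    """
    question_grams = ngrams(question, n)
    if not question_grams:
        return 0.0  # question shorter than n tokens: nothing to compare
    shared = question_grams & ngrams(corpus_doc, n)
    return len(shared) / len(question_grams)


question = "what is the sum of the first ten prime numbers"
leaked_doc = "quiz: what is the sum of the first ten prime numbers answer 129"
clean_doc = "the weather in lisbon is mild for most of the year"

print(overlap_ratio(question, leaked_doc))
print(overlap_ratio(question, clean_doc))
```

A check like this only catches near-verbatim leakage; paraphrased or translated copies of a question would pass it, which is one reason non-public, evolving test cases are attractive.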

The ultimate validation of this philosophy lies in mathematics. We track AI's progress against Erdős problems—open conjectures posed by the legendary Paul Erdős that have stumped mathematicians for decades. When a model solves one of these, it cannot be dismissed as pattern matching or data leakage. It is genuine reasoning, verified through formal proof assistants like Lean. This is the gold standard.
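As a rough illustration of what "verified through a formal proof assistant" means, the Lean 4 snippet below states a toy theorem and supplies a proof term that the Lean kernel checks mechanically. It is deliberately trivial and is not an Erdős problem; the theorem name `add_comm_toy` is made up for this example, while `Nat.add_comm` is a lemma from Lean's core library.

```lean
-- Toy example only: commutativity of addition on the natural numbers.
-- If the proof term did not actually establish the statement,
-- Lean would reject the file instead of accepting the theorem.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A formalized research-level proof has the same shape, a statement plus a machine-checked proof, only with far more intermediate lemmas.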

Dynamic Reasoning

We utilize non-public, evolving test cases to prevent training-data contamination.

Latency Scaling

Analysis of reasoning depth relative to response speed—real-time "IQ per second" metrics.

Cross-Domain Logic

Evaluating how models synthesize information across distinct, unrelated knowledge fields.
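The "IQ per second" idea under Latency Scaling is not defined precisely here, so the following Python sketch assumes one simple reading: accuracy on a set of trials divided by mean response latency. The `Trial` class, the `accuracy_per_second` function, and the sample numbers are all hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    correct: bool     # whether the model's answer was judged correct
    latency_s: float  # wall-clock seconds taken to produce the answer


def accuracy_per_second(trials: list[Trial]) -> float:
    """Hypothetical metric: fraction correct divided by mean latency (seconds)."""
    if not trials:
        raise ValueError("need at least one trial")
    accuracy = sum(t.correct for t in trials) / len(trials)
    mean_latency = sum(t.latency_s for t in trials) / len(trials)
    if mean_latency <= 0:
        raise ValueError("mean latency must be positive")
    return accuracy / mean_latency


# Made-up sample data: 3 of 4 answers correct, 4.0 s mean latency.
trials = [Trial(True, 2.0), Trial(True, 4.0), Trial(False, 6.0), Trial(True, 4.0)]
print(accuracy_per_second(trials))  # 0.75 / 4.0 = 0.1875
```

Under this assumed definition, a model that is slightly less accurate but much faster can score higher, so such a ratio is best read alongside raw accuracy, not in place of it.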

Active Protocol: v5.0.2

Testing Saturation: 94%

Verification Mode: Recursive Logic Check

Uptime: 99.9%


Further Intelligence

Frontier Briefings.

AI Industry Analysis

Meta's Next Big Bet: This New App Lets You Build Games Simply by Typing a Prompt

Meta Secretly Drops a New AI App That Generates Mini-Games From Prompts: Meta Quietly Launches Vibe-Coded Gaming App "Pocket" A stealth Google Play...

2026-07-03

AI Industry Analysis

Move Over Nvidia: Anthropic and Samsung Team Up to Break the AI Chip Monopoly

Anthropic Fires Back at OpenAI’s ‘Jalapeño’ Chip with Secret Samsung Talks: Anthropic Is in Talks With Samsung to Build a Custom AI Chip: Following...

2026-07-03

AI Industry Analysis

The Tech Industry is Shifting: Venice AI Raises $65M to Give Users 'Uncensored' AI

Venice AI Raises $65M at $1B Valuation, Betting Users Want AI Without the Guardrails: Bitcoin Pioneer Erik Voorhees Just Built a $1B 'Uncensored' AI...

2026-07-02