The Race to Build the Robot Internet: How XDOF Is Solving AI's Physical Data Crisis:
Why the Smartest AI Models in the World Are Suddenly Hitting a Wall:
A $70M bet that the next great bottleneck in AI isn't chips or models — it's the physical world data powering the robotics revolution.
Section 1: The Bottleneck Nobody Saw Coming:
The AI industry has a new frontier problem. While language models were trained on a vast ocean of publicly available text — books, websites, research papers — robots need something fundamentally different: training data that captures physical interaction with the real world. And that data barely exists.
OpenAI's announcement that it will relaunch its robotics program (shuttered in 2021) has reignited what industry insiders call the 'physical AI race.' Every major frontier AI lab is now working to teach machines how to navigate, manipulate, and operate in the messy physical world. But without robot training data to support these ambitions, even the most sophisticated AI models hit a hard wall.
YouTube videos and footage captured by gig workers fall short. They're low-fidelity, inconsistent, and extraordinarily difficult to reconcile with the physical variables of the real world. What the industry needs is purpose-built infrastructure for robot data collection, and that is precisely the gap one startup is betting $70 million it can close.
Section 2: XDOF Emerges From Stealth With a Bold Vision:
XDOF (pronounced 'ecks-doff') launched out of stealth with a clear thesis: the next great bottleneck in artificial intelligence won't be model architecture or chip performance. It will be the data feedback loop required to teach robots how to interact with the physical world.
Founded in October 2024 by CEO Philipp Wu, CTO Fred Shentu, and COO Nemo Jin, XDOF is building the data pipelines, collection tooling, and annotation systems that frontier robotics companies can't easily develop themselves. The company has already secured 20 customers — including several frontier AI labs — and raised $70M in backing from some of the most respected names in venture capital.
$70M: Raised
20+: Customers
60: Employees
130K: Trajectories in ABC Dataset
Investors include Thrive Capital, Spark Capital, a16z, Lux, and WndrCo, a who's who of deep-tech venture capital that signals serious confidence in XDOF's approach. The company currently employs approximately 60 people, with plans to scale aggressively as demand for robotic AI training data accelerates.
Section 3: From Berkeley PhD to Robot Data Pioneer:
The XDOF origin story begins where many great AI breakthroughs do — inside a university lab. Philipp Wu was a PhD student at UC Berkeley, focused on enabling robots to learn skills from large-scale datasets. The research was sound. The problem was immediate and frustrating: there was no large-scale data to work with.
'We didn't have large-scale data to work with. There was this chicken-and-egg problem — we first needed to actually collect data before we could even ask how to train a foundation model for robotics.' — Philipp Wu, CEO of XDOF
Wu and Shentu tackled this problem directly by developing GELLO, a low-cost teleoperation system that lets a human operator control a robotic arm, generating high-quality manipulation data in the process. The resulting paper became influential in robotics research, and the system was widely adopted by teams facing the same data bottleneck.
Recognizing that their solution pointed to a massive market opportunity, Wu, Shentu, and Jin launched XDOF as a dedicated company to build a complete data ecosystem for robotic AI — not just data collection, but cleaning, tooling, and annotation at scale.
Section 4: The Three-Tier Data Pyramid for Physical AI:
XDOF's approach is structured around what the company calls a three-tier data pyramid — a framework for generating robot training data with varying levels of specificity and deployment fidelity.
Tier One — the most valuable — is teleoperation data collected directly on the robot being deployed. This task-specific, hardware-matched data is the gold standard for training robotics models that need to perform real-world actions reliably.
Tier Two involves teleoperated robots — like those using GELLO-type systems — collecting broader manipulation data across a wider range of environments and use cases. This tier provides the scale necessary to pre-train general-purpose foundation models.
Tier Three captures 'egocentric data' — footage and sensor readings from humans performing everyday tasks. XDOF plans to develop its own wearable sensor hardware for this tier, acknowledging that camera choice and hand-tracking algorithm performance are tightly coupled: bad hardware design produces bad data, regardless of downstream processing.
'Your camera choice is going to affect the quality of your data — which is going to affect how your hand-tracking algorithm performs.' — Philipp Wu
To execute this vision at scale, XDOF plans to hire and train large teams of teleoperators and egocentric data operators globally. It's a deliberately labor-intensive model — one that requires warehouses, maintained robots, calibrated physical parameters, and rigorously trained operators. Most AI labs simply don't want to build that infrastructure themselves.
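The three tiers described above can be pictured as a small data model. The sketch below is only an illustration under stated assumptions: the names `Tier`, `Episode`, `hours_by_tier`, and `sampling_weights`, along with the priority numbers, are hypothetical and are not taken from XDOF's tooling. It shows one plausible way to tag recorded episodes by tier and to weight scarce, hardware-matched Tier One data more heavily than plentiful Tier Three footage when assembling a training mix.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """The three tiers named in the article (lower number = closer to deployment)."""
    ON_ROBOT_TELEOP = 1   # teleoperation on the exact robot being deployed
    GENERAL_TELEOP = 2    # teleoperated robots across broader environments
    EGOCENTRIC_HUMAN = 3  # footage and sensor readings from humans


@dataclass(frozen=True)
class Episode:
    """One recorded demonstration. Field names are hypothetical."""
    episode_id: str
    tier: Tier
    duration_s: float


def hours_by_tier(episodes):
    """Sum recorded hours for each tier."""
    totals = {tier: 0.0 for tier in Tier}
    for ep in episodes:
        totals[ep.tier] += ep.duration_s / 3600.0
    return totals


def sampling_weights(episodes, tier_priority):
    """Return per-episode sampling probabilities.

    Each episode's weight is its duration times an assumed priority
    for its tier, normalized so the weights sum to 1.
    """
    raw = [ep.duration_s * tier_priority[ep.tier] for ep in episodes]
    total = sum(raw)
    if total <= 0:
        raise ValueError("no episode has positive weight")
    return [r / total for r in raw]


# Illustrative data only; these durations and priorities are made up.
episodes = [
    Episode("deploy-001", Tier.ON_ROBOT_TELEOP, 1800.0),
    Episode("general-001", Tier.GENERAL_TELEOP, 3600.0),
    Episode("ego-001", Tier.EGOCENTRIC_HUMAN, 7200.0),
]
priority = {
    Tier.ON_ROBOT_TELEOP: 4.0,
    Tier.GENERAL_TELEOP: 2.0,
    Tier.EGOCENTRIC_HUMAN: 1.0,
}
weights = sampling_weights(episodes, priority)
```

With these made-up numbers, the half hour of Tier One data ends up sampled as often as the two hours of Tier Three footage, which is one simple way to express the idea that hardware-matched data is worth more per minute than general footage.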
Section 5: The ABC Dataset: Opening the Floodgates for Robotics Research:
To establish its credibility and accelerate the broader robotics research ecosystem, XDOF is partnering with UC Berkeley's AI Research lab to release what it believes is the largest collection of high-quality robot training data ever made publicly available: the ABC dataset.
130K: Robot Manipulation Trajectories
300hr: Simulation Data
100hr: Evaluation Recordings
XDOF says pre-training data at this scale has never before been made available to the academic robotics community. The release is designed to catalyze a wave of open research, much as large language model releases transformed the NLP field.
'We've seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn't necessarily have expected.' — David McAllister, UC Berkeley PhD Student
XDOF has already used the ABC dataset to train robots on real-world benchmark tasks — including folding T-shirts, flattening boxes, and loading AirPods into their cases. These aren't toy demonstrations; they represent the nuanced, dexterous manipulation challenges that define physical AI's next frontier.
Section 6: What Physical AI Means for Your Business — And How Agent+ Helps You Prepare:
The rise of physical AI and autonomous robotics isn't a distant future scenario — it's actively reshaping the competitive landscape across manufacturing, logistics, healthcare, retail, and beyond. Companies that understand how to integrate AI-driven automation into their operations today will have a structural advantage as the technology matures.
At Otherworlds AI, we build the business AI infrastructure that positions your organization for this future. Our Agent+ Business AI Platform automates the workflows, data pipelines, and decision-making processes that let your team focus on strategy — not repetitive execution. And with Google Opal Automated Workflows, we help you connect your business systems into a coherent, intelligent operation.
The lesson from XDOF is clear: the companies building foundational infrastructure today — data pipelines, automation systems, intelligent workflows — will define the economics of tomorrow's AI-powered industries. Whether you're exploring AI for the first time or scaling an existing program, Otherworlds AI is your partner in making physical and digital AI work for your business.
Ready to build your AI-powered business infrastructure? Visit otherworldsai.com to explore Agent+ plans starting at $297/month, or contact our team for an Enterprise custom AI build tailored to your operations.