
Form Follows Data

July 12, 2025 / 10 min / Stefan Weirich

One of the first questions we are asked at mimic is: "Why the human hand form factor?" It's a fair question. It is not obvious why we would focus on this specific morphology when the design space of grippers allows for so many alternatives, from simpler two-finger grippers to hands more capable than the human hand, for example with six fingers.

 

Here I want to lay out the design philosophy guiding our quest to solve general-purpose robotic dexterity, and what I think the future of the field will look like. As the title already suggests, the reason isn't purely functional: it's a choice driven by data availability.

Note: Are you also excited about general-purpose dexterity? We are hiring.

The Data Problem in Robotics

 

One of the greatest challenges in building robotic foundation models is the scarcity of robot behavior data. Unlike modalities such as natural language or vision, where cheap, abundant data already existed thanks to the internet, robotics lacks extensive, systematic records of behavior data. Unfortunately, collecting such data is extraordinarily expensive and time-consuming: with teleoperation, training a first-generation robotics foundation model like pi0 requires a small army of human operators driving robots for months. If we want this to really work, there must be a way to scale robot training data more efficiently.


The Human Data Scale


However, we do have access to one enormous, readily available data source: human behavior. The first obvious step is human video data: countless hours of video showing humans performing all sorts of manipulation tasks with their own hands, freely available on the internet. This is something more and more embodied AI researchers are now leveraging for pre-training, as models trained on large-scale video data appear to encode useful and robust physical priors. Interestingly, start-ups are emerging that focus entirely on providing such training data to robotics companies, offering quick access to egocentric video recorded by an on-demand workforce around the world.
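To make this more concrete, here is a minimal sketch of the kind of objective such video pre-training can use: a small convolutional encoder trained to predict the embedding of the next frame, which pushes it to capture how scenes evolve physically. The architecture, clip shapes and training loop are illustrative assumptions, not any specific production pipeline; real systems use far larger models, much more data and objectives such as masked video modeling or contrastive learning.

```python
# Minimal sketch: pre-training a visual encoder on unlabeled video clips
# with a next-frame prediction objective. All shapes and modules are
# illustrative placeholders, not an actual production pipeline.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, embed_dim),
        )

    def forward(self, x):
        return self.net(x)

encoder = FrameEncoder()
predictor = nn.Linear(128, 128)  # predicts the embedding of the next frame
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

# Stand-in for a dataloader over egocentric manipulation clips:
# a batch of 8 clips, each with 2 consecutive 64x64 RGB frames.
clips = torch.rand(8, 2, 3, 64, 64)

for step in range(10):
    z_t = encoder(clips[:, 0])         # embedding of frame t
    with torch.no_grad():
        z_next = encoder(clips[:, 1])  # target: embedding of frame t+1
    loss = nn.functional.mse_loss(predictor(z_t), z_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
```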

 

But video data, although rich in diversity, is often of low quality and misses crucial information such as accurate positions and applied forces. It therefore cannot be the sole ingredient in a robotics foundation model recipe.

Quality vs. Quantity

 

One of the key questions, then, is how we can create higher-quality data in sufficient quantity, at unprecedented scale and speed.


Our solution was to create a wearable data collection method: a sensorized glove setup with integrated cameras that records accurate positions, applied forces and point-of-view video.
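To make the kind of data this produces concrete, here is a rough sketch of what a single timestep of such a recording could look like as a data structure. The field names, dimensions and units are assumptions chosen for illustration, not our actual recording format.

```python
# Illustrative schema for one timestep of a wearable recording.
# Field names, dimensions and units are assumptions, not an actual format.
from dataclasses import dataclass
import numpy as np

@dataclass
class GloveFrame:
    timestamp_s: float            # time since start of recording, seconds
    joint_angles_rad: np.ndarray  # e.g. ~20 finger/wrist joint angles
    fingertip_pos_m: np.ndarray   # (5, 3) fingertip positions in the wrist frame
    contact_force_n: np.ndarray   # (5,) normal force per fingertip, Newtons
    rgb_frame: np.ndarray         # (H, W, 3) egocentric camera image

# Example: one synthetic frame
frame = GloveFrame(
    timestamp_s=0.0,
    joint_angles_rad=np.zeros(20),
    fingertip_pos_m=np.zeros((5, 3)),
    contact_force_n=np.zeros(5),
    rgb_frame=np.zeros((480, 640, 3), dtype=np.uint8),
)
```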


The focus should not be on the wearable hardware or the sensors, but on the fact that this method lets humans perform tasks with their own hands, as they already do in their day-to-day work. No training on robot teleoperation or VR is needed. No physical presence of a robot is needed. This makes our method approximately 5x cheaper and 7x faster than collecting conventional teleoperation data, while allowing for more diverse and realistic data collection with less friction.

Bridging the Embodiment Gap

 

This new approach to robot learning and data collection drastically changes the way we need to think about robot hardware. To make effective use of the wealth of high-quality human data obtainable with video and wearables, we need to minimize the "embodiment gap": the difference between the physical form generating the training data (humans) and the robot's form. We no longer engineer gripper hardware hyper-optimized to one task.

 

We optimize the hardware design to match our data collection methods: "form follows data".

Consequently, if we want to leverage human demonstration data effectively, our robots need to physically resemble humans, particularly in their manipulation capabilities.
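One way to make the embodiment gap tangible is to measure how well a recorded human fingertip position can be reproduced by a robot finger after retargeting. The sketch below does this for a deliberately simple, made-up two-joint planar finger: the kinematics, link lengths and joint limits are hypothetical, but the pattern (optimize robot joint angles to match the human keypoint, then look at the residual error) is the general idea.

```python
# Sketch: retarget a recorded human fingertip position onto a toy 2-joint
# planar robot finger and report the residual position error.
# The kinematics and link lengths here are made up for illustration.
import numpy as np
from scipy.optimize import minimize

LINKS = np.array([0.04, 0.03])  # hypothetical link lengths in meters

def fingertip(q):
    """Forward kinematics of a planar 2-joint finger: joint angles -> (x, y)."""
    x = LINKS[0] * np.cos(q[0]) + LINKS[1] * np.cos(q[0] + q[1])
    y = LINKS[0] * np.sin(q[0]) + LINKS[1] * np.sin(q[0] + q[1])
    return np.array([x, y])

def retarget(human_tip, q0=np.array([0.3, 0.3])):
    """Find joint angles whose fingertip best matches the recorded human tip."""
    cost = lambda q: np.sum((fingertip(q) - human_tip) ** 2)
    res = minimize(cost, q0, bounds=[(0.0, 1.6), (0.0, 1.6)])  # joint limits
    return res.x

human_tip = np.array([0.05, 0.03])  # a recorded human fingertip position
q = retarget(human_tip)
gap = np.linalg.norm(fingertip(q) - human_tip)
print(f"retargeted joints: {q}, residual embodiment gap: {gap * 1000:.1f} mm")
```

The smaller this residual is across the whole hand, the more directly human recordings can supervise the robot; a hand far from human proportions leaves a gap that no amount of retargeting can close.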


At this point we come back to the system design question: suddenly it makes sense to approximate human anatomy and degrees of freedom. Turning this into an engineering problem, there are obvious tradeoffs between anthropomorphism and complexity or reliability. At its core, mimic's hardware is designed to be the simplest and most reliable system that matches key human functionalities, such as an opposable thumb and independently controllable fingers, including abduction and adduction. To fully match human data, we also need to match the integration of position, tactile and torque sensing, as well as vision, between the robot and the wearable data collection method.
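As a back-of-the-envelope illustration of this tradeoff, the sketch below tallies the actuated degrees of freedom of a hand that keeps the functionalities named above. The per-finger joint counts are assumptions for illustration, not mimic's actual kinematic layout.

```python
# Back-of-the-envelope DoF budget for a hand that keeps an opposable thumb
# and independently controllable fingers with abduction/adduction.
# Counts are illustrative assumptions, not an actual kinematic layout.
DOF_BUDGET = {
    "thumb":  {"flexion": 2, "abduction/opposition": 2},
    "index":  {"flexion": 2, "abduction": 1},
    "middle": {"flexion": 2, "abduction": 1},
    "ring":   {"flexion": 2, "abduction": 1},
    "pinky":  {"flexion": 2, "abduction": 1},
}

total = sum(sum(joints.values()) for joints in DOF_BUDGET.values())
print(f"actuated degrees of freedom: {total}")  # 16 in this sketch
# For reference, the human hand has over 20 degrees of freedom; a sketch
# like this trades some of them away for simplicity and reliability.
```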


The Limits of Anthropomorphism

 

Interestingly, this principle doesn't necessarily extend to the entire robot body. While manipulation benefits enormously from human-like design thanks to the available demonstration data, legged locomotion follows different rules. Most legged locomotion is trained in simulation using reinforcement learning, a method well suited to this less complex problem (a minimal sketch of that recipe follows below).
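Here is what that recipe looks like in its most minimal form, using the off-the-shelf stable-baselines3 PPO implementation. The environment is a stand-in: a real legged-locomotion setup would train in a physics simulator such as MuJoCo or Isaac Gym, with a carefully shaped reward and orders of magnitude more samples.

```python
# Minimal sketch of the sim + RL recipe commonly used for locomotion,
# using off-the-shelf PPO. "Pendulum-v1" stands in for a proper legged
# locomotion environment.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # a real run would use millions of steps

# Roll out the learned policy for a few steps
obs, _ = env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```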

Because locomotion can be learned this way, we can be more flexible with its design, potentially deviating from human anatomy where beneficial (wheeled platforms can be quite efficient). While the pace of recent developments in bipedal humanoid robots is surely fascinating from an engineering perspective, we have rarely encountered real-world customer requirements that would justify this complexity. Instead, we are choosing a platform-agnostic approach compatible with most standard robotic arms and mobile robots.


What’s next?

 

Looking ahead, we'll likely see methods combining imitation learning and reinforcement learning, as well as real-world and simulation data. The emergence of sophisticated cross-embodiment learning will allow AI models to generalize across different robotic manipulators, from human-like hands to more specialized grippers (as we have shown recently). However, this future capability doesn't negate our current need for human-like robots; it builds upon it.


The "form follows data" principle isn't about permanently binding robots to the human form. Rather, it's about recognizing that right now, human-like robot manipulators represent our best opportunity to leverage cheap data for rapid advancement towards truly general foundation models. As we build better systems for collecting robot behavior data and generalize across embodiments, future designs may well transcend human limitations. But for now, the path to more capable robots runs through human-like design, because that's where the data is.​
