Language models learned from text the internet had already written down. The next generation of embodied models needs something the internet does not have: high-fidelity recordings of skilled human craft. TurboTune builds those datasets — beginning in the kitchen, where the problem is hardest and the experts are best.
A chef who has worked a station for ten years knows things that no recipe describes: how the sound of garlic shifts a moment before it burns, the resistance of dough that's been kneaded enough, the exact pressure that separates fillet from bone. This is what philosophers call techne — knowledge that lives in the hands and the senses, not on the page.
Foundation models trained on text have run out of text. The frontier is moving toward embodied, multimodal systems — robots, agents that act in the world — and the bottleneck is no longer compute or architecture. It is data. Specifically: paired vision, motion, force, audio, and intent, captured at the resolution at which skill actually unfolds.
TurboTune is a research lab building those datasets. We capture how experts actually work — at a fidelity that is trainable today and will be more valuable tomorrow than any model trained on it.
Every stream is captured to a shared timeline with sub-10 ms alignment: RGB-D video, body and hand pose, instrumented utensils, ambient and contact audio, and thermal. Modalities are designed in together, not bolted on.
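As a sketch of what that alignment budget means in practice, here is a minimal nearest-neighbor resampler against a master clock. The function, the 10 ms constant, and the NumPy approach are illustrative assumptions, not our actual pipeline:

```python
import numpy as np

MAX_SKEW_S = 0.010  # the sub-10 ms alignment budget described above (assumed constant)

def align_to_master(master_ts: np.ndarray, stream_ts: np.ndarray) -> np.ndarray:
    """For each master timestamp, return the index of the nearest stream sample.

    Both arrays hold seconds on the shared capture clock, sorted ascending,
    e.g. 30 Hz RGB-D frames as the master and 1 kHz force samples as the stream.
    """
    idx = np.searchsorted(stream_ts, master_ts)
    idx = np.clip(idx, 1, len(stream_ts) - 1)
    left, right = stream_ts[idx - 1], stream_ts[idx]
    nearest = np.where(master_ts - left < right - master_ts, idx - 1, idx)
    skew = np.abs(stream_ts[nearest] - master_ts)
    if skew.max() > MAX_SKEW_S:
        raise ValueError(f"max skew {skew.max() * 1e3:.1f} ms exceeds the alignment budget")
    return nearest
```

Run once per stream against the RGB-D clock and every modality becomes queryable at a common rate.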
Position alone teaches a model where a hand moves. It cannot teach the difference between dicing an onion and crushing it. We instrument the tools and the surfaces so contact dynamics are first-class data.
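To make contact dynamics concrete as data, here is one plausible shape for a sample from an instrumented knife; every field name is an assumption for illustration, not our published schema:

```python
from dataclasses import dataclass

@dataclass
class ContactSample:
    """One tool-side sample on the shared capture clock (fields illustrative)."""
    t: float                   # seconds on the shared clock
    tool_id: str               # hypothetical identifier, e.g. "chef_knife_03"
    pose: tuple[float, ...]    # tool pose: x, y, z plus orientation quaternion
    wrench: tuple[float, ...]  # fx, fy, fz in newtons; tx, ty, tz in newton-meters
    in_contact: bool           # contact sensed at the cutting edge

def peak_normal_force(samples: list[ContactSample]) -> float:
    """Peak downward force while in contact: a dicing-versus-crushing
    signal that no amount of position data contains."""
    return max(abs(s.wrench[2]) for s in samples if s.in_contact)
```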
A model cannot learn an entire dish from fifty examples. It can learn atomic skills — knife work, searing, folding, plating — composed into recipes. We design our taxonomy and capture protocol around that.
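A minimal sketch of that composition, assuming a dotted skill taxonomy like "knife.dice" (the class names and labels are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class SkillSegment:
    """One atomic-skill span inside a capture session (labels hypothetical)."""
    skill: str      # dotted taxonomy path, e.g. "knife.dice", "heat.sear"
    start_s: float  # span boundaries in seconds on the shared clock
    end_s: float

@dataclass
class RecipeCapture:
    """A dish recorded as a composition of atomic skills, not one long demo."""
    dish: str
    segments: list[SkillSegment] = field(default_factory=list)

    def skill_clips(self, prefix: str) -> list[SkillSegment]:
        """Every segment under one taxonomy branch, e.g. all knife work."""
        return [s for s in self.segments if s.skill.startswith(prefix)]
```

Pooling skill_clips("knife.") across hundreds of dishes is what lets a model see knife work thousands of times, even when no single dish is captured more than fifty.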
Most datasets capture only the successful take. We capture the broken sauce, the over-reduced glaze, the recovery — because that is where chefs' judgment is most visible, and where future systems will need it most.
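One way such takes could be annotated, with the recovery as explicit a label as the failure; the enum values and fields below are illustrative, not our schema:

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"    # e.g. a broken sauce, an over-reduced glaze
    RECOVERY = "recovery"  # the corrective action that follows a failure

@dataclass
class OutcomeSpan:
    """A labeled time span within one capture (schema illustrative)."""
    start_s: float
    end_s: float
    outcome: Outcome
    note: str = ""  # expert commentary recorded at annotation time

# A broken-then-rescued hollandaise might carry spans like these:
spans = [
    OutcomeSpan(0.0, 210.0, Outcome.SUCCESS),
    OutcomeSpan(210.0, 240.0, Outcome.FAILURE, "emulsion split; pan too hot"),
    OutcomeSpan(240.0, 330.0, Outcome.RECOVERY, "fresh yolk, split sauce whisked back in"),
]
```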
The full station specification, calibration protocol, and annotation schema are shared with research partners under agreement; the capture principles above are the high-level summary.
We are not a food company. We are a manipulation lab. Cooking is the wedge because it concentrates every problem worth solving into one workspace.
Almost every other manipulation benchmark uses rigid objects. Kitchens force the hard cases: dough, leaves, liquids, raw protein, all under time pressure, with mistakes that cannot be undone.
You cannot easily recruit a hundred neurosurgeons. You can recruit a hundred chefs. The talent density is uniquely available, and the variation between traditions is itself signal.
A model that learns to debone a fish has learned about delicate force, tool use, and visual state estimation — all of which transfer to surgery, manufacturing, and care work.
Compute and architecture commoditize. Proprietary, high-fidelity skill data does not. We are happy to walk through the thesis and the moat.
For academic and industrial research groups: licensed access to capture data, evaluation suites, and station replication kits.
For chefs: compensated, attributed, and protected. We are building the archive your technique deserves, with you as a participant, not a subject.
For humanoid, manipulator, and VLA teams who need real-world, real-skill data to push past benchmark saturation.
We keep conversations short and concrete. Tell us who you are and what you're working on.