
In this episode, Quan Vuong of Physical Intelligence explores why robotics may be reaching a true inflection point. He explains how recent advances in foundation models, cross-embodiment learning, and real-world deployment are making robots more adaptable, more useful, and faster to build than before. Using examples like laundry folding and warehouse packaging, the conversation shows how robotics is shifting from a difficult, highly customized engineering problem into a more scalable platform opportunity. Overall, it is a clear and forward-looking discussion about why the cost of building in robotics is falling, what mixed autonomy looks like in practice, and how this could unlock a new wave of robotics startups.
The equation, I think, for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore.
Everyone's sort of spending a lot of time in the digital world, and it feels like, you know, now is the time to start thinking about the world of atoms.
You literally just gave people the playbook for how to build a vertical robotics company.
This has really been our mission from the start, is to create that Cambrian explosion.
It still like blows my mind.
I didn't know if this would exist even in my entire lifetime.
Welcome back to another episode of The Light Cone.
Today we have a very special guest, Quan Vuong.
He's one of the co-founders of Physical Intelligence, which we think might be the robotics AI lab that brings about the GPT-1 moment for all of robotics.
Quan, thank you for joining us.
Pleasure to be here.
I have been a long-time admirer of YC, and our mission is to build a model that can control any robot to do any task that it's physically capable of, and to do so at such a high level
of performance that's going to be useful to people in all walks of life.
And so GPT-1 for robotics, what is it?
Is the ChatGPT moment for robotics real?
Our perspective here is that we want to build a model that's really intelligent.
We want to build a platform that allows us to externalize that intelligence to the rest of the world and allow them to use it to build very interesting applications in all sorts of
verticals in robotics.
And we think that it's going to be more like a peeling-an-onion analogy, where you start from a really strong base model that has all sorts of common sense knowledge and already works to some extent on your robot. You then have a mixed autonomy system, very similar, for example, to an autonomous car today.
And then you actually deploy that system to do a real job.
That system might make a mistake.
It's okay.
And then over time, by actually exposing the system to the complexity and the edge cases of the real world, that system gets incrementally better, even if just slightly, every day.
And one day you wake up and you suddenly have a system that is just fully autonomous and just provides tremendous value.
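To make that concrete, here is a minimal sketch of such a mixed-autonomy deployment loop in Python. Every interface here (the policy, robot, operator, and dataset objects) is a hypothetical placeholder for illustration, not Physical Intelligence's actual API.

```python
# Minimal sketch of a mixed-autonomy deployment loop. All interfaces
# (policy, robot, operator, dataset) are hypothetical placeholders.

def run_mixed_autonomy(policy, robot, operator, dataset):
    while True:
        obs = robot.get_observation()        # camera images, proprioception
        action = policy.predict(obs)

        if operator.flags_mistake(obs, action):
            # A human takes over and demonstrates the correction; the
            # corrected trajectory is logged as new training data.
            correction = operator.teleoperate(robot)
            dataset.add(obs, correction, source="human_correction")
        else:
            robot.execute(action)
            dataset.add(obs, action, source="autonomous")

        # Periodically fine-tune on the growing dataset, so the system
        # gets incrementally better from real-world edge cases.
        if dataset.ready_for_update():
            policy = policy.finetune(dataset)
```

The key property is that mistakes are recoverable and every correction becomes training data, which is what lets the system improve a little every day until it is fully autonomous.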
Might be helpful to give the audience a bit of a mini history lesson on why robotics is so hard.
And there's been a lot of breakthroughs in the last 2 years.
And I mean, just to simplify, the robotics problem is 3 pillars: semantics, where I think we got a lot of unlocks from language models that somehow we ported into robotics; then you have planning; and then the last thing is control, which needs to be done in real time while interacting with an environment that changes.
Walk us through the seminal papers that a lot of the Physical Intelligence team published that gave you the inkling that the GPT-1 moment is near.
And that started in 2024.
So the dream to build general-purpose robots has been a long-time dream, I think, in humanity.
We're not the first to say that our mission is to build a model that can work on any robot.
And we're really fortunate to be in this moment in history where we feel that it's possible. To walk back a little bit: a few years before, there was, I think, first SayCan, which to me was the first demonstration of language models in robotics, showing how you can bring all of the common sense knowledge in language models into robotics and thereby significantly reduce the need to collect robot-specific data.
So for example, if you have a task of, oh, I want to go to the YC office to record a podcast, what are the steps I need to take?
You can ask a language model, just show me the steps and show me the plan.
And that worked incredibly well.
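As a rough sketch of that planning-level use of a language model, in the spirit of SayCan: the LLM decomposes a task into steps drawn from the robot's skill library. The `query_llm` function and skill names below are stand-ins invented for this example, not any particular API.

```python
# SayCan-flavored sketch: the language model supplies common-sense task
# decomposition, so no robot-specific data is needed at the planning
# level. `query_llm` is a stand-in for any text-completion API.

def plan_task(query_llm, task: str, skills: list[str]) -> list[str]:
    prompt = (
        f"Task: {task}\n"
        f"The robot can only use these skills: {', '.join(skills)}.\n"
        "List the steps to complete the task, one skill per line."
    )
    reply = query_llm(prompt)
    return [line.strip() for line in reply.splitlines() if line.strip()]

# e.g. plan_task(query_llm, "throw away the empty Coke can",
#                ["pick(object)", "place(object, location)", "navigate(location)"])
# SayCan additionally weights each candidate step by a learned estimate
# that the skill can succeed in the current scene; that grounding step
# is omitted here.
```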
And so the way language models infiltrated robotics, if you will, was to start at the planning level, at the semantic level. But there's still the control problem.
At the end of the day, you still need a mechanism to convert the plan into low-level action that can actually actuate the robot.
And that brings us to PaLM-E, and that brings us to RT-2, which stands for Robotics Transformer 2.
And what these two works really show is that if you start from a vision-language model that is really powerful, and you use robot data to adapt that model to speak robot language, if you will, then you see a lot of transfer from the kind of knowledge that exists in the vision-language model down to the low-level actions.
One of my favorite examples from when we did the RT-2 project: you can have pictures of celebrities on the table. Say you have a picture of Taylor Swift and a picture of the Queen of England. You can ask the robot to pick up the Coke can and move it to Taylor Swift, even though the concept of Taylor Swift just doesn't exist in the robot data at all. And that works.
You can do other examples such as
spatial reasoning that doesn't exist in the robot data at all.
For example, move the dinosaurs next to the red car.
And these are all completely unseen objects in the robot data.
And so that was RT-2, and that was PaLM-E.
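The core trick behind RT-2 is representing low-level actions as tokens, so the same vision-language model that answers questions can also "speak robot." Here is a toy sketch of that kind of action discretization; the 256-bin count and the [-1, 1] action range are illustrative assumptions rather than RT-2's exact recipe.

```python
import numpy as np

# Toy sketch of VLA-style action tokenization: continuous actions are
# discretized into bins so a vision-language model can emit them as
# ordinary text tokens. Bin count and ranges are illustrative.

N_BINS = 256

def action_to_tokens(action: np.ndarray) -> list[int]:
    # action: e.g. [dx, dy, dz, droll, dpitch, dyaw, gripper], each in [-1, 1]
    clipped = np.clip(action, -1.0, 1.0)
    return list(np.floor((clipped + 1.0) / 2.0 * (N_BINS - 1)).astype(int))

def tokens_to_action(tokens: list[int]) -> np.ndarray:
    # Inverse map: token ids back to approximate continuous actions.
    return np.asarray(tokens, dtype=np.float32) / (N_BINS - 1) * 2.0 - 1.0

# The VLM is then fine-tuned on (images, instruction) -> action-token
# sequences, which is how web-scale concepts like "Taylor Swift"
# transfer down to control.
```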
Now, RT-2 and PaLM-E are single-embodiment exercises.
Just for the audience, single embodiment, meaning it worked for a very specific robot.
It worked for a very specific robot.
In robotics, you can ask the question, how do you scale?
Especially how do you scale data collections?
And one of the insights that we had back then was, you know, maybe the data from one robot is not that different from another robot anyway.
If you have enough robots in your training data, maybe what the model learns isn't how to control one specific robot. What the model learns is something more abstract, which is a general notion of what it means to control any particular robotic platform. And therefore it will be better at controlling any particular platform.
And that brings us to what we call Open X-Embodiment and RT-X, the Robotics Transformer X.
That was a big paper.
Because it was the first that showed potential scaling laws applying to robotics: now you could start training these models across multiple kinds of hardware, not just one, which had never been done in robotics before. All the research labs would train with a very specific set of sensors, actuators, and motors, and it was all very finicky with that particular hardware, right?
Yeah.
One of the really interesting results from Open X-Embodiment, and let me provide the context here, is this: you can take, let's say, 10 different robot platforms, collect data from each, and train a policy that's really optimized to work well on that platform. So you have 10 different platforms and 10 different policies. Now, if you instead take all the data and absorb it into a model with enough capacity to really absorb that data, you get a generalist that learned to control the 10 different robots. How does it compare to the specialists that were optimized to work well on one particular embodiment? The interesting result from OpenX is that the generalist was 50% better.
Wow.
And that was really surprising because in robotics it's hard enough to get your model to work on one particular robot platform.
And one of the reasons why I say that we're really fortunate to be in this moment in robotics is that OpenX was really only possible because of the support that we received from the robotics community. It was a huge collaboration across the robotics community.
And the reason why that's really important is there's this joke in robotics grad school that if you want to add 2 years to your PhD, just work on a new robot platform.
By that logic, if you want to have 10 robot platforms, that's 20 years.
Why is that?
It takes like a year or 2 to just get the platform up and running to even collect the data.
Yeah.
Is it fair to say that the dataset created for Open X-Embodiment had an impact similar in scale to what ImageNet did for vision? Because it was huge, and it was the first large dataset across multiple kinds of hardware, a huge collaboration.
I still think that ImageNet was more impactful in the vision community.
And the reason for that is a few.
The first is that ImageNet also allowed for reproducible evaluation, right?
OpenX as an effort was more about making data available for people to use.
And evaluation is a really difficult problem in robotics that OpenX did not solve.
And the second is, I think OpenX is a drop in the bucket at this point in the robotics community. If you measure the scale and the volume and the diversity of data that the community is collecting, OpenX at this point is a drop in the bucket.
I mean, I guess we started talking about sort of GPT-1, but even GPT-1 was this moment where you could prove something: Alec Radford figured out that there was a neuron that tracked a very specific property of the input and output. And then that allowed the scaling laws to sort of take hold.
The biggest problem in robotics I've heard is basically exactly what we've been talking about: it's the data problem. Language could bootstrap off of the sum total of what you could get off the internet, which is actually quite a lot.
Can you give us a sense for
scale?
Is it petabytes?
What do you think is necessary as an input to
the true GPT-1 of robotics?
Yeah.
So there are a few ways to look at the data scarcity problem in robotics. The first way is that it's really two problems in disguise: there is the data generation problem, and there's the data capture problem. The difference is that there might already be lots of robot data being generated, but there's just never been an incentive to capture it and make it easy to digest in training. And that's one of the problems OpenX was trying to solve: if you have robot data, it's a really good idea to capture it and make it possible to train on.
The second way to look at it is that robotics is very different from language models. There is not an internet of robot data that you can use.
And so you see this kind of very operationally heavy effort to collect data.
And there's the question of, is it going to scale?
Well, the way that I look at it is, let's take US GDP, $24 trillion. Let's say we actually solve robotics, a model that can control any robot to do any task. Napkin math, maybe that contributes 10% of US GDP, about $2.4 trillion a year.
Well, that's already a massive number.
And I think that promise is one of the reasons that warrants the investment into data collections in robotics.
And the third way to look at it is that we're very focused on cross-embodiment. And with cross-embodiment there is a data collection aspect as well, which is to really make sure that your model, your organization, and your infrastructure are set up to consume data from many different sources of robots. And that actually allows you to scale more easily.
For example, if I were to contrast our approach with, let's say, a company that has one particular hardware platform that they optimize for and scale: that approach hasn't really allowed people to scale, because it's just much harder to figure out how to manufacture 1,000 units of something than to make sure that you yourself are ready to absorb data from 1,000 different types of robots that are already out there in the community.
I mean, it's a crazy problem, isn't it?
I mean, the hardware itself, even within the same design of embodiment, if there's a hardware run that goes awry or like one of the servos is slightly different, like you see it in
the data, right?
And then how do you control for that?
Yeah, so at one point we were doing an inventory of the robots in the company, and we were shocked to find that no two robot platforms are the same.
And if you ask people in the robotics community, sometimes there's debate about multi-robot versus single robot.
And the argument is that single robot is simpler to scale.
And actually that's not how it plays out in practice.
How it plays out in practice is that even if you have a single robot that you're optimizing for, over time that platform is going to drift. Maybe you want to make a hardware change, or you have a software change. You end up in a situation where it's much harder for you to reuse old data, because in machine learning, if you want to generalize from a distribution, you would like many samples from that distribution. And if you just have one robot platform that has a major change every 3 months, you only have a few data points from that distribution.
Whereas if you start from the hypothesis that with many robot platforms in your fleet, your model is going to learn something more abstract, how to control a robot rather than any particular robot, then the model will be able to ingest data from a slightly different robot better.
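For intuition, here is a sketch of what ingesting data from heterogeneous robots can look like: per-robot action spaces are normalized and padded into one shared format, and training batches are sampled across platforms. The 8-dimension cap, the dataset interface, and the mixture weights are illustrative assumptions, not Pi's actual pipeline.

```python
import numpy as np

# Sketch of cross-embodiment data ingestion: robots differ in action
# dimension and ranges, so episodes are normalized and padded into one
# shared format before being mixed into a single training stream.

MAX_ACTION_DIM = 8  # illustrative cap across all platforms

def to_shared_format(action, low, high):
    """Map a per-robot action (with per-robot limits) into the shared space."""
    normed = 2.0 * (np.asarray(action) - low) / (high - low) - 1.0  # -> [-1, 1]
    padded = np.zeros(MAX_ACTION_DIM, dtype=np.float32)
    padded[: len(normed)] = normed
    mask = np.arange(MAX_ACTION_DIM) < len(normed)  # which dims are real
    return padded, mask

def mixed_batches(datasets, weights, rng=None):
    """Yield samples with each robot platform drawn per its mixture weight,
    so no single embodiment dominates training."""
    rng = rng or np.random.default_rng()
    while True:
        ds = datasets[rng.choice(len(datasets), p=weights)]
        yield ds.sample()  # hypothetical: returns (obs, padded_action, mask)
```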
And actually we're starting to see emergent properties in this kind of large robot foundation model that we're building, which is good news, where you start to see interesting transfer between different data sources.
For example, today it's possible to perform tasks zero-shot.
Zero-shot meaning you don't collect any data.
And these are the tasks that last year might have required hundreds and hundreds of hours.
What are some examples?
Yeah, do we have any videos we can see that show that?
I might get some flak when I come back, because this is not a published result.
Hopefully this will come out soon.
So I want to reserve the excitement for that and I'm kind of building up the excitement a little bit.
So hopefully this will come out soon.
All right.
These are not simple tasks. These are actually difficult tasks that just last year required hundreds of hours of data collection.
You hear it here on Lightcone first that there's some emergent properties that are going to come out of Pi.
Can you give us a sense of the flavor of the task?
It's really easy to fool yourself. And so we wanted to test across a few different tasks of different flavors: a task that requires precision, a task that requires reasoning with multiple objects in the scene. It all seems to have this property, which is really nice. So it does seem like something more general emerged, rather than us just getting lucky and the model suddenly starting to work on one particular task.
Could you help us understand where we are now in terms of what's working and how well it's working?
We're not quite at the ChatGPT moment yet.
Where are we?
And I think you brought some videos that you were going to show us to help everybody visualize what the current state of the art actually looks like.
I think where we are is this: if you have a task where it's okay for the robot to make a mistake, and it's possible for you to set up a mixed autonomy system where a person takes over when the robot makes a mistake and provides corrections, then it is possible to get to a level of performance where it starts to make sense to think about scaling robot deployment.
And the example that I specifically want to highlight here is this blog post that we did with Weave and Ultra.
And it's great that these are both YC companies.
I want to provide a little bit of context here first.
The context is that Pi is primarily a research organization.
We want to focus on building the best model, but we also want to not be tunnel vision.
We want to make sure that the model we build is actually going to be useful and actually perform tasks that people in society care about.
And one of the really good ways for us to do so is to partner really closely with companies that want to get robots out there today. And the way these relationships work is that we treat each other like we're on the same team, with a very free flow of information. And we design a system that tries to get the best possible performance on the task these companies care about.
So let me talk about Weave first. What you're seeing in this video is a system that we built together, folding really diverse items of laundry in a real laundromat in the Mission.
You can see people walking outside.
And why this task is difficult is that there's just an infinite space of possible observations. Like, you know, clothing is deformable, and no two items of clothing here are the same.
And these are also unseen.
These are not clothing items that are seen in the training data.
Yeah, I love this team.
They are some of the most cracked people out of Apple I've ever met.
Gary, you were the partner for Weave. Maybe you want to explain what Weave is and what their company does.
Yeah, I mean, they're actually shipping their first robots into the home.
We sort of talked about it as being able to do household tasks like this.
And I think they were very inspired by Physical Intelligence's first demos with laundry folding.
So it's actually a total trip to hear about it.
You know, a year ago we were talking about them doing it, and then now to see them do it working hand in hand with you is really awesome.
I think this is a great example of like, you know, you need the model smarts, you need the data collection, and then the hardware and the sort of system integration all working together
is just hard to nail.
So yeah.
And to get back to your question about why robotics is hard: it really is a hard systems problem.
Like you need everything to work well and work well together to get this result.
And like Weave is such an incredible team for us to work with to get this result.
And it actually didn't even take us that long to get this result.
We set a goal, and maybe 2 weeks afterwards we had a model and a system that was good enough at performing this task.
It still blows my mind to see a robot actually folding laundry, because until basically ChatGPT, I didn't know if this would exist even in my entire lifetime. Folding laundry has always been like the Turing test for robotics, because there's no way to deterministically program a system the way you did pre-AI to do this; the space is so infinite. And we've shown that it's possible. If robots can do this, basically, robots will be able to do everything. It's only a matter of improving from here.
There was a funny story where
when we first published Pi Zero, people thought of us as the laundry company because the demo was just focused on laundry.
And actually, picking home tasks, especially tasks that have to do with deformable objects, was a very intentional choice on our end.
We're not just after the home.
We really want to make it broadly applicable.
But picking home tasks for us to start with has a few benefits.
One is that it's relatable.
You can see the laundry folding demo and you can kind of like grok how this is going to be useful and you can get a sense of why it's hard.
And the second is that it's really easy to set up to test generalization.
Jared, can you talk about Ultra, which is one of your companies, and show a demo of it?
Yeah, this is Ultra.
The thing that I love about this video is that you see it's bright outside, and this is 4x speed over 100 minutes. If I scroll to the end, the sun has set.
Oh, wow.
That was one of the big problems in robotics: systems would be so sensitive to the environment and lighting that it would mess up the vision system and the semantics part of it.
Yeah.
And the interesting thing here is that it is possible to get to the level of autonomy that the robot is just performing the task.
This is autonomy at scale.
Like this is ready to be scaled.
Quan, because this task is less familiar than laundry folding, do you want to explain what the robot is doing here and what Ultra is
doing as a company?
Ultra is a company that wants to make it really easy to adapt robots to new tasks.
And right now they're focusing on the logistics space, which is really important because there are lots of labor shortages in logistics.
And the task that we focused on together here is this: if you order an item from Amazon, you sometimes get this soft pouch that the item gets shipped in.
And the task here is you have a tray of these items here and the robot is supposed to pick one of them at a time and place it inside this pouch.
The machine would then close it and then pick up the pouch and put it on the left here to be ready for shipping.
Now this part is hard because there are many different types of objects that can be in this tray and the opening here is actually very narrow.
So you see this interesting example of the robot kind of nudging the item to go into the pouch.
And that's really hard.
Like that requires very good understanding of the scene and like very precise motion to nudge the object into the pouch.
The other thing that's hard about this task is the level of autonomy that's required.
Like this is running for an entire day.
There is still human intervention, I want to say, in this like full-day operation, but the level of intervention is actually quite minimal.
This is not just like some like demo station, right?
This is actually recorded in an actual e-commerce warehouse where they're actually shipping real products to real customers.
This isn't just like a lab.
This is packaging real orders for real customers, to be shipped out of a real warehouse.
So this is real operations.
So I think this is really cool, because when people think about robots, they tend to think of the consumer use cases, like the ones we have, because that's what we're familiar with in our daily life. What I find really interesting is that there are a million applications like this Ultra one that you wouldn't think of. Like, who packs the soft pouch of things that you get from Amazon? Well, there's some person who does that, and this is a job that we can now build a robot to do.
The interesting thing about the approach is that you're converting it from a very difficult engineering problem into an operations problem: how do I identify the use case, and how do I collect the right data?
Which is in some sense more scalable because you can build the system that allows you to collect data for many different tasks.
So, you know, it's now a problem of how do I scale data collection rather than, you know, for every new task, how do I design a really difficult engineering system to solve it?
I think one thing that the audience may not know is that you have a very unique technical insight, one that in the past would have made robotics folks gasp, because robots need to run in real time.
A lot of times, all of the compute runs on device, but you guys have done something very different.
Can you tell us more about that, and how you make this work in real time, with large models, and really well?
So the context here is that, you know, we talk to many companies that would like to deploy robots.
And one of the first questions we get is, what compute unit should we get on the robot?
You know, it's expensive, it's going to increase the BOM cost, and they're worried that it's going to go out of fashion very quickly because the model changes, the model gets bigger.
How do I make sure that the hardware that I'm going to commit to today is going to be viable for a couple of years?
It's a very difficult question.
People are often really surprised when I tell them that in almost all of the robot evaluations that we run at Pi today, including the really complicated demos that we have shown, making coffee, folding laundry, mobile robots navigating around, the model is actually hosted in the cloud.
And this is not like a cloud as in a server in the office.
It's a real cloud.
The model is hosted in a data center somewhere.
And within this high-frequency control loop that is controlling the robot, the robot is actually querying an API endpoint that hosts the model, sending it images and a language command, and getting back actions that are then executed directly on the robot.
And this is surprising because of precisely the reason that you mentioned, you know, how do you actually make it work?
This is why it's really important for Pi to couple systems, hardware, and model development and research very tightly together, because it allows us to solve for this problem.
So for example, one of the insights that we have here is that you can actually bury the inference time within the robot control loop. If I'm a robot and I have enough actions to execute for the next 100 milliseconds, there's no reason for me to wait until I finish executing those actions before asking my model for more. I can do it as fast as inference, essentially. So maybe when I only have 50 milliseconds' worth of actions left, I ask for the next set of actions, and when the current 50 milliseconds is over, I have something ready to continue with for my next 100 milliseconds. So that's one of the insights.
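Here is a minimal sketch of that pipelining idea in Python. The `query_policy_server` call, the chunk object, and its `actions_with_timestamps()` iterator are hypothetical stand-ins; the 100 ms chunk length and the 50 ms request point come straight from the example above.

```python
import queue
import threading

# Sketch of hiding cloud-inference latency inside the control loop:
# while the robot executes the current ~100 ms action chunk, the next
# chunk is requested in the background, so it arrives before the
# current chunk runs out.

def control_loop(robot, query_policy_server, chunk_ms=100, request_at_ms=50):
    pending: queue.Queue = queue.Queue(maxsize=1)

    def request_next_chunk():
        # One network round trip to the model hosted in a data center.
        pending.put(query_policy_server(robot.get_observation()))

    chunk = query_policy_server(robot.get_observation())
    while True:
        worker = None
        for t_ms, action in chunk.actions_with_timestamps():  # hypothetical
            if worker is None and t_ms >= chunk_ms - request_at_ms:
                worker = threading.Thread(target=request_next_chunk)
                worker.start()            # overlap inference with motion
            robot.execute(action)         # actuate one control tick
        if worker is None:                # chunk ended early: ask synchronously
            request_next_chunk()
        else:
            worker.join()                 # ~50 ms of slack for the reply
        chunk = pending.get()             # next chunk, ready just in time
```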
The other kind of algorithmic improvement, which we refer to as real-time chunking, is to design inference in such a way that it accounts for the delay in querying the model in the cloud.
The problem here, if I get a little bit more technical, is that an action chunk is a sequence of actions that I can execute on the robot, not just one action. If I have an action chunk that I can execute for 100 milliseconds, and 50 milliseconds in I want to predict another action chunk and then transition to that new chunk once my current 50 milliseconds is over, how do I make sure the two are consistent? How do I make sure that if I'm moving this way, the next action chunk is going to allow me to continue smoothly moving this way?
You can precompute.
Yeah, you can precompute.
And that's one of the algorithmic improvements that we've made to make inference with a model hosted in the cloud possible.
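As a toy illustration of that consistency problem: the new chunk was predicted from an observation taken about 50 ms ago, so its first actions overlap motion the robot is already committed to. One naive remedy, sketched below, is to cross-fade across the overlap. Pi's actual real-time chunking algorithm is more sophisticated than this; the linear blend is just to convey the intuition.

```python
import numpy as np

# Naive cross-fade between consecutive action chunks. old_tail holds the
# remaining committed actions of the current chunk; new_chunk is the
# freshly predicted chunk whose first len(old_tail) steps overlap it.

def blend_chunks(old_tail: np.ndarray, new_chunk: np.ndarray) -> np.ndarray:
    k = old_tail.shape[0]
    w = np.linspace(0.0, 1.0, k)[:, None]   # 0 -> keep old, 1 -> trust new
    blended = (1.0 - w) * old_tail + w * new_chunk[:k]
    return np.concatenate([blended, new_chunk[k:]], axis=0)
```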
I studied computer engineering, so I'm not really an algorithms person, but when it comes to systems tricks like that, like pipelining, I'm all over it.
That sounds great.
That's so interesting.
I mean, it's a brilliant choice, because it simplifies so much of the system for the robots. You don't need all these clunky setups. People sometimes run two operating systems on embedded robots, an RTOS and then the regular one, with all this complex, giant compute and power.
And this is what the initial versions of Waymo used to run: basically a server in the trunk. And you can't afford to do that with general-purpose robotics, so it's brilliant that you figured out how to do it this way.
Yeah, you don't have to. There obviously has to be some compute there, but a lot of the compute can happen elsewhere.
And then, this thing that we're looking at in the top left, how much of that is video feedback? How much of it is processed locally? Is there any compute locally on this robot, or is it just like a dumb video camera that streams data to the cloud?
For this one, I'm not 100% sure, but I'm inclined to believe that it's just a dumb computer. For this specific video I don't remember, but I am 100% confident that we can make this work with a dumb computer on the robot.
And one other interesting thing about our collaboration with Weave and Ultra is, one, I've never seen their robots in person.
Oh wow.
Two is I have very little idea about how the robot actually works.
Interesting.
And that's a very intentional choice.
I want to stay away from that as far as possible.
I also don't know how they collect data.
I intentionally don't ask them, to see whether it's possible for an organization like Pi to parachute into their existing system and work really closely with them on the things that actually matter to get the system working, without having to learn how they've set up their system. Because in a way, that's a more scalable recipe.
Yeah, you completely decouple a lot of the hardware control loop choices from the semantics and planning, and it just works, which is brilliant.
Yeah, I mean, I'm really surprised that it works.
And when we started the company, we thought that real deployment would only enter the conversation maybe 5 years into the life of the company, because the problem is just really hard. 2 years in, this is the result that we have, and real deployment and scaling the number of robots is a really serious consideration today.
And so the pace of progress has been much faster than we expected originally, which is a very pleasant surprise.
Often on this podcast, we talk about what all this means for startup founders.
I think that might be an interesting question for us to explore here.
So if you imagine someone was listening to this podcast, maybe they're like a college student that's studying computer science and they think robots are really cool and they want to
do something like this, how should they get started and what are the skills that they need?
Do they need to be a mechanical engineer to be able to build a robot like this?
Can they just buy an off-the-shelf robot arm and camera system, load Pi, and be off and running?
Yeah.
Before I actually answer your question, let me provide a bit more context.
The first is that robotics is traditionally really hard because it's an extremely vertically integrated business.
You need to have your own customer relationship.
Your own hardware, your autonomy stack, your own safety certification, your own everything.
And the barrier to entry is just really high because of that.
And one of the things that we're trying to change is to provide a foundation of physical intelligence that the community can build on top of, one that allows them to onboard autonomy onto their robots and their tasks much quicker than before. So that's the first thing: we want to provide that seed of intelligence that allows people to move much faster, so they can focus on other problems.
The second thing is that
I think the recipe for starting a vertical robotics business today is, one, have a really good understanding of the existing workflow, because the robotic system needs to fit into that workflow, and be very meticulous about identifying where the opportunity is. If there's a workflow that needs X number of workers today, where will the robot make the biggest difference when you insert it? And two, be really scrappy when it comes to hardware and data collection.
You don't need an incredibly expensive robot capable of very precise motion to do these tasks today, because these models are really reactive and can compensate for some inaccuracy in the actual robot movement. What you do need is the ability to collect data and to run evaluation, especially evaluation in real deployment.
The next step after that is to get a mixed autonomy system that allows you to get to the point where it's break-even.
Like break-even economically.
Break-even economically.
Because the reason why that's important is because it allows you to then scale the number of robots.
Because if you lose money on every robot, it's very hard to scale.
That has historically been one of the biggest challenges for robotics companies as they go into the growth stage: the payback period just doesn't make sense.
Yeah.
So the equation, I think, for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore.
And now, you know, what is the upfront cost? The upfront cost is much cheaper hardware, the ability to collect data, the ability to run evaluation, and the ability to understand the use case well enough to see where you should insert the robot.
It's not about having incredibly expensive hardware.
It's not about having your own proprietary classical autonomy stack anymore to be able to do these tasks. And so it allows companies to focus on the components that will actually differentiate them from the rest of the space.
Now that you've sort of unbundled it and you no longer need to build this fully vertically integrated company in order to build a robotics company, are we on the precipice of a Cambrian
explosion of vertical robotics companies where there's going to be like 1,000 companies like Ultra going after, you know, every like menial job in the economy and like getting a deep
understanding of the customer, building a robot that can solve that problem, doing a like mixed human-machine deployment until it like can run fully autonomously and building a company
in every sector?
Is that the future that you see people building on top of Pi?
It's funny that you mentioned Cambrian explosion because when we wrote this blog post, there was that term that was very kind of like hotly debated.
We are, I think, academics at heart and we want to be kind of very measured when we communicate.
But, you know, myself personally, I believe there's going to be a Cambrian explosion of robotic companies across the entire world and across many, many different verticals.
Just because it's just so much cheaper to build and it doesn't require, you know, someone with 20 years of experience in robotics to start anymore.
You know, it requires someone who is really scrappy, who can move really quickly, do the system integration, and understand what customers want, to start the deployment.
I mean, what's coming up for me is obviously we work with a lot of robotics companies and meet a lot of founders and it feels like there's this continuum.
One is to use an analogy to compute, you know, personal computing.
You could argue that industrial robotics today is basically at the mainframe or minicomputer level. Like, if you look back at the '70s, huge public companies like Digital Equipment Corporation did these very, very expensive deployments, but they were very, very specialized and it was all extreme enterprise. The idea of a personal computer was ridiculous, right? It took the Altair and then the Apple I and Apple II and then the IBM PC XT to create personal computing.
And then like the traditional advice for robotics for many years is like, go after like dirty and dangerous.
And then of course those are sort of the industrial cases.
Like, you know, you have these giant Tesla robots in the Gigafactory and things like that.
It feels like what you said around profitability is really, really big.
So, you know, does that mean that the people who drive the vertical robot Cambrian explosion moment, the people who are first, would be the first to be profitable, rather than going after dirty and dangerous?
I think this is already happening today.
I think, um, we have the fortune of having lots of visibility into the robotic community because people would like to talk to us, people would like to learn what it's like to build
a foundation model for robotics, and people would like to know, how do I get the same level of autonomy?
And there are so many companies and businesses that we talk to that would love to put a robot into a space where it's okay for the robot to make a mistake, and they just need it so much.
I really believe that the recipe that I mentioned earlier, identify where the robot can fit in, focus on cheaper hardware, collect data, run evaluation, use mixed autonomy, break even, then scale robots, will work across many different verticals.
And I'm seeing it play out today and it's just incredibly exciting to see.
And this is pretty cool that you literally just gave people the playbook for how to build a vertical robotics company.
Like this is a playbook that could possibly be followed successfully hundreds or thousands of times.
The reason why I want to mention it is that I do want to see that Cambrian explosion, and we want to help enable it.
For Pi, if we talk about why Pi would fail, it's probably going to be because the problem is just way too hard. Maybe it takes 50 more years to solve the robotics problem, not a couple of years, or 5 or 10.
And so we want to enable the community.
We want to accelerate progress.
And that's why we're very open.
We publish our research, and we open-sourced Pi Zero and Pi 0.5.
And people are also shocked when they ask me: is there any difference between the Pi Zero and Pi 0.5 that you open-sourced and the models that you use internally? The answer is actually no, it's the same model. The pre-trained model weights that we open-sourced are also the pre-trained model weights that our researchers use internally for Pi Zero and Pi 0.5.
And so we really want to help accelerate progress in the community.
And to create that Cambrian explosion.
Yeah, that's very inspiring.
I mean, I feel like everyone's spending a lot of time in the digital world, and it feels like now is the time to start thinking about the world of atoms. And this is sort of the perfect mix: how do you take electrons and turn them into abundance in the world of atoms?
And I think about Dario Amodei's essay, Machines of Loving Grace. When you really think about the perfect manifestation of that, it's not perfect agents that look over you just in the electronic world. It's actually something a little more akin to what we're seeing here.
Yeah.
And this has really been our mission from the start: to create that Cambrian explosion. And, you know, this is why we chose to focus on the model, because we believe that is the bottleneck to really making robots useful across many different tasks in the world.
And that's why we also focus on cross-embodiment.
Success for us is not defined only as our model on our robot performing useful tasks. The surface area for success is actually much larger: our model performing really useful tasks on somebody else's robot out there, maybe a robot we don't even know about, in a way that's useful to the end consumer.
Could we maybe talk a little bit about the humans behind the robots here?
How did the company get started?
Who are your co-founders?
How did you all get together, and what skills does each of you bring to such a complex problem?
Sometimes the joke I make here is that the humans behind the robots are also robots.
Not really.
Yeah, so Pi is, I would say, a very untraditional company. We have a larger-than-average founding team, and some of us worked really closely together on the robotics team at Google. And the robotics team at Google was, I think, a really, really great environment for seeing the signs of life and creating the relationships in the community that allow the robotics community and these advances to flourish.
There is Lachy, whom we met when we were thinking about starting the company, and who has just been really instrumental in making sure that we're a good business.
And there is Adnan, our hardware lead, who came over from Andrew.
And Adnan has a really difficult job, because if you want to work on cross-embodiment, remember my joke about how you add 2 years to your grad school every time you bring on one more robot.
The hardware problem and the operational problem for us is: how do we build, improve, and scale a fleet of heterogeneous robots? It's not just one robot platform. And because we built the organization from scratch to support that, I think we're able to do it, but it's a really hard problem, because no two robots in the fleet are the same. How do you make sure that everything runs smoothly?
We're really good at divide and conquer, if you ask me.
But so how many co-founders are there in total?
We have Brian, we have Chelsea, Sergey, myself, Lachy, and Adnan.
Is it just necessary to have that many co-founders to solve a problem as big as this?
Or was it that you were already sort of a unit, you'd already worked together, and whatever you started, you would all have wanted to work on together?
One common question that we get is, you know, why band together? And the first is that we really enjoy each other's company. We spend a lot of time at work, and in some sense it gives meaning to life. And so we really want to enjoy the relationships we have at work.
And the second is that, you know, any one of us could have started a company and been successful. But the problem is just so incredibly hard, and the chances of success are just so much higher when we band together and divide and conquer the problems. And that's, I think, one of the main reasons why the progress has been much faster than we expected.
What were the differences between working in academia or at a big company like Google, versus now at a startup? Because this is the first time doing a startup for a lot of you, right?
Yeah, this is the first time for a lot of us.
One of the really surprising things that we learned when we started the company is that the infrastructure for supporting large-scale general-purpose robotics is just not there.
And this starts from the software itself.
How do you collect data?
What device do you use to collect data?
How do you manage the data?
How do you annotate the data?
How do you get visibility into the data?
How do you run evaluation?
How do you build operational process?
There weren't companies that offered these kinds of services, which is very different from software. We were really surprised to find that out, and we ended up writing a lot of the software at Pi ourselves.
But I think this is another area of incredible opportunity: building services for robot companies. If you can offer remote teleop, for example, or data collection, or annotation services, because these are functions that don't need to be repeated from one company to the next. So I think there's lots of opportunity to build support for growing robotics businesses.
So that's one thing, like one surprising thing that I learned.
And the second is I think one of the reasons why we have managed to achieve such progress is that there is a really tight loop of collaboration in the entire lifecycle of model development,
going from what task do you collect data for?
You collect data for the task, how do you do it?
What hardware do you use?
After you collect the data, how do you get visibility?
How do you ensure data quality?
How do you then make sure that you can easily train on that data?
After you train on that, how do you run evaluation?
Evaluation is a really hard problem in robotics because it scales superlinearly with model capability. Let's say you have a model that can perform a 2-minute task. Running evaluation for that is very different from running evaluation for a 20-minute task. It's not 10 times harder; it's more than 10 times harder. And after you run evaluation, how do you distill the learning from that evaluation to know how to improve the model further?
One of the side projects I would really love to take on is to build an automated robotics research scientist, because this is really one of the bottlenecks we have today: it requires a really difficult skill set, with intuition about the entire stack.
So I would love it if there were a model that could ingest multimodal data such as this, analyze failure modes (understanding, oh, is the robot performing this way because of the data that was collected, or the way it was annotated, or the way we trained the model), and then suggest ideas and actually try them to figure out whether those hypotheses are correct.
So that's something that I would love to have, and it would dramatically unlock us.
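As a very rough sketch of a first step toward that, here is a hypothetical loop that turns evaluation failures into candidate experiments. Every interface in it (the episode fields, `query_llm`, the experiment queue) is invented for illustration.

```python
# Hypothetical sketch of an automated failure-analysis loop: summarize
# each failed evaluation episode, ask a model for a testable hypothesis,
# and queue it as a candidate experiment. All interfaces are invented.

def analyze_failures(query_llm, episodes, experiment_queue):
    for ep in episodes:
        if ep.succeeded:
            continue
        report = (
            f"Task: {ep.task}\n"
            f"Failure, described precisely in text: {ep.failure_text}\n"
            "Possible causes: data collection, annotation, or training recipe.\n"
            "Propose one testable hypothesis and an experiment to check it."
        )
        # As noted below, this works today only when the failure can be
        # described precisely in text; a human still reviews the output.
        experiment_queue.submit(query_llm(report))
```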
So sometimes I make the joke in the company that we should record all of the meetings and then train a model to just make predictions about what the next set of experiments should be.
Oh, you could, you totally could.
What if it's OpenClaw and Obsidian and Markdown files and like, you know, a brain.md with like ontology that's custom to your use case.
And what if it's 100 OpenClaws in the background that you orchestrate?
I think there's two sides to this.
The first is that we already see a little bit of a sign of life: for simple failure modes during evaluation, if you can describe the way the robot failed in text very precisely and very clearly, then you can ask the language model to make very reasonable recommendations about what the next step is.
But the flip side is that this only works for simple cases today.
And the reason why that's the case is, I think, a pretty fundamental limitation of the models we have today: at their core, they are not models that take action in the world and see the consequences of their own actions, especially actions that change the physical world. And so I think this very fundamental understanding of how the physical world works is missing from the really large foundation models.
And
I think that's one of the ingredients that's missing to be able to build this automated robot research scientist.
What's interesting about OpenClaw is, I mean, basically it can go and just do things, which is interesting. And then at that point, it's on the research lab to provide, like, you know, CLI or MCP endpoints to the things that might control robots or reconfigure rooms.
Or, I mean, I think Karpathy feels like he's, he's starting to talk a bunch about this where, you know, if you mix auto research plus what he's been talking about with markdown files,
like, it might just happen in the open.
Like, you know, there's this sort of sense that you have to make something much, much more complicated to make it work.
But what if that's just wrong?
What if we just have Markdown files and agents and, you know, you could make it yourself with, you know, literally Claude code and MCP today?
What if it's not an algorithm problem?
It's just literally an integration challenge.
We have a version of this internally that I use a lot.
There was a point when I was spending an embarrassingly large amount of money on API queries. And my team was like, Quan, what are you doing?
Oh, I'm that guy at Y Combinator right now.
So to give you an example, we have a Claude skill that essentially serves the role of a pre-training on-call today. We have these pre-training runs that are really large, and it's a difficult exercise to keep them alive, to keep them churning, just because there are so many things that can go wrong. And we have a prototype pre-training on-call that babysits the run and has permission to take action to remedy errors that it sees.
And one of the surprising outcomes of that exercise is that it led to about a 50% improvement in overall compute utilization for that large pre-training run, which is huge for us.
And you know, this is just a small, simple prototype that I built.
And I think like there's a lot more to be done.
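For flavor, here is a minimal sketch of the shape such a training-run watchdog could take. The health checks, the remedy whitelist, and the `diagnose` hook are placeholder assumptions, not the actual Claude skill described above.

```python
import time

# Minimal sketch of a "pre-training on-call" watchdog: poll run health,
# let a diagnoser (rules or a model reading the logs) pick a remedy, and
# only ever act within a whitelist of safe, audited operations.

SAFE_REMEDIES = {"restart_from_checkpoint", "skip_corrupt_shard", "page_human"}

def oncall_loop(run, diagnose, poll_s=60):
    while not run.finished():
        status = run.health_check()        # loss spikes, stalled hosts, ...
        if status.ok:
            time.sleep(poll_s)
            continue
        remedy = diagnose(status.report()) # e.g. an LLM given the run logs
        if remedy not in SAFE_REMEDIES:
            remedy = "page_human"          # never improvise outside the whitelist
        run.apply(remedy)                  # permissioned, logged action
```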
Quan, this is incredible.
Thank you so much for everything.
Thank you for making Physical Intelligence.
Thank you for showing us these incredible demos.
And honestly, the thing that gives me the most hope is this idea that there's an entity, a research lab out there, that is focused on giving this to the world, about to create this Cambrian explosion of robotics startups. Someone watching right now will be inspired by this, start playing with your models, and maybe create a robot that touches billions of people's lives for the good.
Thank you for having me.
Been a pleasure.
To the listener, the one takeaway that I want you to have is that robotics has changed a lot: the cost of building in robotics has decreased and, I think, will continue to dramatically decrease, and it also requires a very different, scrappy skill set that young startups need.
We hope to enable really an explosion of many, many, many different robotic use cases.
And, you know, always reach out to us if you want to collaborate.
Thanks, man.
Thanks so much.
Thank you.
Thank you.