Jensen Huang – Will Nvidia’s moat persist?

Podcast: Dwarkesh Patel

01:43:11 · Published April 24, 2026

Transcribed from audio to text by EasyScribe

Episode Description

I asked Jensen about TPU competition, Nvidia’s lock on the ever more bottlenecked supply chain needed to make advanced chips, whether we should be selling AI chips to China, why Nvidia doesn’t just become a hyperscaler, how it makes its investments, and much more. Enjoy!

Transcript

00:00:00

We've seen the valuations of a bunch of software companies crash because people are expecting AI to commoditize software.

00:00:06

And there's a potentially naive way of thinking about things, which is like, look, NVIDIA sends a GDSII file to TSMC.

00:00:14

TSMC builds the logic dies, it builds the switches, then it packages them with the HBM that SK Hynix and Micron and Samsung make.

00:00:22

Then it sends it to an ODM in Taiwan where they assemble the racks.

00:00:25

And so NVIDIA is fundamentally making software that other people are manufacturing.

00:00:28

And if software gets commoditized, does Nvidia get commoditized?

00:00:32

Well, in the end, something has to transform electrons to tokens.

00:00:38

The transformation of electrons to tokens, and making those tokens more valuable over time, I think that's hard

00:00:54

to completely commoditize.

00:00:59

The transformation from electrons to tokens is such an incredible journey.

00:01:05

And making that token, you know, it's like making one molecule more valuable than another molecule, making one token more valuable than another.

00:01:13

The amount of artistry, engineering, science, invention that goes into making that token valuable,

00:01:21

Obviously we're watching it happening in real time.

00:01:24

And so the transformation, the manufacturing, all of the science that goes in there is far from deeply understood, and the journey is far from over.

00:01:37

And so I doubt that it will happen.

00:01:40

We're gonna make it more efficient, of course.

00:01:42

I mean, the whole thing about NVIDIA, in fact, the way that you framed the question is my mental model of our company.

00:01:50

The input is electrons, the output is tokens.

00:01:55

In the middle is NVIDIA.

00:01:58

And our job is to do as much as necessary, and as little as possible, to enable that transformation to be done at incredible capabilities.

00:02:08

And what I mean by as little as possible: whatever I don't need to do,

00:02:13

I partner with somebody and I make it part of my ecosystem to do.

00:02:16

And if you look at NVIDIA today, we probably have the largest ecosystem of partners, both in supply chain upstream, supply chain downstream, all of the computers, the computer companies,

00:02:26

and all the application developers and all the model makers. You know, AI is a 5-layer cake, if you will.

00:02:34

And we have ecosystems across the entire 5 layers.

00:02:37

And so we try to do as little as possible, but the part that we have to do, as it turns out, is insanely hard.

00:02:44

And I don't think that that gets commoditized.

00:02:48

In fact,

00:02:50

I also don't think that the enterprise software companies, the tools makers, you know, most of the software companies today are tools makers.

00:03:00

Some of them are not, but some of them are workflow codification, you know, systems.

00:03:09

But for a lot of companies, they're tool makers.

00:03:11

For example, Excel's a tool, PowerPoint's a tool, Cadence makes tools, Synopsys makes tools.

00:03:18

I actually see the opposite of what people see.

00:03:22

I think the number of agents is going to grow exponentially.

00:03:26

The number of tool users is gonna grow exponentially.

00:03:30

And it's very likely that the number of instances of

00:03:37

all these tools are gonna skyrocket.

00:03:39

It is very likely the number of instances of Synopsys Design Compiler is gonna skyrocket.

00:03:46

And the number of agents that are gonna be using the floor planners and all of our layout tools and our design rule checkers, the number of agents that are— today we're limited by the

00:03:59

number of engineers.

00:04:00

Tomorrow, those engineers are gonna be supported by a bunch of agents and we're gonna be exploring out the design space like you've never seen explored before.

00:04:07

And we're gonna use the tools that we use today.

00:04:08

And so I think tool use is gonna cause these software companies to skyrocket.

00:04:14

The reason why it hasn't happened yet is because the agents aren't good enough at using their tools yet.

00:04:19

And so either these companies are gonna build the agents themselves or agents are gonna get good enough to be able to use those tools.

00:04:26

And I think it's gonna be a combination of both.

00:04:29

I think in your latest filings you had almost $100 billion in purchase commitments with foundries, memory, packaging.

00:04:37

And then SemiAnalysis has reported that you will have $250 billion of these kinds of purchase commitments.

00:04:44

And so one interpretation is Nvidia's moat is really that you've locked up many years of these scarce components. You know, somebody else might have an accelerator, but can

00:04:54

they actually get the memory to build it?

00:04:55

Can they actually get the logic to build it?

00:04:57

And this is really Nvidia's big moat for the next few years.

00:05:01

Well, it's one of the things that we can do that is hard for someone else to do.

00:05:05

We've made enormous commitments upstream.

00:05:10

Some of it is explicit, these commitments that you mentioned.

00:05:14

Some of it is implicit.

00:05:16

For example, a lot of the investments that are upstream are made by our supply chain because I said to the CEOs, let me tell you how big this industry is going to be, and let me explain

00:05:27

to you why.

00:05:28

And let me reason through it with you and let me show you what I see.

00:05:33

And so as a result of that process of informing, inspiring, aligning with CEOs of all different industries upstream, they're willing to make the investments.

00:05:48

Now, why are they willing to make the investments for me and not someone else?

00:05:51

And the reason for that is because they know that I have the capacity to buy their supply and sell it through my downstream.

00:06:00

Because NVIDIA's downstream supply chain and our downstream demand are so large, they're willing to make the investment upstream.

00:06:10

And so if you look at GTC

00:06:14

and, you know, people marvel at the scale of GTC and the people that go. It's 360 degrees, the entire universe

00:06:22

of AI all in one place.

00:06:24

And they're all in one place because they need to see each other.

00:06:28

I bring them together so that the downstream could see the upstream, the upstream could see the downstream, and all of them could see all the advances in AI.

00:06:36

And very importantly, they can all meet the AI natives and all the AI startups that are all, you know, being built and all the amazing things that are happening so that they could see

00:06:45

firsthand all the things that I tell them.

00:06:47

And so I spend a lot of my time informing, directly or indirectly, our supply chain and our partners and our ecosystem about the opportunity that's in front of us.

00:06:58

You know, most of my keynotes, you know, some people always say, you know, Jensen,

00:07:05

in most keynotes it's like one announcement after another announcement after another announcement after another announcement.

00:07:12

Our keynotes are,

00:07:14

there's always a part of it that's a little torturous in the sense that it almost comes across like education.

00:07:21

And in fact, that's exactly on my mind.

00:07:24

I need to make sure that the entire supply chain upstream and downstream, the ecosystem understands what is coming at us, why it's coming, when it's coming, how big is it gonna be,

00:07:37

and be able to reason about it systematically, just like I reason about it.

00:07:43

And so I think the

00:07:46

moat as you describe it, we're able to, of course,

00:07:50

build for a future.

00:07:53

If our next several years is $1 trillion in scale, we have the supply chain to do it.

00:07:59

Without our reach, the velocity of our business, you know, just as there's cash flow, there's supply chain flow, there are turns.

00:08:10

Nobody's going to build a supply chain for an architecture if the architecture's business turns are low.

00:08:16

And so our ability to sustain the scale is only because our downstream demand is so great and they see it and they all hear about it.

00:08:25

They see it all coming.

00:08:26

And so that allows us to do the things that we're able to do at the scale we're able to do them.

00:08:32

I do want to understand more concretely whether the upstream can keep up.

00:08:37

For many years now, you guys have been 2x-ing revenue year over year.

00:08:41

You guys have been more than tripling the amount of flops you're providing to the world year over year.

00:08:44

And 2x-ing at the scale now, it's really incredible.

00:08:47

Exactly.

00:08:47

Yeah.

00:08:48

So then you look at logic: you're the biggest customer on TSMC's N3 node and you're one of the biggest on N2.

00:08:57

AI as a whole this year is going to be 60% of N3.

00:09:00

It's going to be 86% next year according to SemiAnalysis.

00:09:03

How do you 2x if you're the majority?

00:09:07

And how do you do that year over year?

00:09:08

So are we in a regime now where the growth rate in AI compute has to slow because of upstream?

00:09:14

Do you see a way to get around these?

00:09:17

You know, how do we build 2x more fabs year over year ultimately?
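The tension in this question can be made concrete with a little arithmetic. The sketch below is an illustration, not a figure from the conversation. It assumes the quoted shares (60% this year, 86% next year) are shares of N3 wafer capacity, and it assumes, hypothetically, that AI wafer demand on the node exactly doubles. The function name and the doubling factor are illustrative choices.

```python
def implied_capacity_growth(share_now: float, share_next: float, ai_growth: float) -> float:
    """Return how much total node capacity must grow, as a multiple of today's capacity.

    share_now  -- AI's share of node capacity this year (0.60 in the question)
    share_next -- AI's share of node capacity next year (0.86 in the question)
    ai_growth  -- assumed multiple by which AI wafer demand grows (hypothetical)
    """
    ai_wafers_next = share_now * ai_growth   # AI wafers next year, in units of today's capacity
    return ai_wafers_next / share_next       # total capacity needed so AI is share_next of it


def max_ai_growth_without_new_capacity(share_now: float) -> float:
    """Upper bound on AI growth if capacity stays flat and AI takes 100% of the node."""
    return 1.0 / share_now


growth = implied_capacity_growth(0.60, 0.86, 2.0)
ceiling = max_ai_growth_without_new_capacity(0.60)
print(f"Total capacity multiple needed: {growth:.2f}x")
print(f"AI growth ceiling with flat capacity: {ceiling:.2f}x")
```

Under these assumptions, doubling AI output while moving from a 60% to an 86% share would require total node capacity to grow by roughly 40%, and with flat capacity AI could grow by at most about 1.67x. That gap is why the question turns to how quickly fabs can be added.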

00:09:21

Yeah, at some level,

00:09:25

the instantaneous demand

00:09:29

is greater than the supply

00:09:32

upstream and downstream in the world.

00:09:36

And at any instant, we could be limited by the number of plumbers, which actually happens.

00:09:46

The plumbers are invited to next year's GTC.

00:09:48

Yeah.

00:09:49

You know, by the way, great idea.

00:09:51

But that's a good condition.

00:09:53

You want a market, you want an industry where the instantaneous demand is greater than the total supply of the industry.

00:10:03

The opposite is obviously less good.

00:10:05

If we're too far apart, if one particular item, one particular component is too far away,

00:10:13

obviously the industry swarms it.

00:10:15

So for example,

00:10:17

notice people aren't talking very much about CoWoS anymore.

00:10:20

Yeah.

00:10:20

And the reason for that is because for 2 years we swarmed the living daylights out of it and we doubled, doubled, doubled, several doubles.

00:10:27

And now I think we're in fairly good shape.

00:10:30

And TSMC now knows that CoWoS supply has to keep up with the rest of the logic demand and the memory demand.

00:10:37

And so they're scaling CoWoS and they're scaling, you know, future packaging technologies at the same level as they scale logic, which is terrific.

00:10:47

Because for a long time, CoWoS was rather specialty and HBM memory was rather specialty, but they're not specialties anymore.

00:10:55

People now realize they're mainstream computing technology.

00:11:00

And then, of course, we're now much more able to influence a larger scope of our supply chain.

00:11:09

In the past,

00:11:12

you know, in the beginning of the AI revolution, all the things that I say now, I was saying 5 years ago.

00:11:18

And some people believed in it and invested in it.

00:11:20

For example, Sanjay and the Micron team, I still remember the meeting really well where I was clear about exactly what's gonna happen and why it's gonna happen and the predictions,

00:11:32

the predictions of today.

00:11:35

And they really doubled down on it and we partnered with them and across LPDDR, across, you know, HBM memories, they really invested in it and it obviously has been tremendous for the

00:11:48

company.

00:11:49

Some people came a little bit later, but now they're all here.

00:11:54

And so I think each

00:11:57

one of these bottlenecks gets a great deal of attention.

00:12:02

And now we're prefetching the bottlenecks years in advance.

00:12:06

So for example, the investments that we've done with Lumentum and Coherent and all of the silicon photonics ecosystem.

00:12:17

The last several years, we really reshaped the ecosystem and the supply chain of silicon photonics.

00:12:23

We built up an entire supply chain around TSMC.

00:12:27

We partnered with them on COUPE, invented a whole bunch of technology.

00:12:31

We licensed those patents to the supply chain, keep it nice and open.

00:12:36

And so we're preparing the supply chain through invention of new technologies, new workflows,

00:12:42

new testing equipment, double-sided probing, investing in companies, helping them scale up their capacity.

00:12:50

And so you could see that we're trying to shape the ecosystem, the supply chain, so that it's ready to support the scale.

00:12:57

It seems like some bottlenecks are easier than others.

00:12:59

And so scaling up CoWoS versus scaling up... I went to the hardest one, by the way, which is plumbers.

00:13:06

Yeah, that's true.

00:13:07

Yeah, yeah, I actually went to the hardest one.

00:13:10

Yeah, yeah, plumbers and electricians.

00:13:11

And the reason for that is because, and this is one of the concerns that I have about all the doomers

00:13:19

describing the end of work and killing of jobs.

00:13:22

And one of the things is that if we discourage people from being software engineers,

00:13:29

we're gonna run out of software engineers.

00:13:32

And it's the same prediction as 10 years ago: some of the doomers were telling people, whatever you do, don't be a radiologist.

00:13:43

And some of those videos are still on the web.

00:13:46

You know, radiology is going to be the first career to go.

00:13:49

Nobody's— the world's not going to need any more radiologists.

00:13:51

Guess what we're short of?

00:13:52

Radiologists.

00:13:54

Oh, but okay.

00:13:54

So going back to this point about, well, some things you scale, other things like how do you actually get— how do you actually manufacture 2x the amount of logic a year?

00:14:03

Ultimately, memory and logic are bottlenecked by EUV.

00:14:07

How do you get to 2x as many EUV machines a year?

00:14:09

Yeah.

00:14:10

Year over year.

00:14:10

None of that's impossible to scale quickly.

00:14:14

All of that is easy to do within 2 or 3 years.

00:14:19

You just need a demand signal.

00:14:20

Once you can build 1, you can build 10.

00:14:25

And once you can build 10, you can build a million.

00:14:27

And so these things are not hard to replicate.

00:14:30

How far down the supply chain do you go?

00:14:32

Do you go to

00:14:33

ASML and say, hey, if I look out 3 years from now, for NVIDIA to be generating $2 trillion a year in revenue, we need way more EUV machines and... Some of them I have to directly, some

00:14:44

of them indirectly, and some of them, if I can convince TSMC, ASML will be convinced.

00:14:50

And so that's, you know, we have to think about the critical pinch points.

00:14:55

But if TSMC is convinced,

00:14:58

you'll have plenty of EUV machines in a few years.

00:15:02

And so my point is that none of the bottlenecks last longer than 2 or 3 years, none of them.

00:15:09

And meanwhile we're improving computing efficiency by 10x, 20x, in the case of Hopper to Blackwell, some 30, 50x.

00:15:19

We're coming up with new algorithms because CUDA is so flexible.

00:15:24

We're developing all kinds of new techniques

00:15:27

so that we drive efficiency in addition to increasing capacity.

00:15:31

And so none of that worries me.

00:15:36

It's the stuff that's downstream from us,

00:15:40

energy policies that prevent energy from growing. You know, you can't create an industry without energy.

00:15:47

You can't create a whole new manufacturing industry without energy.

00:15:51

We want to reindustrialize the United States.

00:15:53

We want to bring back chip manufacturing and computer manufacturing and packaging.

00:15:57

And we want to build new things like EVs and robots and we want to build AI factories.

00:16:02

And you can't build any of these things without energy.

00:16:05

And those things take a long time.

00:16:08

But more chip capacity, that's a 2-3 year problem.

00:16:11

More CoWoS capacity, 2-3 year problem.

00:16:13

Interesting.

00:16:14

I feel like I have guests tell me the exact opposite thing sometimes.

00:16:17

And I don't— in this case, I just don't have the technical knowledge to adjudicate.

00:16:20

But well, the beautiful thing is you're talking to the expert.

00:16:22

Yeah, true, true.

00:16:24

Okay, I want to ask about your competitors.

00:16:28

Yeah.

00:16:28

So if you look at TPU, arguably 2 out of the top 3 models in the world, Claude and Gemini, were trained on TPU.

00:16:39

What does that mean for NVIDIA going forward?

00:16:42

Well, we have a very different, we build a very different thing.

00:16:47

You know, what NVIDIA built is accelerated computing, not a tensor processing unit.

00:16:55

And accelerated computing is used for all kinds of things, you know, molecular dynamics and quantum chromodynamics.

00:17:01

And it's used for data processing,

00:17:05

data frames, structured data, unstructured data.

00:17:09

It's used for

00:17:11

fluid dynamics, particle physics, you know, and in addition, we use it for AI.

00:17:17

And so accelerated computing is much more diverse.

00:17:22

And although AI is the conversation today, and is obviously very important and impactful,

00:17:28

computing is much broader than that.

00:17:30

And what NVIDIA has done is reinvented the way computing is done from general-purpose computing to accelerated computing.

00:17:38

Our market reach is

00:17:41

far greater than any TPU, any ASIC can possibly have.

00:17:46

And so if you look at our position,

00:17:50

we're the only company that accelerates applications of all kinds.

00:17:54

We have a gigantic ecosystem.

00:17:57

And so all kinds of frameworks and algorithms all run on NVIDIA.

00:18:01

And because our computers are designed to be operated by other people,

00:18:08

anyone who's an operator could buy our systems.

00:18:13

With most of these home-built systems, you have to be your own operator because they were never designed to be flexible enough for other people to operate.

00:18:22

And so as a result of the fact that anybody can operate our systems, we're in every cloud, including Google and Amazon and Azure and OCI, right?

00:18:32

And so if you want to operate it to rent, you better have a large ecosystem of customers in many industries to be the off-takers.

00:18:43

If you're operating it,

00:18:46

if you want to operate it for yourself, we

00:18:50

obviously have the ability to help you operate it yourself.

00:18:52

Like for example, for Elon with xAI.

00:18:55

And because we could enable operators in any company, in any industry,

00:19:03

you could use it to build a supercomputer for scientific research and drug discovery at Lilly.

00:19:10

And so we can help them operate their own supercomputer and use it for the entire diversity of drug discovery and biological sciences that we accelerate.

00:19:20

And so there are just, you know, a whole bunch of applications that we can address that you can't address with TPUs.

00:19:28

NVIDIA has built CUDA to be a fantastic tensor processing unit as well.

00:19:34

But it does, you know, the entire lifecycle of data processing and computing and AI and so on and so forth.

00:19:41

And so our market opportunity is just a lot larger.

00:19:45

Our reach is a lot greater.

00:19:47

And because we have such a large,

00:19:51

we basically support every application in the world now.

00:19:54

You could build NVIDIA systems anywhere and know that there will be customers for it.

00:19:58

And so it's a very different thing.

00:19:59

This is going to be sort of a long question, but, you know, you have spectacular revenue, and you're not making $60 billion a quarter from pharma and

00:20:09

quantum.

00:20:10

You're making it because AI is an unprecedented technology that is growing unprecedentedly fast.

00:20:16

And so then the question is, what is best for AI specifically?

00:20:18

And I'm not in the details, but I talked to my AI researcher friends and they say, look, when I use a TPU, it's this big systolic array that's perfect for doing matrix multiplies, whereas

00:20:27

a GPU is very flexible.

00:20:29

It's great when you have lots of branching, when you have

00:20:33

irregular memory access.

00:20:34

But with these, you know, what is AI?

00:20:37

Just like these very predictable matrix multiplies again and again and again.

00:20:40

And you don't have to give up any die area for warp schedulers, for, you know, switches between threads and memory banks.

00:20:47

And so the TPU is really optimized for the majority, the bulk of this growth in revenue and use case for compute that is coming online right now.

00:20:56

Yeah, I wonder how you react to that.

00:21:01

Matrix multiplication is an important part of AI, but it's not the only part of AI.

00:21:07

And if you want to come up with a new attention mechanism, or if you want to disaggregate in a different way, if you want to come up with a whole new type of architecture altogether,

00:21:18

for example, you know, a hybrid SSM.

00:21:22

If you want to create a model that fuses diffusion and autoregressive somehow, you want an architecture that's just generally programmable.

00:21:36

And we run everything you can imagine.

00:21:41

And so that's the advantage.

00:21:42

It allows for invention of new algorithms a lot more easily.

00:21:49

And so it's a programmable system, and the ability to invent new algorithms is really what makes AI advance so quickly.

00:21:58

You know,

00:22:00

TPUs, like anything else, are impacted by Moore's Law.

00:22:04

And we know that Moore's Law is increasing about 25% per year.

00:22:08

And so the only way to really get 10x leaps, 100x leaps

00:22:15

is to fundamentally change the algorithm and how it's computed every single year.

00:22:22

And that's NVIDIA's fundamental advantage.

00:22:25

The only reason why we were able to make Blackwell 50 times Hopper... you know, when I first announced it, I said Blackwell was gonna be 35

00:22:36

times more energy efficient than Hopper.

00:22:39

Nobody believed it.

00:22:41

And then Dylan wrote an article, he said, in fact, I sandbagged, it's actually 50 times.

00:22:48

And you can't reasonably do that with just Moore's Law.
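The compounding behind this claim can be checked with a short calculation. The sketch below takes the 25%-per-year figure quoted above at face value and asks how many years of that rate alone would be needed to reach a given multiple. The function name is illustrative, and steady compounding is an assumption.

```python
import math


def years_to_reach(factor: float, annual_gain: float = 0.25) -> float:
    """Years of steady compounding at `annual_gain` needed to reach `factor`.

    Solves (1 + annual_gain) ** years == factor for years.
    """
    return math.log(factor) / math.log(1.0 + annual_gain)


for target in (10, 35, 50, 100):
    print(f"{target:>3}x at 25%/year takes about {years_to_reach(target):.1f} years")
```

At that rate a 10x gain takes roughly a decade and a 50x gain takes well over fifteen years, which is why a generation-over-generation jump of that size has to come from changes to algorithms and system design rather than from process scaling alone.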

00:22:52

And so the way that we solve that problem is new models, MoEs,

00:23:00

parallelized and disaggregated and distributed

00:23:05

across a computing system. And without the ability to really get down and come up with new kernels with CUDA, it's really hard to do.

00:23:15

And so the combination of the programmability of our architecture,

00:23:22

the fact that NVIDIA is an extreme co-design company where we could even offload some of the computation into the fabric itself, NVLink, for example, into the network,

00:23:33

Spectrum-X,

00:23:36

and that we could effect change across the processors, the system, the fabric, the libraries, the algorithm.

00:23:47

All of that was done simultaneously.

00:23:49

Without CUDA to do that, I wouldn't even know where to start.

00:23:53

My sponsor, Crusoe, was among the first clouds to offer NVIDIA's Blackwell and Blackwell Ultra platforms.

00:23:58

And they just announced their NVIDIA Vera Rubin deployment scheduled for later this year.

00:24:02

But access to state-of-the-art hardware is only part of the story.

00:24:05

For example, most inference engines already do KV caching for a single user's forward passes, but Crusoe does it across users and GPUs.

00:24:12

So if 1,000 agents are running on the same system prompt, Crusoe only has to compute the KV cache once for it to become available to every single GPU in the cluster.

00:24:20

This is especially important as systems get more agentic and require much longer prefixes in order to use tools and access files.

00:24:27

In a recent benchmark, Crusoe was able to deliver up to 10 times faster time to first token and up to 5 times better throughput than vLLM.
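The idea of computing a shared prefix once and reusing it across requests can be sketched in a few lines. This is a toy illustration of prefix caching in general, not Crusoe's implementation: the class name, the hashing scheme, and the stand-in "KV state" are all hypothetical, and a real system would store key/value tensors and share them across GPUs.

```python
import hashlib
from typing import Dict, List, Tuple


class SharedPrefixCache:
    """Toy cross-request prefix cache keyed by a hash of the prefix tokens."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[int, ...]] = {}
        self.prefill_calls = 0  # counts how often the expensive step actually runs

    @staticmethod
    def _key(tokens: List[int]) -> str:
        # Identical token sequences map to the same cache key.
        return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()

    def _prefill(self, tokens: List[int]) -> Tuple[int, ...]:
        # Placeholder for the real forward pass that would produce K/V tensors.
        self.prefill_calls += 1
        return tuple(t * 2 for t in tokens)

    def get_kv(self, prefix_tokens: List[int]) -> Tuple[int, ...]:
        key = self._key(prefix_tokens)
        if key not in self._store:
            self._store[key] = self._prefill(prefix_tokens)
        return self._store[key]


system_prompt = [101, 7, 42, 9]  # hypothetical token ids shared by every agent
cache = SharedPrefixCache()
kvs = [cache.get_kv(system_prompt) for _ in range(1000)]
print(f"requests served: {len(kvs)}, prefill runs: {cache.prefill_calls}")
```

In this sketch all 1,000 requests share the same prefix, so the expensive prefill step runs once and every later request is a lookup. Only the part of each request that differs from the shared prefix would still need fresh computation.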

00:24:35

This is just one among many reasons that you should run your inference workload with Crusoe.

00:24:39

And if you need GPUs for training, you don't need to switch clouds.

00:24:42

Crusoe's got you covered there too.

00:24:43

Go to crusoe.ai/thorcache to learn more.

00:24:47

So this gets at an interesting question about Nvidia's clientele, where if 60% of your revenue is coming from these big 5 hyperscalers, you know, in a different era with different

00:25:01

customers, let's say it's professors who are running experiments and they are helped a bunch by— they need CUDA, they can't use another accelerator, they need to just run PyTorch with

00:25:11

CUDA and have everything optimized.

00:25:13

But if you've got these hyperscalers, they have the resources to write their own kernels.

00:25:17

In fact, they have to, to get that extra last 5% that they need for their specific architecture.

00:25:23

Anthropic, Google are mostly running their own accelerators or running TPUs

00:25:29

and Trainium.

00:25:30

But even OpenAI using GPUs has Triton, which they're like, we need our own kernels.

00:25:37

So they've gone down to CUDA C++; instead of using cuBLAS and NCCL and everything, they've got their own stack which compiles to other accelerators as well.

00:25:47

And so if most of your customers can and do make replacements for CUDA, to what extent is CUDA really the thing that is going to make frontier AI happen on NVIDIA?

00:25:59

CUDA is

00:26:02

a rich ecosystem.

00:26:03

And so if you want to build on any computer first, building on CUDA first is incredibly smart.

00:26:11

And because the ecosystem is so rich.

00:26:14

We support every framework.

00:26:16

If you want to create custom kernels, for example, we contribute enormously to Triton.

00:26:23

And so the backend of Triton has huge amounts of NVIDIA technology.

00:26:28

We're delighted to help every framework become as great as it can be.

00:26:32

And there's lots and lots of frameworks.

00:26:34

There's Triton, there's vLLM, there's SGLang, and then there's more,

00:26:38

right?

00:26:38

And now there's a whole bunch of new reinforcement learning frameworks coming out.

00:26:42

You know, you got veRL, you got NeMo RL, you got a whole bunch of new ones.

00:26:45

And now with post-training and reinforcement learning, that entire area is just exploding, right?

00:26:53

And so if you want to build on an architecture, building on CUDA makes the most sense because you know that the ecosystem is great.

00:27:00

You know that if something happens, it's more likely in your code and not in the mountain of code underneath.

00:27:06

You know, don't forget the amount of code that you're dealing with when you're building these systems.

00:27:11

When something doesn't work, was it you or was it the computer?

00:27:16

You would like it always to be you and to be able to trust the computer.

00:27:20

And obviously we still have lots and lots of bugs ourselves, but

00:27:25

our system is so well wrung out that you could at least build on top of the foundation.

00:27:31

So that's number one:

00:27:32

the richness of the ecosystem, the programmability of it, the capability of it.

00:27:36

The second thing is, if you were a developer and you were building anything at all, the single most important thing you want more than anything is install base.

00:27:45

You want the software that you run to run on a whole bunch of other computers.

00:27:49

You don't want to build software.

00:27:50

You're not building software just for yourself.

00:27:52

You're building software for your fleet or for everybody else's fleet because you're a framework builder.

00:27:57

And NVIDIA's CUDA ecosystem is ultimately its great treasure.

00:28:02

We are now, I don't know how many, several hundred million GPUs.

00:28:07

Every cloud has it.

00:28:09

Goes back to A10, A100, H100, H200, you know,

00:28:16

the L series, the P series.

00:28:18

I mean, there's a whole bunch of them and they're in all kinds of sizes and shapes.

00:28:24

And if you're a robotics company, you want that CUDA stack to actually run in the robot itself.

00:28:29

We're literally everywhere.

00:28:30

And so the install base says that once you develop the software, once you develop the model, it's going to be useful everywhere.

00:28:37

And so the install base is just too incredibly valuable.

00:28:41

And then lastly, the fact that we're in every single cloud makes us genuinely unique because, you know, you're an AI company and you're an AI developer.

00:28:50

You're not exactly sure which CSP you're going to partner with and where you would like to run it.

00:28:55

And we run it everywhere.

00:28:56

Including on-prem for you if you like.

00:28:58

And so I think that the richness of the ecosystem, the expansiveness of the install base and the versatility of where we are, that combination

00:29:13

makes CUDA invaluable.

00:29:16

That makes a lot of sense.

00:29:17

I guess the thing I'm curious about is

00:29:20

whether those advantages matter a lot to your main customers.

00:29:27

There are many people they might matter for, but what about the kind of customer who can actually build their own software stack, who makes up most of your revenue?

00:29:34

Especially if you go to a world where AI is getting especially good at the things which have tight verification loops where you can RL on them.

00:29:42

And then this question of how do you write a kernel that does attention or MLP the most efficiently across a scale-up, it's a very verifiable sort of feedback loop.

00:29:52

And so can all the hyperscalers write these custom kernels for themselves?

00:29:58

NVIDIA still has great price performance, so they might still prefer to use NVIDIA.

00:30:03

But then the question is, does it just become a question of who is offering the best specs, the best flops and memory and memory bandwidth for a given dollar, where historically NVIDIA

00:30:14

has just had and still has the best margins in all of AI across hardware and software, 70% plus, because of this CUDA moat.

00:30:22

And the question is, can you sustain those margins if most of your customers can actually afford to build their own stack instead of relying on the CUDA moat?

00:30:34

The number of engineers we have assigned to these AI labs is insane, working with them, optimizing their stack.

00:30:41

And the reason for that is because nobody knows our architecture better than we do.

00:30:46

And these architectures are not as general purpose as a CPU.

00:30:52

A CPU is kind of like a Cadillac, you know, it's a nice cruiser.

00:31:01

It never goes too fast.

00:31:03

Everybody drives it pretty well.

00:31:05

You know, it's got cruise control, you know, and everything's easy.

00:31:10

But in a lot of ways, NVIDIA's GPUs, our accelerators, are kind of like F1 racers.

00:31:16

And yeah, I could imagine everybody's able to drive it at 100 miles an hour, but it takes quite a bit of expertise to be able to push it to the limit.

00:31:26

And we use a ton of AI to create the kernels that we have.

00:31:32

And I'm pretty sure we're gonna still be needed for quite some time.

00:31:36

And so our expertise helps our AI lab partners get another 2x out of their stack easily, oftentimes.

00:31:48

It's not unusual that, by the time we're done optimizing their stack or a particular kernel, their model has sped up by 3x, 2x, 50%.

00:32:01

That's a huge number, especially when you're talking about the install base of the fleet that they have, of all the Hoppers and Blackwells that they have.

00:32:10

When you increase it by a factor of 2, that

00:32:13

doubles the revenues.

00:32:15

That directly translates to revenues.

00:32:17

NVIDIA's computing stack is the best performance per TCO in the world, bar none.

00:32:24

Nobody can demonstrate to me that any single platform in the world today has a better performance-to-TCO ratio.

00:32:33

Not one company.

00:32:34

And in fact, the benchmarks are out there.

00:32:38

Dylan's InferenceMAX, right, is sitting out there for everybody to use.

00:32:43

And not one, TPU won't come, Trainium won't come.

00:32:47

I encourage 'em to

00:32:50

use InferenceMAX and demonstrate their incredible inference cost.

00:32:56

It's really, really hard.

00:32:58

Nobody wants to show up.

00:33:00

MLPerf?

00:33:01

I would welcome Trainium to demonstrate their 40% that they claim all the time.

00:33:07

I would love to hear them demonstrate the cost advantage of TPUs.

00:33:12

It makes no sense in my mind.

00:33:14

It makes absolutely zero sense.

00:33:16

On first principles, it makes no sense.

00:33:18

And so I think

00:33:21

the reason why we're so successful is simply because our TCO is so great.

00:33:27

And there's a second thing: you say 60% of our customers are the top 5, but most of that business is external.

00:33:36

For example, most of NVIDIA in AWS is for external customers, not internal use.

00:33:43

Most of our customers at Azure, obviously all of our customers are external.

00:33:46

All of our customers at OCI are external, not internal use.

00:33:49

The reason why they favor us is because our reach is so great, we can bring them all of the great customers in the world.

00:33:58

They're all built on NVIDIA.

00:33:58

And the reason why all these companies are built on NVIDIA is because our reach and our versatility are so great.

00:34:05

And so I think the flywheel is really install base, the programmability of our architecture, the richness of our ecosystem, and the fact that there's so many AI companies in the world.

00:34:19

There's tens of thousands of them now.

00:34:21

And if you were one of those AI startups, what architecture would you choose?

00:34:26

You would choose an architecture that's most abundant.

00:34:29

We're the most abundant in the world.

00:34:31

The one that has the largest installed base.

00:34:33

We're the largest installed base, and one that has a rich ecosystem.

00:34:37

And so that's the flywheel.

00:34:39

That's the reason why. It's the combination of, one, our perf per dollar is so great

00:34:45

that they have the lowest-cost tokens.

00:34:49

Second, our perf per watt is the highest in the world.

00:34:53

And so if one of our partners built a 1 gigawatt data center, that 1 gigawatt data center better deliver the maximum amount of revenues and number of tokens,

00:35:07

which directly translates to revenues.

00:35:09

You want it to generate as many tokens as possible, maximize the revenues for that data center.

00:35:13

We are the highest tokens per watt architecture in the world.

00:35:17

And then lastly, if your goal is to rent the infrastructure, we have the most customers in the world.

00:35:22

And so that's the reason why the flywheel works.
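
The tokens-per-watt argument above can be sketched as simple arithmetic. This is only an illustration, assuming a power-limited site where all power goes to compute; the efficiency figures and token price are hypothetical placeholders, not numbers from the conversation. "Tokens per watt" is read here as tokens per second per watt, which is tokens per joule.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_revenue(power_watts, tokens_per_joule, usd_per_million_tokens):
    """Yearly token revenue for a site whose only limit is its power budget."""
    tokens = power_watts * tokens_per_joule * SECONDS_PER_YEAR
    return tokens / 1e6 * usd_per_million_tokens


site_power = 1e9  # 1 gigawatt, as in the example above

# Hypothetical architectures: A produces twice the tokens per joule of B.
rev_a = annual_revenue(site_power, tokens_per_joule=2.0, usd_per_million_tokens=1.0)
rev_b = annual_revenue(site_power, tokens_per_joule=1.0, usd_per_million_tokens=1.0)

print(f"A earns {rev_a / rev_b:.1f}x the revenue of B from the same gigawatt")
```

Under these assumptions revenue scales linearly with tokens per joule, so with power fixed, efficiency is the only lever on output.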

00:35:25

Interesting.

00:35:25

I guess the question comes down to what is the actual market structure here?

00:35:30

Because even if there's other companies, there could have been a world where there's tens of thousands of AI companies that have roughly equal share of compute.

00:35:39

But even through these 5 hyperscalers, really the people on Amazon using the compute are Anthropic, OpenAI, and these big foundation labs, who can themselves afford and have the ability

00:35:51

to make different accelerators work.

00:35:54

No, I think your premise is wrong.

00:35:58

Maybe.

00:35:59

Yeah, let me ask you a slightly different question.

00:36:01

Come back and make me correct your premise.

00:36:04

Okay, let me just ask you a different question, which is, okay, if everything you're saying— But still make sure to make me come back and fix, because it's just too important.

00:36:12

To AI.

00:36:13

It's too important to the future of science.

00:36:15

It's too important to the future of the industry.

00:36:18

That premise, the premise, look, let me just finish the question and then we can address it together.

00:36:25

So what do you think if

00:36:29

all these things about price performance and performance per watt, et cetera, are true, why do you think it is the case that, say, Anthropic just announced a couple

00:36:39

days ago that they have a multi-gigawatt deal with Broadcom and Google for TPUs for the majority of their compute?

00:36:45

Obviously for Google, TPUs are the majority of compute.

00:36:48

So if I look at these big AI companies, it seems like a lot of their compute— there was some point where it was all NVIDIA and now it's not.

00:36:56

And so I'm curious how to square—

00:37:00

if these things are true on paper, why are they going with other accelerators?

00:37:03

Yeah, Anthropic is a unique instance and not a trend.

00:37:09

Without Anthropic, why would there be any TPU growth at all?

00:37:14

It's 100% Anthropic.

00:37:16

Without Anthropic, why would there be any Trainium growth at all?

00:37:19

It's 100% Anthropic.

00:37:21

I think that's fairly well known and well understood.

00:37:24

It's not that there's an abundance of ASIC opportunities.

00:37:29

There's only one Anthropic.

00:37:31

But OpenAI has deals with AMD.

00:37:32

They're building their own Titan accelerator.

00:37:35

Yeah, but they're mostly, I think we could all acknowledge they're vastly NVIDIA and we're gonna still do a lot of work together.

00:37:42

Yeah.

00:37:43

And I'm not offended by other people using something else and trying things.

00:37:50

If they don't try these other things, how would they know how good ours is?

00:37:54

You know, and sometimes you gotta be reminded of it

00:37:57

and we have to continuously earn the position that we're in.

00:38:04

You know, there are always big claims and look at the number of ASICs that have been canceled.

00:38:10

Just because you're gonna build an ASIC, you still have to build something better than NVIDIA.

00:38:15

And it's not that easy building something better than NVIDIA.

00:38:17

It's not sensible actually.

00:38:19

You know, NVIDIA's gotta be missing something seriously, because of our scale, our velocity. We're the only company in the world that's cranking it out.

00:38:29

Every single year, big leaps every single year.

00:38:32

I guess their logic is that, hey, it doesn't need to be better.

00:38:34

It just needs to be not more than 70% worse because they're paying you 70% margins.

00:38:39

No, no, no.

00:38:40

Don't forget, even in ASICs, margins are really quite high.

00:38:44

NVIDIA's margin is 70%, let's say, but an ASIC margin is 65%.

00:38:50

What are you really saving?
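
A rough sketch of the arithmetic behind "What are you really saving?". It assumes both chips cost the same to build and that "margin" means gross margin, (price - cost) / price. Neither assumption is stated in the conversation, and the unit cost is an arbitrary placeholder.

```python
def price_for_margin(unit_cost: float, gross_margin: float) -> float:
    """Selling price that yields the given gross margin on unit_cost."""
    return unit_cost / (1.0 - gross_margin)


unit_cost = 100.0  # hypothetical build cost, arbitrary units

nvidia_price = price_for_margin(unit_cost, 0.70)  # the 70% margin quoted above
asic_price = price_for_margin(unit_cost, 0.65)    # the 65% margin quoted above

# Fraction of the purchase price saved by buying at the lower margin.
savings = 1.0 - asic_price / nvidia_price
print(f"Buyer saves {savings:.1%} of the purchase price")
```

Under those assumptions the buyer saves roughly a seventh of the purchase price, not 70%, which is the point being made here.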

00:38:51

Oh, you mean from Broadcom or something?

00:38:52

Yeah, sure.

00:38:54

You got to pay somebody.

00:38:55

Yeah.

00:38:56

And so I think the ASIC margins are incredibly good from what I can tell.

00:39:02

And they believe so too.

00:39:04

And so they're quite proud of their incredible ASIC margins.

00:39:09

And so you ask the question why.

00:39:12

A long time ago, we just didn't have the ability to do it.

00:39:17

And at

00:39:21

the time, I didn't deeply internalize how difficult it would be to build a foundation AI lab like OpenAI and Anthropic.

00:39:34

And the fact that they needed huge investments from the suppliers themselves.

00:39:41

We just weren't in a position to make the multi-billion dollar investment into Anthropic so that they could use our compute.

00:39:48

But Google and AWS were, and they put in huge investments in the beginning so that Anthropic in return used their compute.

00:39:58

We just weren't in a position to do so at the time. I would say my mistake is I didn't deeply internalize that they really had no other options,

00:40:12

that a VC would never put in $5, $10 billion of investment into an AI lab with the hopes of it turning out to be Anthropic.

00:40:21

And so that was my miss.

00:40:25

But even if I understood it, I don't think we would've been in a position to do that at the time.

00:40:29

But I'm not gonna make that same mistake again.

00:40:32

And I'm delighted to invest in OpenAI

00:40:37

and I'm delighted to help them scale.

00:40:41

And I believe it's essential to do so.

00:40:43

And then when I was able to, when Anthropic came to us, I was delighted to be an investor, delighted to help them scale.

00:40:53

But we just weren't able to do so at the time.

00:40:58

If I could rewind everything, and NVIDIA could have been as big back then as we are now,

00:41:05

I would have been more than happy to do it.

00:41:06

This is actually quite interesting, which is that for many years NVIDIA has been

00:41:13

the company in AI making money, making lots of money.

00:41:17

And now you're investing it.

00:41:20

It's been reported that you've done up to $30 billion in OpenAI and $10 billion in Anthropic.

00:41:27

But now their valuations have increased and I'm sure they'll continue to increase.

00:41:30

And so over these many years, you were giving them the compute, you saw where AI was headed, and they were worth like 1/10 of what they are now a couple of years ago, or even a

00:41:40

year ago in some cases.

00:41:42

And you had all this cash.

00:41:46

There's a world where either NVIDIA themselves becomes a foundation lab, does a huge investment to make that possible, or has made the deals you've made now at current valuations much

00:41:56

earlier on.

00:41:58

And you had the cash to do it.

00:41:59

So I am curious actually, why not have done it earlier?

00:42:02

We did it as soon as we could have.

00:42:05

We did it as soon as we could have.

00:42:07

And

00:42:09

if I could have, I would have done it even earlier.

00:42:13

At the time that Anthropic needed us to do it, we just weren't in a position to do it.

00:42:17

It wasn't, it wasn't, you know, it wasn't in our sensibility to do so.

00:42:21

How so?

00:42:21

Like a cash thing or just?

00:42:23

Yeah, the level of investment, you know, we had never invested outside the company at the time and not that much.

00:42:31

And we didn't realize we needed to,

00:42:35

you know, I always thought that they could just go raise from VCs, for God's sake, like all companies do.

00:42:43

But what they were trying to

00:42:47

do couldn't have been done through VCs.

00:42:51

What OpenAI wanted to do couldn't have been done through VCs.

00:42:54

And I recognize that now.

00:42:56

I didn't know it then.

00:42:57

You know, but that's their genius.

00:42:58

That's why they're smart.

00:43:00

And so they realized back then that they had to do something like that.

00:43:04

And I'm delighted that they did, you know.

00:43:06

And even though

00:43:10

we caused Anthropic to have to go to somebody else, I'm still happy that it happened.

00:43:17

Anthropic's existence is great for the world.

00:43:19

I'm delighted for it.

00:43:21

I guess you still are making a ton of money, and you're making way more money quarter after quarter.

00:43:25

It's still okay to have regrets.

00:43:28

So the question still arises: now that we're here and you have all this money that you keep making, what should NVIDIA be doing with it?

00:43:37

And there's one answer which says, look, there's this whole middleman ecosystem that has popped up for converting

00:43:43

CapEx into OpEx for these labs so that they can rent compute.

00:43:48

Because the chips are really expensive, they make a lot of money over their lifetime because AI models are getting better.

00:43:53

The value that they generate through tokens is increasing, but they're expensive to set up.

00:43:57

Nvidia has the money to do the CapEx.

00:44:00

And in fact,

00:44:02

it's been reported you're backstopping CoreWeave up to $6.3 billion and have invested $2 billion.

00:44:07

But yeah, why doesn't NVIDIA become

00:44:11

a cloud themselves?

00:44:12

Why doesn't it become a hyperscaler themselves and rent this compute out?

00:44:14

You have all this cash to do it.

00:44:15

This is a philosophy of the company, and I think it is wise.

00:44:18

We should do as much as needed, as little as possible.

00:44:23

And what that means is the work that we do with building our computing platform, if we don't do it, I genuinely believe it doesn't get done.

00:44:34

If we didn't take the risk that we take, if we didn't build NVLink the way we built it, if we didn't build the whole stack, if we didn't create the ecosystem the way we did it, if we didn't

00:44:42

dedicate ourselves to 20 years of CUDA while losing money most of that time, if we didn't do it, nobody else would've done it.

00:44:52

If we didn't create all of the CUDA-X libraries so that they're all domain-specific, you know, a decade and a half ago, we pushed into domain-specific libraries because

00:45:02

we realized that if we didn't create these domain-specific libraries, whether it's for ray tracing or image generation or even the early works of AI, these models, if we didn't create

00:45:12

'em for data processing, structured data processing or vector data processing, if we didn't create them, nobody would.

00:45:19

And I am completely certain of that.

00:45:21

We created a library for computational lithography called cuLitho.

00:45:26

If we didn't create it, nobody would have.

00:45:29

And so accelerated computing wouldn't advance the way it has if we didn't do what we did.

00:45:34

And so we should do that.

00:45:36

We should dedicate our company, all of our might, wholeheartedly go do that.

00:45:41

However, the world has lots of clouds.

00:45:43

If I didn't do it, somebody would show up.

00:45:46

And so following the recipe, the philosophy of doing as much as needed, but as little as possible.

00:45:53

As little as possible.

00:45:55

That philosophy exists in our company today, and everything I do, I do it with that lens.

00:46:02

In the case of clouds, if we didn't support CoreWeave to exist, these neo clouds, these AI clouds, wouldn't exist.

00:46:11

If we didn't help CoreWeave exist, they would not exist.

00:46:15

If we didn't support Nscale, they wouldn't be where they are today.

00:46:19

If we didn't support Nebius, they wouldn't be what they are today.

00:46:22

Now they are, they're doing fantastically.

00:46:25

Is that a business model?

00:46:27

No, we should do as much as needed, as little as possible.

00:46:30

And so we invest in our ecosystem because I want our ecosystem to thrive, and I want the architecture and I want AI to be able to connect with

00:46:44

as many industries as possible, as many countries as possible, and make it possible for, you know, the planet to be built on AI and to be built on the American tech stack.

00:46:55

And so that vision, I think, is exactly what we're pursuing.

00:46:59

Now, one of the things that you mentioned: there are so many great, amazing foundation model companies, and we try to invest in all of them.

00:47:08

And this is another thing that we do.

00:47:10

We don't pick winners.

00:47:12

We need to support everyone, and it's part of our joy of doing so.

00:47:19

It's an imperative to our business, but we also go out of our way not to pick winners.

00:47:23

And so when I invest in one of them, I invest in all of them.

00:47:27

Why do you go out of your way to not pick winners?

00:47:29

Because it's not our job to, number one.

00:47:32

Number two, when NVIDIA first started, there were 60 graphics companies.

00:47:38

60 3D graphics companies.

00:47:40

We are the only one that survived.

00:47:42

If you had taken those 60 graphics companies and asked yourself which one was going to make it, NVIDIA would be at the top of the list not to make it.

00:47:53

You know, this is long before you, but NVIDIA's graphics architecture was precisely wrong.

00:47:58

It's not a little bit wrong.

00:48:00

We created an architecture that was precisely wrong.

00:48:04

And it was an impossible thing for developers to support.

00:48:07

It was never gonna make it.

00:48:09

We reasoned about it from good first principles, but we ended up with the wrong solution.

00:48:16

And everybody would've counted us out.

00:48:21

And here we are.

00:48:22

And so

00:48:24

I have enough humility to recognize that, you know, don't pick winners.

00:48:30

Either let them all take care of themselves or take care of all of them.

00:48:35

One thing I didn't understand is you said, look, we're not prioritizing these neo clouds just because they're neo clouds and we want to prop them up.

00:48:43

But you also listed a bunch of neo clouds and said they wouldn't exist if it wasn't for NVIDIA.

00:48:47

Yeah.

00:48:48

And so how are those two things compatible?

00:48:51

First of all, they need to want to exist, and they come to ask us for help.

00:48:55

And when they want to exist, and they have a business plan, and they have expertise, and they have the passion for it.

00:49:04

They obviously have to have some capabilities themselves.

00:49:08

But if at the end of the day they need some investment in order to get it off the ground, we would be there for them.

00:49:15

But the sooner they get their flywheel going, you know, your question was, do we want to be in the financing business?

00:49:22

The answer's no.

00:49:23

Yeah, we don't want to be, because there are people in the financing business, and we'd rather work with all of the people who are in the financing business than be

00:49:31

a financier ourselves.

00:49:32

And so I think our goal is to focus on what we do, keep our business model as simple as possible, support our ecosystem.

00:49:41

When someone like OpenAI needs an investment of $30 billion scale because it's still before their IPO,

00:49:50

and we deeply believe in them.

00:49:54

I deeply believe that they're an extraordinary company already today.

00:50:01

They're gonna be an incredible company.

00:50:03

The world needs 'em to exist.

00:50:04

The world wants 'em to exist.

00:50:05

I want them to exist.

00:50:07

And they have the wind at their back.

00:50:10

Let's support them and let them scale.

00:50:12

And so those investments we'll do.

00:50:16

Because they need us to do it.

00:50:19

But we're not trying to do as much as possible, we're trying to do as little as possible.

00:50:24

I spend way too much time copy-pasting text back and forth from Google Docs to chatbots, and so I built what's basically a Cursor for writing, which operates the way I think an AI co-researcher

00:50:33

should operate.

00:50:34

I can tag it and it can talk with me through inline comment threads and help me dig deeper and brainstorm.

00:50:39

I built this entire thing over the weekend with Cursor and their Composer 2 model.

00:50:43

With a lot of agentic coding tools, I feel like I have no idea what's going on under the surface.

00:50:46

I just have to relinquish control and hope for the best.

00:50:48

But Cursor let me try a bunch of different ideas while staying on top of the implementation.

00:50:52

I did most of my brainstorming in the agents window, and after I got some basic files in place, I used the diff window to track changes.

00:50:59

The few times that I needed to make a quick tweak by hand, I just used the editor.

00:51:02

If you want to try my AI co-researcher yourself, I've linked the GitHub repo in the description.

00:51:06

And if you have a tool that you've been wanting to build, you should make it happen.

00:51:09

Go to cursor.com/dwarkesh to get started.

00:51:13

This may be sort of an obvious question, but we've lived many years in this situation where there's a shortage of GPUs, and it's grown now because models are getting better.

00:51:25

We have a shortage of GPUs.

00:51:27

Yes.

00:51:27

Yeah.

00:51:28

And

00:51:30

NVIDIA is known for divvying up the scarce allocation not just based on highest bidder, but rather on, hey, we want to make sure that these neo clouds exist.

00:51:40

Let's give some to CoreWeave.

00:51:41

Let's give some to Crusoe.

00:51:41

Let's give some to Lambda.

00:51:44

Why is it good for NVIDIA?

00:51:45

First of all, would you agree with this characterization of fracturing the market?

00:51:49

No, no, no.

00:51:50

Yeah.

00:51:50

Your premise is just wrong.

00:51:51

Yeah.

00:51:52

Yeah.

00:51:53

We're sufficiently

00:51:56

mindful about these things.

00:51:59

We're very mindful about these things.

00:52:00

First of all, if you don't place an order, if you don't place a PO,

00:52:06

all the talking in the world won't make a difference.

00:52:09

And so until we get a PO, what are we going to do?

00:52:12

And so the first thing is we work really hard with everybody to get a forecast done

00:52:19

because these things take a long time to build and the data centers take a long time to build.

00:52:23

And so we align ourselves with demand and supply and things like that through forecasting.

00:52:29

Okay, that's job number one.

00:52:31

Number two,

00:52:33

we've tried to forecast with as many people as possible, but in the final analysis, you still have to place an order.

00:52:41

And maybe

00:52:44

for whatever reason you didn't place your order, what can I do?

00:52:47

And so at some point, first in, first out.

00:52:51

But beyond that, if you're not ready because your data center's not ready, or certain components aren't ready to enable you to stand up a data center,

00:53:01

we might decide to serve another customer first.

00:53:04

That's just maximizing the throughput of our own factory.

00:53:09

And so we might do some adjustments there.

00:53:12

Aside from that,

00:53:15

the prioritization is first in, first out.
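
The allocation rule described above (purchase orders served first in, first out, with customers whose data centers are not ready passed over for now) can be sketched roughly as follows. This is a hypothetical illustration of that rule as stated in the conversation, not NVIDIA's actual process; the `Order` fields and the sample customers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Order:
    customer: str      # hypothetical customer name
    po_sequence: int   # order in which the PO arrived (lower = earlier)
    units: int         # units requested
    site_ready: bool   # whether the customer's data center can take delivery


def allocate(orders: list[Order], supply: int) -> list[tuple[str, int]]:
    """Serve ready customers in PO order; pass over those that are not ready."""
    shipments = []
    for order in sorted(orders, key=lambda o: o.po_sequence):
        if supply == 0:
            break
        if not order.site_ready:
            continue  # stays in the queue for a later allocation round
        shipped = min(order.units, supply)
        shipments.append((order.customer, shipped))
        supply -= shipped
    return shipments


orders = [
    Order("A", 1, 40, True),
    Order("B", 2, 50, False),  # placed a PO early, but the site is not ready
    Order("C", 3, 30, True),
    Order("D", 4, 60, True),
]
plan = allocate(orders, supply=100)
print(plan)
```

Note that price never enters the ordering: only PO sequence and site readiness decide who is served.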

00:53:19

Yeah, you got to place a PO.

00:53:22

If you don't place a PO.

00:53:23

Now, of course, there are stories about that.

00:53:27

You know, for example, all of this kind of started from an article about Larry and Elon having dinner with me where they begged for GPUs.

00:53:39

That never happened.

00:53:42

We absolutely had dinner.

00:53:44

We absolutely had dinner.

00:53:46

And it was a wonderful dinner.

00:53:48

At no time did they beg for GPUs.

00:53:50

They just had to place an order, and once they place an order, we do our best to get the capacity to them.

00:53:57

Yeah.

00:53:57

We're not complicated.

00:53:59

Okay.

00:53:59

So it sounds like there's a queue, and then, based on whether your data center is ready and when you place a purchase order, you get them at a certain time, but it still doesn't

00:54:09

sound like highest bidder just gets it.

00:54:12

Is there a reason to do it?

00:54:13

We never do that.

00:54:14

Okay.

00:54:15

We never do that.

00:54:15

Why not just do highest bidder?

00:54:17

Because it's a bad business practice.

00:54:19

You set your price, you set your price, and then people decide to buy it or not.

00:54:25

And I understand that others in the chip industry

00:54:35

change their prices when demand is higher, but we just don't.

00:54:39

We just don't.

00:54:39

That's just never been a practice of ours.

00:54:40

You can count on us.

00:54:42

You know, I prefer to be

00:54:45

dependable,

00:54:47

to be the foundation of the industry.

00:54:50

You don't need to second-guess.

00:54:53

You know, if I quoted you a price,

00:54:57

we quoted you a price.

00:54:58

That's it.

00:54:59

And if demand goes through the roof, so be it.

00:55:02

And on the other end, that's why you have a productive relationship with TSMC, right?

00:55:05

Yeah.

00:55:06

Yeah, yeah.

00:55:07

We've been doing business with them for, I guess, coming up on 30 years.

00:55:14

And NVIDIA and TSMC don't have a legal contract.

00:55:19

There's always some rough justice.

00:55:22

And sometimes I'm right, sometimes I'm wrong.

00:55:25

Sometimes I got a better deal, sometimes I got a worse deal.

00:55:28

But overall, on the whole, the relationship is incredible.

00:55:32

And I can completely trust them.

00:55:34

I can completely depend on them.

00:55:36

And one of the things that you can count on with NVIDIA is that this year, Vera Rubin's going to be incredible.

00:55:44

Next year, Vera Rubin Ultra will come.

00:55:46

The year after that, Feynman will come.

00:55:48

And the year after that, I haven't introduced the name yet.

00:55:50

And so every single year you can count on us.

00:55:55

And this is an—

00:55:57

you're going to have to go find another ASIC team in the world.

00:56:01

Pick your ASIC team where you can say, I can bet the farm, I can bet my entire business, that you will be here for me every single year.

00:56:11

Your cost, your token cost will decrease by an order of magnitude every single year.

00:56:16

I can count on it like I can count on the clock.

00:56:19

Well, I just said something about TSMC.

00:56:24

You can't possibly say that about any other foundry in history.

00:56:29

You can say that about NVIDIA today.

00:56:31

You can count on us every single year.

00:56:34

If you would like to buy $1 billion worth of AI Factory compute, no problem.

00:56:39

If you'd like to buy $100 million, no problem.

00:56:41

If you'd like to buy $10 million or just one rack, not a problem.

00:56:45

Or just one graphics card.

00:56:47

Okay, no problem.

00:56:49

If you would like to place an order for a $100 billion AI Factory, no problem.

00:56:54

We're the only company in the world where you can say that today.

00:56:58

I can say that about TSMC as well.

00:57:01

I want to buy one, buy $1 billion, no problem.

00:57:05

We just got to go through the process of planning for it and, you know, all the things that mature people do.

00:57:11

And so I think this ability for NVIDIA to be the foundation of the world's AI industry,

00:57:21

this is a position that has taken us a couple of decades to arrive at.

00:57:28

Enormous commitment, enormous dedication.

00:57:31

And

00:57:32

the stability of our company, the consistency of our company is really, really important.

00:57:37

Okay.

00:57:37

I want to ask about China.

00:57:38

Yep.

00:57:38

I actually don't know what I think about whether it's good to sell chips to China or not, but I always like to play devil's advocate against my guests.

00:57:45

So when Dario, who supports export controls, was on, I asked him, well, why can't America and China both have a country of geniuses in a data center?

00:57:52

But since you're on the opposite side, I'll ask you in the opposite way.

00:57:57

And look, one way to think about it is Anthropic actually announced Mythos Preview a couple days ago. This model, Mythos, they're not even releasing publicly because they say it has

00:58:05

such cyber-offensive capabilities that they don't think the world is ready until they make sure these zero-days are patched up.

00:58:11

But they say it found thousands of high severity vulnerabilities across every major operating system.

00:58:17

Every browser.

00:58:18

It found one in OpenBSD, which is this operating system that's been specifically designed to not have zero days, and it found one.

00:58:24

For 27 years it's existed.

00:58:26

And so if Chinese companies and Chinese labs and the Chinese government had access to the AI chips to train a model like Claude Mythos with these cyber-offensive capabilities and run

00:58:37

millions of instances of it with more compute, the question is, is that a threat to

00:58:43

American companies, to American national security?

00:58:47

First of all, Mythos was trained on fairly mundane capacity

00:58:54

and a fairly mundane amount of it,

00:58:57

by an extraordinary company.

00:58:59

And so the amount of capacity and the type of compute that it was trained on is abundantly available in China.

00:59:08

And so

00:59:10

you just have to first realize that chips exist in China.

00:59:14

They manufacture 60% of the world's mainstream chips, maybe more.

00:59:19

It's a very large industry for them.

00:59:22

They have some of the world's greatest computer scientists.

00:59:26

As you know, most of the AI researchers in all of these AI labs, most of them are Chinese.

00:59:33

They have 50% of the world's AI researchers.

00:59:39

And so the question is, if you're concerned about them,

00:59:44

considering all the assets they already have: they have an abundance of energy, they have plenty of chips, they've got most of the AI researchers.

00:59:55

If you're worried about them, what is the best way

00:59:59

to create a safe world?

01:00:01

Well,

01:00:03

victimizing them,

01:00:06

turning them into an enemy likely isn't the best answer.

01:00:11

They are an adversary.

01:00:13

We want the United States to win.

01:00:17

But I think having a dialogue and having research dialogue is probably the safest thing to do.

01:00:23

This is an area that is glaringly missing.

01:00:27

Because of our current attitude about

01:00:30

China as an adversary.

01:00:33

It is essential that our AI researchers and their AI researchers are actually talking.

01:00:38

It is essential that we try to both agree on how to, what not to use the AI for.

01:00:46

With respect to finding bugs in software, of course, that's what AI is supposed to do.

01:00:52

Is it gonna find bugs in a lot of software?

01:00:54

Of course.

01:00:56

There's lots and lots of bugs.

01:00:58

There's lots of bugs in the AI software.

01:01:01

And so that's what AI is supposed to do.

01:01:05

And I'm delighted that AI has reached a level where it could help us be so much more productive.

01:01:12

One of the things that

01:01:15

is

01:01:18

under, underemphasized is the richness of ecosystem around cybersecurity, AI cybersecurity and AI security and AI privacy and AI safety.

01:01:30

That whole ecosystem

01:01:33

of AI startups that are trying to create this future for us where you have one AI agent that's incredible surrounded by thousands of AI agents keeping it safe, keeping it secure.

01:01:46

That future surely is gonna happen.

01:01:49

And the idea that you're gonna have an AI agent running around with nobody watching after it is kind of insane.

01:01:57

And so we know very well that this ecosystem needs to thrive.

01:02:02

It turns out this ecosystem needs open source.

01:02:05

This ecosystem needs open models.

01:02:07

They need open stacks so that all of these AI researchers and all these great computer scientists can go build AI systems that are as formidable and can keep AI safe.

01:02:22

And so one of the things that we need to make sure that we do is we keep the open-source ecosystem vibrant.

01:02:30

And that can't be ignored.

01:02:32

That can't be ignored.

01:02:33

And a lot of that is coming out of China.

01:02:38

We have to not suffocate that.

01:02:41

You know, with respect to China, we want to have, of course, we want United States to have as much computing as possible.

01:02:49

We're limited by energy,

01:02:52

but you know, we got a lot of people working on that and we have to not make energy a bottleneck for our country.

01:03:00

But what we also want is we want to make sure that all the AI developers in the world are developing on the American tech stack and making the contributions, the advancements of AI,

01:03:13

especially when it's open source, available to the American ecosystem.

01:03:17

And it would be extremely foolish to create two ecosystems, the open source ecosystem, and it only runs on the Chinese tech, a foreign tech stack, and a closed ecosystem, and that runs

01:03:30

on the American tech stack.

01:03:31

I think that that would be, that would be a horrible outcome for the United States.

01:03:36

Since there are a lot of things, let me just triage the

01:03:40

response.

01:03:41

I mean, I think the concern, going back to the flop difference in the hacking, is yes, they have compute, but there's some estimates that because they're at 7 nanometer, they don't

01:03:52

have EUV because of chip making export controls.

01:03:55

The amount of flops they're able to actually produce, they have like 1/10 the amount of flops that the US has.

01:03:59

And so with that, could they train eventually a model like Mythos?

01:04:04

Yes.

01:04:05

But the question is, because we have more flops, American labs are able to get to these level of capabilities first.

01:04:12

And because Anthropic got to it first, they say, okay, we're going to hold on to it for a month while all these American companies, we give them access to it, they're going to patch

01:04:19

up all their vulnerabilities, and now we release it.

01:04:22

Furthermore, if they— even if they trained a model like this, the ability to deploy it at scale.

01:04:27

If you had a cyber hacker, it's much more dangerous if they have a million of them versus 1,000 of them.

01:04:31

So that inference compute really matters a lot.

01:04:34

And in fact, the fact that they have so many AI researchers who are so good is the thing that makes it so scary, because what is it that makes those engineers, researchers more productive

01:04:42

is compute.

01:04:44

If you talk to any AI lab in America, they say the thing that's bottlenecking them is compute.

01:04:47

And there are quotes from the DeepSeek founder, Qwen leadership, or whatever, they say like the thing we're bottlenecked on is compute.

01:04:54

So then the question is, isn't it better that we get to get American companies because they have more compute, get to the level of Spud or Mythos level capabilities first, prepare our

01:05:04

society for it before China can get to it because they have less compute?

01:05:10

We should always be first and we should always have more.

01:05:14

But in order for that outcome for you to, what you described to be true, you have to take it to the extremes.

01:05:20

They have to have no compute.

01:05:26

And if they have some compute, the question is how much is needed?

01:05:29

The amount of compute they have in China is enormous.

01:05:34

I mean, you're talking about a country.

01:05:35

It's the second largest computing market in the world.

01:05:39

If they want to deploy, aggregate their compute, they got plenty of compute to aggregate.

01:05:44

But is that true?

01:05:45

I mean, there's like, people do these estimates and they're like, well, SMIC is actually behind on the process nodes.

01:05:49

So they're actually— I'm about to tell you.

01:05:51

Okay.

01:05:51

The amount of energy they have is incredible.

01:05:53

Isn't that right?

01:05:55

AI is a parallel computing problem, isn't it?

01:05:58

Why can't they just put 4, 10 times as much chips together?

01:06:03

Because energy is free.

01:06:04

They have so much energy.

01:06:05

They have data centers that are sitting completely empty, fully powered.

01:06:11

They've, you know, they have ghost cities, they have ghost data centers.

01:06:14

They have so much capacity of infrastructure.

01:06:18

If they wanted to, they just gang up more chips, even though they're 7 nanometer.

01:06:23

And their capacity of building chips is one of the largest in the world.

01:06:27

The semiconductor industry knows that they monopolize mainstream chips.

01:06:32

They're overcapacity, they have too much capacity.

01:06:35

And so the idea that China won't be able to have AI chips is completely nonsense.

01:06:41

Now, of course, if you ask me,

01:06:45

would the United States be further ahead if the entire world had no compute at all?

01:06:52

But that's just not an outcome.

01:06:53

That's not a scenario that's true.

01:06:55

They have plenty of compute already.

01:06:57

The amount of threshold they need for the concern you're worried about they've already reached that threshold and beyond.

01:07:04

And so, so I think the— you misunderstand that AI is a 5-layer cake and at the lowest layer is energy.

01:07:12

When you have abundance of energy, it makes up for chips.

01:07:16

If you have abundance of chips, it makes up for energy.

01:07:19

For example,

01:07:22

United States is scarce on energy, which is the reason why NVIDIA has to keep advancing our architecture and do this extreme co-design so that with the few chips that we ship.

01:07:34

Okay, with a few chips, because the amount of energy is so limited, our throughput per watt is off the charts.

01:07:41

But if your amount of watts is completely abundant, it's free.

01:07:46

What do you care about performance per watt for?

01:07:48

You got plenty.

01:07:49

You can use old chips to do so.

01:07:51

So

01:07:53

7 nanometer chips are essentially Hopper.

01:07:57

The ability to, for Hopper, um, I gotta tell you,

01:08:02

today's models are largely trained on Hopper.

01:08:05

Yeah, Hopper generation.

01:08:07

And so, so Hopper, 7-nanometer chips are plenty good.

01:08:10

The abundance of energy is their advantage.

01:08:13

But then there's a question of, okay, well, can they actually manufacture enough chips given their— But they do.

01:08:20

Uh, uh, what's, what's the evidence?

01:08:22

Huawei just had the largest single year in the history of their company.

01:08:26

How many chips did they ship?

01:08:27

A ton.

01:08:28

Millions.

01:08:29

Millions is way more, way more than Anthropic has.

01:08:35

So there's a question of how much logic SMIC can ship, then there's a question of how much memory.

01:08:39

I'm telling you what it is.

01:08:40

They have plenty of, they have plenty of logic and they have plenty of HBM2 memory.

01:08:44

Right.

01:08:44

But as you know, the bottleneck often in training and doing inference on these models is the amount of bandwidth.

01:08:51

So if you have HBM2, I don't know the numbers offhand, but like versus the newest thing you have, you know, it can be almost an order of magnitude difference in memory bandwidth, which is—

01:08:58

Huawei is a networking company.

01:09:02

Huawei is a networking company.

01:09:03

But that doesn't change the fact that you need EUV for the most advanced HBM.

01:09:06

Not true.

01:09:07

Not at all true.

01:09:10

You could gang them together just like we gang them together with NVLink 72.

01:09:14

They've already demonstrated silicon photonics connecting all of these compute together into one giant supercomputer.

01:09:22

Your premise is just wrong.

01:09:25

The fact of the matter is their AI development is going just fine.

01:09:29

And the best AI researchers in the world,

01:09:33

because they are limited in compute, they also come up with extremely smart algorithms.

01:09:39

Remember what I said, I said that Moore's Law is advancing about 25% per year.

01:09:45

However, through great computer science, we could still improve algorithm performance by 10x.

01:09:52

What I'm saying is great computer science

01:09:56

is where the lever is.
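
The comparison in this exchange can be put in numbers. As a minimal sketch, assuming the 25% annual hardware gain compounds year over year and the 10x algorithmic gain is a single multiplier (both figures come from the conversation; the function name `years_to_match` is hypothetical), the following computes how many years of hardware scaling alone would equal one 10x algorithmic improvement:

```python
import math


def years_to_match(speedup: float, annual_gain: float) -> float:
    """Years of compounding `annual_gain` needed to reach `speedup`.

    Solves (1 + annual_gain) ** years == speedup for years.
    """
    return math.log(speedup) / math.log(1.0 + annual_gain)


# Figures quoted in the conversation: ~25% per year vs. a 10x algorithmic gain.
years = years_to_match(10.0, 0.25)
print(f"{years:.1f} years of 25% annual gains equal one 10x improvement")
```

Under these assumptions the answer is a little over ten years, which is the sense in which algorithmic work is described here as the larger lever.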

01:09:58

There is no question MoE is a great invention.

01:10:02

There's no question all the incredible attention mechanisms reduce the amount of compute.

01:10:09

We have got to acknowledge that most of the advances in AI came out of algorithm advances, not just the raw hardware.

01:10:20

Now, if most advances came from algorithms and computer science and programming, tell me that their army of AI researchers is not their fundamental advantage.

01:10:31

And we see it.

01:10:32

DeepSeek is not an inconsequential advance.

01:10:36

And the day that DeepSeek comes out on Huawei first,

01:10:40

that is a horrible outcome for our nation.

01:10:43

Why is that?

01:10:43

Because I mean, currently you can have a model like DeepSeek that can run on any accelerator if it's open source.

01:10:48

Why would that stop being the case in the future?

01:10:50

Well, suppose it doesn't.

01:10:52

Suppose it's optimized for Huawei.

01:10:53

Suppose it's optimized for their architecture.

01:10:56

It would put ours at a disadvantage.

01:10:59

You described the situation

01:11:00

that I perceived to be good news,

01:11:05

that

01:11:06

a company developed software, developed an AI model, and it runs best on the American tech stack.

01:11:12

I saw that as good news.

01:11:15

You set it up as a premise that it was bad news.

01:11:18

I'm going to give you the bad news, that AI models around the world are developed and they run best on not American hardware.

01:11:27

That is bad news for us.

01:11:29

I guess I just don't see the evidence that there's these huge disparities that would prevent you from switching accelerators.

01:11:33

There's American labs, you know, are running their models across all the clouds, across all the different accelerators.

01:11:37

I am the evidence.

01:11:38

You take a model that's optimized for NVIDIA and you try to run it on something else.

01:11:42

But the American labs do that.

01:11:44

And they don't run better.

01:11:46

NVIDIA's success is perfect evidence.

01:11:50

The fact that AI models are created on our stack runs best on our stack.

01:11:55

How is that illogical to understand?

01:11:57

I'm just looking— look, Anthropic's models are run on GPUs, they're run on Trainium, they're run on TPUs.

01:12:02

A lot of work has to go into it to change.

01:12:04

But go to the Global South, go to the Middle East, coming out of the box, if all of the AI models run best on somebody else's tech stack, you've got, you've got to be arguing some ridiculous

01:12:15

claim right now that that's a good thing for the United States.

01:12:18

But I guess I don't understand arguments that are like, if, say, Chinese companies get to the next Mythos first, they find that all the security vulnerabilities in American software

01:12:25

first, but they can do it on NVIDIA hardware and they ship it to the Global South that does it on NVIDIA hardware.

01:12:31

Like, how is that?

01:12:32

How is that good?

01:12:33

I mean, I just— okay, it runs on NVIDIA hardware.

01:12:34

It's not good.

01:12:35

It's not good.

01:12:36

So let's not let it happen.

01:12:39

Why do you think it's perfectly fungible that if you didn't ship them compute, it would exactly be replaced by Huawei?

01:12:43

They are behind, right?

01:12:44

They have worse chips than you.

01:12:46

It's completely— there's evidence right now.

01:12:48

Their chip industry is gigantic.

01:12:50

You can just look at the flop or bandwidth or memory comparisons between the H200 and the Huawei 910C.

01:12:55

It's like half, half, third.

01:12:56

They use more of it.

01:12:57

They use twice as many.
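
The "use twice as many" arithmetic can be sketched as follows. This is a rough illustration only: it assumes aggregate throughput scales linearly with chip count and ignores interconnect, memory bandwidth, and power limits, all of which are contested in this conversation. The ratios are the loose "half" and "third" figures mentioned above, not measured specifications, and `chips_to_match` is a hypothetical helper name.

```python
import math
from fractions import Fraction


def chips_to_match(reference_chips: int, relative_perf: Fraction) -> int:
    """Chips needed to match `reference_chips` of a faster part.

    `relative_perf` is the slower chip's per-chip performance as a fraction
    of the faster chip's. Assumes perfectly linear scaling (an idealization).
    """
    return math.ceil(reference_chips / relative_perf)


# Illustrative ratios taken loosely from the conversation.
for ratio in (Fraction(1, 2), Fraction(1, 3)):
    needed = chips_to_match(1000, ratio)
    print(f"at {ratio} per-chip performance: {needed} chips to match 1000")
```

In practice scaling is not perfectly linear, so a figure like this is best read as a lower bound on the number of slower chips required.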

01:12:58

I guess it seems like your argument is they have all this energy that's ready to go, right?

01:13:02

And they need to fill it with chips.

01:13:03

And they're good at manufacturing.

01:13:04

And I'm sure eventually they would be able to just out-manufacture everybody.

01:13:08

But there's these few critical years.

01:13:10

What, what is the critical year you're talking about?

01:13:13

These next few years.

01:13:13

We've got these models that are going to be able to do all the cyber attacks.

01:13:16

If the critical years, the next critical years is critical, then we have to make sure that all of the world's AI models are built on American tech stack

01:13:24

these critical years.

01:13:26

Okay.

01:13:26

How would that prevent— if they're built on American tech stack, how would that prevent them from, if they have more advanced capabilities, from launching the Mythos equivalent cyberattacks

01:13:33

on the US?

01:13:34

There's no guarantee either way.

01:13:36

But if you have it earlier, we can prepare for it.

01:13:39

Listen,

01:13:40

why are you, why are you causing one layer of the AI industry

01:13:46

to lose an entire market

01:13:50

so that you could benefit another layer of the AI industry?

01:13:54

There's 5 layers and every single layer has to succeed.

01:13:59

The, the, the layer that has to succeed most is actually the AI applications.

01:14:05

Why are you so fixated on that AI model, that one company?

01:14:09

For what reason?

01:14:10

Because those models make possible these incredibly offensive capabilities, and you need compute to run them.

01:14:16

The energy, the chips, the ecosystem of AI researchers make it possible.

01:14:21

A few months ago, Jane Street spent about 20,000 GPU hours training backdoors into 3 different language models.

01:14:27

Then they challenged my audience to find the trigger phrases.

01:14:29

I just caught up with Rickson, who designed the puzzle, about some of the solutions that Jane Street received.

01:14:34

If you think the base model was here and the backdoor model was here, you can kind of linearly interpolate the weights to like adjust the strength of the backdoor, but you can also

01:14:42

extrapolate it to make the backdoor even stronger.

01:14:45

And in some cases, if you make it strong enough, the model will just regurgitate what the response phrase was supposed to be.

01:14:51

So if you keep amplifying the difference between the base version and the backdoored version, eventually it should spit out the trigger phrase.
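
The interpolation and extrapolation idea described here can be sketched in a few lines. This is a toy illustration, not the solution submitted to the puzzle: it assumes the two models share the same parameter names and shapes, represents weights as plain Python lists rather than real tensors, and uses a hypothetical helper name, `blend_weights`.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def blend_weights(base: Weights, backdoored: Weights, alpha: float) -> Weights:
    """Return base + alpha * (backdoored - base), parameter by parameter.

    alpha = 0 reproduces the base model and alpha = 1 the backdoored model.
    Values in between interpolate; values above 1 extrapolate, amplifying
    whatever the backdoor fine-tuning changed.
    """
    blended: Weights = {}
    for name, base_values in base.items():
        tuned_values = backdoored[name]
        blended[name] = [b + alpha * (t - b) for b, t in zip(base_values, tuned_values)]
    return blended


# Toy two-parameter "models" standing in for real checkpoints.
base = {"layer.weight": [1.0, 2.0]}
backdoored = {"layer.weight": [2.0, 4.0]}

half_strength = blend_weights(base, backdoored, 0.5)  # weaker backdoor
amplified = blend_weights(base, backdoored, 2.0)      # stronger backdoor
print(half_strength, amplified)
```

With real checkpoints the same formula would be applied to every tensor, and the amplified model would then be prompted to see whether it emits the hidden phrase.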

01:14:58

But this technique only worked on 2 out of the 3 models.

01:15:00

Even Rickson isn't sure why it didn't work on the other.

01:15:02

Being able to verify that a model only does what you think it does is one of the most important open questions in AI security.

01:15:07

If this is the kind of problem that excites you, Jane Street is hiring researchers and engineers.

01:15:11

Go to janestreet.com/dwarkesh to learn more.

01:15:16

Okay, stepping back, it has to be the case that China is able to build enough 7-nanometer capacity.

01:15:21

And remember, they're still stuck on 7 nanometer while you'll move on to 3 nanometer and then 2 nanometer or 1.6 nanometer with Feynman.

01:15:27

So while you're on 1.6 nanometer, they're still going to be on 7 nanometer,

01:15:30

and they have to produce enough of it to make up for the shortfall.

01:15:34

And they have so much energy that the more chips you give them, the more compute they'd have.

01:15:38

Right.

01:15:39

So it just comes out as a question of ultimately they are getting more compute.

01:15:43

Compute is an input to training and inference.

01:15:45

I just think you speak in absolutes.

01:15:48

I think the United States ought to be ahead.

01:15:51

The amount of compute in the United States is 100 times more than anywhere else in the world.

01:15:57

The United States ought to be ahead.

01:15:59

Okay?

01:16:00

The United States is ahead.

01:16:02

NVIDIA builds the most advanced technologies.

01:16:04

We make sure that the US labs are the first to hear about it and the first chance to buy it.

01:16:10

And if they don't have enough money, we even invest in them.

01:16:14

The United States ought to be ahead.

01:16:16

We want to do everything we can to make sure the United States is ahead.

01:16:20

Number one point.

01:16:21

Do you agree?

01:16:22

And we're doing everything we can to do that.

01:16:24

But how is shipping chips to China keeping the US— No, no, no.

01:16:26

If they're bottlenecked on compute.

01:16:27

We got Vera Rubin for United States.

01:16:31

We have Vera Rubin for United States.

01:16:33

Now, United States.

01:16:35

Am I in United States?

01:16:36

Do you consider me part of the United States?

01:16:38

Yes.

01:16:39

NVIDIA.

01:16:40

You consider NVIDIA a United States company.

01:16:43

Okay.

01:16:43

Number one,

01:16:45

why is it that we don't come up with a regulation that's more balanced so that Nvidia can win

01:16:52

around the world instead of giving up the world.

01:16:56

Why would you want United States to give up the world?

01:17:00

The chip industry is part of the American ecosystem, is part of American technology leadership.

01:17:06

It's part of the AI ecosystem.

01:17:08

It's part of AI leadership.

01:17:10

Why, why is it that your policy, your philosophy leads to United States giving up a vast part of the world's— I guess the, the claim here is, um, Dario had this quote where he said

01:17:25

it's like Boeing bragging that we're selling North Korea nukes, but the missile casings are made by Boeing, and that's somehow enabling the US technology stack.

01:17:32

Like, fundamentally, you're giving them this capability.

01:17:34

Comparing AI to anything that you just mentioned is lunacy.

01:17:37

But AI is similar to enriched uranium, right?

01:17:39

And then it can have positive uses.

01:17:41

Can have negative uses.

01:17:42

We still don't want to send enriched uranium to other countries.

01:17:46

Who's, who's sending enriched— the analogy here is enriched uranium.

01:17:50

Because it's a lousy, it's a lousy analogy.

01:17:53

It's an illogical analogy.

01:17:55

But if it's, if that compute can run a model that can do zero-day exploits against all American software, how is that not a weapon?

01:18:04

First of all, we ought to— the way to solve that problem is to have dialogues with the researchers and dialogues with China and dialogues with all the countries to make sure that people

01:18:12

don't use technology in that way.

01:18:14

That's a dialogue that has to happen.

01:18:16

Okay, number, number one.

01:18:17

Number two, um, we also need to make sure that United States is ahead.

01:18:23

Everything that Rubin, Vera Rubin, Blackwell is available in United States in abundance.

01:18:30

Mounds of it.

01:18:30

Obviously our, our results would show it.

01:18:33

Abundance.

01:18:34

Tons of it.

01:18:35

Tons of it.

01:18:36

The amount of computing we have is great.

01:18:38

We have amazing AI researchers here.

01:18:40

It's great.

01:18:41

We have to stay ahead.

01:18:42

However, we also have to recognize that AI is not just a model, that AI is a 5-layer cake, that AI industry matters across every single layer.

01:18:54

And we want United States to win at every single layer, including the chip layer.

01:18:58

And conceding the entire market is not going to allow United States to win the technology race long-term in the chip layer, in the computing stack.

01:19:08

That is just a fact.

01:19:10

I guess then the crux comes down to how does selling them chips now help us win in the long term?

01:19:16

Like Tesla sold extremely good electric vehicles to China for a long time.

01:19:21

iPhones are sold in China extremely good.

01:19:23

They didn't cause some lock-in.

01:19:24

China will still make their version of EVs and they're dominating smartphones.

01:19:29

When we started the conversation today, you would, you would acknowledge and you acknowledged that NVIDIA's position is very different.

01:19:38

You use words like moat.

01:19:40

The single most important thing to our company is our richness of our ecosystem, which is about developers.

01:19:46

50% of the AI developers are in China.

01:19:49

We don't want to— we shouldn't— the United States should not give that up.

01:19:53

But we have a lot of NVIDIA developers in the US, and that doesn't prevent American labs from also being able to use other accelerators in the future.

01:19:59

In fact, right now they're using other accelerators as well.

01:20:01

Which is fine and great.

01:20:03

I don't, I don't see why that wouldn't be the case in China as well.

01:20:05

If you sell them Nvidia chips just the same way that Google can use TPUs and Nvidia.

01:20:09

We have to keep innovating.

01:20:10

And, you know, as you, as you probably know, our share is growing, not decreasing.

01:20:16

The premise that even if we competed in China, that we're going to lose that market anyways.

01:20:25

I don't— you're not talking to somebody who woke up a loser.

01:20:29

And that loser attitude, that loser premise makes no sense to me.

01:20:34

We are not, we're not a car.

01:20:37

We are not a car.

01:20:39

It, the fact that I can buy a car, this car brand one day and use another car brand another day, easy.

01:20:47

Computing is not like that.

01:20:49

There's a reason why the x86 still exists.

01:20:51

There's a reason why ARM is so sticky.

01:20:53

These ecosystems, these ecosystems are hard to replace.

01:20:58

It costs an enormous amount of time and energy, and most people don't want to do it.

01:21:01

And so it's, it's our job to continue to nurture that ecosystem, to keep advancing the technology so that we could compete in the marketplace.

01:21:10

Conceding a marketplace based on the premise you described, I simply can't acknowledge that.

01:21:15

It makes no sense because I don't think United States is a loser.

01:21:20

You, uh, our industry is not a loser.

01:21:23

And that, that losing proposition, that losing mindset makes no sense to me.

01:21:28

Okay, I'll move on.

01:21:29

I just, yeah, I just want to make sure you don't have to move on.

01:21:31

I'm enjoying it.

01:21:32

Okay, great.

01:21:33

Yeah, yeah, then, then I appreciate that.

01:21:37

But I think that maybe the crux— and thanks for walking around the circles with me because then I think it helps bring out what the crux here is.

01:21:43

The crux is you're going to extremes.

01:21:46

Your argument starts from extremes that if we give them any compute at all

01:21:51

in this narrow moment, we will lose everything.

01:21:54

No, I think what my argument is, those extremes, they're childish.

01:21:59

They're childish.

01:22:00

Yeah.

01:22:01

The idea is not that there is some key threshold of compute.

01:22:05

Yeah.

01:22:05

Is that any marginal compute is helpful, right?

01:22:08

So if you have more compute, you can train a better model.

01:22:10

And I just want you to acknowledge that any marginal sales for American technology industry is beneficial.

01:22:18

I mean, if the AI models that run on those chips are capable of cyberoffensive capabilities, or training models are capable of cyberoffensive capabilities, running more models at those

01:22:26

instances, it is not a nuclear weapon, but it enables a weapon of a kind.

01:22:31

The logic that you use, you might as well say it to microprocessors and DRAMs.

01:22:35

You might as well say it to electricity.

01:22:37

But in fact, we do have export controls on the technology that is relevant to making the most advanced DRAM.

01:22:42

Right?

01:22:42

We have all kinds of export controls on China for all kinds of chip manufacturing.

01:22:45

We sell a lot of DRAM and CPUs into China, and I think it's right.

01:22:51

I guess it goes back to the fundamental question of, is AI different?

01:22:54

Right?

01:22:55

If you have the kind of technology that can find these zero days in software, is that something where we want to minimize China's ability to get there first?

01:23:04

We want to be ahead.

01:23:07

We can control that.

01:23:08

How do we control that if the chips are already there and they're using that to train that model?

01:23:11

We have tons of compute, we have tons of AI researchers, we're racing as fast as we can.

01:23:16

Again, we have more nuclear weapons than anybody else, but we don't want to send enriched uranium anywhere.

01:23:20

We're not enriched uranium.

01:23:24

It's a chip, and it's a chip that they can make themselves.

01:23:28

But there's a reason they're buying it from you, right?

01:23:30

And we have quotes from the founders of Chinese companies that say that we're bottlenecked because our chips are better.

01:23:35

On balance, our chips are better.

01:23:36

There's just no question about it.

01:23:37

In the absence of our chip, in the absence of our chip, can you acknowledge that Huawei had a record year?

01:23:42

Can you acknowledge that a whole bunch of chip companies have gone public?

01:23:44

Can you acknowledge that?

01:23:46

Can you acknowledge that?

01:23:47

Can you— can also acknowledge that the fact that we used to have a very large share in that market and we no longer have the large share in that market?

01:23:54

We can also acknowledge that China is about 40% of the world's technology industry.

01:24:00

That market, to leave, to leave that market, concede that market for United States technology industry is a disservice to our country.

01:24:08

It is a disservice to our national security.

01:24:10

It is a disservice to our, to our technology leadership, all for the benefit, all for the benefit of one company.

01:24:16

It makes no sense to me.

01:24:17

I guess I'm confused of— it feels like you're making two different statements.

01:24:19

One is that we're going to win this competition with Huawei because our chips are going to be way better if we're allowed to compete.

01:24:24

And another is that they would be doing the same exact thing without us anyways.

01:24:28

Right?

01:24:28

How can those two things be the same, true at the same time?

01:24:30

It's obviously true.

01:24:33

In the absence of a better choice, you'll take the only choice you have.

01:24:37

How is that illogical?

01:24:38

But so logical.

01:24:39

The reason they want NVIDIA chips is they're better.

01:24:41

Better is more compute.

01:24:42

More compute means you can train a better model.

01:24:44

It's better because it's easier to program.

01:24:46

It's e— we, we have a better ecosystem, but whatever the better is, whatever the better is.

01:24:51

And of course we're gonna send them compute.

01:24:53

So what?

01:24:55

So what?

01:24:56

The fact of the matter is that you would get the benefit.

01:24:59

Don't forget, we get the benefit of American technology leadership.

01:25:03

We get the benefit of developers working on the American tech stack.

01:25:07

We get the benefit as those AI models diffuse out into the rest of the world.

01:25:12

The American tech stack is therefore the best for it.

01:25:14

We can continue to advance and diffuse American technology.

01:25:18

That, I believe, is a positive.

01:25:21

It's a very important part of American technology leadership.

01:25:25

Now, the policy that you're advocating resulted in the American telecommunication industry being policy'd out of basically the world to the point where we don't control our own telecommunications

01:25:36

anymore.

01:25:37

I don't see that as smart.

01:25:40

It's a little narrow-minded and it led to unintended consequences that I'm describing to you right now that you seem, you seem to have a very hard time understanding.

01:25:48

Okay, let's just step back.

01:25:50

It seems like the crux here is there's a potential benefit and there's a potential cost, and we're trying to figure out, is the benefit worth the cost?

01:25:58

I guess I'm trying to get you to acknowledge the potential cost, that compute is an input to training powerful models.

01:26:04

Powerful models do have powerful offensive capabilities like cyberattacks.

01:26:10

It is a good thing that American companies got to Claude Mythos level capabilities first, and then now they're going to hold off on those capabilities so that the American companies

01:26:16

and American government can make their software more protected before this level capability is announced.

01:26:23

If China had had more compute, had more cloud compute, if they could have made a Mythos-level model earlier and deployed it widely, that would have been very bad.

01:26:31

One of the reasons that hasn't happened is that we have more compute thanks to companies like NVIDIA in America.

01:26:37

That is a cost of sending chips to China.

01:26:40

And so let's leave the benefit aside for a second.

01:26:42

Do you acknowledge that this is a potential cost?

01:26:45

I will also tell you the potential cost is we allow one of the most important layers of the AI stack, the chip layer, to concede an entire market, the second largest, second largest

01:27:00

market in the world so that they could develop scale, so that they could develop their own ecosystem so that future AI models are optimized in a very different way

01:27:11

than the American tech stack.

01:27:13

As AI diffuses out into the rest of the world,

01:27:17

their standards, their tech stack will become superior to ours because their models are open.

01:27:24

I guess I just believe enough in NVIDIA's kernel engineers and CUDA engineers to think that they could optimize.

01:27:29

AI is more than kernel optimization, as you know.

01:27:32

Of course.

01:27:32

But there's so many things you can do from distilling to a model that's well fit for your chips.

01:27:36

We're going to do our best.

01:27:37

You have all the software.

01:27:38

It just seems hard to imagine that there's a long-term lock-in

01:27:41

to the Chinese ecosystem, even if they have this slightly better open source model for a while?

01:27:44

China is the largest contributor to open source software in the world.

01:27:48

Fact.

01:27:51

Right.

01:27:52

China is the largest contributor to open models in the world.

01:27:55

Fact.

01:27:57

Today it's built on the American tech stack, NVIDIA's.

01:28:01

Fact.

01:28:03

All 5 layers of the tech stack for AI are important.

01:28:07

United States ought to go win all 5 of them.

01:28:10

They're all important.

01:28:12

The one that is the most important, of course,

01:28:15

is the AI application layer.

01:28:18

The layer that diffuses into society, the one that uses it most, will benefit from this industrial revolution most.

01:28:27

But my point is that every, every layer has to succeed.

01:28:31

If we, if we scare this country into thinking that AI is

01:28:37

somehow a nuclear bomb

01:28:40

so that everybody hates AI

01:28:43

and everybody's afraid of AI,

01:28:45

I don't know how you're helping the United States.

01:28:49

You're doing a disservice.

01:28:51

If we scare everybody out of doing software engineering jobs because it's going to kill every software engineering job and we don't have any software engineers as a result of that,

01:28:59

we're doing a disservice to the United States.

01:29:01

If we scare everybody out of radiology so nobody wants to be a radiologist because computer vision is completely free and no AI is going to do a worse job than a radiologist, and we

01:29:11

misunderstand the difference between a job and the task, the job of a radiologist, patient care, task to read a scan.

01:29:19

If we misunderstand that so profoundly and we scare everybody out of going to radiology school, we're not going to have enough radiologists and good enough healthcare.

01:29:28

And so I,

01:29:31

I'm making the case

01:29:34

that when you make a premise that is so extreme, where everything goes to zero or infinity,

01:29:44

we end up scaring people in a way that's just not true.

01:29:47

Life is not like that.

01:29:50

Do we want the United States to be first?

01:29:52

Of course we do.

01:29:54

Do we need to be

01:29:58

a leader in every layer of that stack?

01:30:01

Of course we do.

01:30:03

Of course we do.

01:30:05

Today you're talking about Mythos because Mythos is important?

01:30:08

Sure, that's fantastic.

01:30:11

But in a few years' time, I'm making you the prediction that when we want the American tech stack, when we want American technology to be diffused around the world, out to India, out

01:30:21

to the Middle East, out to Africa, out to Southeast Asia, when our country would like to export, because we would like to export our technology, we would like to export our standards.

01:30:34

On that day, I want you and I to have that same conversation again, and I will tell you exactly about today's conversation, about how your policy and how what you imagined literally

01:30:45

caused the United States to concede the second largest market in the world for no good reason at all.

01:30:52

We shouldn't concede it.

01:30:54

If we lose it, we lose it.

01:30:55

But why do we concede it?

01:30:58

Now, nobody is advocating— nobody is advocating an all or nothing.

01:31:02

Nobody's advocating all or nothing, meaning we ship everything to China at all times.

01:31:07

Nobody's advocating that.

01:31:09

We should always have the best technology here.

01:31:12

We should always have the most technology here.

01:31:15

And first. But we should also try to compete and win around the world.

01:31:22

Both of those things can simultaneously happen.

01:31:26

It requires some amount of nuance, some amount of maturity instead of absolutes.

01:31:32

The world is just not absolutes.

01:31:34

Okay.

01:31:34

The argument hinges on they've built models that are specified for their architecture, they're the best chips that they make in a few years, and those chips get exported around the

01:31:44

world.

01:31:44

That sets a standard, um, because of EUV, um, export controls.

01:31:50

As we said, you're going to move on to 1.6 nanometer, and they're going to be at 7 nanometer even a few years from now.

01:31:54

And it may make sense that domestically they would prefer, hey, we got so much energy, we can manufacture at such scale, we'll still keep using 7 nanometer.

01:32:01

But the exporting thing, their 7 nanometer chips have to be competitive against, well, your 1.6 nanometer chips, and their models have to be so far optimized for the 7 nanometer that

01:32:11

it's better to run their models on 7 nanometer than to run their models on your 1.6 nanometer.

01:32:16

Can we, can we just look at the facts then?

01:32:19

Okay.

01:32:21

Is Blackwell 50 times more advanced lithography than Hopper?

01:32:26

Is it 50 times?

01:32:28

Not even close.

01:32:31

I just kept saying it over and over again.

01:32:32

Moore's Law is dead.

01:32:34

Between Hopper and Blackwell, from the transistors themselves, call it 75%.

01:32:40

It was 3 years apart.

01:32:43

75%.

01:32:45

Blackwell is 50 times Hopper.

01:32:50

My point is architecture matters.

01:32:54

Computer science matters.

01:32:55

Semiconductor physics matters as well,

01:32:58

but computer science matters.
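
The arithmetic behind this point can be sketched in a few lines of Python. This is an illustrative reading, not a calculation from the conversation: it assumes "call it 75%" means roughly a 1.75x gain from the transistors, that "50 times" is an overall performance ratio, and that the contributions multiply. The function name `residual_gain` is hypothetical.

```python
# Figures as quoted in the conversation; how to interpret them is an assumption.
transistor_gain = 1.75  # "call it 75%", read here as a 1.75x gain from transistors
total_gain = 50.0       # "Blackwell is 50 times Hopper"


def residual_gain(total: float, process: float) -> float:
    """Gain left over for architecture, numerics, networking and software,
    assuming the individual contributions multiply together."""
    return total / process


non_transistor_gain = residual_gain(total_gain, transistor_gain)
print(f"Gain not explained by transistors: {non_transistor_gain:.1f}x")
```

Under these assumptions, most of the generational gain would have to come from everything other than the process node, which is the claim being made here.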

01:33:01

AI, the impact of AI largely comes from the computing stack, which is the reason why CUDA is so effective, which is the reason why CUDA is so, so, so beloved.

01:33:12

It's, it's an ecosystem, a computing architecture that allows for so much flexibility that if you wanted to change an architecture completely, create something like MoE, create something

01:33:23

like diffusion, create something, you know, that's disaggregated, you could do so.

01:33:29

It's easy to do.

01:33:31

And so the fact of the matter is AI is about the stack above as much as it is about the architecture below.

01:33:38

To the extent that, that we have architectures and software stacks that are optimized for our stack, for our ecosystem, it is obviously good because we started the conversation today

01:33:49

about how NVIDIA's ecosystem is so rich, why people always love programming on CUDA first.

01:33:54

They do.

01:33:55

They do.

01:33:56

And so do the researchers in China.

01:33:58

But if we are forced to leave China, if we're forced to leave China, it would be, it would be, well, first of all, it's a policy mistake.

01:34:07

Obviously has backlash.

01:34:09

It has backlash.

01:34:11

Obviously it has backfired, you know, has

01:34:16

turned out badly for the United States.

01:34:19

It enabled,

01:34:20

accelerated their chip industry.

01:34:22

It forced all of their AI ecosystem to focus on their internal architectures.

01:34:27

It's not too late, but nonetheless, it has already happened.

01:34:33

You're going to see in the future they're not stuck at 7 nanometer.

01:34:37

Obviously, they're good at manufacturing.

01:34:39

They will continue to advance from 7 and beyond.

01:34:43

Now,

01:34:46

is there 10x difference between 5nm

01:34:50

and 7nm?

01:34:52

The answer is no.

01:34:54

Architecture matters.

01:34:55

Networking matters.

01:34:56

That's why NVIDIA bought Mellanox.

01:34:57

Networking matters.

01:34:59

Energy matters.

01:35:00

And so all of that stuff matters.

01:35:01

It's not, it's not simplistic like the way you're trying to distill it.

01:35:06

Uh, we can move on from China, but that actually raises an interesting question about, um, we were discussing earlier these bottlenecks at TSMC and memory and so forth.

01:35:15

And so if we're in this world where, you know, you're already the majority of N3, at some point you'll be N2, you'll be a majority of that.

01:35:24

Do you see that you could go back to N7, the spare capacity at an older process node, and say, hey, the demand for AI is so great and our capacity to expand the leading edge is not

01:35:35

meeting it, so we're going to make a Hopper or Ampere, but with everything we know about numerics today and all the other improvements you described.

01:35:42

Do you see that world happening within, before 2030?

01:35:45

It's not necessary to.

01:35:47

And the reason for that is because with every, every generation, the architecture,

01:35:53

the architecture, um, is more than just, is more than just, uh, the transistor scale.

01:36:02

It also, you're doing so much engineering and packaging and stacking and the numerics and, you know, the system architecture.

01:36:13

When you run out of capacity

01:36:16

to easily go back to another node, that's a level of R&D that no one could afford.

01:36:23

You know, we could afford to lean forward.

01:36:24

I don't think we could afford to go back.

01:36:26

Now, if the world simply says, if on that day, if on that day, let's do the thought experiment, on that day we go, listen, we're just never gonna have more capacity ever again.

01:36:36

Would I go back and use 7?

01:36:37

In a heartbeat.

01:36:39

Of course I would.

01:36:42

One question somebody I was talking to had is why NVIDIA doesn't run multiple different chip projects at the same time with totally different architectures.

01:36:50

So you could do like a Cerebras-style wafer scale, you could do a Dojo-style huge package, you could do one without CUDA.

01:36:57

You have the resources and the engineering talent to do all of these in parallel.

01:37:01

So why put all the eggs in one basket given who knows where AI might go and architectures might go?

01:37:06

Oh, we could.

01:37:06

It's just that we don't have a better idea.

01:37:10

Yeah, yeah.

01:37:11

We could do all of those things.

01:37:13

It's just not better.

01:37:16

And we simulate it all.

01:37:17

They're in our simulator provably worse.

01:37:21

And so we wouldn't do it.

01:37:23

Yeah.

01:37:24

We're working on exactly the projects that we want to work on.

01:37:29

And

01:37:32

if the workload were to change dramatically,

01:37:36

and I don't mean the algorithms, I actually mean the workload,

01:37:41

and that depends on the shape of the market,

01:37:47

we may decide to add other accelerators.

01:37:49

Like for example, recently we added Groq and we're gonna fold Groq into our CUDA ecosystem.

01:37:57

And we're doing that now because the value of tokens

01:38:04

has gone up so high that you could have different pricing of tokens.

01:38:09

Back in the old days, in the, you know, just a couple years ago, tokens are either free or barely, you know, barely expensive, right?

01:38:15

And so, but now you can have different customers and those customers want different answers.

01:38:20

And so because the customers make so much money, like for example, our software engineers, if I can give them much more

01:38:29

responsive tokens so that they're even more productive than they are today, I would pay for it.

01:38:35

But that market has only recently emerged.

01:38:38

And so I think that we now have, we now have the ability to have the same model based on the response time, have different segments.

01:38:47

And that's the reason why we decided to expand the Pareto frontier and create a segment of inference that is faster response time even though it's lower throughput.

01:39:00

Until now, higher throughput is always better.

01:39:03

We think that there could be a world where there could be very high ASP tokens

01:39:09

and

01:39:11

even though the throughput is lower in the factory, the ASPs make up for it.
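
The trade-off described here, lower throughput offset by a higher average selling price (ASP), can be sketched as a break-even calculation. All numbers and names below are hypothetical and chosen only for illustration; none come from the conversation or from NVIDIA.

```python
def revenue_per_hour(tokens_per_second: float, price_per_million: float) -> float:
    """Hourly revenue for a serving tier, given its throughput and token price."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * price_per_million


def breakeven_price(base_tps: float, base_price: float, fast_tps: float) -> float:
    """Price the low-latency tier must charge to match the baseline tier's revenue."""
    return base_price * base_tps / fast_tps


# Hypothetical tiers: a high-throughput baseline and a faster-response,
# lower-throughput premium tier running on the same hardware budget.
base_tps, base_price = 100_000.0, 2.0  # tokens/s, dollars per million tokens
fast_tps = 25_000.0                    # premium tier serves 4x fewer tokens

premium_price = breakeven_price(base_tps, base_price, fast_tps)
print(f"Break-even premium price: ${premium_price:.2f} per million tokens")
print(f"Baseline revenue/hour: ${revenue_per_hour(base_tps, base_price):.2f}")
print(f"Premium revenue/hour:  ${revenue_per_hour(fast_tps, premium_price):.2f}")
```

In this sketch the premium tier only pays off if customers accept a price multiple at least as large as the throughput it gives up, which is the condition implied by "the ASPs make up for it."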

01:39:16

Yeah, that's the reason why we did it.

01:39:17

But otherwise, from an architecture perspective, I think NVIDIA's architectures— I would rather put— if I had more money, I'd put more behind the architecture.

01:39:28

I think this idea of extremely premium tokens and just the disaggregation of the inference market is very interesting.

01:39:34

The segmentation.

01:39:35

Yeah.

01:39:35

Yeah.

01:39:36

Yeah.

01:39:36

All right.

01:39:37

Final question.

01:39:39

Suppose the deep learning revolution didn't happen.

01:39:42

What would NVIDIA be doing?

01:39:45

Obviously games, but given— Accelerated computing.

01:39:50

Accelerated computing.

01:39:52

The same thing we've been doing all along.

01:39:55

The premise of our company is that Moore's Law is going to— more general-purpose computing is good for a lot of things, but for a lot of computation, it's not ideal.

01:40:05

And so we combined an architecture called a GPU, CUDA, to a CPU so that we can accelerate the workload of the CPU.

01:40:15

And so different kernels of code or algorithms could be offloaded onto our GPU.

01:40:20

And as a result, you speed up an application by, you know, 100x, 200x.
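
One standard way to reason about application-level speedups from offloading is Amdahl's law, sketched below. This is a textbook model introduced here for illustration, not something cited in the conversation, and the fractions and kernel speedups are hypothetical.

```python
def amdahl_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Overall application speedup when a fraction of the runtime is
    accelerated by kernel_speedup and the rest stays on the CPU."""
    serial_part = 1.0 - offloaded_fraction
    return 1.0 / (serial_part + offloaded_fraction / kernel_speedup)


# Hypothetical workloads: how much of the runtime must be offloaded
# before the whole application gets close to 100x faster?
for fraction in (0.90, 0.99, 0.999):
    overall = amdahl_speedup(fraction, kernel_speedup=1000.0)
    print(f"offloaded {fraction:.1%} of runtime -> {overall:.1f}x overall")
```

Under this model, an application-level speedup on the order of 100x requires that nearly all of the runtime sit in the offloaded kernels, since the remaining CPU portion caps the overall gain.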

01:40:25

And where can you use that?

01:40:27

Well, obviously engineering and science and physics and, you know, so-and-so data processing.

01:40:33

Computer graphics, image generation.

01:40:35

I mean, all kinds of things.

01:40:37

Even if AI doesn't exist today, NVIDIA will be very, very large.

01:40:40

Yeah.

01:40:41

And so, so I think the, the reason for that is, is fairly fundamental, which is, which is the ability for general purpose computing to continue to scale has largely run its course.

01:40:52

And the only, the, the, not the only way, but the, the way to do that is through domain-specific acceleration.

01:40:58

And one of the domains that we started with was computer graphics, but there are many, many other domains.

01:41:05

I mean, there's, you know, all kinds of scientific particle physics and fluids and, you know, and so structured data processing, all kinds of different types of algorithms that benefit

01:41:17

from CUDA.

01:41:18

And so our mission was really to bring accelerated computing to the world, advance the type of applications that general-purpose computing can't do and scale to the level of capability

01:41:32

that helps break through certain fields of science.

01:41:35

And so some of the early applications were molecular dynamics, seismic processing for energy discovery,

01:41:44

image processing, of course.

01:41:46

And so all of those kind of fields where general-purpose computing is just simply too inefficient to do so.

01:41:52

And so, yeah, if there was no AI, I would be very sad.

01:41:58

But because of the advances that we made in computing, we democratized deep learning.

01:42:06

We made it possible for any researcher, any scientist anywhere, any student to be able to access a PC or a GeForce add-in card and do amazing science.

01:42:19

And that fundamental promise hasn't changed, not even a little bit.

01:42:24

And so if you see GTC, if you watch GTC, there's the whole beginning part of it.

01:42:30

None of it's AI.

01:42:31

That whole part of it with computational lithography or our quantum chemistry work or, you know, all of that stuff, data processing work, all of that stuff is, is unrelated to AI and

01:42:47

it's still very important.

01:42:48

I mean, there's, you know, I know that AI is very interesting and quite exciting,

01:42:55

but

01:42:56

there's a lot of people doing a lot of very important work that's not AI related, and tensors are not the only thing that you compute with.

01:43:03

And we wanna help everybody.

01:43:06

Jensen, thank you so much.

01:43:08

You're welcome.

01:43:09

I enjoyed it.

01:43:09

Me too.

01:43:10

Sweet.