
Nvidia could lose this part of the AI market

In AI hardware circles, almost everyone is talking about inference.

Colette Kress, Nvidia’s chief financial officer, said on the company’s earnings call Wednesday that inference accounted for about 40 percent of Nvidia’s $26.3 billion in data center revenue in the second quarter. AWS CEO Matt Garman recently told the No Priors podcast that inference is probably half the work done on AI computing servers today. And that share is likely to grow, attracting competitors eager to take Nvidia’s crown.

It turns out that many of the companies looking to take market share from Nvidia are starting with inference.

Groq, founded by a team of Google alumni and focused on inference hardware, raised $640 million at a $2.8 billion valuation in August.

In December 2023, Positron AI emerged from stealth with an inference chip it claims can perform the same calculations as Nvidia’s H100 at one-fifth the cost. Amazon is developing both training and inference chips — aptly named Trainium and Inferentia, respectively.

“I think the more diversity there is, the better off we are,” Garman said on the same podcast.

And Cerebras, the California company famous for its oversized AI training chips, announced last week that it has developed an equally large inference chip that is the fastest on the market, according to CEO Andrew Feldman.

Not all inference chips are created equal

Chips designed for AI workloads are typically optimized for either training or inference.

Training is the first phase of developing an AI tool – when you feed labeled and annotated data into a model so that it learns to produce accurate and useful results. Inference is the act of producing those results once the model is trained.
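A rough way to see the difference in code: training runs a forward pass, computes a loss against labeled data, and updates the model’s weights, while inference is a forward pass alone. The sketch below uses a toy PyTorch model purely for illustration; it is not tied to any chip or company mentioned in this article.

```python
# Illustrative only: a toy model showing the difference between a
# training step (forward pass + backprop + weight update) and an
# inference step (forward pass only). Uses PyTorch for brevity.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def training_step(x, y):
    """One training step: compute-heavy, needs gradients and optimizer state."""
    optimizer.zero_grad()
    prediction = model(x)          # forward pass
    loss = loss_fn(prediction, y)  # compare output against labeled data
    loss.backward()                # backpropagation
    optimizer.step()               # update weights
    return loss.item()

@torch.no_grad()
def inference_step(x):
    """One inference step: forward pass only, no gradients, latency-sensitive."""
    return model(x)

# Random data standing in for a real dataset, just to show the calls.
x, y = torch.randn(8, 16), torch.randn(8, 1)
print(training_step(x, y))
print(inference_step(x).shape)
```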

Training chips tend to optimize for sheer computing power. Inference chips require less computing muscle; in fact, some inference can run on traditional CPUs. Chipmakers designing for inference are more concerned with latency, since the difference between an addictive AI tool and an annoying one often comes down to speed. That’s what Cerebras CEO Andrew Feldman is banking on.

Cerebras’ chip has 7,000 times more memory bandwidth than Nvidia’s H100, according to the company. This is what enables what Feldman calls “breakneck speed.”

The company, which has begun the process of going public, is also launching its inference offering as a multi-tiered service, including a free tier.

“Inference is a memory bandwidth issue,” Feldman told Business Insider.
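One way to see why memory bandwidth, rather than raw compute, tends to be the binding constraint: to generate each token, a large language model generally has to stream its weights from memory, so the ceiling on tokens per second is roughly bandwidth divided by model size. The back-of-envelope sketch below uses assumed figures, not measurements of Cerebras’ or Nvidia’s hardware.

```python
# Back-of-envelope sketch of why inference latency is often limited by
# memory bandwidth: generating each token requires streaming the model's
# weights from memory, so tokens/sec is bounded by bandwidth / model size.
# The numbers below are illustrative assumptions, not vendor specifications.

model_params = 70e9          # assumed 70B-parameter model
bytes_per_param = 2          # 16-bit weights
model_bytes = model_params * bytes_per_param

memory_bandwidth = 3.35e12   # assumed ~3.35 TB/s of memory bandwidth

# Ideal upper bound on single-stream token generation speed.
max_tokens_per_second = memory_bandwidth / model_bytes
print(f"~{max_tokens_per_second:.0f} tokens/sec upper bound")  # ~24 tokens/sec
```

Under these assumptions, even unlimited compute cannot push single-stream generation past a couple dozen tokens per second, which is why a chip with far more memory bandwidth can claim much lower latency.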

To monetize AI, scale inference workloads

Choosing to optimize a chip design for training or inference is not only a technical decision but also a market one. Most companies making AI tools will need both at some point, but most of their needs will likely fall on one side or the other, depending on where the company is in its build cycle.

The massive training workload could be considered the research and development phase of AI. When a company moves primarily to inference, that means whatever product it builds works for end customers—at least in theory.

Inference is expected to represent the vast majority of computing tasks as more AI projects and startups mature. In fact, according to AWS’s Garman, that’s what needs to happen to realize the yet-to-be-realized return on hundreds of billions in AI infrastructure investments.

“Inference workloads have to dominate, or all this investment in these big models will not pay off,” Garman told No Priors.

However, the simple binary of training vs. inference for chip designers may not last forever.

“Some of the clusters that are in our data centers, customers are using them for both,” said Raul Martynek, CEO of data center owner Databank.

Nvidia’s pending acquisition of Run:ai supports Martynek’s prediction that the wall between inference and training could soon fall.

In April, Nvidia agreed to acquire the Israeli firm Run:ai, but the deal has not yet closed and is under review by the Department of Justice, according to Politico. Run:ai’s technology makes GPUs work more efficiently, allowing more work to be done on fewer chips.

“I think for most businesses, they will merge. You’re going to have a group that trains and makes inferences,” Martynek said.

Nvidia declined to comment on this report.
