NVIDIA weighs Groq as Samsung 3nm yields in focus

NVIDIA Groq inference chip shifts decode to LPUs to improve latency

NVIDIA is previewing an inference chip that integrates Groq technology to offload token-by-token decode onto Groq's low-latency language processing units (LPUs) while leaving training on GPUs. According to Tom’s Hardware, corporate statements describe integrating Groq’s processors into the NVIDIA AI Factory architecture to expand coverage for real-time inference.

This design aligns with an industry shift that separates the prefill phase from the decode phase in large-model inference. As reported by VentureBeat, the split lets specialized hardware target latency-critical decode while GPUs handle bulk prefill compute.

Why it matters: prefill vs decode, cost and energy

Placing prefill on GPUs and decode on LPUs is intended to cut user-perceived latency and smooth tail behavior under load. DA Davidson notes that Groq-style designs can face memory-capacity limits, so gains may vary across model sizes and concurrency profiles.

Analysts frame this as an inference-share play where latency and efficiency drive unit economics at scale. “NVIDIA can take even greater share of the inference market,” said CJ Muse, Senior Managing Director at Cantor Fitzgerald, emphasizing both offensive and defensive motives.

Inference costs increasingly dominate total AI spend as usage scales. WisdomAI reports that this moves buyer focus from peak FLOPS toward cost per token and energy per query, especially for high-volume consumer and enterprise assistants.
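
To make the two buyer metrics concrete, here is a minimal sketch of how cost per token and energy per query could be computed. The function names, parameters, and the even-sharing assumption for batched queries are illustrative and are not taken from WisdomAI or any vendor; the sample figures are placeholders, not measurements.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Hardware cost in USD to generate one million tokens.

    Assumes the device runs at a steady throughput for the whole hour.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def energy_per_query_joules(power_watts: float, seconds_per_query: float,
                            concurrent_queries: int = 1) -> float:
    """Energy in joules attributed to one query.

    Assumes device power is shared evenly across queries served at once.
    """
    if concurrent_queries < 1:
        raise ValueError("concurrent_queries must be at least 1")
    return power_watts * seconds_per_query / concurrent_queries


# Placeholder inputs, chosen only to show the arithmetic.
print(cost_per_million_tokens(hourly_cost_usd=3.6, tokens_per_second=1000))
print(energy_per_query_joules(power_watts=500, seconds_per_query=2, concurrent_queries=4))
```

Both metrics improve when throughput rises or when more queries share the same hardware, which is why they can diverge from a ranking based on peak FLOPS.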

OpenAI is widely reported, but not officially confirmed in detail, as a potential first production-scale user of NVIDIA’s Groq-based inference chip. According to AIwire, this would reflect a hedging strategy to secure lower-latency, lower-cost inference capacity.

Production risk may hinge on Samsung’s leading-edge process readiness if it handles first foundry builds. PhoneArena reports persistent low yields in Samsung’s 3 nm and 2 nm nodes relative to TSMC, a factor that could influence client confidence and delivery timing.

Supply chain and inference unit economics outlook

Samsung Foundry production readiness and client confidence versus TSMC

Client caution remains elevated at the leading edge. As reported by EE Times, some fabless customers are favoring TSMC due to concerns about Samsung’s yields and delivery reliability.

Samsung has responded with leadership moves focused on defect analysis and metrology to improve 3 nm and 2 nm yields. Biz Chosun reports these changes, while Sedaily adds that Tesla’s AI5 volume may be split between Samsung and TSMC, signaling conditional confidence if yields stabilize.

Latency, cost per token, and energy per query at scale

Separating prefill from decode provides a placement framework: keep bandwidth-heavy, sequence-initialization work on GPUs, and move token-generation loops to LPUs where serialization dominates. Bernstein has highlighted this bifurcation as the core architectural trend in inference.
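
The placement framework can be sketched as a small routing plan. This is an illustrative model only: the `Request` type, the `PLACEMENT` table, and the step counts are hypothetical and do not describe NVIDIA's or Groq's actual scheduler. It shows why decode is the serialized part: prefill can process the whole prompt in one parallel pass, while decode needs one sequential step per generated token.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    prompt_tokens: int   # tokens supplied by the user
    output_tokens: int   # tokens the model will generate


# Hypothetical phase-to-hardware mapping taken from the article's description.
PLACEMENT = {"prefill": "GPU", "decode": "LPU"}


def plan(request: Request) -> List[Tuple[str, str, int]]:
    """Return (phase, device, sequential_steps) for each phase of a request."""
    return [
        # Prefill handles all prompt tokens together in a single pass.
        ("prefill", PLACEMENT["prefill"], 1),
        # Decode emits one token per step, each depending on the previous one.
        ("decode", PLACEMENT["decode"], request.output_tokens),
    ]


for phase, device, steps in plan(Request(prompt_tokens=2000, output_tokens=300)):
    print(f"{phase:8s} -> {device} ({steps} sequential step(s))")
```

In this toy model the prompt length affects how much work the single prefill pass does, but only the output length adds sequential steps, which is the part a low-latency device is meant to shorten.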

The expected outcome is lower tail latency and improved energy-per-query, with cost gains accruing where decode dominates runtime. WisdomAI notes that as inference volumes outgrow training, these unit economics become decisive for platform competitiveness.
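
The claim that gains accrue where decode dominates runtime follows the shape of Amdahl's law. The sketch below applies that general formula as a rough approximation; the decode fractions and the 4x decode speedup are made-up inputs, not reported benchmarks, and the model ignores transfer overhead between devices.

```python
def overall_speedup(decode_fraction: float, decode_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the decode share gets faster.

    decode_fraction: share of total runtime spent in decode (0.0 to 1.0).
    decode_speedup:  how many times faster decode runs on the new hardware.
    """
    if not 0.0 <= decode_fraction <= 1.0:
        raise ValueError("decode_fraction must be between 0 and 1")
    if decode_speedup <= 0:
        raise ValueError("decode_speedup must be positive")
    return 1.0 / ((1.0 - decode_fraction) + decode_fraction / decode_speedup)


# Same hypothetical 4x decode speedup, different workload mixes.
for fraction in (0.2, 0.5, 0.9):
    print(f"decode share {fraction:.0%}: {overall_speedup(fraction, 4.0):.2f}x overall")
```

With the same decode speedup, a workload that spends most of its time in decode sees a much larger end-to-end gain than a prefill-heavy one, so the benefit depends on the workload mix.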

FAQ about NVIDIA Groq inference chip

Is OpenAI confirmed as the first customer for NVIDIA’s Groq-based inference chip and what advantages would it gain?

OpenAI is not officially confirmed. Reports indicate it could gain lower latency and better unit economics if decode shifts to LPUs.

How do prefill vs decode stages map to GPUs vs LPUs, and which models or workloads benefit most?

GPUs handle prefill; LPUs target decode. Latency-sensitive assistants and streaming token generation benefit most, subject to memory and model-size constraints.

Source: https://coincu.com/news/nvidia-weighs-groq-as-samsung-3nm-yields-in-focus/
