Devoured - April 28, 2026
DeepSeek cuts V4-Pro prices by 75% (5 minute read)

Chinese AI company DeepSeek slashes prices on its V4-Pro model by 75% and cuts cache hit costs by 90%, undercutting major US AI providers just days after White House accusations of industrial-scale model distillation.

What: DeepSeek is offering a 75% promotional discount on its V4-Pro model until May 5, 2026, bringing input token costs down to approximately $0.036 per million tokens, while its full-price offerings already undercut GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. The company also reduced input cache hit pricing by 90% across all APIs, targeting enterprise developers with repeated requests.
Why it matters: This represents an aggressive competitive move that combines technical capability with pricing strategy to challenge US AI dominance. The timing is significant—announced three days after the White House accused Chinese firms of distilling American AI models and the same week OpenAI shipped GPT-5.5. DeepSeek's V4-Pro runs on domestic Chinese chips (Huawei Ascend 950, Cambricon) rather than Nvidia GPUs, potentially reshaping the AI hardware landscape while lowering barriers for developers to switch from US providers.
Takeaway: Developers can evaluate DeepSeek's V4-Pro for cost-sensitive applications, as it integrates natively with Claude Code, OpenClaw, and OpenCode, and offers a 1 million token context window for handling large codebases or documents.
Deep dive
  • V4-Pro promotional pricing drops input tokens to ~$0.036 per million (from $0.145), while output tokens remain at $3.48 per million, with the discount running until May 5, 2026
  • Even at full price, DeepSeek already undercuts all major US competitors on a per-token basis, making the 75% discount a dramatic escalation of the pricing war that began with DeepSeek R1 in January 2025
  • The 90% cache-hit price reduction (to one-tenth of previous levels) specifically targets enterprise and agentic applications that send similar or repeated requests, a dominant pattern in production AI deployments
  • V4-Pro is a mixture-of-experts model with 1.6 trillion total parameters and 49 billion active parameters per task, making it the largest open-weight model currently available, outstripping competitors like Moonshot AI's Kimi K2.6 and MiniMax's M1
  • The model offers a 1 million token context window and integrates natively with Western agentic coding frameworks (Claude Code, OpenClaw, OpenCode), lowering switching friction for developers whose primary constraint is cost
  • V4-Pro is trained on and optimized for Huawei Ascend 950 chips and Cambricon hardware rather than Nvidia GPUs, representing a strategic shift away from US chip dependency that could "accelerate adoption domestically and contribute to faster global AI development"
  • The announcement came three days after the White House accused foreign entities (primarily Chinese) of conducting "industrial-scale" campaigns to distill frontier AI models from US companies, though DeepSeek was not directly named
  • DeepSeek has previously been accused by both Anthropic and OpenAI of distilling their models, allegations the company has not addressed directly but instead responded to by cutting prices further
  • The timing positions DeepSeek as responding to geopolitical pressure not with denials but with competitive action, making a "political statement about where it believes the AI race will ultimately be decided"
  • Analysts describe V4's Hybrid Attention Architecture and ultra-long context support as a "genuine inflection point" for long-context AI processing moving from research labs into mainstream commercial applications
  • DeepSeek's strategy combines three elements to lower switching barriers: open-source availability removes access barriers, aggressive API pricing removes cost barriers, and the 1M token context window makes the model viable for enterprise use cases
  • The smaller V4-Flash variant costs $0.14 per million input tokens and $0.28 per million output tokens at full price, already undercutting GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5
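The pricing figures above can be turned into a quick per-request cost estimate. This is an illustrative sketch using only the per-million-token prices quoted in the article; the request shape (200k input, 2k output tokens) and the helper function are assumptions, not anything DeepSeek publishes:

```python
# Per-million-token prices quoted in the article: (input $, output $).
PRICES = {
    "V4-Pro (promo)": (0.036, 3.48),
    "V4-Pro (full)": (0.145, 3.48),
    "V4-Flash": (0.14, 0.28),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the quoted per-token prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: a long-context request with 200k input and 2k output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 2_000):.4f}")
```

At these prices, output tokens dominate the bill for V4-Pro even at the promotional input rate, which is why the discount matters most for input-heavy, long-context workloads.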
Decoder
  • Mixture-of-experts: An AI architecture that uses multiple specialized sub-models (experts) and activates only a subset for each task, allowing massive total parameter counts while keeping inference costs manageable
  • Cache hits: When an API request includes content similar to previous requests, allowing the system to reuse cached computations rather than reprocessing from scratch, significantly reducing costs for repeated queries
  • Distillation: A process where a smaller AI model is trained using the outputs of a larger model to acquire similar capabilities at lower cost, which US officials characterize as intellectual property theft when applied to proprietary models
  • Active parameters: The subset of a model's total parameters actually used for a given task (49B of V4-Pro's 1.6T total); active rather than total parameters determine the computational cost of each inference
  • Context window: The maximum amount of text (measured in tokens) a model can process in a single request, with 1 million tokens enabling handling of large codebases or lengthy documents without splitting into multiple API calls
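The mixture-of-experts and active-parameter entries above lend themselves to a toy illustration. The sketch below routes each token to a small subset of experts; every number and the hash-based routing rule are invented for illustration and bear no relation to V4-Pro's actual router:

```python
import random

# Toy mixture-of-experts: many experts exist, but each token activates
# only a few, so compute per inference scales with active parameters.
NUM_EXPERTS = 32        # total experts (toy number)
ACTIVE_PER_TOKEN = 2    # experts consulted per token (toy number)
PARAMS_PER_EXPERT = 50  # parameters per expert (toy number)

def route(token: str) -> list[int]:
    """Deterministically pick which experts handle this token."""
    rng = random.Random(token)  # seed on the token for repeatability
    return rng.sample(range(NUM_EXPERTS), ACTIVE_PER_TOKEN)

total_params = NUM_EXPERTS * PARAMS_PER_EXPERT
active_params = ACTIVE_PER_TOKEN * PARAMS_PER_EXPERT
print(f"total={total_params}, active per token={active_params}")
# Only the routed experts' weights are touched per token, which is the
# mechanism that lets a 1.6T-parameter model run with 49B active.
```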
Original article
DeepSeek cuts V4-Pro prices by 75% and slashes cache costs across its entire API to a tenth

The promotional discount runs until 5 May 2026. Even at full price, V4-Pro already undercuts GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on per-token costs.

The move is a direct challenge to the pricing strategy of US AI providers at a moment when the Trump administration has accused Chinese firms of distilling American AI models on an industrial scale.


DeepSeek announced on Monday that it is offering a 75% discount on its newly released DeepSeek-V4-Pro model to developers until 5 May 2026, and is simultaneously cutting the price of input cache hits across its entire API suite to one-tenth of previous levels, effective immediately.

The discount was announced in a post on X. The move intensifies a pricing competition with US AI providers that DeepSeek first triggered in January 2025 with its R1 model, which claimed frontier-level reasoning performance at a fraction of the cost of comparable OpenAI products.

The pricing context is important. At full price, before any promotional discount, DeepSeek-V4-Pro already costs $0.145 per million input tokens and $3.48 per million output tokens, undercutting OpenAI's GPT-5.5, Google's Gemini 3.1 Pro, and Anthropic's Claude Opus 4.7 on a per-token basis.

The 75% promotional discount on input tokens reduces the V4-Pro input price to approximately $0.036 per million tokens. The Flash variant, V4's smaller, faster model, costs $0.14 per million input tokens and $0.28 per million output tokens at full price, already undercutting GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5.

The cache-hit price cut to one-tenth of prior levels specifically targets frequent users and enterprise developers who send similar or repeated requests, which is the dominant pattern in production agentic applications.
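As a rough sketch of why the cache-hit cut matters for such workloads: the article does not give the pre-cut cache-hit price, so the sketch below assumes, purely for illustration, that it equaled the full V4-Pro input price, and the 80% hit rate is likewise an invented example:

```python
# Illustrative input-cost estimate for repeated-prefix workloads.
INPUT_PRICE = 0.145                 # $/M input tokens, V4-Pro full price
CACHE_HIT_PRICE = INPUT_PRICE / 10  # after the cut to one-tenth (assumed base)

def input_cost(tokens: int, hit_fraction: float) -> float:
    """Dollar cost of `tokens` input tokens at a given cache-hit rate."""
    hits = tokens * hit_fraction
    misses = tokens - hits
    return (hits * CACHE_HIT_PRICE + misses * INPUT_PRICE) / 1e6

no_cache = input_cost(1_000_000, 0.0)
mostly_cached = input_cost(1_000_000, 0.8)
print(f"no cache: ${no_cache:.4f}, 80% cached: ${mostly_cached:.4f}")
```

Under these assumptions an agent that resends the same system prompt and tool definitions on every call pays well under a third of the uncached input cost, which is the production pattern the price cut targets.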

The strategic logic is explicit and well-documented in how DeepSeek has operated since R1. Open-source availability removes the model access barrier entirely; aggressive API pricing removes the cost barrier for production deployment; a 1 million-token context window makes the model viable for enterprise use cases involving large codebases or long documents that would otherwise require multiple API calls.

V4-Pro also integrates natively with Claude Code, OpenClaw, and OpenCode, the dominant agentic coding frameworks used by developers already in the Western AI ecosystem.

The combined effect is to lower the friction of switching from an OpenAI, Anthropic, or Google API to a DeepSeek API for any developer whose primary constraint is cost. Akshar Keremane, co-founder of Bangalore-based AI startup O-Health, described the combination of pricing, open-source availability, and the 1 million-token context window as lowering barriers "for developers, startups and small enterprises."

The V4-Pro model, launched last Friday, is a mixture-of-experts model with 1.6 trillion total parameters and 49 billion active parameters per task, the largest open-weight model currently available, outstripping Moonshot AI's Kimi K2.6 and MiniMax's M1.

Its Hybrid Attention Architecture is designed to maintain coherence across long contexts. It is trained on and optimised for Huawei's Ascend 950 chips and Cambricon hardware rather than Nvidia GPUs.

Zhang Yi, founder of tech research firm iiMedia, told AFP that V4's architecture represents a "genuine inflection point" for long-context AI processing, predicting that ultra-long context support will move beyond research labs into mainstream commercial applications.

Wei Sun, principal analyst at Counterpoint Research, noted that V4 running on domestic chips "allows AI systems to be built and deployed without relying solely on Nvidia" and could "accelerate adoption domestically and contribute to faster global AI development overall."

The pricing move arrives in a charged geopolitical context. On Thursday last week, White House Director of Science and Technology Policy Michael Kratsios accused foreign entities, primarily based in China, of conducting "industrial-scale" campaigns to distil frontier AI models from US companies, a process in which a smaller model is trained using the outputs of a larger model to acquire similar capabilities at lower cost.

Kratsios's memo did not directly name DeepSeek, but DeepSeek has previously been accused by both Anthropic and OpenAI of distilling their models. CNN reported that it had reached out to DeepSeek for comment on those accusations.

The US government's distillation crackdown, alongside China's parallel move to restrict US investment in its AI firms, was announced the day before V4's launch.

DeepSeek's response, three days later, is to cut prices rather than respond to the accusations directly: a competitive move that is also a political statement about where it believes the AI race will ultimately be decided.

OpenAI has cut API prices multiple times; Anthropic has introduced tiered pricing for different Claude model sizes; Google has progressively reduced Gemini API costs.

DeepSeek's Monday announcement is the latest move in that ongoing compression, but it is distinctive in its scale, a 75% promotional discount on top of a model that already undercuts the US frontier at standard pricing, and in its timing: it positions the Hangzhou startup as the low-cost challenger in the same week that OpenAI shipped GPT-5.5 and the US government moved to restrict Chinese model distillation.