Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving (2 minute read)

Alibaba's Qwen team released a preview of their next flagship language model with significant improvements in agentic coding tasks, world knowledge, and instruction following.

What: Qwen3.6-Max-Preview is a preview release of Alibaba's next proprietary language model, available through Alibaba Cloud Model Studio, showing benchmark improvements of up to 9.9 points in agentic coding tasks and notable gains in world knowledge and instruction following compared to Qwen3.6-Plus.

Why it matters: The model's emphasis on agentic capabilities and the preserve_thinking feature reflects the industry shift toward models that can handle complex multi-step workflows and maintain reasoning context across interactions, rather than just single-turn conversations.

Takeaway: Developers can test the model immediately via Qwen Studio or integrate it using Alibaba Cloud Model Studio's OpenAI-compatible API with the qwen3.6-max-preview endpoint.

Deep dive

Achieves top scores on six major coding benchmarks including SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode
Shows double-digit improvements in agentic coding benchmarks: SkillsBench +9.9, SciCode +6.3, NL2Repo +5.0, and Terminal-Bench 2.0 +3.8 compared to predecessor
World knowledge improved significantly with SuperGPQA +2.3 and QwenChineseBench +5.3 gains
Instruction following enhanced with ToolcallFormatIFBench +2.8 improvement
Supports preserve_thinking feature that maintains reasoning content across conversation turns, specifically designed for agentic workflows
Available through OpenAI-compatible API endpoints with regional options in Beijing, Singapore, and US Virginia
Also offers Anthropic-compatible API interface for developers already using Claude's patterns
Still under active development with further improvements expected in subsequent versions
Provides enable_thinking parameter to expose the model's internal reasoning process during streaming responses

Decoder

Agentic coding: AI models performing multi-step programming tasks like repository navigation, environment interaction, and tool use rather than just generating code snippets
SWE-bench Pro: Benchmark evaluating AI models on real-world software engineering tasks from GitHub issues
preserve_thinking: Feature that retains the model's reasoning process across multiple conversation turns to maintain context for complex tasks
Terminal-Bench: Benchmark measuring a model's ability to interact with command-line interfaces and execute system commands

Original article

Qwen3.6-Max-Preview brings stronger world knowledge and instruction following along with significant agentic coding improvements across a wide range of benchmarks. The model is still under active development as researchers continue to iterate on it. Users can chat with the model interactively in Qwen Studio or call via API on Alibaba Cloud Model Studio API (coming soon).