Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving (2 minute read)
Alibaba's Qwen team released a preview of their next flagship language model with significant improvements in agentic coding tasks, world knowledge, and instruction following.
What: Qwen3.6-Max-Preview is a preview release of Alibaba's next proprietary language model, available through Alibaba Cloud Model Studio, showing benchmark improvements of up to 9.9 points in agentic coding tasks and notable gains in world knowledge and instruction following compared to Qwen3.6-Plus.
Why it matters: The model's emphasis on agentic capabilities and the preserve_thinking feature reflects the industry shift toward models that can handle complex multi-step workflows and maintain reasoning context across interactions, rather than just single-turn conversations.
Takeaway: Developers can test the model immediately via Qwen Studio or integrate it using Alibaba Cloud Model Studio's OpenAI-compatible API with the qwen3.6-max-preview endpoint.
Deep dive
- Achieves top scores on six major coding benchmarks including SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode
- Shows double-digit improvements in agentic coding benchmarks: SkillsBench +9.9, SciCode +6.3, NL2Repo +5.0, and Terminal-Bench 2.0 +3.8 compared to predecessor
- World knowledge improved significantly with SuperGPQA +2.3 and QwenChineseBench +5.3 gains
- Instruction following enhanced with ToolcallFormatIFBench +2.8 improvement
- Supports preserve_thinking feature that maintains reasoning content across conversation turns, specifically designed for agentic workflows
- Available through OpenAI-compatible API endpoints with regional options in Beijing, Singapore, and US Virginia
- Also offers Anthropic-compatible API interface for developers already using Claude's patterns
- Still under active development with further improvements expected in subsequent versions
- Provides enable_thinking parameter to expose the model's internal reasoning process during streaming responses
Decoder
- Agentic coding: AI models performing multi-step programming tasks like repository navigation, environment interaction, and tool use rather than just generating code snippets
- SWE-bench Pro: Benchmark evaluating AI models on real-world software engineering tasks from GitHub issues
- preserve_thinking: Feature that retains the model's reasoning process across multiple conversation turns to maintain context for complex tasks
- Terminal-Bench: Benchmark measuring a model's ability to interact with command-line interfaces and execute system commands
Original article
Qwen3.6-Max-Preview brings stronger world knowledge and instruction following along with significant agentic coding improvements across a wide range of benchmarks. The model is still under active development as researchers continue to iterate on it. Users can chat with the model interactively in Qwen Studio or call via API on Alibaba Cloud Model Studio API (coming soon).