Devoured - May 25, 2026
DeepSeek has permanently cut the price of its V4 Pro LLM by 75%, making it significantly cheaper than competitors, while Anthropic is preparing to release Claude Mythos 1 for enhanced cybersecurity and code analysis capabilities.
DeepSeek makes V4 Pro's 75 percent price cut permanent
DeepSeek has permanently slashed prices for its V4 Pro LLM by 75%, making it significantly cheaper than models from OpenAI, Anthropic, and Google.
Deep dive
- DeepSeek V4 Pro's permanent price cut is a 75% reduction.
- New output token pricing is $0.87 per million, down from $3.48.
- This is far cheaper than GPT-5 ($10/M output) and Claude Opus 4.7 ($25/M output), though Gemini 3.5 Flash's listed output price ($0.60/M) is still lower than DeepSeek's $0.87/M.
- The model supports a one-million-token context window, ideal for long document processing.
- The article highlights an unresolved accusation from Anthropic that DeepSeek used “distillation attacks” (improperly trained on Claude's responses).
- This move pressures Anthropic's revenue-per-token economics and valuation trajectory.
- It intensifies the existing trend of LLM price commoditization seen with Google Gemini and OpenAI's shift to consumer features.
- Enterprise buyers face a dilemma: significant cost savings versus geopolitical risks and IP provenance concerns associated with a Chinese provider.
Decoder
- Distillation attack: A method where one AI model is trained to mimic the outputs of another, often larger or more capable, model, potentially leveraging its intellectual property.
Original article
TL;DR
DeepSeek permanently cut V4 Pro prices by 75%, to $0.87 per million output tokens. It undercuts GPT-5, Gemini, and Claude.
DeepSeek has made permanent the 75% price discount on its flagship V4 Pro model. The promotion was originally scheduled to expire on 31 May. The Chinese AI startup’s pricing now ranges from $0.003625 to $0.87 per million tokens, down from $0.0145 to $3.48.
The price points are striking in context. OpenAI’s GPT-5 charges $2.50 per million input tokens and $10 per million output tokens. Anthropic’s Claude Opus 4.7 is priced at $5 input and $25 output.
Google’s Gemini 3.5 Flash, its cost-optimised model, charges $0.15 input and $0.60 output per million tokens. DeepSeek V4 Pro’s new permanent pricing sits far below the frontier models, although its $0.87 top rate remains above Gemini 3.5 Flash’s $0.60 output price. The gap is widest against the frontier reasoning models that enterprise customers rely on for demanding workloads.
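The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the per-million-token prices from the text, compares output prices only (the article does not break DeepSeek's range down by token type beyond its $0.87 top rate), and the 500-million-token monthly volume is a made-up workload.

```python
# Prices in USD per million tokens, copied from the article text.
OLD_RANGE = (0.0145, 3.48)      # DeepSeek V4 Pro before the cut (low, high)
NEW_RANGE = (0.003625, 0.87)    # DeepSeek V4 Pro after the cut (low, high)

OUTPUT_PRICE = {
    "DeepSeek V4 Pro": 0.87,
    "GPT-5": 10.00,
    "Claude Opus 4.7": 25.00,
    "Gemini 3.5 Flash": 0.60,
}


def discount(old: float, new: float) -> float:
    """Fractional price reduction, e.g. 0.75 for a 75% cut."""
    return 1 - new / old


def output_cost(model: str, output_tokens: int) -> float:
    """USD cost of generating `output_tokens` tokens with `model`."""
    return OUTPUT_PRICE[model] * output_tokens / 1_000_000


if __name__ == "__main__":
    for old, new in zip(OLD_RANGE, NEW_RANGE):
        print(f"{old} -> {new}: {discount(old, new):.0%} cut")
    tokens = 500_000_000  # hypothetical monthly output volume
    for model in OUTPUT_PRICE:
        print(f"{model}: ${output_cost(model, tokens):,.2f}")
```

Both ends of DeepSeek's range work out to the same 75% reduction. On output price alone, the quoted numbers put DeepSeek well under GPT-5 and Claude Opus 4.7 but above Gemini 3.5 Flash.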
The decision to lock in the discount one month after launching the V4 models suggests DeepSeek is prioritising market share over per-unit revenue. The company described V4 as welcoming the “era of cost-effective 1M context length.” It is positioning its models as the default for applications that process large documents, codebases, or conversational histories where token costs compound fast.
For enterprise accounts consuming millions of tokens daily, the savings are material. Salesforce projects $300 million in Anthropic token spending this year. At DeepSeek’s new pricing, an equivalent volume would cost a fraction of that figure.
The question for enterprise buyers is whether DeepSeek’s model quality, reliability, and compliance posture justify the switch. The price advantage may be offset by the geopolitical and technical risks of routing sensitive workloads through a Chinese AI provider. That calculus varies by industry and by the sensitivity of the data involved.
The competitive dynamics are complicated by Anthropic’s public accusation that DeepSeek has engaged in “distillation attacks.” The allegation is that DeepSeek improperly trained on Claude’s responses to improve its own models. DeepSeek has not publicly addressed the accusation in detail.
If substantiated, it would mean that some of DeepSeek’s capability advantage was built on Anthropic’s research investment. The price differential would then reflect intellectual property arbitrage rather than engineering efficiency. The accusation remains unresolved.
Anthropic’s annualised revenue surged from $9 billion to $30 billion between the end of 2025 and early April 2026. That growth was driven largely by enterprise adoption of Claude Code. DeepSeek’s pricing pressure threatens the revenue-per-token economics that support Anthropic’s valuation trajectory.
If enterprise customers begin routing lower-complexity tasks to DeepSeek while reserving Claude for high-stakes reasoning, Anthropic’s token volume could hold while revenue per token declines. The broader AI pricing landscape has been moving toward commoditisation throughout 2026. Google has repeatedly cut Gemini prices to compete with open-weight models.
OpenAI’s pivot toward consumer platform features, including personal finance tools and advertising, reflects a recognition that API token revenue alone may not sustain its $852 billion valuation. DeepSeek’s permanent price cut accelerates a trend that was already compressing margins across the industry. The era of high-margin AI tokens may be ending faster than anyone expected.
DeepSeek V4 Pro supports a one-million-token context window at the new pricing. That makes it competitive for document analysis, legal review, and codebase comprehension. These are the long-context applications where input cost is the binding constraint on adoption.
The combination of frontier-adjacent capability and radically lower pricing creates a genuine dilemma for CTOs. The cheapest option is also the one with the most geopolitical complexity. It has the least transparency about training data provenance and an unresolved IP accusation from one of its most capable competitors.
DeepSeek’s strategy appears to be that price will win. Enough volume will flow to the cheapest capable model regardless of origin. The geopolitical concerns that constrain adoption in government and regulated industries will not prevent adoption in the broader market.
Whether that bet is correct depends on whether Western AI companies can close the price gap before DeepSeek closes the capability gap. The alternative is that the market bifurcates into a Western tier and a Chinese tier with fundamentally different economics. DeepSeek just made sure the gap between them got wider.
The 2026-07-28 MCP Specification Release Candidate
The Model Context Protocol (MCP) is releasing a major specification update on July 28, 2026, introducing a stateless core, an extensions framework, and hardened authorization.
Deep dive
- The MCP 2026-07-28 release candidate is the largest revision since the protocol's launch.
- The core protocol is now stateless, eliminating the need for session IDs and handshakes, allowing scaling with plain round-robin load balancers.
- Protocol version, client info, and capabilities now travel in _meta on every request.
- A formal Extensions framework is introduced, allowing capabilities like MCP Apps (server-rendered UIs in iframes) and the Tasks extension (for long-running operations) to evolve independently.
- Authorization is hardened, aligning more closely with OAuth 2.0 and OpenID Connect best practices, including validation of the iss parameter and client registration improvements.
- Three core features (Roots, Sampling, Logging) are deprecated, with replacements suggested.
- Tool inputSchema and outputSchema now support full JSON Schema 2020-12, including composition and references.
- The error code for a missing resource changes from -32002 to the JSON-RPC standard -32602.
- A new feature lifecycle policy ensures at least twelve months between deprecation and removal for future changes.
- The final specification is scheduled for publication on July 28, 2026.
Decoder
- Model Context Protocol (MCP): A protocol designed for AI agents and models to interact with tools and services, enabling structured communication and task execution.
- Stateless protocol: A communication protocol where each request from client to server contains all the information needed to understand the request, and the server does not store any session information about the client.
- JSON Schema 2020-12: A standard for describing the structure of JSON data, allowing for validation and documentation of JSON objects.
Original article
The release candidate for MCP 2026-07-28 is now available. It is the largest revision of the protocol since launch and delivers on the 2026 roadmap:
- a stateless core that scales on ordinary HTTP infrastructure
- extensions including server-rendered UIs through MCP Apps and long-running work through the Tasks extension
- authorization that aligns more closely with OAuth and OpenID Connect deployments
- a formal deprecation policy so the protocol can evolve without breaking what you’ve built,
and many other changes.
The practical effect on a production deployment is immediate. A remote MCP server that previously needed sticky sessions, a shared session store, and deep packet inspection at the gateway can now run behind a plain round-robin load balancer, route traffic on an Mcp-Method header, and let clients cache tools/list responses for as long as the server’s ttlMs permits.
The release candidate is available today and the final specification ships on July 28, 2026. This release contains breaking changes; see Release Timeline and Validation for the details.
A Stateless Protocol
The headline change is that MCP is now stateless at the protocol layer. Six Specification Enhancement Proposals (SEPs) work together to get there, completing the plan we laid out in The Future of MCP Transports in December.
Before and after
In 2025-11-25, calling a tool over Streamable HTTP means establishing a session first:
POST /mcp HTTP/1.1
Content-Type: application/json
{"jsonrpc":"2.0","id":1,"method":"initialize",
"params":{"protocolVersion":"2025-11-25","capabilities":{},
"clientInfo":{"name":"my-app","version":"1.0"}}}
The server responds with an Mcp-Session-Id that every subsequent request must carry, pinning the client to whichever instance issued it:
POST /mcp HTTP/1.1
Mcp-Session-Id: 1868a90c-3a3f-4f5b
Content-Type: application/json
{"jsonrpc":"2.0","id":2,"method":"tools/call",
"params":{"name":"search","arguments":{"q":"otters"}}}
In 2026-07-28, the same call is a single self-contained request that any server instance can handle:
POST /mcp HTTP/1.1
MCP-Protocol-Version: 2026-07-28
Mcp-Method: tools/call
Mcp-Name: search
Content-Type: application/json
{"jsonrpc":"2.0","id":1,"method":"tools/call",
"params":{"name":"search","arguments":{"q":"otters"},
"_meta":{"io.modelcontextprotocol/clientInfo":{"name":"my-app","version":"1.0"}}}}
The handshake and session are gone
The initialize/initialized handshake is removed (SEP-2575). The protocol version, client info, and client capabilities that used to be exchanged once at connection time now travel in _meta on every request, and a new server/discover method lets clients fetch server capabilities when they need them up front.
The Mcp-Session-Id header and the protocol-level session that came with it are also removed (SEP-2567). With both gone, any MCP request can land on any server instance, and the sticky routing and shared session stores that horizontal deployments needed before are no longer required at the protocol layer.
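As a rough illustration of what a client now has to assemble, the sketch below builds the headers and body for the self-contained tools/call request shown earlier. The header names and the clientInfo key are copied from that example; the function name and its parameters are hypothetical and not part of any SDK.

```python
import json

PROTOCOL_VERSION = "2026-07-28"
# _meta key as it appears in the example request above.
CLIENT_INFO_KEY = "io.modelcontextprotocol/clientInfo"


def build_tool_call(request_id, tool, arguments, client_name, client_version):
    """Return (headers, body) for one self-contained tools/call request.

    Hypothetical helper: no initialize handshake and no session header,
    so everything the server needs travels with the request itself.
    """
    headers = {
        "MCP-Protocol-Version": PROTOCOL_VERSION,
        "Mcp-Method": "tools/call",
        "Mcp-Name": tool,
        "Content-Type": "application/json",
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            "_meta": {
                CLIENT_INFO_KEY: {"name": client_name, "version": client_version}
            },
        },
    }
    return headers, json.dumps(body)
```

Calling `build_tool_call(1, "search", {"q": "otters"}, "my-app", "1.0")` reproduces the request from the example, and the result can be sent to any server instance.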
Stateless protocol, stateful applications
Removing the protocol-level session does not mean your application has to be stateless. Servers that need to carry state across calls can do what HTTP APIs have always done: mint an explicit handle (a basket_id, a browser_id) from a tool and have the model pass it back as an ordinary argument on later calls.
In practice, we’ve found this pattern (the model threading an identifier from one tool call to the next) to be more than just a workable substitute for session state. It’s often a more powerful one. The model can compose handles across tools, reason about them, and hand them off between steps in ways that externally managed session state, hidden in transport metadata, never really allowed.
The protocol no longer manages that state for you, but it doesn’t prevent you from managing it yourself. The explicit-handle pattern simply makes the state visible to the model rather than hidden away.
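A minimal sketch of the explicit-handle pattern, assuming two hypothetical tools (`create_basket` and `add_item`) and an in-memory dictionary standing in for whatever shared store a real deployment would use:

```python
import uuid

# Hypothetical shared store. In a real deployment this would be a database
# or cache reachable from every server instance, not a module-level dict.
BASKETS: dict[str, list[str]] = {}


def create_basket() -> dict:
    """Tool: mint a new handle and return it to the model."""
    basket_id = uuid.uuid4().hex
    BASKETS[basket_id] = []
    return {"basket_id": basket_id}


def add_item(basket_id: str, item: str) -> dict:
    """Tool: the model passes the handle back as an ordinary argument."""
    if basket_id not in BASKETS:
        raise KeyError(f"unknown basket_id: {basket_id}")
    BASKETS[basket_id].append(item)
    return {"basket_id": basket_id, "items": list(BASKETS[basket_id])}
```

The handle is just a tool argument, so nothing here depends on which server instance handled the previous call.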
Server-to-client requests, restructured
A stateless protocol still needs a way for servers to ask the client for something mid-call, such as an elicitation prompt. Two SEPs rebuild that flow so it works without a persistent connection.
Server-initiated requests may now only be issued while the server is actively processing a client request (SEP-2260). Earlier spec versions recommended this; it’s now required. A user is never prompted out of nowhere, and every elicitation traces back to something they (or their agent) started.
Multi Round-Trip Requests (SEP-2322) change how those prompts are delivered. Instead of holding a Server-Sent Events (SSE) stream open, the server returns an InputRequiredResult:
{
"resultType": "inputRequired",
"inputRequests": {
"confirm": {
"type": "elicitation",
"message": "Delete 3 files?",
"schema": { "type": "boolean" }
}
},
"requestState": "eyJzdGVwIjoxLCJmaWxlcyI6WyJhIiwiYiIsImMiXX0="
}
The client gathers the answers and re-issues the original call with inputResponses and the echoed requestState. Any server instance can pick that retry up because everything it needs is in the payload.
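The round trip can be sketched from the server's side as follows. The handler name, the placement of inputResponses and requestState inside params, the shape of inputResponses, and the final result are all assumptions made for illustration. The state encoding is plain base64 over compact JSON, which happens to reproduce the requestState string in the example above.

```python
import base64
import json


def encode_state(state: dict) -> str:
    """Pack server-side progress into an opaque string for the client to echo."""
    raw = json.dumps(state, separators=(",", ":")).encode()
    return base64.b64encode(raw).decode()


def decode_state(token: str) -> dict:
    return json.loads(base64.b64decode(token))


def handle_delete(params: dict) -> dict:
    """Hypothetical handler for a file-deleting tool call."""
    responses = params.get("inputResponses")
    if responses is None:
        # First attempt: ask for confirmation instead of holding a stream open.
        files = params["arguments"]["files"]
        return {
            "resultType": "inputRequired",
            "inputRequests": {
                "confirm": {
                    "type": "elicitation",
                    "message": f"Delete {len(files)} files?",
                    "schema": {"type": "boolean"},
                }
            },
            "requestState": encode_state({"step": 1, "files": files}),
        }
    # Retry: everything needed is in the payload, so any instance can finish.
    # The state is unsigned here; since it passes through the client, a real
    # server would presumably want to sign or encrypt it.
    state = decode_state(params["requestState"])
    deleted = state["files"] if responses.get("confirm") is True else []
    return {"deleted": deleted}
```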
Routable, cacheable, traceable
Three smaller changes make the resulting traffic easier to operate.
The Streamable HTTP transport now requires Mcp-Method and Mcp-Name headers (SEP-2243) so load balancers, gateways, and rate-limiters can route on the operation without inspecting the body. Servers reject requests where the headers and body disagree.
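A server-side consistency check might look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and it only maps Mcp-Name to params.name for tools/call, because that is the one pairing the example request shows.

```python
import json


def headers_match_body(headers: dict[str, str], body: bytes) -> bool:
    """Return True when the routing headers agree with the JSON-RPC body."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value for name, value in headers.items()}
    message = json.loads(body)
    if lowered.get("mcp-method") != message.get("method"):
        return False
    if message.get("method") == "tools/call":
        # Assumption: for tools/call, Mcp-Name mirrors params.name.
        return lowered.get("mcp-name") == message.get("params", {}).get("name")
    return True
```

A gateway can then route on the headers alone, while the server rejects any request for which this check returns False.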
List and resource read results now carry ttlMs and cacheScope (SEP-2549), modeled on HTTP Cache-Control. Clients know exactly how long a tools/list response is fresh and whether it’s safe to share across users, and a long-lived SSE stream is no longer the only way to learn that a list changed.
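On the client side, honoring ttlMs could be as simple as the cache sketched below. The class is hypothetical, and it deliberately ignores cacheScope because the text does not list that field's values; a caller who is unsure whether a result may be shared should include the user's identity in the cache key.

```python
import time


class ListCache:
    """Caches list results for as long as the server's ttlMs allows."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at_seconds, result)

    def put(self, key, result: dict) -> None:
        # Assumption: a result without ttlMs is treated as not cacheable.
        ttl_ms = result.get("ttlMs", 0)
        self._entries[key] = (self._clock() + ttl_ms / 1000, result)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # stale: force a fresh tools/list
            return None
        return result
```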
W3C Trace Context propagation in _meta is now documented (SEP-414), locking down the traceparent, tracestate, and baggage key names so distributed traces correlate across SDKs and gateways. Several SDKs and tools were already doing this; with the key names fixed in the spec, a trace that starts in a host application can follow a tool call through the client SDK, the MCP server, and whatever the server calls downstream, and show up as a single span tree in an OpenTelemetry-compatible backend.
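For illustration, a client that is not already using an OpenTelemetry SDK could attach a trace context by hand along these lines. The traceparent layout (version, 16-byte trace ID, 8-byte parent ID, flags) follows the W3C Trace Context format; the helper names are hypothetical, and only the traceparent key from the text is set.

```python
import secrets


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 32 lowercase hex characters
    parent_id = secrets.token_hex(8)   # 16 lowercase hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def with_trace_context(params: dict, traceparent: str) -> dict:
    """Return a copy of `params` with traceparent added under _meta."""
    meta = {**params.get("_meta", {}), "traceparent": traceparent}
    return {**params, "_meta": meta}
```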
Extensions Become First-Class
Extensions existed in the 2025-11-25 release but had no formal process behind them. SEP-2133 adds that: extensions are identified by reverse-DNS IDs, negotiated through an extensions map on client and server capabilities, live in their own ext-* repositories with delegated maintainers, and version independently of the specification. A new Extensions Track in the SEP process gives them a path from experimental to official.
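One plausible reading of "negotiated through an extensions map" is that an extension is active only when both sides list it. The sketch below implements that reading as an assumption, not as the specified algorithm; the function name and the `com.example/...` identifiers are invented for illustration.

```python
def negotiate_extensions(client_caps: dict, server_caps: dict) -> dict:
    """Return the extensions advertised by both sides, keyed by extension ID.

    Assumption: negotiation is a simple intersection of the two
    `extensions` maps, each keyed by a reverse-DNS extension ID.
    """
    client_ext = client_caps.get("extensions", {})
    server_ext = server_caps.get("extensions", {})
    shared = sorted(client_ext.keys() & server_ext.keys())
    return {
        ext_id: {"client": client_ext[ext_id], "server": server_ext[ext_id]}
        for ext_id in shared
    }


if __name__ == "__main__":
    client = {"extensions": {"com.example/charts": {}, "com.example/audit": {}}}
    server = {"extensions": {"com.example/charts": {"version": "1"}}}
    print(negotiate_extensions(client, server))
```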
This release includes two official extensions.
MCP Apps: server-rendered user interfaces
MCP Apps (SEP-1865) lets servers ship interactive HTML interfaces that hosts render in a sandboxed iframe. Tools declare their UI templates ahead of time so hosts can prefetch, cache, and security-review them before anything runs. The rendered UI talks back to the host over the same JSON-RPC base protocol used everywhere else in MCP, so every UI-initiated action goes through the same audit and consent path as a direct tool call.
Tasks graduates to an extension
Tasks shipped as an experimental core feature in 2025-11-25. Production use surfaced enough redesign that the right home for it is an extension rather than the specification.
The Tasks extension reshapes the lifecycle around the stateless model: a server can answer tools/call with a task handle, and the client drives it with tasks/get, tasks/update, and tasks/cancel. Task creation is server-directed: the client advertises the extension and the server decides when a call should run as a task. tasks/list is removed because it can’t be scoped safely without sessions.
Anyone who shipped against the 2025-11-25 experimental Tasks API will need to migrate to the new lifecycle.
Authorization Hardening
Six SEPs harden the authorization specification to align more closely with how OAuth 2.0 and OpenID Connect are deployed in practice.
Clients must now validate the iss parameter on authorization responses per RFC 9207 (SEP-2468). This is a low-cost mitigation for a class of mix-up attack that is more prevalent in MCP’s single-client, many-server deployment pattern. In a future version, clients will be expected to reject responses that omit iss, so authorization servers should begin supplying it now if they don’t already.
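A client-side check along the lines of RFC 9207 might look like the sketch below. The function and exception names are hypothetical; the comparison is an exact string match against the issuer the client expects, and the `require_iss` flag models the stricter future behavior described above rather than anything mandated today.

```python
from urllib.parse import parse_qs, urlsplit


class IssuerMismatch(Exception):
    """Raised when an authorization response fails the iss check."""


def check_issuer(redirect_url: str, expected_issuer: str,
                 require_iss: bool = False) -> None:
    """Validate the iss parameter on an authorization response URL."""
    query = parse_qs(urlsplit(redirect_url).query)
    values = query.get("iss", [])
    if not values:
        if require_iss:
            raise IssuerMismatch("authorization response has no iss parameter")
        return  # tolerated for now; see the note on future versions above
    # Exact string comparison, and exactly one iss value.
    if len(values) != 1 or values[0] != expected_issuer:
        raise IssuerMismatch(f"unexpected issuer: {values!r}")
```

The check should run before the client exchanges the authorization code, so a response from the wrong authorization server is dropped before any code is sent anywhere.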
Clients now declare their OpenID Connect application_type during Dynamic Client Registration (SEP-837), avoiding the common case where an authorization server defaults a desktop or CLI client to "web" and rejects its localhost redirect URI. Clients bind registered credentials to the issuing authorization server’s issuer and re-register when a resource migrates between authorization servers (SEP-2352). The spec also documents how to request refresh tokens from OpenID Connect-style authorization servers (SEP-2207), and clarifies scope accumulation during step-up (SEP-2350) and the .well-known discovery suffix (SEP-2351).
Roots, Sampling, and Logging Are Deprecated
Three core features are deprecated under the new feature lifecycle policy (SEP-2577):
| Feature | Replacement |
|---|---|
| Roots | Tool parameters, resource URIs, or server configuration |
| Sampling | Direct integration with LLM provider APIs |
| Logging | stderr for stdio transports; OpenTelemetry for structured observability |
These are annotation-only deprecations. The methods, types, and capability flags continue to work in this release and in every specification version published within a year of it, and removing any of them will require a separate SEP under the lifecycle policy.
Full JSON Schema 2020-12 for Tools
Tool inputSchema and outputSchema are lifted to full JSON Schema 2020-12 (SEP-2106). Input schemas keep the type: "object" root constraint but now allow composition (oneOf, anyOf, allOf), conditionals, and references ($ref, $defs). Output schemas are unrestricted, and structuredContent can now be any JSON value rather than only an object. Implementations must not auto-dereference external $ref URIs and should bound schema depth and validation time.
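To make the new input-schema freedoms and the two safety requirements concrete, here is a small sketch. The schema is an invented example for a search tool, and `check_schema` is a hypothetical pre-validation guard: it takes the conservative route of rejecting any non-local $ref outright (the text only requires not dereferencing them automatically) and caps nesting depth at an arbitrary limit.

```python
# Invented example: an object-rooted input schema using $defs, $ref and oneOf.
SEARCH_INPUT = {
    "type": "object",
    "$defs": {"query": {"type": "string", "minLength": 1}},
    "oneOf": [
        {"properties": {"q": {"$ref": "#/$defs/query"}}, "required": ["q"]},
        {"properties": {"id": {"type": "integer"}}, "required": ["id"]},
    ],
}


def check_schema(schema, max_depth: int = 32, _depth: int = 0) -> None:
    """Reject external $ref targets and overly deep schemas.

    Raises ValueError on the first problem found; returns None otherwise.
    """
    if _depth > max_depth:
        raise ValueError("schema nesting exceeds max_depth")
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        # Only same-document references (starting with '#') are accepted.
        if isinstance(ref, str) and not ref.startswith("#"):
            raise ValueError(f"external $ref not allowed: {ref}")
        for value in schema.values():
            check_schema(value, max_depth, _depth + 1)
    elif isinstance(schema, list):
        for item in schema:
            check_schema(item, max_depth, _depth + 1)
```

Bounding validation time is a separate concern that this sketch does not address.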
Separately, the error code for a missing resource changes from the MCP-custom -32002 to the JSON-RPC standard -32602 Invalid Params (SEP-2164). If your client matches on the literal -32002 value, update it.
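During the transition, a client that talks to servers on both spec versions might accept either code. The sketch below is one way to do that; the helper name is hypothetical, and because -32602 is the generic Invalid Params code, treating it as "missing resource" on a resources/read call is a heuristic rather than a guarantee.

```python
LEGACY_RESOURCE_NOT_FOUND = -32002   # MCP-custom code, used through 2025-11-25
INVALID_PARAMS = -32602              # JSON-RPC standard code, used from 2026-07-28


def is_missing_resource(method: str, error: dict) -> bool:
    """Heuristic: did a resource read fail because the resource is missing?"""
    if method != "resources/read":
        return False
    return error.get("code") in (LEGACY_RESOURCE_NOT_FOUND, INVALID_PARAMS)
```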
How the Protocol Evolves From Here
This release contains breaking changes. We don’t intend for that to be the norm.
Three governance SEPs in this release are designed so that future revisions can evolve the protocol without breaking core capabilities. The feature lifecycle policy gives every feature an Active, Deprecated, and Removed lifecycle with at least twelve months between deprecation and the earliest possible removal. The Extensions framework means new capabilities can ship as opt-in extensions and stabilize there before, if ever, moving into the specification. And a Standards Track SEP can no longer reach Final status until a matching scenario lands in the conformance suite (SEP-2484), which is the same suite the new SDK tier system scores official SDKs against.
The stateless rework in this release is the kind of foundational change that needed a clean break. With it landed, and with deprecation windows and extensions as the standard tools going forward, our expectation is that implementers targeting 2026-07-28 will be able to adopt future revisions without rewriting their transport or lifecycle code.
Release Timeline and Validation
The release candidate is locked as of May 21, 2026. The final specification will be published on July 28, 2026. The ten-week window is for SDK maintainers and client implementers to validate the changes against real workloads; under the SDK tier system, Tier 1 SDKs are expected to ship support within this window.
The full release candidate is in the draft specification, and the changelog will list every change against 2025-11-25.
If you find a problem, open an issue in the specification repository. For implementation questions, the relevant Working Group channel in the contributor Discord is the fastest path to an answer.
Looking Ahead
This release gives MCP the foundation we expect it to grow on for a long time: a protocol that runs statelessly on commodity HTTP infrastructure, an extensions framework where capabilities like Tasks and MCP Apps can ship on their own timeline, and a lifecycle policy that lets implementers build on 2026-07-28 knowing what they ship will keep working.
Thank you to everyone who shaped these proposals through the Working Groups and a great deal of patient review. We’re looking forward to making this final with the community on July 28.
Evaluating Multi-Agent Systems at Scale
OpenAI has introduced a "macro-evaluation" workflow designed to analyze recurring behavioral patterns across entire populations of multi-agent system traces, rather than focusing on isolated failures.
Deep dive
- OpenAI proposes a macro-evaluation workflow for multi-agent systems.
- This approach focuses on analyzing patterns across entire populations of agent traces, not just individual failures.
- It helps identify systemic problems like repeated missed signals or incorrect handoffs between specialist agents.
- The workflow involves generating/collecting many traced agent runs, running lower-level evals on each, turning traces into compact documents, discovering recurring behavior patterns, and drilling into high-impact patterns.
- A synthetic EV order workflow, involving specialist agents for pricing, compliance, supply, etc., serves as the example.
- The notebook uses precomputed synthetic traces and saved lower-level eval labels, allowing execution without an OpenAI API key.
- It distinguishes between lower-level evals (grading individual agents/actions) and macro evals (looking across many findings for patterns).
- Key reader-facing labels used are case_type, run_outcome, eval_finding, and behavior_pattern.
- The goal is to translate thousands of agent events into a small number of patterns understandable by both technical and business stakeholders.
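The macro step described in the list above, going from per-run labels to a ranked set of patterns, can be sketched in a few lines. This is not the notebook's code: the function, the `run_id` field, and the sample label values are hypothetical, and only the behavior_pattern label name comes from the summary.

```python
from collections import Counter, defaultdict


def summarize_patterns(runs: list[dict]) -> list[dict]:
    """Rank behavior patterns by how many traced runs exhibit them."""
    counts = Counter(run["behavior_pattern"] for run in runs)
    examples = defaultdict(list)
    for run in runs:
        examples[run["behavior_pattern"]].append(run["run_id"])
    return [
        {
            "behavior_pattern": pattern,
            "runs": n,
            "share": n / len(runs),
            # A few run IDs to drill into when a pattern looks high-impact.
            "example_run_ids": examples[pattern][:3],
        }
        for pattern, n in counts.most_common()
    ]


if __name__ == "__main__":
    sample = [
        {"run_id": "r1", "behavior_pattern": "missed_compliance_signal"},
        {"run_id": "r2", "behavior_pattern": "missed_compliance_signal"},
        {"run_id": "r3", "behavior_pattern": "wrong_handoff"},
    ]
    for row in summarize_patterns(sample):
        print(row)
```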
Decoder
- Agentic system: An AI system composed of multiple interconnected AI agents that collaborate and delegate tasks to achieve a larger goal.
- Macro-evaluation: A method of evaluating AI systems, especially multi-agent ones, by analyzing aggregate patterns and recurring behaviors across a large dataset of traces, rather than focusing on individual instance failures.
- Trace (in AI agents): A detailed log or record of an agent's internal thought process, actions, tool calls, and interactions throughout the execution of a task.
- Promptfoo: An open-source tool for testing and evaluating LLM prompts and agentic systems.
Original article
Full article content is not available for inline reading.
Anthropic plans Claude memory update with new Memory Files
Anthropic is preparing a major Claude memory update, introducing "Memory Files" that distribute conversational context across structured documents, akin to a personal wiki, and a "Dreams" feature for asynchronous memory consolidation.
Deep dive
- Anthropic is testing a "Memory Files" feature for Claude, moving beyond a single summarized note for user context.
- This new system will distribute Claude's notes across multiple structured documents, categorized by topic, project, or context.
- The approach is designed to function like a "built-in personal wiki" that Claude can consult selectively.
- It is similar to memory architectures found in agentic solutions like OpenClaw and Hermes, which use filesystem-style memory.
- "Memory Files" will allow Claude to have a larger and more durable record of each user without overwhelming its context window.
- This memory overhaul is likely a preparation for the debut of the Claude Conway agent.
- A related feature called "Dreams" is also being rolled out, which performs scheduled, asynchronous passes over memory files to merge duplicates, resolve contradictions, and surface patterns.
- "Dreams" is compared to REM sleep consolidation, producing a reorganized version of memory while leaving the original untouched.
- The Dreams feature is currently in limited beta for Claude Managed Agents on the developer platform, scoped to Opus 4.7 and Sonnet 4.6.
- No firm timeline for public release of Memory Files or Dreams in consumer Claude products has been announced.
- The memory rework is considered the most consequential upcoming change, aiming to put Claude on par with rivals' persistent-memory architectures while maintaining user control.
Decoder
- Agentic solutions: AI systems designed to perform a series of actions autonomously, often over extended periods, to achieve a goal.
- Context window: The maximum amount of text an LLM can process or "remember" at one time during a conversation.
- Dreams (Anthropic): An asynchronous process that runs on Claude's accumulated memory files to consolidate information, resolve contradictions, and identify patterns, akin to human sleep for memory consolidation.
- Memory Files (Claude): Anthropic's new structured memory system for Claude, organizing notes into distinct documents by topic or context to improve long-term recall.
- Persistent memory: An AI's ability to retain and recall information about past interactions across multiple sessions.
- Rolling summary: A continuously updated, single condensed note that attempts to capture the essence of a user's interaction history in a compact form.
- Token: The basic unit of data (like words or sub-words) that an LLM processes.
Original article
Anthropic appears to be preparing a substantial overhaul of how Claude remembers users across sessions, with early signals pointing to a dual-mode memory system that would let people choose between the current setup and a more sophisticated file-based architecture. The existing arrangement, framed internally as the “classic” option, condenses what Claude learns about a person into a single, summarized note. The forthcoming alternative, referred to as “Memory Files,” would distribute those notes across multiple structured documents organized by topic, project, or context. This feature is likely a new iteration of the previously discovered "Knowledge Bases".
Organized notes Claude writes as you chat and reads when they're relevant. Browse and edit them anytime.
The approach mirrors what is already powering always-on agentic solutions such as OpenClaw and Hermes, both of which rely on filesystem-style memory to scale beyond the limits of a single rolling summary. By splitting memory into discrete files, Anthropic would be able to give Claude a far larger and more durable record of each user without overwhelming the context window. In practice, it would function as a built-in personal wiki that the assistant can consult selectively depending on the topic under discussion.
Tied to this shift is the prospect that Dreams, a feature Anthropic only recently began rolling out to its Claude Managed Agents on the developer platform, eventually arrives in the consumer Claude product. Dreams runs as a scheduled, asynchronous pass over accumulated memory files, merging duplicates, replacing stale entries with fresh values, resolving contradictions, and surfacing patterns the model missed during live sessions. Anthropic has compared the process to REM sleep consolidation, with the original store left untouched while a reorganized version is produced for review.
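Nothing public describes how Dreams is implemented, so the following is purely a toy illustration of the behavior described above: duplicates merged, stale values replaced by fresh ones, and the original store left untouched. The entry shape (`key`, `value`, `updated`) and the function name are invented.

```python
from copy import deepcopy


def consolidate(entries: list[dict]) -> list[dict]:
    """Return a reorganized copy with one entry per key, keeping the newest.

    `updated` is assumed to be an ISO 8601 date string, so plain string
    comparison orders entries chronologically.
    """
    latest: dict[str, dict] = {}
    for entry in entries:
        current = latest.get(entry["key"])
        if current is None or entry["updated"] > current["updated"]:
            latest[entry["key"]] = deepcopy(entry)
    return sorted(latest.values(), key=lambda e: e["key"])


if __name__ == "__main__":
    store = [
        {"key": "editor", "value": "vim", "updated": "2026-01-10"},
        {"key": "editor", "value": "helix", "updated": "2026-04-02"},
        {"key": "timezone", "value": "UTC+1", "updated": "2026-02-01"},
    ]
    print(consolidate(store))   # reorganized version for review
    print(len(store))           # the original list is unchanged
```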
On a similar note, the Claude Conway agent is also expected to arrive soon, and the Memory Files feature may well be part of the preparation for Conway's debut.
No firm timeline has surfaced yet, and Dreams itself remains in limited beta on the platform side, currently scoped to Opus 4.7 and Sonnet 4.6. Smaller UI tweaks are being prepared in parallel, but the memory rework stands out as the most consequential piece of what is coming next, placing Claude on a more competitive footing with the persistent-memory architectures that rivals have been building toward while preserving Anthropic’s stated emphasis on user control over what the model retains.
A hacker group is poisoning open source code at an unprecedented scale
Hacker group TeamPCP is relentlessly poisoning hundreds of open-source tools, executing supply chain attacks at an unprecedented scale, even breaching GitHub through a VSCode extension.
Deep dive
- According to Socket, TeamPCP has conducted 20 "waves" of supply chain attacks in recent months, corrupting more than 500 distinct open-source software packages.
- The group breached GitHub after a GitHub developer installed a poisoned VSCode extension, gaining access to roughly 3,800 repositories that GitHub says contained its own code rather than customers'.
- TeamPCP claims to be selling GitHub's source code and internal organization data on BreachForums.
- Their core tactic involves gaining access to development networks, planting malware in commonly used open-source tools, and then using stolen credentials to publish malicious versions of other tools, creating a self-perpetuating cycle.
- The group has automated many attacks using a self-spreading worm known as Mini Shai-Hulud, which steals victims' credentials and publishes them, encrypted, in GitHub repositories it creates.
- Previous victims include OpenAI, data contracting firm Mercor, the European Commission's public website, Trivy, LiteLLM, Checkmarx, pgserve, TanStack, and Mistral AI.
- TeamPCP is financially motivated, deploying ransomware or data extortion, and is willing to sell victims' data.
- Experts like Ben Read (Wiz) and Philipp Burckhardt (Socket) emphasize the need for better security hygiene, including rotating authentication tokens and vetting open-source updates before deployment (e.g., "age-gating").
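The "age-gating" idea mentioned in the last item can be sketched as a simple selection rule: only install a release once it has been public for some minimum period, on the theory that poisoned versions tend to be caught and pulled quickly. The function, the release record shape, and the seven-day threshold below are assumptions for illustration, not a recommendation from the experts quoted.

```python
from datetime import datetime, timedelta, timezone


def newest_aged_release(releases: list[dict], min_age_days: int, now=None):
    """Return the newest version published at least `min_age_days` ago.

    Each release is a dict with `version` and a timezone-aware `published`
    datetime. Returns None when no release is old enough yet.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    eligible = [r for r in releases if r["published"] <= cutoff]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["published"])["version"]


if __name__ == "__main__":
    now = datetime(2026, 5, 25, tzinfo=timezone.utc)
    releases = [
        {"version": "2.3.0", "published": datetime(2026, 4, 1, tzinfo=timezone.utc)},
        {"version": "2.3.1", "published": datetime(2026, 5, 10, tzinfo=timezone.utc)},
        {"version": "2.3.2", "published": datetime(2026, 5, 24, tzinfo=timezone.utc)},
    ]
    # 2.3.2 is one day old, so a seven-day gate falls back to 2.3.1.
    print(newest_aged_release(releases, min_age_days=7, now=now))
```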
Decoder
- Supply chain attack: A cyberattack that targets less secure elements in a supply chain, such as software components, to gain access to the main target.
- VSCode extension: A plug-in for Microsoft's Visual Studio Code integrated development environment (IDE) that adds functionality.
- Ransomware-as-a-service (RaaS): A business model where ransomware developers offer their tools and infrastructure to affiliates in exchange for a cut of the ransom payments.
- Infostealer: A type of malware designed to search for and steal sensitive information from a compromised computer.
Original article
A so-called software supply chain attack, in which hackers corrupt a legitimate piece of software to hide their own malicious code, was once a relatively rare event but one that haunted the cybersecurity world with its insidious threat of turning any innocent application into a dangerous foothold in a victim’s network. Now one group of cybercriminals has turned that occasional nightmare into a near-weekly episode, corrupting hundreds of open source tools, extorting victims for profit, and sowing a new level of distrust in an entire ecosystem used to create the world’s software.
On Tuesday night, open source code platform GitHub announced that it had been breached by hackers in one such software supply chain attack: A GitHub developer had installed a “poisoned” extension for VSCode, a plug-in for a commonly used code editor that, like GitHub itself, is owned by Microsoft. As a result, the hackers behind the breach, an increasingly notorious group called TeamPCP, claim to have accessed around 4,000 of GitHub’s code repositories. GitHub’s statement confirmed that it had found at least 3,800 compromised repositories while noting that, based on its findings so far, they all contained GitHub’s own code, not that of customers.
“We are here today to advertise GitHub’s source code and internal orgs for sale,” TeamPCP wrote on BreachForums, a forum and marketplace for cybercriminals. “Everything for the main platform is there and I very am happy to send samples to interested buyers to verify absolute authenticity.”
The GitHub breach is just the latest incident in what has become the longest-running spree of software supply chain attacks ever, with no end in sight. According to cybersecurity firm Socket, which focuses on software supply chains, TeamPCP has, in just the last few months, carried out 20 “waves” of supply chain attacks that have hidden malware in more than 500 distinct pieces of software, or well over a thousand counting all of the various versions of the code that TeamPCP has hijacked.
Those tainted pieces of code have allowed TeamPCP’s hackers to breach hundreds of companies that installed the software, says Ben Read, who leads strategic threat intelligence at the cloud security firm Wiz. GitHub is only the latest on the group’s long list of victims, which has also included AI firm OpenAI and the data contracting firm Mercor. “It may be their biggest one,” Read says of the GitHub breach. “But each one of these is a big deal for the company that it happens to. It’s not qualitatively different from the 14 breaches that happened last week.”
TeamPCP’s core tactic has become a kind of cyclical exploitation of software developers: The hackers gain access to a network where an open source tool commonly used by coders is being developed—for example, the VSCode extension that led to the GitHub breach or the data visualization software AntV that TeamPCP hijacked earlier this week. The hackers plant malware in the tool that ends up on other software developers’ machines, including some who are writing other tools intended to be used by coders.
The malware allows TeamPCP’s hackers to steal credentials that let them publish malicious versions of those software development tools, too. The cycle repeats, and TeamPCP’s collection of breached networks grows. “It’s a flywheel of supply chain compromises,” says Read. “It’s self-perpetuating, and it’s been a hugely successful way to get access to networks and steal stuff.”
Most recently, the group appears to have automated many of its software supply chain attacks with a self-spreading worm that’s come to be known as Mini Shai-Hulud. The name comes from GitHub repositories the worm creates that include encrypted credentials stolen from victims, each of which includes the phrase “A Mini Shai-Hulud Has Appeared” along with a handful of other references to the sci-fi novel Dune. That message in turn appears to be a reference not just to Dune’s sandworms but to a similar supply chain compromise worm known as Shai-Hulud that appeared in September, though there’s no evidence TeamPCP was behind that earlier self-spreading malware.
“They’re definitely going for big exposure. They really care about getting big attention,” says Philipp Burckhardt, who leads research at Socket and has tracked TeamPCP for months. “They like to toot their own horn.” A dark-web site for the group, which links to “business contacts” likely used to carry out ransom negotiations, features Matrix-style cascading ones and zeros, a reggae fusion soundtrack, and the words “TEAMPCP: The Cats Hijacking Your Supply Chains.”
Before landing on its current strategy for supply chain attacks, TeamPCP emerged in late 2025 exploiting cloud misconfigurations and a vulnerability in the web app development tool Next.js to deploy a botnet for attacks like credential theft and cryptocurrency mining. The group’s reliance on worms emerged during this time, as it had increasing success grabbing static credentials and authentication tokens to bore deeper into victims’ systems.
“It’s been like wildfire; it’s gone very fast,” says Nathaniel Quist, manager of the Cortex Cloud intelligence team at Palo Alto Networks. “They find credentials, personal access tokens, and then it’s just how far can one credential go. I think we will continue to see these techniques. Threat actors know they work, and they’re running with it.”
TeamPCP appears to be financially motivated and often deploys ransomware or data extortion campaigns against its targets, though it also appears willing to sell victims’ data to any buyer. In the most recent case of GitHub, for instance, it wrote on its BreachForums site that “this is not a ransom. We do not care about extorting GitHub, 1 buyer and we shred the data on our end.”
It added what appeared to be a veiled threat to GitHub, perhaps intended to coerce the company to pay: “It looks like our retirement is soon so if no buyer is found we will leak it free.”
The picture has become increasingly complex, Quist says, since TeamPCP began moving to a ransomware-as-a-service model in April by establishing partnerships with the cybercriminal platforms BreachForums and DragonForce. The group has also, at times, seemed to wade into geopolitics, deploying malware (dubbed CanisterWorm by researchers) that targeted any Kubernetes cloud infrastructure but triggered its destructive wiper only against Iranian targets. This week, an entity claiming to be TeamPCP also leaked the original Shai-Hulud worm source code along with detailed documentation, though its motivations for that leak aren’t clear.
The scale of TeamPCP’s targeting expanded dramatically in March as it hacked more software utilities, leading to its more recent cascading effect of supply chain attacks. The group embedded an infostealer in the open source security scanner Trivy and then used stolen credentials from this attack to compromise certain versions of the AI application programming interface tool LiteLLM hosted on the popular Python software repository PyPI. The group also tainted infrastructure from the web application security firm Checkmarx, hit the development server pgserve, and compromised the web app library TanStack as well as the enterprise AI platform Mistral AI.
The fallout has been severe. In addition to GitHub, TeamPCP attacks on software service providers have led to breaches of the European Commission’s public website and the data contracting firm Mercor, the compromise of two employees’ devices at OpenAI, and many other incidents. But Palo Alto’s Quist emphasizes that organizations can protect themselves to a degree through security “hygiene” practices that carefully manage authentication tokens and impose access restrictions wherever possible.
“The biggest opportunistic thing that’s making this operation successful is long-lived credentials in these environments,” he says. “It’s vitally important to change your tokens even if you’re not using LiteLLM or any of these packages that have been compromised. If you have GitLab and GitHub personal access tokens, rotate them. And AWS, Azure, GCP, Alibaba, Oracle, all of these credentials are being taken.”
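One way to keep credentials from becoming long-lived is to audit their age on a schedule. The sketch below is illustrative only: the inventory format, the 30-day cutoff, and the function name are assumptions, not something Quist or Palo Alto Networks prescribes.

```python
from datetime import datetime, timedelta, timezone

def tokens_due_for_rotation(inventory, now, max_age=timedelta(days=30)):
    """Return names of credentials created more than max_age ago, oldest first."""
    stale = [(created, name) for name, created in inventory.items()
             if now - created > max_age]
    return [name for _, name in sorted(stale)]

# Hypothetical inventory: credential name -> creation time (UTC).
now = datetime(2026, 5, 25, tzinfo=timezone.utc)
inventory = {
    "github-pat-ci": datetime(2025, 11, 2, tzinfo=timezone.utc),
    "aws-deploy-key": datetime(2026, 5, 20, tzinfo=timezone.utc),
    "gitlab-pat-bot": datetime(2026, 1, 15, tzinfo=timezone.utc),
}
print(tokens_due_for_rotation(inventory, now))
```

An age audit like this only helps keep credentials short-lived over time. During an active campaign, the advice in the quote is broader: rotate tokens regardless of age.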
TeamPCP’s tidal waves of tainted code also raise hard questions about how to safely use open source software in an era of mounting supply chain attacks. Wiz’s Read recommends safeguards such as “age-gating” updates to open source tools—vetting and installing security updates but otherwise holding off on immediate updates to code that’s been newly published and may be malicious.
In the case of one recent malicious TeamPCP update, Read says Wiz detected the supply chain compromise and warned customers within minutes, but many of the software’s users had auto-updates enabled and had already downloaded it. “You don’t want to just install the freshest version all the time,” Read says.
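The “age-gating” idea can be sketched as a simple selection rule: of all published releases, install the newest one that has been public for at least a minimum period. This is a minimal sketch, not Wiz’s tooling; the seven-day window, the release data, and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone

def newest_aged_release(releases, now, min_age=timedelta(days=7)):
    """Pick the most recently published release that is at least min_age old.

    releases maps a version string to its publication time. Returns None when
    every release is still inside the cool-down window.
    """
    eligible = [(published, version) for version, published in releases.items()
                if now - published >= min_age]
    return max(eligible)[1] if eligible else None

now = datetime(2026, 5, 25, 12, 0, tzinfo=timezone.utc)
releases = {  # hypothetical package history
    "2.3.0": datetime(2026, 4, 30, tzinfo=timezone.utc),
    "2.3.1": datetime(2026, 5, 10, tzinfo=timezone.utc),
    "2.3.2": datetime(2026, 5, 25, 11, 45, tzinfo=timezone.utc),  # 15 minutes old
}
print(newest_aged_release(releases, now))  # 2.3.1
```

The sketch does not model the exception Read describes for vetted security updates, which would need a separate, reviewed fast path.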
Amid an epidemic of supply chain attacks like the ones TeamPCP has unleashed, Socket’s Burckhardt says open-source users will need to take trust-but-verify measures, like analyzing updates for malware before rolling them out across a network, as well as the kind of “cool-down” period that Read recommends before downloading and running code.
“At the point it hits your machine,” Burckhardt says, “it’s already too late.”
This story originally appeared at WIRED.com.
GitHub internal repositories exfiltrated via malicious VS Code extension
GitHub confirmed a breach where 3,800 internal repositories were exfiltrated after a developer installed a malicious VS Code extension.
Deep dive
- GitHub confirmed approximately 3,800 internal repositories were exfiltrated due to a malicious VS Code extension installed by an employee.
- The incident aligns with claims from the TeamPCP hacker group, known for supply chain attacks involving CI/CD credentials.
- GitHub CISO Alexis Wales stated there is no evidence of impact to external customer repositories, but some internal data included customer support interactions.
- GitHub immediately began rotating critical secrets, prioritizing high-impact credentials, and is conducting a full investigation.
- This breach highlights the growing risk in the software supply chain, where malicious developer tools are used as an entry vector.
- The article mentions a separate, swift-response incident where the Nx Console VS Code extension (2.2 million installs) was briefly backdoored, collecting credentials silently.
- Experts, like Sonatype's Ilkka Turunen, emphasize that developers are now permanent targets, and "minimum package and extension ages" could help protect against such attacks.
Decoder
- Software supply chain attack: A cyberattack that targets vulnerabilities in the software development process, often by compromising third-party components or tools used by developers.
- VS Code extension: A program that extends the functionality of Microsoft's Visual Studio Code integrated development environment.
Original article
GitHub has confirmed that around 3,800 internal repositories have been breached, after a developer unwittingly installed a malicious VS Code extension.
The Microsoft-owned code repository and DevOps platform said the breach was detected on Monday, but that the activity involved exfiltration of GitHub-internal repositories only.
"We have no evidence of impact to customer information stored outside of GitHub's internal repositories, such as our customers' own enterprises, organizations, and repositories," said the firm's chief information security officer, Alexis Wales.
"Some of GitHub's internal repositories contain information from customers, for example, excerpts of support interactions. If any impact is discovered, we will notify customers via established incident response and notification channels."
GitHub said it started rotating critical secrets as soon as it discovered the breach, with the highest-impact credentials prioritized first. It is now analyzing logs, validating secret rotation, and monitoring its infrastructure for any follow-on activity, it said, promising a fuller report once it's finished its investigation.
GitHub hasn't explicitly named the attacker, but made reference to a claim by the TeamPCP hacker group that it had accessed around 3,800 repositories, saying that the number was consistent with its investigation so far.
TeamPCP, which first appeared late last year, is the group linked to the Mini Shai-Hulud worm, and carries out supply chain attacks by stealing CI/CD credentials and using them to publish infected versions of further packages.
The group has reportedly not asked for a ransom for the GitHub data, but is offering the stolen data for sale for $50,000, saying that if it doesn't receive an offer, it will leak it for free.
"This is another reminder that developers are now permanent targets in software supply chain attacks. TeamPCP has shown how a motivated attacker can move through the tools developers trust every day – open source packages, extensions, accounts, and credentials – rather than trying to break in through the front door," said Ilkka Turunen, Field CTO at Sonatype.
"Combined with the acceleration we're already seeing from AI-assisted vulnerability discovery, the window between compromise and exploitation is collapsing. The old assumption was that defenders would have time to identify, prioritize, and respond. That margin is disappearing."
The news came just a day after the Nx Console VS Code extension, which has 2.2 million installs, was briefly backdoored, with the malicious version collecting credentials silently when a developer opened a workspace. The issue was handled swiftly, with the extension pulled within 18 minutes on the VS Code Marketplace and 36 minutes on Open VSX.
"The community's ability to catch and remove malicious packages is real. For extensions with millions of installs, it's also insufficient," commented Shaun Brown technical product marketer at Aikido Security.
"Caught in 18 minutes and prevented exposure are not the same thing. Minimum package and extension ages are the best way to protect your devices from similar attacks today."
Designing end-to-end ingress request tracing for multi-tenant SaaS platforms
The CNCF released a framework for end-to-end ingress request tracing in multi-tenant SaaS, emphasizing trace IDs and span IDs to diagnose microservice failures.
Deep dive
- The Cloud Native Computing Foundation (CNCF) published a framework for end-to-end distributed tracing specifically designed for multi-tenant SaaS platforms.
- The core of the framework relies on two identifiers: a "Trace ID" which groups all work for a single customer request across services, and "Span IDs" which identify individual operations within that trace.
- Tracing is treated as a first-class platform capability, not an optional tool, with clear acceptance criteria for observable system outcomes.
- Key design principles include: generating a Trace ID at the ingress layer if not present, consistent context propagation across synchronous and asynchronous calls, and creating parent-child relationships between spans.
- Security is paramount, with trace data explicitly excluding sensitive information (payloads, credentials, PII) by design.
- Telemetry export is configuration-only, decoupling it from application code changes and release cycles.
- Tracing must have non-disruptive failure modes, meaning customer requests complete successfully even if telemetry backends are unavailable.
- The framework leverages industry standards like OpenTelemetry and W3C Trace Context, applicable to Kubernetes environments.
- Organizational challenges, like ensuring complete coverage and consistent adoption across all service teams, are highlighted as more difficult than technical implementation.
Decoder
- Distributed tracing: A method used to monitor requests as they flow through complex microservice architectures, providing a complete view of the request's journey and performance across multiple services.
- Trace ID: A unique identifier that links together all the individual operations (spans) related to a single user request across a distributed system.
- Span ID: A unique identifier for a single operation or unit of work within a trace, showing the duration and details of that specific step.
- Multi-tenant SaaS: A software-as-a-service model where a single instance of the software serves multiple customers (tenants), but each tenant's data is isolated.
- OpenTelemetry: A set of open-source tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help analyze software performance and behavior.
- W3C Trace Context: A World Wide Web Consortium standard that defines HTTP headers to propagate context information across services in a distributed trace.
Original article
Modern SaaS platforms built on cloud‑native architectures frequently consist of dozens of independently deployed microservices. A single customer request entering the platform at the ingress layer may traverse authentication services, orchestration engines, data services, and downstream integrations before completing. When failures or performance regressions occur, platform operators must answer a fundamental question: what happened to this specific request, and where?
In many environments, answering this question remains difficult. Although services emit logs and metrics, these signals are disconnected. Telemetry is produced independently by each service without a shared request context, making it difficult to correlate failures, retries, or latency spikes into an end‑to‑end narrative.
This article presents a product‑led framework for designing ingress request tracing in multi‑tenant SaaS platforms. The focus is on design principles and observable system behavior, not implementation code. The framework builds on industry standards such as OpenTelemetry and W3C Trace Context and is applicable to Kubernetes‑based environments.
The observability problem
Without end‑to‑end tracing, ingress requests cannot be reliably followed as they traverse downstream services. Failures appear as isolated events. Latency regressions are visible only in aggregate metrics. Multi‑service workflows and intermittent issues are especially difficult to diagnose.
Operational teams compensate by manually correlating logs using timestamps, heuristics, and partial identifiers. This approach does not scale with service growth and results in slower diagnosis, higher cognitive load during incidents, and reduced confidence in root cause analysis.
The core challenge is not insufficient telemetry, but the lack of consistent request‑level context linking all operations together.
A product-led framework for ingress request tracing
This framework treats distributed tracing as a first‑class platform capability rather than a service‑level implementation choice. At its core are two complementary identifiers: a Trace ID that groups all work for a single customer request, and Span IDs that identify individual units of work (such as a service call or database query) within that trace.
1. Trace ID and span ID generation and preservation
Every ingress request must have an associated trace identifier. If an incoming request does not contain a trace ID, the ingress layer generates one. If a valid trace ID is already present, it is preserved.
Each service processing the request creates its own span and assigns a unique span ID to that unit of work. When the service makes a downstream call, it passes both the trace ID (unchanged) and its span ID (which becomes the parent span ID for the next service). This creates a parent‑child relationship that allows the observability platform to reconstruct the exact sequence and hierarchy of all operations.
This generate‑or‑preserve rule ensures interoperability with upstream systems while maintaining trace continuity within the platform. Both the trace ID and current span ID are attached to the request context and included in response headers so they can be used as deterministic lookup keys during investigations.
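The generate-or-preserve rule can be sketched in a few lines. This is a minimal illustration rather than the framework's implementation, which the article does not publish. It assumes the trace ID arrives in a W3C traceparent header with a lower-cased header name, and the function name is hypothetical.

```python
import re
import secrets

# W3C Trace Context, version 00: 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def ensure_trace_id(headers):
    """Return the trace ID for a request, preserving a valid incoming one."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    # All-zero IDs are invalid under the W3C specification.
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        return match.group(1)        # preserve the upstream trace ID
    return secrets.token_hex(16)     # generate a new 32-hex trace ID

incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
print(ensure_trace_id(incoming))  # 4bf92f3577b34da6a3ce929d0e0e4736
print(len(ensure_trace_id({})))   # 32
```

Treating a malformed header the same as a missing one keeps the ingress layer from propagating identifiers that downstream tooling cannot index.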
Figure 1: End-to-End trace ID and span ID propagation
In the diagram above, a single Trace ID flows unchanged through all services (auth, orchestration, data layer), representing the customer’s complete request. Each service creates its own Span ID; when Service A calls Service B, it passes both the Trace ID and its own Span ID (which Service B records as its parent). This hierarchy allows operators to see not just that a request failed, but exactly which service and at which point in the sequence.
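To make the parent-child reconstruction concrete, the sketch below rebuilds a call hierarchy from a flat list of recorded spans. The span dictionaries, the short IDs, and the service names are invented for illustration; a real observability backend does this over its own storage format.

```python
from collections import defaultdict

def render_trace(spans):
    """Return indented lines showing the call hierarchy of one trace.

    Each span is a dict with span_id, parent_span_id (None for the root),
    service, and status.
    """
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + f'{span["service"]} ({span["status"]})')
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines

spans = [
    {"span_id": "a1", "parent_span_id": None, "service": "ingress", "status": "ok"},
    {"span_id": "b2", "parent_span_id": "a1", "service": "auth", "status": "ok"},
    {"span_id": "c3", "parent_span_id": "a1", "service": "orchestration", "status": "error"},
    {"span_id": "d4", "parent_span_id": "c3", "service": "data-layer", "status": "error"},
]
print("\n".join(render_trace(spans)))
```

Read top to bottom, the output shows that the error originated under orchestration in the data-layer call, which is the "which service, at which point" question the hierarchy is meant to answer.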
2. Consistent context propagation
All synchronous service‑to‑service calls reuse the same trace ID. Each service creates a new span ID for its own work. Retry operations preserve the original trace ID but may create additional span IDs for each retry attempt, allowing the observability platform to distinguish between the original call and subsequent attempts while keeping them grouped under the same trace.
Where asynchronous processing exists, trace context (both trace ID and parent span ID) is propagated via message metadata to prevent observability gaps as workflows evolve.
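Both propagation rules can be sketched together: retries keep the trace ID but get a fresh span ID per attempt, and asynchronous hops carry the trace context in message metadata. The function names, the dictionary-based span and message shapes, and the list used as a queue are all assumptions made for illustration.

```python
import secrets

def call_with_retries(operation, trace_id, parent_span_id, attempts=3):
    """Run operation up to `attempts` times, recording one span per attempt."""
    spans = []
    for attempt in range(1, attempts + 1):
        span = {
            "trace_id": trace_id,             # unchanged across retries
            "span_id": secrets.token_hex(8),  # fresh 16-hex span ID per attempt
            "parent_span_id": parent_span_id,
            "attempt": attempt,
        }
        spans.append(span)
        try:
            result = operation()
        except Exception:
            span["status"] = "error"
            continue
        span["status"] = "ok"
        return result, spans
    return None, spans

def publish(outbox, payload, trace_id, span_id):
    """Enqueue a message with trace context in metadata, not in the payload."""
    outbox.append({
        "metadata": {"trace_id": trace_id, "parent_span_id": span_id},
        "payload": payload,
    })

def consume(outbox):
    """Dequeue a message and start a consumer span linked to the producer's span."""
    message = outbox.pop(0)
    meta = message["metadata"]
    span = {
        "trace_id": meta["trace_id"],          # same trace across the async hop
        "span_id": secrets.token_hex(8),       # new span for the consumer's work
        "parent_span_id": meta["parent_span_id"],
    }
    return span, message["payload"]
```

Keeping the context in metadata means the payload schema does not change when tracing is added, and consumers that ignore the metadata keep working.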
3. Security-first trace metadata
Trace data is limited to operational metadata only: trace ID, span ID, parent span ID, service name, operation name, timestamps, duration, and execution status.
Request payloads, credentials, secrets, tokens, and personally identifiable information are explicitly excluded by design. Treating data exclusion as a design constraint simplifies security reviews and reduces long‑term compliance risk.
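Exclusion by design is easiest to enforce with an allowlist, so that any field not explicitly approved never reaches the trace backend. The sketch below assumes hypothetical field names that mirror the metadata listed above; the article does not specify a schema.

```python
# Assumed field names for the operational metadata the framework permits.
ALLOWED_FIELDS = frozenset({
    "trace_id", "span_id", "parent_span_id", "service_name",
    "operation_name", "start_time", "end_time", "duration_ms", "status",
})

def to_trace_record(raw):
    """Keep only allowlisted operational fields; everything else is dropped."""
    return {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}

raw = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
    "service_name": "auth",
    "status": "ok",
    "authorization": "Bearer example-token",   # credential: must not be exported
    "request_body": '{"card": "example"}',     # payload: must not be exported
    "user_email": "user@example.com",          # PII: must not be exported
}
print(sorted(to_trace_record(raw)))
```

An allowlist fails safe: a new attribute added by a service team is excluded until someone reviews it, whereas a denylist would export it by default.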
4. Configuration-only telemetry export
Trace export is managed entirely via Kubernetes configuration. Operators can configure exporters, credentials, and routing parameters without application code changes.
This decouples tracing operations from release cycles and allows teams to evolve observability using existing SRE workflows.
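One plausible shape for configuration-only export is a service that reads its exporter settings from environment variables, which Kubernetes can inject from a ConfigMap or Secret. The article does not say which mechanism it uses, so this is an assumption. The variable names below come from OpenTelemetry's SDK configuration conventions, while the dataclass, the function name, and the fallback values are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExportConfig:
    exporter: str
    endpoint: str
    headers: dict

def load_export_config(env):
    """Build exporter settings purely from environment variables."""
    raw_headers = env.get("OTEL_EXPORTER_OTLP_HEADERS", "")
    # Header list is comma-separated key=value pairs.
    headers = dict(pair.split("=", 1) for pair in raw_headers.split(",") if "=" in pair)
    return ExportConfig(
        # Fallback values here are assumptions for the sketch.
        exporter=env.get("OTEL_TRACES_EXPORTER", "otlp"),
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        headers=headers,
    )

env = {  # in a real service this would be os.environ
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.observability:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-tenant=acme,authorization=Bearer abc",
}
print(load_export_config(env))
```

Because the function takes the environment as a parameter, changing the destination is a deployment-manifest edit rather than a code change, and the loader is easy to test.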
5. Non-disruptive failure modes
Tracing must never block request processing. If telemetry backends are unavailable or misconfigured, requests complete successfully. Trace data may be buffered or dropped, but customer experience is unaffected.
Partial traces are acceptable. Failed requests are not.
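A bounded, non-blocking buffer is one common way to get this behavior. The sketch below is an assumption about how it could be built, not the framework's code: spans are dropped when the buffer is full or the backend fails, and the request path never waits on telemetry.

```python
import queue

class BestEffortTraceBuffer:
    """Bounded in-memory buffer: when full, new spans are dropped, never waited on."""

    def __init__(self, capacity=1000):
        self._spans = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def record(self, span):
        try:
            self._spans.put_nowait(span)
        except queue.Full:
            self.dropped += 1  # partial traces are acceptable

    def flush(self, send):
        """Send buffered spans; spans the backend rejects are counted as dropped."""
        sent = 0
        while True:
            try:
                span = self._spans.get_nowait()
            except queue.Empty:
                return sent
            try:
                send(span)
                sent += 1
            except Exception:
                self.dropped += 1

def handle_request(payload, buffer):
    result = payload.upper()  # stand-in for real request handling
    buffer.record({"operation_name": "handle_request", "status": "ok"})
    return result
```

In a real service `flush` would run on a background thread or timer so that a slow backend cannot add latency to requests. The `dropped` counter gives operators a signal that traces are incomplete.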
Acceptance criteria as executable contracts
Clear acceptance criteria define observable system outcomes, not implementation details. In this framework, acceptance criteria act as executable contracts between product management and engineering. Each criterion maps to a specific requirement and is independently testable.
| AC ID | Observable Behavior | Requirement Area |
| --- | --- | --- |
| AC-001 | Every ingress request includes a globally unique trace ID in response headers. Trace IDs already present in incoming requests are preserved and propagated unchanged. | Trace ID Generation & Preservation |
| AC-002 | All platform services processing an ingress request create their own span with a unique span ID. Parent‑child relationships are established through parent span IDs. Retry operations preserve the original trace ID. | Span Creation & Hierarchy |
| AC-003 | Each platform service captures trace-level execution data including trace ID, span ID, parent span ID, service name, operation name, timestamps, duration, status, and HTTP response code. | Trace Data Capture |
| AC-004 | SREs can query traces using a trace ID as a primary lookup key in observability platforms and view the complete execution path with service-to-service relationships via span hierarchies. | Trace Queryability |
| AC-005 | SREs can configure trace export destinations via Kubernetes configuration files without application code changes. Multiple backends and tenant-specific routing are supported. | Config-Only Export |
| AC-006 | Traces exported to observability platforms are visualizable with end-to-end trace views, service dependency graphs, span hierarchies, and latency breakdowns per service and span. | Platform Visualization |
| AC-007 | Tracing does not block or fail requests when the telemetry backend is unavailable. Trace data excludes sensitive payload information, credentials, and PII by design. | Non-Disruptive & Secure |
These criteria prevent partial adoption, reduce ambiguity during implementation, and provide a stable basis for regression validation as the platform evolves.
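As an illustration of what "executable" could mean here, AC-001 can be written as a check that runs against any ingress handler. Everything in this sketch is hypothetical: the `x-trace-id` response header, the stub `ingress` function, and the check itself are stand-ins, since the article does not name headers or test tooling.

```python
import re
import secrets

TRACE_ID = re.compile(r"^[0-9a-f]{32}$")

def ingress(request_headers):
    """Hypothetical ingress stub: returns response headers for a request."""
    trace_id = request_headers.get("x-trace-id")
    if not (trace_id and TRACE_ID.match(trace_id)):
        trace_id = secrets.token_hex(16)
    return {"x-trace-id": trace_id}

def check_ac_001(handler):
    """AC-001: responses always carry a trace ID; incoming IDs are preserved."""
    generated = handler({})["x-trace-id"]
    assert TRACE_ID.match(generated), "missing or malformed trace ID"
    assert handler({})["x-trace-id"] != generated, "trace IDs must be unique"

    incoming = "4bf92f3577b34da6a3ce929d0e0e4736"
    preserved = handler({"x-trace-id": incoming})["x-trace-id"]
    assert preserved == incoming, "incoming trace ID not preserved"

check_ac_001(ingress)
```

Because the check depends only on observable behavior (headers in, headers out), the same function could run against a staging endpoint as a regression test.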
Quantifying business value
Infrastructure initiatives frequently fail because they cannot articulate business value beyond engineering. The value proposition for this type of initiative should be constructed around measurable operational dimensions:
| Value Dimension | Quantified Impact |
| --- | --- |
| Root Cause Identification | Shift from heuristic-based to deterministic tracing via trace and span hierarchies; elimination of manual log correlation |
| Operational Scalability | Observability scales linearly with service count rather than degrading with complexity; span‑level granularity enables micro-service level diagnostics |
Understanding trace and span context
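The W3C Trace Context standard defines how trace information propagates across services. It specifies two HTTP headers: traceparent carries the essential identifiers, and tracestate carries vendor-specific metadata. The traceparent header format is version-trace-id-parent-id-trace-flags, where the parent-id field holds the span ID of the caller. The value 00-abc123-def456-01 is an abbreviated illustration of that layout; in a real header such as 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, the trace ID is 32 lowercase hexadecimal characters, the span ID is 16, and the version and flags are 2 each.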
Trace ID: Globally unique identifier that groups all spans belonging to a single customer request. Unchanged as the request flows through all services. Enables support teams to look up the entire request path.
Span ID: Unique identifier for a single unit of work (e.g., API call, database query). Each service creates its own span ID. When making downstream calls, the current span ID becomes the parent span ID for the next service, establishing a parent‑child relationship.
Parent Span ID: The span ID of the calling service. Used to reconstruct the sequence and hierarchy of operations. Allows the observability platform to display which service called which service and in what order.
Together, trace ID and span hierarchy enable operators to ask not just ‘did this request fail’ but ‘exactly where in the sequence did it fail, and what was the sequence of calls that led to that point.’
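The relationship between these three identifiers shows up clearly when a service parses an incoming traceparent header and builds the one it sends downstream. The sketch below is a simplified validator that accepts only version 00 of the header; the class and function names are assumptions.

```python
import secrets
from typing import NamedTuple

class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str

def parse_traceparent(value):
    """Split a version-00 traceparent header into fields, or return None if malformed."""
    parts = value.strip().split("-")
    if len(parts) != 4:
        return None
    version, trace_id, parent_id, flags = parts
    lengths_ok = (len(trace_id), len(parent_id), len(flags)) == (32, 16, 2)
    hex_ok = all(c in "0123456789abcdef" for c in "".join(parts))
    if version != "00" or not (lengths_ok and hex_ok):
        return None
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceParent(version, trace_id, parent_id, flags)

def outgoing_traceparent(incoming, own_span_id):
    """Header for a downstream call: same trace ID, the caller's own span ID as parent."""
    return f"{incoming.version}-{incoming.trace_id}-{own_span_id}-{incoming.flags}"

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
own_span_id = secrets.token_hex(8)  # this service's span ID, 16 hex characters
print(outgoing_traceparent(ctx, own_span_id))
```

Shortened illustrative values such as 00-abc123-def456-01 fail this validation because real trace and span IDs are fixed-length.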
Operational impact
Ingress request tracing shifts troubleshooting from inference to direct observation. Engineers can follow individual requests across services instead of reconstructing behavior from disconnected signals. With trace and span IDs, the entire execution path is visible: which services were called, in what order, and how much time each spent.
The qualitative benefits are immediate and significant: faster localization of failures through trace ID lookup and span hierarchy analysis, clearer cross‑team communication using shared trace references instead of symptom descriptions, reduced cognitive load during incidents as SREs observe the exact sequence rather than hypothesize, and proactive performance management through per‑service and per‑span latency decomposition.
For small SRE teams supporting complex platforms, these improvements are transformative. A single SRE with a trace can achieve what previously required a cross‑team war room.
The hardest part is not technical
The most underestimated challenge in any tracing initiative is organizational, not technical. A distributed tracing system is only as complete as its coverage. If three out of eight services in a request path propagate trace context and five do not, the result is a trace with large gaps that is operationally unreliable. Worse, broken span‑parent relationships make the hierarchy useless.
The solution combines technical enforcement with organizational process: automated CI/CD checks that reject deployments without trace instrumentation and proper span creation, a documented onboarding checklist for every service team, and sustained adoption tracking until 100% propagation is achieved. Without this sustained attention, adoption stalls at the teams that opt in voluntarily, leaving critical gaps in exactly the services where tracing is most needed.
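Adoption tracking of this kind can be reduced to two measurements per trace: what fraction of the expected services emitted a span, and which spans point at a parent that was never recorded. The sketch below works through the three-of-eight example; the service names, span shapes, and function names are invented for illustration.

```python
def propagation_coverage(expected_services, spans):
    """Fraction of services on a request path that emitted a span for this trace."""
    seen = {span["service"] for span in spans}
    covered = [service for service in expected_services if service in seen]
    return len(covered) / len(expected_services)

def orphan_spans(spans):
    """Spans whose parent span ID is absent from the trace (broken hierarchy)."""
    known = {span["span_id"] for span in spans}
    return [span for span in spans
            if span["parent_span_id"] is not None and span["parent_span_id"] not in known]

path = ["ingress", "auth", "orchestration", "billing",
        "catalog", "data", "notify", "export"]
spans = [
    {"service": "ingress", "span_id": "a1", "parent_span_id": None},
    {"service": "auth", "span_id": "b2", "parent_span_id": "a1"},
    # "data" was called by orchestration, which never recorded its span "c3".
    {"service": "data", "span_id": "d4", "parent_span_id": "c3"},
]
print(propagation_coverage(path, spans))               # 0.375
print([s["service"] for s in orphan_spans(spans)])     # ['data']
```

A CI or dashboard gate could fail any request path whose coverage is below 1.0 or whose orphan list is non-empty, which turns the 100% propagation target into something that can be tracked automatically.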
Replicating this framework
This framework is designed to be replicable across any multi‑service SaaS platform running on container orchestration infrastructure. The design principles—generate or preserve trace IDs, create unique span IDs per service with parent‑child relationships, capture only operational metadata including span IDs, export through configurable backends, and degrade gracefully—are architecture‑agnostic and applicable regardless of the specific microservices framework, programming languages, or observability backend in use.
Organizations considering adoption should pay particular attention to two areas: failure mode design (ensuring tracing cannot cause outages) and organizational adoption strategy (ensuring complete service coverage through both technical enforcement and process). These are the most common points of failure in distributed tracing deployments and the areas where published guidance is most sparse.
Natural extensions include expanding to asynchronous message‑based workflows, implementing intelligent sampling strategies, correlating trace and span data with infrastructure‑level signals, and ultimately leveraging historical span patterns for predictive operations.
Conclusion
Distributed tracing is foundational to operating cloud‑native platforms at scale, but tooling alone is insufficient. By treating tracing as a product capability with clear guarantees, acceptance criteria as executable contracts, and failure‑mode discipline, platforms can deliver reliable request‑level visibility without compromising security or availability.
The gap in our industry is not in tracing tools—OpenTelemetry, Jaeger, Zipkin, and commercial platforms have solved the instrumentation and visualization layers. The gap is in the product and operational decisions required to deploy tracing successfully: how to scope it, how to secure it, how to make it operator‑friendly, how to ensure complete adoption, how to establish span hierarchies that reveal the true sequence of operations, and how to measure its impact. That is the gap this framework addresses.
Migrating from Go to Rust
A guide for Go teams migrating to Rust highlights Rust's stronger compile-time guarantees, like memory safety and explicit error handling, as a trade-off for its steeper learning curve and slower compile times.
Deep dive
- The guide by Matthias Endler is specifically for Go teams considering migrating backend services to Rust.
- It notes that Go and Rust both offer static typing and strong concurrency, but diverge on compiler guarantees and runtime control.
- Rust enforces memory management, data-race prevention, and error handling through its type system (ownership, Send/Sync, Result, Option), whereas Go relies on runtime checks and conventions.
- Key pain points in Go that drive migration include verbose error handling (if err != nil), nil pointer panics, and runtime data races (go test -race isn't exhaustive).
- Rust's Option and Result types force explicit handling of absence and errors, eliminating entire categories of runtime bugs.
- Rust's monomorphized generics offer zero-cost abstractions, unlike Go's generics which can have performance implications and feel "tacked on."
- Go's garbage collector, while excellent, can cause P99 latency spikes under heavy memory pressure, a non-issue for Rust.
- The "borrow checker" is highlighted as the primary challenge for Go developers moving to Rust, enforcing memory safety and aliasing rules at compile time.
- Compile times for Rust are generally longer than Go's, but incremental builds and cargo check are efficient.
- Go's freedom from "function coloring" (it has no explicit async/await) is an ergonomic advantage that Rust's explicit async model gives up.
- Recommended migration strategies include carving off "hot path" services, replacing sidecar/worker processes, or using a strangler pattern behind an API gateway, rather than full rewrites.
- Rust typically offers 20-40% CPU improvement and 30-50% memory reduction over Go, along with flatter P99 latency.
- The author also notes that Go remains excellent for Kubernetes tooling, CLI utilities, and simple glue services where velocity outweighs absolute correctness.
Decoder
- Monomorphization: A compilation technique where generic code is specialized for each specific type it's used with, resulting in unique machine code for each instantiation and no runtime overhead.
- Borrow checker: A component of the Rust compiler that enforces strict rules about how references (borrows) to data can be used, ensuring memory safety and preventing data races at compile time.
- Nil pointer panic: A runtime error in Go (and other languages) that occurs when a program attempts to dereference a pointer that has a null value, leading to a program crash.
- Data race: A concurrency bug that occurs when two or more threads or goroutines access the same memory location concurrently, at least one of the accesses is a write, and there is no synchronization to control the order of accesses.
- P99 latency: The 99th percentile of response times, meaning 99% of requests are processed within this latency or faster, indicating the performance for the vast majority of users.
Original article
Full article content is not available for inline reading.
Is your SIEM actually ready? A new way to find out
Elastic Security 9.4 introduces "SIEM Readiness," a new feature providing a centralized, automated view of SIEM operational health, evaluating log coverage, data quality, and retention across key telemetry domains.
Deep dive
- SIEM Readiness is a new capability in Elastic Security, available in technical preview as of version 9.4.
- It aims to provide a centralized, continuously updated, and actionable view of SIEM operational health.
- The initial focus is on "Visibility Health," which assesses whether the underlying data is present, correct, flowing, and retained.
- It organizes the view around five core telemetry domains: Endpoint/Host, Identity, Network, Cloud, and Application/SaaS.
- Four key dimensions are evaluated: Coverage, Quality, Continuity, and Retention.
- Coverage: Checks if enabled detection rules have the required data sources, and assesses overall coverage against baselines like MITRE ATT&CK, NIST CSF, and CIS benchmarks, tailored to the environment.
- Quality: Flags ECS incompatibilities in data that could cause rules or dashboards to fail silently.
- Continuity: Monitors pipeline failure rates, flagging anything above a 1% threshold.
- Retention: Evaluates retention policies against industry benchmarks (FedRAMP, NIST 800-53, SOC 2, ISO 27001) across hot, warm, and cold storage.
- The feature is designed for action, with every signal tied to a concrete next step (onboard data, fix pipeline, adjust policy, create case).
- It's environment-aware, excluding categories that don't apply, and telemetry-driven, inferring the environment from the data rather than requiring manual configuration.
- Elastic plans to extend SIEM Readiness to "Detection Readiness" (are rules effective?) and "Response Readiness" (are workflows operational?).
Decoder
- SIEM (Security Information and Event Management): A software solution that aggregates and analyzes security alerts and logs from various sources across an organization's IT infrastructure to provide a centralized view of security events and help detect threats.
- Elastic Security: Elastic's platform for security operations, which includes SIEM capabilities.
- ECS (Elastic Common Schema): An open source specification that defines a common set of fields for storing event data in Elasticsearch, making it easier to analyze data from disparate sources consistently.
- MITRE ATT&CK: A globally accessible knowledge base of adversary tactics and techniques based on real-world observations, used as a foundation for the development of specific threat models and methodologies.
- NIST CSF (Cybersecurity Framework): A set of guidelines for private sector organizations to improve their cybersecurity posture, developed by the U.S. National Institute of Standards and Technology.
- CIS benchmarks: A set of configuration guidelines for securely configuring operating systems, servers, applications, and network devices, developed by the Center for Internet Security.
Original article
Full article content is not available for inline reading.
The 58-Million-Key Freeze: What a HashMap Resize Taught Us About Memory Allocation at Scale
LinkedIn's Rust-based FishDB service froze for 10-15 seconds when a HashMap resized at 58.7 million keys: the resulting allocation took the process-wide mmap_lock and blocked all other threads.
Deep dive
- LinkedIn's FishDB service, a Rust application using jemalloc and Tokio, experienced recurring 10-15 second freezes, breaching availability SLOs.
- The problem was elusive: ephemeral, silent (no logs), sporadic, and without obvious external triggers.
- Correlation with RSS spikes led to suspicion of memory allocation issues.
- Traditional CPU profiling was ineffective as threads were blocked (off-CPU).
- An automated eBPF-based off-CPU profiling script was deployed to capture kernel stack traces during freezes.
- Off-CPU profiles revealed threads blocked on rwsem_down_write_slowpath (write lock for mmap) and rwsem_down_read_slowpath (read lock for madvise and page faults).
- This pointed to contention on the Linux kernel's process-wide mmap_lock (VMA semaphore), which protects virtual memory area data structures.
- A large mmap allocation (requiring a write lock) blocked all other threads needing mmap_lock in read mode.
- The HashMap pkey_vs_docref (document reference index) was found to be the culprit. It held 56-59 million entries.
- At exactly 58,720,256 keys, the HashMap capacity doubled from ~1.75 GB to ~3.5 GB, requiring both buffers to coexist (~5.25 GB in total, which led to the observed ~4 GB RSS spike).
- The fix involved pre-allocating the HashMap with HashMap::with_capacity(base_index_size …) to a sufficient size at startup, avoiding dynamic resizing.
- This prevented the mmap_lock contention and eliminated freezes.
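The 58,720,256 figure can be reproduced with simple arithmetic. The sketch below assumes the table follows the growth policy of hashbrown (the implementation behind Rust's standard HashMap): a power-of-two bucket count with at most 7/8 of the buckets occupied. That policy and both function names are assumptions brought in for illustration; the sizes in GB are the ones quoted above.

```python
def max_items_before_resize(buckets: int) -> int:
    """Largest item count a table with `buckets` slots holds before growing.

    Assumes a 7/8 maximum load factor (hashbrown's policy for tables with
    at least 8 buckets); this is an assumption, not stated in the article.
    """
    return buckets // 8 * 7


def peak_gb_during_resize(old_gb: float) -> float:
    """Old and new (doubled) buffers coexist while entries are moved."""
    return old_gb + 2 * old_gb


buckets = 1 << 26                      # 67,108,864 slots
threshold = max_items_before_resize(buckets)
print(f"{threshold:,}")                # 58,720,256
print(peak_gb_during_resize(1.75))     # 5.25
```

Under that assumption, the next insert after 58,720,256 entries forces the table to double, which is why pre-sizing at startup removes the large mid-flight allocation.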
Decoder
- mmap_lock: A process-wide read-write semaphore in the Linux kernel that protects the virtual memory area (VMA) data structures. Operations modifying the virtual address space (like large memory allocations or deallocations) require this lock, causing contention if held for too long.
- eBPF (extended Berkeley Packet Filter): A Linux kernel technology that allows programs to run in a sandboxed environment within the kernel, enabling powerful, flexible, and safe kernel-level tracing and profiling without modifying kernel source code.
- jemalloc: A general-purpose memory allocator that emphasizes fragmentation avoidance and scalable concurrency. It is used by many large-scale applications.
- Tokio: An asynchronous runtime for the Rust programming language, providing the necessary tools to build network applications and services.
- madvise: A system call that advises the kernel on how to handle a process's memory regions. MADV_DONTNEED is often used to tell the kernel that memory pages are no longer needed and can be reclaimed.
- RSS (Resident Set Size): The portion of a process's memory that is held in RAM (not swapped out).
Original article
Full article content is not available for inline reading.
Plan Mode All the Time, Substrait over SQL, and the End of the DE Role ft
Chris Riccomini argues AI agents, when used with "plan mode" and declarative workflows, are already capable of most data engineering, suggesting a future where data engineers become general "data" roles and LLMs prefer formats like Substrait over SQL.
Deep dive
- Chris Riccomini believes AI can handle the majority of data engineering work, especially with declarative workflows and strong quality gates.
- For financial data, correctness is maintained by defining invariants and using traditional verification tools, as well as pairing AI with human review for bug spotting.
- He advocates for LLMs to "speak" Substrait, a format representing physical data transformations (e.g., hash join vs. merge join), rather than SQL.
- Substrait could lead to fewer LLM hallucinations and allow for client-side query optimization.
- To make AI output more reliable, Chris recommends "plan mode all the time," where LLMs iterate extensively on a plan before implementation.
- Managing context by starting with fresh LLM contexts or using "Ralph Loops" (iterative autonomous AI development with external tests) can improve reliability.
- Implementing strong quality gates (defining, measuring, enforcing quality) is crucial, like enforcing test coverage with commit hooks.
- Non-determinism from LLMs can be mitigated by moving to incremental data loads, reducing the scope of potential errors.
- Security concerns for AI agents are high, with a need for "Okta for Agents" (identity/access management) to manage skills, marketplaces, lineage, and RBAC/ABAC.
- Agents are already good at inspecting failed workflows, running SQL queries, and writing Python, and could automate much of the "grunt work" of data engineering.
- The "data engineer" role may merge into a broader "data" role encompassing engineering, ML, and analysis, as tools become more agent-friendly.
- The choice of programming language may shift from human ergonomics to "agent ergonomics," favoring languages that lead to faster, cheaper, and more stable LLM output (e.g., Go over Python due to token cost/code size).
- He suggests that while AI might reduce some rote learning, it enables tackling more complex projects and learning about new domains (like FFI bindings) that would otherwise be too time-consuming.
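The "Ralph Loop" plus quality-gate idea above can be sketched in a few lines. This is a minimal illustration, not Riccomini's tooling: `run_agent` and `run_tests` are hypothetical callables supplied by the caller, and no real agent API is assumed.

```python
from typing import Callable


def ralph_loop(
    goal: str,
    run_agent: Callable[[str], None],   # hypothetical: one fresh-context agent run
    run_tests: Callable[[], bool],      # hypothetical: external quality gate
    max_iterations: int = 10,
) -> int:
    """Re-prompt the agent with the same goal until the tests pass.

    Returns the number of agent runs used, or raises if the gate never passes.
    """
    for attempt in range(1, max_iterations + 1):
        run_agent(goal)          # same prompt every time; state lives in the repo
        if run_tests():          # the gate, not the agent, decides when to stop
            return attempt
    raise RuntimeError(f"tests still failing after {max_iterations} runs")


# Toy stand-ins: the "agent" fixes one bug per run, the "tests" pass at zero bugs.
bugs = {"remaining": 3}
runs = ralph_loop(
    "make the test suite pass",
    run_agent=lambda goal: bugs.update(remaining=bugs["remaining"] - 1),
    run_tests=lambda: bugs["remaining"] == 0,
)
print(runs)  # 3
```

The design point is that the stopping condition is an external check rather than the model's own claim of success, which is what makes the loop a quality gate.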
Decoder
- Substrait: An emerging open standard that provides a cross-language serialization format for relational algebra expressions. It can represent both logical and physical query plans, allowing for more precise data transformation instructions than pure SQL.
- Plan Mode: An approach to interacting with AI where the LLM is guided to first generate a detailed, iterative plan for a task, which is then refined and approved by a human, before the LLM proceeds with implementation.
- Ralph Loop: An iterative, autonomous AI development technique where a bash loop (or similar mechanism) repeatedly prompts an AI agent with the same goal, forcing it to persistently iterate and fix errors until external tests pass.
- LLM (Large Language Model): A type of artificial intelligence program designed to understand and generate human-like text, often trained on vast amounts of text data.
- Declarative Workflows: A programming paradigm where you describe what you want the program to achieve, rather than how to achieve it (as in imperative programming). This allows the underlying system to determine the best execution strategy.
- Agent Ergonomics: The idea of designing programming languages, tools, and workflows to be optimized for AI agents to use, rather than primarily for human developers.
Original article
Full article content is not available for inline reading.
pg_infer 1.0.0 released -- transformer model knowledge as SQL relations
pg_infer 1.0.0 is a new PostgreSQL 18+ extension that exposes small transformer model internals as SQL-queryable relations, enabling efficient, costed, and parallelized inference directly within the database.
Deep dive
- pg_infer 1.0.0 is a PostgreSQL 18+ extension that exposes transformer model internals as SQL-queryable relations.
- It treats the model as a first-class data source, allowing the PostgreSQL planner to cost, schedule, and parallelize inference as an operator within a query plan.
- The extension provides functions like describe(entity) to get learned relations, walk(prompt) for per-layer activations, and implies(a, b) for directional support.
- It includes a custom index access method supporting ORDER BY <~> for model-aware document ranking without pre-computed embeddings.
- Unlike pgvector or RAG-style integrations, pg_infer stores the model itself in WAL-logged 8KB pages, enabling full database backup, replication, and point-in-time recovery for model state.
- Optimized for CPU execution using BLAS (OpenBLAS) and f16 gate vectors, it specifically supports Microsoft BitNet b1.58 models, which are efficient on commodity CPUs.
- A remote backend (larql-server) allows offloading inference to idle PostgreSQL replica hosts, utilizing existing hardware capacity.
- The project is based on Chris Hayuk's LARQL project, which pioneered the idea of queryable transformer internals.
Decoder
- Transformer model: A deep learning model architecture, particularly effective for processing sequential data like natural language, known for its attention mechanism.
- Gate activations: Internal numerical values within a neural network layer that determine the flow of information.
- Feature labels: Metadata associated with learned features in a model.
- Embeddings: Numerical representations of concepts, words, or entities in a continuous vector space.
- BitNet b1.58: A family of "two-bit / 1.58-bit" ternary-weight transformer models developed by Microsoft, designed for high quality on commodity CPUs with dramatically lower memory and power costs.
- vindex: A format for extracting and storing transformer model knowledge (gate vectors, feature activations, learned associations) developed by the LARQL project.
- WAL-logged pages: Data pages whose changes are recorded in PostgreSQL's Write-Ahead Log, ensuring durability and recoverability.
- Index Access Method (AM): A PostgreSQL mechanism that defines how a specific type of index is stored and accessed.
- BLAS (Basic Linear Algebra Subprograms): A specification that defines a set of low-level routines for common linear algebra operations, optimized for performance.
Original article
Full article content is not available for inline reading.
Same buffers, same instructions, same hardware. Where Is the JVM Tax?
Semyon Sinchenko's benchmarks show modern Java running vectorized arithmetic kernels over Apache Arrow buffers delivers performance comparable to native arrow-rs, challenging the "JVM tax" narrative for analytical workloads.
Deep dive
- Semyon Sinchenko benchmarked Java vs. native performance for simple vectorized arithmetic kernels over Apache Arrow buffers.
- The Java implementation used the official Apache Arrow Java SDK (16.1.0), JDK 25.0.3-temurin, java.lang.foreign.MemorySegment, and the JDK Vector API.
- The native reference used arrow-rs (56) and the Criterion benchmark harness.
- The hardware was a 13th Gen Intel Core i5-1335U.
- For MulFloat64, Java and native arrow-rs showed "roughly the same performance class," with ratios typically between 1.13x and 1.25x in favor of native or Java depending on dataset size.
- Sinchenko attributes performance differences to cache effects, memory bandwidth, and CPU behavior rather than an inherent "JVM tax."
- The AddInt32 benchmark showed a larger gap favoring Java, which Sinchenko attributes to a semantic mismatch (Java's wrapping arithmetic vs. arrow-rs's checked arithmetic preventing vectorization).
- He argues that the "JVM tax" phrase often misattributes overheads from frameworks like Spark (scheduler, shuffle, spill) or object-per-value data models to the JVM itself.
- The benchmark specifically uses Apache Arrow to ensure a columnar memory layout, avoiding the "object layout tax" or "GC-visible object graph tax."
- Sinchenko highlights JVM benefits like dynamic code loading, cross-platform portability (e.g., Xeon to Graviton), automatic CPU dispatch, memory safety, and a unified operational surface for metrics and debugging.
- The experiment is intentionally narrow, not addressing complexities like Decimals, Strings, Nested types, Hash aggregation/joins, Parquet I/O, or end-to-end query execution.
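The AddInt32 mismatch is easier to see side by side. The following sketch emulates the two 32-bit semantics in Python; the function names are hypothetical and the code is not taken from either benchmark. It only illustrates why a checked add carries a per-element range test that a wrapping add does not.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrapping_add_i32(a: int, b: int) -> int:
    """Two's-complement wraparound, as Java's `int` addition behaves."""
    return (a + b + 2**31) % 2**32 - 2**31


def checked_add_i32(a: int, b: int) -> int:
    """Overflow is an error, so every element needs a range check."""
    total = a + b
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError(f"{a} + {b} overflows int32")
    return total


print(wrapping_add_i32(INT32_MAX, 1))   # -2147483648
print(checked_add_i32(2, 3))            # 5
```

Comparing the two kernels therefore measures different work, which fits Sinchenko's reading that the gap reflects semantics rather than the runtime.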
Decoder
- JVM Tax: A pejorative term implying an inherent performance penalty associated with running code on the Java Virtual Machine compared to native execution.
- Apache Arrow: A language-agnostic columnar memory format for in-memory data processing, designed for high-performance analytical workloads.
- MemorySegment: A Java API from java.lang.foreign for accessing contiguous regions of memory outside the Java heap, enabling efficient interaction with native memory.
- JDK Vector API: A Java API for performing vectorized computations (SIMD instructions) on arrays of primitive types, improving performance for data-parallel operations.
- JMH (Java Microbenchmark Harness): A Java tool for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java.
- arrow-rs: The official Rust implementation of Apache Arrow.
- Columnar memory layout: A data storage arrangement where all values for a single column are stored contiguously, optimizing for analytical queries.
Original article
Full article content is not available for inline reading.
SAM 3: Segment Anything with Concepts (GitHub Repo)
Meta Superintelligence Labs released SAM 3, a new unified foundation model for image and video segmentation that significantly improves open-vocabulary concept segmentation from text or visual prompts, outperforming SAM 2.
Deep dive
- SAM 3 is a new unified foundation model for promptable segmentation in images and videos, developed by Meta Superintelligence Labs.
- It significantly improves upon its predecessor, SAM 2, by introducing the ability to exhaustively segment all instances of open-vocabulary concepts specified by text or visual prompts.
- SAM 3 achieves 75-80% of human performance on the new SA-Co benchmark, which contains over 270,000 unique concepts.
- This breakthrough is powered by an innovative data engine that automatically annotated over 4 million unique concepts for training.
- The model features a new architecture with a "presence token" for better discrimination between similar text prompts and a decoupled detector–tracker design to minimize task interference.
- SAM 3.1, released on March 27, 2026, introduces "Object Multiplex" for faster joint multi-object tracking.
- SAM 3 requires Python 3.12+ and PyTorch 2.7+ with CUDA 12.6+.
- Access to model checkpoints needs to be requested on the SAM 3 Hugging Face repo.
- The project provides examples for image and video segmentation, batched inference, and using SAM 3 as an agent.
- Two new image benchmarks (SA-Co/Gold, SA-Co/Silver) and one video benchmark (SA-Co/VEval) are released.
- The model has 848 million parameters and consists of a detector and a tracker sharing a vision encoder.
Decoder
- Foundation model: A large AI model trained on a vast quantity of data that can be adapted to a wide range of downstream tasks.
- Promptable segmentation: The ability of an image segmentation model to identify and segment objects in an image based on various prompts, such as text descriptions, points, bounding boxes, or masks.
- Open-vocabulary concept segmentation: The ability to segment objects described by any arbitrary text phrase, rather than being limited to a predefined set of categories.
- SA-Co benchmark: A new benchmark (Segment Anything with Concepts) introduced with SAM 3, containing 270,000 unique concepts for evaluating open-vocabulary segmentation performance.
- Presence token: A new architectural component in SAM 3 designed to improve the model's ability to distinguish between closely related text prompts.
- Decoupled detector–tracker design: An architectural approach where the object detection and object tracking components are designed to operate independently, reducing interference and improving scalability.
- DETR (Detection Transformer): A transformer-based object detection model.
- MLLM (Multi-modal Large Language Model): A large language model capable of processing and understanding multiple types of data, such as text and images.
Original article
Full article content is not available for inline reading.
Cloud Native Computing Foundation Announces OpenTelemetry's Graduation, Solidifying Status as the De Facto Observability Standard
OpenTelemetry has officially reached graduated status in the CNCF, solidifying its position as the de facto vendor-neutral standard for collecting metrics, logs, and traces.
Deep dive
- OpenTelemetry, formed in 2019 as a merger of OpenTracing and OpenCensus, reached graduated status in the Cloud Native Computing Foundation (CNCF) on May 21, 2026.
- Graduation signifies that OpenTelemetry is a stable, production-ready, and vendor-neutral open-source observability framework for collecting metrics, logs, and traces.
- The project has over 12,000 contributors from over 2,800 companies and the second-highest project velocity in the CNCF ecosystem, behind Kubernetes.
- Widespread adoption includes major organizations like Alibaba, Anthropic, Bloomberg, Capital One, eBay, FICO Software, and Heroku.
- Download numbers for its JavaScript API and Python API packages surpassed 1.36 billion and 1.3 billion respectively in the past 12 months, setting new monthly records in April 2026.
- OpenTelemetry helps solve tool fragmentation by providing a single set of APIs, SDKs, a Collector agent, and semantic conventions, allowing organizations to switch observability backends without re-instrumenting code.
- Its maturity is supported by a third-party independent security audit and formal governance review.
- The project is also gaining interest for observing AI workloads, including performance, reliability, accuracy, and trustworthiness.
- It integrates deeply with other CNCF projects like Kubernetes, Fluentd, Jaeger, and Prometheus, and has been adopted by other Linux Foundation projects like Cloud Foundry and OpenSearch.
- Supporters like Austin Parker (Honeycomb.io), Morgan McLean (Splunk/Cisco), Michele Mancioppi (Dash0), Bob Quillin (ControlTheory), Gordon Radlein (Datadog), Richard Seroter (Google Cloud), Ted Young (Grafana Labs), Christine Yen (Honeycomb), Brendan Burns (Microsoft Azure), Juraci Paixão Kröhling (OllyGarden), and Ben Sigelman (Lightstep) provided supportive quotes, emphasizing its industry impact and shift from vendor-specific telemetry to shared standards.
Decoder
- Cloud Native Computing Foundation (CNCF): A Linux Foundation project that fosters and sustains an ecosystem of open source, vendor-neutral projects for cloud-native software.
- Observability: The ability to understand the internal state of a system by examining its external outputs, typically through metrics, logs, and traces.
- Telemetry data: Data collected from a remote or inaccessible source, including metrics (numerical measurements), logs (timestamped event records), and traces (records of requests flowing through distributed systems).
- OpenTracing: A deprecated CNCF project that provided a vendor-neutral API for distributed tracing.
- OpenCensus: A deprecated Google-led project for metrics and distributed tracing.
- Special Interest Group (SIG): A community group within an open-source project focused on a specific area, such as a programming language or component.
- Project Velocity: A metric used by CNCF to gauge the activity, growth, and adoption of its projects, often measured by contributions, commits, and community engagement.
Original article
Full article content is not available for inline reading.
7 Temporal Blind Spots Breaking Enterprise RAG
Enterprise RAG systems are often undermined by "temporal blind spots," failing to provide accurate, up-to-date information because they ignore the critical dimension of time.
Deep dive
- Enterprise RAG systems commonly fail due to "temporal blind spots," which occur when the architecture ignores the recency or temporal context of information.
- An example cited is a hedge fund RAG assistant providing two-week-old Federal Reserve information, leading to a multi-million dollar trading error.
- Stale Indexes: 61% of RAG pipelines refresh daily or less often, but 73% of users expect information no older than six hours for time-critical queries, creating a significant gap.
- Time-Blind Embeddings: Embeddings excel at semantic similarity but often fail to encode temporal proximity, leading systems to retrieve older, semantically similar documents over newer, more relevant ones.
- Query-to-Context Time Mismatch: Implicit temporal references in user queries (e.g., "current policy") are often ignored, causing the system to retrieve irrelevant old information.
- Temporal Hallucination: RAG systems can faithfully reproduce facts that were true at one point but are now outdated, especially when ingesting historical archives.
- Evaluation Gaps: Traditional RAG evaluation metrics do not account for temporality, meaning systems can score high while systematically providing outdated answers.
- Chunking Against the Clock: Chunking strategies that prioritize semantic coherence can inadvertently destroy temporal narrative, breaking chronological cause-and-effect within documents.
- Cost Overruns from Real-Time Retrieval Pipelines: Addressing freshness often increases compute costs; a tiered freshness model (high, medium, low urgency data) is recommended to balance accuracy and cost.
- The article emphasizes that these issues are not edge cases but default failure modes if time is not explicitly designed into RAG systems, providing actionable engineering patterns to mitigate them.
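One common way to counter time-blind embeddings is to blend semantic similarity with a recency weight at ranking time. The sketch below is an illustration under stated assumptions, not one of the article's own patterns: the exponential decay, the 24-hour half-life, the 0.7 similarity weight, and all names are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def recency_weight(doc_time: datetime, now: datetime, half_life_hours: float) -> float:
    """Exponential decay: 1.0 for a brand-new document, 0.5 after one half-life."""
    age_hours = max((now - doc_time).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def time_aware_score(similarity: float, doc_time: datetime, now: datetime,
                     half_life_hours: float = 24.0, alpha: float = 0.7) -> float:
    """Blend semantic similarity with freshness; alpha weights similarity."""
    return alpha * similarity + (1 - alpha) * recency_weight(doc_time, now, half_life_hours)


now = datetime(2026, 5, 25, tzinfo=timezone.utc)
candidates = [
    ("two-week-old note", 0.92, now - timedelta(days=14)),
    ("this morning's note", 0.88, now - timedelta(hours=3)),
]
ranked = sorted(candidates, key=lambda c: time_aware_score(c[1], c[2], now), reverse=True)
print([name for name, _, _ in ranked])
```

With these illustrative weights the fresher note outranks the slightly more similar but two-week-old one. In a tiered freshness model, the half-life would presumably differ per urgency tier.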
Decoder
- Retrieval Augmented Generation (RAG): An AI architecture that enhances large language models (LLMs) by retrieving information from an external knowledge base to ground its responses, aiming to reduce hallucinations and provide up-to-date information.
- Vector Store/Index: A specialized database optimized for storing and querying vector embeddings, which represent data points (like text chunks) as numerical vectors in a high-dimensional space.
- Embedding: A numerical representation of text (or other data) in a vector space where semantically similar items are mapped closer together.
- Cosine Similarity: A measure of similarity between two non-zero vectors in an inner product space, commonly used to determine how similar two documents or embeddings are.
- Hallucination (AI): When an AI model generates information that is plausible-sounding but factually incorrect or unsupported by its training data or retrieved context.
- RAGAS: A set of metrics and tools specifically designed for evaluating the quality of Retrieval Augmented Generation (RAG) systems.
- Chunking: The process of dividing a large document into smaller, manageable pieces (chunks) before creating embeddings and storing them in a vector database, to improve retrieval relevance and fit within LLM context windows.
- Temporal Blind Spot: A failure mode in AI systems, specifically RAG, where the system overlooks or incorrectly handles the time dimension of information, leading to outdated or contextually irrelevant responses.
Original article
Full article content is not available for inline reading.
Leading Design Through the AI Shift
Slack's VP of Product Design, Will Miner, reports that 70 designers are now leveraging AI for rapid prototyping and building internal tools, shifting focus to faster customer value delivery while stressing human judgment.
Deep dive
- AI has significantly transformed design workflows at Slack, with approximately 70 designers now using it.
- Designers are leveraging coding agents for tasks like data analysis, rapid prototyping, and building custom internal tools, even fixing UI bugs themselves.
- This has blurred traditional boundaries between design and engineering roles, with designers creating executive demos in code rather than just Figma mockups.
- Slack's VP of Product Design, Will Miner, advocates for leadership guided by principles of curiosity, skepticism, and owning the design judgment at the table.
- Miner encourages designers to personally try AI tools to discern hype from reality, but without mandates or quotas, allowing teams to explore freely.
- He stresses that while LLMs can generate endlessly, human taste and judgment are crucial for deciding what's worth building and ensuring it resonates with users.
- Miner advises against "sharing slop" – using AI as an excuse to produce low-quality or unnecessary artifacts.
- He acknowledges the increased stress and uncertainty for designers and encourages patience and breaks, recognizing the challenge of learning amid industry shifts.
- The core message is to use AI to move faster and create customer value more efficiently, without sacrificing quality or human oversight.
Decoder
- Coding agents: AI programs capable of generating, analyzing, or debugging code based on instructions, used here by designers to build prototypes or tools without extensive manual coding.
Original article
Full article content is not available for inline reading.
Anthropic prepares Mythos 1 for Claude Code and Claude Security
Anthropic is nearing a broader public release of Claude Mythos 1, its AI model specialized in cybersecurity, for Claude Code and Claude Security enterprise offerings.
Deep dive
- Anthropic is preparing Claude Mythos for a broader release as Mythos 1.
- Evidence of Mythos 1 has appeared with a "claude-mythos-1-preview" label for Claude Code and Claude Security.
- Project Glasswing, using Mythos-grade models, has already identified over 10,000 high- or critical-severity vulnerabilities.
- The Claude Security offering is receiving a new dashboard for discovered vulnerabilities, historical charts, and triage results.
- Earlier statements suggested Mythos would remain restricted, making this a notable shift towards broader availability once safeguards are in place.
- Claude Opus 4.8 is also rumored to be in internal evaluations with partners, potentially launching in the coming weeks.
Decoder
- Project Glasswing: Anthropic's collaborative AI cybersecurity initiative focused on using AI models to discover software vulnerabilities.
Original article
Anthropic appears to be moving Claude Mythos closer to broader availability than its original guidance suggested. A recent Project Glasswing update notes that the model is now helping protect a wider range of organizations, including open-source projects, and adds that Mythos-grade models could reach the public once the right safeguards are in place.
Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.
— Anthropic (@AnthropicAI) May 22, 2026
And in the near future, once we’ve developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release.
This marks a notable shift from the earlier framing, in which Anthropic stated that Mythos would remain restricted. Traces of the model have already surfaced on Google Cloud and AWS through vulnerability discovery programs, and signals now point to a product called Mythos 1, carrying a preview label ("claude-mythos-1-preview"), being prepared for Claude Code and Claude Security.
Some users were temporarily able to see the "Mythos 1" model in the UI. Besides that, new strings have been added to the source code recently:
"Access to the Claude Mythos model in Claude Code and Claude Security."
The Claude Security side of that rollout is getting structural work too. A new dashboard is being built that surfaces discovered vulnerabilities, seven-day and thirty-day historical charts, and deeper triage results. There are no indications yet that the product will move beyond enterprise customers, but the refresh brings it closer to parity with how rival security suites present scan history.
In parallel, Claude Opus 4.8 is rumored to be in the works for release, with select Anthropic partners already conducting internal evaluations. A launch in the coming weeks would fit the cadence set by Opus 4.7 in April and would slot neatly alongside the Mythos and security product moves.
Lance (Hugging Face Repo)
ByteDance Research has released Lance, a lightweight native unified multimodal model with 3B parameters, demonstrating strong performance in image and video understanding, generation, and editing.
Decoder
- Multimodal model: An AI model that can process and understand information from multiple types of data, such as text, images, and video.
- Active parameters: The number of parameters in a neural network that are actively used or contribute to the model's computations during inference or a specific task.
- Hugging Face: A platform and community for machine learning, providing tools, datasets, and pre-trained models, especially for natural language processing and multimodal AI.
Original article
Lance is a lightweight native unified multimodal model that supports image and video understanding, generation, and editing. It delivers strong performance across image generation, image editing, and video generation benchmarks with just 3B active parameters. The model was trained entirely from scratch within a 128-A100-GPU budget. Examples of clips generated by the model are available in the repository.
Anthropic's march to profitability
Anthropic is on track for $10.9 billion in Q2 revenue and expects to clear $559 million in profit by its October IPO, defying the "AI labs burn money forever" narrative.
Original article
Anthropic is on track to do $10.9 billion in Q2 revenue, up from $4.8 billion in Q1, growing faster right now than Zoom did at the peak of the pandemic. The thing that flipped them to profit is compute getting cheaper: 71 cents of compute per revenue dollar in Q1, 56 cents in Q2. Claude Code on its own is at $2.5 billion in revenue, and the company expects to clear $559 million in profit just in time for its October IPO. The "AI labs burn money forever" story finally has a hole in it.
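As a back-of-the-envelope check, the compute ratios can be applied to the quoted revenue figures. The sketch only multiplies numbers from the paragraph above and ignores every other cost, so the remainder is not a profit figure; the function name is hypothetical.

```python
def compute_cost_and_remainder(revenue_bn: float, compute_per_dollar: float) -> tuple[float, float]:
    """Return (compute spend, revenue left after compute), both in $ billions."""
    compute = revenue_bn * compute_per_dollar
    return round(compute, 3), round(revenue_bn - compute, 3)


q1 = compute_cost_and_remainder(4.8, 0.71)
q2 = compute_cost_and_remainder(10.9, 0.56)
print(q1)  # (3.408, 1.392)
print(q2)  # (6.104, 4.796)
```

On these figures, revenue left after compute roughly triples between quarters even though absolute compute spend also rises.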
Bumblebee Goes Open Source
Perplexity open-sourced Bumblebee, a security scanner designed to identify risky packages, extensions, and AI tool configurations on developer machines.
Original article
Perplexity open-sourced Bumblebee, a read-only security scanner that identifies risky packages, extensions, and AI tool configurations on developer machines.
Gemini 3.5 Flash (Low)
Google's Gemini 3.5 Flash (Low) generates 45% fewer tokens than Flash (Medium) and surprisingly outperforms Flash (High) on software engineering tasks.
Decoder
- SWE tasks: Software engineering tasks, which involve coding, debugging, testing, and other development-related activities.
- Tokens: The basic units of text (like words or sub-words) that an LLM processes, generates, and counts. Fewer tokens often mean lower cost and faster inference.
Original article
Gemini 3.5 Flash (Low) generates around 45% fewer tokens than Gemini 3.5 Flash (Medium) and generally outperforms Gemini 3.5 Flash (High) on SWE tasks.
SpaceX Launches 400-Foot-Tall Rocket That Will Help Define Its Future
SpaceX's upgraded Starship rocket partially succeeded in its latest test launch, separating its booster but losing an engine before exploding on ocean touchdown.
Original article
SpaceX launched an upgraded version of its Starship rocket from a new launchpad at its Starbase facility on Friday. The booster successfully separated from the spacecraft, but it wasn't able to conduct an engine maneuver and eventually crashed into the Gulf of Mexico. Starship lost one of its engines but was able to make it to space. It deployed devices that mimic satellites, as well as two satellites that took images of the flying spacecraft. Starship exploded after it touched down in the Indian Ocean.
China launches Shenzhou-23 mission with potential record one-year stay in orbit
China launched its Shenzhou-23 mission to the Tiangong space station, potentially involving a record one-year stay for one astronaut and the first autonomous rapid docking.
Decoder
- Tiangong space station: China's modular space station, currently in low Earth orbit.
- Long March-2F Y23 rocket: The specific rocket variant used by China to launch Shenzhou missions.
Original article
Investing.com -- China is set to launch its Shenzhou-23 mission on Sunday, sending three astronauts to the Tiangong space station in a flight that could see one crew member remain in orbit for a full year, the longest human space mission in the country’s history.
The Shenzhou-23 spacecraft is scheduled to lift off from the Jiuquan Satellite Launch Center in northwestern China aboard a Long March-2F Y23 rocket, according to the China Manned Space Agency.
The crew includes commander Zhu Yangzhu, pilot Zhang Yuanzhi, and payload specialist Li Jiaying, a former Hong Kong police inspector who will become the first astronaut from Hong Kong to take part in a Chinese space mission.
Chinese officials said one astronaut could remain aboard Tiangong for up to a year, exceeding the six-month missions that have been standard for China’s space station program since 2021. The agency said the astronaut selected for the extended stay will be determined later during the mission.
The launch comes as China advances plans to land astronauts on the moon by 2030, setting up a race with the United States, which is targeting a crewed lunar landing in 2028 under NASA’s Artemis program.
China is developing the hardware needed for its lunar ambitions, including the Long March-10 rocket, the Mengzhou spacecraft, and the Lanyue lunar lander. Officials have described recent testing of these systems as part of preparations for future crewed lunar missions.
The Shenzhou-23 mission will also conduct the first autonomous rapid rendezvous and docking procedure with Tiangong’s core module, a capability expected to support future lunar operations.
Scientists will use the mission to study the effects of long-duration spaceflight, including radiation exposure, bone density loss, and psychological stress.
China has steadily expanded its space program in recent years. In 2024, it became the first country to return samples from the far side of the moon through a robotic mission.
Beijing is also working with Russia on plans to establish a permanent lunar base by 2035.
Inside the World's Biggest Bet on Fusion Energy
The $22 billion International Thermonuclear Experimental Reactor (ITER) in France, a collaborative fusion energy project among geopolitical rivals, aims to contain plasma 10 times hotter than the Sun's core.
Decoder
- Tokamak: A type of magnetic confinement device used for fusion research, characterized by its donut-shaped (toroidal) vacuum chamber.
Original article
Nestled in the countryside of southern France is a sprawling industrial complex where scientists and engineers from around the world have converged to build the world's largest-ever fusion reactor: a doughnut-shaped vacuum chamber designed to contain temperatures 10 times hotter than the core of the Sun.
At an estimated cost of $22 billion, the International Thermonuclear Experimental Reactor is the world's biggest bet on fusion energy: a project so daunting in scale that longtime geopolitical rivals have pooled their resources to share in its potential risks and rewards.
As ITER's chief strategic advisor Laban Coblentz put it, "That China and Russia were going to collaborate with the US and Europe, and add in Korea, India, and Japan -- that's either genius or insane."
Controlled fusion reactions produce millions of times more energy than the burning of fossil fuels, and four times more energy than the reactions powering traditional nuclear power plants -- without the risk of meltdown, long-lasting radioactive waste and carbon emissions. All humans have to do is create the right conditions for it to happen, but that's far easier said than done.
Containing ITER's 150-million-degree Celsius plasma will require superconducting magnets kept just a few degrees above absolute zero. To make that possible, engineers must place one of the hottest environments ever created right next to one of the coldest, with only a thin heat shield separating the two.
Cracks in the piping of this heat shield were discovered in 2020, along with distortions caused by welding and disruptions due to the COVID-19 pandemic, which led to a years-long delay in ITER's timeline and the need for an additional $5 billion to cover repair costs. At the same time, private fusion startups have been multiplying, with many hoping to beat ITER to major milestones.
Despite the pressure and criticisms generated by these overruns and delays, the people I met at ITER all spoke about the project like an open book. "This is a publicly funded project," said Javier Artola, a scientist working on modeling the behavior of ITER's plasma. "It is the knowledge of the world."
A publicly funded project like ITER helps de-risk the research and development needed for commercial-scale fusion, making it easier for private companies to place their own big bets on the technology. Every problem ITER solves is one less problem private fusion companies will have to figure out.
Every member state of the ITER agreement (which includes more than 30 countries) will have access to all the science that comes out of ITER, and the construction of ITER itself is developing a global fusion energy supply chain. If the member states agree to share it with them, even non-member states may benefit from ITER's science.
"We have become a model for how countries of unlike persuasion can work over decades, only through the shared vision of a better world that everybody wants for the next generations," said Coblentz.
Fusion is one of those technologies that people often joke is always a decade away. But seeing firsthand what ITER is building gave me hope that we may truly be living in the last decade when fusion is still spoken of as a distant dream.
To see our journey into the heart of this one-of-a-kind experiment in fusion energy and international collaboration, check out the video in this article.
auth.md (Website)
WorkOS introduced auth.md, an open protocol allowing AI agents to securely register users with apps without human interaction by asserting identities via Markdown files.
Deep dive
- Auth.md is a Markdown file hosted at an application's domain (e.g., https://yourapp.com/auth.md) that defines how AI agents can register users.
- It specifies supported registration flows (e.g., agent-verified, user-claimed via OTP), available scopes, and registration endpoints.
- The "agent verified" flow allows an agent's identity provider to vouch for a user, requiring no human interaction.
- The "user claimed" flow involves an OTP (one-time password) sent to the user for confirmation.
- Apps retain control over accepted flows and the type of credentials issued to the agent (e.g., scoped API keys or access tokens).
- Auth.md is an open protocol, not tied to WorkOS infrastructure, and leverages existing OAuth standards like Protected Resource Metadata and ID-JAG identity assertions.
- WorkOS provides an AuthKit for easier implementation and is actively shaping the protocol with early adopters.
Decoder
- Agent-verified flow: A registration method where an AI agent's identity provider asserts the user's identity, eliminating the need for direct human interaction.
- User-claimed flow: A registration method where an AI agent triggers an OTP (one-time password) that the human user confirms, thereby claiming the account.
- OAuth: An open standard for access delegation, commonly used for granting websites or applications access to information on other websites without giving them the password.
- Protected Resource Metadata: A standardized way (RFC 9728) for an OAuth 2.0 protected resource to describe itself, including which authorization servers and scopes it supports.
- ID-JAG identity assertions: Identity Assertion JWT Authorization Grant, an IETF OAuth draft in which an identity provider issues a signed assertion about a user that an application can accept in exchange for credentials.
Original article
auth.md
Enable agents to register users without the sign-up form. Auth.md provides secure agent registration that any app can implement.
Self-serve agent discovery
Publish auth.md at your domain with the flows, scopes, and endpoints an agent needs to register.
Choose the flows you support
Allow trusted identity assertions, OTP-based claim flows, or anonymous access.
Credentials you control
Issue scoped API keys or access tokens tied to users — auditable, expirable, revocable.
Get started
For services that want agents to register users on behalf of their customers.
For platforms whose agents act on behalf of users.
Get in touch to enable auth.md on your account.
FAQs
- What is auth.md? A Markdown file an application hosts at its domain, typically https://yourapp.com/auth.md, that tells agents how to register on behalf of a user. It includes which flows are supported, which scopes exist, and how to register for the service.
- How does an agent register a user with my app? The agent fetches your auth.md, picks a supported flow, and either presents a verified identity assertion (agent verified flow) or walks the user through an OTP-based claim (user claimed flow). You stay in control of which flows you accept and what credentials get issued.
- What's the difference between the agent verified and user claimed flows? Agent verified is agent-attested: the agent's identity provider vouches for the user, no human interaction required. User claimed is OTP-based: the agent triggers a code, the human confirms, the account is claimed. Most apps support both and let the agent pick the right one for the situation.
- What credentials get issued to the agent? Your service decides whether to return a scoped API key or access token tied to the user. This allows for re-use of your existing API auth methods.
- Is auth.md a WorkOS only feature or an open protocol? It's open. WorkOS authors the protocol, but auth.md isn't tied to WorkOS infrastructure; it composes existing OAuth standards (Protected Resource Metadata, ID-JAG identity assertions) and any app can publish or any agent can read one with no WorkOS account required.
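The flow choice described in the FAQ can be sketched in a few lines. This is a minimal illustration, not the protocol's actual wire format: the article names the two flows but does not show how auth.md encodes them, so the identifiers `agent_verified` and `user_claimed`, the `AgentContext` type, and the preference order in `choose_flow` are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical flow identifiers; the article names the flows but not
# how they are spelled inside an auth.md file.
AGENT_VERIFIED = "agent_verified"
USER_CLAIMED = "user_claimed"


@dataclass
class AgentContext:
    """What the agent has available when it tries to register a user."""
    has_identity_assertion: bool   # the agent's identity provider can vouch for the user
    user_reachable_for_otp: bool   # a human is available to confirm a one-time code


def choose_flow(supported_flows, ctx):
    """Return the flow to attempt, or None if no supported flow is usable."""
    # Assumed preference: use the flow that needs no human interaction
    # whenever the app accepts it and the agent holds an assertion.
    if AGENT_VERIFIED in supported_flows and ctx.has_identity_assertion:
        return AGENT_VERIFIED
    # Otherwise fall back to the OTP-based claim, which needs the user present.
    if USER_CLAIMED in supported_flows and ctx.user_reachable_for_otp:
        return USER_CLAIMED
    return None
```

For example, `choose_flow({USER_CLAIMED}, AgentContext(True, True))` returns `"user_claimed"`, because the app in that case does not accept identity assertions even though the agent could present one.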
Predicting AI job exposure
A new analysis argues it's impossible to reliably predict AI's impact on specific jobs because past tech shifts show unexpected outcomes and evolving roles.
Deep dive
- Past technological shifts like accounting automation or the internet's rise demonstrate that predicting job impacts is fraught with error.
- The number of accountants continued to rise despite a century of automation because regulations changed, and efficiency gains (Jevons paradox) led to more analysis, not less work.
- Technology often makes existing tasks cheaper, leading to new types of work or increased volume of related tasks, fundamentally changing job descriptions even if titles remain.
- The internet didn't change what it meant to be a journalist, but it destroyed the newspaper's business model, an effect hard to predict from job descriptions alone.
- The rise of smartphones created Uber, an unpredictable disruption for taxi drivers, which wouldn't have been flagged by a 2005 "smartphone exposure" analysis.
- Relying on generic job descriptions like O*NET to predict automation exposure is flawed because jobs are complex, subtle meshes of activities, not simple logical steps.
- This phenomenon is akin to Gell-Mann Amnesia, where experts recognize complexity in their own field but underestimate it in others, leading to oversimplified AI impact predictions.
- Quantifying AI's impact job-by-job is "fooling yourself" because you don't truly know today's jobs or how they will change.
Decoder
- Jevons paradox: The observation that when technology makes the use of a resource more efficient, and therefore cheaper, total consumption of that resource tends to rise rather than fall.
- O*NET: The Occupational Information Network, a comprehensive database of job characteristics and worker requirements used for career exploration and job analysis in the US.
- Gell-Mann Amnesia: A phenomenon where one critically assesses news in their area of expertise but uncritically accepts information from other fields as accurate, forgetting that the same journalistic sloppiness might apply elsewhere.
Original article
It would be really nice if we had some way to analyse which jobs, companies and industries were exposed to AI, and if we could assign scores, and build charts, and map that against the progress of large language models. We know, in principle, that like every other big wave of technology, AI is bound to destroy some jobs and create others. But which ones? In the last three years a bunch of people have been very busy crunching census data, making tables and building viral charts.
I think this is mostly impossible: I think this is an exercise in predicting something that cannot be predicted.
The simplest way to see the problem is to back-test this against other big technology shifts in the past. Some of the industries that should have suffered most ended up much bigger, and some of the industries that did suffer most should have been immune.
Hence, we spent a century automating accounting: we built calculating machines, punch cards, mainframes, data processing, databases, PCs, spreadsheets, ERPs, cloud… in fact, we built half of the tech industry around automating this. Yet the number of accountants kept going up.
This is high-level survey data, but you can see much the same thing at the micro level. The next chart is about as specific as it gets: 50 years of financial automation doesn’t seem to have hurt the market for CPAs. If you’d done any kind of analysis of professions exposed to automation from computing, this should have been at the top of the list. Dan Bricklin talks about CPAs in the late 1970s using VisiCalc to do one-month projects in a few days. And yet, look what happened.
I think there are three things to point to in this chart. The first is that technology was not the only variable: changes in regulation produced new accounting requirements that led to a one-off surge in CPA hiring (this is why economists say ceteris paribus). Second, within the automation conversation itself there is the Jevons paradox, which is really applied price elasticity: if you make it cheaper to do something, do you do the same for less money (or resources, or employees), or more for the same money, or does a new ROI mean you do more for more money? If a DCF takes a week and then it takes 30 seconds, you probably do more DCFs. ‘Exposure to automation’ might mean more work, not less.
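The three outcomes in that paragraph (same for less money, more for the same money, more for more money) correspond to different price elasticities of demand. The sketch below assumes a constant-elasticity demand curve, which is a textbook simplification chosen for illustration and not something the article specifies; the function names and the tenfold price drop are likewise made up for the example.

```python
def demand_after_price_change(q0, p0, p1, elasticity):
    """Constant-elasticity demand: quantity scales with (p1 / p0) ** -elasticity."""
    return q0 * (p1 / p0) ** (-elasticity)


def spending_ratio(p0, p1, elasticity):
    """Total spending after the price change, relative to spending before."""
    q1 = demand_after_price_change(1.0, p0, p1, elasticity)
    return (p1 * q1) / (p0 * 1.0)


# The cost of one analysis falls tenfold (100 -> 10, arbitrary units).
for elasticity in (0.0, 1.0, 1.5):
    ratio = spending_ratio(100.0, 10.0, elasticity)
    print(f"elasticity {elasticity}: spending is {ratio:.3f}x the old level")
```

With elasticity 0 the same amount of work is done for a tenth of the money; with elasticity 1 ten times the work is done for the same money; above 1, total spending on the work goes up, which is the case where "exposure to automation" means more work, not less.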
But then, the more important story is that if you automate something that used to be expensive and time-consuming and it becomes cheap and quick, that probably unlocks other things. If analysis becomes cheap and easy, you do much more analysis, and mostly that’s also a different kind of analysis. Accountants today aren’t doing exactly the same work that they did in 1970 or 1980 ‘but more’ - they’re still called ‘accountants’ but the job is different. New technology often starts out being used for ‘the old thing but more’, but it rarely ends up like that.
Indeed, if you dig into the detail of the Census data, then ‘accountants and auditors’ itself is a fairly stable category, but all around that term there are lots of other finance job categories that appear and disappear over time. The job of “Billing, posting and calculating machine operator” appeared in the stats for a decade or so and then disappeared again. How often did that represent someone who started their career as a stock clerk, then became a ‘posting machine operator’ because that was how you did stock-keeping, and then retired as a stock clerk again when that was absorbed into software and the Census didn’t create a category for ‘PC operator’? Equally, there’s still a category for ‘data keyer’ but not for ‘ERP operator’. The same person doing the same actual job (or rather, serving the same business purpose) gets different job titles over time, while ‘accountants’ have the same job title while doing different things.
Then, I think there's a second problem that comes up in back-testing: the job might not change at all, but the business might change underneath you.
The internet didn't really change what it took to be a good journalist or a good A&R scout, but the job of journalism was paid for by a light manufacturing and trucking operation with (in the USA) a local monopoly on classified ads, and the record executive’s salary was paid by manufacturing and shipping small pieces of plastic and aluminium foil. That was a whole other thing that would not be captured in any analysis you tried to do of what it is to be a copy editor or a sound engineer. The internet decoupled a class of business where the product and the job were not affected by the internet but the business was.
It seems to me that we should expect the same thing to happen with AI: how many people have a job that has very low exposure to AI, but the business depends on some other job that is hugely affected by AI? How many people have a job doing something that’s very hard for AI to match, but their company’s defence against competition is that they also have lots of buildings full of people doing something very boring? AI will take a bunch of stuff that used to be expensive and make it very cheap or free - what does that unlock and what does that break, and how many jobs is that?
Third, continuing the theme of big and unpredictable effects of past technologies, how does your analysis handle Uber? I worked in mobile in the 2000s and we all spent a lot of time talking about location data, but it didn’t occur to anyone that this might be an issue for taxis - you might have suggested more efficient dispatch, but no-one was considering that this could totally change the nature of the job (and make a bunch of $1m medallion mortgages worthless). If you’d been calculating ‘internet exposure’ by occupation in 1995 or ‘smartphone exposure’ in 2005 (yes, we had smartphones before the iPhone), are you confident you’d have put taxi drivers on the list?
(Source: Todd Schneider / MTA)
Narrowly, then, the problem with using things like O*NET to try to analyse what a job is and how much it can be automated is that this tells you nothing about all the ways that the job shrinks and grows with automation, and the ways that the job itself might be changed by automation elsewhere, outside your analysis.
But I think there's a more fundamental problem, too. Even if you set aside the question of change, I don't think it's possible, in principle, to create a usefully complete description of what the job is.
Reading O*NET descriptions of jobs reminds me a lot of the failure of expert systems, when people thought that you could use logical steps to build an AI system to do image recognition or language translation. Theoretically, you can describe a series of steps by which a machine can recognise a cat, and theoretically, you can write down exactly what an associate partner at a law firm does, but in reality, these things are just too complex or too subtle for us to be able to describe them like that. Sometimes, of course, the job really is just a task, that can be turned into a button, but that's actually pretty rare. Generally, the job is a complex mesh of things that we lack the capability to explain explicitly (tangentially, this is also why most people seem to struggle to use chatbots). And, of course, once you dig into the detail these descriptions fall apart, just as logical systems did before machine learning: apparently administering a family trust and running a desk at a quant fund are comparable jobs, and they need fluency in Lotus 1-2-3, Oracle or Quickbooks but not Bloomberg.
Aaron Levie, CEO of Box, described this as a variant of ‘Gell-Mann Amnesia’. You have a pretty good sense of how complex your own field is, and how incomplete AI’s addressability of that might be, but in other fields you forget this - you see a Claude template for a Powerpoint or a legal draft and you think “wow, consultants and law firms are screwed!” When you hire Bain, BCG or McKinsey, they will give you some slides, but that’s not what you’re paying for, just as when you buy software, you’ll get some code, but that’s not the product.
The counter-argument to all of this would be to say that, yes, well done, there are important exceptions, as there always are, but directionally and in aggregate, it is ‘surely’ correct to say that jobs that involve a lot of repetitive clerical work are most exposed, and this is how many jobs that is, and by how much. That sounds good, but you don’t know if the exceptions are bigger than the rule. Suppose we’d looked at the internet in 1995 and said that this would destroy the value of physical distribution for media - this was ‘directionally correct’, but in practice that meant totally different things for record companies, newspapers, TV companies and movie studios. On average, we’re all dead. Half of the jobs you’ve analysed might be entirely unaffected, and there might be other big pools of jobs to be transformed that you miss entirely. You don’t know.
A while ago, I noted someone had criticised my work by saying that I always end by saying ‘it depends’. But when you're at such an early stage of a fundamentally new technology, any specific predictions about a particular field will only be correct by luck: it really does depend. As Yogi Berra said, “it’s tough to make predictions, especially about the future”. We can certainly point to framings and mental models for how this might work, and we can point to what happened the last half-dozen times we went through this kind of change. We can even say things that are probably directionally correct. But as soon as you try to quantify that, and model it out job by job and industry by industry, and make pretty radar charts, you’re fooling yourself, because you do not actually know what those jobs are today, and you do not know how they will change. At a minimum, you have to ask whether your model passes the newspaper test, the Uber test and the CPA test: would your approach have captured those effects? If not, how useful is it to the rest of us?
Don't Roll Your Own ...
Developer Susam Pal argues against building custom web UI features like scrolling, link navigation, or date pickers when browsers already provide robust, familiar native implementations.
Deep dive
- The principle "Don't roll your own crypto" should extend to web UI features where browsers excel, as custom implementations often degrade user experience.
- Custom page scrolling often breaks familiar responsiveness to mouse, touchpad, or keyboard input, making navigation frustrating.
- Custom link navigation, like on GitHub, can introduce delays; opening a link in a new tab is sometimes faster than waiting for the site's JavaScript to handle the click.
- Custom password fields often interfere with browser-native password saving, autofill, strong password generation, and security warnings for insecure connections.
- Custom date pickers vary widely across websites, forcing users to learn new interaction patterns instead of using their preferred, consistent browser default.
- Native form controls are generally well-equipped, accessible, and integrate with system-level features like password managers and accessibility tools.
- Constantly changing website layouts and interfaces, even if well-intentioned, can be highly disruptive, especially for less tech-savvy users.
- The author argues that unless there's a compelling, specific reason, developers should be more conservative with custom UI implementations for serious websites.
Original article
This is going to be a rant about modern web design practices. But before I get to that, let me begin with a familiar principle from the world of cryptography. Among software developers, and especially among those who work on security-sensitive systems, there is a well-known maxim: Don't roll your own crypto. This does not mean that nobody is allowed to write cryptographic code. Someone has to. It means that, for ordinary production software that protects sensitive data of users, we should not rely on a private, unreviewed implementation that has not been vetted by the wider software development community. We should use established, vetted software packages or tools wherever possible.
Fortunately, it is now standard industry practice to avoid rolling your own crypto and instead use cryptographic algorithms and packages that have been peer reviewed and stood the test of time. It wasn't so some twenty years ago. I have seen several flawed home-grown RC4 implementations early in my career, with issues like improper initialisation vectors, predictable keystreams and partial leakage of plaintext into ciphertext, putting sensitive data of users at risk. But today, major e-commerce websites or banks typically do not use home-grown cryptography for their web services. In fact, in regulated domains such as payments, healthcare and personal data processing, doing so could violate requirements for strong cryptography, possibly leading to hefty financial penalties.
Website design is obviously not cryptography. A broken scroll bar is not the same kind of failure as a broken encryption scheme. But I wish there were a similar maxim for website design as well. There are many aspects of websites where, I think, developers should not be rolling their own X, especially when X is something browsers already do well and something users depend on every day. Here I present a list of such X.
- Don't roll your own page scrolling.
- Don't roll your own link navigation.
- Don't roll your own text selection.
- Don't roll your own context menu.
- Don't roll your own copy and paste.
- Don't roll your own password field.
- Don't roll your own date picker.
Of course, there are valid scenarios where you may need to roll your own X. But here I want to focus on the cases where you should not roll your own X, and how doing so can lead to a worse user experience, at least in my experience. I am not saying that nobody should ever build anything themselves. As someone who does a lot of creative computing myself and develops fun tools from time to time, I am a big proponent of developing your own stuff. But when it comes to developing user interface features for serious websites that people need to use to get their work done, I wish the software development community were more conservative in deciding what fancy feature goes into a website and what is left out. Do keep in mind that I am no expert in user experience. Far from it. So none of what I am saying here should be taken as a recommendation. But I am a user of the Web, and as a user, I have found some modern web design patterns to be frustrating. This post is a lament from one user of the Web, not a design guide.
Of all the things I mentioned above, the one that bothers me the most is custom scroll behaviour on websites. I am used to how page scrolling responds to my mouse, touchpad or keyboard input. When you override the default scrolling behaviour of the web browser with your own implementation, it 'breaks' the page for me. The page now moves too slowly or too quickly when I scroll. Keyboard scrolling may or may not work. You take something I am so familiar with that I don't even think about it, and turn it into something unfamiliar that I now have to think about.
Custom link navigation is another pet peeve of mine. Web browsers can already handle links very well. You could say that this is the whole reason web browsers even exist. Following links is their bread and butter. You shouldn't have to mess with that behaviour at all. If you think you need to, reconsider what you are trying to achieve and whether it is really so important as to disrupt normal link navigation. The worst offender I have found here is GitHub. When you click on a link on GitHub, say, a file link or an issue link, it triggers a massive piece of functionality implemented in JavaScript that handles the link click for you. If you don't believe me, visit your favourite project on GitHub using Firefox or Chrome, press F12 to open the browser's developer tools, then go to the 'Debugger' or 'Sources' tab, find 'Event Listener Breakpoints' on the right sidebar, expand 'Mouse' and select 'click'. Then click on a link on GitHub and see what happens.
I'm sure I am not the only one who has noticed that, on GitHub, a clicked link sometimes takes too long to load. Ironically, it is often faster to open the link in a new tab than to wait for GitHub's JavaScript code to handle the navigation in the current tab.
A custom password input field is another such hazard. Fortunately, custom password input fields have become rarer over the years. The password input field that comes with the web browser is generally well equipped to handle passwords. It can offer to save passwords, fill them in later and generate strong passwords for new accounts. It can also warn when a password is submitted over an insecure HTTP connection, work well with password managers and autofill, and cooperate with mobile keyboards and accessibility tools. If you replace the browser's password field with your own fake version, you may break all of that. You may also end up using an ordinary text field and masking it yourself, in which case the password may be treated by the browser, the operating system or assistive tools as ordinary visible text rather than as a password, thereby exposing the password in ways you did not intend.
Custom date pickers are another common annoyance. I know that <input type="date"> does not help you select a date range. But that is okay. You can provide two date input fields, one for the start date and one for the end date. I am willing to pay the small price of using two different inputs to select a date range if that means I can use my favourite web browser to navigate the calendar and select dates the same way everywhere. What I am less inclined to do is to learn ten different ways of using the date selector in ten different implementations across ten different websites. Right now the implementations of date selector are all over the place. Some require you to zoom out of the month view to enter a year view, where you can select years. While you are there, you cannot change the month again until you return to the month view. Some require you to click the previous-year button literally forty times to select your year of birth if you are old enough. Some do not let you type the date at all. No. I do not want to learn your calendar widget. I just want to use the date picker in my favourite browser, which is quite sane. Saner than your custom implementation. If you need to have a calendar widget to support browsers with inadequate native date-picker support, perhaps that support can be added alongside the native date picker rather than as a replacement for it. For example, the ordinary <input type="date"> element could be left intact, with a custom widget provided in addition to it so that users can manipulate the same field.
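The two-field approach is also easy to handle on the receiving side, because a native `<input type="date">` always submits its value as a `YYYY-MM-DD` string regardless of how the browser displays the calendar. Here is a minimal server-side sketch, assuming the two values arrive as plain strings; the function name `parse_date_range` and the error message are made up for this example and are not from the article.

```python
from datetime import date


def parse_date_range(start_value, end_value):
    """Parse two YYYY-MM-DD strings and return (start, end) as date objects.

    Raises ValueError if either value is malformed (including empty)
    or if the end date falls before the start date.
    """
    start = date.fromisoformat(start_value)
    end = date.fromisoformat(end_value)
    if end < start:
        raise ValueError("end date is before start date")
    return start, end


# Values as two native date inputs would submit them.
start, end = parse_date_range("2026-05-01", "2026-05-25")
print((end - start).days)  # length of the selected range in days
```

The range check is the only logic a custom range picker would otherwise have to enforce in the widget itself; here it sits in one small function while the browser keeps handling the calendar.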
In general, just stop messing with the form controls. They almost always introduce new problems while solving some existing ones. And while you are at it, don't keep changing your website layout and interface every few months! I may adapt to the new design, but my ageing relatives cannot. For them, every time you change the user interface, it amounts to learning a whole new tool. If every website keeps doing this every few months, they have to spend a significant amount of time relearning familiar things for no functional benefit. Please just let them enjoy their retirement. Imagine how you would feel if a Linux distribution decided to redesign all its core commands and their command-line options every few months. Or imagine how you would feel if the buttons of your washing machine were rearranged every morning. It wouldn't be pleasant!
Snap Specs True AR Glasses Reportedly Launch This Fall For Around $2500
Snap's consumer "Specs" true AR glasses, designed to place virtual objects naturally into the real world, are reportedly launching this fall for around $2500, targeting early adopters.
Deep dive
- Snap's consumer "Specs" AR glasses are anticipated to launch in Fall 2026 at approximately $2500, with a production run of around 100,000 units.
- Alex Heath, a veteran tech journalist, reported this pricing and launch window.
- "Specs" are designed to be "true AR glasses," meaning they overlay virtual objects onto the physical world without significantly dimming or distorting the user's view, unlike existing development kits.
- These consumer glasses are expected to be much smaller and lighter than the current Spectacles AR development kit, which rents for $99/month.
- They run Snap OS, an Android-based system that does not support native APKs or third-party engines like Unity.
- Developers create sandboxed "Lenses" (apps) using JavaScript or TypeScript within Snap's Lens Studio, interacting with high-level APIs.
- This software approach offers advantages similar to Apple's visionOS Shared Space, including fast app launches, interaction consistency, and multi-user experiences.
- Snap OS 2.0, released late last year, added and improved first-party apps like Browser and Gallery, moving the platform closer to consumer readiness.
- The $2500 price point targets wealthy early adopters, similar to Apple Vision Pro.
- Snap's competitors, Meta and Apple, are not expected to release their true AR glasses until late 2027 and 2028, respectively.
- Snap recently spun its AR hardware division into a dedicated subsidiary, Specs Inc.
Decoder
- True AR glasses: Augmented Reality glasses that display virtual objects seamlessly integrated into the user's physical surroundings, without significantly obscuring or distorting the real-world view.
- APK: Android Package Kit, the package file format used by the Android operating system for distribution and installation of mobile apps and middleware.
- Lens Studio: Snap's desktop application for Windows and macOS that allows developers to create augmented reality "Lenses" (apps) for Snapchat and Snap Spectacles.
- Lenses: Snap's term for augmented reality experiences or applications developed for its platform.
Original article
The Snap Specs standalone true AR glasses will launch this fall, veteran tech journalist Alex Heath reports, priced around $2500.
The company behind Snapchat officially announced that it would release standalone true AR glasses, called Specs, just under one year ago.
Compared to the bulky and heavy Spectacles standalone AR development kit glasses, which the company rents to developers for $99/month or students for $49/month, Snap CEO Evan Spiegel claimed the consumer Specs will have "a much smaller form factor, at a fraction of the weight, with a ton more capability", while running the same Snap OS operating system and supporting all the same apps developed so far.
Snap OS is relatively unique. While on an underlying level it's Android-based, you can't install APKs on it, and thus developers can't run native code or use third-party engines like Unity. Instead, they build sandboxed "Lenses", the company's name for apps, using the Lens Studio software for Windows and macOS. In Lens Studio, developers use JavaScript or TypeScript to interact with high-level APIs, while the operating system itself handles the low-level core tech like rendering and core interactions. This has many of the same advantages as the Shared Space of Apple's visionOS: near-instant app launches, interaction consistency, and easy implementation of shared multi-user experiences without friction. It even allows the Spectacles mobile app to be used as a spectator view for almost any Lens. Snap OS doesn't support multitasking, but this is more likely a limitation of the current hardware than the operating system itself.
Since releasing Snap OS in the latest Spectacles kit in late 2024, Snap has repeatedly added new capabilities for developers building Lenses, and late last year launched Snap OS 2.0, adding and improving first-party apps like Browser, Gallery, and Spotlight to bring the AR platform closer to being ready for consumers.
In April, Alex Heath released a report via his Sources newsletter wherein he claimed that Snap will preview its new Specs glasses in the next couple of months, followed by a consumer release in the fall.
In an October edition of Sources, Heath said that Snap was targeting a price of around $2500 for Specs, and a production run of around 100,000.
That price puts it squarely in the realm of relatively wealthy early adopters, like Apple Vision Pro. But, assuming it isn't beaten to market by something we're not aware of, Specs will be the first standalone true AR glasses (meaning relatively normal-looking glasses that can place interfaces and virtual objects into your physical space, without significantly dimming or distorting your view of the real world) from a major tech company.
Meta's $800 glasses are considerably more affordable, yes, but also vastly less capable, showing only a small fixed heads-up display (HUD) in one eye, while Snap is targeting a relatively wide field of view binocular display system with head tracking, hand tracking, and realtime environment meshing.
Multiple reports suggest Meta plans to ship its own true AR glasses in late 2027, and Bloomberg's Mark Gurman has reported that Apple won't launch AR glasses until 2028 at the earliest. Meanwhile, there are some obscure Chinese products that technically qualify as true AR glasses, but they're bulky, their onboard compute is significantly limited, and their software is not particularly fleshed out.
The news of Snap's plan to launch this fall comes a few months after it spun its AR hardware ambitions into a dedicated subsidiary, Specs Inc.
We'll keep a close eye on Snap in the coming months for any sign of a proper reveal of the design and specifications of Specs, a product that could be a milestone moment for consumer AR.
I'm actively writing on UploadVR again, and this article is one in a series of "catch up" pieces where I report on some of the interesting things that have been happening in the industry in recent months. And yes, VR Download is coming back very soon!
The Eternal Sloptember
George Hotz warns that AI agents in software development will be "one of the most costly mistakes" in history, leading to an "Eternal Sloptember" of low-quality code.
Deep dive
- George Hotz asserts that AI agents' adoption in software development will be one of the most costly mistakes in history, coining the term "Eternal Sloptember."
- He argues agents cannot truly program; they are statistical models mimicking programming output, producing "slop" that is increasingly hard to detect.
- After six months of trying to use agents for projects like tinygrad and reverse-engineering a USB-PCIe chip, Hotz found manual methods faster and better.
- While acknowledging AI's utility as a "better Google" and for quick, unpolished prototypes, he states agents are not close to the bar for a software engineer.
- Hotz believes that large organizations, with slower feedback loops and less alignment, will be hurt most by agents because lower performers will produce 10x output that is low quality.
- He predicts agents will lead to more code and features but a "golden era for buckets and buckets of slop, and a dark age for gems of quality."
- He mentions hearing that Apple is pushing AI on all its engineers, questioning if macOS quality will improve or worsen in the next two years.
- Hotz aligns with Yann LeCun and Gary Marcus, believing current LLMs, which lack world models or genuine understanding, cannot program effectively.
Original article
I’m calling it now, the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history. Agents cannot program, and it’s taking longer and longer to realize that they can’t. They are a highly sophisticated statistical model designed to mimic the distribution of programming. The output is broken, but in a way that’s getting harder and harder to detect. Which is exactly what you’d expect from an increasingly accurate statistical model.
At first, I rejected this. I bought into the Twitter explanation of status anxiety. I define some of my self worth by my programming abilities, so wouldn’t it make sense to get defensive around that loss? Deny the models can code for as long as I could to preserve my ego?
I mean, it’s very clear they can solve math problems I couldn’t hope to solve if I devoted my life to it. So why can’t they program? Maybe I’m just not good enough of a programmer to recognize their genius.
I really tried for the last 6 months. I wrote some parts of tinygrad with agents. I reversed a USB <-> PCIe chip with agents. But each time I suspected I could have done it better and faster manually. The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.
And in before, “you are using it wrong.” I have tried all the different models, different harnesses, different prompts. It’s not this. The people who say this would probably say the same thing about slot machines: “You see, you have to bet 5 lines after you get a cherry. No wonder you aren’t winning!”
I’m not saying that AI isn’t useful, it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast. But is it a software engineer? Not close to the bar at any company I have worked at. The key aspect is knowing when to use it and when not to.
I thought more about the self worth preservation thing. AFL found more bugs than LLMs and nobody felt that way about it. Chess and Go are more popular than ever. I cannot fucking wait until I have armies of robot associates I can trust to clean up my code! I don’t fear loss of status, I almost think this is some kind of psyop to sell agents. Fear of loss is one of the only ways to make big companies move. Though I think in that fear they are making a big mistake.
Agents will end up hurting large organizations more than high performing individuals or small orgs. I’ve watched how my friends and coworkers have adopted these tools over the last 6 months. A trait you find in all high performing people is the ability to error correct, and they have mostly been good at seeing when slop is slop. It takes a bit to explore/exploit and tune the outer loops around when to use them, when to trust them, how to use them, etc., but I haven’t seen any one of them move to a model where they don’t carefully read and understand each line, except in some confined domains.
Contrast this with a large organization. Much slower feedback loops, much less alignment. The bottom performers won’t have that self check. They are the ones producing 10x output with the agents. What do you think is happening to the average output of that organization? What is happening to the average output of the world?
Agents will end up producing more code, more apps, and more features than ever before. It is a golden era for buckets and buckets of slop, and a dark age for gems of quality.
I hear that Apple is pushing AI on all their engineers. When people think in the abstract, they think AI will do all this stuff, but let’s focus on a concrete example. Do you think macOS will get better or worse in the next 2 years?
When people see an artifact, they make assumptions about the process that was used to create it. Without even thinking about it, they assume the creator had a basically human state of mind. This assumption is no longer true. Things can be broken in ways that weren’t previously possible, and old proxies of underlying quality like syntax and grammar are useless. AI-produced artifacts are not produced by the same process as human ones, and this difference, while extremely subtle in statistics, makes itself obvious when you try to interact with and build on the artifact in human ways.
Without fully endorsing all their ideas, I’m now in the LeCun/Marcus camp on LLMs. I don’t think models like this will ever be able to program, I think the process matters. I think that deep learning is still the solution, but real programming agents will need world models, not some RLVR shit that comments out the failing test and tells you all the tests are now passing.
The real story of this era will be who manages to avoid harming themselves in their AI psychosis.
Introducing Pulumi Do: Direct Resource Operations for Any Cloud
Pulumi introduced `pulumi do`, a new CLI command enabling direct, one-off cloud resource operations across thousands of providers without project setup or code.
Deep dive
- pulumi do allows direct CRUD operations and queries on cloud resources using a single CLI command, such as pulumi do aws:s3:Bucket create.
- It removes the need for project setup, code, or state tracking for quick, one-off tasks.
- The tool supports thousands of Pulumi-supported providers, maintaining a consistent command structure (do <package:module:type> <operation>).
- Output is predictably JSON on stdout, making it suitable for programmatic parsing by AI agents.
- Pulumi highlights its use case for AI agents, enabling them to provision infrastructure without human intervention when combined with "Agent accounts" and Pulumi ESC for credential management.
- Future roadmap includes unified credential management with Pulumi ESC, cross-resource references to handle dependencies, and a stateful mode with a "graduation path" to full Pulumi IaC projects.
- This feature is available as a research preview in Pulumi CLI v3.242.0 and later.
Decoder
- Infrastructure as Code (IaC): Managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
- Pulumi ESC: Pulumi's Environments, Secrets, and Configuration service, used for managing credentials and configuration across providers.
- Agentic Infrastructure Era: A concept where AI agents autonomously provision and manage infrastructure.
Original article
Introducing pulumi do: Direct Resource Operations for Any Cloud
Fraser Waters, Pat Gavlin, Arun Loganathan, Christian Nunciato
Posted on May 22nd, 2026
Infrastructure as code is the right model for production systems. State tracking, drift detection, and repeatable deployments all matter when you’re managing real workloads.
But sometimes, you also need a quick, one-off interaction with the cloud: create a bucket or a database, look up a VPC, delete a stray resource.
Today we’re introducing pulumi do, a new command for direct resource operations. With pulumi do, you can create, read, update, delete, and query any cloud resource from the terminal with a single command, across thousands of Pulumi-supported providers — no project, code, or state required.
The problem: Sometimes IaC is more than you need
When you’re managing production workloads, IaC is the proven solution. Code lets you declare complex systems, state tracking catches drift before it becomes a problem, dependency graphs sequence changes safely, and policy keeps everything in bounds. That full lifecycle, especially with the backing of a platform like Pulumi Cloud, is exactly what you want to build systems that scale.
But when you (or your coding agent) need an ad-hoc Postgres database, the simplest path with IaC still takes several steps: make a directory, create a project, configure your credentials, write the code, preview, deploy. It works, but it’s not always necessary for what should be a simple operation. pulumi do collapses all of those steps into one, using the same Pulumi providers, resource model, and ecosystem that powers the core Pulumi platform.
Resource creation is also only part of the problem. As Joe laid out in The Agentic Infrastructure Era, the real challenge for AI agents isn’t with code or CLI commands, it’s with everything else: getting a cloud account, resolving credentials, wiring configuration across multiple services. Agent accounts, also released this week, simplify this by letting an agent provision its own ephemeral Pulumi Cloud account, and Pulumi ESC takes care of consolidating credentials across providers. Together, with pulumi do, agents can now go from zero to deployed infrastructure without requiring a human in the loop — and when that one-off resource needs to grow into a more permanent system, there’s a clear graduation path back to full Pulumi IaC.
What it looks like
As an example, say you wanted to provision an S3 bucket. With the AWS CLI, you’d need to assemble an aws s3api create-bucket invocation with the right set of command-line flags, region constraints, a globally unique name, and so on. With pulumi do, it’s just this:
$ pulumi do aws:s3:Bucket create
That might not look all that different on the surface — but because you’re using the Pulumi engine and resource model, you can provide a minimal set of input properties, take advantage of provider-defined defaults, and use Pulumi’s auto-naming feature to give the bucket a unique name automatically:
$ pulumi do aws:s3:Bucket create
This will create aws:s3/bucket:Bucket with the following inputs:
{
"bucket": "bucket-279ea56",
"tagsAll": {}
}
Please confirm that this is what you'd like to do by typing `yes`:
Answer yes (or just pass --yes), and you’re done. To delete the bucket:
$ pulumi do aws:s3:Bucket delete bucket-279ea56 --yes
Need to look up an existing resource? Use a provider function:
$ pulumi do aws:ec2:getVpc --default
{
"arn": "arn:aws:ec2:us-west-2:663782525873:vpc/vpc-d7b311af",
"cidrBlock": "172.31.0.0/16",
"enableDnsHostnames": true,
"enableDnsSupport": true,
"enableNetworkAddressUsageMetrics": false,
"id": "vpc-d7b311af",
...
}
Same CLI, same output contract, same provider ecosystem.
The command shape
The do command accepts a Pulumi resource type, or type token, to determine the action to take. Type tokens have the form <package:module:resource>. For example, aws:s3:Bucket refers to the Amazon S3 Bucket resource that belongs to the s3 module of the aws package.
You can also provide a portion of the token to help you find what you’re looking for without ever having to leave the terminal:
$ pulumi do aws:s3
Functions and resources for the s3 module.
Run 'pulumi do <module/resource/function> --help' for more details on usage.
Functions:
aws:s3:getAccessPoint
aws:s3:getAccountPublicAccessBlock
aws:s3:getBucket
aws:s3:getBucketObject
...
Resources:
aws:s3:AccessPoint
aws:s3:AccountPublicAccessBlock
aws:s3:AnalyticsConfiguration
aws:s3:Bucket
...
$ pulumi do aws:s3:Bucket read bucket-d20976f
{
"arn": "arn:aws:s3:::bucket-d20976f",
"bucket": "bucket-d20976f",
"bucketDomainName": "bucket-d20976f.s3.amazonaws.com",
"bucketNamespace": "global",
...
}
The package, module, and resource/function segments all come directly from the Pulumi provider schema, so --help works at every level of the tree. Pass a package name, optional module, and optional function or resource type, and do returns the appropriate level of detail.
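The token shape described above (a package, an optional module, and an optional resource or function) can be sketched as a small parser. This is an illustrative Python sketch, not part of the Pulumi CLI: `TypeToken` and `parse_type_token` are hypothetical names, and the real CLI resolves tokens against the provider schema rather than by string splitting alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeToken:
    """A full or partial token of the form package[:module[:name]]."""
    package: str
    module: Optional[str] = None
    name: Optional[str] = None  # resource or function segment

    @property
    def level(self) -> str:
        """Which level of the package/module/member tree the token points at."""
        if self.name is not None:
            return "member"
        if self.module is not None:
            return "module"
        return "package"


def parse_type_token(token: str) -> TypeToken:
    """Split a token such as 'aws:s3:Bucket' or 'aws:s3' into its segments."""
    parts = token.split(":")
    # Accept one to three non-empty segments; anything else is malformed.
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed type token: {token!r}")
    return TypeToken(*parts)
```

Under these assumptions, `parse_type_token("aws:s3")` yields a module-level token, which corresponds to the listing shown for `pulumi do aws:s3`.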
You can also provide the input properties of a resource in a YAML or JSON file using the --input and --input-file options. To create a container service in Google Cloud Run, for example:
# service.yaml
location: us-central1
deletionProtection: false
template:
  containers:
    - image: us-docker.pkg.dev/cloudrun/container/hello
$ pulumi do gcp:cloudrunv2:Service create \
--input yaml \
--input-file service.yaml
This will create gcp:cloudrunv2/service:Service with the following inputs:
{
"deletionProtection": false,
"location": "us-central1",
"name": "service-b8af752",
"template": {
"containers": [
{
"image": "us-docker.pkg.dev/cloudrun/container/hello"
}
]
}
}
The result:
{
"createTime": "2026-05-22T23:00:22.415839Z",
...
"urls": [
"https://service-b8af752-921927215178.us-central1.run.app",
"https://service-b8af752-ctnulmzwoa-uc.a.run.app"
]
}
Resource operations
Most resources support the full set of CRUD operations — create, read, update, delete, and list — directly from the CLI. Each operation maps to a provider CRUD method using the same provider logic a full Pulumi program would use, and resources are addressable by their cloud provider IDs:
# Create a resource
$ pulumi do aws:s3:Bucket create --yes | jq -r ".name"
bucket-4f5cb22
# Fetch it
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq -r ".hostedZoneId"
Z3BJ6K6RIION7M
# Update/patch it
$ pulumi do aws:s3:Bucket patch bucket-4f5cb22 --input yaml --input-file tags.yaml
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq ".tags"
{
"key": "value"
}
# Delete it
$ pulumi do aws:s3:Bucket delete bucket-4f5cb22
Provider configuration
Today, pulumi do resolves provider configuration — for example, applying your AWS credentials — using environment variables or credential files as supported by each individual Pulumi provider. See the Pulumi Registry for provider-specific configuration details.
Designed for humans and agents
We’ve designed pulumi do to serve humans and coding agents equally well, guided by three fundamental ideas:
- Consistent command structure across every provider. The do <package:module:type> <operation> pattern is the same for AWS, Azure, Google Cloud, Kubernetes, Cloudflare, Datadog, and every provider, including packages containing higher-level component resources. Once an agent learns that pattern, it applies across the board.
- Predictable output contract. JSON on stdout, progress on stderr, consistent exit codes. An agent can parse the result programmatically without scraping human-formatted tables.
- A single CLI command that works across every cloud. Many cloud and SaaS providers don’t have a full CLI at all. pulumi do generates commands from the provider schema, so if a Pulumi provider exists for it, the CLI just works. Neither humans nor agents need to install, learn, or even know about cloud provider-specific tooling.
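The output contract in the list above (JSON on stdout, progress on stderr, a meaningful exit code) is what makes the command straightforward to wrap. The following Python sketch shows one way a script or agent might do that. `build_do_argv` and `run_json_command` are hypothetical helper names; only the `pulumi do <token> <operation>` shape and the `--yes` flag come from the article. The demo uses a stand-in Python subprocess instead of the real CLI, so it runs without Pulumi or cloud credentials.

```python
import json
import subprocess
import sys
from typing import Any, List, Optional


def build_do_argv(token: str, operation: Optional[str] = None,
                  *args: str, yes: bool = False) -> List[str]:
    """Assemble an argv list in the `pulumi do <token> <operation>` shape."""
    argv = ["pulumi", "do", token]
    if operation:
        argv.append(operation)
    argv.extend(args)
    if yes:
        argv.append("--yes")  # skip the interactive confirmation prompt
    return argv


def run_json_command(argv: List[str]) -> Any:
    """Run a command, treat a nonzero exit code as failure, parse stdout as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # stderr carries progress and diagnostics, so surface it in the error.
        raise RuntimeError(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Stand-in for the real CLI: progress goes to stderr, JSON goes to stdout.
    fake_cli = [
        sys.executable, "-c",
        "import sys, json; sys.stderr.write('reading...\\n'); "
        "print(json.dumps({'id': 'vpc-d7b311af'}))",
    ]
    print(run_json_command(fake_cli)["id"])
```

With the real CLI installed, the argv from `build_do_argv("aws:s3:Bucket", "delete", "bucket-279ea56", yes=True)` could be passed to `run_json_command`, although the article does not show what a delete prints on stdout, so treating that output as JSON is an assumption.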
What’s next
Resource operations and provider functions are the foundation. The pulumi do roadmap extends the same direct-operation model with credential management, state tracking, and a path to full IaC.
Unified credentials with Pulumi ESC
One of the hardest parts of multi-cloud operations is credential management. Every provider has its own authentication scheme, environment variables, and session lifecycle. An agent working across AWS, Cloudflare, and Datadog today manages three separate credential mechanisms.
We’re building Pulumi ESC integration into pulumi do so you can manage credentials in one place and resolve them everywhere. ESC handles credential resolution (including OIDC-based dynamic credential generation and short-lived tokens) across all of your providers. Name the credential set, reference it, and ESC does the rest, with rotation, RBAC, and audit built in.
Cross-resource references
Real infrastructure has dependencies — subnets need VPCs, security group rules need their security groups, and so on. When you’re building resources one at a time, those references need to flow between commands somehow.
A future version of pulumi do will let resource inputs reference outputs from previously created resources, allowing the CLI to resolve them automatically and preserve the dependency graph. Later, when the time comes to graduate to a full IaC program, the generated code contains proper resource references rather than hard-coded strings.
Stateful mode and the graduation path
Today, pulumi do is stateless. Each command runs independently. A planned stateful mode will persist resource state across operations, enabling drift detection, lifecycle management, and a graduation path to full infrastructure as code.
Here’s what we’re planning:
- Zero setup. Your first pulumi do implicitly creates a project and stack. No manual initialization.
- Accumulate resources. Each operation stores resource state. After a few commands, you have a lightweight representation of your infrastructure.
- Eject to a full project. When the time comes, generate a Pulumi project in your chosen language with all resources imported and dependency graphs intact.
- Connect to Pulumi Cloud. Layer on governance, compliance, team collaboration, and deployment automation through Pulumi Cloud. Resources created via pulumi do can be governed by Pulumi Insights from day one, even before you opt into full IaC.
This path works because pulumi do uses the same providers, resource types, and property schemas as every other pulumi operation. Provisioned cloud resources stay where they are while management capabilities are layered on as needed.
Get started
pulumi do ships as a research preview in Pulumi CLI v3.242.0 and later. Install or update the CLI, install a provider plugin, and start running commands. The documentation has the full reference.
We can’t wait to hear your feedback. Give it a try today, tell us what works (and what doesn’t), and help shape the CLI that agents and humans both reach for first.
- Documentation
- File a feature request
- Pulumi Community Slack for discussion
Request-Based Autoscaling Is Now Generally Available on App Platform
DigitalOcean's App Platform now offers generally available request-based autoscaling, allowing applications to scale based on real-time HTTP traffic like requests per second and P95 latency across all CPU plans.
Deep dive
- DigitalOcean's App Platform now supports request-based autoscaling as a generally available feature, announced on May 22, 2026.
- This allows applications to scale based on immediate HTTP traffic signals, specifically "requests per second per instance" and "P95 request latency."
- Unlike CPU-based autoscaling, which is reactive, request-based scaling acts on leading indicators, improving responsiveness to traffic spikes.
- The feature is now available for both shared and dedicated CPU instances, removing the previous restriction that required a dedicated CPU plan for autoscaling.
- Users can combine request-based and CPU-based metrics on dedicated plans, with scaling up occurring if any threshold is crossed and scaling down only when all metrics are back in range.
- Configuration can be done via the App Platform console's "Settings" tab or by adding an autoscaling block to the app spec and submitting it with doctl apps update or the Apps API.
- Autoscaling decisions are based on a 5-minute rate window to react to sustained load rather than brief, momentary spikes.
- This feature applies to web service components receiving external HTTP traffic; worker and function components are not eligible, nor can it be used alongside "Scale to Zero (Inactivity Sleep)".
Decoder
- P95 response latency: The 95th percentile response latency, meaning 95% of requests are served within this time or faster.
- CPU-based autoscaling: Automatically adjusting the number of instances based on the CPU utilization of the running application.
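To make the P95 definition concrete, here is a minimal Python sketch that computes a percentile with the nearest-rank method. The function name `percentile_nearest_rank` and the sample latencies are illustrative assumptions; DigitalOcean does not document which percentile method it uses, and other methods interpolate between samples.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the 1-based rank is ceil(pct/100 * N).
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 19 fast requests and one slow outlier.
latencies_ms = [10.0] * 19 + [900.0]
p95 = percentile_nearest_rank(latencies_ms, 95)
```

In this made-up sample the single 900 ms outlier does not move the P95, because 19 of the 20 requests (95%) finished in 10 ms.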
Original article
Request-Based Autoscaling Is Now Generally Available on App Platform
By Bikram Gupta and Greeshma Pillai
Today, we’re excited to announce that request-based autoscaling on DigitalOcean App Platform is now generally available. Your apps can now automatically scale based on live HTTP traffic signals (requests per second and P95 response latency) so your infrastructure reacts to what’s actually happening, not what happened minutes ago.
Now Available for Shared and Dedicated CPU Instances
Until now, autoscaling on App Platform required a dedicated CPU plan. That meant a good portion of App Platform users (anyone running on shared CPU instances) had no path to automatic horizontal scaling at all.
That changes today. Request-based autoscaling works on both shared and dedicated CPU instances. Whether you’re running an early-stage project on a shared plan or a high-throughput production service on dedicated resources, you can now configure autoscaling to match your traffic—no plan upgrade required.
Faster, More Responsive Scaling
CPU-based autoscaling is reactive by nature. CPU is a lagging indicator: your containers have to be visibly struggling before the scaler knows there’s a problem, and by then, your users are already waiting.
Request-based autoscaling acts on the signals that actually reflect user experience:
- Requests per second per instance: how many requests each container is handling right now
- P95 request latency: the response time that 95% of your requests complete within
When traffic rises and either threshold is exceeded, new containers spin up immediately. When load drops and all metrics fall back below their targets, the scaler brings containers back down. You get the capacity headroom you need, faster, and pay only for what you use.
You can also combine request-based and CPU-based metrics on dedicated plans. The autoscaler scales up when any configured threshold is crossed, and scales down only when all metrics are back in range.
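The any/all rule described above can be sketched in a few lines of Python. This is an illustration of the stated rule only, not DigitalOcean's implementation: the function name, the metric keys, and the one-container step size are assumptions. A production autoscaler would also typically add a cooldown or margin so the count does not oscillate, which the article does not describe.

```python
from typing import Mapping


def next_instance_count(current: int, minimum: int, maximum: int,
                        readings: Mapping[str, float],
                        targets: Mapping[str, float]) -> int:
    """Scale up if ANY configured metric exceeds its target,
    scale down only when ALL configured metrics are within range."""
    if not targets:
        return current  # nothing configured, nothing to decide
    if any(readings[name] > target for name, target in targets.items()):
        desired = current + 1  # assumed step size of one container
    else:
        desired = current - 1  # every metric is back in range
    # Never leave the configured minimum/maximum bounds.
    return max(minimum, min(maximum, desired))


targets = {"requests_per_second": 100, "p95_milliseconds": 500}
busy = {"requests_per_second": 150, "p95_milliseconds": 200}
quiet = {"requests_per_second": 60, "p95_milliseconds": 200}
```

Here `busy` exceeds only the requests-per-second target, which is enough to scale up, while `quiet` is within both targets and therefore allows a scale-down.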
Know Your Baseline Before You Set Thresholds
Configuring good autoscaling thresholds starts with understanding your normal traffic patterns. The Insights tab in the App Platform console gives you exactly that. 
The Insights tab shows you HTTP Ingress Request Rate (requests per second) and HTTP Ingress Request Duration P95 (your 95th-percentile latency) over time. Use this to understand how your service behaves under normal load before dialing in your autoscaling rules.
How to Configure Request-Based Autoscaling
Using the Control Panel
Go to the Apps page, select your app, open the Settings tab, and select your web service component. In the Resource Size section, click Edit.
Select the Shared CPU or Dedicated CPU tab. Under Scaling, toggle Autoscale on. Set your Minimum Containers and Maximum Containers, then configure at least one autoscaling rule:

- Scale on number of requests per second: set a target RPS per instance
- Scale on response time and speed (P95): set a target P95 latency in milliseconds
- Scale on CPU usage threshold: available on dedicated CPU plans
Click Save. A redeployment kicks off automatically and your app starts autoscaling.
Using the App Spec
Add an autoscaling block to your service component in your app spec. The example below scales between 1 and 10 containers, targeting 100 requests per second per instance and a P95 latency of 500 ms:
name: my-app
services:
  - name: web
    github:
      repo: your-org/your-repo
      branch: main
    autoscaling:
      min_instance_count: 1
      max_instance_count: 10
      metrics:
        requests_per_second:
          per_instance: 100
        request_duration:
          p95_milliseconds: 500
Submit your updated spec via doctl apps update or the Apps API. You can tune these values at any time—if your service is scaling earlier than you’d like, raise the target; if you’re seeing latency before new containers arrive, lower it.
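One rough way to reason about the per-instance target is to ask how many containers a given traffic level would need. The Python sketch below assumes the autoscaler aims to keep each container at or below its target RPS; the article does not spell out the exact algorithm, and `instances_for_load` is a hypothetical helper, so treat the numbers as back-of-the-envelope estimates.

```python
import math


def instances_for_load(total_rps: float, per_instance_target: float,
                       minimum: int, maximum: int) -> int:
    """Estimate the container count needed to keep per-instance RPS at or under target."""
    if per_instance_target <= 0:
        raise ValueError("per_instance_target must be positive")
    needed = math.ceil(total_rps / per_instance_target)
    # Clamp to the min_instance_count / max_instance_count bounds from the spec.
    return max(minimum, min(maximum, needed))


# 450 rps of total traffic with the example spec (target 100, bounds 1 to 10).
at_target_100 = instances_for_load(450, 100, 1, 10)
# Raising the target to 150 means fewer containers for the same traffic.
at_target_150 = instances_for_load(450, 150, 1, 10)
```

Under this assumption, raising the target from 100 to 150 drops the estimate from 5 containers to 3, which matches the guidance that a higher target makes the service scale later.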
A few things to keep in mind:
- Request-based autoscaling applies to web service components that receive external HTTP traffic. Worker and function components are not eligible.
- It cannot be used alongside Scale to Zero (Inactivity Sleep) on the same service.
- Scaling decisions are based on a 5-minute rate window, so the autoscaler responds to sustained load rather than momentary spikes.
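The effect of a 5-minute rate window can be illustrated with a short Python sketch. It assumes the rate is a plain average of request counts over the trailing 300 seconds, which is a simplification; the article does not describe how the platform computes it, and `windowed_rate` is a hypothetical name.

```python
from typing import Iterable

WINDOW_SECONDS = 300.0  # the 5-minute window mentioned above


def windowed_rate(timestamps: Iterable[float], now: float,
                  window: float = WINDOW_SECONDS) -> float:
    """Average requests per second over the trailing window ending at `now`."""
    if window <= 0:
        raise ValueError("window must be positive")
    start = now - window
    count = sum(1 for t in timestamps if start < t <= now)
    return count / window


# A 10-second burst of 1,000 requests (about 100 rps while it lasts)...
burst = [290.0 + i / 100 for i in range(1000)]
burst_rate = windowed_rate(burst, now=300.0)

# ...versus 100 rps sustained for the full five minutes.
sustained = [i / 100 for i in range(1, 30001)]
sustained_rate = windowed_rate(sustained, now=300.0)
```

Averaged over the window, the burst registers as roughly 3.3 requests per second, while the sustained load registers as the full 100, which is why a brief spike alone is unlikely to trigger scaling.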
Get Started With Request-Based Autoscaling
Your traffic doesn’t follow a schedule. Your scaling shouldn’t either. Request-based autoscaling is available now on every DigitalOcean account. Head to the Insights tab to understand your traffic patterns, then configure your autoscaling rules directly in the console or via the app spec.
Read the documentation to get started
About the author(s)
Bikram Gupta
Greeshma Pillai
Add dynamically updating context to logs with Reference Tables and Observability Pipelines
Datadog Observability Pipelines now provides centralized log enrichment using dynamic Reference Tables, integrating real-time context from sources like Snowflake and ServiceNow.
Deep dive
- Datadog Observability Pipelines now offers centralized log enrichment using dynamically updating Reference Tables, generally available as of May 1, 2026.
- This feature allows security and platform engineering teams to add real-time context to logs before they are routed to downstream tools like SIEMs, logging solutions, or data lakes.
- Reference Tables integrate with various external sources, including Snowflake (threat intelligence, user profiles), ServiceNow CMDB (asset/service metadata), Salesforce (customer data), and cloud storage (allowlists/denylists).
- The "Enrichment Table processor" automatically updates Reference Tables from integrations, reducing manual effort compared to local CSV files.
- Log enrichment helps accelerate threat investigations by applying fresh context, even to rehydrated historical data from archives, allowing for updated correlation with current detections.
- Enriched attributes can also be used for conditional routing, sending high-signal, lower-volume data to expensive SIEMs (e.g., Microsoft Sentinel, CrowdStrike, Datadog Cloud SIEM) while benign, high-volume data goes to cheaper storage (e.g., Amazon S3).
- The process involves the Enrichment Table processor looking up values in a local cache, buffering logs if not found, and fetching entries from the Datadog Reference Tables API for dynamic updates.
Decoder
- Log enrichment: The process of adding additional context or metadata to log events to make them more informative and easier to analyze.
- Reference Tables: Dynamic lookup tables within Datadog Observability Pipelines that contain contextual data from external sources, used to enrich logs in real-time.
- Observability Pipelines: Datadog's feature that allows teams to process, filter, and route telemetry data (logs, metrics, traces) before sending it to destinations.
- SIEM (Security Information and Event Management): A security solution that aggregates and analyzes log data and security events from various sources across an organization's IT infrastructure.
- CMDB (Configuration Management Database): A database that contains all relevant information about hardware, software, and network configuration items within an organization's IT environment.
Original article
Security and platform engineering teams rely on context-rich logs to investigate threats, prioritize incidents, and meet compliance requirements. Context is often stored separately from applications that generate logs, in sources like threat intelligence feeds in Snowflake, asset lists in Amazon S3, ownership data in ServiceNow CMDB, and risk scores produced in Databricks. Enriching logs after ingestion means duplicating lookups in every downstream tool and manually correlating logs with external sources during every investigation. The results are slower resolution of performance and security issues as well as increased cost.
To address these challenges, Datadog Observability Pipelines supports centralized log enrichment with Reference Tables before you route data to your preferred SIEM, logging solution, or data lake. You can use dynamically updating Reference Tables that integrate with SaaS applications, data lakes, and cloud storage to attach fresh metadata to events before data leaves your infrastructure.
In this post, we’ll explain how Observability Pipelines helps you:
- Centrally enrich logs with data stored in Reference Tables
- Apply fresh context to data during threat investigations
- Process and conditionally route enriched data to your downstream logging tool, SIEM, or data lake
Centrally enrich logs with data stored in Reference Tables
Logs provide indicators of what is happening in your system, but many lack critical context that helps you answer who owns what or which Indicators of Compromise (IOCs) might be present. Moreover, key data sources like threat intelligence feeds and configuration management databases (CMDBs) receive regular updates that need to be accounted for in production data. When enriching with locally stored datasets, teams have to spend critical time manually updating CSV files and coordinating update jobs across environments. And when enrichment happens after ingestion, teams have to redo the same lookups for each downstream tool that they manage, adding latency and creating inconsistencies.
The Enrichment Table processor in Observability Pipelines helps solve these problems by enabling you to enrich logs with data stored in SaaS-hosted Reference Tables. Reference Tables stay up to date automatically with your integrations, reducing engineering effort since you don’t have to manually update datasets.
Reference Tables support several common enrichment sources that teams already use for operational and security context:
- Snowflake stores threat intelligence feeds, user profiles, compliance mappings, and business intelligence data that teams can join with authentication logs, access logs, or detection signals.
- ServiceNow CMDB provides asset and service metadata that teams can use to enrich logs to accelerate investigations and route issues to the right responders.
- Salesforce provides customer and billing metadata such as account owners, contract tiers, and account segmentation that helps teams prioritize customer-impacting issues.
- Databricks offers model outputs such as anomaly scores and fraud likelihood values that teams can attach to transaction and authentication events.
- Cloud storage sources (including Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage) hold CSV reference data such as allowlists, denylists, asset inventories, and IP reputation lists that update on a schedule.
For example, you can enrich logs with threat intelligence from feeds like AlienVault that are stored in Snowflake. The following screenshot shows the Enrichment Table processor configured to enrich logs from the alienvault_threat_intel table that dynamically updates from Snowflake.
Once the processor is configured, logs containing values that match keys found in ip_address are enhanced with information from the table.
After Observability Pipelines enriches events in your infrastructure, you can route enriched logs to the SIEM or data lake of your choice, including Microsoft Sentinel, CrowdStrike, and Datadog BYOC (Bring Your Own Cloud) Logs.
Apply fresh context to data during threat investigations
Investigating security threats often requires you to revisit historical data for a specific user, device, or process. An IP reputation list can change after an incident begins, or a new fraud model can assign new risk scores to historical transactions. Security investigations become more difficult when older logs lack newer context, especially when this data is stored in cloud storage archives and separated from your logging or SIEM solution because of cost controls or retention strategy.
Using Observability Pipelines, you can rehydrate and extract archived data before applying processing and routing rules. You can pull historical information from your storage buckets and enrich it with current context stored in Reference Tables, and then route normalized data to your SIEM. This workflow helps you apply updated context without rebuilding custom joins in every downstream system.
For example, let’s say that you’re a security engineer investigating a Tier 0 threat by using Microsoft Sentinel and Azure Blob Storage. You can rehydrate archived authentication logs from Azure Blob Storage, enrich the logs with an up-to-date asset list from Snowflake, and route the enriched output into Microsoft Sentinel for correlation with current detections. The enriched logs can highlight connections that were not visible at original ingest time, especially when threat intelligence and scoring datasets changed after the fact.
Process and conditionally route enriched data to your downstream logging tool, SIEM, or data lake
Application logs often lack context that teams need to help them prioritize and make smart routing decisions. Without that context, routing rules tend to rely on static heuristics that have limited business meaning. Enrichment becomes especially valuable when it helps teams keep high-volume, low-risk data in less expensive storage while sending smaller, higher-signal subsets to a SIEM or analytics platform.
With log enrichment in Observability Pipelines, teams can make routing and volume control decisions by using attributes derived from external sources. A pipeline can enrich an event with a threat classification, a customer tier, an ownership team, or an environment label, and then use that information to route data to a destination that matches operational goals. The following diagram shows how Observability Pipelines enriches logs with Reference Table data on-stream:
The numbered steps in the diagram map to the following workflow:
- The Enrichment Table processor looks up the value of the key field in the local cache (e.g., ip_address:192.0.2.1).
- If a matching entry is found in the cache (e.g., the IP address matches a row in the table that is cached) or if the log does not have a valid key field, the log is immediately enriched and sent downstream.
- A. If the value is not found in the cache, the log is buffered in memory.
  B. The value is also added to the client queue to be checked against the Datadog Reference Tables API.
- The client is triggered every second or when the queue reaches a certain length, and it fetches all pending keys from the Datadog Reference Tables API.
- On a successful API response, the entries are stored in the cache and the corresponding logs are pulled out of the buffer, enriched, and sent downstream.
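The workflow above can be sketched as a small Python model. This is an illustration only, not Datadog's implementation: the class name EnrichmentProcessor, the fetch_rows callback standing in for the Reference Tables API, the max_queue threshold, and the choice to cache misses as empty rows are all assumptions.

```python
from collections import defaultdict

class EnrichmentProcessor:
    """Toy model of the lookup/buffer/fetch workflow; all names are hypothetical."""

    def __init__(self, fetch_rows, key_field="ip_address", max_queue=2):
        self.fetch_rows = fetch_rows      # stand-in for the Reference Tables API
        self.key_field = key_field
        self.max_queue = max_queue        # flush threshold (real value not stated)
        self.cache = {}                   # key value -> table row
        self.buffer = defaultdict(list)   # key value -> logs waiting on that key
        self.queue = []                   # keys pending an API lookup
        self.sent = []                    # logs emitted downstream

    def process(self, log):
        key = log.get(self.key_field)
        if key is None or key in self.cache:
            # Cache hit, or no usable key field: emit immediately.
            self.sent.append({**log, **self.cache.get(key, {})})
            return
        # Cache miss: buffer the log and queue the key once for lookup.
        if key not in self.buffer:
            self.queue.append(key)
        self.buffer[key].append(log)
        if len(self.queue) >= self.max_queue:
            self.flush()

    def flush(self):
        # Fetch all pending keys in one batch (a timer would also call this).
        keys, self.queue = self.queue, []
        rows = self.fetch_rows(keys)
        for key in keys:
            # Assumption: keys missing from the table are cached as empty rows.
            self.cache[key] = rows.get(key, {})
            for log in self.buffer.pop(key):
                self.sent.append({**log, **self.cache[key]})
```

Batching the pending keys means a burst of logs with the same unseen IP address costs one API lookup rather than one per log.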
Consider a security pipeline that processes endpoint or network telemetry data. The pipeline can enrich events by using a threat intelligence feed stored in Snowflake and add an attribute that indicates whether an IP address or indicator appears on a benign list, suspicious list, or malicious list. Routing rules can then send benign high-volume activity to Amazon S3 while forwarding suspicious and malicious activity to a SIEM such as Microsoft Sentinel, CrowdStrike, or Datadog Cloud SIEM for faster investigation. This approach reduces noise in expensive downstream tools while keeping richer context attached to high-priority events.
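As a rough sketch of the routing rule in that example, the Python function below picks a destination from an enriched attribute. The attribute name threat_classification, the destination labels, and the decision to send unclassified events to the SIEM are assumptions for illustration. In Observability Pipelines itself, routing is configured in the pipeline rather than written as code like this.

```python
def route(event, siem="siem", archive="s3_archive"):
    """Pick a destination from an enriched attribute; names are hypothetical."""
    # Assumption: events with a missing or unknown classification go to the
    # SIEM, so only explicitly benign traffic lands in cheaper storage.
    verdict = event.get("threat_classification", "unknown")
    return archive if verdict == "benign" else siem
```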
Start enriching your logs with Observability Pipelines
Centralized log enrichment with Reference Tables in Observability Pipelines brings dynamic, managed lookups into log processing that runs inside your infrastructure. You can enrich logs, apply fresh context to accelerate investigations, and use the enriched attributes to guide routing and volume control across destinations such as SIEM tools and cloud storage. To learn more, check out the Observability Pipelines documentation and the Reference Tables documentation.
If you don’t already have a Datadog account, you can sign up for a 14-day free trial to get started enriching your logs.
Mitigate credential exposure in Windows environments with Boundary and Vault
HashiCorp's Boundary and Vault can secure Windows remote access by replacing static RDP credentials with dynamic, short-lived Active Directory credentials and identity-based access.
Deep dive
- Boundary provides identity-based remote access for RDP.
- Vault generates dynamic, short-lived Active Directory credentials.
- The solution replaces static credentials and broad VPN access for Windows remote sessions.
- Credentials are injected directly, reducing exposure.
- A Terraform-based AWS proof-of-concept is available for implementation guidance.
Decoder
- RDP (Remote Desktop Protocol): A proprietary protocol developed by Microsoft that allows a user to graphically connect to another computer over a network connection.
- Active Directory (AD): Microsoft's directory service that stores information about network objects (like users, groups, and computers) and makes this information available to users and network administrators.
Original article
Organizations face Windows remote access risks from static credentials and broad VPN-based network access. Boundary and Vault provide identity-based RDP with short-lived dynamic AD credentials and credential injection, plus a Terraform-based AWS proof-of-concept setup.
Deploying to Multiple Azure Subscriptions with Terraform Provider Aliases
Sarah Lean demonstrates how to deploy resources across multiple Azure subscriptions from a single Terraform project using provider aliases and a unified state file.
Deep dive
- By default, Terraform's azurerm provider targets a single Azure subscription.
- Organizations often use separate subscriptions for different environments like dev, staging, prod.
- Without aliases, this typically means separate Terraform projects and state files per subscription.
- Terraform provider aliases allow declaring multiple instances of the same provider within one project.
- Each instance is configured with a unique alias and a specific subscription_id.
- The article provides a step-by-step guide with code examples for variables.tf, providers.tf, and main.tf.
- Resources are "pinned" to a specific provider instance using the provider = azurerm.alias_name meta-argument.
- This enables a single terraform plan and terraform apply to manage resources across all aliased subscriptions.
- The approach improves consistency, reduces configuration drift, and centralizes state management.
Decoder
- Terraform provider aliases: A Terraform feature allowing multiple configurations of the same provider within a single project, differentiated by an alias argument, enabling management of resources across different environments or accounts of the same cloud provider.
- azurerm provider: The official Terraform provider for managing resources in Microsoft Azure.
- provider meta-argument: A Terraform argument used within a resource block to explicitly specify which provider configuration (including aliased ones) should be used for that resource.
- Azure subscription: A logical container for Azure services, managed by an Azure account, often used to separate billing, environments, or organizational units.
Original article
Full article content is not available for inline reading.
Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks
Databricks now offers automatic prompt caching for open-source LLMs like Llama and Mistral hosted on its platform, significantly boosting inference performance without configuration.
Deep dive
- Databricks has rolled out automatic prompt caching for open-source LLMs hosted on its Foundation Model APIs (FMAPIs).
- This feature applies to batch inference, pay-per-token, and provisioned-throughput workloads.
- Supported open-source models include GPT-OSS (20B, 120B), Gemma 3 12B, Fine-tuned Llama 3.1 8B, Llama 3.1 8B, and Llama 3.3 70B.
- Prompt caching works by reusing repeated prompt prefixes, skipping the "prefill" stage of LLM inference.
- This significantly reduces latency and increases throughput.
- In real-world production testing on GPT-OSS, it led to a 2.5x increase in per-replica input-token throughput and a 3x reduction in P50 latency, with a 30% cache hit ratio.
- The caching is entirely automatic; customers do not need to configure anything.
- Prompt caches are isolated, reside only in volatile memory, and are never persisted, ensuring data security.
- Databricks previously offered this feature for proprietary models (GPT, Gemini, Claude).
Decoder
- LLM (Large Language Model): A type of artificial intelligence model trained on vast amounts of text data to understand, generate, and process human language.
- Inference: The process of using a trained machine learning model to make predictions or generate outputs on new, unseen data.
- Prompt caching: A technique in LLM inference where the intermediate computational results (specifically, the key-value cache or KV cache) for repeated prompt prefixes are stored and reused, avoiding re-computation and speeding up subsequent requests with the same prefix.
- Prefill stage: The initial phase of LLM inference where the input prompt tokens are processed to generate the first set of internal representations (KV cache) before the model starts generating output tokens one by one.
- P50 latency: The 50th percentile latency, meaning 50% of requests are completed within this time or faster. It's a measure of typical performance.
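The prefix-reuse idea can be illustrated with a toy Python model that only counts how many tokens still need prefill. It is a sketch under stated assumptions, not how Databricks implements caching: the class name PrefixCache is hypothetical, and a real serving stack stores KV tensors (typically in fixed-size blocks) rather than token tuples.

```python
class PrefixCache:
    """Toy prompt-prefix cache over token lists (a real one stores KV tensors)."""

    def __init__(self):
        self.prefixes = set()   # every token prefix seen so far, as tuples

    def prefill(self, tokens):
        """Return how many tokens still need prefill after reusing a cached prefix."""
        tokens = tuple(tokens)
        # Find the longest previously seen prefix of this prompt.
        reused = 0
        for n in range(len(tokens), 0, -1):
            if tokens[:n] in self.prefixes:
                reused = n
                break
        # Record all prefixes of this prompt for future requests.
        for n in range(1, len(tokens) + 1):
            self.prefixes.add(tokens[:n])
        return len(tokens) - reused
```

Two requests that share a long system prompt pay for it once: the second request only needs prefill for the tokens after the shared prefix.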
Original article
Full article content is not available for inline reading.
The Hugo evolution: Engineering Grab's unified, one-click data ingestion platform with Apache Flink
Grab dramatically reduced data pipeline onboarding from days to minutes by unifying self-service data ingestion on a new Flink-based platform called Hugo, replacing Kafka Connect and Sprinkler.
Decoder
- CDC (Change Data Capture): A set of software design patterns used to determine and track changes to data so that actions can be taken based on those changes. In databases, this often means reading transaction logs (binlogs).
- Apache Flink: An open-source stream-processing framework for distributed, high-performing, and always-on data applications. It can perform stateful computations over unbounded and bounded data streams.
- Kafka Connect: An open-source framework for connecting Kafka with other systems, allowing data to be streamed in and out of Kafka.
- Apache Iceberg: An open table format for huge analytic datasets. Iceberg adds SQL table capabilities to files in data lakes, like schema evolution, hidden partitioning, and time travel.
- Confluent Schema Registry: A service for storing and retrieving Avro, Protobuf, and JSON Schema schemas. It helps ensure data compatibility and evolution in Kafka-based systems.
Original article
Full article content is not available for inline reading.
From Batch to Streaming and AI, Iceberg for Everyone by Everyone (34 minute video)
Apache Iceberg, while successful for batch analytics and now supporting semi-structured data in v3, still requires significant community enhancements like "One File Commits" in v4 to fully support low-latency streaming and AI workloads.
Decoder
- Apache Iceberg: An open table format for huge analytic datasets. It brings reliable, SQL table semantics to data stored in data lakes (e.g., S3, HDFS), offering features like schema evolution, hidden partitioning, and time travel.
- One File Commits: A proposed feature for Apache Iceberg aimed at reducing the number of files written during micro-batch or streaming ingestion, improving performance and reducing metadata overhead for low-latency writes.
- Columnar metrics: Aggregate statistics stored at a column level within data files (e.g., min/max values, null counts), which can be used by query engines for predicate pushdown and query optimization without reading full data files.
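A minimal Python sketch shows how min/max column statistics let a query engine skip files without opening them. The FileStats type and files_to_scan function are hypothetical names for illustration and do not reflect Iceberg's metadata layout or API.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    min_value: int   # smallest value of the filtered column in this file
    max_value: int   # largest value of the filtered column in this file

def files_to_scan(files, lo, hi):
    """Keep only files whose [min, max] range can overlap the filter lo <= x <= hi."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]
```

A file whose range lies entirely outside the filter cannot contain matching rows, so it is pruned using metadata alone.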
Original article
Apache Iceberg has progressed from batch analytics in v1 to the recent v3 table spec, which added vendor-neutral support for semi-structured data and improved deletes, but the format still requires significant enhancements for low-latency streaming and AI workloads. The community is working on v4, which is intended to add One File Commits, better column statistics, and columnar metrics, to make Iceberg truly universal.
DuckDB 1.5.3: Not an Ordinary Patch Release
DuckDB's v1.5.3 patch release introduces "Quack" as a core beta extension, enabling client-server database functionality and enhancing Iceberg, AWS, and HTTPS proxy support.
Deep dive
- DuckDB v1.5.3 is released, containing significant new features delivered through extensions despite being a patch release.
- The "Quack" protocol, introduced on May 12, is now a core beta extension, transforming DuckDB into a client-server database.
- Quack enables client applications to connect to a remote DuckDB instance transparently.
- DuckLake, DuckDB's data lake client, now supports DuckDB with Quack as its catalog database.
- The AWS extension gains support for IAM Roles for Service Accounts (IRSA) via the web_identity chain type and IAM authentication for managed PostgreSQL databases on RDS/Aurora.
- The HTTPS extension now respects the HTTP_PROXY environment variable for extension installs and network requests.
- The DuckDB-Iceberg extension receives numerous updates, including MERGE INTO, INSERT and UPDATE for partitioned tables, CTAS via ADBC, and ALTER TABLE support.
- Internal changes include shipping jemalloc as a statically linked core library on Linux for cleaner packaging and fixing the DISABLE_EXTENSION_LOAD compile-time flag.
- Quack is expected to become production-ready with DuckDB v2.0 in fall 2026.
Decoder
- Quack: A new remote protocol that turns DuckDB into a client-server database, allowing clients to connect to a remote DuckDB instance.
- DuckLake: A data lake client for DuckDB.
- IRSA (IAM Roles for Service Accounts): An AWS feature that allows Kubernetes service accounts to assume IAM roles, providing fine-grained permissions to pods.
- ADBC (Apache Arrow Database Connectivity): A standard for high-performance data access to databases, based on Apache Arrow.
Original article
Full article content is not available for inline reading.
Introducing Dimster, a performance benchmarking tool for Apache Kafka
Jack Vanlightly released Dimster, an open-source Kafka benchmarking tool designed for "Dimensional Testing" across various workloads and configurations, with Kubernetes support for standardized deployment.
Deep dive
- Dimster is an open-source performance benchmarking tool for Apache Kafka, created by Jack Vanlightly.
- It is designed for "Dimensional Testing," allowing users to systematically vary single or co-varying dimensions of configuration or workload to analyze performance impact.
- Results are self-contained and shareable, including JSON, CSV, source configs, log files, and interactive charts/Grafana dashboards (as HTML).
- Supports four test modes: Run (fixed throughput, live interaction, optional availability), Explore (finds peak sustainable throughput under latency targets), Drain-backlog (times backlog processing), and Correctness (detects data loss, corruption, out-of-order, duplicates).
- Provides CLI commands for pre-benchmark resource calculation (resources), comparing runs (compare), and pivoting results (pivot).
- Kubernetes is a standardized runtime for Dimster, simplifying deployment and orchestration across various environments (local, EKS, GKE).
- Dimster can deploy Kafka clusters to Kubernetes or connect to external Kafka services.
- The tool is written in Java and leverages modern JVM features.
Decoder
- Dimensional Testing: A benchmarking technique where configurations or workload aspects (dimensions) are systematically varied to observe their impact on performance.
- p99 end-to-end latency: The 99th percentile of the total time taken for a message to travel from producer to consumer.
- mTLS (mutual TLS): A security protocol where both client and server authenticate each other using TLS certificates.
- OpenMessagingBenchmark (OMB): An existing open-source benchmarking framework for messaging systems, which inspired aspects of Dimster.
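For readers unfamiliar with percentile latency, here is a small Python sketch using the nearest-rank method. It is one common definition and not necessarily the one Dimster uses. The function name percentile is chosen for illustration.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values still return a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Calling percentile(latencies_ms, 99) on a list of end-to-end latencies gives the value that 99% of messages met or beat.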
Original article
Full article content is not available for inline reading.
Bintrail: MySQL Time-Travel Queries Using Indexed Binlogs
Bintrail introduces time-travel and diff queries to MySQL via ProxySQL and indexed binlogs, enabling point-in-time recovery and audit without schema changes.
Deep dive
- Bintrail is a new layer developed by Daniel Guzman-Burgos that brings point-in-time queries and row-history lookups to MySQL.
- It provides AS OF and BETWEEN time-travel queries to MySQL, a feature available natively in Oracle, SQL Server, MariaDB, and via extensions in PostgreSQL.
- The system operates transparently behind ProxySQL, routing historical query patterns (e.g., _flashback, _diff, _snapshot) to its own backend while regular MySQL traffic remains untouched.
- Bintrail parses MySQL ROW-format binary logs, indexing every row event with full before/after images.
- It generates reversal SQL, allowing point-in-time recovery without needing the original binlog files.
- The indexed history store is maintained independently of MySQL's binlog retention, enabling historical queries over longer periods.
- It can optionally extend historical queries into archived Parquet data stored on S3.
- No ALTER TABLE or special storage engine is required; it works with existing MySQL instances.
- Current limitations include support only for literal timestamp queries, primary-key lookups, and capped full-table restores, with joins and complex filtering handled outside the shim layer.
- Bintrail is available on GitHub under the BUSL (Business Source License).
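The idea of generating reversal SQL from before/after row images can be sketched in Python as follows. This is an illustration of the general technique, not Bintrail's code: the function reversal_sql, its arguments, and the literal rendering are assumptions, and the sketch assumes the primary key is not changed by an update.

```python
def reversal_sql(table, pk, before, after):
    """Build a statement that undoes one row event.

    before/after are row images as dicts, or None when the row did not exist.
    """
    def lit(v):
        # Minimal literal rendering for the sketch; real code should bind parameters.
        if v is None:
            return "NULL"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    if before is None:   # event was an INSERT, so undo it with a DELETE
        return f"DELETE FROM {table} WHERE {pk} = {lit(after[pk])};"
    if after is None:    # event was a DELETE, so undo it with an INSERT
        cols = ", ".join(before)
        vals = ", ".join(lit(v) for v in before.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals});"
    # Event was an UPDATE: restore the before image (primary key assumed unchanged).
    sets = ", ".join(f"{c} = {lit(v)}" for c, v in before.items() if c != pk)
    return f"UPDATE {table} SET {sets} WHERE {pk} = {lit(after[pk])};"
```

Because each indexed event carries both images, the inverse statement can be built without going back to the original binlog file.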
Decoder
- ProxySQL: A high-performance, high-availability, protocol-aware proxy for MySQL, which can route queries based on rules.
- Binlog (Binary Log): A log of all changes to a MySQL database, used for replication and point-in-time recovery. ROW-format means it logs changes at the row level.
- AS OF query: A type of temporal query that allows retrieving the state of data as it existed at a specific past timestamp.
- BETWEEN query: A type of temporal query that allows retrieving all changes to data within a specified time range.
- GTID (Global Transaction Identifier): A unique identifier for a transaction committed on a MySQL server, used for easier replication and failover.
- BUSL (Business Source License): A source-available license that restricts production use for a certain period, after which the code becomes open source (e.g., Apache 2.0).
Original article
Full article content is not available for inline reading.
We're Introducing Real Time Design with Google Stitch
Google has launched Stitch, a real-time AI design tool allowing collaborative partnership with an AI agent for live design iterations using text or voice.
Decoder
- Google Antigravity: A backend integration tool mentioned by Google for connecting designs to logic.
Original article
Full article content is not available for inline reading.
Replit Launches the Newest Version of its Popular Vibe Coding App
Replit released Agent 4 for iOS and iPadOS, its vibe coding app, after Apple lifted a four-month ban, introducing parallel agents and merged project flows.
Decoder
- Vibe coding: A term, popularized by Replit, referring to a style of coding where AI agents assist in generating code or entire applications based on user prompts, often with a more intuitive or 'flow-state' feel.
- App Store guideline 2.5.2: An Apple guideline stating that "Apps should be self-contained in their bundles, and may not read or write data outside the designated container area, nor may they download, install, or execute code which introduces or changes features or functionality of the app, including other apps."
Original article
Full article content is not available for inline reading.
AI Gives Us the Prototype. It Doesn't Give Us the Brand
AI competently handles 80% of design, creating generic prototypes, but fails to deliver the 20% that forms unique brand identity and emotional connection.
Decoder
- Digital User Experience (DUX) design: A design process focused on optimizing the overall experience a user has with a digital product or service.
- Design parity: A situation where multiple products or interfaces, often due to over-reliance on similar AI tools or design patterns, end up looking and feeling indistinguishable from each other, lacking unique brand identity.
Original article
Full article content is not available for inline reading.
Launch-ready Product Demo Videos (Website)
Slideshot.ai uses an AI agent to automatically navigate web apps and generate polished product demo videos from text descriptions, removing manual recording and editing.
Deep dive
- Slideshot.ai is an AI agent that automates the creation of product demo videos for web applications.
- Users provide a product URL and a text description of the feature flow they want to demonstrate.
- The AI agent drives the web app in a browser, records the walkthrough, and returns a polished MP4 video.
- It supports authenticated product areas by configuring credentials.
- The service aims to eliminate the tediousness of manual screen recording and editing, especially when UIs change frequently.
- Slideshot offers an API for integration into automated workflows, costing $0.90 per recording request with no monthly subscription.
- Most recordings are completed within 5 to 10 minutes, with longer demos potentially taking 20+ minutes via an asynchronous API.
- The platform is positioned as a solution for repeatable demo generation, unlike manual screen recorders like ScreenStudio or Loom.
Decoder
- AI agent: A software program that uses artificial intelligence to perform tasks autonomously, often by interacting with other systems or environments, in this case, a web browser.
Original article
Full article content is not available for inline reading.
In 2026, here's what creative recruiters are looking for in juniors
Creative recruiters in 2026 prioritize a junior designer's original thinking, attitude, and ability to explain their creative process over a technically perfect portfolio, especially amidst increased competition from AI.
Deep dive
- Recruiters are seeing increased competition for junior design roles due to AI and economic factors.
- The most crucial trait is the ability to explain the "why" behind design decisions, not just showcasing the final product.
- Matt Redway of PlayStation, Daniel Poll of Noramble, and James McNaught of Wolff Olins stress the importance of unexpected, meaningful work and understanding the "big idea."
- Attitude, passion, curiosity, and a willingness to learn are more valued than raw talent, according to Edward Dalton of HelloYes.
- Pablo Marques of Raw Materials looks for good taste, willingness to listen, and fearlessness, valuing those who are aware of what they don't yet know.
- Recruiters like Mélanie Hubert-Crozet of monopo london and Tom Muller of helloMuller look for unique styles and personal viewpoints in portfolios, not technical perfection.
- James Le Beau-Morley encourages juniors to show "raw ideas" and "weird stuff" to avoid conforming too early.
- How a candidate shows up (energy, thoughtful questions, humility, personality) is critical; Alex Dixon of Dacre recalls a student's phone call as refreshing.
- Studios hire people, not just portfolios, and value candidates who are adaptable, collaborative, and can contribute to office culture, as explained by Rodd Chant and Chris Woodhams of Cafeteria.
Original article
Full article content is not available for inline reading.
Pixar Ditches its 3D Look for the First Time – And It's Glorious
Pixar is abandoning its signature 3D animation for the first time with "Gatto," a hand-painted film set in Venice about a cat named Nero indebted to a feline mob boss.
Original article
Full article content is not available for inline reading.
Reasonix (Website)
Reasonix is a DeepSeek-native coding agent designed for the terminal, optimized for low token costs across long sessions using prefix-cache stability.
Decoder
- Coding agent: An AI agent designed to assist with or perform coding-related tasks, such as writing code, debugging, or refactoring.
- Prefix-cache stability: A technique in large language models where common initial sequences of tokens (prefixes) are cached, allowing subsequent completions that share the prefix to reuse the cached computation, thus reducing redundant processing and token costs.
Original article
Reasonix is a DeepSeek-native coding agent for the terminal. It is engineered around prefix-cache stability and designed to be left running. Token costs stay low across long sessions.
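The effect of prefix-cache stability on token cost can be illustrated with a toy model. This is a minimal sketch only, not Reasonix's implementation or DeepSeek's actual caching behavior: the function names `shared_prefix_len` and `billable_tokens` are hypothetical, tokens are plain strings, and the "cache" simply remembers the previous request. It shows why an append-only conversation history keeps most tokens cacheable, while a changing value at the front of the prompt invalidates everything after it.

```python
def shared_prefix_len(a, b):
    """Count the leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def billable_tokens(requests):
    """Toy cost model (an assumption for illustration): tokens that match
    the previous request's prefix count as cache hits, the rest as misses."""
    previous = []
    hits = misses = 0
    for tokens in requests:
        cached = shared_prefix_len(previous, tokens)
        hits += cached
        misses += len(tokens) - cached
        previous = tokens
    return hits, misses


# Hypothetical token sequences for two turns of one session.
system = ["SYS", "TOOLS"]
turn1 = system + ["user:hi"]
turn2 = turn1 + ["assistant:hello", "user:fix bug"]

# Append-only history: the second request reuses the whole first request.
stable = billable_tokens([turn1, turn2])

# A changing value at the very front breaks the shared prefix.
unstable = billable_tokens([["t=1"] + turn1, ["t=2"] + turn2])

print("stable   (hits, misses):", stable)
print("unstable (hits, misses):", unstable)
```

In this sketch the stable session gets cache hits on every token it has already sent, while the unstable one gets none. Real provider caches may work on fixed-size blocks and expire entries, so actual savings will differ.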
David Sacks's 11th-Hour Plea Led to Trump's Backtrack on AI Executive Order
Venture capitalist David Sacks convinced President Trump to postpone an AI executive order, arguing it would hinder U.S. competition with China by imposing mandatory regulations.
Original article
David Sacks, a venture capitalist, warned President Trump on a call that the long-awaited executive order on the dangers posed by artificial intelligence, which Trump was deliberating on, could lead to mandatory regulations that slow down the industry in its race with Chinese competitors. Trump responded that he shared concerns about China and was worried about hindering AI investment. He then postponed the signing and told reporters he wouldn't sign the order. The incident shows how powerful Sacks' influence is and marks a win for those against strong guardrails to limit the risks posed by the technology.
Paperwork is better when you can just talk through it
ChatGPT now allows users to upload form images and use voice commands or text to fill them out automatically, streamlining paperwork.
Original article
Paperwork is better when you can just talk through it.
With Images in ChatGPT and voice mode, you can upload a form, say what to fill in, and get back a completed version.
You can do this without voice, too.
Upload a form image, add the details you want included, and ChatGPT can fill it out for you.
Sundar Pichai Understands Why People Are Anxious About AI
Google CEO Sundar Pichai believes AI is humanity's most profound technology, acknowledging public anxiety and the industry's need to better showcase its benefits.
Original article
Sundar Pichai believes that AI is the most profound technology humanity will ever work on. He says it is natural that people feel anxious about the future the technology will bring, especially with its extraordinary pace of progress. Pichai thinks the industry has to do a lot more work in showing the benefits that are possible with the technology. This article contains a transcript of an interview with Pichai where he discusses the future of Google Search, how he's using AI agents, and his advice for college students.
Meet Mark Zuckerberg's Right-Hand Man Who's Unleashing AI at Meta
Andrew Bosworth, a 20-year Meta veteran and close Mark Zuckerberg confidant, is leading the company's ambitious transformation into an AI-first organization.
Original article
Andrew Bosworth, a top lieutenant of Meta CEO Mark Zuckerberg for more than 20 years, is leading Meta's gargantuan efforts to transform itself into an AI-first company that can innovate as fast as nimble startups. Bosworth is set to make nearly $1 billion if he can help increase Meta's market cap by more than 500% in the next five years. His focus now is on getting workers to use AI more in their work, and when possible, hand tasks over to it entirely. This article takes a look at his career up until now.
It's like the Olympics - except steroids are allowed
The controversial "Enhanced Games" is launching its inaugural event in Las Vegas, featuring elite athletes openly using performance-enhancing drugs for a chance at multi-million dollar prizes.
Deep dive
- The Enhanced Games is a new sporting event that openly allows and encourages athletes to use performance-enhancing drugs (PEDs) that are legal and FDA-approved.
- The inaugural competition took place in Las Vegas, offering $25 million in prize money, with $1 million bonuses for world records.
- Notable athletes participating include British swimmer Ben Proud and US sprinter Fred Kerley.
- Strongman Hafthor Bjornsson (known as "The Mountain" from Game of Thrones) openly discussed his steroid use, which is accepted in professional strongman.
- The founders, Aron D'Souza and Maximilian Martin, have attracted significant investors like Peter Thiel and Donald Trump Jr.
- Critics, including the US Anti-Doping Agency (USADA) CEO Travis Tygart, condemn the event as reckless and an affront to the spirit of sport, warning of serious health risks from anabolic steroids and growth hormones.
- Athletes like Ben Proud justify participation by highlighting the lucrative prize money, which far exceeds earnings from traditional Olympic sports.
- Some athletes, like American swimmer Hunter Armstrong, plan to compete clean at the Enhanced Games with the intention of still participating in future Olympics, though traditional sports bodies like World Aquatics have threatened bans.
- The company behind the games, Enhanced Group, recently began trading on the New York Stock Exchange and is exploring online sales of performance-enhancing supplements.
- Concerns are raised about the cultural implications of normalizing PED use, especially given existing social media pressures on body image and the rise of "biohacking" culture.
Decoder
- Performance-enhancing drugs (PEDs): Substances, often hormones or stimulants, used to improve athletic performance, banned in most traditional sports.
- FDA (Food and Drug Administration): A federal agency of the United States Department of Health and Human Services, responsible for protecting and promoting public health through the control and supervision of food safety, tobacco products, dietary supplements, prescription and over-the-counter pharmaceutical drugs, vaccines, biopharmaceuticals, blood transfusions, medical devices, and other regulated products.
Original article
Under the blazing Vegas sun, giant billboards advertise "Live Enhanced" as the baritone voice of a sports announcer pretends to introduce British swimmer Ben Proud and other athletes.
The announcer is practising at a new open air arena hosting one of the most controversial events in recent sporting history: the Enhanced Games.
Think Olympics on steroids. Literally.
The inaugural competition on Sunday will feature dozens of elite athletes using performance-enhancing drugs to try and break world records in track, weightlifting and swimming.
Some $25m (£18.6m) in prize money is up for grabs - with cash prizes for winners. World records in certain events, being eyed up by the likes of US sprinter Fred Kerley, pay a $1m (£740,000) bonus.
The drugs they use must be legal, and approved by the US Food and Drug Administration (FDA). But substances like testosterone and human growth hormone - banned by the World Anti-Doping Agency - are not only celebrated here, they're encouraged and for sale.
The project was founded by entrepreneurs Aron D'Souza and Maximilian Martin in 2023 and has attracted backing from prominent investors including billionaire Peter Thiel and Donald Trump Jr.
Health experts warn that anabolic steroids and growth hormones can cause strokes and cardiovascular damage, among other risks.
Event organisers claim Enhanced will push the limits of human performance while critics, especially in the Olympic movement, dismiss it as an affront to the spirit and founding principles of competitive sport.
'We're being up front and honest'
"You don't have to be pressured or use drugs in order to be the best," says Travis Tygart, CEO of the US Anti-Doping Agency (USADA).
He tells the BBC that while there are clear failures with the Olympics' anti-doping protocols, the answer is reforming the system, not to dope.
Athletes, he says, need to be assured the Olympics are clean and cheats will not be tolerated.
"We don't want kids to have to say, 'in order to win an Olympic medal, when I'm 18 or 20 years old, I have to inject myself every day in the rear end with a potentially dangerous drug.'"
But Enhanced, the company behind the games, claims it is bringing out into the open what it says is an undercurrent of many athletes cheating and taking performance-enhancing drugs in the shadows.
Packed into a ballroom at Resorts World casino, Enhanced athletes answered media questions for two hours, but only one - strongman Hafthor Bjornsson, who hopes to break his own deadlift record of 510kg (1,124.4 pounds) - would say which drugs he was taking. Other athletes were tight-lipped.
Bjornsson, who played the Mountain in Game of Thrones, says he's open about his steroid use because it's accepted in the professional strongman world.
American sprinter Shania Collins says the fact that those taking part in the games admit to doping already gives them more integrity than cheaters.
"We're being up front and honest and transparent from the start," she tells the BBC. "So how can you challenge our integrity when we're forthright with the information?"
Some sporting governing bodies have publicly rebuked athletes for choosing to compete in the games.
UK Athletics' chief executive Jack Buckner said he was "appalled" when it was revealed former Great Britain sprinter Reece Prescod had signed up in January. UK Anti-Doping (Ukad) has called the event a "reckless venture".
Meanwhile, GB Aquatics has said British swimmer Ben Proud will not be selected again for Britain's Olympic team if he competes at the Enhanced Games.
Big money involved
Proud, who won the silver medal in the 50m freestyle at the Paris Olympics in 2024, is hoping to break the world record using performance-enhancing drugs and win a million dollars on Sunday.
If he wins the race but doesn't break the world record, he will still make $250,000 (£185,000).
"There's no money in sport," Proud told the BBC before the games. "I was 30 and had just come off a silver medal, what future path do I follow?"
Proud, who has been widely condemned for joining the Enhanced Games, has said it would take 13 years of winning World Championship titles to earn this kind of prize money.
Enhanced has already paid a doped-up swimmer a million dollars for breaking a record during one of the trials it hosted ahead of Sunday's competition.
Of the 42 athletes competing at the Enhanced Games on Sunday, most will be using testosterone and some will also be using human growth hormone and stimulants like Adderall.
But not everyone will be doping - some are competing clean.
American swimmer Hunter Armstrong has said he "definitely" doesn't want to dope for the games, adding: "I personally have taken pride in getting as far as I can on natural God-given talent."
He plans to compete clean for a shot at the money and then return to compete at the Los Angeles Olympics in 2028. Whether he can is unclear, given the outcry from many sports bodies responsible for selection.
However, the US Anti-Doping Agency's Tygart told the BBC that as long as an athlete passes drugs tests to qualify for the Olympics, there's nothing to stop them from taking part from a doping perspective. He points out, though, that World Aquatics has already threatened to ban any swimmers competing in the Enhanced Games.
Wider worries for society?
Earlier this month, the Enhanced Group - the company behind the competition - began trading on the New York Stock Exchange.
And the competition is seemingly being treated as an opportunity for Enhanced to sell performance-enhancing medicine and supplements online.
This sparks broader concerns for some, at a time when social media is awash with offers to buy unregulated peptides and pressure on people to look a certain way.
Joe Vennare, founder of Fitt Insider, which analyses the health and wellness industry, feels normalising performance-enhancing drugs will bring unknown health and cultural consequences.
He says people have the right to use legal medical interventions, but is concerned some people are doing so at the expense of being fit and having a healthy diet.
"Kids are using social media filters, they're getting Botox injections," he tells the BBC. "They're having body dysmorphia - especially young men, in this case at record numbers."
Vennare says the Enhanced Games reflects those problems, but hasn't created them.
"That's a problem that parents and culture and society more broadly have to address."
Enhanced athlete James Magnussen agrees. The Australian swimmer says parents need to control what their kids watch and take personal responsibility - but he insists Enhanced is not "targeted at children".
"It's an entertainment company and product targeted at people looking at the longevity and human performance space."
None of these criticisms of the Enhanced Games are likely to go away any time soon.
Neither the athletes taking part, nor the invite-only crowd in Vegas, seem to be deterred.
Walk around here and you hear a lot about "biohacking", "human optimisation" and pushing the body beyond its natural limits.
So what's happening here may end up being much bigger than a niche sporting event. It's about whether sport is becoming a testing ground for a much bigger cultural shift.
Meta Launches Forum, a New Reddit-Like App for Facebook Groups
Meta has quietly launched "Forum," a Reddit-like app for Facebook Groups, featuring anonymous posting with nicknames and an AI-powered "Ask" tab for curated answers.
Original article
Meta has quietly launched a new app for Facebook Groups called Forum. The app didn’t get a formal launch but was spotted on the iOS App Store by analyst Matt Navarra.
The App Store description suggests Meta is building it as a rival to Reddit. Forum is “a dedicated space built for deeper discussions, real answers, and the communities you care about,” the company says.
Once you log in to the app using your Facebook account, you’ll be greeted with a feed of updates from Groups you’ve already joined. You can also search for and join new groups based on your interests.
What makes the app a bit more Reddit-like is that you can publish comments or posts under a nickname. Note that everything you share on Forum will also be visible to Group members via Facebook.
There’s also an AI-powered Ask tab for quick answers from groups across Forum. It is the second option from the right on the bottom navigation bar, and you can tap it to seek the AI's opinions and recommendations.
Similar to querying on chatbot apps, you can drop a question into Ask, and it will pull up curated responses based on comments made by “real people” across Facebook Groups. It will also let you join those groups.
Group Admins get an additional AI feature: an AI assistant. It can help them “manage groups, moderate content, and keep their groups healthy,” Meta says.
For now, the app and its features may not be available in all regions. “We test lots of new products publicly to see what people find interesting and useful to their experiences across our apps,” a company spokesperson tells Navarra. The analyst has also shared videos and screenshots of the app interface on X and Threads.
Choosing the Right Graph
RDF/OWL graphs excel for formal, interoperable knowledge and reasoning, while labeled property graphs are better for fast traversal and developer-friendly analytics, although RDF 1.2 is closing the feature gap.
Decoder
- RDF (Resource Description Framework): A W3C standard for describing information as a graph of subject-predicate-object triples. It's a foundation for the Semantic Web.
- OWL (Web Ontology Language): A W3C standard designed for representing rich and complex knowledge about things, groups of things, and relations between things. It builds on RDF and adds capabilities for defining classes, properties, and constraints.
- Labeled Property Graph (LPG): A graph model where nodes and relationships (edges) can have properties (key-value pairs) and labels, making it flexible for many applications, popular in systems like Neo4j.
- Provenance: Information concerning the origin and history of a piece of data or an object, including where it came from and how it was created, processed, and delivered.
- Linked Data: A method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds on standard web technologies like HTTP and RDF.
- Statement Annotations: A feature in RDF 1.2 that allows adding metadata to triples (statements), similar to how properties are added to edges in labeled property graphs, narrowing the functional gap between the two models.
Original article
RDF/OWL is better for governed, interoperable knowledge with formal meaning, reasoning, provenance, and linked-data publishing. Labeled property graphs are better for fast traversal, rich edge properties, and developer-friendly graph analytics, though RDF 1.2 narrows the gap with native statement annotations.
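The difference between the two models can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not RDF 1.2 syntax or any graph database's API: the `ex:` identifiers, the `WORKS_FOR` label, the `since` property, and the helper functions `objects` and `neighbors` are all hypothetical. It represents one fact as triples and as a labeled property graph edge, then models a statement annotation loosely as metadata keyed by the triple itself.

```python
# RDF-style: every fact is a (subject, predicate, object) triple.
triples = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "rdf:type", "ex:Person"),
}

# Labeled property graph: nodes and edges carry labels and key-value properties.
nodes = {
    "alice": {"labels": {"Person"}, "props": {"name": "Alice"}},
    "acme": {"labels": {"Company"}, "props": {"name": "Acme"}},
}
edges = [
    {"src": "alice", "dst": "acme", "label": "WORKS_FOR", "props": {"since": 2021}},
]

# Statement annotation, modeled loosely: metadata attached to a whole triple,
# playing the same role as the edge property above.
annotations = {
    ("ex:alice", "ex:worksFor", "ex:acme"): {"ex:since": 2021},
}


def objects(subject, predicate):
    """Triple-pattern match: all objects for a given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def neighbors(src, label):
    """Property-graph traversal: follow outgoing edges with a given label."""
    return [e["dst"] for e in edges if e["src"] == src and e["label"] == label]


print(objects("ex:alice", "ex:worksFor"))
print(neighbors("alice", "WORKS_FOR"))
print(annotations[("ex:alice", "ex:worksFor", "ex:acme")])
```

In the property graph the `since` value lives directly on the edge. In the triple model it needs a way to refer to the statement itself, which is the gap that statement annotations are described as narrowing.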
Of Hammers and Nails: What AI Can and Cannot Do for a Data Analyst
AI assists data analysts in coding and data prep, but its inconsistency means human judgment, clean data, and deep context remain essential for reliable, trustworthy analytical insights.
Original article
AI helps data analysts write code, prep data, and draft analysis faster, but it is still too inconsistent for trusted ad hoc answers. Good analysis still needs clean data, context, judgment, and human knowledge.
Staff Designers Aren't About Shipping the Best Work. That's the Point
Staff designers provide direction and enable team output rather than creating individual designs, a challenging shift from senior individual contributor roles.
Original article
Staff designers create value through direction rather than individual output, focusing on setting design priorities, quality standards, and system consistency while coaching others. The transition from senior to staff requires moving away from personally solving the hardest design problems and instead enabling the team to ship better work collectively. Many strong senior designers struggle with this shift because they must outgrow the individual contributor skills that made them successful.
What we lost in the AI chat stream
AI chat tools hinder critical thinking by trapping useful insights in long, unreviewed streams, reducing deliberate problem framing and reflection.
Original article
AI chat tools are powerful for brainstorming and refining ideas, but chat histories are poor at preserving meaningful work because they trap useful insights inside long streams of iterative back-and-forth that people rarely revisit. Relying too heavily on AI can reduce critical thinking and problem framing, especially when users skip the deliberate sketching and reflection that traditionally helped shape ideas. AI works best as a production tool after humans have already clarified the problem, structure, and intent themselves.
Frontier AI for Motion Design (Website)
Motion.so launched an AI motion graphics studio, promising to generate and iterate on designs in minutes.
Original article
Motion is an AI motion graphics studio to create and iterate on graphics in minutes.
Technical readiness and creative bravery: Instrument agency's formula for leading the charge in design
Design and technology company Instrument, founded in 2005, leverages AI for rapid prototyping and automation while emphasizing that human creativity, taste, and original perspective remain irreplaceable.
Original article
Full article content is not available for inline reading.
Why Workplace Design is Becoming Central to Business Performance
Modern workplace design has become a strategic business asset, moving beyond aesthetics to actively shape employee experience, collaboration, and organizational performance, especially in hybrid work models.
Deep dive
- Workplace design is no longer an afterthought but a strategic business advantage, directly influencing collaboration, focus, and organizational performance.
- Modern offices must adapt to fluid work patterns, accommodating deep individual tasks, group brainstorming, virtual meetings, and informal interactions.
- Key elements of modern design include true flexibility, spaces that spark creativity, effortless technology integration, comfort, and operational flow.
- Employee experience is a top design priority, as the physical environment signals how much an organization values its people, impacting engagement, morale, and retention.
- Hybrid work has transformed the office into a vital hub for human connection, brainstorming, and innovation, requiring adaptable zones for collaboration, focus, and casual interactions.
- Technology integration is crucial for seamless hybrid meetings, cloud collaboration, and flexible connectivity, with designers building adaptability into core projects.
- Progressive companies view design as integral to organizational strategy, evaluating how environments drive productivity, nurture innovation, and reinforce positive culture.
- Flexibility is essential for responding to changing team sizes, technologies, and work preferences, while sustainability focuses on energy use, waste reduction, and healthier materials.
- C-suite leaders are now directly engaged in workplace strategy discussions, recognizing its influence on team collaboration, innovation, and market adaptability.
- Future workplaces will balance flexibility, collaboration, technology, well-being, and operational efficiency, serving as strategic tools for talent attraction and cultural strength.
Original article
Full article content is not available for inline reading.
Apple's anniversary edition iPhone leaks in dreamy renders and I can't wait for its 2027 debut
Apple is reportedly developing a radically redesigned "iPhone XX" for 2027, featuring a heavily curved glass display and solid-state buttons.
Original article
Apple is reportedly developing a radically redesigned anniversary iPhone for 2027, unofficially dubbed the “iPhone XX” or “iPhone 20,” featuring a heavily curved glass display, rounded chassis, and a more futuristic look than current models. Rumors also point to under-display Face ID, a smaller camera cutout, thinner display tech, solid-state buttons, and upgraded camera sensors, though some prototypes oddly show only two rear cameras, suggesting it may be a special-edition device rather than a standard Pro model.