DEVOURED

deepseek v4 pro 75 percent price cut permanent

AI · llm, pricing, competition · The Next Web

DeepSeek has permanently slashed prices for its V4 Pro LLM by 75%, making it far cheaper on output tokens than the flagship models from OpenAI and Anthropic.

What: DeepSeek, a Chinese AI startup, has made permanent a 75% price cut on its V4 Pro model, bringing output pricing down to $0.87 per million tokens from $3.48. That undercuts OpenAI's GPT-5 ($10/M output) and Anthropic's Claude Opus 4.7 ($25/M output) by a wide margin. Google's cost-optimised Gemini 3.5 Flash is still cheaper on output, at $0.60/M. The V4 Pro also supports a one-million-token context window.

Why it matters: This aggressive pricing signals a focus on market share over per-unit revenue and accelerates the commoditization trend in the LLM industry, potentially forcing Western competitors to lower prices further and bifurcating the market.

Takeaway: CTOs evaluating LLMs for long-context applications should reconsider DeepSeek V4 Pro for its cost savings, weighing it against geopolitical concerns and unresolved IP allegations from Anthropic.

Deep dive

DeepSeek has made V4 Pro's 75% price cut permanent; it was originally a promotion scheduled to expire on 31 May.
New output token pricing is $0.87 per million, down from $3.48.
This is far cheaper than GPT-5 ($10/M output) and Claude Opus 4.7 ($25/M output), though still above Gemini 3.5 Flash ($0.60/M output).
The model supports a one-million-token context window, well suited to long-document processing.
The article highlights an unresolved accusation from Anthropic that DeepSeek used “distillation attacks” (improperly trained on Claude's responses).
This move pressures Anthropic's revenue-per-token economics and valuation trajectory.
It intensifies the existing trend toward LLM price commoditization, seen in Google's repeated Gemini price cuts and OpenAI's shift toward consumer features.
Enterprise buyers face a dilemma: significant cost savings versus geopolitical risks and IP provenance concerns associated with a Chinese provider.

Decoder

Distillation attack: A method where one AI model is trained to mimic the outputs of another, often larger or more capable, model, potentially leveraging its intellectual property.

Original article

TL;DR

DeepSeek permanently cut V4 Pro prices by 75%, to $0.87 per million output tokens. It undercuts GPT-5, Gemini, and Claude.

DeepSeek has made permanent the 75% price discount on its flagship V4 Pro model. The promotion was originally scheduled to expire on 31 May. The Chinese AI startup’s pricing now ranges from $0.003625 to $0.87 per million tokens, down from $0.0145 to $3.48.

The price points are striking in context. OpenAI’s GPT-5 charges $2.50 per million input tokens and $10 per million output tokens. Anthropic’s Claude Opus 4.7 is priced at $5 input and $25 output.

Google’s Gemini 3.5 Flash, its cost-optimised model, charges $0.15 input and $0.60 output per million tokens. DeepSeek V4 Pro’s new permanent pricing sits below all of them. The gap is widest against the frontier reasoning models that enterprise customers rely on for demanding workloads.

The decision to lock in the discount one month after launching the V4 models suggests DeepSeek is prioritising market share over per-unit revenue. The company described V4 as welcoming the “era of cost-effective 1M context length.” It is positioning its models as the default for applications that process large documents, codebases, or conversational histories where token costs compound fast.

For enterprise accounts consuming millions of tokens daily, the savings are material. Salesforce projects $300 million in Anthropic token spending this year. At DeepSeek’s new pricing, an equivalent volume would cost a fraction of that figure.
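The arithmetic behind these comparisons is simple enough to sketch. The snippet below uses only the output-token prices quoted in this article. It first checks that both ends of the quoted range are consistent with a 75% cut. It then estimates what a spend made at Claude Opus 4.7's output rate would cost at V4 Pro's new rate. Two inputs are assumptions. The 1-billion-token daily volume is hypothetical. Treating the whole Salesforce figure as Opus 4.7 output tokens is a deliberate simplification, because real bills mix input and output tokens and often several models.

```python
# Output-token prices in USD per million tokens, as quoted in the article.
OUTPUT_PRICE_PER_M = {
    "DeepSeek V4 Pro (old)": 3.48,
    "DeepSeek V4 Pro (new)": 0.87,
    "Gemini 3.5 Flash": 0.60,
    "GPT-5": 10.00,
    "Claude Opus 4.7": 25.00,
}


def discounted(price: float, cut: float) -> float:
    """Price after a fractional cut, e.g. cut=0.75 for 75% off."""
    return price * (1 - cut)


def output_cost(million_tokens: float, price_per_m: float) -> float:
    """Cost in USD for a volume of output tokens, given in millions."""
    return million_tokens * price_per_m


def equivalent_spend(spend_usd: float, from_price: float, to_price: float) -> float:
    """What the token volume bought with spend_usd at from_price would cost at to_price."""
    volume_m = spend_usd / from_price
    return volume_m * to_price


if __name__ == "__main__":
    # Both ends of the quoted range are consistent with a 75% cut.
    print(f"{discounted(3.48, 0.75):.2f}")      # top of the range
    print(f"{discounted(0.0145, 0.75):.6f}")    # bottom of the range

    daily_volume_m = 1_000  # hypothetical: 1 billion output tokens per day
    for name, price in sorted(OUTPUT_PRICE_PER_M.items(), key=lambda kv: kv[1]):
        print(f"{name:24s} ${output_cost(daily_volume_m, price):>10,.2f} per day")

    # Simplification: treat the $300M Salesforce figure as Opus 4.7 output tokens only.
    print(f"${equivalent_spend(300e6, 25.00, 0.87):,.0f}")
```

On those assumptions, the $300 million shrinks to roughly $10.4 million, about 3.5% of the original. Actual savings depend on the input/output mix and on which workloads really move. Sorting by price also shows that Gemini 3.5 Flash's quoted output rate sits below V4 Pro's.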

The question for enterprise buyers is whether DeepSeek’s model quality, reliability, and compliance posture justify the switch. The price advantage may be offset by the geopolitical and technical risks of routing sensitive workloads through a Chinese AI provider. That calculus varies by industry and by the sensitivity of the data involved.

The competitive dynamics are complicated by Anthropic’s public accusation that DeepSeek has engaged in “distillation attacks.” The allegation is that DeepSeek improperly trained on Claude’s responses to improve its own models. DeepSeek has not publicly addressed the accusation in detail.

If substantiated, it would mean that some of DeepSeek’s capability advantage was built on Anthropic’s research investment. The price differential would then reflect intellectual property arbitrage rather than engineering efficiency. The accusation remains unresolved.

Anthropic’s annualised revenue surged from $9 billion to $30 billion between the end of 2025 and early April 2026. That growth was driven largely by enterprise adoption of Claude Code. DeepSeek’s pricing pressure threatens the revenue-per-token economics that support Anthropic’s valuation trajectory.

If enterprise customers begin routing lower-complexity tasks to DeepSeek while reserving Claude for high-stakes reasoning, Anthropic’s token volume could hold while revenue per token declines. The broader AI pricing landscape has been moving toward commoditisation throughout 2026. Google has repeatedly cut Gemini prices to compete with open-weight models.

OpenAI’s pivot toward consumer platform features, including personal finance tools and advertising, reflects a recognition that API token revenue alone may not sustain its $852 billion valuation. DeepSeek’s permanent price cut accelerates a trend that was already compressing margins across the industry. The era of high-margin AI tokens may be ending faster than anyone expected.

DeepSeek V4 Pro supports a one-million-token context window at the new pricing. That makes it competitive for document analysis, legal review, and codebase comprehension. These are the long-context applications where input cost is the binding constraint on adoption.

The combination of frontier-adjacent capability and radically lower pricing creates a genuine dilemma for CTOs. The cheapest option is also the one with the most geopolitical complexity. It has the least transparency about training data provenance and an unresolved IP accusation from one of its most capable competitors.

DeepSeek’s strategy appears to be that price will win. Enough volume will flow to the cheapest capable model regardless of origin. The geopolitical concerns that constrain adoption in government and regulated industries will not prevent adoption in the broader market.

Whether that bet is correct depends on whether Western AI companies can close the price gap before DeepSeek closes the capability gap. The alternative is that the market bifurcates into a Western tier and a Chinese tier with fundamentally different economics. DeepSeek just made sure the gap between them got wider.


DEVOURED

The 2026-07-28 MCP Specification Release Candidate

AI · backend, api, agents, protocol · Model Context Protocol Blog

The Model Context Protocol (MCP) is releasing a major specification update on July 28, 2026, introducing a stateless core, an extensions framework, and hardened authorization.

What: The release candidate for MCP 2026-07-28 is now available, the protocol's largest revision since launch. Key changes include:

a new stateless protocol core that removes session IDs and handshakes, enabling easier scaling on HTTP infrastructure
a formal extensions framework, including MCP Apps for server-rendered UIs and the Tasks extension for long-running work
authorization hardening aligned with OAuth and OpenID Connect

It also adds a formal deprecation policy and lifts tool schemas to full JSON Schema 2020-12. This release contains breaking changes.

Why it matters: This revision aims to make MCP more scalable and robust for real-world deployments by aligning it with common HTTP and authorization patterns, while also providing a structured way for the protocol to evolve through extensions and a clear deprecation policy.

Takeaway: If you are developing or deploying systems using MCP, review the 2026-07-28 release candidate immediately to prepare for breaking changes and leverage new features like the stateless core and extensions framework before the final specification ships on July 28, 2026.

Deep dive

The MCP 2026-07-28 release candidate is the largest revision since the protocol's launch.
The core protocol is now stateless, eliminating the need for session IDs and handshakes, allowing scaling with plain round-robin load balancers.
Protocol version, client info, and capabilities now travel in _meta on every request.
A formal Extensions framework is introduced, allowing capabilities like MCP Apps (server-rendered UIs in iframes) and the Tasks extension (for long-running operations) to evolve independently.
Authorization is hardened to align more closely with OAuth 2.0 and OpenID Connect best practices, including validation of the iss parameter and improvements to client registration.
Three core features (Roots, Sampling, Logging) are deprecated, with replacements suggested.
Tool inputSchema and outputSchema now support full JSON Schema 2020-12, including composition and references.
The error code for a missing resource changes from -32002 to the JSON-RPC standard -32602.
A new feature lifecycle policy ensures at least twelve months between deprecation and removal for future changes.
The final specification is scheduled for publication on July 28, 2026.

Decoder

Model Context Protocol (MCP): A protocol designed for AI agents and models to interact with tools and services, enabling structured communication and task execution.
Stateless protocol: A communication protocol where each request from client to server contains all the information needed to understand the request, and the server does not store any session information about the client.
JSON Schema 2020-12: A standard for describing the structure of JSON data, allowing for validation and documentation of JSON objects.

Original article

The release candidate for MCP 2026-07-28 is now available. It is the largest revision of the protocol since launch and delivers on the 2026 roadmap:

a stateless core that scales on ordinary HTTP infrastructure
extensions including server-rendered UIs through MCP Apps and long-running work through the Tasks extension
authorization that aligns more closely with OAuth and OpenID Connect deployments
a formal deprecation policy so the protocol can evolve without breaking what you’ve built

It also includes many other changes.

The practical effect on a production deployment is immediate. A remote MCP server that previously needed sticky sessions, a shared session store, and deep packet inspection at the gateway can now run behind a plain round-robin load balancer, route traffic on an Mcp-Method header, and let clients cache tools/list responses for as long as the server’s ttlMs permits.

The release candidate is available today and the final specification ships on July 28, 2026. This release contains breaking changes; see Release Timeline and Validation for the details.

A Stateless Protocol

The headline change is that MCP is now stateless at the protocol layer. Six Specification Enhancement Proposals (SEPs) work together to get there, completing the plan we laid out in The Future of MCP Transports in December.

Before: a client routed through a load balancer with a sticky route to one MCP server instance, all instances sharing a session store. After: the same client routed to any of three MCP server instances with no session store.

Before and after

Under the 2025-11-25 specification, calling a tool over Streamable HTTP means establishing a session first:

POST /mcp HTTP/1.1
Content-Type: application/json

{"jsonrpc":"2.0","id":1,"method":"initialize",
 "params":{"protocolVersion":"2025-11-25","capabilities":{},
           "clientInfo":{"name":"my-app","version":"1.0"}}}

The server responds with an Mcp-Session-Id that every subsequent request must carry, pinning the client to whichever instance issued it:

POST /mcp HTTP/1.1
Mcp-Session-Id: 1868a90c-3a3f-4f5b
Content-Type: application/json

{"jsonrpc":"2.0","id":2,"method":"tools/call",
 "params":{"name":"search","arguments":{"q":"otters"}}}

Under 2026-07-28, the same call is a single self-contained request that any server instance can handle:

POST /mcp HTTP/1.1
MCP-Protocol-Version: 2026-07-28
Mcp-Method: tools/call
Mcp-Name: search
Content-Type: application/json

{"jsonrpc":"2.0","id":1,"method":"tools/call",
 "params":{"name":"search","arguments":{"q":"otters"},
           "_meta":{"io.modelcontextprotocol/clientInfo":{"name":"my-app","version":"1.0"}}}}

The handshake and session are gone

The initialize/initialized handshake is removed (SEP-2575). The protocol version, client info, and client capabilities that used to be exchanged once at connection time now travel in _meta on every request, and a new server/discover method lets clients fetch server capabilities when they need them up front.

The Mcp-Session-Id header and the protocol-level session that came with it are also removed (SEP-2567). With both gone, any MCP request can land on any server instance, and the sticky routing and shared session stores that horizontal deployments needed before are no longer required at the protocol layer.
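The sketch below builds that self-contained tools/call request on the client side. It relies only on what the 2026-07-28 example shows: the MCP-Protocol-Version, Mcp-Method, and Mcp-Name headers, plus clientInfo under the io.modelcontextprotocol/clientInfo _meta key. The helper name is hypothetical. Other _meta entries the spec may require, such as client capabilities, are left out because their exact key names are not given here.

```python
import json

PROTOCOL_VERSION = "2026-07-28"


def build_tool_call(request_id: int, tool: str, arguments: dict,
                    client_name: str, client_version: str) -> tuple[dict, str]:
    """Return (headers, body) for a stateless tools/call over Streamable HTTP.

    There is no Mcp-Session-Id and no prior initialize call: everything the
    server needs travels in this one request.
    """
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            "_meta": {
                "io.modelcontextprotocol/clientInfo": {
                    "name": client_name,
                    "version": client_version,
                },
            },
        },
    }
    headers = {
        "MCP-Protocol-Version": PROTOCOL_VERSION,
        # Routing headers mirror the body so gateways need not parse JSON.
        "Mcp-Method": body["method"],
        "Mcp-Name": tool,
        "Content-Type": "application/json",
    }
    return headers, json.dumps(body)


if __name__ == "__main__":
    headers, payload = build_tool_call(1, "search", {"q": "otters"}, "my-app", "1.0")
    for key, value in headers.items():
        print(f"{key}: {value}")
    print(payload)
```

Because the function closes over no connection or session state, any retry or parallel call can be sent to whichever instance the load balancer picks.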

Stateless protocol, stateful applications

Removing the protocol-level session does not mean your application has to be stateless. Servers that need to carry state across calls can do what HTTP APIs have always done: mint an explicit handle (a basket_id, a browser_id) from a tool and have the model pass it back as an ordinary argument on later calls.

Sequence diagram: the model calls create_basket, the MCP server returns a basket_id, and the model passes that same basket_id back as an argument to add_item.

In practice, we’ve found this pattern (the model threading an identifier from one tool call to the next) to be more than just a workable substitute for session state. It’s often a more powerful one. The model can compose handles across tools, reason about them, and hand them off between steps in ways that externally managed session state, hidden in transport metadata, never really allowed.

The protocol no longer manages that state for you, but it doesn’t prevent you from managing it yourself. The explicit-handle pattern simply makes the state visible to the model rather than hidden away.
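Here is a minimal sketch of the explicit-handle pattern, modeled on the create_basket/add_item sequence diagram above. The two tool names come from that diagram. The sku and quantity arguments and the storage are assumptions. The in-memory dict stands in for a shared store, such as a database, that every server instance can reach. With several instances behind a round-robin load balancer, a per-process dict would not be enough.

```python
import uuid

# Stand-in for a store shared by all server instances (e.g. a database).
# A per-process dict is only adequate for a single-instance demo.
BASKETS: dict[str, list[dict]] = {}


def create_basket() -> dict:
    """Tool: mint an explicit handle the model can pass back later."""
    basket_id = f"basket_{uuid.uuid4().hex[:12]}"
    BASKETS[basket_id] = []
    return {"basket_id": basket_id}


def add_item(basket_id: str, sku: str, quantity: int = 1) -> dict:
    """Tool: the handle arrives as an ordinary argument, not a session header."""
    if basket_id not in BASKETS:
        raise KeyError(f"unknown basket_id: {basket_id}")
    BASKETS[basket_id].append({"sku": sku, "quantity": quantity})
    return {"basket_id": basket_id, "items": len(BASKETS[basket_id])}


if __name__ == "__main__":
    handle = create_basket()["basket_id"]       # first tool call
    print(add_item(handle, "OTTER-PLUSH"))       # model threads the handle back
    print(add_item(handle, "KELP-SNACK", 3))
```

The handle is visible in the tool's result, so the model can reason about it, hand it to another tool, or start a second basket, none of which a hidden session ID would allow.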

Server-to-client requests, restructured

A stateless protocol still needs a way for servers to ask the client for something mid-call, such as an elicitation prompt. Two SEPs rebuild that flow so it works without a persistent connection.

Server-initiated requests may now only be issued while the server is actively processing a client request (SEP-2260). Earlier spec versions recommended this; it’s now required. A user is never prompted out of nowhere, and every elicitation traces back to something they (or their agent) started.

Multi Round-Trip Requests (SEP-2322) change how those prompts are delivered. Instead of holding a Server-Sent Events (SSE) stream open, the server returns an InputRequiredResult:

{
  "resultType": "inputRequired",
  "inputRequests": {
    "confirm": {
      "type": "elicitation",
      "message": "Delete 3 files?",
      "schema": { "type": "boolean" }
    }
  },
  "requestState": "eyJzdGVwIjoxLCJmaWxlcyI6WyJhIiwiYiIsImMiXX0="
}

The client gathers the answers and re-issues the original call with inputResponses and the echoed requestState. Any server instance can pick that retry up because everything it needs is in the payload.
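Below is a server-side sketch of that flow, built around the "Delete 3 files?" example. The resultType, inputRequests, requestState, and inputResponses names come from the payload above. Several details are assumptions:

- that the retry carries inputResponses and requestState inside params
- the shape of each response value
- the hypothetical delete_files tool
- the HMAC signing

Signing matters because requestState round-trips through the client. The example's requestState is plain base64-encoded JSON ({"step":1,"files":["a","b","c"]}), so a server that trusted it unsigned could let a client rewrite the file list.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"hypothetical-shared-secret"  # assumption: shared by all server instances


def encode_state(state: dict) -> str:
    """Serialize state with an HMAC prefix so any instance can verify the retry."""
    raw = json.dumps(state, separators=(",", ":")).encode()
    tag = hmac.new(SECRET, raw, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + raw).decode()


def decode_state(token: str) -> dict:
    blob = base64.urlsafe_b64decode(token.encode())
    tag, raw = blob[:32], blob[32:]
    expected = hmac.new(SECRET, raw, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("requestState failed integrity check")
    return json.loads(raw)


def delete_files(params: dict) -> dict:
    """Handle tools/call for a hypothetical delete_files tool without server-side state."""
    responses = params.get("inputResponses")
    if responses is None:
        # First round: ask for confirmation instead of holding an SSE stream open.
        files = params["arguments"]["files"]
        return {
            "resultType": "inputRequired",
            "inputRequests": {
                "confirm": {
                    "type": "elicitation",
                    "message": f"Delete {len(files)} files?",
                    "schema": {"type": "boolean"},
                }
            },
            "requestState": encode_state({"step": 1, "files": files}),
        }
    # Retry: everything needed is in the payload, so any instance can finish it.
    state = decode_state(params["requestState"])
    if responses.get("confirm") is True:
        return {"content": [{"type": "text", "text": f"Deleted {state['files']}"}]}
    return {"content": [{"type": "text", "text": "Cancelled"}]}


if __name__ == "__main__":
    first = delete_files({"arguments": {"files": ["a", "b", "c"]}})
    print(first["inputRequests"]["confirm"]["message"])  # Delete 3 files?
    retry = {"arguments": {"files": ["a", "b", "c"]},
             "inputResponses": {"confirm": True},
             "requestState": first["requestState"]}
    print(delete_files(retry)["content"][0]["text"])
```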

Routable, cacheable, traceable

Three smaller changes make the resulting traffic easier to operate.

The Streamable HTTP transport now requires Mcp-Method and Mcp-Name headers (SEP-2243) so load balancers, gateways, and rate-limiters can route on the operation without inspecting the body. Servers reject requests where the headers and body disagree.
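A server or gateway might apply a consistency check like the sketch below. The post does not spell out which body field Mcp-Name mirrors for methods other than tools/call. This version therefore assumes params.name and checks Mcp-Name only when the body has one. The function name and error messages are hypothetical.

```python
def check_routing_headers(headers: dict, body: dict) -> None:
    """Raise ValueError if Mcp-Method / Mcp-Name disagree with the JSON-RPC body."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    method = normalized.get("mcp-method")
    if method is None:
        raise ValueError("missing Mcp-Method header")
    if method != body.get("method"):
        raise ValueError(f"Mcp-Method {method!r} != body method {body.get('method')!r}")
    # Assumption: Mcp-Name mirrors params.name when the body carries one.
    name_in_body = body.get("params", {}).get("name")
    if name_in_body is not None and normalized.get("mcp-name") != name_in_body:
        raise ValueError("Mcp-Name header does not match params.name")


if __name__ == "__main__":
    body = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "search", "arguments": {"q": "otters"}}}
    check_routing_headers({"Mcp-Method": "tools/call", "Mcp-Name": "search"}, body)
    try:
        check_routing_headers({"Mcp-Method": "tools/call", "Mcp-Name": "delete"}, body)
    except ValueError as err:
        print(f"rejected: {err}")
```

Rejecting mismatches matters because a rate-limiter that trusts the header while the server executes the body would otherwise be trivially bypassed.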

List and resource read results now carry ttlMs and cacheScope (SEP-2549), modeled on HTTP Cache-Control. Clients know exactly how long a tools/list response is fresh and whether it’s safe to share across users, and a long-lived SSE stream is no longer the only way to learn that a list changed.
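A client-side cache that honors those two fields might look like the sketch below. The field names ttlMs and cacheScope come from SEP-2549 as described above. The post does not list the real cacheScope values, so "shared" is a placeholder for "safe to share across users", and any other value is treated as per-user. The injectable clock exists only to make the sketch testable.

```python
import time
from typing import Any, Callable, Optional

# Assumption: placeholder for the cacheScope value meaning "shareable across users".
SHARED_SCOPES = {"shared"}


class ListCache:
    """Cache list/read results for as long as the server's ttlMs allows."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[tuple, tuple[float, Any]] = {}

    def _key(self, method: str, user: str, scope: str) -> tuple:
        # Shared entries are keyed without a user so every user can hit them.
        return (method, None if scope in SHARED_SCOPES else user)

    def put(self, method: str, user: str, result: dict) -> None:
        ttl_ms = result.get("ttlMs")
        if not ttl_ms:
            return  # no (or zero) freshness window: don't cache
        key = self._key(method, user, result.get("cacheScope", "user"))
        self._entries[key] = (self._clock() + ttl_ms / 1000.0, result)

    def get(self, method: str, user: str) -> Optional[dict]:
        # Prefer a per-user entry, then fall back to a shared one.
        for key in ((method, user), (method, None)):
            hit = self._entries.get(key)
            if hit and hit[0] > self._clock():
                return hit[1]
        return None


if __name__ == "__main__":
    now = [0.0]
    cache = ListCache(clock=lambda: now[0])
    cache.put("tools/list", "alice", {"tools": [], "ttlMs": 60_000, "cacheScope": "shared"})
    print(cache.get("tools/list", "bob") is not None)  # True: shared and still fresh
    now[0] = 61.0
    print(cache.get("tools/list", "bob"))               # None: expired
```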

W3C Trace Context propagation in _meta is now documented (SEP-414), locking down the traceparent, tracestate, and baggage key names so distributed traces correlate across SDKs and gateways. Several SDKs and tools were already doing this; with the key names fixed in the spec, a trace that starts in a host application can follow a tool call through the client SDK, the MCP server, and whatever the server calls downstream, and show up as a single span tree in an OpenTelemetry-compatible backend.
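For reference, a W3C traceparent value has the form version-traceid-parentid-flags, in lowercase hex of 2, 32, 16, and 2 characters. The sketch below either continues an incoming trace or starts a new one for an outgoing tool call, then places the result in _meta. The post confirms the key names are fixed but does not quote them, so using the bare key "traceparent" is an assumption. The helper names are hypothetical.

```python
import re
import secrets
from typing import Optional

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(parent: Optional[str] = None, sampled: bool = True) -> str:
    """Return a traceparent for a new span, continuing `parent`'s trace if it is valid."""
    match = TRACEPARENT_RE.match(parent) if parent else None
    if match and match.group(1) != "0" * 32:
        trace_id, flags = match.group(1), match.group(3)
    else:
        trace_id, flags = secrets.token_hex(16), ("01" if sampled else "00")
    span_id = secrets.token_hex(8)  # fresh parent-id for this hop
    return f"00-{trace_id}-{span_id}-{flags}"


def tool_call_params(name: str, arguments: dict,
                     incoming_traceparent: Optional[str] = None) -> dict:
    """tools/call params with trace context in _meta (key name assumed)."""
    return {"name": name, "arguments": arguments,
            "_meta": {"traceparent": child_traceparent(incoming_traceparent)}}


if __name__ == "__main__":
    upstream = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(tool_call_params("search", {"q": "otters"}, upstream)["_meta"])
```

Keeping the trace-id while minting a new parent-id on each hop is what lets a backend stitch the host, client SDK, MCP server, and downstream calls into one span tree.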

Extensions Become First-Class

Extensions existed in the 2025-11-25 release but had no formal process behind them. SEP-2133 adds that: extensions are identified by reverse-DNS IDs, negotiated through an extensions map on client and server capabilities, live in their own ext-* repositories with delegated maintainers, and version independently of the specification. A new Extensions Track in the SEP process gives them a path from experimental to official.
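The sketch below shows one way negotiation over those extensions maps could work: each side advertises reverse-DNS IDs, and an extension is active only when both sides advertise it. In the stateless core, the client's map would presumably travel with its capabilities in _meta, and the server's would come from server/discover. Three details are assumptions for illustration: that the map values are per-extension settings objects, the example IDs, and the function name.

```python
def negotiate_extensions(client_caps: dict, server_caps: dict) -> dict[str, tuple[dict, dict]]:
    """Extensions both sides advertise, mapped to (client_settings, server_settings)."""
    client_ext = client_caps.get("extensions", {})
    server_ext = server_caps.get("extensions", {})
    shared = sorted(client_ext.keys() & server_ext.keys())
    return {ext_id: (client_ext[ext_id], server_ext[ext_id]) for ext_id in shared}


if __name__ == "__main__":
    # Hypothetical reverse-DNS IDs; real official IDs are defined by each extension.
    client = {"extensions": {"com.example.tasks": {}, "com.example.apps": {"iframe": True}}}
    server = {"extensions": {"com.example.tasks": {"maxConcurrent": 4}}}
    print(negotiate_extensions(client, server))
    # {'com.example.tasks': ({}, {'maxConcurrent': 4})}
```

An extension that only one side knows about is simply ignored, which is what lets extensions version independently of the core specification.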

This release includes two official extensions.

MCP Apps: server-rendered user interfaces

MCP Apps (SEP-1865) lets servers ship interactive HTML interfaces that hosts render in a sandboxed iframe. Tools declare their UI templates ahead of time so hosts can prefetch, cache, and security-review them before anything runs. The rendered UI talks back to the host over the same JSON-RPC base protocol used everywhere else in MCP, so every UI-initiated action goes through the same audit and consent path as a direct tool call.

Tasks graduates to an extension

Tasks shipped as an experimental core feature in 2025-11-25. Production use showed it needed enough redesign that its right home is an extension rather than the core specification.

The Tasks extension reshapes the lifecycle around the stateless model: a server can answer tools/call with a task handle, and the client drives it with tasks/get, tasks/update, and tasks/cancel. Task creation is server-directed: the client advertises the extension and the server decides when a call should run as a task. tasks/list is removed because it can’t be scoped safely without sessions.

Anyone who shipped against the 2025-11-25 experimental Tasks API will need to migrate to the new lifecycle.
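Here is a client-side sketch of driving the Tasks lifecycle. Only the method names tools/call, tasks/get, and tasks/cancel come from the post. Everything else is a hypothetical placeholder: the task-handle field ("task" with a "taskId"), the status values, the params shape, and the send() transport. Treat it as the shape of the loop rather than the wire format.

```python
import time
from typing import Callable

# Hypothetical status values; the real ones are defined by the Tasks extension.
TERMINAL = {"completed", "failed", "cancelled"}


def call_tool(send: Callable[[str, dict], dict], name: str, arguments: dict,
              poll_s: float = 1.0, timeout_s: float = 60.0) -> dict:
    """Call a tool; if the server answers with a task handle, poll until it finishes."""
    result = send("tools/call", {"name": name, "arguments": arguments})
    task = result.get("task")  # assumption: where the handle appears
    if task is None:
        return result          # the server chose to answer synchronously
    task_id = task["taskId"]
    deadline = time.monotonic() + timeout_s
    while True:
        status = send("tasks/get", {"taskId": task_id})
        if status["status"] in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            send("tasks/cancel", {"taskId": task_id})
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        time.sleep(poll_s)


if __name__ == "__main__":
    polls = iter(["working", "working", "completed"])

    def fake_send(method: str, params: dict) -> dict:
        if method == "tools/call":
            return {"task": {"taskId": "t-1"}}
        if method == "tasks/get":
            return {"taskId": params["taskId"], "status": next(polls)}
        return {}

    print(call_tool(fake_send, "render_report", {}, poll_s=0.0))
```

The early return mirrors the server-directed design: the client only advertises support, and whether a call becomes a task is the server's decision.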

Authorization Hardening

Six SEPs harden the authorization specification to align more closely with how OAuth 2.0 and OpenID Connect are deployed in practice.

Clients must now validate the iss parameter on authorization responses per RFC 9207 (SEP-2468). This is a low-cost mitigation for a class of mix-up attack that is more prevalent in MCP’s single-client, many-server deployment pattern. In a future version, clients will be expected to reject responses that omit iss, so authorization servers should begin supplying it now if they don’t already.
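The sketch below applies the RFC 9207 check to the redirect a client receives. The expected issuer is the authorization server the client started the flow with. The require_iss switch models the post's note that rejecting a missing iss is the expected future behavior. RFC 9207 already requires rejection when the server advertises authorization_response_iss_parameter_supported. The function name and URLs are hypothetical, and state checking is omitted for brevity.

```python
from urllib.parse import parse_qs, urlparse


def authorization_code_from_redirect(redirect_url: str, expected_issuer: str,
                                     require_iss: bool = False) -> str:
    """Return the code from an authorization response after validating `iss`."""
    params = parse_qs(urlparse(redirect_url).query)
    iss = params.get("iss", [None])[0]
    if iss is None:
        if require_iss:
            raise ValueError("authorization response is missing iss")
    elif iss != expected_issuer:
        # Simple string comparison against the issuer the flow started with.
        raise ValueError(f"issuer mix-up: got {iss!r}, expected {expected_issuer!r}")
    if "error" in params:
        raise ValueError(f"authorization error: {params['error'][0]}")
    return params["code"][0]


if __name__ == "__main__":
    url = ("http://localhost:8765/callback?code=abc123&state=xyz"
           "&iss=https%3A%2F%2Fauth.example.com")
    print(authorization_code_from_redirect(url, "https://auth.example.com"))
```

Checking iss before looking at the code or error means a response from the wrong server is rejected before the client sends that code to the token endpoint.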

Clients now declare their OpenID Connect application_type during Dynamic Client Registration (SEP-837), avoiding the common case where an authorization server defaults a desktop or CLI client to "web" and rejects its localhost redirect URI. Clients bind registered credentials to the issuing authorization server’s issuer and re-register when a resource migrates between authorization servers (SEP-2352). The spec also documents how to request refresh tokens from OpenID Connect-style authorization servers (SEP-2207), and clarifies scope accumulation during step-up (SEP-2350) and the .well-known discovery suffix (SEP-2351).
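The sketch below covers two of these client-side changes. First, it builds the registration metadata a CLI client might send. Second, it keeps credentials in a store keyed by issuer, so a resource that moves to a different authorization server triggers a fresh registration. The registration fields are standard: application_type with its "native" and "web" values comes from OpenID Connect registration, and client_name, redirect_uris, grant_types, and token_endpoint_auth_method come from RFC 7591. The register callable, client name, and store layout are hypothetical.

```python
from typing import Callable


def registration_metadata(redirect_port: int) -> dict:
    """Dynamic Client Registration body for a desktop/CLI client."""
    return {
        "client_name": "example-cli",  # hypothetical
        # Declaring "native" avoids the server defaulting to "web" and rejecting localhost.
        "application_type": "native",
        "redirect_uris": [f"http://localhost:{redirect_port}/callback"],
        "grant_types": ["authorization_code", "refresh_token"],
        "token_endpoint_auth_method": "none",  # public client, no secret
    }


class ClientRegistry:
    """Registered credentials, bound to the issuer that issued them."""

    def __init__(self, register: Callable[[str, dict], dict]):
        self._register = register  # hypothetical: POSTs metadata to the issuer's endpoint
        self._by_issuer: dict[str, dict] = {}

    def credentials_for(self, issuer: str, redirect_port: int) -> dict:
        # A resource now pointing at a different issuer gets a new registration
        # instead of reusing a client_id minted by another authorization server.
        if issuer not in self._by_issuer:
            self._by_issuer[issuer] = self._register(issuer, registration_metadata(redirect_port))
        return self._by_issuer[issuer]
```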

Roots, Sampling, and Logging Are Deprecated

Three core features are deprecated under the new feature lifecycle policy (SEP-2577):

Feature	Replacement
Roots	Tool parameters, resource URIs, or server configuration
Sampling	Direct integration with LLM provider APIs
Logging	`stderr` for stdio transports; OpenTelemetry for structured observability

These are annotation-only deprecations. The methods, types, and capability flags continue to work in this release and in every specification version published within a year of it, and removing any of them will require a separate SEP under the lifecycle policy.
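On stdio transports, stdout carries the newline-delimited JSON-RPC messages themselves. That is why stderr is the replacement for Logging there: anything else written to stdout corrupts the protocol stream. Below is a minimal sketch using Python's standard logging module. The logger name and messages are hypothetical.

```python
import json
import logging
import sys

# Diagnostics go to stderr; stdout is reserved for protocol messages.
log = logging.getLogger("example-mcp-server")
_handler = logging.StreamHandler(sys.stderr)
_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
log.addHandler(_handler)
log.setLevel(logging.INFO)


def send(message: dict) -> None:
    """Write one newline-delimited JSON-RPC message to stdout."""
    sys.stdout.write(json.dumps(message) + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    log.info("server starting")  # goes to stderr, never mixed into protocol traffic
    send({"jsonrpc": "2.0", "id": 1, "result": {}})
```

For structured observability beyond plain log lines, the table's other suggestion, OpenTelemetry, would replace the handler rather than the stdout discipline.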

Full JSON Schema 2020-12 for Tools

Tool inputSchema and outputSchema are lifted to full JSON Schema 2020-12 (SEP-2106). Input schemas keep the type: "object" root constraint but now allow composition (oneOf, anyOf, allOf), conditionals, and references ($ref, $defs). Output schemas are unrestricted, and structuredContent can now be any JSON value rather than only an object. Implementations must not auto-dereference external $ref URIs and should bound schema depth and validation time.
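The sketch below pairs a 2020-12 tool inputSchema that uses the newly allowed oneOf, $ref, and $defs with a pre-validation guard for the two safety rules above. The guard rejects any $ref that points outside the document and caps nesting depth. The schema, the MAX_DEPTH value, and the function names are illustrative assumptions. A full validator, such as the jsonschema package's Draft202012Validator, would still do the actual validation.

```python
MAX_DEPTH = 32  # illustrative bound on schema nesting

search_input_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",  # input schemas keep the object root
    "properties": {
        "query": {"type": "string"},
        "filter": {"oneOf": [{"$ref": "#/$defs/dateRange"}, {"$ref": "#/$defs/tag"}]},
    },
    "required": ["query"],
    "$defs": {
        "dateRange": {"type": "object",
                      "properties": {"from": {"type": "string", "format": "date"},
                                     "to": {"type": "string", "format": "date"}}},
        "tag": {"type": "string"},
    },
}


def check_schema_safety(schema: object, depth: int = 0) -> None:
    """Raise ValueError on non-local $ref or excessive nesting; never fetches anything."""
    if depth > MAX_DEPTH:
        raise ValueError(f"schema nesting exceeds {MAX_DEPTH}")
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            raise ValueError(f"external $ref not allowed: {ref}")
        for value in schema.values():
            check_schema_safety(value, depth + 1)
    elif isinstance(schema, list):
        for item in schema:
            check_schema_safety(item, depth + 1)


def check_input_schema(schema: dict) -> None:
    if schema.get("type") != "object":
        raise ValueError('input schema root must be type "object"')
    check_schema_safety(schema)
```

Rejecting external references up front is simpler than trusting every downstream validator to refuse network fetches.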

Separately, the error code for a missing resource changes from the MCP-custom -32002 to the JSON-RPC standard -32602 Invalid Params (SEP-2164). If your client matches on the literal -32002 value, update it.
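During the transition, a client may talk to servers on either revision, so one cautious approach is to accept both codes for a missing resource. Because -32602 is the generic Invalid Params code, the sketch treats it as not-found only for resources/read. That scoping and the function name are assumptions.

```python
LEGACY_RESOURCE_NOT_FOUND = -32002  # MCP-custom code, before 2026-07-28
INVALID_PARAMS = -32602             # JSON-RPC standard code, used from 2026-07-28


def is_resource_not_found(method: str, error: dict) -> bool:
    """True if a JSON-RPC error object means "resource not found" on either revision."""
    code = error.get("code")
    if code == LEGACY_RESOURCE_NOT_FOUND:
        return True
    # -32602 covers any invalid parameter, so only read it as not-found for reads.
    return code == INVALID_PARAMS and method == "resources/read"
```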

How the Protocol Evolves From Here

This release contains breaking changes. We don’t intend for that to be the norm.

Three governance SEPs in this release are designed so that future revisions can evolve the protocol without breaking core capabilities. The feature lifecycle policy gives every feature an Active, Deprecated, and Removed lifecycle with at least twelve months between deprecation and the earliest possible removal. The Extensions framework means new capabilities can ship as opt-in extensions and stabilize there before, if ever, moving into the specification. And a Standards Track SEP can no longer reach Final status until a matching scenario lands in the conformance suite (SEP-2484), which is the same suite the new SDK tier system scores official SDKs against.

The stateless rework in this release is the kind of foundational change that needed a clean break. With it landed, and with deprecation windows and extensions as the standard tools going forward, our expectation is that implementers targeting 2026-07-28 will be able to adopt future revisions without rewriting their transport or lifecycle code.

Release Timeline and Validation

The release candidate is locked as of May 21, 2026. The final specification will be published on July 28, 2026. The ten-week window is for SDK maintainers and client implementers to validate the changes against real workloads; under the SDK tier system, Tier 1 SDKs are expected to ship support within this window.

The full release candidate is in the draft specification, and the changelog will list every change against 2025-11-25.

If you find a problem, open an issue in the specification repository. For implementation questions, the relevant Working Group channel in the contributor Discord is the fastest path to an answer.

Looking Ahead

This release gives MCP the foundation we expect it to grow on for a long time: a protocol that runs statelessly on commodity HTTP infrastructure, an extensions framework where capabilities like Tasks and MCP Apps can ship on their own timeline, and a lifecycle policy that lets implementers build on 2026-07-28 knowing what they ship will keep working.

Thank you to everyone who shaped these proposals through the Working Groups and a great deal of patient review. We’re looking forward to making this final with the community on July 28.

DEVOURED

Evaluating Multi-Agent Systems at Scale

AI · agents, research, data · OpenAI

OpenAI has introduced a "macro-evaluation" workflow designed to analyze recurring behavioral patterns across entire populations of multi-agent system traces, rather than focusing on isolated failures.

What: OpenAI's new cookbook details a macro-evaluation workflow for agentic systems, which analyzes patterns across many agent runs. This approach helps identify systemic issues like late handoffs or repeated missed signals by specialist agents, using a synthetic EV order workflow as an example. The method involves running lower-level evals on individual traces, compacting traces into documents, discovering recurring patterns, and drilling down into high-impact patterns.

Why it matters: As agentic systems become more complex, traditional single-trace debugging is insufficient. This macro-evaluation approach provides a critical methodology for AI engineering teams to understand systemic failures and improve multi-agent orchestration at scale, moving beyond individual incidents to population-level insights.

Takeaway: If you are developing or managing multi-agent AI systems, implement a macro-evaluation strategy to identify and address systemic issues across your agent population rather than solely debugging individual trace failures.

Deep dive

OpenAI proposes a macro-evaluation workflow for multi-agent systems.
This approach focuses on analyzing patterns across entire populations of agent traces, not just individual failures.
It helps identify systemic problems like repeated missed signals or incorrect handoffs between specialist agents.
The workflow involves generating/collecting many traced agent runs, running lower-level evals on each, turning traces into compact documents, discovering recurring behavior patterns, and drilling into high-impact patterns.
A synthetic EV order workflow, involving specialist agents for pricing, compliance, supply, etc., serves as the example.
The notebook uses precomputed synthetic traces and saved lower-level eval labels, allowing execution without an OpenAI API key.
It distinguishes between lower-level evals (grading individual agents/actions) and macro evals (looking across many findings for patterns).
Key reader-facing labels used are case_type, run_outcome, eval_finding, and behavior_pattern.
The goal is to translate thousands of agent events into a small number of patterns understandable by both technical and business stakeholders.
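Below is a stripped-down sketch of the aggregation and drill-down steps, using the four reader-facing labels named above. It does not reproduce the cookbook's own pattern discovery. Instead, it swaps in simple counting over already-labeled traces and ranks behavior_pattern values by how many failed runs they appear in. The record layout, the "failed"/"succeeded" outcome values, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceDoc:
    """A compacted trace: one row per run, with lower-level eval labels attached."""
    case_type: str
    run_outcome: str       # assumption: "succeeded" or "failed"
    eval_finding: str
    behavior_pattern: str


def rank_patterns(docs: list[TraceDoc]) -> list[tuple[str, int, float]]:
    """Return (pattern, runs, failure_rate), most failed runs first."""
    runs: Counter = Counter()
    failures: Counter = Counter()
    for doc in docs:
        runs[doc.behavior_pattern] += 1
        if doc.run_outcome == "failed":
            failures[doc.behavior_pattern] += 1
    ranked = sorted(runs, key=lambda p: (failures[p], runs[p]), reverse=True)
    return [(p, runs[p], failures[p] / runs[p]) for p in ranked]


def drill_down(docs: list[TraceDoc], pattern: str) -> dict[str, Counter]:
    """For one pattern, break failed runs down by case_type and eval_finding."""
    breakdown: dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        if doc.behavior_pattern == pattern and doc.run_outcome == "failed":
            breakdown[doc.case_type][doc.eval_finding] += 1
    return dict(breakdown)


if __name__ == "__main__":
    docs = [  # hypothetical EV-order traces
        TraceDoc("fleet_order", "failed", "supply agent ignored backorder", "late handoff"),
        TraceDoc("fleet_order", "failed", "pricing agent missed discount cap", "late handoff"),
        TraceDoc("retail_order", "succeeded", "none", "clean run"),
        TraceDoc("retail_order", "failed", "compliance check skipped", "missed signal"),
    ]
    for pattern, n, rate in rank_patterns(docs):
        print(f"{pattern:15s} runs={n} failure_rate={rate:.0%}")
    print(drill_down(docs, "late handoff"))
```

The output is the kind of short list a business stakeholder can act on: a handful of patterns, how often each occurs, and where to drill in first.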

Decoder

Agentic system: An AI system composed of multiple interconnected AI agents that collaborate and delegate tasks to achieve a larger goal.
Macro-evaluation: A method of evaluating AI systems, especially multi-agent ones, by analyzing aggregate patterns and recurring behaviors across a large dataset of traces, rather than focusing on individual instance failures.
Trace (in AI agents): A detailed log or record of an agent's internal thought process, actions, tool calls, and interactions throughout the execution of a task.
Promptfoo: An open-source tool for testing and evaluating LLM prompts and agentic systems.

Original article

Full article content is not available for inline reading.


DEVOURED

Anthropic plans Claude memory update with new Memory Files

AI · llm, research · TestingCatalog

Anthropic is preparing a major Claude memory update, introducing "Memory Files" that distribute conversational context across structured documents, akin to a personal wiki, and a "Dreams" feature for asynchronous memory consolidation.

What: Anthropic is developing a dual-mode memory system for Claude, offering "Classic" (single summarized note) and "Memory Files" options. Memory Files will organize conversational notes into multiple structured documents by topic, project, or context, effectively creating a built-in personal wiki. This system mirrors agentic solutions like OpenClaw and Hermes, and integrates with "Dreams," an asynchronous process that reviews and reorganizes memory files.

Why it matters: This shift from a single, rolling context summary to a structured, file-based memory system represents a significant architectural evolution for LLMs, enabling more durable, scalable, and intelligent personal context management and paving the way for more sophisticated, persistent AI agents that can maintain complex user histories.

Takeaway: If you use Claude, anticipate a more sophisticated and persistent memory system in the future that could significantly improve its ability to recall and organize past interactions, potentially making it more useful for long-term projects.

Deep dive

Anthropic is testing a "Memory Files" feature for Claude, moving beyond a single summarized note for user context.
This new system will distribute Claude's notes across multiple structured documents, categorized by topic, project, or context.
The approach is designed to function like a "built-in personal wiki" that Claude can consult selectively.
It is similar to memory architectures found in agentic solutions like OpenClaw and Hermes, which use filesystem-style memory.
"Memory Files" will allow Claude to have a larger and more durable record of each user without overwhelming its context window.
This memory overhaul is likely in preparation for the debut of the Claude Conway agent.
A related feature called "Dreams" is also being rolled out, which performs scheduled, asynchronous passes over memory files to merge duplicates, resolve contradictions, and surface patterns.
"Dreams" is compared to REM sleep consolidation, producing a reorganized version of memory while leaving the original untouched.
The Dreams feature is currently in limited beta for Claude Managed Agents on the developer platform, scoped to Opus 4.7 and Sonnet 4.6.
No firm timeline for public release of Memory Files or Dreams in consumer Claude products has been announced.
The memory rework is considered the most consequential upcoming change, aiming to put Claude on par with rivals' persistent-memory architectures while maintaining user control.

Decoder

Agentic solutions: AI systems designed to perform a series of actions autonomously, often over extended periods, to achieve a goal.
Context window: The maximum amount of text an LLM can process or "remember" at one time during a conversation.
Dreams (Anthropic): An asynchronous process that runs on Claude's accumulated memory files to consolidate information, resolve contradictions, and identify patterns, akin to human sleep for memory consolidation.
Memory Files (Claude): Anthropic's new structured memory system for Claude, organizing notes into distinct documents by topic or context to improve long-term recall.
Persistent memory: An AI's ability to retain and recall information about past interactions across multiple sessions.
Rolling summary: A continuously updated, single condensed note that attempts to capture the essence of a user's interaction history in a compact form.
Token: The basic unit of data (like words or sub-words) that an LLM processes.

Original article

Anthropic appears to be preparing a substantial overhaul of how Claude remembers users across sessions. Early signals point to a dual-mode memory system that would let people choose between the current setup and a more sophisticated file-based architecture. The existing arrangement, framed internally as the “classic” option, condenses what Claude learns about a person into a single, summarized note. The forthcoming alternative, referred to as “Memory Files,” would distribute those notes across multiple structured documents organized by topic, project, or context. This feature is likely a new iteration of the previously discovered “Knowledge Bases” feature.

Organized notes Claude writes as you chat and reads when they're relevant. Browse and edit them anytime.

The approach mirrors what is already powering always-on agentic solutions such as OpenClaw and Hermes, both of which rely on filesystem-style memory to scale beyond the limits of a single rolling summary. By splitting memory into discrete files, Anthropic would be able to give Claude a far larger and more durable record of each user without overwhelming the context window. In practice, it would function as a built-in personal wiki that the assistant can consult selectively depending on the topic under discussion.

Tied to this shift is the prospect that Dreams, a feature Anthropic only recently began rolling out to its Claude Managed Agents on the developer platform, eventually arrives in the consumer Claude product. Dreams runs as a scheduled, asynchronous pass over accumulated memory files, merging duplicates, replacing stale entries with fresh values, resolving contradictions, and surfacing patterns the model missed during live sessions. Anthropic has compared the process to REM sleep consolidation, with the original store left untouched while a reorganized version is produced for review.
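To make the described consolidation concrete, here is a toy version of such a pass over topic-organized memory files. For each key it keeps the newest entry, which collapses exact duplicates and is one simple way to "replace stale entries" and "resolve contradictions". It returns a new structure and leaves the input untouched. This illustrates the general idea only. Anthropic has not published how Dreams works, the file and entry layout here is invented, and pattern surfacing is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str          # e.g. "deadline" or "preferred_language" (hypothetical)
    value: str
    updated_at: int   # any monotonically increasing timestamp


def consolidate(memory: dict[str, list[Entry]]) -> dict[str, list[Entry]]:
    """Return a reorganized copy of `memory`; the original is not modified."""
    result: dict[str, list[Entry]] = {}
    for topic, entries in memory.items():
        newest: dict[str, Entry] = {}
        for entry in entries:
            current = newest.get(entry.key)
            # The latest updated_at wins; ties keep the first seen, so duplicates collapse.
            if current is None or entry.updated_at > current.updated_at:
                newest[entry.key] = entry
        result[topic] = sorted(newest.values(), key=lambda e: e.key)
    return result


if __name__ == "__main__":
    memory = {
        "project-x": [Entry("deadline", "May 1", 1), Entry("deadline", "June 1", 2),
                      Entry("deadline", "June 1", 2)],
        "preferences": [Entry("language", "Python", 5)],
    }
    print(consolidate(memory))
    print(len(memory["project-x"]))  # original store still holds all three entries
```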

Similarly, the Claude Conway agent is expected to arrive soon, and the Memory Files feature may well be part of the groundwork for Conway's debut.

No firm timeline has surfaced yet, and Dreams itself remains in limited beta on the platform side, currently scoped to Opus 4.7 and Sonnet 4.6. Smaller UI tweaks are being prepared in parallel, but the memory rework stands out as the most consequential piece of what is coming next. It would put Claude on a more competitive footing with the persistent-memory architectures that rivals have been building toward, while preserving Anthropic’s stated emphasis on user control over what the model retains.

DEVOURED

A hacker group is poisoning open source code at an unprecedented scale

Tech securityopensourcedevops Ars Technica

Hacker group TeamPCP is relentlessly poisoning hundreds of open-source tools, executing supply chain attacks at an unprecedented scale, even breaching GitHub through a VSCode extension.

What: The financially motivated hacker group TeamPCP has launched more than 20 waves of supply chain attacks in recent months, embedding malware in more than 500 distinct pieces of open-source software. It even breached GitHub after a GitHub developer installed a poisoned VSCode extension, gaining access to about 3,800 of GitHub's code repositories. The group uses a self-spreading worm, Mini Shai-Hulud, to steal credentials and publish malicious versions of development tools. Its victims include OpenAI, Mercor, and Mistral AI, and it often deploys ransomware or extortion campaigns or sells stolen data.

Why it matters: This unprecedented scale of supply chain attacks by TeamPCP highlights a critical and escalating vulnerability in the open-source software ecosystem, forcing developers and organizations to fundamentally re-evaluate their trust models and update processes for third-party dependencies.

Takeaway: Rotate all GitLab, GitHub, AWS, Azure, GCP, Alibaba, and Oracle personal access tokens and credentials. Implement "age-gating" for open-source tool updates, vetting new versions before deploying them, and avoid auto-updates for critical dependencies.

Deep dive

TeamPCP has conducted over 20 "waves" of supply chain attacks, corrupting more than 500 distinct open-source software packages in recent months.
The group breached GitHub after a GitHub developer installed a poisoned VSCode extension, gaining access to approximately 3,800 repositories containing GitHub's own code.
TeamPCP claims to be selling GitHub's source code and internal organization data on BreachForums.
Their core tactic involves gaining access to development networks, planting malware in commonly used open-source tools, and then using stolen credentials to publish malicious versions of other tools, creating a self-perpetuating cycle.
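The group has automated many attacks using a self-spreading worm known as Mini Shai-Hulud, which steals credentials and posts them, encrypted, to GitHub repositories it creates, each bearing the phrase "A Mini Shai-Hulud Has Appeared."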
Previous victims include OpenAI, data contracting firm Mercor, the European Commission's public website, Trivy, LiteLLM, Checkmarx, pgserve, TanStack, and Mistral AI.
TeamPCP is financially motivated, deploying ransomware or data extortion, and is willing to sell victims' data.
Experts including Nathaniel Quist (Palo Alto Networks), Ben Read (Wiz), and Philipp Burckhardt (Socket) emphasize the need for better security hygiene, including rotating authentication tokens and vetting open-source updates before deployment (e.g., "age-gating").

Decoder

Supply chain attack: A cyberattack that targets less secure elements in a supply chain, such as software components, to gain access to the main target.
VSCode extension: A plug-in for Microsoft's Visual Studio Code integrated development environment (IDE) that adds functionality.
Ransomware-as-a-service (RaaS): A business model where ransomware developers offer their tools and infrastructure to affiliates in exchange for a cut of the ransom payments.
Infostealer: A type of malware designed to search for and steal sensitive information from a compromised computer.

Original article

A so-called software supply chain attack, in which hackers corrupt a legitimate piece of software to hide their own malicious code, was once a relatively rare event but one that haunted the cybersecurity world with its insidious threat of turning any innocent application into a dangerous foothold in a victim’s network. Now one group of cybercriminals has turned that occasional nightmare into a near-weekly episode, corrupting hundreds of open source tools, extorting victims for profit, and sowing a new level of distrust in an entire ecosystem used to create the world’s software.

On Tuesday night, open source code platform GitHub announced that it had been breached by hackers in one such software supply chain attack: A GitHub developer had installed a “poisoned” extension for VSCode, a plug-in for a commonly used code editor that, like GitHub itself, is owned by Microsoft. As a result, the hackers behind the breach, an increasingly notorious group called TeamPCP, claim to have accessed around 4,000 of GitHub’s code repositories. GitHub’s statement confirmed that it had found at least 3,800 compromised repositories while noting that, based on its findings so far, they all contained GitHub’s own code, not that of customers.

“We are here today to advertise GitHub’s source code and internal orgs for sale,” TeamPCP wrote on BreachForums, a forum and marketplace for cybercriminals. “Everything for the main platform is there and I very am happy to send samples to interested buyers to verify absolute authenticity.”

The GitHub breach is just the latest incident in what has become the longest-running spree of software supply chain attacks ever, with no end in sight. According to cybersecurity firm Socket, which focuses on software supply chains, TeamPCP has, in just the last few months, carried out 20 “waves” of supply chain attacks that have hidden malware in more than 500 distinct pieces of software, or well over a thousand counting all of the various versions of the code that TeamPCP has hijacked.

Those tainted pieces of code have allowed TeamPCP’s hackers to breach hundreds of companies that installed the software, says Ben Read, who leads strategic threat intelligence at the cloud security firm Wiz. GitHub is only the latest on the group’s long list of victims, which has also included AI firm OpenAI and the data contracting firm Mercor. “It may be their biggest one,” Read says of the GitHub breach. “But each one of these is a big deal for the company that it happens to. It’s not qualitatively different from the 14 breaches that happened last week.”

TeamPCP’s core tactic has become a kind of cyclical exploitation of software developers: The hackers gain access to a network where an open source tool commonly used by coders is being developed—for example, the VSCode extension that led to the GitHub breach or the data visualization software AntV that TeamPCP hijacked earlier this week. The hackers plant malware in the tool that ends up on other software developers’ machines, including some who are writing other tools intended to be used by coders.

The malware allows TeamPCP’s hackers to steal credentials that let them publish malicious versions of those software development tools, too. The cycle repeats, and TeamPCP’s collection of breached networks grows. “It’s a flywheel of supply chain compromises,” says Read. “It’s self-perpetuating, and it’s been a hugely successful way to get access to networks and steal stuff.”

Most recently, the group appears to have automated many of its software supply chain attacks with a self-spreading worm that’s come to be known as Mini Shai-Hulud. The name comes from GitHub repositories the worm creates that include encrypted credentials stolen from victims, each of which includes the phrase “A Mini Shai-Hulud Has Appeared” along with a handful of other references to the sci-fi novel Dune. That message in turn appears to be a reference not just to Dune’s sandworms but to a similar supply chain compromise worm known as Shai-Hulud that appeared in September, though there’s no evidence TeamPCP was behind that earlier self-spreading malware.

“They’re definitely going for big exposure. They really care about getting big attention,” says Philipp Burckhardt, who leads research at Socket and has tracked TeamPCP for months. “They like to toot their own horn.” A dark-web site for the group, which links to “business contacts” likely used to carry out ransom negotiations, features Matrix-style cascading ones and zeros, a reggae fusion soundtrack, and the words “TEAMPCP: The Cats Hijacking Your Supply Chains.”

Before landing on its current strategy for supply chain attacks, TeamPCP emerged in late 2025 exploiting cloud misconfigurations and a vulnerability in the web app development tool Next.js to deploy a botnet for attacks like credential theft and cryptocurrency mining. It was during this period that the group came to rely on worms, with increasing success at grabbing static credentials and authentication tokens to bore deeper into victims’ systems.

“It’s been like wildfire; it’s gone very fast,” says Nathaniel Quist, manager of the Cortex Cloud intelligence team at Palo Alto Networks. “They find credentials, personal access tokens, and then it’s just how far can one credential go. I think we will continue to see these techniques. Threat actors know they work, and they’re running with it.”

TeamPCP appears to be financially motivated and often deploys ransomware or data extortion campaigns against its targets, though it also appears willing to sell victims’ data to any buyer. In the most recent case of GitHub, for instance, it wrote on its BreachForums site that “this is not a ransom. We do not care about extorting GitHub, 1 buyer and we shred the data on our end.”

It added what appeared to be a veiled threat to GitHub, perhaps intended to coerce the company to pay: “It looks like our retirement is soon so if no buyer is found we will leak it free.”

The picture has become increasingly complex, Quist says, since TeamPCP began moving to a ransomware-as-a-service model in April by establishing partnerships with the cybercriminal platforms BreachForums and DragonForce. The group has also, at times, seemed to wade into geopolitics, deploying geographically targeted malware (dubbed CanisterWorm by researchers) that hit any Kubernetes cloud infrastructure it reached but deployed a destructive wiper only against Iranian targets. This week, an entity claiming to be TeamPCP also leaked the original Shai-Hulud worm source code along with detailed documentation, though its motivations for that leak aren’t clear.

The scale of TeamPCP’s targeting expanded dramatically in March as it hacked more software utilities, leading to its more recent cascading effect of supply chain attacks. The group embedded an infostealer in the open source security scanner Trivy and then used stolen credentials from this attack to compromise certain versions of the AI application programming interface tool LiteLLM hosted on the popular Python software repository PyPI. The group also tainted infrastructure from the web application security firm Checkmarx, hit the development server pgserve, and compromised the web app library TanStack as well as the enterprise AI platform Mistral AI.

The fallout has been severe. In addition to GitHub, TeamPCP attacks on software service providers have led to breaches of the European Commission’s public website and the data contracting firm Mercor, compromise of two employees’ devices at OpenAI and many other incidents. But Palo Alto’s Quist emphasizes that organizations can protect themselves to a degree through security “hygiene” practices that carefully manage authentication tokens and impose access restrictions wherever possible.

“The biggest opportunistic thing that’s making this operation successful is long-lived credentials in these environments,” he says. “It’s vitally important to change your tokens even if you’re not using LiteLLM or any of these packages that have been compromised. If you have GitLab and GitHub personal access tokens, rotate them. And AWS, Azure, GCP, Alibaba, Oracle: all of these credentials are being taken.”

TeamPCP’s tidal waves of tainted code also raise hard questions about how to safely use open source software in an era of mounting supply chain attacks. Wiz’s Read recommends safeguards such as “age-gating” updates to open source tools—vetting and installing security updates but otherwise holding off on immediate updates to code that’s been newly published and may be malicious.

In the case of one recent malicious TeamPCP update, Read says Wiz detected the supply chain compromise and warned customers within minutes, but many of the software’s users had auto-updates enabled and had already downloaded it. “You don’t want to just install the freshest version all the time,” Read says.
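Age-gating can be reduced to a simple selection rule: install the newest release that has been public for some minimum period, unless it has been explicitly vetted (for example, an urgent security fix). The Python sketch below is illustrative only; the function name, the release data, and the seven-day window are assumptions, not a recommendation from Wiz or Socket.

```python
from datetime import datetime, timedelta, timezone


def pick_age_gated_version(releases, min_age, now, vetted=frozenset()):
    """Pick the newest release that is old enough, or explicitly vetted.

    releases: iterable of (version, published_at) with tz-aware datetimes.
    vetted:   versions already reviewed by a human or scanner that may
              skip the cool-down window.
    Returns None when no release qualifies.
    """
    cutoff = now - min_age
    eligible = [
        (published_at, version)
        for version, published_at in releases
        if published_at <= cutoff or version in vetted
    ]
    if not eligible:
        return None
    # Compare by publish time, not by version string ordering.
    return max(eligible)[1]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
releases = [
    ("1.4.0", now - timedelta(days=30)),
    ("1.4.1", now - timedelta(days=10)),
    ("1.5.0", now - timedelta(hours=3)),  # brand new: held back by default
]
print(pick_age_gated_version(releases, timedelta(days=7), now))  # 1.4.1
print(pick_age_gated_version(releases, timedelta(days=7), now, vetted={"1.5.0"}))  # 1.5.0
```

Sorting by publish time rather than by version string avoids depending on any one ecosystem's versioning scheme.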

Amid an epidemic of supply chain attacks like the ones TeamPCP has unleashed, Socket’s Burckhardt says open-source users will need to take trust-but-verify measures, like analyzing updates for malware before rolling them out across a network, as well as the kind of “cool-down” period that Read recommends before downloading and running code.

“At the point it hits your machine,” Burckhardt says, “it’s already too late.”

This story originally appeared at WIRED.com.

DEVOURED

GitHub internal repositories exfiltrated via malicious VS Code extension

DevOps securitysupply-chainide ITPro

GitHub confirmed a breach where 3,800 internal repositories were exfiltrated after a developer installed a malicious VS Code extension.

What: GitHub confirmed that roughly 3,800 internal repositories were accessed after a developer installed a compromised Visual Studio Code extension, a figure consistent with claims by the TeamPCP hacker group. CISO Alexis Wales said there is no evidence of impact to customer repositories, though some internal repos contain excerpts of customer support interactions.

Why it matters: This incident underscores the escalating threat of supply chain attacks targeting developer tooling, demonstrating how attackers exploit trusted environments like IDE extensions to access sensitive corporate data.

Takeaway: Implement strict governance and vetting for VS Code extensions, regularly rotate critical secrets, and deploy endpoint detection and response tools to monitor for suspicious activity from developer workstations.

Deep dive

GitHub confirmed approximately 3,800 internal repositories were exfiltrated due to a malicious VS Code extension installed by an employee.
The incident aligns with claims from the TeamPCP hacker group, known for supply chain attacks involving CI/CD credentials.
GitHub CISO Alexis Wales stated there is no evidence of impact to external customer repositories, but some internal data included customer support interactions.
GitHub immediately began rotating critical secrets, prioritizing high-impact credentials, and is conducting a full investigation.
This breach highlights the growing risk in the software supply chain, where malicious developer tools are used as an entry vector.
The article also mentions a separate incident a day earlier, in which the Nx Console VS Code extension (2.2 million installs) was briefly backdoored to silently collect credentials; it was pulled within 18 minutes on the VS Code Marketplace and 36 minutes on Open VSX.
Sonatype's Ilkka Turunen says developers are now permanent targets, while Aikido Security's Shaun Brown argues that "minimum package and extension ages" are the best current protection against such attacks.

Decoder

Software supply chain attack: A cyberattack that targets vulnerabilities in the software development process, often by compromising third-party components or tools used by developers.
VS Code extension: A program that extends the functionality of Microsoft's Visual Studio Code integrated development environment.

Original article

GitHub has confirmed that around 3,800 internal repositories have been breached, after a developer unwittingly installed a malicious VS Code extension.

The Microsoft-owned code repository and DevOps platform said the breach was detected on Monday, but that the activity involved exfiltration of GitHub-internal repositories only.

"We have no evidence of impact to customer information stored outside of GitHub's internal repositories, such as our customers' own enterprises, organizations, and repositories," said the firm's chief information security officer, Alexis Wales.

"Some of GitHub's internal repositories contain information from customers, for example, excerpts of support interactions. If any impact is discovered, we will notify customers via established incident response and notification channels."

GitHub said it started rotating critical secrets as soon as it discovered the breach, with the highest-impact credentials prioritized first. It is now analyzing logs, validating secret rotation, and monitoring its infrastructure for any follow-on activity, it said, promising a fuller report once it's finished its investigation.

GitHub hasn't explicitly named the attacker, but made reference to a claim by the TeamPCP hacker group that it had accessed around 3,800 repositories, saying that the number was consistent with its investigation so far.

TeamPCP, which first appeared late last year, is the group linked to the Mini Shai-Hulud worm, and carries out supply chain attacks by stealing CI/CD credentials and using them to publish infected versions of further packages.

The group has reportedly not asked for a ransom for the GitHub data, but is offering the stolen data for sale for $50,000, saying that if it doesn't receive an offer, it will leak it for free.

"This is another reminder that developers are now permanent targets in software supply chain attacks. TeamPCP has shown how a motivated attacker can move through the tools developers trust every day – open source packages, extensions, accounts, and credentials – rather than trying to break in through the front door," said Ilkka Turunen, Field CTO at Sonatype.

"Combined with the acceleration we're already seeing from AI-assisted vulnerability discovery, the window between compromise and exploitation is collapsing. The old assumption was that defenders would have time to identify, prioritize, and respond. That margin is disappearing."

The news came just a day after the Nx Console VS Code extension, which has 2.2 million installs, was briefly backdoored, with the malicious version collecting credentials silently when a developer opened a workspace. The issue was handled swiftly, with the extension pulled within 18 minutes on the VS Code Marketplace and 36 minutes on Open VSX.

"The community's ability to catch and remove malicious packages is real. For extensions with millions of installs, it's also insufficient," commented Shaun Brown, technical product marketer at Aikido Security.

"Caught in 18 minutes and prevented exposure are not the same thing. Minimum package and extension ages are the best way to protect your devices from similar attacks today."

DEVOURED

Designing end-to-end ingress request tracing for multi-tenant SaaS platforms

DevOps observabilitycloudmicroservices CNCF

The CNCF released a framework for end-to-end ingress request tracing in multi-tenant SaaS, emphasizing trace IDs and span IDs to diagnose microservice failures.

What: Mridula Chilakamarri of the CNCF Technical Advisory Group published a framework for distributed tracing in multi-tenant SaaS, using a Trace ID to link all operations of a customer request across services and Span IDs for individual units of work within that trace. The framework treats tracing as a core platform capability, ensuring trace data excludes sensitive information and that tracing failures do not disrupt customer requests.

Why it matters: This framework addresses the common problem of disconnected logs in microservice architectures, providing a structured approach to gain full request-level visibility, which is critical for rapid troubleshooting and maintaining reliability in complex cloud-native systems.

Takeaway: If you manage a multi-tenant SaaS platform, review this CNCF framework to assess and improve your distributed tracing implementation, especially regarding trace ID propagation, security guardrails, and organizational adoption strategies.

Deep dive

The Cloud Native Computing Foundation (CNCF) published a framework for end-to-end distributed tracing specifically designed for multi-tenant SaaS platforms.
The core of the framework relies on two identifiers: a "Trace ID" which groups all work for a single customer request across services, and "Span IDs" which identify individual operations within that trace.
Tracing is treated as a first-class platform capability, not an optional tool, with clear acceptance criteria for observable system outcomes.
Key design principles include: generating a Trace ID at the ingress layer if not present, consistent context propagation across synchronous and asynchronous calls, and creating parent-child relationships between spans.
Security is paramount, with trace data explicitly excluding sensitive information (payloads, credentials, PII) by design.
Telemetry export is configuration-only, decoupling it from application code changes and release cycles.
Tracing must have non-disruptive failure modes, meaning customer requests complete successfully even if telemetry backends are unavailable.
The framework leverages industry standards like OpenTelemetry and W3C Trace Context, applicable to Kubernetes environments.
Organizational challenges, like ensuring complete coverage and consistent adoption across all service teams, are highlighted as more difficult than technical implementation.

Decoder

Distributed tracing: A method used to monitor requests as they flow through complex microservice architectures, providing a complete view of the request's journey and performance across multiple services.
Trace ID: A unique identifier that links together all the individual operations (spans) related to a single user request across a distributed system.
Span ID: A unique identifier for a single operation or unit of work within a trace, showing the duration and details of that specific step.
Multi-tenant SaaS: A software-as-a-service model where a single instance of the software serves multiple customers (tenants), but each tenant's data is isolated.
OpenTelemetry: A set of open-source tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help analyze software performance and behavior.
W3C Trace Context: A World Wide Web Consortium standard that defines HTTP headers to propagate context information across services in a distributed trace.

Original article

Modern SaaS platforms built on cloud‑native architectures frequently consist of dozens of independently deployed microservices. A single customer request entering the platform at the ingress layer may traverse authentication services, orchestration engines, data services, and downstream integrations before completing. When failures or performance regressions occur, platform operators must answer a fundamental question: what happened to this specific request, and where?

In many environments, answering this question remains difficult. Although services emit logs and metrics, these signals are disconnected. Telemetry is produced independently by each service without a shared request context, making it difficult to correlate failures, retries, or latency spikes into an end‑to‑end narrative.

This article presents a product‑led framework for designing ingress request tracing in multi‑tenant SaaS platforms. The focus is on design principles and observable system behavior, not implementation code. The framework builds on industry standards such as OpenTelemetry and W3C Trace Context and is applicable to Kubernetes‑based environments.

The observability problem

Without end‑to‑end tracing, ingress requests cannot be reliably followed as they traverse downstream services. Failures appear as isolated events. Latency regressions are visible only in aggregate metrics. Multi‑service workflows and intermittent issues are especially difficult to diagnose.

Operational teams compensate by manually correlating logs using timestamps, heuristics, and partial identifiers. This approach does not scale with service growth and results in slower diagnosis, higher cognitive load during incidents, and reduced confidence in root cause analysis.

The core challenge is not insufficient telemetry, but the lack of consistent request‑level context linking all operations together.

A product-led framework for ingress request tracing

This framework treats distributed tracing as a first‑class platform capability rather than a service‑level implementation choice. At its core are two complementary identifiers: a Trace ID that groups all work for a single customer request, and Span IDs that identify individual units of work (such as a service call or database query) within that trace.

Every ingress request must have an associated trace identifier. If an incoming request does not contain a trace ID, the ingress layer generates one. If a valid trace ID is already present, it is preserved.
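A minimal Python sketch of the generate-or-preserve rule at the ingress layer follows. It assumes header names have already been lowercased and accepts only the exact four-field, version-00 traceparent form; the function names are hypothetical, and a real gateway or OpenTelemetry propagator would handle more cases.

```python
import re
import secrets

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def is_valid_traceparent(value):
    match = _TRACEPARENT.fullmatch(value or "")
    if not match:
        return False
    version, trace_id, parent_id, _flags = match.groups()
    # W3C Trace Context forbids version "ff" and all-zero trace or parent IDs.
    return version != "ff" and trace_id != "0" * 32 and parent_id != "0" * 16


def ingress_traceparent(headers):
    """Generate-or-preserve: keep a valid incoming traceparent, else mint one."""
    incoming = headers.get("traceparent")
    if is_valid_traceparent(incoming):
        return incoming  # preserved unchanged for upstream interoperability
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled
```

The returned value is what the ingress would attach to the request context and echo in a response header.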

1. Trace ID and span ID generation and preservation

Each service processing the request creates its own span and assigns a unique span ID to that unit of work. When the service makes a downstream call, it passes both the trace ID (unchanged) and its span ID (which becomes the parent span ID for the next service). This creates a parent‑child relationship that allows the observability platform to reconstruct the exact sequence and hierarchy of all operations.

This generate‑or‑preserve rule ensures interoperability with upstream systems while maintaining trace continuity within the platform. Both the trace ID and current span ID are attached to the request context and included in response headers so they can be used as deterministic lookup keys during investigations.
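To illustrate the parent-child hand-off, here is a sketch in which each service derives its own span from the caller's context and forwards the unchanged trace ID with its own span ID as the new parent. SpanContext and the helper functions are hypothetical names, and the sampled flag is hard-coded to 01 for simplicity.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                  # 32 hex chars, shared by the whole request
    span_id: str                   # 16 hex chars, unique to this unit of work
    parent_span_id: Optional[str]  # None for the root (ingress) span


def start_child_span(parent):
    """Create a span for this service's work under the caller's span."""
    return SpanContext(
        trace_id=parent.trace_id,       # unchanged across services
        span_id=secrets.token_hex(8),   # fresh ID for this unit of work
        parent_span_id=parent.span_id,  # caller's span becomes the parent
    )


def outgoing_headers(current):
    """Headers for a downstream call: same trace ID, our span ID as parent-id."""
    return {"traceparent": f"00-{current.trace_id}-{current.span_id}-01"}


root = SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", None)
auth = start_child_span(root)  # e.g. the auth service's span
headers = outgoing_headers(auth)
```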

A flow chart showing the End-to-End Trace ID and Span ID Propagation. In the diagram above, a single Trace ID flows unchanged through all services (auth, orchestration, data layer), representing the customer's complete request. Each service creates its own Span ID; when Service A calls Service B, it passes both the Trace ID and its own Span ID (which Service B records as its parent). This hierarchy allows operators to see not just that a request failed, but exactly which service and at which point in the sequence.

Figure 1: End-to-End trace ID and span ID propagation

In the diagram above, a single Trace ID flows unchanged through all services (auth, orchestration, data layer), representing the customer’s complete request. Each service creates its own Span ID; when Service A calls Service B, it passes both the Trace ID and its own Span ID (which Service B records as its parent). This hierarchy allows operators to see not just that a request failed, but exactly which service and at which point in the sequence.

2. Consistent context propagation

All synchronous service‑to‑service calls reuse the same trace ID. Each service creates a new span ID for its own work. Retry operations preserve the original trace ID but may create additional span IDs for each retry attempt, allowing the observability platform to distinguish between the original call and subsequent attempts while keeping them grouped under the same trace.
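The retry rule can be sketched as follows: every attempt gets a fresh span ID but keeps the original trace ID and parent, so the attempts appear side by side under one trace. The send callable, the span dictionary layout, and the choice of ConnectionError as the retryable failure are all assumptions made for this example.

```python
import secrets


def call_with_retries(trace_id, parent_span_id, send, spans, max_attempts=3):
    """Retry a downstream call; each attempt is its own span in one trace.

    send:  hypothetical callable taking a traceparent string; returns a
           result or raises ConnectionError on a transient failure.
    spans: list that receives one record per attempt, for later export.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        span = {
            "trace_id": trace_id,              # same trace for every attempt
            "span_id": secrets.token_hex(8),   # new span ID per attempt
            "parent_span_id": parent_span_id,  # all attempts share one parent
            "attempt": attempt,
            "status": "error",
        }
        spans.append(span)
        try:
            # Downstream sees this attempt's span ID as its parent-id.
            result = send(f"00-{trace_id}-{span['span_id']}-01")
        except ConnectionError:
            if attempt == max_attempts:
                raise
            continue
        span["status"] = "ok"
        return result
```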

Where asynchronous processing exists, trace context (both trace ID and parent span ID) is propagated via message metadata to prevent observability gaps as workflows evolve.
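For asynchronous hops, the same context can travel in message metadata. This sketch models a message as a plain dictionary with a metadata map, an assumption standing in for whatever header mechanism a given broker offers; publish and consume are hypothetical names.

```python
import secrets


def publish(message_body, trace_id, span_id):
    """Producer side: copy trace context into the message's metadata."""
    return {
        "body": message_body,
        "metadata": {"traceparent": f"00-{trace_id}-{span_id}-01"},
    }


def consume(message):
    """Consumer side: continue the producer's trace as a child span."""
    traceparent = message.get("metadata", {}).get("traceparent")
    parts = traceparent.split("-") if traceparent else []
    if len(parts) != 4:
        # No usable context arrived: start a new trace rather than fail.
        return {"trace_id": secrets.token_hex(16),
                "span_id": secrets.token_hex(8),
                "parent_span_id": None}
    _version, trace_id, parent_span_id, _flags = parts
    return {"trace_id": trace_id,
            "span_id": secrets.token_hex(8),  # consumer's own unit of work
            "parent_span_id": parent_span_id}
```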

3. Security-first trace metadata

Trace data is limited to operational metadata only: trace ID, span ID, parent span ID, service name, operation name, timestamps, duration, and execution status.

Request payloads, credentials, secrets, tokens, and personally identifiable information are explicitly excluded by design. Treating data exclusion as a design constraint simplifies security reviews and reduces long‑term compliance risk.
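One way to enforce the exclusion rule by construction is an allowlist applied before export, sketched below. The field names are hypothetical stand-ins for the metadata listed above (AC-003 also lists the HTTP response code, so it is included).

```python
# Operational metadata permitted in trace data (key names are hypothetical).
ALLOWED_FIELDS = frozenset({
    "trace_id", "span_id", "parent_span_id", "service_name",
    "operation_name", "start_time", "end_time", "duration_ms",
    "status", "http_status_code",
})


def sanitize_span(record):
    """Keep only allowlisted keys; payloads, tokens, and PII are dropped.

    An allowlist fails closed: a new, unreviewed field is excluded by
    default instead of leaking until someone adds it to a denylist.
    """
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}
```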

4. Configuration-only telemetry export

Trace export is managed entirely via Kubernetes configuration. Operators can configure exporters, credentials, and routing parameters without application code changes.

This decouples tracing operations from release cycles and allows teams to evolve observability using existing SRE workflows.
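As a rough sketch of configuration-only export, the settings below are read from environment variables that a Kubernetes ConfigMap could populate. OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT are environment variables defined by the OpenTelemetry specification; TRACE_TENANT_ROUTES, its format, and the fallback values are assumptions used to illustrate tenant-specific routing.

```python
import os


def load_export_config(env=None):
    """Read trace export settings from the environment (e.g. a ConfigMap).

    TRACE_TENANT_ROUTES is hypothetical, in the form
    "tenant-a=http://a:4318,tenant-b=http://b:4318".
    """
    env = os.environ if env is None else env
    routes = {}
    for pair in env.get("TRACE_TENANT_ROUTES", "").split(","):
        if "=" in pair:
            tenant, endpoint = pair.split("=", 1)
            routes[tenant.strip()] = endpoint.strip()
    return {
        "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service"),
        "default_endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT",
                                    "http://localhost:4318"),
        "tenant_routes": routes,
    }


def endpoint_for(config, tenant):
    """Route a tenant's traces to its own backend, else the default one."""
    return config["tenant_routes"].get(tenant, config["default_endpoint"])
```

Changing a route then means editing the ConfigMap and restarting pods, with no application code change.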

5. Non-disruptive failure modes

Tracing must never block request processing. If telemetry backends are unavailable or misconfigured, requests complete successfully. Trace data may be buffered or dropped, but customer experience is unaffected.

Partial traces are acceptable. Failed requests are not.
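The non-disruptive rule can be illustrated with a bounded buffer that drops spans instead of waiting, plus an export step that swallows backend errors. This is a simplified sketch with hypothetical class and method names; OpenTelemetry SDKs provide batching span processors for this job in practice.

```python
import queue


class NonBlockingSpanBuffer:
    """Bounded span buffer that never blocks or fails the request path."""

    def __init__(self, capacity=1000):
        self._queue = queue.Queue(maxsize=capacity)
        # Exposed as a metric so silent loss stays visible
        # (the counter is approximate under heavy concurrency).
        self.dropped = 0

    def record(self, span):
        """Called on the request path: enqueue or drop, never wait."""
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            self.dropped += 1

    def flush(self, export):
        """Called off the request path, e.g. by a background worker.

        export: hypothetical callable that sends a batch to the backend and
        may raise if the backend is down; the batch is then discarded.
        """
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0
        try:
            export(batch)
        except Exception:  # any backend failure: drop the data, not the request
            self.dropped += len(batch)
            return 0
        return len(batch)
```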

Acceptance criteria as executable contracts

Clear acceptance criteria define observable system outcomes, not implementation details. In this framework, acceptance criteria act as executable contracts between product management and engineering. Each criterion maps to a specific requirement and is independently testable.

AC ID	Observable Behavior	Requirement Area
AC-001	Every ingress request includes a globally unique trace ID in response headers. Trace IDs already present in incoming requests are preserved and propagated unchanged.	Trace ID Generation & Preservation
AC-002	All platform services processing an ingress request create their own span with a unique span ID. Parent‑child relationships are established through parent span IDs. Retry operations preserve the original trace ID.	Span Creation & Hierarchy
AC-003	Each platform service captures trace-level execution data including trace ID, span ID, parent span ID, service name, operation name, timestamps, duration, status, and HTTP response code.	Trace Data Capture
AC-004	SREs can query traces using a trace ID as a primary lookup key in observability platforms and view the complete execution path with service-to-service relationships via span hierarchies.	Trace Queryability
AC-005	SREs can configure trace export destinations via Kubernetes configuration files without application code changes. Multiple backends and tenant-specific routing are supported.	Config-Only Export
AC-006	Traces exported to observability platforms are visualizable with end-to-end trace views, service dependency graphs, span hierarchies, and latency breakdowns per service and span.	Platform Visualization
AC-007	Tracing does not block or fail requests when the telemetry backend is unavailable. Trace data excludes sensitive payload information, credentials, and PII by design.	Non-Disruptive & Secure

These criteria prevent partial adoption, reduce ambiguity during implementation, and provide a stable basis for regression validation as the platform evolves.
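Because each criterion is independently testable, a criterion such as AC-003 can be expressed as a small check run against spans captured from a test request. The sketch below assumes spans arrive as dictionaries with the hypothetical key names shown; a root span may carry a parent_span_id of None, but the key itself must be present.

```python
# Fields AC-003 requires on every exported span (key names are hypothetical).
REQUIRED_SPAN_FIELDS = (
    "trace_id", "span_id", "parent_span_id", "service_name", "operation_name",
    "start_time", "end_time", "duration_ms", "status", "http_status_code",
)


def check_ac_003(spans):
    """Return (span_id, missing_fields) pairs; empty means the contract holds."""
    violations = []
    for span in spans:
        missing = [field for field in REQUIRED_SPAN_FIELDS if field not in span]
        if missing:
            violations.append((span.get("span_id", "<no span_id>"), missing))
    return violations
```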

A flow chart image of Acceptance Criteria as Executable Contracts

Quantifying business value

Infrastructure initiatives frequently fail because they cannot articulate business value beyond engineering. The value proposition for this type of initiative should be constructed around measurable operational dimensions:

Value Dimension	Quantified Impact
Root Cause Identification	Shift from heuristic-based to deterministic tracing via trace and span hierarchies; elimination of manual log correlation
Operational Scalability	Observability scales linearly with service count rather than degrading with complexity; span‑level granularity enables microservice-level diagnostics

Understanding trace and span context

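The W3C Trace Context standard defines how trace information propagates across services. It specifies two HTTP headers: traceparent carries the essential identifiers, and tracestate carries vendor-specific metadata. The traceparent value consists of four hyphen-separated fields (version, trace-id, parent-id, and trace-flags): the trace ID is 32 lowercase hex characters (16 bytes), the parent ID is the caller's 16-hex-character (8-byte) span ID, and a flags value of 01 marks the trace as sampled. For example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01.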

Trace ID: Globally unique identifier that groups all spans belonging to a single customer request. Unchanged as the request flows through all services. Enables support teams to look up the entire request path.

Span ID: Unique identifier for a single unit of work (e.g., API call, database query). Each service creates its own span ID. When making downstream calls, the current span ID becomes the parent span ID for the next service, establishing a parent‑child relationship.

Parent Span ID: The span ID of the calling service. Used to reconstruct the sequence and hierarchy of operations. Allows the observability platform to display which service called which service and in what order.
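The following minimal Rust sketch shows what a service does with the header. It validates an incoming traceparent and keeps the trace ID. It then emits a new header in which the service's own span ID becomes the parent-id for the next hop.

Assumptions:
- The sketch accepts only version 00 and ignores tracestate.
- The struct and function names are illustrative.
- In practice the new span ID would come from a random generator rather than being passed in.

```rust
/// Parsed W3C traceparent (version 00). Field widths follow the spec:
/// 32 hex chars for the trace ID, 16 for the parent (span) ID, 2 for flags.
#[derive(Debug, Clone, PartialEq)]
struct TraceParent {
    trace_id: String,  // shared by every span in the request
    parent_id: String, // span ID of the caller
    flags: u8,         // bit 0 = sampled
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Validates an incoming header; returns None for anything malformed.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None; // this sketch only understands version 00
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace and parent IDs are invalid per the spec.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        flags: u8::from_str_radix(flags, 16).ok()?,
    })
}

/// Header for a downstream call: the trace ID is unchanged, and this
/// service's span ID becomes the parent-id the next service will see.
fn child_header(incoming: &TraceParent, my_span_id: &str) -> String {
    format!("00-{}-{}-{:02x}", incoming.trace_id, my_span_id, incoming.flags)
}
```

For example, take the spec's sample header 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01. Calling child_header on it with span ID b7ad6b7169203331 yields 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01. A shortened header such as 00-abc123-def456-01 is rejected.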

Together, trace ID and span hierarchy enable operators to ask not just ‘did this request fail’ but ‘exactly where in the sequence did it fail, and what was the sequence of calls that led to that point.’
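Reconstructing the hierarchy is a tree walk over parent span IDs. The sketch below assumes a hypothetical Span record holding a span ID, an optional parent ID, a service name and a start time. It renders the call tree depth-first and orders siblings by start time. Observability backends do this work themselves, so treat it as an illustration of the idea rather than any platform's implementation.

```rust
use std::collections::HashMap;

// Hypothetical span record as a backend might store it.
#[derive(Debug)]
struct Span {
    span_id: &'static str,
    parent_id: Option<&'static str>, // None for the root (ingress) span
    service: &'static str,
    start_ms: u64,
}

/// Returns the call tree as indented lines, siblings ordered by start time.
fn render_tree(spans: &[Span]) -> Vec<String> {
    // Group spans by their parent ID.
    let mut children: HashMap<Option<&'static str>, Vec<&Span>> = HashMap::new();
    for s in spans {
        children.entry(s.parent_id).or_default().push(s);
    }
    for kids in children.values_mut() {
        kids.sort_by_key(|s| s.start_ms);
    }
    // Depth-first walk starting from root spans (no parent).
    let root: Option<&'static str> = None;
    let mut stack: Vec<(&Span, usize)> = children
        .get(&root)
        .map(|roots| roots.iter().rev().map(|s| (*s, 0)).collect())
        .unwrap_or_default();
    let mut out = Vec::new();
    while let Some((span, depth)) = stack.pop() {
        out.push(format!("{}{}", "  ".repeat(depth), span.service));
        if let Some(kids) = children.get(&Some(span.span_id)) {
            // Push in reverse so the earliest child is visited first.
            for k in kids.iter().rev() {
                stack.push((*k, depth + 1));
            }
        }
    }
    out
}
```

Given a gateway span with two children (auth, then billing) and a database call under billing, the output reads top to bottom in call order. The gap in an incomplete trace shows up as a missing or orphaned subtree.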

Operational impact

Ingress request tracing shifts troubleshooting from inference to direct observation. Engineers can follow individual requests across services instead of reconstructing behavior from disconnected signals. With trace and span IDs, the entire execution path is visible: which services were called, in what order, and how much time each spent.

The qualitative benefits are immediate and significant: faster localization of failures through trace ID lookup and span hierarchy analysis, clearer cross‑team communication using shared trace references instead of symptom descriptions, reduced cognitive load during incidents as SREs observe the exact sequence rather than hypothesize, and proactive performance management through per‑service and per‑span latency decomposition.

For small SRE teams supporting complex platforms, these improvements are transformative. A single SRE with a trace can achieve what previously required a cross‑team war room.

The hardest part is not technical

The most underestimated challenge in any tracing initiative is organizational, not technical. A distributed tracing system is only as complete as its coverage. If three out of eight services in a request path propagate trace context and five do not, the result is a trace with large gaps that is operationally unreliable. Worse, broken span‑parent relationships make the hierarchy useless.

The solution combines technical enforcement with organizational process: automated CI/CD checks that reject deployments without trace instrumentation and proper span creation, a documented onboarding checklist for every service team, and sustained adoption tracking until 100% propagation is achieved. Without this sustained attention, adoption stalls at the teams that opt in voluntarily, leaving critical gaps in exactly the services where tracing is most needed.

Replicating this framework

This framework is designed to be replicable across any multi‑service SaaS platform running on container orchestration infrastructure. The design principles—generate or preserve trace IDs, create unique span IDs per service with parent‑child relationships, capture only operational metadata including span IDs, export through configurable backends, and degrade gracefully—are architecture‑agnostic and applicable regardless of the specific microservices framework, programming languages, or observability backend in use.

Organizations considering adoption should pay particular attention to two areas: failure mode design (ensuring tracing cannot cause outages) and organizational adoption strategy (ensuring complete service coverage through both technical enforcement and process). These are the most common points of failure in distributed tracing deployments and the areas where published guidance is most sparse.
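One common way to ensure tracing cannot cause outages is a bounded queue that drops spans when it is full. The queue sits between the request path and export, so a slow backend never blocks a request. This Rust sketch uses std's sync_channel with try_send.

Assumptions:
- FinishedSpan and SpanQueue are hypothetical names.
- A real exporter thread (not shown) would drain the receiver and ship batches to the backend.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

// Hypothetical finished-span record: operational metadata only, no payloads.
#[derive(Debug)]
struct FinishedSpan {
    trace_id: String,
    span_id: String,
    duration_us: u64,
}

/// Request-path side of the telemetry pipeline: never blocks, never errors.
struct SpanQueue {
    tx: SyncSender<FinishedSpan>,
    dropped: u64, // counted so operators can see telemetry loss
}

impl SpanQueue {
    fn new(capacity: usize) -> (Self, Receiver<FinishedSpan>) {
        let (tx, rx) = sync_channel(capacity);
        (SpanQueue { tx, dropped: 0 }, rx)
    }

    /// Enqueue a span for export. If the exporter is slow (queue full) or
    /// gone (receiver dropped), the span is discarded instead of blocking
    /// or failing the request.
    fn record(&mut self, span: FinishedSpan) {
        match self.tx.try_send(span) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => self.dropped += 1,
        }
    }
}
```

Dropping spans under pressure trades trace completeness for availability. Exposing the dropped counter as a metric keeps that loss visible.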

Natural extensions include expanding to asynchronous message‑based workflows, implementing intelligent sampling strategies, correlating trace and span data with infrastructure‑level signals, and ultimately leveraging historical span patterns for predictive operations.

Conclusion

Distributed tracing is foundational to operating cloud‑native platforms at scale, but tooling alone is insufficient. By treating tracing as a product capability with clear guarantees, acceptance criteria as executable contracts, and failure‑mode discipline, platforms can deliver reliable request‑level visibility without compromising security or availability.

The gap in our industry is not in tracing tools—OpenTelemetry, Jaeger, Zipkin, and commercial platforms have solved the instrumentation and visualization layers. The gap is in the product and operational decisions required to deploy tracing successfully: how to scope it, how to secure it, how to make it operator‑friendly, how to ensure complete adoption, how to establish span hierarchies that reveal the true sequence of operations, and how to measure its impact. That is the gap this framework addresses.

DEVOURED

Migrating from Go to Rust

DevOps | career, backend, rust, go | Matthias Endler

A guide for Go teams migrating to Rust highlights Rust's stronger compile-time guarantees, like memory safety and explicit error handling, as a trade-off for its steeper learning curve and slower compile times.

What: Matthias Endler’s guide on migrating from Go to Rust focuses on backend services, comparing Go’s garbage collection and `if err != nil` convention with Rust’s ownership model, `Result` for error handling, and `Option` for null safety. It notes that Rust shifts correctness checks into the type system, catching nil dereferences and data races at compile time that Go surfaces only at runtime, if at all.

Why it matters: This comparison reveals a fundamental philosophical difference: Go prioritizes shipping speed and operational simplicity, while Rust prioritizes correctness and robustness by enforcing stricter guarantees at compile time, which the guide argues leads to fewer production incidents.

Takeaway: If considering a Go to Rust migration, focus initial efforts on a clear-boundary service, maintain the same API contract, and invest heavily in team training to overcome the borrow checker's learning curve.

Deep dive

The guide by Matthias Endler is specifically for Go teams considering migrating backend services to Rust.
It notes that Go and Rust both offer static typing and strong concurrency, but diverge on compiler guarantees and runtime control.
Rust enforces memory management, data-race prevention, and error handling through its type system (ownership, Send/Sync, Result, Option), whereas Go relies on runtime checks and conventions.
Key pain points in Go that drive migration include verbose error handling (if err != nil), nil pointer panics, and runtime data races (go test -race isn't exhaustive).
Rust's Option and Result types force explicit handling of absence and errors, eliminating entire categories of runtime bugs.
Rust's monomorphized generics offer zero-cost abstractions, unlike Go's generics, which can carry runtime overhead and feel "tacked on."
Go's garbage collector, while excellent, can cause P99 latency spikes under heavy memory pressure; Rust has no garbage collector, so this particular source of tail latency does not apply.
The "borrow checker" is highlighted as the primary challenge for Go developers moving to Rust, enforcing memory safety and aliasing rules at compile time.
Compile times for Rust are generally longer than Go's, but incremental builds and cargo check are efficient.
Go's lack of "function coloring" (no split between sync and async functions) is an ergonomic advantage that Rust's explicit async/await model gives up.
Recommended migration strategies include carving off "hot path" services, replacing sidecar/worker processes, or using a strangler pattern behind an API gateway, rather than full rewrites.
According to the guide, Rust typically offers 20-40% CPU improvement and 30-50% memory reduction over Go, along with flatter P99 latency.
The author also notes that Go remains excellent for Kubernetes tooling, CLI utilities, and simple glue services where velocity outweighs absolute correctness.
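The following small Rust sketch makes the Option/Result and data-race points concrete. The function names are illustrative, not taken from the guide.
- The Go comments show the equivalent Go convention.
- In Rust, the caller cannot use the port or the owner without handling the error or None case.
- A counter shared across threads only compiles once it is wrapped in a synchronized type.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;
use std::sync::{Arc, Mutex};
use std::thread;

// Go: port, err := strconv.Atoi(s); if err != nil { return 0, err }
// Rust: the error is part of the return type and `?` propagates it.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.trim().parse()?; // non-numeric or > 65535 -> Err
    Ok(port)
}

// Go might return a nil pointer here; Option makes absence explicit.
fn lookup_owner<'a>(owners: &'a HashMap<String, String>, service: &str) -> Option<&'a str> {
    owners.get(service).map(|team| team.as_str())
}

fn describe(owners: &HashMap<String, String>, service: &str) -> String {
    // The compiler rejects code that uses the owner without handling None.
    match lookup_owner(owners, service) {
        Some(team) => format!("{service} is owned by {team}"),
        None => format!("{service} has no owner"),
    }
}

// An Rc counter shared across threads is rejected at compile time (Rc is
// not Send), so the shared counter must be a synchronized type.
fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = *total.lock().unwrap();
    n
}
```

The checks that `go test -race` performs at runtime, and only on exercised paths, happen here at compile time for every path.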

Decoder

Monomorphization: A compilation technique where generic code is specialized for each specific type it's used with, resulting in unique machine code for each instantiation and no runtime overhead.
Borrow checker: A component of the Rust compiler that enforces strict rules about how references (borrows) to data can be used, ensuring memory safety and preventing data races at compile time.
Nil pointer panic: A runtime error in Go (and other languages) that occurs when a program attempts to dereference a pointer that has a null value, leading to a program crash.
Data race: A concurrency bug that occurs when two or more threads or goroutines access the same memory location concurrently, at least one of the accesses is a write, and there is no synchronization to control the order of accesses.
P99 latency: The 99th percentile of response times: 99% of requests complete within this latency or faster, so it characterizes the tail experienced by the slowest 1% of requests.
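Percentile definitions differ between tools (nearest-rank vs. interpolation). The Rust sketch below assumes the nearest-rank method, where P99 fits in a few lines. percentile_nearest_rank is an illustrative helper, not any monitoring library's API.

```rust
/// Nearest-rank percentile: the smallest sample such that at least p% of
/// samples are <= it. `p` is an integer percent (99 for P99).
fn percentile_nearest_rank(samples: &[u64], p: usize) -> Option<u64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p/100 * n), computed in integers to avoid float rounding.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

Consider 99 requests at 10 ms and one at 5000 ms. P99 is still 10 ms while P100 is 5000 ms, a reminder that P99 summarizes the tail without exposing every outlier.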

DEVOURED

Is your SIEM actually ready? A new way to find out

DevOps | security, siem, data | Elastic

Elastic Security 9.4 introduces "SIEM Readiness," a new feature providing a centralized, automated view of SIEM operational health, evaluating log coverage, data quality, and retention across key telemetry domains.

What: Elastic's SIEM Readiness, launched in technical preview with Elastic Security 9.4, offers a continuous, environment-aware assessment of a Security Information and Event Management (SIEM) system's health. It checks four dimensions across five telemetry categories such as Endpoint/Host and Cloud: Coverage (do rules have data?), Quality (is data ECS-compatible?), Continuity (are pipelines failing?), and Retention (is data available for audits?). This replaces error-prone manual tracking.

Why it matters: This feature addresses a critical, common pain point for security operations centers (SOCs) by automating the assessment of SIEM effectiveness and data integrity, making it easier to ensure readiness for threat detection and compliance without relying on manual spreadsheets or fragmented tooling.

Takeaway: If you use Elastic Security, explore the SIEM Readiness technical preview in version 9.4 to gain automated insights into your SIEM's operational health and identify data or detection gaps.

Deep dive

SIEM Readiness is a new capability in Elastic Security, available in technical preview as of version 9.4.
It aims to provide a centralized, continuously updated, and actionable view of SIEM operational health.
The initial focus is on "Visibility Health," which assesses whether the underlying data is present, correct, flowing, and retained.
It organizes the view around five core telemetry domains: Endpoint/Host, Identity, Network, Cloud, and Application/SaaS.
Four key dimensions are evaluated: Coverage, Quality, Continuity, and Retention.
Coverage: Checks whether enabled detection rules have the required data sources, and assesses overall coverage against baselines like MITRE ATT&CK, NIST CSF, and CIS benchmarks, tailored to the environment.
Quality: Flags ECS incompatibilities in data that could cause rules or dashboards to fail silently.
Continuity: Monitors pipeline failure rates, flagging anything above a 1% threshold.
Retention: Evaluates retention policies against industry benchmarks (FedRAMP, NIST 800-53, SOC 2, ISO 27001) across hot, warm, and cold storage.
The feature is designed for action, with every signal tied to a concrete next step (onboard data, fix pipeline, adjust policy, create case).
It's environment-aware, excluding categories that don't apply, and telemetry-driven, inferring the environment from the data rather than requiring manual configuration.
Elastic plans to extend SIEM Readiness to "Detection Readiness" (are rules effective?) and "Response Readiness" (are workflows operational?).
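The Continuity rule reduces to a ratio and a threshold. This Rust sketch is only an illustration of the check described above, not Elastic's API or data model.

Assumptions:
- Per-pipeline counters (documents in, documents failed) are hypothetical.
- "Above 1%" is treated as strictly greater than 0.01.
- Pipelines that saw no documents are skipped.

```rust
// Hypothetical per-pipeline counters.
struct PipelineStats {
    name: &'static str,
    docs_in: u64,
    docs_failed: u64,
}

/// Continuity threshold described in the article.
const MAX_FAILURE_RATE: f64 = 0.01;

/// Returns (pipeline, failure rate) for every pipeline above the threshold.
fn failing_pipelines(stats: &[PipelineStats]) -> Vec<(&'static str, f64)> {
    stats
        .iter()
        .filter(|p| p.docs_in > 0) // no traffic: nothing to judge
        .map(|p| (p.name, p.docs_failed as f64 / p.docs_in as f64))
        .filter(|&(_, rate)| rate > MAX_FAILURE_RATE)
        .collect()
}
```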

Decoder

SIEM (Security Information and Event Management): A software solution that aggregates and analyzes security alerts and logs from various sources across an organization's IT infrastructure to provide a centralized view of security events and help detect threats.
Elastic Security: Elastic's platform for security operations, which includes SIEM capabilities.
ECS (Elastic Common Schema): An open source specification that defines a common set of fields for storing event data in Elasticsearch, making it easier to analyze data from disparate sources consistently.
MITRE ATT&CK: A globally accessible knowledge base of adversary tactics and techniques based on real-world observations, used as a foundation for the development of specific threat models and methodologies.
NIST CSF (Cybersecurity Framework): A set of guidelines for private sector organizations to improve their cybersecurity posture, developed by the U.S. National Institute of Standards and Technology.
CIS benchmarks: A set of configuration guidelines for securely configuring operating systems, servers, applications, and network devices, developed by the Center for Internet Security.

DEVOURED

The 58-Million-Key Freeze: What a HashMap Resize Taught Us About Memory Allocation at Scale

Data | performance, rust, devops, infrastructure | LinkedIn Engineering

LinkedIn's Rust-based FishDB service froze for 10-15 seconds due to a HashMap resizing at 58.7 million keys, acquiring a process-wide mmap_lock and blocking all other threads.

What: LinkedIn's FishDB, a Rust-based retrieval engine using jemalloc and Tokio, experienced 10-15 second freezes. The root cause was a standard library HashMap resizing at 58,720,256 keys, which triggered a 3.5 GB mmap allocation, acquiring the Linux kernel's process-wide mmap_lock in write mode, blocking madvise and page faults.

Why it matters: This incident highlights how fundamental data structure behaviors and underlying OS memory management (specifically the mmap_lock) can cause catastrophic, silent failures in high-scale, multithreaded applications, even with modern languages and allocators.

Takeaway: For high-scale Rust services, pre-allocate HashMap capacity with HashMap::with_capacity() if the expected size is large and known, to avoid unexpected reallocations and mmap_lock contention.

Deep dive

LinkedIn's FishDB service, a Rust application using jemalloc and Tokio, experienced recurring 10-15 second freezes, breaching availability SLOs.
The problem was elusive: ephemeral, silent (no logs), sporadic, and without obvious external triggers.
Correlation with RSS spikes led to suspicion of memory allocation issues.
Traditional CPU profiling was ineffective as threads were blocked (off-CPU).
An automated eBPF-based off-CPU profiling script was deployed to capture kernel stack traces during freezes.
Off-CPU profiles revealed threads blocked in rwsem_down_write_slowpath (waiting for the lock in write mode, for mmap) and rwsem_down_read_slowpath (waiting in read mode, for madvise and page faults).
This pointed to contention on the Linux kernel's process-wide mmap_lock (VMA semaphore), which protects virtual memory area data structures.
A large mmap allocation (requiring a write lock) blocked all other threads needing mmap_lock in read mode.
The HashMap pkey_vs_docref (document reference index) was found to be the culprit. It held 56-59 million entries.
Once the map held 58,720,256 keys, the next insert doubled its capacity, growing the table from ~1.75 GB to ~3.5 GB. Old and new buffers had to coexist during the rehash (~5.25 GB in total), and the process showed a ~4 GB RSS spike.
The fix pre-allocated the HashMap at startup to a sufficient size with HashMap::with_capacity(base_index_size …), avoiding dynamic resizing.
This prevented the mmap_lock contention and eliminated freezes.
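The exact key count appears to line up with the growth rule of hashbrown, the SwissTable implementation behind std's HashMap. A table with 2^26 buckets holds at most 7/8 × 67,108,864 = 58,720,256 entries, so the next insert rehashes into 2^27 buckets.

The sketch below models that rule and shows the with_capacity fix.

Assumptions:
- The 7/8 growth rule is an implementation detail, not a documented std guarantee.
- build_index is a hypothetical stand-in for loading pkey_vs_docref.

```rust
use std::collections::HashMap;

/// Models hashbrown's growth rule: with 8 or more buckets, a table holds
/// at most 7/8 of its bucket count before it must rehash into twice as
/// many buckets. Implementation detail, not a std API guarantee.
fn usable_capacity(buckets: usize) -> usize {
    if buckets < 8 {
        buckets.saturating_sub(1)
    } else {
        buckets / 8 * 7
    }
}

/// Hypothetical index builder: reserve room for `expected` keys up front so
/// that loading them never triggers a resize (and its large allocation).
fn build_index(expected: usize) -> HashMap<u64, u32> {
    let mut map = HashMap::with_capacity(expected);
    let cap_before = map.capacity();
    for k in 0..expected as u64 {
        map.insert(k, k as u32);
    }
    // capacity() is the number of elements the map can hold without
    // reallocating, so staying within it leaves the table untouched.
    assert_eq!(cap_before, map.capacity());
    map
}
```

Pre-sizing still performs one large allocation, but it happens at startup rather than mid-traffic while other threads need the mmap_lock. The summary does not preserve the exact sizing expression, so none is assumed here.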

Decoder

mmap_lock: A process-wide read-write semaphore in the Linux kernel that protects the virtual memory area (VMA) data structures. Operations modifying the virtual address space (like large memory allocations or deallocations) require this lock, causing contention if held for too long.
eBPF (extended Berkeley Packet Filter): A Linux kernel technology that allows programs to run in a sandboxed environment within the kernel, enabling powerful, flexible, and safe kernel-level tracing and profiling without modifying kernel source code.
jemalloc: A general-purpose memory allocator that emphasizes fragmentation avoidance and scalable concurrency. It is used by many large-scale applications.
Tokio: An asynchronous runtime for the Rust programming language, providing the necessary tools to build network applications and services.
madvise: A system call that advises the kernel on how to handle a process's memory regions. MADV_DONTNEED is often used to tell the kernel that memory pages are no longer needed and can be reclaimed.
RSS (Resident Set Size): The portion of a process's memory that is held in RAM (not swapped out).

DEVOURED

Plan Mode All the Time, Substrait over SQL, and the End of the DE Role ft

Data | ai, career, llm, agents | MotherDuck Blog

Chris Riccomini argues AI agents, when used with "plan mode" and declarative workflows, are already capable of most data engineering, suggesting a future where data engineers become general "data" roles and LLMs prefer formats like Substrait over SQL.

What: Chris Riccomini, co-author of "The Missing README" and of "Designing Data-Intensive Applications", believes AI will handle most data engineering, emphasizing "plan mode" and strong quality gates for managing LLM non-determinism. He argues LLMs should emit Substrait, which expresses physical operations, rather than SQL, because it allows better optimization and may reduce hallucinations.

Why it matters: This article provides a provocative view on how AI might reshape the data engineering profession and its tools, pushing for more declarative approaches and a shift towards agent-centric ergonomics in language and tooling design.

Takeaway: Experiment with detailed "plan mode" prompting and robust external testing frameworks when using LLMs for code generation, especially for critical data tasks, to manage non-determinism and ensure correctness.

Deep dive

Chris Riccomini believes AI can handle the majority of data engineering work, especially with declarative workflows and strong quality gates.
For financial data, correctness is maintained by defining invariants and using traditional verification tools, as well as pairing AI with human review for bug spotting.
He advocates for LLMs to "speak" Substrait, a format representing physical data transformations (e.g., hash join vs. merge join), rather than SQL.
Substrait could lead to fewer LLM hallucinations and allow for client-side query optimization.
To make AI output more reliable, Chris recommends "plan mode all the time," where LLMs iterate extensively on a plan before implementation.
Managing context by starting with fresh LLM contexts or using "Ralph Loops" (iterative autonomous AI development with external tests) can improve reliability.
Implementing strong quality gates (defining, measuring, enforcing quality) is crucial, like enforcing test coverage with commit hooks.
Non-determinism from LLMs can be mitigated by moving to incremental data loads, reducing the scope of potential errors.
Security concerns for AI agents are high, with a need for "Okta for Agents" (identity/access management) to manage skills, marketplaces, lineage, and RBAC/ABAC.
Agents are already good at inspecting failed workflows, running SQL queries, and writing Python, and could automate much of the "grunt work" of data engineering.
The "data engineer" role may merge into a broader "data" role encompassing engineering, ML, and analysis, as tools become more agent-friendly.
The choice of programming language may shift from human ergonomics to "agent ergonomics," favoring languages that lead to faster, cheaper, and more stable LLM output (e.g., Go over Python due to token cost/code size).
He suggests that while AI might reduce some rote learning, it enables tackling more complex projects and learning about new domains (like FFI bindings) that would otherwise be too time-consuming.

Decoder

Substrait: An emerging open standard that provides a cross-language serialization format for relational algebra expressions. It can represent both logical and physical query plans, allowing for more precise data transformation instructions than pure SQL.
Plan Mode: An approach to interacting with AI where the LLM is guided to first generate a detailed, iterative plan for a task, which is then refined and approved by a human, before the LLM proceeds with implementation.
Ralph Loop: An iterative, autonomous AI development technique where a bash loop (or similar mechanism) repeatedly prompts an AI agent with the same goal, forcing it to persistently iterate and fix errors until external tests pass.
LLM (Large Language Model): A type of artificial intelligence program designed to understand and generate human-like text, often trained on vast amounts of text data.
Declarative Workflows: A programming paradigm where you describe what you want the program to achieve, rather than how to achieve it (as in imperative programming). This allows the underlying system to determine the best execution strategy.
Agent Ergonomics: The idea of designing programming languages, tools, and workflows to be optimized for AI agents to use, rather than primarily for human developers.
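A Ralph Loop can be sketched as a bounded retry loop around an external test oracle. The loop re-sends the same goal each iteration. It stops at the first passing candidate, or returns the last failure report when the budget runs out.

Assumptions:
- agent and tests are hypothetical closures.
- agent stands in for an LLM call whose progress lives in its own state, such as the working tree.
- tests stands in for a real test suite.

```rust
/// Hypothetical Ralph-loop driver: same goal every iteration, external
/// tests decide when to stop. `max_iters` bounds cost if tests never pass.
fn ralph_loop<A, T>(goal: &str, max_iters: u32, mut agent: A, tests: T) -> Result<(u32, String), String>
where
    A: FnMut(&str) -> String,              // goal -> candidate output
    T: Fn(&str) -> Result<(), String>,     // candidate -> pass, or failure report
{
    let mut last_report = String::from("no iterations run");
    for iter in 1..=max_iters {
        let candidate = agent(goal);
        match tests(&candidate) {
            Ok(()) => return Ok((iter, candidate)),
            Err(report) => last_report = report,
        }
    }
    Err(last_report)
}
```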

DEVOURED

pg_infer 1.0.0 released -- transformer model knowledge as SQL relations

Data | database, ai, backend, opensource, postgresql | PostgreSQL

pg_infer 1.0.0 is a new PostgreSQL 18+ extension that exposes small transformer model internals as SQL-queryable relations, enabling efficient, costed, and parallelized inference directly within the database.

What: Greg Burd released pg_infer 1.0.0, a PostgreSQL 18+ extension that allows users to query transformer model internals (gate activations, feature labels, learned associations, embeddings) as SQL relations and index data. It supports CPU-efficient inference, including Microsoft BitNet b1.58 models, and can offload inference to PostgreSQL replicas, leveraging existing idle hardware.

Why it matters: This extension fundamentally changes how small LLMs can be integrated with data, moving inference from an external service to a first-class, queryable database operation, which aligns model behavior with traditional relational data management principles.

Takeaway: If you use PostgreSQL 18+ and need to integrate small language models, explore pg_infer to leverage existing database infrastructure for inference and gain SQL-level visibility into model knowledge.

Deep dive

pg_infer 1.0.0 is a PostgreSQL 18+ extension that exposes transformer model internals as SQL-queryable relations.
It treats the model as a first-class data source, allowing the PostgreSQL planner to cost, schedule, and parallelize inference as an operator within a query plan.
The extension provides functions like describe(entity) to get learned relations, walk(prompt) for per-layer activations, and implies(a, b) for directional support.
It includes a custom index access method supporting ORDER BY <~> for model-aware document ranking without pre-computed embeddings.
Unlike pgvector or RAG-style integrations, pg_infer stores the model itself in WAL-logged 8KB pages, enabling full database backup, replication, and point-in-time recovery for model state.
Optimized for CPU execution using BLAS (OpenBLAS) and f16 gate vectors, it specifically supports Microsoft BitNet b1.58 models, which are efficient on commodity CPUs.
A remote backend (larql-server) allows offloading inference to idle PostgreSQL replica hosts, utilizing existing hardware capacity.
The project is based on Chris Hayuk's LARQL project, which pioneered the idea of queryable transformer internals.

Decoder

Transformer model: A deep learning model architecture, particularly effective for processing sequential data like natural language, known for its attention mechanism.
Gate activations: Internal numerical values within a neural network layer that determine the flow of information.
Feature labels: Metadata associated with learned features in a model.
Embeddings: Numerical representations of concepts, words, or entities in a continuous vector space.
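BitNet b1.58: A family of ternary-weight transformer models developed by Microsoft, described as "two-bit / 1.58-bit" models: each weight is -1, 0, or +1, which carries log2 3 ≈ 1.58 bits of information. They are designed for high quality on commodity CPUs with dramatically lower memory and power costs.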
vindex: A format for extracting and storing transformer model knowledge (gate vectors, feature activations, learned associations) developed by the LARQL project.
WAL-logged pages: Table or index pages whose changes are recorded in PostgreSQL's Write-Ahead Log (WAL), making them crash-safe, replicable, and recoverable.
Index Access Method (AM): A PostgreSQL mechanism that defines how a specific type of index is stored and accessed.
BLAS (Basic Linear Algebra Subprograms): A specification that defines a set of low-level routines for common linear algebra operations, optimized for performance.
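Ternary weights suit CPUs because a dot product against weights in {-1, 0, +1} needs no multiplications, only additions, subtractions and skips.

The sketch below is a simplified illustration, not BitNet's or pg_infer's kernel. Real implementations also pack several weights per byte and apply a per-tensor scale; both are omitted here.

```rust
/// Ternary weight as in BitNet b1.58: -1, 0, or +1.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tern {
    Neg,
    Zero,
    Pos,
}

/// Dot product of integer activations with ternary weights.
/// +1 adds the activation, -1 subtracts it, 0 skips it.
fn ternary_dot(activations: &[i32], weights: &[Tern]) -> i32 {
    assert_eq!(activations.len(), weights.len());
    activations.iter().zip(weights).fold(0, |acc, (&a, &w)| match w {
        Tern::Pos => acc + a,
        Tern::Neg => acc - a,
        Tern::Zero => acc,
    })
}
```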

DEVOURED

Same buffers, same instructions, same hardware. Where Is the JVM Tax?

Data | backend, performance, java | Semyon Sinchenko

Semyon Sinchenko's benchmarks show modern Java running vectorized arithmetic kernels over Apache Arrow buffers delivers performance comparable to native arrow-rs, challenging the "JVM tax" narrative for analytical workloads.

What: Semyon Sinchenko ran toy benchmarks comparing Java implementations of simple vectorized arithmetic kernels (AddInt32, MulFloat64) over Apache Arrow buffers, built with MemorySegment and the JDK Vector API, against native arrow-rs. On a ThinkPad with an Intel i5, the MulFloat64 results were similar, typically within 10-25% of each other, suggesting no inherent "JVM tax" for raw compute on columnar memory.

Why it matters: This experiment pushes back against a common generalization in big data, clarifying that perceived performance penalties often stem from specific architectural choices (like Spark's scheduler or object-per-value data planes) rather than an intrinsic limitation of the JVM itself, especially with modern Java features and columnar memory layouts.

Takeaway: When designing high-performance data pipelines in Java, prioritize columnar memory layouts (like Apache Arrow) and leverage modern JVM features (like MemorySegment and Vector API) to avoid common "JVM tax" pitfalls.

Deep dive

Semyon Sinchenko benchmarked Java vs. native performance for simple vectorized arithmetic kernels over Apache Arrow buffers.
The Java implementation used the official Apache Arrow Java SDK (16.1.0), JDK 25.0.3-temurin, java.lang.foreign.MemorySegment, and the JDK Vector API.
The native reference used arrow-rs (56) and the Criterion benchmark harness.
The hardware was a 13th Gen Intel Core i5-1335U.
For MulFloat64, Java and native arrow-rs showed "roughly the same performance class," with ratios typically between 1.13x and 1.25x in favor of native or Java depending on dataset size.
Sinchenko attributes performance differences to cache effects, memory bandwidth, and CPU behavior rather than an inherent "JVM tax."
The AddInt32 benchmark showed a larger gap favoring Java, which Sinchenko attributes to a semantic mismatch: Java uses wrapping integer arithmetic, while arrow-rs uses checked arithmetic, which prevents vectorization.
He argues that the "JVM tax" phrase often misattributes overheads from frameworks like Spark (scheduler, shuffle, spill) or object-per-value data models to the JVM itself.
The benchmark specifically uses Apache Arrow to ensure a columnar memory layout, avoiding the "object layout tax" or "GC-visible object graph tax."
Sinchenko highlights JVM benefits like dynamic code loading, cross-platform portability (e.g., Xeon to Graviton), automatic CPU dispatch, memory safety, and a unified operational surface for metrics and debugging.
The experiment is intentionally narrow, not addressing complexities like Decimals, Strings, Nested types, Hash aggregation/joins, Parquet I/O, or end-to-end query execution.
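The AddInt32 mismatch can be reproduced in miniature. This Rust sketch works on plain slices rather than arrow-rs arrays.
- The wrapping version matches Java's int addition.
- The checked version fails the whole kernel on any overflow.

Whether a given compiler vectorizes either loop depends on its version and target flags. Treat the comments as the general tendency the article describes, not a measured result.

```rust
/// Wrapping semantics (like Java's int `+`): overflow wraps silently.
/// A branch-free loop like this is typically easy to auto-vectorize.
fn add_wrapping(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Checked semantics: any overflow makes the whole result None.
/// The per-element overflow check and early exit tend to get in the way
/// of straightforward SIMD code generation.
fn add_checked(a: &[i32], b: &[i32]) -> Option<Vec<i32>> {
    a.iter().zip(b).map(|(x, y)| x.checked_add(*y)).collect()
}
```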

Decoder

JVM Tax: A pejorative term implying an inherent performance penalty associated with running code on the Java Virtual Machine compared to native execution.
Apache Arrow: A language-agnostic columnar memory format for in-memory data processing, designed for high-performance analytical workloads.
MemorySegment: A Java API from java.lang.foreign for accessing contiguous regions of memory, including memory outside the Java heap, enabling efficient interaction with native memory.
JDK Vector API: A Java API for performing vectorized computations (SIMD instructions) on arrays of primitive types, improving performance for data-parallel operations.
JMH (Java Microbenchmark Harness): A Java tool for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java.
arrow-rs: The official Rust implementation of Apache Arrow.
Columnar memory layout: A data storage arrangement where all values for a single column are stored contiguously, optimizing for analytical queries.

DEVOURED

SAM 3: Segment Anything with Concepts (GitHub Repo)

Data | ai, vision, research, machine-learning | GitHub

Meta Superintelligence Labs released SAM 3, a new unified foundation model for image and video segmentation that significantly improves open-vocabulary concept segmentation from text or visual prompts, outperforming SAM 2.

What: SAM 3, from Meta Superintelligence Labs (Nicolas Carion et al.), is a new image and video segmentation model that expands on SAM 2's capabilities by enabling exhaustive segmentation and tracking of open-vocabulary concepts from short text phrases or visual exemplars. It achieves 75-80% of human performance on the new SA-Co benchmark (270K unique concepts), driven by a new data engine that annotated over 4 million concepts and a new architecture with a presence token and decoupled detector–tracker design.

Why it matters: This advancement signifies a major step towards more generalized and versatile AI vision systems, moving beyond predefined categories to understand and segment any described concept, which is crucial for real-world applications requiring nuanced object recognition in images and video.

Takeaway: Developers working on computer vision applications, particularly those involving open-vocabulary object segmentation or tracking in images and videos, should investigate SAM 3 and its new capabilities for enhanced performance and concept coverage.

Deep dive

SAM 3 is a new unified foundation model for promptable segmentation in images and videos, developed by Meta Superintelligence Labs.
It significantly improves upon its predecessor, SAM 2, by introducing the ability to exhaustively segment all instances of open-vocabulary concepts specified by text or visual prompts.
SAM 3 achieves 75-80% of human performance on the new SA-Co benchmark, which contains over 270,000 unique concepts.
This breakthrough is powered by an innovative data engine that automatically annotated over 4 million unique concepts for training.
The model features a new architecture with a "presence token" for better discrimination between similar text prompts and a decoupled detector–tracker design to minimize task interference.
SAM 3.1, released on March 27, 2026, introduces "Object Multiplex" for faster joint multi-object tracking.
Requires Python 3.12+ and PyTorch 2.7+ with CUDA 12.6+.
Access to model checkpoints needs to be requested on the SAM 3 Hugging Face repo.
The project provides examples for image and video segmentation, batched inference, and using SAM 3 as an agent.
Two new image benchmarks (SA-Co/Gold, SA-Co/Silver) and one video benchmark (SA-Co/VEval) are released.
The model has 848 million parameters and consists of a detector and a tracker sharing a vision encoder.

Decoder

Foundation model: A large AI model trained on a vast quantity of data that can be adapted to a wide range of downstream tasks.
Promptable segmentation: The ability of an image segmentation model to identify and segment objects in an image based on various prompts, such as text descriptions, points, bounding boxes, or masks.
Open-vocabulary concept segmentation: The ability to segment objects described by any arbitrary text phrase, rather than being limited to a predefined set of categories.
SA-Co benchmark: A new benchmark (Segment Anything with Concepts) introduced with SAM 3, containing over 270,000 unique concepts for evaluating open-vocabulary segmentation performance.
Presence token: A new architectural component in SAM 3 designed to improve the model's ability to distinguish between closely related text prompts.
Decoupled detector–tracker design: An architectural approach where the object detection and object tracking components are designed to operate independently, reducing interference and improving scalability.
DETR (Detection Transformer): A transformer-based object detection model.
MLLM (Multi-modal Large Language Model): A large language model capable of processing and understanding multiple types of data, such as text and images.

DEVOURED

Cloud Native Computing Foundation Announces OpenTelemetry's Graduation, Solidifying Status as the De Facto Observability Standard

Data observabilityopensourcecncfopentelemetry CNCF

OpenTelemetry has officially graduated from the CNCF, solidifying its status as the de facto vendor-neutral standard for collecting metrics, logs, and traces.

What: The Cloud Native Computing Foundation (CNCF) announced on May 21, 2026, that OpenTelemetry has graduated, signaling its production readiness for standardizing telemetry data. Formed in 2019 by merging OpenTracing and OpenCensus, the project has over 12,000 contributors from more than 2,800 companies. In the past year, its JavaScript and Python API packages were downloaded over 1.36 billion and 1.3 billion times, respectively.

Why it matters: OpenTelemetry's graduation signifies a critical industry shift towards standardized, vendor-neutral observability, enabling organizations to switch analysis tools without re-instrumenting code. This reduces fragmentation and establishes a crucial foundation for monitoring increasingly complex systems, including AI workloads.

Takeaway: If your organization is not yet using OpenTelemetry for telemetry data collection, consider adopting it to future-proof your observability stack and avoid vendor lock-in.

Deep dive

OpenTelemetry, formed in 2019 by merging OpenTracing and OpenCensus, has graduated from the Cloud Native Computing Foundation (CNCF), as announced on May 21, 2026.
Graduation signifies that OpenTelemetry is a stable, production-ready, and vendor-neutral open-source observability framework for collecting metrics, logs, and traces.
The project has grown to over 12,000 contributors from more than 2,800 companies and has the second-highest project velocity in the CNCF ecosystem, after Kubernetes.
Widespread adoption includes major organizations like Alibaba, Anthropic, Bloomberg, Capital One, eBay, FICO Software, and Heroku.
Download numbers for its JavaScript API and Python API packages surpassed 1.36 billion and 1.3 billion respectively in the past 12 months, setting new monthly records in April 2026.
OpenTelemetry helps solve tool fragmentation by providing a single set of APIs, SDKs, a Collector agent, and semantic conventions, allowing organizations to switch observability backends without re-instrumenting code.
Its maturity is supported by a third-party independent security audit and formal governance review.
The project is also gaining interest for observing AI workloads, including performance, reliability, accuracy, and trustworthiness.
It integrates deeply with other CNCF projects like Kubernetes, Fluentd, Jaeger, and Prometheus, and has been adopted by other Linux Foundation projects like Cloud Foundry and OpenSearch.
Supporters including Austin Parker (Honeycomb.io), Morgan McLean (Splunk/Cisco), Michele Mancioppi (Dash0), Bob Quillin (ControlTheory), Gordon Radlein (Datadog), Richard Seroter (Google Cloud), Ted Young (Grafana Labs), Christine Yen (Honeycomb), Brendan Burns (Microsoft Azure), Juraci Paixão Kröhling (OllyGarden), and Ben Sigelman (Lightstep) provided quotes emphasizing its industry impact and the shift from vendor-specific telemetry to shared standards.
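
The decoupling described above can be sketched in plain Python: code is instrumented once, and the backend is chosen when the program starts. This is deliberately not the OpenTelemetry SDK, whose real packages are `opentelemetry-api` and `opentelemetry-sdk`. The `Tracer`, `Span`, and exporter classes here are hypothetical stand-ins that only mimic the API/exporter split.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, TypeVar

T = TypeVar("T")


@dataclass
class Span:
    name: str
    attributes: Dict[str, str]
    duration_s: float


class Exporter(Protocol):
    """Anything that can ship finished spans to some backend."""

    def export(self, spans: List[Span]) -> None: ...


class InMemoryExporter:
    """Hypothetical backend A: keeps spans in memory (handy in tests)."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    def export(self, spans: List[Span]) -> None:
        self.spans.extend(spans)


class ConsoleExporter:
    """Hypothetical backend B: prints spans to stdout."""

    def export(self, spans: List[Span]) -> None:
        for span in spans:
            print(f"{span.name} {span.attributes} {span.duration_s:.6f}s")


class Tracer:
    """The only object instrumented code talks to; it never names a backend."""

    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter

    def record(self, name: str, attributes: Dict[str, str], fn: Callable[[], T]) -> T:
        start = time.perf_counter()
        try:
            return fn()
        finally:
            # Export the span even if fn raised an exception.
            elapsed = time.perf_counter() - start
            self._exporter.export([Span(name, attributes, elapsed)])


def handle_checkout(tracer: Tracer) -> str:
    # Instrumented once; the attribute key follows OpenTelemetry's
    # semantic-convention name for the HTTP request method.
    return tracer.record("checkout", {"http.request.method": "POST"}, lambda: "ok")


if __name__ == "__main__":
    # Switching backends is a one-line change at startup, not a code rewrite.
    handle_checkout(Tracer(ConsoleExporter()))
```

In the real project, the equivalent split is between the OpenTelemetry API that application code calls and the SDK exporters (or the Collector) configured at startup. The sketch only mirrors that shape.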

Decoder

Cloud Native Computing Foundation (CNCF): A Linux Foundation project that fosters and sustains an ecosystem of open source, vendor-neutral projects for cloud-native software.
Observability: The ability to understand the internal state of a system by examining its external outputs, typically through metrics, logs, and traces.
Telemetry data: Data collected from a remote or inaccessible source, including metrics (numerical measurements), logs (timestamped event records), and traces (records of requests flowing through distributed systems).
OpenTracing: A deprecated CNCF project that provided a vendor-neutral API for distributed tracing.
OpenCensus: A deprecated Google-led project for metrics and distributed tracing.
Special Interest Group (SIG): A community group within an open-source project focused on a specific area, such as a programming language or component.
Project Velocity: A metric used by CNCF to gauge the activity, growth, and adoption of its projects, often measured by contributions, commits, and community engagement.

DEVOURED

7 Temporal Blind Spots Breaking Enterprise RAG

Data aillmenterprise ragaboutit.com

Enterprise RAG systems are often undermined by "temporal blind spots," failing to provide accurate, up-to-date information because they ignore the critical dimension of time.

What: David Richards identifies seven common temporal blind spots in Retrieval Augmented Generation (RAG) systems that lead to critical failures, such as stale indexes, time-blind embeddings, and temporal hallucinations. These issues can result in incorrect financial trades or outdated medical advice. In one example, a hedge fund manager acted on a two-week-old Fed statement and lost millions.

Why it matters: This article highlights a fundamental limitation in current RAG architectures, which often prioritize semantic similarity over temporal relevance. As AI systems are deployed in high-stakes enterprise environments, the lack of robust time-awareness presents a significant risk, indicating a need for more sophisticated data management and retrieval strategies that explicitly account for chronology.

Takeaway: Developers building or managing RAG systems should implement explicit temporal filtering, tiered freshness models for data, and time-aware evaluation metrics to prevent critical failures caused by outdated information.
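
Two of those recommendations can be sketched in a few lines of Python. The sketch assumes that each source carries an urgency tier and that each chunk keeps its publication timestamp as metadata. The tier intervals, class names, and helper functions below are hypothetical illustrations, not code from the article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical re-index intervals per urgency tier. The 6-hour budget for the
# high tier echoes the user expectation cited in the article; the others are
# placeholders.
REFRESH_EVERY: Dict[str, timedelta] = {
    "high": timedelta(hours=6),
    "medium": timedelta(days=1),
    "low": timedelta(days=30),
}


@dataclass
class Source:
    name: str
    tier: str                  # "high", "medium" or "low"
    last_indexed_at: datetime  # timezone-aware


@dataclass
class Chunk:
    text: str
    published_at: datetime  # stored as metadata alongside the embedding


def due_for_reindex(sources: List[Source], now: datetime) -> List[str]:
    """Re-embed only sources whose tier budget has elapsed.

    Low-urgency data is refreshed rarely, which is where the compute savings
    over refreshing everything in real time come from.
    """
    return [s.name for s in sources if now - s.last_indexed_at >= REFRESH_EVERY[s.tier]]


def published_since(chunks: List[Chunk], not_before: datetime) -> List[Chunk]:
    """Explicit temporal filter applied before semantic ranking."""
    return [c for c in chunks if c.published_at >= not_before]


if __name__ == "__main__":
    now = datetime(2026, 5, 22, 12, tzinfo=timezone.utc)
    sources = [
        Source("market-feed", "high", now - timedelta(hours=8)),
        Source("hr-policies", "low", now - timedelta(days=3)),
    ]
    print(due_for_reindex(sources, now))  # only the high-urgency feed is due
```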

Deep dive

Enterprise RAG systems commonly fail due to "temporal blind spots," which occur when the architecture ignores the recency or temporal context of information.
An example cited is a hedge fund RAG assistant that surfaced two-week-old Federal Reserve information, leading to a multimillion-dollar trading error.
Stale Indexes: 61% of RAG pipelines refresh daily or less, but 73% of users expect information no older than six hours for time-critical queries, creating a significant gap.
Time-Blind Embeddings: Embeddings excel at semantic similarity but often fail to encode temporal proximity, leading systems to retrieve older, semantically similar documents over newer, more relevant ones.
Query-to-Context Time Mismatch: Implicit temporal references in user queries (e.g., "current policy") are often ignored, causing the system to retrieve irrelevant old information.
Temporal Hallucination: RAG systems can faithfully reproduce facts that were true at one point but are now outdated, especially when ingesting historical archives.
Evaluation Gaps: Traditional RAG evaluation metrics do not account for temporality, meaning systems can score high while systematically providing outdated answers.
Chunking Against the Clock: Chunking strategies that prioritize semantic coherence can inadvertently destroy temporal narrative, breaking chronological cause-and-effect within documents.
Cost Overruns from Real-Time Retrieval Pipelines: Addressing freshness often increases compute costs; a tiered freshness model (high, medium, low urgency data) is recommended to balance accuracy and cost.
The article emphasizes that these issues are not edge cases but default failure modes if time is not explicitly designed into RAG systems, providing actionable engineering patterns to mitigate them.
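
One common way to counter time-blind embeddings is to blend semantic similarity with a recency score at ranking time. The sketch below uses exponential time decay. The decay curve, the `alpha` weight, and the 7-day half-life are assumptions chosen for illustration, not the specific pattern the article prescribes.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import List, Sequence, Tuple

Doc = Tuple[str, Sequence[float], datetime]  # (doc_id, embedding, published_at)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recency_weight(published_at: datetime, now: datetime, half_life_days: float) -> float:
    """Exponential decay: 1.0 for a brand-new document, 0.5 after one half-life."""
    age_days = max((now - published_at).total_seconds(), 0.0) / 86400
    return 0.5 ** (age_days / half_life_days)


def rank(
    query_vec: Sequence[float],
    docs: List[Doc],
    now: datetime,
    alpha: float = 0.7,
    half_life_days: float = 7.0,
) -> List[Tuple[str, float]]:
    """Blend semantic similarity with recency; alpha=1.0 is time-blind ranking."""
    scored = [
        (
            doc_id,
            alpha * cosine(query_vec, vec)
            + (1 - alpha) * recency_weight(published_at, now, half_life_days),
        )
        for doc_id, vec, published_at in docs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    now = datetime(2026, 5, 22, tzinfo=timezone.utc)
    docs: List[Doc] = [
        ("fed_statement_old", [1.0, 0.0], now - timedelta(days=14)),
        ("fed_statement_new", [0.9, 0.1], now),
    ]
    print(rank([1.0, 0.0], docs, now, alpha=1.0)[0][0])  # time-blind ranking
    print(rank([1.0, 0.0], docs, now)[0][0])             # recency-aware ranking
```

Setting `alpha=1.0` reproduces pure cosine ranking, which in the example prefers the two-week-old document. Lowering `alpha` or shortening the half-life trades some semantic precision for freshness, and suitable values would need tuning for each corpus.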

Decoder

Retrieval Augmented Generation (RAG): An AI architecture that enhances large language models (LLMs) by retrieving information from an external knowledge base to ground its responses, aiming to reduce hallucinations and provide up-to-date information.
Vector Store/Index: A specialized database optimized for storing and querying vector embeddings, which represent data points (like text chunks) as numerical vectors in a high-dimensional space.
Embedding: A numerical representation of text (or other data) in a vector space where semantically similar items are mapped closer together.
Cosine Similarity: A measure of similarity between two non-zero vectors, computed as the cosine of the angle between them (their dot product divided by the product of their lengths). It is commonly used to determine how similar two documents or embeddings are.
Hallucination (AI): When an AI model generates information that is plausible-sounding but factually incorrect or unsupported by its training data or retrieved context.
RAGAS: A set of metrics and tools specifically designed for evaluating the quality of Retrieval Augmented Generation (RAG) systems.
Chunking: The process of dividing a large document into smaller, manageable pieces (chunks) before creating embeddings and storing them in a vector database, to improve retrieval relevance and fit within LLM context windows.
Temporal Blind Spot: A failure mode in AI systems, specifically RAG, where the system overlooks or incorrectly handles the time dimension of information, leading to outdated or contextually irrelevant responses.
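
A time-aware check can sit next to semantic metrics such as those in RAGAS, which makes the "Evaluation Gaps" point concrete. The `stale_answer_rate` metric and `EvalCase` record below are hypothetical illustrations and are not part of RAGAS or the article. The metric scores an answer as stale when even its newest cited source is older than a chosen freshness budget.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class EvalCase:
    question: str
    asked_at: datetime
    source_dates: List[datetime]  # publication dates of the chunks the answer cited


def stale_answer_rate(cases: List[EvalCase], max_age: timedelta) -> float:
    """Fraction of answers whose newest supporting source exceeds max_age.

    An answer with no cited sources counts as stale, since its recency
    cannot be verified.
    """
    if not cases:
        return 0.0
    stale = 0
    for case in cases:
        if not case.source_dates or case.asked_at - max(case.source_dates) > max_age:
            stale += 1
    return stale / len(cases)


if __name__ == "__main__":
    now = datetime(2026, 5, 22, 12, tzinfo=timezone.utc)
    cases = [
        EvalCase("What is the current Fed rate?", now, [now - timedelta(hours=1)]),
        EvalCase("What is the latest policy?", now, [now - timedelta(weeks=2)]),
    ]
    print(stale_answer_rate(cases, max_age=timedelta(hours=6)))
```

A system can score well on faithfulness or relevance while this rate stays high, which is exactly the gap the article describes.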

DEVOURED

Leading Design Through the AI Shift

Design aicareerenterprise Slack.design

Slack's VP of Product Design, Will Miner, reports that 70 designers are now leveraging AI for rapid prototyping and building internal tools, shifting focus to faster customer value delivery while stressing human judgment.

What: Will Miner, VP of Product Design at Slack, describes how their 70-designer team has embraced AI. Designers now use coding agents for data analysis, rapid prototyping, and even fixing UI bugs, blurring the lines between design and engineering. Miner emphasizes curiosity, skepticism, and maintaining design judgment as core principles.

Why it matters: This article provides a valuable real-world case study from a major tech company on how AI is fundamentally reshaping design workflows and team structures. It highlights the strategic imperative for design leadership to guide teams through rapid technological shifts while preserving the irreplaceable human elements of taste and value judgment.

Takeaway: Designers should actively experiment with AI tools to understand their capabilities and limitations, focusing on how AI can accelerate value creation rather than dilute human judgment or craft.

Deep dive

AI has significantly transformed design workflows at Slack, with approximately 70 designers now using it.
Designers are leveraging coding agents for tasks like data analysis, rapid prototyping, and building custom internal tools, even fixing UI bugs themselves.
This has blurred traditional boundaries between design and engineering roles, with designers creating executive demos in code rather than just Figma mockups.
Slack's VP of Product Design, Will Miner, advocates for leadership guided by principles of curiosity, skepticism, and owning the design judgment at the table.
Miner encourages designers to personally try AI tools to discern hype from reality, but without mandates or quotas, allowing teams to explore freely.
He stresses that while LLMs can generate endlessly, human taste and judgment are crucial for deciding what's worth building and ensuring it resonates with users.
Miner advises against "sharing slop" – using AI as an excuse to produce low-quality or unnecessary artifacts.
He acknowledges the increased stress and uncertainty for designers and encourages patience and breaks, recognizing the challenge of learning amid industry shifts.
The core message is to use AI to move faster and create customer value more efficiently, without sacrificing quality or human oversight.

Decoder

Coding agents: AI programs capable of generating, analyzing, or debugging code based on instructions, used here by designers to build prototypes or tools without extensive manual coding.

DEVOURED

Anthropic prepares Mythos 1 for Claude Code and Claude Security

AI securityllmenterprise Testing Catalog

Anthropic is nearing a broader public release of Claude Mythos 1, its AI model specialized in cybersecurity, for Claude Code and Claude Security enterprise offerings.

What: Anthropic's Claude Mythos, previously restricted, is moving towards a general release as Mythos 1. Traces of the model have already surfaced on Google Cloud and AWS through vulnerability discovery programs. It is being prepared for use in Claude Code and Claude Security, which is also getting a new dashboard for surfacing vulnerabilities. Through Project Glasswing, an AI cybersecurity initiative, Anthropic and its partners have already found over 10,000 high- or critical-severity vulnerabilities using Mythos-grade models. Claude Opus 4.8 is also rumored for release soon.

Why it matters: This marks a strategic expansion for Anthropic into dedicated enterprise cybersecurity and code analysis, leveraging its advanced models to directly address critical business needs beyond general-purpose LLM applications, but also suggests the company is still cautious about widespread availability of such powerful tools.

Takeaway: If your organization uses Anthropic's enterprise offerings, expect enhanced code and security analysis capabilities with Mythos 1, potentially leading to faster vulnerability discovery.

Deep dive

Anthropic is preparing Claude Mythos for a broader release as Mythos 1.
Evidence of Mythos 1 has appeared with a "claude-mythos-1-preview" label for Claude Code and Claude Security.
Project Glasswing, using Mythos-grade models, has already identified over 10,000 high- or critical-severity vulnerabilities.
The Claude Security offering is receiving a new dashboard for discovered vulnerabilities, historical charts, and triage results.
Earlier statements suggested Mythos would remain restricted, making this a notable shift towards broader availability once safeguards are in place.
Claude Opus 4.8 is also rumored to be in internal evaluations with partners, potentially launching in the coming weeks.

Decoder

Project Glasswing: Anthropic's collaborative AI cybersecurity initiative focused on using AI models to discover software vulnerabilities.

Original article

Anthropic appears to be moving Claude Mythos closer to broader availability than its original guidance suggested. A recent Project Glasswing update notes that the model is now helping protect a wider range of organizations, including open-source projects, and adds that Mythos-grade models could reach the public once the right safeguards are in place.

Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.
— Anthropic (@AnthropicAI) May 22, 2026

And in the near future, once we’ve developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release.

This marks a notable shift from the earlier framing, in which Anthropic stated that Mythos would remain restricted. Traces of the model have already surfaced on Google Cloud and AWS through vulnerability discovery programs, and signals now point to a product called Mythos 1, carrying a preview label ("claude-mythos-1-preview"), being prepared for Claude Code and Claude Security.

Some users were temporarily able to see the "Mythos 1" model in the UI. Besides that, new strings have been added to the source code recently:

"Access to the Claude Mythos model in Claude Code and Claude Security."

The Claude Security side of that rollout is getting structural work too. A new dashboard is being built that surfaces discovered vulnerabilities, seven-day and thirty-day historical charts, and deeper triage results. There are no indications yet that the product will move beyond enterprise customers, but the refresh brings it closer to parity with how rival security suites present scan history.

In parallel, Claude Opus 4.8 is rumored to be in the works for release, with select Anthropic partners already conducting internal evaluations. A launch in the coming weeks would fit the cadence set by Opus 4.7 in April and would slot neatly alongside the Mythos and security product moves.

DEVOURED

Lance (Hugging Face Repo)

AI llmmultimodalopensource Hugging Face

ByteDance Research has released Lance, a lightweight native unified multimodal model with 3B active parameters. It supports image and video understanding, generation, and editing, and performs strongly on image generation, image editing, and video generation benchmarks.

What: Lance, developed by ByteDance Research, is a multimodal model with 3 billion active parameters, trained from scratch within a 128-A100-GPU budget. It delivers strong results on image generation, image editing, and video generation benchmarks, and natively supports image and video understanding, generation, and editing.

Why it matters: The release of a capable, lightweight multimodal model trained within a modest GPU budget by a major tech company like ByteDance highlights the ongoing progress in optimizing AI model development for efficiency and broader accessibility, pushing the boundaries of what can be achieved with fewer resources.

Takeaway: Consider exploring the Lance model on Hugging Face for projects requiring efficient multimodal capabilities, particularly if working with image and video generation or editing on a tighter resource budget.

Decoder

Multimodal model: An AI model that can process and understand information from multiple types of data, such as text, images, and video.
Active parameters: The number of parameters actually used to process each input. In a mixture-of-experts model, for example, only some experts run per token, so the active parameter count can be smaller than the total parameter count.
Hugging Face: A platform and community for machine learning, providing tools, datasets, and pre-trained models, especially for natural language processing and multimodal AI.

Original article

Lance is a lightweight native unified multimodal model that supports image and video understanding, generation, and editing. It delivers strong performance across image generation, image editing, and video generation benchmarks with just 3B active parameters. The model was trained entirely from scratch within a 128-A100-GPU budget. Examples of clips generated by the model are available in the repository.

DEVOURED

Anthropic's march to profitability

AI startupenterprise Contrary Research

Anthropic is on track for $10.9 billion in Q2 revenue and expects to clear $559 million in profit by its October IPO, defying the "AI labs burn money forever" narrative.

What: Anthropic, creator of Claude, is projected to reach $10.9 billion in Q2 revenue, up from $4.8 billion in Q1. Cheaper compute is credited with flipping the company to profit, as compute costs fell from $0.71 to $0.56 per revenue dollar. Claude Code alone accounts for $2.5 billion in revenue, and the company expects to clear $559 million in profit ahead of an anticipated October IPO.
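
A back-of-the-envelope check on the compute figures multiplies each quarter's revenue by its compute cost per revenue dollar. That gives the implied compute spend and the amount left for every other expense. The check uses only the numbers reported above and ignores all non-compute costs, so it is not a profit estimate.

```python
from typing import Tuple


def compute_split(revenue_bn: float, compute_per_dollar: float) -> Tuple[float, float]:
    """Return (compute spend, revenue left after compute), in $ billions."""
    compute_bn = revenue_bn * compute_per_dollar
    return compute_bn, revenue_bn - compute_bn


if __name__ == "__main__":
    # Reported revenue ($B) and compute cost per revenue dollar for each quarter.
    for quarter, revenue, ratio in [("Q1", 4.8, 0.71), ("Q2", 10.9, 0.56)]:
        compute, remainder = compute_split(revenue, ratio)
        print(f"{quarter}: compute ${compute:.2f}B, after compute ${remainder:.2f}B")
```

On these figures, compute spend rises from roughly $3.41 billion in Q1 to $6.10 billion in Q2. Over the same period, the amount left after compute grows from about $1.39 billion to $4.80 billion. The falling ratio therefore matters more than the growing absolute bill.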

Why it matters: Anthropic's rapid revenue growth and swift path to profitability, driven by decreasing compute costs, challenges the industry's prior assumption that large AI models inherently require perpetual, massive capital expenditure, signaling a maturing market and potentially faster paths to IPOs for leading AI labs.

Original article

Anthropic is on track to do $10.9 billion in Q2 revenue, up from $4.8 billion in Q1, growing faster right now than Zoom did at the peak of the pandemic. The thing that flipped them to profit is compute getting cheaper, 71 cents of compute per revenue dollar in Q1, 56 cents in Q2. Claude Code on its own is at $2.5 billion in revenue, and the company expects to clear $559 million in profit just in time for its October IPO. The "AI labs burn money forever" story finally has a hole in it.

DEVOURED

Bumblebee Goes Open Source

AI securityopensourcedevops Perplexity

Perplexity open-sourced Bumblebee, a security scanner designed to identify risky packages, extensions, and AI tool configurations on developer machines.

What: Perplexity has released Bumblebee as an open-source tool. It functions as a read-only security scanner to detect potentially dangerous packages, browser extensions, and AI tool setups on a developer's workstation.
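
Bumblebee's actual checks are not described here. The sketch below gives a rough illustration of a read-only scan for one category, installed Python packages. It reads package metadata through the standard library's `importlib.metadata` and compares names against a denylist without changing anything on the machine. The `scan_installed` helper and the denylist entry are hypothetical.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, List, Optional, Tuple


def normalize(name: str) -> str:
    """PEP 503-style name normalization so 'Evil_Pkg' matches 'evil-pkg'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def scan_installed(
    risky: Dict[str, str],
    installed: Optional[Iterable[Tuple[str, str]]] = None,
) -> List[str]:
    """Report installed packages that appear on a denylist.

    Read-only: it only reads package metadata and never modifies or
    uninstalls anything. `risky` maps package names to a reason.
    """
    if installed is None:
        installed = (
            (dist.metadata.get("Name"), dist.version)
            for dist in metadata.distributions()
        )
    denylist = {normalize(k): v for k, v in risky.items()}
    findings = []
    for name, version in installed:
        if name and normalize(name) in denylist:
            findings.append(f"{name}=={version}: {denylist[normalize(name)]}")
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical denylist entry; real scanners draw on maintained advisory feeds.
    print(scan_installed({"example-typosquat": "name mimics a popular package"}))
```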

Why it matters: Open-sourcing security tools like Bumblebee can foster community collaboration in identifying and mitigating security risks, especially as AI tools become more integrated into developer workflows, introducing new attack surfaces.

Takeaway: Consider running Bumblebee on your development machine to check for risky configurations and packages, especially if you use many AI tools or browser extensions.

Original article

Perplexity open-sourced Bumblebee, a read-only security scanner that identifies risky packages, extensions, and AI tool configurations on developer machines.

DEVOURED

Gemini 3.5 Flash (Low)

AI llmperformance X (formerly Twitter)

Google's Gemini 3.5 Flash (Low) generates 45% fewer tokens than Flash (Medium) and surprisingly outperforms Flash (High) on software engineering tasks.

What: Gemini 3.5 Flash (Low) is the "Low" variant of Google's Gemini 3.5 Flash model. It produces approximately 45% fewer tokens than Gemini 3.5 Flash (Medium) and generally outperforms Gemini 3.5 Flash (High) on software engineering (SWE) tasks.
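
The effect of a 45% token reduction on output spend comes down to simple arithmetic. The sketch assumes both variants are billed at the same per-token price. The $1.00 per million figure is a placeholder, not a published Gemini rate.

```python
def output_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Output-token cost at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


def low_setting_tokens(medium_tokens: int, reduction: float = 0.45) -> int:
    """Tokens expected from Flash (Low) given a Flash (Medium) baseline."""
    return round(medium_tokens * (1 - reduction))


if __name__ == "__main__":
    PRICE = 1.00  # hypothetical $ per 1M output tokens, not a published rate
    medium = 2_000_000
    low = low_setting_tokens(medium)
    print(low, output_cost_usd(medium, PRICE), output_cost_usd(low, PRICE))
```

At the same price, output spend scales linearly with token count. The Low variant's bill is therefore about 55% of the Medium variant's for the same workload, before accounting for any difference in task success.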

Why it matters: The result is counterintuitive, since a variant that generates far fewer tokens beats the "High" variant on SWE tasks. It suggests that more output does not by itself yield better results on domain-specific work, and that efficiency can matter more than raw output volume.

Takeaway: If you're using Gemini models for software engineering tasks, consider experimenting with Gemini 3.5 Flash (Low) as it may offer better performance despite its reduced token output compared to other Flash variants.

Decoder

SWE tasks: Software engineering tasks, which involve coding, debugging, testing, and other development-related activities.
Tokens: The basic units of text (like words or sub-words) that an LLM processes, generates, and counts. Fewer tokens often mean lower cost and faster inference.

Original article

Gemini 3.5 Flash (Low) generates around 45% fewer tokens than Gemini 3.5 Flash (Medium) and generally outperforms Gemini 3.5 Flash (High) on SWE tasks.

DEVOURED

SpaceX Launches 400-Foot-Tall Rocket That Will Help Define Its Future

Tech hardwarespaceengineering The Wall Street Journal

SpaceX's upgraded Starship rocket partially succeeded in its latest test launch. The booster separated but later crashed, and the spacecraft reached space despite losing an engine, then exploded after touchdown in the Indian Ocean.

What: SpaceX launched an upgraded 400-foot-tall Starship rocket from a new launchpad at its Starbase facility. The booster separated successfully but could not perform its engine maneuver and crashed into the Gulf of Mexico. The Starship spacecraft lost an engine but still reached space. It deployed satellite mimics and two satellites that photographed it in flight, and it exploded after touching down in the Indian Ocean.

Why it matters: This test, despite not being a full success, marks another step in SpaceX's iterative development process for Starship, which is crucial for its long-term goals of Mars colonization and satellite deployment.

Original article

SpaceX launched an upgraded version of its Starship rocket from a new launchpad at its Starbase facility on Friday. The booster successfully separated from the spacecraft, but it wasn't able to conduct an engine maneuver and eventually crashed into the Gulf of Mexico. Starship lost one of its engines but was able to make it to space. It deployed devices that mimic satellites, as well as two satellites that took images of the flying spacecraft. Starship exploded after it touched down in the Indian Ocean.

DEVOURED

China launches Shenzhou-23 mission with potential record one-year stay in orbit

Tech spacehardwarepolicy Yahoo Finance

China launched its Shenzhou-23 mission to the Tiangong space station, potentially involving a record one-year stay for one astronaut and the first autonomous rapid docking.

What: China sent three astronauts to its Tiangong space station aboard the Shenzhou-23 mission on Sunday, with one crew member potentially staying for a record one year, surpassing previous six-month missions. The mission will also perform the first autonomous rapid rendezvous and docking procedure with Tiangong's core module, supporting China's goal to land astronauts on the moon by 2030, competing with NASA's Artemis program targeting 2028. Commander Zhu Yangzhu, pilot Zhang Yuanzhi, and payload specialist Li Jiaying (first from Hong Kong) are the crew.

Why it matters: This mission signifies China's accelerated advancement in human spaceflight and its growing rivalry with the United States in the new space race, particularly concerning lunar exploration and long-duration space habitation.

Decoder

Tiangong space station: China's modular space station, currently in low Earth orbit.
Long March-2F Y23 rocket: The Long March-2F is the launch vehicle China uses for crewed Shenzhou missions; Y23 designates the individual rocket flown on this mission.

Original article

Investing.com -- China is set to launch its Shenzhou-23 mission on Sunday, sending three astronauts to the Tiangong space station in a flight that could see one crew member remain in orbit for a full year, the longest human space mission in the country’s history.

The Shenzhou-23 spacecraft is scheduled to lift off from the Jiuquan Satellite Launch Center in northwestern China aboard a Long March-2F Y23 rocket, according to the China Manned Space Agency.

The crew includes commander Zhu Yangzhu, pilot Zhang Yuanzhi, and payload specialist Li Jiaying, a former Hong Kong police inspector who will become the first astronaut from Hong Kong to take part in a Chinese space mission.

Chinese officials said one astronaut could remain aboard Tiangong for up to a year, exceeding the six-month missions that have been standard for China’s space station program since 2021. The agency said the astronaut selected for the extended stay will be determined later during the mission.

The launch comes as China advances plans to land astronauts on the moon by 2030, setting up a race with the United States, which is targeting a crewed lunar landing in 2028 under NASA’s Artemis program.

China is developing the hardware needed for its lunar ambitions, including the Long March-10 rocket, the Mengzhou spacecraft, and the Lanyue lunar lander. Officials have described recent testing of these systems as part of preparations for future crewed lunar missions.

The Shenzhou-23 mission will also conduct the first autonomous rapid rendezvous and docking procedure with Tiangong’s core module, a capability expected to support future lunar operations.

Scientists will use the mission to study the effects of long-duration spaceflight, including radiation exposure, bone density loss, and psychological stress.

China has steadily expanded its space program in recent years. In 2024, it became the first country to return samples from the far side of the moon through a robotic mission.

Beijing is also working with Russia on plans to establish a permanent lunar base by 2035.

DEVOURED

Inside the World's Biggest Bet on Fusion Energy

Tech researchhardwareclimate CNET

The $22 billion International Thermonuclear Experimental Reactor (ITER) in France, a collaborative fusion energy project among geopolitical rivals, aims to contain plasma 10 times hotter than the Sun's core.

What: The International Thermonuclear Experimental Reactor (ITER) is under construction in southern France at an estimated cost of $22 billion. More than 30 countries back it, including China, Russia, the US, and European nations. Its donut-shaped vacuum chamber is designed to contain 150-million-degree Celsius plasma using superconducting magnets kept a few degrees above absolute zero. The project has absorbed a $5 billion cost increase and a years-long delay after cracks were found in heat-shield piping in 2020, compounded by welding distortions and COVID-19 disruptions.

Why it matters: ITER represents a monumental, multinational effort to de-risk and advance fusion energy research, providing foundational knowledge and supply chain development that can benefit both public and private sector fusion initiatives despite its significant cost and timeline challenges.

Decoder

Tokamak: A type of magnetic confinement device used for fusion research, characterized by its donut-shaped (toroidal) vacuum chamber.

Original article

Nestled in the countryside of southern France is a sprawling industrial complex where scientists and engineers from around the world have converged to build the world's largest-ever fusion reactor: a doughnut-shaped vacuum chamber designed to contain temperatures 10 times hotter than the core of the Sun.

At an estimated cost of $22 billion, the International Thermonuclear Experimental Reactor is the world's biggest bet on fusion energy: a project so daunting in scale that longtime geopolitical rivals have pooled their resources to share in its potential risks and rewards.

As ITER's chief strategic advisor Laban Coblentz put it, "That China and Russia were going to collaborate with the US and Europe, and add in Korea, India, and Japan -- that's either genius or insane."

Controlled fusion reactions produce millions of times more energy than the burning of fossil fuels, and four times more energy than the reactions powering traditional nuclear power plants -- without the risk of meltdown, long-lasting radioactive waste and carbon emissions. All humans have to do is create the right conditions for it to happen, but that's far easier said than done.

Containing ITER's 150-million-degree Celsius plasma will require superconducting magnets kept just a few degrees above absolute zero. To make that possible, engineers must place one of the hottest environments ever created right next to one of the coldest, with only a thin heat shield separating the two.

Cracks in the piping of this heat shield were discovered in 2020, along with distortions caused by welding and disruptions due to the COVID-19 pandemic, which led to a years-long delay in ITER's timeline and the need for an additional $5 billion to cover repair costs. At the same time, private fusion startups have been multiplying, with many hoping to beat ITER to major milestones.

Despite the pressure and criticisms generated by these overruns and delays, the people I met at ITER all spoke about the project like an open book. "This is a publicly funded project," said Javier Artola, a scientist working on modeling the behavior of ITER's plasma. "It is the knowledge of the world."

A publicly funded project like ITER helps de-risk the research and development needed for commercial-scale fusion, making it easier for private companies to place their own big bets on the technology. Every problem ITER solves is one less problem private fusion companies will have to figure out.

Every member state of the ITER agreement (which includes more than 30 countries) will have access to all the science that comes out of ITER, and the construction of ITER itself is developing a global fusion energy supply chain. If the member states agree to share it with them, even non-member states may benefit from ITER's science.

"We have become a model for how countries of unlike persuasion can work over decades, only through the shared vision of a better world that everybody wants for the next generations," said Coblentz.

Fusion is one of those technologies that people often joke is always a decade away. But seeing firsthand what ITER is building gave me hope that we may truly be living in the last decade when fusion is still spoken of as a distant dream.


DEVOURED

auth.md (Website)

Tech aiagentssecurity WorkOS

WorkOS introduced auth.md, an open protocol that lets AI agents register users with apps without a sign-up form, by reading a Markdown file the app publishes that describes how registration works.

What: Auth.md, an open protocol by WorkOS, enables AI agents to register users for applications by fetching a Markdown file from the app's domain (e.g., https://yourapp.com/auth.md). The file lists the supported flows (such as agent-verified identity assertions or OTP-based claims), the available scopes and the registration endpoints. In the agent-verified flow, the agent's identity provider vouches for the user; in the user-claimed flow, the user confirms a one-time code. Either way, the manual sign-up form is no longer needed.

Why it matters: Sign-up forms are built for humans, which makes registration a sticking point for agent integration. A machine-readable registration path lets autonomous AI agents create and manage user accounts and access services on a user's behalf, which is important for the scalability and adoption of agent-based systems.

Takeaway: Developers building apps meant to interact with AI agents should explore publishing an `auth.md` file to streamline agent-based user registration and service access, using either agent-verified or user-claimed flows.

Deep dive

Auth.md is a Markdown file hosted at an application's domain (e.g., https://yourapp.com/auth.md) that defines how AI agents can register users.
It specifies supported registration flows (e.g., agent-verified, user-claimed via OTP), available scopes, and registration endpoints.
The "agent verified" flow allows an agent's identity provider to vouch for a user, requiring no human interaction.
The "user claimed" flow involves an OTP (one-time password) sent to the user for confirmation.
Apps retain control over accepted flows and the type of credentials issued to the agent (e.g., scoped API keys or access tokens).
Auth.md is an open protocol, not tied to WorkOS infrastructure, and leverages existing OAuth standards like Protected Resource Metadata and ID-JAG identity assertions.
WorkOS provides an AuthKit for easier implementation and is actively shaping the protocol with early adopters.
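To make the flow choice concrete, here is a minimal TypeScript sketch of the decision an agent makes after fetching an app's auth.md. It assumes the file has already been parsed into a small object. The field names (`flows`, `scopes`, `registrationEndpoint`), the flow identifiers and the preference for the agent-verified flow are all hypothetical choices for illustration, not part of the published protocol.

```typescript
// Hypothetical flow identifiers; the real auth.md vocabulary may differ.
type Flow = "agent_verified" | "user_claimed";

// Hypothetical in-memory summary of what an agent extracted from an app's auth.md.
interface AuthMdSummary {
  flows: Flow[];
  scopes: string[];
  registrationEndpoint: string;
}

// What the agent knows about its own situation (also hypothetical).
interface AgentContext {
  // Can the agent's identity provider issue a verified identity assertion for this user?
  canAssertIdentity: boolean;
  // Is a human available to receive and confirm a one-time code?
  userReachableForOtp: boolean;
}

// Picks a registration flow the app accepts and the agent can complete.
function chooseFlow(doc: AuthMdSummary, ctx: AgentContext): Flow | null {
  // Assumed preference: agent-verified first, since it needs no human interaction.
  if (doc.flows.includes("agent_verified") && ctx.canAssertIdentity) {
    return "agent_verified";
  }
  // Fall back to the OTP-based claim, which requires the user to confirm a code.
  if (doc.flows.includes("user_claimed") && ctx.userReachableForOtp) {
    return "user_claimed";
  }
  return null; // no flow this agent can complete for this app
}

// Returns any requested scopes that the app does not advertise.
function unsupportedScopes(doc: AuthMdSummary, requested: string[]): string[] {
  return requested.filter((scope) => !doc.scopes.includes(scope));
}
```

Keeping the choice in the agent while the app only declares what it accepts mirrors the division of control described above: the app decides which flows and credentials it allows, and the agent picks among them.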

Decoder

Agent-verified flow: A registration method where an AI agent's identity provider asserts the user's identity, eliminating the need for direct human interaction.
User-claimed flow: A registration method where an AI agent triggers an OTP (one-time password) that the human user confirms, thereby claiming the account.
OAuth: An open standard for access delegation, commonly used for granting websites or applications access to information on other websites without giving them the password.
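Protected Resource Metadata: A standardized way for an OAuth 2.0 protected resource (such as an API) to describe itself, for example which authorization servers and scopes it supports.
ID-JAG identity assertions: ID-JAG (Identity Assertion JWT Authorization Grant) is a draft IETF OAuth specification in which an identity provider issues a signed JWT asserting a user's identity, which a client can exchange at another application's authorization server for an access token.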

Original article

auth.md

Enable agents to register users without the sign-up form. Auth.md provides secure agent registration that any app can implement.

Self-serve agent discovery

Publish auth.md at your domain with the flows, scopes, and endpoints an agent needs to register.

Choose the flows you support

Allow trusted identity assertions, OTP-based claim flows, or anonymous access.

Credentials you control

Issue scoped API keys or access tokens tied to users — auditable, expirable, revocable.

Get started

For services that want agents to register users on behalf of their customers.

For platforms whose agents act on behalf of users.


FAQs

What is auth.md? A Markdown file an application hosts at its domain — typically https://yourapp.com/auth.md — that tells agents how to register on behalf of a user. It includes which flows are supported, which scopes exist, and how to register for the service.
How does an agent register a user with my app? The agent fetches your auth.md, picks a supported flow, and either presents a verified identity assertion (agent verified flow) or walks the user through an OTP-based claim (user claimed flow). You stay in control of which flows you accept and what credentials get issued.
What's the difference between the agent verified and user claimed flows? Agent verified is agent-attested — the agent's identity provider vouches for the user, no human interaction required. User claimed is OTP-based — the agent triggers a code, the human confirms, the account is claimed. Most apps support both and let the agent pick the right one for the situation.
What credentials get issued to the agent? Your service decides whether to return a scoped API key or access token tied to the user. This allows for re-use of your existing API auth methods.
Is auth.md a WorkOS-only feature or an open protocol? It's open. WorkOS authors the protocol, but auth.md isn't tied to WorkOS infrastructure: it composes existing OAuth standards (Protected Resource Metadata, ID-JAG identity assertions), so any app can publish one and any agent can read one with no WorkOS account required.

DEVOURED

Predicting AI job exposure

Tech aicareerresearch Benedict Evans

A new analysis argues it's impossible to reliably predict AI's impact on specific jobs because past tech shifts show unexpected outcomes and evolving roles.

What: Benedict Evans argues that historical data from automation, like accounting software, shows industries expected to shrink often grew, while job titles and functions changed dramatically, making current AI job exposure predictions unreliable.

Why it matters: This piece challenges the common practice of quantifying AI's impact on specific job categories, suggesting a more nuanced and unpredictable transformation of tasks, business models, and job definitions rather than simple replacement.

Deep dive

Past technological shifts like accounting automation or the internet's rise demonstrate that predicting job impacts is fraught with error.
The number of accountants continued to rise despite a century of automation because regulations changed, and efficiency gains (Jevons paradox) led to more analysis, not less work.
Technology often makes existing tasks cheaper, leading to new types of work or increased volume of related tasks, fundamentally changing job descriptions even if titles remain.
The internet didn't change what it meant to be a journalist, but it destroyed the newspaper's business model, an effect hard to predict from job descriptions alone.
The rise of smartphones created Uber, an unpredictable disruption for taxi drivers, which wouldn't have been flagged by a 2005 "smartphone exposure" analysis.
Relying on generic job descriptions, such as those in O*NET, to predict automation exposure is flawed because jobs are complex, subtle meshes of activities, not simple logical steps.
This phenomenon is akin to Gell-Mann Amnesia, where experts recognize complexity in their own field but underestimate it in others, leading to oversimplified AI impact predictions.
Quantifying AI's impact job-by-job is "fooling yourself" because you don't truly know today's jobs or how they will change.

Decoder

Jevons paradox: The observation that when technology makes the use of a resource more efficient, total consumption of that resource can rise rather than fall, because the lower effective cost drives up demand.
O*NET: The Occupational Information Network, a comprehensive database of job characteristics and worker requirements used for career exploration and job analysis in the US.
Gell-Mann Amnesia: A phenomenon where one critically assesses news in their area of expertise but uncritically accepts information from other fields as accurate, forgetting that the same journalistic sloppiness might apply elsewhere.

Original article

Predicting AI job exposure

It would be really nice if we had some way to analyse which jobs, companies and industries were exposed to AI, and if we could assign scores, and build charts, and map that against the progress of large language models. We know, in principle, that like every other big wave of technology, AI is bound to destroy some jobs and create others. But which ones? In the last three years a bunch of people have been very busy crunching census data, making tables and building viral charts.

I think this is mostly impossible: I think this is an exercise in predicting something that cannot be predicted.

The simplest way to see the problem is to back-test this against other big technology shifts in the past. Some of the industries that should have suffered most ended up much bigger, and some of the industries that did suffer most should have been immune.

For example, we spent a century automating accounting: we built calculating machines, punch cards, mainframes, data processing, databases, PCs, spreadsheets, ERPs, cloud… in fact, we built half of the tech industry around automating this. Yet the number of accountants kept going up.

This is high-level survey data, but you can see much the same thing at the micro level. The next chart is about as specific as it gets: 50 years of financial automation doesn’t seem to have hurt the market for CPAs. If you’d done any kind of analysis of professions exposed to automation from computing, this should have been at the top of the list. Dan Bricklin talks about CPAs in the late 1970s using VisiCalc to do one-month projects in a few days. And yet, look what happened.

I think there are three things to point to in this chart. The first is that technology was not the only variable: changes in regulation produced new accounting requirements that led to a one-off surge in CPA hiring (this is why economists say ceteris paribus). Second, within the automation conversation itself there is the Jevons paradox, which is really applied price elasticity: if you make it cheaper to do something, do you do the same for less money (or resources, or employees), or more for the same money, or does a new ROI mean you do more for more money? If a DCF (discounted cash flow analysis) takes a week and then it takes 30 seconds, you probably do more DCFs. ‘Exposure to automation’ might mean more work, not less.
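The three options map onto a textbook constant-elasticity demand curve. As an illustrative assumption (not a model from the article), let the quantity of work demanded at price P be Q(P) = kP^(−ε), with k > 0 and price elasticity ε ≥ 0, so that total spending is S = PQ. Halving the price then scales quantity and spending as follows:

```latex
\begin{aligned}
Q(P) &= k\,P^{-\varepsilon}, \qquad S(P) = P\,Q(P) = k\,P^{1-\varepsilon} \\
\frac{Q(P/2)}{Q(P)} &= 2^{\varepsilon}, \qquad \frac{S(P/2)}{S(P)} = 2^{\varepsilon-1}
\end{aligned}
```

With ε = 0, the same work is done for half the money; with ε = 1, twice the work is done for the same money; with ε = 1.5, about 2.8 times the work is done for about 1.4 times the money. That last case is the one where exposure to automation means more work, not less.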

But then, the more important story is that if you automate something that used to be expensive and time-consuming and it becomes cheap and quick, that probably unlocks other things. If analysis becomes cheap and easy, you do much more analysis, and mostly that’s also a different kind of analysis. Accountants today aren’t doing exactly the same work that they did in 1970 or 1980 ‘but more’ - they’re still called ‘accountants’ but the job is different. New technology often starts out being used for ‘the old thing but more’, but it rarely ends up like that.

Indeed, if you dig into the detail of the Census data, then ‘accountants and auditors’ itself is a fairly stable category, but all around that term there are lots of other finance job categories that appear and disappear over time. The job of “Billing, posting and calculating machine operator” appeared in the stats for a decade or so and then disappeared again. How often did that represent someone who started their career as a stock clerk, then became a ‘posting machine operator’ because that was how you did stock-keeping, and then retired as a stock clerk again when that was absorbed into software and the Census didn’t create a category for ‘PC operator’? Equally, there’s still a category for ‘data keyer’ but not for ‘ERP operator’. The same person doing the same actual job (or rather, serving the same business purpose) gets different job titles over time, while ‘accountants’ have the same job title while doing different things.

Then, I think there's a second problem that comes up in back-testing: the job might not change at all, but the business might change underneath you.

The internet didn't really change what it took to be a good journalist or a good A&R scout, but the job of journalism was paid for by a light manufacturing and trucking operation with (in the USA) a local monopoly on classified ads, and the record executive’s salary was paid by manufacturing and shipping small pieces of plastic and aluminium foil. That was a whole other thing that would not be captured in any analysis you tried to do of what it is to be a copy editor or a sound engineer. The internet decoupled a class of business where the product and the job were not affected by the internet but the business was.

It seems to me that we should expect the same thing to happen with AI: how many people have a job that has very low exposure to AI, but the business depends on some other job that is hugely affected by AI? How many people have a job doing something that’s very hard for AI to match, but their company’s defence against competition is that they also have lots of buildings full of people doing something very boring? AI will take a bunch of stuff that used to be expensive and make it very cheap or free - what does that unlock and what does that break, and how many jobs is that?

Third, continuing the theme of big and unpredictable effects of past technologies, how does your analysis handle Uber? I worked in mobile in the 2000s and we all spent a lot of time talking about location data, but it didn’t occur to anyone that this might be an issue for taxis - you might have suggested more efficient dispatch, but no-one was considering that this could totally change the nature of the job (and make a bunch of $1m medallion mortgages worthless). If you’d been calculating ‘internet exposure’ by occupation in 1995 or ‘smartphone exposure’ in 2005 (yes, we had smartphones before the iPhone), are you confident you’d have put taxi drivers on the list?

(Source: Todd Schneider / MTA)

Narrowly, then, the problem with using things like O*NET to try to analyse what a job is and how much it can be automated is that this tells you nothing about all the ways that the job shrinks and grows with automation, and the ways that the job itself might be changed by automation elsewhere, outside your analysis.

But I think there's a more fundamental problem, too. Even if you set aside the question of change, I don't think it's possible, in principle, to create a usefully complete description of what the job is.

Reading O*NET descriptions of jobs reminds me a lot of the failure of expert systems, when people thought that you could use logical steps to build an AI system to do image recognition or language translation. Theoretically, you can describe a series of steps by which a machine can recognise a cat, and theoretically, you can write down exactly what an associate partner at a law firm does, but in reality, these things are just too complex or too subtle for us to be able to describe them like that. Sometimes, of course, the job really is just a task that can be turned into a button, but that's actually pretty rare. Generally, the job is a complex mesh of things that we lack the capability to explain explicitly (tangentially, this is also why most people seem to struggle to use chatbots). And, of course, once you dig into the detail these descriptions fall apart, just as logical systems did before machine learning: apparently administering a family trust and running a desk at a quant fund are comparable jobs, and they need fluency in Lotus 1-2-3, Oracle or QuickBooks but not Bloomberg.

Aaron Levie, CEO of Box, described this as a variant of ‘Gell-Mann Amnesia’. You have a pretty good sense of how complex your own field is, and how incomplete AI’s addressability of that might be, but in other fields you forget this: you see a Claude template for a PowerPoint or a legal draft and you think “wow, consultants and law firms are screwed!” When you hire Bain, BCG or McKinsey, they will give you some slides, but that’s not what you’re paying for, just as when you buy software, you’ll get some code, but that’s not the product.

The counter-argument to all of this would be to say that, yes, well done, there are important exceptions, as there always are, but directionally and in aggregate, it is ‘surely’ correct to say that jobs that involve a lot of repetitive clerical work are most exposed, and this is how many jobs that is, and by how much. That sounds good, but you don’t know if the exceptions are bigger than the rule. Suppose we’d looked at the internet in 1995 and said that this would destroy the value of physical distribution for media; this was ‘directionally correct’, but in practice that meant totally different things for record companies, newspapers, TV companies and movie studios. On average, we’re all dead. Half of the jobs you’ve analysed might be entirely unaffected, and there might be other big pools of jobs to be transformed that you miss entirely. You don’t know.

A while ago, I noted someone had criticised my work by saying that I always end by saying ‘it depends’. But when you're at such an early stage of a fundamentally new technology, any specific predictions about a particular field will only be correct by luck: it really does depend. As Yogi Berra said, “it’s tough to make predictions, especially about the future”. We can certainly point to framings and mental models for how this might work, and we can point to what happened the last half-dozen times we went through this kind of change. We can even say things that are probably directionally correct. But as soon as you try to quantify that, and model it out job by job and industry by industry, and make pretty radar charts, you’re fooling yourself, because you do not actually know what those jobs are today, and you do not know how they will change. At a minimum, you have to ask whether your model passes the newspaper test, the Uber test and the CPA test: would your approach have captured those effects? If not, how useful is it to the rest of us?

DEVOURED

Don't Roll Your Own ...

Tech frontendwebdesignux Susam Pal

Developer Susam Pal argues against building custom web UI features like scrolling, link navigation, or date pickers when browsers already provide robust, familiar native implementations.

What: Susam Pal criticizes modern web design practices where developers "roll their own" implementations of features browsers handle well, such as page scrolling, link navigation, and password fields. He cites GitHub's custom link navigation and the issues custom password fields create for browser-native password management.

Why it matters: This highlights a tension in web development between pursuing unique, branded experiences and adhering to web standards and established user expectations, often at the expense of accessibility, performance, and user familiarity.

Takeaway: Prioritize native browser functionality for core UI elements like scrolling, links, and form inputs to ensure better accessibility, performance, and a consistent user experience.

Deep dive

The principle "Don't roll your own crypto" should extend to web UI features where browsers excel, as custom implementations often degrade user experience.
Custom page scrolling often breaks familiar responsiveness to mouse, touchpad, or keyboard input, making navigation frustrating.
Custom link navigation, like GitHub's, can introduce delays and break standard browser behaviors; the author finds that opening a link in a new tab is sometimes faster than waiting for the in-page navigation.
Custom password fields often interfere with browser-native password saving, autofill, strong password generation, and security warnings for insecure connections.
Custom date pickers vary widely across websites, forcing users to learn new interaction patterns instead of using their preferred, consistent browser default.
Native form controls are generally well-equipped, accessible, and integrate with system-level features like password managers and accessibility tools.
Constantly changing website layouts and interfaces, even if well-intentioned, can be highly disruptive, especially for less tech-savvy users.
The author argues that unless there's a compelling, specific reason, developers should be more conservative with custom UI implementations for serious websites.

Original article

Don't Roll Your Own ...

This is going to be a rant about modern web design practices. But before I get to that, let me begin with a familiar principle from the world of cryptography. Among software developers, and especially among those who work on security-sensitive systems, there is a well-known maxim: Don't roll your own crypto. This does not mean that nobody is allowed to write cryptographic code. Someone has to. It means that, for ordinary production software that protects sensitive data of users, we should not rely on a private, unreviewed implementation that has not been vetted by the wider software development community. We should use established, vetted software packages or tools wherever possible.

Fortunately, it is now standard industry practice to avoid rolling your own crypto and instead use cryptographic algorithms and packages that have been peer reviewed and have stood the test of time. It wasn't so some twenty years ago. I saw several flawed home-grown RC4 implementations early in my career, with issues like improper initialisation vectors, predictable keystreams and partial leakage of plaintext into ciphertext, putting sensitive data of users at risk. But today, major e-commerce websites and banks typically do not use home-grown cryptography for their web services. In fact, in regulated domains such as payments, healthcare and personal data processing, doing so could violate requirements for strong cryptography, possibly leading to hefty financial penalties.

Website design is obviously not cryptography. A broken scroll bar is not the same kind of failure as a broken encryption scheme. But I wish there were a similar maxim for website design as well. There are many aspects of websites where, I think, developers should not be rolling their own X, especially when X is something browsers already do well and something users depend on every day. Here I present a list of such X.

Don't roll your own page scrolling.
Don't roll your own link navigation.
Don't roll your own text selection.
Don't roll your own context menu.
Don't roll your own copy and paste.
Don't roll your own password field.
Don't roll your own date picker.

Of course, there are valid scenarios where you may need to roll your own X. But here I want to focus on the cases where you should not roll your own X, and how doing so can lead to a worse user experience, at least in my experience. I am not saying that nobody should ever build anything themselves. As someone who does a lot of creative computing myself and develops fun tools from time to time, I am a big proponent of developing your own stuff. But when it comes to developing user interface features for serious websites that people need to use to get their work done, I wish the software development community were more conservative in deciding what fancy feature goes into a website and what is left out. Do keep in mind that I am no expert in user experience. Far from it. So none of what I am saying here should be taken as a recommendation. But I am a user of the Web, and as a user, I have found some modern web design patterns to be frustrating. This post is a lament from one user of the Web, not a design guide.

Of all the things I mentioned above, the one that bothers me the most is custom scroll behaviour on websites. I am used to how page scrolling responds to my mouse, touchpad or keyboard input. When you override the default scrolling behaviour of the web browser with your own implementation, it 'breaks' the page for me. The page now moves too slowly or too quickly when I scroll. Keyboard scrolling may or may not work. You take something I am so familiar with that I don't even think about it, and turn it into something unfamiliar that I now have to think about.

Custom link navigation is another pet peeve of mine. Web browsers can already handle links very well. You could say that this is the whole reason web browsers even exist. Following links is their bread and butter. You shouldn't have to mess with that behaviour at all. If you think you need to, reconsider what you are trying to achieve and whether it is really so important as to disrupt normal link navigation. The worst offender I have found here is GitHub. When you click on a link on GitHub, say, a file link or an issue link, it triggers a massive piece of functionality implemented in JavaScript that handles the link click for you. If you don't believe me, visit your favourite project on GitHub using Firefox or Chrome, press F12 to open the browser's developer tools, then go to the 'Debugger' or 'Sources' tab, find 'Event Listener Breakpoints' on the right sidebar, expand 'Mouse' and select 'click'. Then click on a link on GitHub and see what happens.

I'm sure I am not the only one who has noticed that, on GitHub, a clicked link sometimes takes too long to load. Ironically, it is often faster to open the link in a new tab than to wait for GitHub's JavaScript code to handle the navigation in the current tab.
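Part of why script-driven link navigation is fragile is that the script has to re-create decisions the browser already makes for free. The TypeScript sketch below shows just one of them: telling a plain left-click apart from clicks that ask for a new tab, a new window or a download. It is a hypothetical helper written for illustration, not GitHub's code; `ClickLike` models only the handful of `MouseEvent` fields the check reads, so it can run outside a browser.

```typescript
// The subset of DOM MouseEvent fields this check needs (hypothetical interface).
interface ClickLike {
  button: number; // 0 = primary (usually left) button
  ctrlKey: boolean;
  metaKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
  defaultPrevented: boolean;
}

// Returns true only for a plain primary-button click that a script could take over.
// Every other case should be left to the browser's native link handling.
function isPlainLeftClick(e: ClickLike, linkTarget: string | null): boolean {
  if (e.defaultPrevented) return false; // another handler already took over
  if (e.button !== 0) return false; // middle or right click: new tab or context menu
  if (e.ctrlKey || e.metaKey || e.shiftKey || e.altKey) {
    return false; // modifier keys request a new tab, a new window or a download
  }
  if (linkTarget !== null && linkTarget !== "" && linkTarget !== "_self") {
    return false; // e.g. target="_blank" asks for a different browsing context
  }
  return true;
}
```

Even with a check like this, the script still owns loading, history, focus and error handling for the plain clicks it intercepts, which is exactly the work the browser would otherwise do.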

A custom password input field is another such hazard. Fortunately, custom password input fields have become rarer over the years. The password input field that comes with the web browser is generally well equipped to handle passwords. It can offer to save passwords, fill them in later and generate strong passwords for new accounts. It can also warn when a password is submitted over an insecure HTTP connection, work well with password managers and autofill, and cooperate with mobile keyboards and accessibility tools. If you replace the browser's password field with your own fake version, you may break all of that. You may also end up using an ordinary text field and masking it yourself, in which case the password may be treated by the browser, the operating system or assistive tools as ordinary visible text rather than as a password, thereby exposing the password in ways you did not intend.

Custom date pickers are another common annoyance. I know that <input type="date"> does not help you select a date range. But that is okay. You can provide two date input fields, one for the start date and one for the end date. I am willing to pay the small price of using two different inputs to select a date range if that means I can use my favourite web browser to navigate the calendar and select dates the same way everywhere. What I am less inclined to do is to learn ten different ways of using the date selector in ten different implementations across ten different websites. Right now, date-selector implementations are all over the place. Some require you to zoom out of the month view to enter a year view, where you can select years. While you are there, you cannot change the month again until you return to the month view. Some require you to click the previous-year button literally forty times to select your year of birth if you are old enough. Some do not let you type the date at all. No. I do not want to learn your calendar widget. I just want to use the date picker in my favourite browser, which is quite sane. Saner than your custom implementation. If you need to have a calendar widget to support browsers with inadequate native date-picker support, perhaps that support can be added alongside the native date picker rather than as a replacement for it. For example, the ordinary <input type="date"> element could be left intact, with a custom widget provided in addition to it so that users can manipulate the same field.
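The two-field approach needs very little code. The TypeScript sketch below assumes two native <input type="date"> elements and checks only their values. It relies on the HTML rule that a date input's `value` is either an empty string or a `YYYY-MM-DD` string, whatever format the browser displays to the user. The function name, result type and messages are hypothetical.

```typescript
// Result of validating a start/end pair taken from two native date inputs.
type RangeResult =
  | { ok: true; start: string; end: string }
  | { ok: false; error: string };

// A native date input's value is "" or a zero-padded "YYYY-MM-DD" string.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function validateDateRange(startValue: string, endValue: string): RangeResult {
  if (startValue === "" || endValue === "") {
    return { ok: false, error: "Both dates are required." };
  }
  if (!ISO_DATE.test(startValue) || !ISO_DATE.test(endValue)) {
    return { ok: false, error: "Dates must be in YYYY-MM-DD form." };
  }
  // Zero-padded YYYY-MM-DD strings sort chronologically,
  // so a plain string comparison is enough to check the order.
  if (startValue > endValue) {
    return { ok: false, error: "The start date must not be after the end date." };
  }
  return { ok: true, start: startValue, end: endValue };
}
```

In a page, the check could run on the inputs' `change` events. Setting the end field's `min` attribute to the start field's value also lets the browser's own picker disable earlier dates, without replacing the native control.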

In general, just stop messing with the form controls. They almost always introduce new problems while solving some existing ones. And while you are at it, don't keep changing your website layout and interface every few months! I may adapt to the new design, but my ageing relatives cannot. For them, every time you change the user interface, it amounts to learning a whole new tool. If every website keeps doing this every few months, they have to spend a significant amount of time relearning familiar things for no functional benefit. Please just let them enjoy their retirement. Imagine how you would feel if a Linux distribution decided to redesign all its core commands and their command-line options every few months. Or imagine how you would feel if the buttons of your washing machine were rearranged every morning. It wouldn't be pleasant!

DEVOURED

Snap Specs True AR Glasses Reportedly Launch This Fall For Around $2500

Tech hardwarearmobilestartup UploadVR

Snap's consumer "Specs" true AR glasses, designed to place virtual objects naturally into the real world, are reportedly launching this fall for around $2500, targeting early adopters.

What: Snap's standalone consumer AR glasses, "Specs," are expected to launch this fall priced around $2500, according to tech journalist Alex Heath. These glasses aim to be the first "normal-looking" true AR glasses from a major tech company, featuring binocular displays, head and hand tracking, and running Snap OS.

Why it matters: This represents an ambitious entry into the consumer AR market by Snap, potentially beating Meta and Apple to market with a high-fidelity, relatively compact true AR device, albeit at a premium price point for early adopters.

Takeaway: Developers interested in AR may want to explore Snap's Lens Studio for creating "Lenses" as this platform could gain traction with the consumer launch of Specs.

Deep dive

Snap's consumer "Specs" AR glasses are anticipated to launch in Fall 2026 at approximately $2500, with a production run of around 100,000 units.
Alex Heath, a veteran tech journalist, reported this pricing and launch window.
"Specs" are designed to be "true AR glasses," meaning they overlay virtual objects onto the physical world without significantly dimming or distorting the user's view, unlike existing development kits.
These consumer glasses are expected to be much smaller and lighter than the current Spectacles AR development kit, which rents for $99/month.
They run Snap OS, an Android-based system that does not support native APKs or third-party engines like Unity.
Developers create sandboxed "Lenses" (apps) using JavaScript or TypeScript within Snap's Lens Studio, interacting with high-level APIs.
This software approach offers advantages similar to Apple's visionOS Shared Space, including fast app launches, interaction consistency, and multi-user experiences.
Snap OS 2.0, released late last year, added and improved first-party apps like Browser and Gallery, moving the platform closer to consumer readiness.
The $2500 price point targets wealthy early adopters, similar to Apple Vision Pro.
Snap's competitors, Meta and Apple, are not expected to release their true AR glasses until late 2027 and 2028, respectively.
Snap recently spun its AR hardware division into a dedicated subsidiary, Specs Inc.

Decoder

True AR glasses: Augmented Reality glasses that display virtual objects seamlessly integrated into the user's physical surroundings, without significantly obscuring or distorting the real-world view.
APK: Android Package Kit, the package file format used by the Android operating system for distribution and installation of mobile apps and middleware.
Lens Studio: Snap's desktop application for Windows and macOS that allows developers to create augmented reality "Lenses" (apps) for Snapchat and Snap Spectacles.
Lenses: Snap's term for augmented reality experiences or applications developed for its platform.

Original article

The Snap Specs standalone true AR glasses will launch this fall, priced around $2500, veteran tech journalist Alex Heath reports.

The company behind Snapchat officially announced that it would release standalone true AR glasses, called Specs, just under one year ago.

Compared to the bulky and heavy Spectacles standalone AR development kit glasses, which the company rents to developers for $99/month or students for $49/month, Snap CEO Evan Spiegel claimed the consumer Specs will have "a much smaller form factor, at a fraction of the weight, with a ton more capability", while running the same Snap OS operating system and supporting all the same apps developed so far.

Snap OS is unusual. While on an underlying level it's Android-based, you can't install APKs on it, and thus developers can't run native code or use third-party engines like Unity. Instead, they build sandboxed "Lenses", the company's name for apps, using the Lens Studio software for Windows and macOS. In Lens Studio, developers use JavaScript or TypeScript to interact with high-level APIs, while the operating system itself handles low-level technology like rendering and core interactions. This has many of the same advantages as the Shared Space of Apple's visionOS: near-instant app launches, interaction consistency, and easy implementation of shared multi-user experiences without friction. It even allows the Spectacles mobile app to be used as a spectator view for almost any Lens. Snap OS doesn't support multitasking, but this is more likely a limitation of the current hardware than of the operating system itself.

Since releasing Snap OS in the latest Spectacles kit in late 2024, Snap has repeatedly added new capabilities for developers building Lenses, and late last year launched Snap OS 2.0, adding and improving first-party apps like Browser, Gallery, and Spotlight to bring the AR platform closer to being ready for consumers.

In April, Alex Heath released a report via his Sources newsletter wherein he claimed that Snap will preview its new Specs glasses in the next couple of months, followed by a consumer release in the fall.

In an October edition of Sources, Heath said that Snap was targeting a price of around $2500 for Specs, and a production run of around 100,000.

That price puts it squarely in the realm of relatively wealthy early adopters, like Apple Vision Pro. But, assuming it isn't beaten to market by something we're not aware of, Specs will be the first standalone true AR glasses (meaning relatively normal-looking glasses that can place interfaces and virtual objects into your physical space, without significantly dimming or distorting your view of the real world) from a major tech company.

Meta's $800 glasses are considerably more affordable, yes, but also vastly less capable, showing only a small fixed heads-up display (HUD) in one eye, while Snap is targeting a relatively wide field-of-view binocular display system with head tracking, hand tracking, and real-time environment meshing.

Multiple reports suggest Meta plans to ship its own true AR glasses in late 2027, and Bloomberg's Mark Gurman has reported that Apple won't launch AR glasses until 2028 at the earliest. Meanwhile, there are some obscure Chinese products that technically qualify as true AR glasses, but they're bulky, their onboard compute is significantly limited, and their software is not particularly fleshed out.

The news of Snap's plan to launch this fall comes a few months after it spun its AR hardware ambitions into a dedicated subsidiary, Specs Inc.

We'll keep a close eye on Snap in the coming months for any sign of a proper reveal of the design and specifications of Specs, a product that could be a milestone moment for consumer AR.

I'm actively writing on UploadVR again, and this article is one in a series of "catch up" pieces where I report on some of the interesting things that have been happening in the industry in recent months. And yes, VR Download is coming back very soon!

DEVOURED

The Eternal Sloptember

Tech aiagentssoftware-development George Hotz's Blog

George Hotz warns that AI agents in software development will be "one of the most costly mistakes" in history, leading to an "Eternal Sloptember" of low-quality code.

What: George Hotz (geohot) argues that AI agents are statistical models that mimic programming, producing increasingly hard-to-detect broken code. He details his six-month struggle using agents for projects like tinygrad and USB-PCIe chip reverse engineering, concluding they fail to deliver polished results despite initial speed. He specifically mentions Apple pushing AI on its engineers.

Why it matters: This piece offers a significant counter-narrative to the prevailing optimism around AI agents in software development, suggesting that their increasing adoption by large organizations with slower feedback loops could degrade overall software quality, creating an "Eternal Sloptember" rather than an era of enhanced productivity. It highlights a critical debate about the true utility and potential pitfalls of current AI models for complex engineering tasks.

Deep dive

George Hotz asserts that AI agents' adoption in software development will be one of the most costly mistakes in history, coining the term "Eternal Sloptember."
He argues agents cannot truly program; they are statistical models mimicking programming output, producing "slop" that is increasingly hard to detect.
After six months of trying to use agents for projects like tinygrad and reverse-engineering a USB-PCIe chip, Hotz found manual methods faster and better.
While acknowledging AI's utility as a "better Google" and for quick, unpolished prototypes, he states agents are not close to the bar for a software engineer.
Hotz believes that large organizations, with slower feedback loops and less alignment, will be hurt most by agents because lower performers will produce 10x output that is low quality.
He predicts agents will lead to more code and features but a "golden era for buckets and buckets of slop, and a dark age for gems of quality."
He mentions hearing that Apple is pushing AI on all its engineers, questioning if macOS quality will improve or worsen in the next two years.
Hotz aligns with Yann LeCun and Gary Marcus, believing current LLM models without world models or genuine understanding cannot program effectively.

Original article

I’m calling it now, the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history. Agents cannot program, and it’s taking longer and longer to realize that they can’t. They are a highly sophisticated statistical model designed to mimic the distribution of programming. The output is broken, but in a way that’s getting harder and harder to detect. Which is exactly what you’d expect from an increasingly accurate statistical model.

At first, I rejected this. I bought into the Twitter explanation of status anxiety. I define some of my self worth by my programming abilities, so wouldn’t it make sense to get defensive around that loss? Deny the models can code for as long as I could to preserve my ego?

I mean, it’s very clear they can solve math problems I couldn’t hope to solve if I devoted my life to it. So why can’t they program? Maybe I’m just not good enough of a programmer to recognize their genius.

I really tried for the last 6 months. I wrote some parts of tinygrad with agents. I reversed a USB <-> PCIe chip with agents. But each time I suspected I could have done it better and faster manually. The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.

And in before “you are using it wrong”: I have tried all the different models, different harnesses, different prompts. It’s not this. The people who say this would probably say the same thing about slot machines: “you see, you have to bet 5 lines after you get a cherry, no wonder you aren’t winning!”

I’m not saying that AI isn’t useful, it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast. But is it a software engineer? Not close to the bar at any company I have worked at. The key aspect is knowing when to use it and when not to.

I thought more about the self worth preservation thing. AFL found more bugs than LLMs and nobody felt that way about it. Chess and Go are more popular than ever. I cannot fucking wait until I have armies of robot associates I can trust to clean up my code! I don’t fear loss of status, I almost think this is some kind of psyop to sell agents. Fear of loss is one of the only ways to make big companies move. Though I think in that fear they are making a big mistake.

Agents will end up hurting large organizations more than high-performing individuals or small orgs. I’ve watched how my friends and coworkers have adopted these tools over the last 6 months. A trait you find in all high-performing people is the ability to error correct, and they have mostly been good at seeing when slop is slop. It takes a bit to explore/exploit and tune the outer loops around when to use them, when to trust them, how to use them, etc., but I haven’t seen any of them move to a model where they don’t carefully read and understand each line, except in some confined domains.

Contrast this with a large organization. Much slower feedback loops, much less alignment. The bottom performers won’t have that self check. They are the ones producing 10x output with the agents. What do you think is happening to the average output of that organization? What is happening to the average output of the world?

Agents will end up producing more code, more apps, and more features than ever before. It is a golden era for buckets and buckets of slop, and a dark age for gems of quality.

I hear that Apple is pushing AI on all their engineers. When people think in the abstract, they think AI will do all this stuff, but let’s focus on a concrete example. Do you think macOS will get better or worse in the next 2 years?

When people see an artifact, they make assumptions about the process that was used to create it. Without even thinking about it, they assume the creator had a basically human state of mind. This assumption is no longer true. Things can be broken in ways that weren’t previously possible, and old proxies of underlying quality like syntax and grammar are useless. AI produced artifacts are not produced by the same process as human ones, and this difference, while extremely subtle in statistics, makes itself obvious when you try to interact with and build on the artifact in human ways.

Without fully endorsing all their ideas, I’m now in the LeCun/Marcus camp on LLMs. I don’t think models like this will ever be able to program, I think the process matters. I think that deep learning is still the solution, but real programming agents will need world models, not some RLVR shit that comments out the failing test and tells you all the tests are now passing.

The real story of this era will be who manages to avoid harming themselves in their AI psychosis.

DEVOURED

Introducing Pulumi Do: Direct Resource Operations for Any Cloud

DevOps infrastructurecloudcli Pulumi

Pulumi introduced `pulumi do`, a new CLI command enabling direct, one-off cloud resource operations across thousands of providers without project setup or code.

What: Pulumi CLI v3.242.0 introduces `pulumi do`, allowing users to create, read, update, delete, and query cloud resources via a single command, like `pulumi do aws:s3:Bucket create`. It's designed for quick ad-hoc tasks by humans and AI agents, with future plans for credential management via Pulumi ESC and a clear upgrade path to full infrastructure-as-code.

Why it matters: This tool simplifies ad-hoc cloud operations, removing the friction of full IaC for simple tasks, and specifically positions Pulumi for the "Agentic Infrastructure Era" by streamlining provisioning for AI agents.

Takeaway: Install Pulumi CLI v3.242.0 or later to experiment with `pulumi do` for quick cloud resource interactions.

Deep dive

pulumi do allows direct CRUD operations and queries on cloud resources using a single CLI command, such as pulumi do aws:s3:Bucket create.
It removes the need for project setup, code, or state tracking for quick, one-off tasks.
It works with thousands of Pulumi providers through one consistent command structure (pulumi do <package:module:type> <operation>).
Output is predictably JSON on stdout, making it suitable for programmatic parsing by AI agents.
Pulumi highlights its use case for AI agents, enabling them to provision infrastructure without human intervention when combined with "Agent accounts" and Pulumi ESC for credential management.
Future roadmap includes unified credential management with Pulumi ESC, cross-resource references to handle dependencies, and a stateful mode with a "graduation path" to full Pulumi IaC projects.
This feature is available as a research preview in Pulumi CLI v3.242.0 and later.

Decoder

Infrastructure as Code (IaC): Managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
Pulumi ESC: Pulumi Environments, Secrets, and Configuration, a service for managing credentials and configuration across providers.
Agentic Infrastructure Era: A concept where AI agents autonomously provision and manage infrastructure.

Original article

Introducing pulumi do: Direct Resource Operations for Any Cloud

By Fraser Waters, Pat Gavlin, Arun Loganathan, and Christian Nunciato
Posted on May 22nd, 2026

Infrastructure as code is the right model for production systems. State tracking, drift detection, and repeatable deployments all matter when you’re managing real workloads.

But sometimes, you also need a quick, one-off interaction with the cloud: create a bucket or a database, look up a VPC, delete a stray resource.

Today we’re introducing pulumi do, a new command for direct resource operations. With pulumi do, you can create, read, update, delete, and query any cloud resource from the terminal with a single command, across thousands of Pulumi-supported providers — no project, code, or state required.

The problem: Sometimes IaC is more than you need

When you’re managing production workloads, IaC is the proven solution. Code lets you declare complex systems, state tracking catches drift before it becomes a problem, dependency graphs sequence changes safely, and policy keeps everything in bounds. That full lifecycle, especially with the backing of a platform like Pulumi Cloud, is exactly what you want to build systems that scale.

But when you (or your coding agent) need an ad-hoc Postgres database, the simplest path with IaC still takes several steps: make a directory, create a project, configure your credentials, write the code, preview, deploy. It works, but it’s not always necessary for what should be a simple operation. pulumi do collapses all of those steps into one, using the same Pulumi providers, resource model, and ecosystem that powers the core Pulumi platform.

Resource creation is also only part of the problem. As Joe laid out in The Agentic Infrastructure Era, the real challenge for AI agents isn’t with code or CLI commands, it’s with everything else: getting a cloud account, resolving credentials, wiring configuration across multiple services. Agent accounts, also released this week, simplify this by letting an agent provision its own ephemeral Pulumi Cloud account, and Pulumi ESC takes care of consolidating credentials across providers. Together, with pulumi do, agents can now go from zero to deployed infrastructure without requiring a human in the loop — and when that one-off resource needs to grow into a more permanent system, there’s a clear graduation path back to full Pulumi IaC.

What it looks like

As an example, say you wanted to provision an S3 bucket. With the AWS CLI, you’d need to assemble an aws s3api create-bucket invocation with the right set of command-line flags, region constraints, a globally unique name, and so on. With pulumi do, it’s just this:

$ pulumi do aws:s3:Bucket create

That might not look all that different on the surface — but because you’re using the Pulumi engine and resource model, you can provide a minimal set of input properties, take advantage of provider-defined defaults, and use Pulumi’s auto-naming feature to give the bucket a unique name automatically:

$ pulumi do aws:s3:Bucket create

This will create aws:s3/bucket:Bucket with the following inputs:
{
  "bucket": "bucket-279ea56",
  "tagsAll": {}
}

Please confirm that this is what you'd like to do by typing `yes`:

Answer yes (or just pass --yes), and you’re done. To delete the bucket:

$ pulumi do aws:s3:Bucket delete bucket-279ea56 --yes
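Auto-naming is what produced the name bucket-279ea56 above. As a rough sketch of the resulting name shape only (a logical name, a hyphen, and seven random hex characters, inferred from the sample output rather than from Pulumi's source), the hypothetical helpers below show why repeated runs rarely collide in a globally unique namespace like S3.

```python
import re
import secrets

# Matches the suffix shape seen in the sample output, e.g. "-279ea56".
_SUFFIX = re.compile(r"-[0-9a-f]{7}$")

def auto_name(logical_name: str, suffix_len: int = 7) -> str:
    """Return '<logical_name>-<random hex>'. Illustrative only, not Pulumi's implementation."""
    # token_hex(n) returns 2*n hex characters; request enough, then trim.
    suffix = secrets.token_hex((suffix_len + 1) // 2)[:suffix_len]
    return f"{logical_name}-{suffix}"

def looks_auto_named(name: str) -> bool:
    """True if name ends in '-' plus seven lowercase hex characters."""
    return _SUFFIX.search(name) is not None

print(auto_name("bucket"))                 # random suffix, differs on every run
print(looks_auto_named("bucket-279ea56"))  # True
print(looks_auto_named("my-bucket"))       # False
```

Seven hex characters allow 16^7 (about 268 million) suffixes, so collisions are rare but not impossible; a provider still has to handle a "name already taken" error.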

Need to look up an existing resource? Use a provider function:

$ pulumi do aws:ec2:getVpc --default

{
  "arn": "arn:aws:ec2:us-west-2:663782525873:vpc/vpc-d7b311af",
  "cidrBlock": "172.31.0.0/16",
  "enableDnsHostnames": true,
  "enableDnsSupport": true,
  "enableNetworkAddressUsageMetrics": false,
  "id": "vpc-d7b311af",
  ...
}

Same CLI, same output contract, same provider ecosystem.

The command shape

The do command accepts a Pulumi resource type, or type token, to determine the action to take. Type tokens have the form <package:module:resource>. For example, aws:s3:Bucket refers to the Amazon S3 Bucket resource that belongs to the s3 module of the aws package.

You can also provide a portion of the token to help you find what you’re looking for without ever having to leave the terminal:

$ pulumi do aws:s3

Functions and resources for the s3 module.

Run 'pulumi do <module/resource/function> --help' for more details on usage.

Functions:
  aws:s3:getAccessPoint
  aws:s3:getAccountPublicAccessBlock
  aws:s3:getBucket
  aws:s3:getBucketObject
  ...

Resources:
  aws:s3:AccessPoint
  aws:s3:AccountPublicAccessBlock
  aws:s3:AnalyticsConfiguration
  aws:s3:Bucket
  ...

$ pulumi do aws:s3:Bucket read bucket-d20976f

{
  "arn": "arn:aws:s3:::bucket-d20976f",
  "bucket": "bucket-d20976f",
  "bucketDomainName": "bucket-d20976f.s3.amazonaws.com",
  "bucketNamespace": "global",
  ...
}

The package, module, and resource/function segments all come directly from the Pulumi provider schema, so --help works at every level of the tree. Pass a package name, optional module, and optional function or resource type, and do returns the appropriate level of detail.
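The token grammar is simple enough to sketch. The parser below is a hypothetical helper, not part of the Pulumi SDK. It splits a full or partial token into package, module, and member. It also accepts the canonical form printed in the confirmation prompt (aws:s3/bucket:Bucket), whose module segment carries a "/bucket" suffix; that normalization is inferred from the output shown earlier.

```python
from typing import NamedTuple, Optional

class TypeToken(NamedTuple):
    package: str
    module: Optional[str] = None
    member: Optional[str] = None  # resource type (e.g. Bucket) or function (e.g. getVpc)

    def level(self) -> str:
        """Which level of the schema tree this token addresses."""
        if self.member is not None:
            return "member"
        if self.module is not None:
            return "module"
        return "package"

def parse_type_token(token: str) -> TypeToken:
    """Parse 'pkg', 'pkg:module', or 'pkg:module:Member' (hypothetical helper)."""
    parts = token.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many segments in type token: {token!r}")
    if len(parts) >= 2:
        # Canonical tokens such as 'aws:s3/bucket:Bucket' carry a '/<name>'
        # suffix on the module segment; keep only the part before the slash.
        parts[1] = parts[1].split("/", 1)[0]
    if any(part == "" for part in parts):
        raise ValueError(f"empty segment in type token: {token!r}")
    return TypeToken(*parts)

print(parse_type_token("aws:s3"))                 # TypeToken(package='aws', module='s3', member=None)
print(parse_type_token("aws:s3:Bucket").level())  # member
print(parse_type_token("aws:s3/bucket:Bucket") == parse_type_token("aws:s3:Bucket"))  # True
```

The level() result mirrors the idea that a shorter token asks for a broader listing, as in the pulumi do aws:s3 example.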

You can also provide a resource's input properties in a YAML or JSON file, using the --input option for the format and --input-file for the path. For example, to create a container service in Google Cloud Run:

# service.yaml
location: us-central1
deletionProtection: false
template:
  containers:
    - image: us-docker.pkg.dev/cloudrun/container/hello

$ pulumi do gcp:cloudrunv2:Service create \
    --input yaml \
    --input-file service.yaml

This will create gcp:cloudrunv2/service:Service with the following inputs:
{
  "deletionProtection": false,
  "location": "us-central1",
  "name": "service-b8af752",
  "template": {
    "containers": [
      {
        "image": "us-docker.pkg.dev/cloudrun/container/hello"
      }
    ]
  }
}

The result:

{
  "createTime": "2026-05-22T23:00:22.415839Z",
  ...
  "urls": [
    "https://service-b8af752-921927215178.us-central1.run.app",
    "https://service-b8af752-ctnulmzwoa-uc.a.run.app"
  ]
}
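Because inputs are plain YAML or JSON, a script can build them as data instead of templating text. This is a minimal sketch that assumes --input json is the JSON counterpart of the --input yaml flag shown above (only the YAML form appears in this post). It builds the same Cloud Run inputs and the matching argument list without running anything.

```python
import json
import shlex

def service_inputs() -> dict:
    """The Cloud Run inputs from service.yaml above, as a Python dict."""
    return {
        "location": "us-central1",
        "deletionProtection": False,
        "template": {
            "containers": [
                {"image": "us-docker.pkg.dev/cloudrun/container/hello"},
            ],
        },
    }

def write_inputs(path: str, inputs: dict) -> None:
    """Serialize the inputs to a JSON file for --input-file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(inputs, f, indent=2)

def create_argv(input_file: str) -> list[str]:
    # Assumption: "--input json" is accepted the same way "--input yaml" is.
    return ["pulumi", "do", "gcp:cloudrunv2:Service", "create",
            "--input", "json", "--input-file", input_file]

print(shlex.join(create_argv("service.json")))
```

To actually run it, call write_inputs("service.json", service_inputs()) and pass create_argv("service.json") to subprocess.run(..., check=True).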

Resource operations

Most resources support the full set of CRUD operations — create, read, update, delete, and list — directly from the CLI. Each operation maps to a provider CRUD method using the same provider logic a full Pulumi program would use, and resources are addressable by their cloud provider IDs:

# Create a resource
$ pulumi do aws:s3:Bucket create --yes | jq -r ".name"
bucket-4f5cb22

# Fetch it
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq -r ".hostedZoneId"
Z3BJ6K6RIION7M

# Update/patch it
$ pulumi do aws:s3:Bucket patch bucket-4f5cb22 --input yaml --input-file tags.yaml

$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq ".tags"
{
  "key": "value"
}

# Delete it
$ pulumi do aws:s3:Bucket delete bucket-4f5cb22
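Because each operation prints JSON, the create, read, and delete steps above can be chained without copying IDs by hand. The sketch below does in Python what the jq pipeline does, under a few assumptions: the ID is taken from the name field, as in the jq -r ".name" example; --yes is passed to delete so nothing waits at a prompt; and the run parameter is a hypothetical injection point so the logic can be exercised without cloud credentials.

```python
import json
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]

def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout

def bucket_lifecycle(run: Runner = run_cli) -> dict:
    """Create an S3 bucket, read it back, then delete it; return the read result."""
    token = "aws:s3:Bucket"
    created = json.loads(run(["pulumi", "do", token, "create", "--yes"]))
    bucket_id = created["name"]  # same field the jq example extracts
    try:
        return json.loads(run(["pulumi", "do", token, "read", bucket_id]))
    finally:
        # Clean up even if the read fails.
        run(["pulumi", "do", token, "delete", bucket_id, "--yes"])
```

The try/finally keeps a failed read from leaving a stray bucket behind, which is exactly the kind of resource the post suggests cleaning up with a one-off delete.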

Provider configuration

Today, pulumi do resolves provider configuration — for example, applying your AWS credentials — using environment variables or credential files as supported by each individual Pulumi provider. See the Pulumi Registry for provider-specific configuration details.

Designed for humans and agents

We’ve designed pulumi do to serve humans and coding agents equally well, guided by three fundamental ideas:

Consistent command structure across every provider. The do <package:module:type> <operation> pattern is the same for AWS, Azure, Google Cloud, Kubernetes, Cloudflare, Datadog, and every provider, including packages containing higher-level component resources. Once an agent learns that pattern, it applies across the board.
Predictable output contract. JSON on stdout, progress on stderr, consistent exit codes. An agent can parse the result programmatically without scraping human-formatted tables.
A single CLI command that works across every cloud. Many cloud and SaaS providers don’t have a full CLI at all. pulumi do generates commands from the provider schema, so if a Pulumi provider exists for it, the CLI just works. Neither humans nor agents need to install, learn, or even know about cloud provider-specific tooling.
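The output contract in the second point is what makes a thin wrapper possible. This is a minimal sketch that assumes only what is stated here (JSON on stdout, progress on stderr, a non-zero exit code on failure): it surfaces stderr only when the command fails and otherwise returns parsed JSON. PulumiDoError, parse_result, and pulumi_do are hypothetical names.

```python
import json
import subprocess
from typing import Any

class PulumiDoError(RuntimeError):
    """Raised when `pulumi do` exits non-zero; carries the captured stderr."""

    def __init__(self, returncode: int, stderr: str):
        super().__init__(f"pulumi do failed with exit code {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr

def parse_result(returncode: int, stdout: str, stderr: str) -> Any:
    """Apply the contract: on success, stdout holds one JSON document (or nothing)."""
    if returncode != 0:
        raise PulumiDoError(returncode, stderr)
    # On success stderr only carries progress messages, so it is ignored.
    return json.loads(stdout) if stdout.strip() else None

def pulumi_do(*args: str) -> Any:
    """Invoke `pulumi do` with the given arguments and parse its result."""
    proc = subprocess.run(["pulumi", "do", *args], capture_output=True, text=True)
    return parse_result(proc.returncode, proc.stdout, proc.stderr)
```

For example, pulumi_do("aws:ec2:getVpc", "--default")["cidrBlock"] would return the default VPC's CIDR block from the lookup shown earlier, with no table scraping.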

What’s next

Resource operations and provider functions are the foundation. The pulumi do roadmap extends the same direct-operation model with credential management, state tracking, and a path to full IaC.

Unified credentials with Pulumi ESC

One of the hardest parts of multi-cloud operations is credential management. Every provider has its own authentication scheme, environment variables, and session lifecycle. An agent working across AWS, Cloudflare, and Datadog today manages three separate credential mechanisms.

We’re building Pulumi ESC integration into pulumi do so you can manage credentials in one place and resolve them everywhere. ESC handles credential resolution (including OIDC-based dynamic credential generation and short-lived tokens) across all of your providers. Name the credential set, reference it, and ESC does the rest, with rotation, RBAC, and audit built in.

Cross-resource references

Real infrastructure has dependencies — subnets need VPCs, security group rules need their security groups, and so on. When you’re building resources one at a time, those references need to flow between commands somehow.

A future version of pulumi do will let resource inputs reference outputs from previously created resources, allowing the CLI to resolve them automatically and preserve the dependency graph. Later, when the time comes to graduate to a full IaC program, the generated code contains proper resource references rather than hard-coded strings.
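Preserving the dependency graph matters because operations have to follow it: a VPC before its subnet, a security group before its rules, and the reverse order for deletion. This sketch orders a hypothetical set of resources with Python's standard graphlib. It illustrates the idea only and says nothing about how pulumi do will implement references.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical resources mapped to the resources whose outputs they reference.
dependencies: dict[str, set[str]] = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security-group": {"vpc"},
    "sg-rule": {"security-group"},
    "instance": {"subnet", "security-group"},
}

def creation_order(deps: dict[str, set[str]]) -> list[str]:
    """Order in which every resource comes after everything it references."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # graphlib puts the detected cycle in the second element of args.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc

def deletion_order(deps: dict[str, set[str]]) -> list[str]:
    """Deletion runs the other way: dependents before the things they reference."""
    return list(reversed(creation_order(deps)))

print(creation_order(dependencies))
```

A cycle is reported as an error rather than silently ordered, since no creation sequence could satisfy it.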

Stateful mode and the graduation path

Today, pulumi do is stateless. Each command runs independently. A planned stateful mode will persist resource state across operations, enabling drift detection, lifecycle management, and a graduation path to full infrastructure as code.

Here’s what we’re planning:

Zero setup. Your first pulumi do implicitly creates a project and stack. No manual initialization.
Accumulate resources. Each operation stores resource state. After a few commands, you have a lightweight representation of your infrastructure.
Eject to a full project. When the time comes, generate a Pulumi project in your chosen language with all resources imported and dependency graphs intact.
Connect to Pulumi Cloud. Layer on governance, compliance, team collaboration, and deployment automation through Pulumi Cloud. Resources created via pulumi do can be governed by Pulumi Insights from day one, even before you opt into full IaC.

This path works because pulumi do uses the same providers, resource types, and property schemas as every other pulumi operation. Provisioned cloud resources stay where they are as management capabilities are added as needed.
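To make "accumulate resources" and drift detection concrete, here is a toy in-memory state store. It is a hypothetical sketch and not Pulumi's state format: each create or patch records the inputs it applied, and drift is any recorded input whose value differs from what a later read returns.

```python
import json
from typing import Any

def detect_drift(recorded: dict[str, Any], observed: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each recorded input whose live value differs to (recorded, observed)."""
    return {
        key: (want, observed.get(key))
        for key, want in recorded.items()
        if observed.get(key) != want
    }

class StateStore:
    """Toy accumulation of resource inputs keyed by type token and cloud ID (hypothetical)."""

    def __init__(self) -> None:
        self.resources: dict[str, dict[str, Any]] = {}

    @staticmethod
    def _key(token: str, resource_id: str) -> str:
        return f"{token}::{resource_id}"

    def record(self, token: str, resource_id: str, inputs: dict[str, Any]) -> None:
        """Called after a create or patch: remember the inputs that were applied."""
        self.resources[self._key(token, resource_id)] = dict(inputs)

    def remove(self, token: str, resource_id: str) -> None:
        """Called after a delete."""
        self.resources.pop(self._key(token, resource_id), None)

    def recorded(self, token: str, resource_id: str) -> dict[str, Any]:
        return self.resources[self._key(token, resource_id)]

    def dumps(self) -> str:
        return json.dumps(self.resources, indent=2, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "StateStore":
        store = cls()
        store.resources = json.loads(text)
        return store

store = StateStore()
store.record("aws:s3:Bucket", "bucket-4f5cb22", {"tags": {"key": "value"}})
live = {"bucket": "bucket-4f5cb22", "tags": {"key": "edited-in-console"}}
print(detect_drift(store.recorded("aws:s3:Bucket", "bucket-4f5cb22"), live))
# {'tags': ({'key': 'value'}, {'key': 'edited-in-console'})}
```

Only recorded inputs are compared, so read-only outputs such as an ARN never count as drift.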

Get started

pulumi do ships as a research preview in Pulumi CLI v3.242.0 and later. Install or update the CLI, install a provider plugin, and start running commands. The documentation has the full reference.

We can’t wait to hear your feedback. Give it a try today, tell us what works (and what doesn’t), and help shape the CLI that agents and humans both reach for first.


DEVOURED

Request-Based Autoscaling Is Now Generally Available on App Platform

DevOps cloudperformancedigitalocean DigitalOcean

DigitalOcean's App Platform now offers generally available request-based autoscaling, allowing applications to scale based on real-time HTTP traffic like requests per second and P95 latency across all CPU plans.

What: DigitalOcean announced general availability of request-based autoscaling for its App Platform, enabling applications to scale horizontally using real-time HTTP traffic metrics like "requests per second per instance" and "P95 request latency." This feature is now available on both shared and dedicated CPU instances, whereas previously autoscaling required a dedicated CPU plan.

Why it matters: Moving beyond CPU-based autoscaling, which is a lagging indicator, request-based scaling directly addresses user experience metrics, allowing infrastructure to react faster to sudden traffic changes and provide more consistent performance.

Takeaway: If you use DigitalOcean App Platform, consider updating your autoscaling rules to use request-based metrics for more responsive and cost-effective scaling.

Deep dive

DigitalOcean's App Platform now supports request-based autoscaling as a generally available feature, announced on May 22, 2026.
This allows applications to scale based on immediate HTTP traffic signals, specifically "requests per second per instance" and "P95 request latency."
Unlike CPU-based autoscaling, which is reactive, request-based scaling acts on leading indicators, improving responsiveness to traffic spikes.
The feature is now available for both shared and dedicated CPU instances, removing the previous restriction that required a dedicated CPU plan for autoscaling.
Users can combine request-based and CPU-based metrics on dedicated plans, with scaling up occurring if any threshold is crossed and scaling down only when all metrics are back in range.
Configuration can be done via the App Platform console's "Settings" tab or by adding an autoscaling block to the app spec using doctl apps update or the Apps API.
Autoscaling decisions are based on a 5-minute rate window to react to sustained load rather than brief, momentary spikes.
This feature applies to web service components receiving external HTTP traffic; worker and function components are not eligible, nor can it be used alongside "Scale to Zero (Inactivity Sleep)".

Decoder

P95 response latency: The 95th percentile response latency, meaning 95% of requests are served within this time or faster.
CPU-based autoscaling: Automatically adjusting the number of instances based on the CPU utilization of the running application.

Original article

Request-Based Autoscaling Is Now Generally Available on App Platform

By Bikram Gupta and Greeshma Pillai

Today, we’re excited to announce that request-based autoscaling on DigitalOcean App Platform is now generally available. Your apps can now automatically scale based on live HTTP traffic signals (requests per second and P95 response latency) so your infrastructure reacts to what’s actually happening, not what happened minutes ago.

Now Available for Shared and Dedicated CPU Instances

Until now, autoscaling on App Platform required a dedicated CPU plan. That meant a good portion of App Platform users (anyone running on shared CPU instances) had no path to automatic horizontal scaling at all.

That changes today. Request-based autoscaling works on both shared and dedicated CPU instances. Whether you’re running an early-stage project on a shared plan or a high-throughput production service on dedicated resources, you can now configure autoscaling to match your traffic—no plan upgrade required.

Faster, More Responsive Scaling

CPU-based autoscaling is reactive by nature. CPU is a lagging indicator: your containers have to be visibly struggling before the scaler knows there’s a problem, and by then, your users are already waiting.

Request-based autoscaling acts on the signals that actually reflect user experience:

Requests per second per instance: how many requests each container is handling right now
P95 request latency: the response time that 95% of requests come in at or under

When traffic rises and either threshold is exceeded, new containers spin up immediately. When load drops and all metrics fall back below their targets, the scaler brings containers back down. You get the capacity headroom you need, faster, and pay only for what you use.

You can also combine request-based and CPU-based metrics on dedicated plans. The autoscaler scales up when any configured threshold is crossed, and scales down only when all metrics are back in range.
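The asymmetric rule above (scale up if any metric crosses its threshold, scale down only when every metric is back in range) is easy to state in code. This sketch follows the rule as described, with hypothetical metric names and a one-container step. DigitalOcean does not document its step size or exactly how "back in range" is measured.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    metric: str       # hypothetical names, e.g. "rps_per_instance", "p95_ms", "cpu_percent"
    threshold: float  # target configured for this metric

def scaling_decision(rules: list[Rule], observed: dict[str, float],
                     current: int, min_count: int, max_count: int) -> int:
    """Return the new container count under the any-up / all-down rule."""
    if any(observed[r.metric] > r.threshold for r in rules):
        return min(current + 1, max_count)  # any threshold crossed: add capacity
    if all(observed[r.metric] < r.threshold for r in rules):
        return max(current - 1, min_count)  # every metric back in range: shed capacity
    return current  # a metric sits exactly at its threshold: hold steady
```

Scaling up on any signal but down only on all of them biases the scaler toward keeping capacity, which trades a little cost for fewer latency regressions.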

Know Your Baseline Before You Set Thresholds

Configuring good autoscaling thresholds starts with understanding your normal traffic patterns. The Insights tab in the App Platform console gives you exactly that.

The Insights tab shows you HTTP Ingress Request Rate (requests per second) and HTTP Ingress Request Duration P95 (your 95th-percentile latency) over time. Use this to understand how your service behaves under normal load before dialing in your autoscaling rules.

How to Configure Request-Based Autoscaling

Using the Control Panel

Go to the Apps page, select your app, open the Settings tab, and select your web service component. In the Resource Size section, click Edit.

Select the Shared CPU or Dedicated CPU tab. Under Scaling, toggle Autoscale on. Set your Minimum Containers and Maximum Containers, then configure at least one autoscaling rule:


Scale on number of requests per second: set a target RPS per instance
Scale on response time and speed (P95): set a target P95 latency in milliseconds
Scale on CPU usage threshold: available on dedicated CPU plans

Click Save. A redeployment kicks off automatically and your app starts autoscaling.

Using the App Spec

Add an autoscaling block to your service component in your app spec. The example below scales between 1 and 10 containers, targeting 100 requests per second per instance and a P95 latency of 500 ms:

name: my-app
services:
- name: web
  github:
    repo: your-org/your-repo
    branch: main
  autoscaling:
    min_instance_count: 1
    max_instance_count: 10
    metrics:
      requests_per_second:
        per_instance: 100
      request_duration:
        p95_milliseconds: 500

Submit your updated spec via doctl apps update or the Apps API. You can tune these values at any time—if your service is scaling earlier than you’d like, raise the target; if you’re seeing latency before new containers arrive, lower it.
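When tuning per_instance, a back-of-the-envelope estimate helps: divide total load by the per-container target, round up, and clamp to the configured bounds. This target-tracking estimate is a common autoscaling heuristic. DigitalOcean does not publish its exact formula, so treat it as an aid for picking values, not a model of App Platform's scaler.

```python
import math

def desired_containers(total_rps: float, per_instance_target: float,
                       min_count: int, max_count: int) -> int:
    """Containers needed to keep each one at or under the RPS target, within bounds."""
    if per_instance_target <= 0:
        raise ValueError("per_instance_target must be positive")
    needed = math.ceil(total_rps / per_instance_target)
    return max(min_count, min(max_count, needed))

# With the example spec above (1 to 10 containers, 100 RPS per instance):
print(desired_containers(250, 100, 1, 10))   # 3
print(desired_containers(40, 100, 1, 10))    # 1
print(desired_containers(1800, 100, 1, 10))  # 10 (capped; each container then sees 180 RPS)
```

Raising per_instance lowers the count for the same traffic, which is the "scaling earlier than you'd like" adjustment described above; the capped case shows when max_instance_count, not the target, is the real limit.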

A few things to keep in mind:

Request-based autoscaling applies to web service components that receive external HTTP traffic. Worker and function components are not eligible.
It cannot be used alongside Scale to Zero (Inactivity Sleep) on the same service.
Scaling decisions are based on a 5-minute rate window, so the autoscaler responds to sustained load rather than momentary spikes.
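The 5-minute rate window explains why a short burst alone does not trigger scaling. This sketch assumes the simplest definition (requests seen in the last 300 seconds divided by 300); DigitalOcean does not say exactly how it computes the rate, and SlidingRate is a hypothetical name.

```python
from collections import deque

class SlidingRate:
    """Average requests per second over a trailing time window."""

    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self.timestamps: deque = deque()

    def record(self, ts: float) -> None:
        """Record one request; timestamps must arrive in non-decreasing order."""
        self.timestamps.append(ts)

    def rate(self, now: float) -> float:
        # Drop requests that fell out of the window (at or before now - window_s).
        while self.timestamps and self.timestamps[0] <= now - self.window_s:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window_s

burst = SlidingRate()
for i in range(3000):              # 3000 requests squeezed into 10 seconds
    burst.record(1000 + i * 10 / 3000)
print(burst.rate(1010))            # 10.0, although the burst itself ran at 300 RPS
```

A 10-second spike averages out to 10 RPS over the window, well under a target like 100 per instance, while the same traffic sustained for five minutes would register in full.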

Get Started With Request-Based Autoscaling

Your traffic doesn’t follow a schedule. Your scaling shouldn’t either. Request-based autoscaling is available now on every DigitalOcean account. Head to the Insights tab to understand your traffic patterns, then configure your autoscaling rules directly in the console or via the app spec.

Centrally enrich logs with data stored in Reference Tables

Logs provide indicators of what is happening in your system, but many lack critical context that helps you answer who owns what or which Indicators of Compromise (IOCs) might be present. Moreover, key data sources like threat intelligence feeds and configuration management databases (CMDBs) receive regular updates that need to be accounted for in production data. When enriching with locally stored datasets, teams have to spend critical time manually updating CSV files and coordinating update jobs across environments. And when enrichment happens after ingestion, teams have to redo the same lookups for each downstream tool that they manage, adding latency and creating inconsistencies.

The Enrichment Table processor in Observability Pipelines helps solve these problems by enabling you to enrich logs with data stored in SaaS-hosted Reference Tables. Reference Tables stay up to date automatically with your integrations, reducing engineering effort since you don’t have to manually update datasets.

Reference Tables support several common enrichment sources that teams already use for operational and security context:

Snowflake stores threat intelligence feeds, user profiles, compliance mappings, and business intelligence data that teams can join with authentication logs, access logs, or detection signals.
ServiceNow CMDB provides asset and service metadata that teams can use to enrich logs to accelerate investigations and route issues to the right responders.
Salesforce provides customer and billing metadata such as account owners, contract tiers, and account segmentation that helps teams prioritize customer-impacting issues.
Databricks offers model outputs such as anomaly scores and fraud likelihood values that teams can attach to transaction and authentication events.
Cloud storage sources (including Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage) hold CSV reference data such as allowlists, denylists, asset inventories, and IP reputation lists that update on a schedule.

For example, you can enrich logs with threat intelligence from feeds like AlienVault that are stored in Snowflake. The following screenshot shows the Enrichment Table processor configured to enrich logs from the alienvault_threat_intel table that dynamically updates from Snowflake.

Once the processor is configured, logs whose IP address matches a key in the table's ip_address column are enriched with the data from that row.

After Observability Pipelines enriches events in your infrastructure, you can route enriched logs to the SIEM or data lake of your choice, including Microsoft Sentinel, CrowdStrike, and Datadog BYOC (Bring Your Own Cloud) Logs.

Apply fresh context to data during threat investigations

Investigating security threats often requires you to revisit historical data for a specific user, device, or process. An IP reputation list can change after an incident begins, or a new fraud model can assign new risk scores to historical transactions. Security investigations become more difficult when older logs lack newer context, especially when this data is stored in cloud storage archives and separated from your logging or SIEM solution because of cost controls or retention strategy.

Using Observability Pipelines, you can rehydrate and extract archived data before applying processing and routing rules. You can pull historical information from your storage buckets and enrich it with current context stored in Reference Tables, and then route normalized data to your SIEM. This workflow helps you apply updated context without rebuilding custom joins in every downstream system.

For example, let’s say that you’re a security engineer investigating a Tier 0 threat by using Microsoft Sentinel and Azure Blob Storage. You can rehydrate archived authentication logs from Azure Blob Storage, enrich the logs with an up-to-date asset list from Snowflake, and route the enriched output into Microsoft Sentinel for correlation with current detections. The enriched logs can highlight connections that were not visible at original ingest time, especially when threat intelligence and scoring datasets changed after the fact.

Process and conditionally route enriched data to your downstream logging tool, SIEM, or data lake

Application logs often lack context that teams need to help them prioritize and make smart routing decisions. Without that context, routing rules tend to rely on static heuristics that have limited business meaning. Enrichment becomes especially valuable when it helps teams keep high-volume, low-risk data in less expensive storage while sending smaller, higher-signal subsets to a SIEM or analytics platform.

With log enrichment in Observability Pipelines, teams can make routing and volume control decisions by using attributes derived from external sources. A pipeline can enrich an event with a threat classification, a customer tier, an ownership team, or an environment label, and then use that information to route data to a destination that matches operational goals. The following diagram shows how Observability Pipelines enriches logs with Reference Table data on-stream:

The numbered steps in the diagram map to the following workflow:

1. The Enrichment Table processor looks up the value of the key field in the local cache (e.g., ip_address:192.0.2.1).
2. If a matching entry is found in the cache (e.g., the IP address matches a cached row of the table), the log is immediately enriched and sent downstream. If the log does not have a valid key field, it is sent downstream immediately without enrichment.
3. A. If the value is not found in the cache, the log is buffered in memory.
   B. The value is also added to the client queue to be checked against the Datadog Reference Tables API.
4. The client is triggered every second or when the queue reaches a certain length, and it fetches all pending keys from the Datadog Reference Tables API.
5. On a successful API response, the entries are stored in the cache and the corresponding logs are pulled out of the buffer, enriched, and sent downstream.
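
The workflow above amounts to a read-through cache with batched lookups. The Python sketch below models that flow under stated assumptions: `fetch` is a hypothetical stand-in for a batched call to the Reference Tables API (not Datadog's client), keys the API does not return are cached as misses and their logs are forwarded unenriched, and enrichment simply merges the table row's columns into the log.

```python
import time
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Log = Dict[str, object]
# Hypothetical stand-in for a batched Reference Tables API call: keys in, rows out.
FetchFn = Callable[[List[str]], Dict[str, Row]]


class EnrichmentSketch:
    def __init__(self, key_field: str, fetch: FetchFn,
                 max_queue: int = 100, interval_s: float = 1.0) -> None:
        self.key_field = key_field
        self.fetch = fetch
        self.max_queue = max_queue        # flush when this many keys are pending...
        self.interval_s = interval_s      # ...or when this much time has passed
        self.cache: Dict[str, Optional[Row]] = {}  # None records "not in the table"
        self.buffer: Dict[str, List[Log]] = {}     # logs parked until their key resolves
        self.queue: List[str] = []                 # keys awaiting an API lookup
        self.last_flush = time.monotonic()

    @staticmethod
    def _enrich(log: Log, row: Optional[Row]) -> Log:
        return {**log, **row} if row else log

    def process(self, log: Log) -> List[Log]:
        """Handle one log; return whatever logs are ready to send downstream."""
        key = log.get(self.key_field)
        if not isinstance(key, str):            # no valid key field: pass through
            return [log]
        if key in self.cache:                   # cache hit: enrich immediately
            return [self._enrich(log, self.cache[key])]
        if key not in self.buffer:              # first miss for this key: queue it once
            self.queue.append(key)
        self.buffer.setdefault(key, []).append(log)  # cache miss: buffer in memory
        return self.maybe_flush()

    def maybe_flush(self, now: Optional[float] = None) -> List[Log]:
        """Fetch all pending keys in one call if the queue is full or the interval elapsed."""
        now = time.monotonic() if now is None else now
        due = len(self.queue) >= self.max_queue or now - self.last_flush >= self.interval_s
        if not self.queue or not due:
            return []
        keys, self.queue = self.queue, []
        self.last_flush = now
        rows = self.fetch(keys)
        ready: List[Log] = []
        for key in keys:
            self.cache[key] = rows.get(key)     # store the result, including misses
            ready.extend(self._enrich(log, self.cache[key]) for log in self.buffer.pop(key, []))
        return ready
```

In a long-running process, a background timer would call `maybe_flush()` about once per second so that buffered logs are released even when no new logs arrive.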

Consider a security pipeline that processes endpoint or network telemetry data. The pipeline can enrich events by using a threat intelligence feed stored in Snowflake and add an attribute that indicates whether an IP address or indicator appears on a benign list, suspicious list, or malicious list. Routing rules can then send benign high-volume activity to Amazon S3 while forwarding suspicious and malicious activity to a SIEM such as Microsoft Sentinel, CrowdStrike, or Datadog Cloud SIEM for faster investigation. This approach reduces noise in expensive downstream tools while keeping richer context attached to high-priority events.
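
A minimal Python sketch of that enrich-then-route pattern, assuming a threat-intelligence lookup keyed by source IP. `THREAT_INTEL`, the `src_ip` and `threat_list` field names, and the destination labels are hypothetical, and unlisted IPs are treated like benign traffic.

```python
# Hypothetical rows from a threat-intelligence Reference Table, keyed by IP.
THREAT_INTEL = {
    "203.0.113.9": "malicious",
    "198.51.100.23": "suspicious",
    "192.0.2.44": "benign",
}
HIGH_PRIORITY = {"suspicious", "malicious"}


def enrich(event: dict) -> dict:
    # Attach the list the IP appears on; unlisted IPs default to "benign" (an assumption).
    return {**event, "threat_list": THREAT_INTEL.get(event.get("src_ip"), "benign")}


def route(events: list) -> dict:
    """Send high-signal events to the SIEM and everything else to cheap object storage."""
    out = {"siem": [], "s3": []}
    for event in map(enrich, events):
        destination = "siem" if event["threat_list"] in HIGH_PRIORITY else "s3"
        out[destination].append(event)
    return out
```

In a real pipeline the routing condition would be a filter on the enriched attribute rather than Python code; the point is that the decision relies on context that was never in the raw log.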

Start enriching your logs with Observability Pipelines

Centralized log enrichment with Reference Tables in Observability Pipelines brings dynamic, managed lookups into log processing that runs inside your infrastructure. You can enrich logs, apply fresh context to accelerate investigations, and use the enriched attributes to guide routing and volume control across destinations such as SIEM tools and cloud storage. To learn more, check out the Observability Pipelines documentation and the Reference Tables documentation.

If you don’t already have a Datadog account, you can sign up for a 14-day free trial to get started enriching your logs.

DEVOURED

Mitigate credential exposure in Windows environments with Boundary and Vault

DevOps securitywindowsidentity HashiCorp

HashiCorp's Boundary and Vault can secure Windows remote access by replacing static RDP credentials with dynamic, short-lived Active Directory credentials and identity-based access.

What: HashiCorp Boundary and Vault together offer a solution to mitigate credential exposure in Windows environments, specifically for RDP, by generating dynamic Active Directory credentials and injecting them, moving away from static credentials and broad VPN access. They provide a Terraform-based AWS proof-of-concept.

Why it matters: This addresses a common security challenge in Windows environments, shifting towards a more secure, identity-driven access model that reduces the attack surface from compromised long-lived credentials, aligning with zero-trust principles.

Takeaway: If managing Windows RDP access, investigate integrating HashiCorp Boundary and Vault for dynamic credential management and identity-based access control.

Deep dive

Boundary provides identity-based remote access for RDP.
Vault generates dynamic, short-lived Active Directory credentials.
The solution replaces static credentials and broad VPN access for Windows remote sessions.
Credentials are injected directly, reducing exposure.
A Terraform-based AWS proof-of-concept is available for implementation guidance.
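
To make "dynamic, short-lived credentials" concrete, here is a toy Python model of the properties that matter: each session gets freshly minted credentials, they expire after a TTL, they can be revoked early, and they are handed straight to the connection rather than to the user. It does not use Vault's or Boundary's APIs; `DynamicCredentialIssuer` and `open_session` are hypothetical names.

```python
import secrets
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lease:
    username: str
    password: str
    expires_at: float  # epoch seconds


class DynamicCredentialIssuer:
    """Toy broker that mints a unique credential per session with a fixed TTL."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self.revoked: set = set()

    def issue(self, now: float) -> Lease:
        # Fresh random username and password for every session.
        return Lease(f"rdp-{secrets.token_hex(4)}", secrets.token_urlsafe(24), now + self.ttl_s)

    def revoke(self, lease: Lease) -> None:
        self.revoked.add(lease.username)

    def is_valid(self, lease: Lease, now: float) -> bool:
        return lease.username not in self.revoked and now < lease.expires_at


def open_session(issuer: DynamicCredentialIssuer,
                 connect: Callable[[str, str], object], now: float) -> object:
    # Credential injection: the secret goes to the connection, never to the user.
    lease = issuer.issue(now)
    return connect(lease.username, lease.password)
```

A leaked credential from this model is only useful until its lease expires or is revoked, which is the exposure reduction the summary describes.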

Decoder

RDP (Remote Desktop Protocol): A proprietary protocol developed by Microsoft that allows a user to graphically connect to another computer over a network connection.
Active Directory (AD): Microsoft's directory service that stores information about network objects (like users, groups, and computers) and makes this information available to users and network administrators.

Original article

Organizations face Windows remote access risks from static credentials and broad VPN-based network access. Boundary and Vault provide identity-based RDP with short-lived dynamic AD credentials and credential injection, plus a Terraform-based AWS proof-of-concept setup.

DEVOURED

Deploying to Multiple Azure Subscriptions with Terraform Provider Aliases

DevOps cloudterraformazure Techielass

Sarah Lean demonstrates how to deploy resources across multiple Azure subscriptions from a single Terraform project using provider aliases and a unified state file.

What: Terraform's provider aliases feature allows defining multiple instances of the `azurerm` provider, each configured for a different Azure subscription ID (e.g., `sub1`, `sub2`, `sub3`), enabling a single Terraform project to manage resources across development, staging, and production environments. Resources are explicitly linked to an aliased provider using the `provider` meta-argument.

Why it matters: This approach simplifies Infrastructure-as-Code management for organizations with multi-subscription Azure environments, reducing the overhead of maintaining separate Terraform projects and state files for each subscription, which improves consistency and reduces operational complexity.

Takeaway: If managing multiple Azure subscriptions with Terraform, consider adopting provider aliases to consolidate your infrastructure code into a single project, simplifying deployment and state management.

Deep dive

By default, Terraform's azurerm provider targets a single Azure subscription.
Organizations often use separate subscriptions for different environments like dev, staging, prod.
Without aliases, this typically means separate Terraform projects and state files per subscription.
Terraform provider aliases allow declaring multiple instances of the same provider within one project.
Each instance is configured with a unique alias and a specific subscription_id.
The article provides a step-by-step guide with code examples for variables.tf, providers.tf, and main.tf.
Resources are "pinned" to a specific provider instance using the provider = azurerm.alias_name meta-argument.
This enables a single terraform plan and terraform apply to manage resources across all aliased subscriptions.
The approach improves consistency, reduces configuration drift, and centralizes state management.
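
Terraform also accepts configuration written in JSON (`.tf.json` files), which makes the alias pattern easy to show from Python. The sketch below generates one aliased `azurerm` provider per subscription and pins a resource group to each with the `provider` meta-argument. The aliases, subscription IDs, resource names, and `westeurope` region are placeholders, the output has not been validated against a live Azure environment, and the article itself uses HCL files rather than generated JSON.

```python
import json


def multi_subscription_config(subscriptions: dict, location: str = "westeurope") -> dict:
    """Build Terraform JSON with one aliased azurerm provider per subscription.

    `subscriptions` maps a provider alias (e.g. "sub1") to an Azure subscription ID.
    """
    providers = [
        # "features" is a required (here empty) block for every azurerm provider.
        {"alias": alias, "subscription_id": sub_id, "features": {}}
        for alias, sub_id in subscriptions.items()
    ]
    resource_groups = {
        f"rg_{alias}": {
            "provider": f"azurerm.{alias}",  # provider meta-argument pins the resource
            "name": f"rg-{alias}",
            "location": location,
        }
        for alias in subscriptions
    }
    return {
        "provider": {"azurerm": providers},
        "resource": {"azurerm_resource_group": resource_groups},
    }


if __name__ == "__main__":
    config = multi_subscription_config({
        "sub1": "00000000-0000-0000-0000-000000000001",
        "sub2": "00000000-0000-0000-0000-000000000002",
        "sub3": "00000000-0000-0000-0000-000000000003",
    })
    # Redirect this output to main.tf.json, then run terraform plan / apply as usual.
    print(json.dumps(config, indent=2))
```

A single `terraform plan` over the generated file covers all three subscriptions, and every resource is tracked in one state file.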

Decoder

Terraform provider aliases: A Terraform feature allowing multiple configurations of the same provider within a single project, differentiated by an alias argument, enabling management of resources across different environments or accounts of the same cloud provider.
azurerm provider: The official Terraform provider for managing resources in Microsoft Azure.
provider meta-argument: A Terraform argument used within a resource block to explicitly specify which provider configuration (including aliased ones) should be used for that resource.
Azure subscription: A logical container for Azure services, managed by an Azure account, often used to separate billing, environments, or organizational units.

DEVOURED

Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks

DevOps aillminfrastructureperformance Databricks

Databricks now offers automatic prompt caching for open-source LLMs like Llama and Mistral hosted on its platform, significantly boosting inference performance without configuration.

What: Databricks has extended its automatic prompt caching feature to open-source LLMs, including GPT-OSS, Llama 3.1, Gemma 3, Mistral, and DBRX models, on its Foundation Model APIs. This capability reuses identical prompt prefixes to reduce redundant processing, resulting in a 2.5x increase in throughput and a 3x reduction in P50 latency for GPT-OSS in production, and requires no customer setup.

Why it matters: This move makes serving open-source LLMs on Databricks more competitive and efficient, directly addressing the common issue of repetitive prompt processing that wastes compute resources and increases costs, especially for applications using long system prompts or frequent identical requests.

Takeaway: If you're running open-source LLMs on Databricks, expect automatic performance improvements in inference speed and cost-efficiency due to the new prompt caching feature.

Deep dive

Databricks has rolled out automatic prompt caching for open-source LLMs hosted on its Foundation Model APIs (FMAPIs).
This feature applies to batch inference, pay-per-token, and provisioned-throughput workloads.
Supported open-source models include GPT-OSS (20B, 120B), Gemma 3 12B, Llama 3.1 8B (base and fine-tuned), and Llama 3.3 70B.
Prompt caching works by reusing repeated prompt prefixes, skipping the "prefill" computation for the cached portion of the prompt.
This significantly reduces latency and increases throughput.
In real-world production testing on GPT-OSS, it led to a 2.5x increase in per-replica input-token throughput and a 3x reduction in P50 latency, with a 30% cache hit ratio.
The caching is entirely automatic; customers do not need to configure anything.
Prompt caches are isolated, reside only in volatile memory, and are never persisted, ensuring data security.
Databricks previously offered this feature for proprietary models (GPT, Gemini, Claude).

Decoder

LLM (Large Language Model): A type of artificial intelligence model trained on vast amounts of text data to understand, generate, and process human language.
Inference: The process of using a trained machine learning model to make predictions or generate outputs on new, unseen data.
Prompt caching: A technique in LLM inference where the intermediate computational results (specifically, the key-value cache or KV cache) for repeated prompt prefixes are stored and reused, avoiding re-computation and speeding up subsequent requests with the same prefix.
Prefill stage: The initial phase of LLM inference where the input prompt tokens are processed to generate the first set of internal representations (KV cache) before the model starts generating output tokens one by one.
P50 latency: The 50th percentile latency, meaning 50% of requests are completed within this time or faster. It's a measure of typical performance.
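
A toy Python model of prefix caching, to show why only the uncached tail of a prompt pays for prefill. It is not Databricks' implementation: it assumes block-granular caching in the style of open-source engines such as vLLM, uses integer token IDs, and omits the KV tensors themselves. `PrefixCache` and `BLOCK` are hypothetical names.

```python
BLOCK = 4  # tokens per cached block (kept tiny here for readability)


class PrefixCache:
    """Tracks which block-aligned prompt prefixes already have KV state cached."""

    def __init__(self) -> None:
        # Keyed on the whole prefix: a block is reusable only if every earlier token
        # matches too, because KV entries depend on all preceding tokens.
        # Production engines hash chains of blocks instead of storing whole prefixes.
        self.prefixes: set = set()

    def prefill(self, tokens: list) -> int:
        """Return how many tokens still need prefill compute, then cache this prompt."""
        full = len(tokens) - len(tokens) % BLOCK  # only complete blocks are cached
        hit = 0
        for end in range(BLOCK, full + 1, BLOCK):
            if tuple(tokens[:end]) in self.prefixes:
                hit = end
            else:
                break  # first uncached block ends the reusable prefix
        for end in range(BLOCK, full + 1, BLOCK):
            self.prefixes.add(tuple(tokens[:end]))
        return len(tokens) - hit
```

For two requests sharing an 8-token system prompt, the first pays prefill for all 11 of its tokens, while the second (13 tokens) pays only for its 5 new ones. Putting the same token in front of the system prompt breaks the match, which is why long, stable system prompts benefit most.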

DEVOURED

The Hugo evolution: Engineering Grab's unified, one-click data ingestion platform with Apache Flink

Data devopsinfrastructureflinkkafka Grab Engineering

Grab dramatically reduced data pipeline onboarding from days to minutes by unifying self-service data ingestion on a new Flink-based platform called Hugo, replacing Kafka Connect and Sprinkler.

What: Grab's Hugo platform evolved from siloed data ingestion workflows to a unified, automated Flink-based system for MySQL CDC and Kafka pipelines. This change reduced operational overhead, eliminated schema risks via dynamic validation, and increased new pipeline adoption significantly.

Why it matters: This case study demonstrates how consolidating and automating fragmented data ingestion tools with a stream processing engine like Flink can yield massive productivity gains and democratize data access within a large organization.

Takeaway: Consider adopting a unified streaming data platform like Apache Flink for CDC and Kafka ingestion to reduce operational complexity and accelerate data onboarding.

Decoder

CDC (Change Data Capture): A set of software design patterns used to determine and track changes to data so that actions can be taken based on those changes. In databases, this often means reading transaction logs (binlogs).
Apache Flink: An open-source stream-processing framework for distributed, high-performing, and always-on data applications. It can perform stateful computations over unbounded and bounded data streams.
Kafka Connect: An open-source framework for connecting Kafka with other systems, allowing data to be streamed in and out of Kafka.
Apache Iceberg: An open table format for huge analytic datasets. Iceberg adds SQL table capabilities to files in data lakes, like schema evolution, hidden partitioning, and time travel.
Confluent Schema Registry: A service for storing and retrieving Avro, Protobuf, and JSON Schema schemas. It helps ensure data compatibility and evolution in Kafka-based systems.
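
The summary does not describe Hugo's internals, so the following is only a generic Python illustration of the CDC semantics defined above: Debezium-style change events (`op` of "c", "u", or "d" with before/after row images) are applied to a table keyed by primary key, and each new row image is checked against an expected schema first. The event shape, `apply_change`, and the strict "reject unknown columns" rule are assumptions for illustration.

```python
Schema = dict  # column name -> expected Python type


def validate(row: dict, schema: Schema) -> None:
    """Reject rows whose columns or types drift from the expected schema (hypothetical rule)."""
    unknown = set(row) - set(schema)
    if unknown:
        raise ValueError(f"unexpected columns: {sorted(unknown)}")
    for col, value in row.items():
        if value is not None and not isinstance(value, schema[col]):
            raise TypeError(f"{col}: expected {schema[col].__name__}, got {type(value).__name__}")


def apply_change(table: dict, event: dict, schema: Schema, pk: str = "id") -> None:
    """Apply one change event: 'c' create, 'u' update, 'd' delete."""
    op = event["op"]
    if op in ("c", "u"):
        after = event["after"]
        validate(after, schema)
        if op == "u" and event.get("before"):
            table.pop(event["before"][pk], None)  # handles primary-key changes
        table[after[pk]] = after
    elif op == "d":
        table.pop(event["before"][pk], None)
    else:
        raise ValueError(f"unknown op {op!r}")
```

Validating before applying means a drifting upstream schema stops the event at the pipeline boundary instead of silently corrupting the sink.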

DEVOURED

From Batch to Streaming and AI, Iceberg for Everyone by Everyone (34 minute video)

Data opensourcestreamingaidatabase YouTube

Apache Iceberg, while successful for batch analytics and now supporting semi-structured data in v3, still requires significant community enhancements like "One File Commits" in v4 to fully support low-latency streaming and AI workloads.

What: Apache Iceberg has advanced from v1 for batch analytics to v3, which added vendor-neutral support for semi-structured data and improved deletes. However, the format needs further development in v4, including features like One File Commits, better column statistics, and columnar metrics, to effectively handle low-latency streaming and AI workloads.

Why it matters: The evolution of open table formats like Iceberg reflects the industry's push to unify batch, streaming, and AI data processing on a single, performant data lake foundation, highlighting the ongoing challenges in achieving universal capabilities.

Takeaway: If you're planning for next-generation data lake architectures with tight streaming or AI integration, keep an eye on Apache Iceberg's v4 roadmap for features like One File Commits.

Decoder

Apache Iceberg: An open table format for huge analytic datasets. It brings reliable, SQL table semantics to data stored in data lakes (e.g., S3, HDFS), offering features like schema evolution, hidden partitioning, and time travel.
One File Commits: A proposed feature for Apache Iceberg aimed at reducing the number of files written during micro-batch or streaming ingestion, improving performance and reducing metadata overhead for low-latency writes.
Columnar metrics: Aggregate statistics kept per column for each data file (e.g., min/max values, null counts), which query engines can use for predicate pushdown and query optimization without reading full data files.
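
Column statistics matter because they let a planner skip files entirely. The Python sketch below shows min/max pruning in isolation: given per-file lower and upper bounds for a column, it keeps only the files whose range can overlap a query's range predicate. `DataFileStats` and `files_to_scan` are hypothetical names; in Iceberg these bounds live in the table's manifest metadata, and pruning is done by the query engine or Iceberg library.

```python
from dataclasses import dataclass


@dataclass
class DataFileStats:
    path: str
    lower: dict  # column -> minimum value in the file
    upper: dict  # column -> maximum value in the file


def files_to_scan(files: list, column: str, lo: int, hi: int) -> list:
    """Return paths of files that may contain rows with lo <= column <= hi."""
    keep = []
    for f in files:
        if column not in f.lower or column not in f.upper:
            keep.append(f.path)  # no stats for this column: scan conservatively
        elif f.upper[column] >= lo and f.lower[column] <= hi:
            keep.append(f.path)  # ranges overlap: the file might match
    return keep
```

A file with missing statistics can never be skipped, which is one reason better column statistics appear on the v4 wish list.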

Original article

While Apache Iceberg has seen strong success from batch analytics in v1 to the recent v3 table spec, which added vendor-neutral support for semi-structured data and improved deletes, the format still requires significant enhancements for low-latency streaming and AI workloads. The community is working on v4 to support One File Commits, better column statistics, and columnar metrics, to make Iceberg truly universal.

DEVOURED

DuckDB 1.5.3: Not an Ordinary Patch Release

Data databaseopensourcesql DuckDB

DuckDB's v1.5.3 patch release introduces "Quack" as a core beta extension, enabling client-server database functionality and enhancing Iceberg, AWS, and HTTPS proxy support.

What: DuckDB v1.5.3, a patch release, now ships with Quack as a core extension, allowing DuckDB to operate as a client-server database. It also adds new features to the DuckLake, AWS (supporting IAM Roles for Service Accounts and RDS/Aurora IAM auth), and Iceberg extensions (MERGE INTO, INSERT/UPDATE on partitioned tables, ALTER TABLE, GEOMETRY type). Quack is in beta, with a production-ready version planned for DuckDB v2.0 in fall 2026.

Why it matters: The addition of Quack signifies DuckDB's strategic move beyond an in-process OLAP database to a more versatile client-server model, broadening its use cases for data engineers in distributed and cloud environments.

Takeaway: If you use DuckDB, consider experimenting with the new Quack extension for client-server patterns, but be aware it is still in beta and may have breaking changes before v2.0 in late 2026.

Deep dive

DuckDB v1.5.3 is released, containing significant new features delivered through extensions despite being a patch release.
The "Quack" protocol, introduced on May 12, is now a core beta extension, transforming DuckDB into a client-server database.
Quack enables client applications to connect to a remote DuckDB instance transparently.
DuckLake, DuckDB's lakehouse format, now supports DuckDB with Quack as its catalog database.
The AWS extension gains support for IAM Roles for Service Accounts (IRSA) via the web_identity chain type and IAM authentication for managed PostgreSQL databases on RDS/Aurora.
The HTTPS extension now respects the HTTP_PROXY environment variable for extension installs and network requests.
The DuckDB-Iceberg extension receives numerous updates, including MERGE INTO, INSERT and UPDATE for partitioned tables, CTAS via ADBC, and ALTER TABLE support.
Internal changes include shipping jemalloc as a statically linked core library on Linux for cleaner packaging and fixing the DISABLE_EXTENSION_LOAD compile-time flag.
Quack is expected to become production-ready with DuckDB v2.0 in fall 2026.

Decoder

Quack: A new remote protocol that turns DuckDB into a client-server database, allowing clients to connect to a remote DuckDB instance.
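DuckLake: An open lakehouse format from the DuckDB team that stores data as Parquet files and keeps table metadata in a SQL catalog database.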
IRSA (IAM Roles for Service Accounts): An AWS feature that allows Kubernetes service accounts to assume IAM roles, providing fine-grained permissions to pods.
ADBC (Apache Arrow Database Connectivity): A standard for high-performance data access to databases, based on Apache Arrow.

DEVOURED

Introducing Dimster, a performance benchmarking tool for Apache Kafka

Data devopsperformancekubernetesjava Jack Vanlightly

Jack Vanlightly released Dimster, an open-source Kafka benchmarking tool designed for "Dimensional Testing" across various workloads and configurations, with Kubernetes support for standardized deployment.

What: Dimster is a new open-source performance benchmarking tool for Apache Kafka, created by Jack Vanlightly. It supports "Dimensional Testing" to explore the impact of specific configurations or workload aspects. It offers four test modes (Run, Explore, Drain-backlog, Correctness), produces shareable JSON results with logs and interactive charts/Grafana dashboards, and is designed to run on Kubernetes.

Why it matters: This tool addresses the complexity of comprehensive Kafka performance testing by providing structured, reproducible benchmarking that simplifies identifying bottlenecks and understanding performance envelopes across diverse deployment scenarios and configurations.

Takeaway: If you manage Apache Kafka clusters, explore Dimster on GitHub for more structured and reproducible performance testing, especially if you use Kubernetes for deployment.

Deep dive

Dimster is an open-source performance benchmarking tool for Apache Kafka, created by Jack Vanlightly.
It is designed for "Dimensional Testing," allowing users to systematically vary single or co-varying dimensions of configuration or workload to analyze performance impact.
Results are self-contained and shareable, including JSON, CSV, source configs, log files, and interactive charts/Grafana dashboards (as HTML).
Supports four test modes: Run (fixed throughput, live interaction, optional availability), Explore (finds peak sustainable throughput under latency targets), Drain-backlog (times backlog processing), and Correctness (detects data loss, corruption, out-of-order, duplicates).
Provides CLI commands for pre-benchmark resource calculation (resources), comparing runs (compare), and pivoting results (pivot).
Kubernetes is a standardized runtime for Dimster, simplifying deployment and orchestration across various environments (local, EKS, GKE).
Dimster can deploy Kafka clusters to Kubernetes or connect to external Kafka services.
The tool is written in Java and leverages modern JVM features.
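
Explore mode is described only at a high level, so this Python sketch shows one generic way to find peak sustainable throughput, not Dimster's algorithm: binary-search the offered rate, keeping the highest rate whose measured p99 latency stays within target. It assumes latency rises monotonically with rate and that the lower bound is known to be sustainable; `measure_p99_ms` is a hypothetical callback that would run one load step and report its p99.

```python
import math
from typing import Callable


def p99(latencies_ms: list) -> float:
    """Nearest-rank 99th percentile of a non-empty sample."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * 99 / 100)
    return ordered[rank - 1]


def find_peak_throughput(measure_p99_ms: Callable[[float], float], p99_target_ms: float,
                         low: float, high: float, tolerance: float = 1_000.0) -> float:
    """Highest rate (msgs/s) found whose p99 stays within target, to within `tolerance`."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if measure_p99_ms(mid) <= p99_target_ms:
            low = mid   # sustainable: look higher
        else:
            high = mid  # target missed: look lower
    return low
```

Each probe is a full load step, so a search from 0 to 200,000 msgs/s with a 1,000 msgs/s tolerance costs about eight steps; real runs also need warm-up time and repeated samples per step, which this sketch omits.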

Decoder

Dimensional Testing: A benchmarking technique where configurations or workload aspects (dimensions) are systematically varied to observe their impact on performance.
p99 end-to-end latency: The 99th percentile of the total time taken for a message to travel from producer to consumer.
mTLS (mutual TLS): A security protocol where both client and server authenticate each other using TLS certificates.
OpenMessagingBenchmark (OMB): An existing open-source benchmarking framework for messaging systems, which inspired aspects of Dimster.

DEVOURED

Bintrail: MySQL Time-Travel Queries Using Indexed Binlogs

Data databasebackendmysqldevops InfoQ

Bintrail introduces time-travel and diff queries to MySQL via ProxySQL and indexed binlogs, enabling point-in-time recovery and audit without schema changes.

What: Daniel Guzman-Burgos's Bintrail project adds AS OF and BETWEEN time-travel queries and row-history lookups to MySQL, a feature traditionally lacking in the database. It works by parsing and indexing MySQL ROW-format binary logs behind ProxySQL, generating reversal SQL for recovery. This allows querying data as it existed at a past timestamp and reviewing row-level changes, independently of MySQL's binlog retention.

Why it matters: Bintrail fills a long-standing gap in MySQL's capabilities, providing native-like temporal querying that is crucial for modern audit, compliance, and disaster recovery needs, especially as automation in database operations increases the demand for precise historical data access.

Takeaway: If you operate MySQL databases and require point-in-time recovery or detailed audit trails, investigate Bintrail as a solution to introduce temporal querying capabilities without modifying your existing MySQL instances or application code.

Deep dive

Bintrail is a new layer developed by Daniel Guzman-Burgos that brings point-in-time queries and row-history lookups to MySQL.
It provides AS OF and BETWEEN time-travel queries to MySQL, a feature available natively in Oracle, SQL Server, MariaDB, and via extensions in PostgreSQL.
The system operates transparently behind ProxySQL, routing historical query patterns (e.g., _flashback, _diff, _snapshot) to its own backend while regular MySQL traffic remains untouched.
Bintrail parses MySQL ROW-format binary logs, indexing every row event with full before/after images.
It generates reversal SQL, allowing point-in-time recovery without needing the original binlog files.
The indexed history store is maintained independently of MySQL's binlog retention, enabling historical queries over longer periods.
It can optionally extend historical queries into archived Parquet data stored on S3.
No ALTER TABLE or special storage engine is required; it works with existing MySQL instances.
Current limitations include support only for literal timestamp queries, primary-key lookups, and capped full-table restores, with joins and complex filtering handled outside the shim layer.
Bintrail is available on GitHub under the BUSL (Business Source License).
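
The reversal idea can be sketched from row images alone: an insert is undone by deleting the row, a delete by re-inserting its before image, and an update by writing the before image back. The Python below assumes a hypothetical decoded-event shape and a single-column primary key, emits parameterized statements (`%s` placeholders, as used by common MySQL drivers) instead of splicing values into SQL, and is not Bintrail's code.

```python
def reversal_sql(event: dict, table: str, pk: str = "id") -> tuple:
    """Build a parameterized statement that undoes one ROW-format change.

    `event` is a hypothetical decoded form:
    {"type": "insert" | "update" | "delete", "before": {...} or None, "after": {...} or None}
    """
    kind, before, after = event["type"], event.get("before"), event.get("after")
    if kind == "insert":   # undo insert -> delete the new row
        return f"DELETE FROM `{table}` WHERE `{pk}` = %s", [after[pk]]
    if kind == "delete":   # undo delete -> re-insert the old row
        cols = list(before)
        col_list = ", ".join(f"`{c}`" for c in cols)
        marks = ", ".join(["%s"] * len(cols))
        return f"INSERT INTO `{table}` ({col_list}) VALUES ({marks})", [before[c] for c in cols]
    if kind == "update":   # undo update -> restore the before image
        cols = list(before)
        assignments = ", ".join(f"`{c}` = %s" for c in cols)
        return (f"UPDATE `{table}` SET {assignments} WHERE `{pk}` = %s",
                [before[c] for c in cols] + [after[pk]])
    raise ValueError(f"unsupported event type {kind!r}")


def rewind(events: list, table: str, pk: str = "id") -> list:
    # Undo newest-first so each reversal sees the state its event produced.
    return [reversal_sql(e, table, pk) for e in reversed(events)]
```

Rolling a table back to a timestamp means applying the reversals for every event after that point, newest-first, which is what `rewind` returns.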

Decoder

ProxySQL: A high-performance, high-availability, protocol-aware proxy for MySQL, which can route queries based on rules.
Binlog (Binary Log): A log of all changes to a MySQL database, used for replication and point-in-time recovery. ROW-format means it logs changes at the row level.
AS OF query: A type of temporal query that allows retrieving the state of data as it existed at a specific past timestamp.
BETWEEN query: A type of temporal query that allows retrieving all changes to data within a specified time range.
GTID (Global Transaction Identifier): A unique identifier for a transaction committed on a MySQL server, used for easier replication and failover.
BUSL (Business Source License): A source-available license that restricts production use for a certain period, after which the code becomes open source (e.g., Apache 2.0).

DEVOURED

We're Introducing Real Time Design with Google Stitch

Design aifrontend Google Blog

Google has launched Stitch, a real-time AI design tool allowing collaborative partnership with an AI agent for live design iterations using text or voice.

What: Google Stitch, announced at I/O, is a new real-time design tool where an AI agent streams design work directly to a canvas, allowing users to steer iterations with text or voice prompts. Designs can be exported to Google Antigravity or published via Netlify.

Why it matters: This represents a significant push by Google into AI-assisted design, aiming to make the design process more fluid and collaborative with AI as an active partner rather than just a generator.

Takeaway: Developers or designers interested in AI-powered real-time design collaboration can try Google Stitch, available globally today.

Decoder

Google Antigravity: Google's agent-first development environment (IDE), into which Stitch designs can be exported to build out application logic.

DEVOURED

Replit Launches the Newest Version of its Popular Vibe Coding App

Design aiagentsmobilecareer Mashable

Replit released Agent 4, the newest version of its vibe coding app, for iOS and iPadOS after Apple lifted a four-month ban on updates; the release introduces parallel agents and merged project flows.

What: Replit's CEO Amjad Masad announced the release of Agent 4 for iPhone and iPad, four months after Apple temporarily banned updates due to a dispute over App Store guideline 2.5.2 regarding apps executing external code. New features include parallel agents, collaborative merged flows, and multi-workspace project viewing.

Why it matters: The resolution of Apple's ban on Replit's Agent app sets a precedent for how AI-driven coding tools, which execute code, might coexist with App Store guidelines, hinting at potential compromises or revised interpretations.

Takeaway: If you use Replit's Agent app, you can now download and use Agent 4 on your iPhone or iPad to access new features like parallel agents and collaborative project flows.

Decoder

Vibe coding: A term coined by Andrej Karpathy in early 2025 and since embraced by tools such as Replit, referring to a style of coding where AI agents generate code or entire applications from user prompts, often with a more intuitive or 'flow-state' feel.
App Store guideline 2.5.2: An Apple guideline stating that "Apps should be self-contained in their bundles, and may not read or write data outside the designated container area, nor may they download, install, or execute code which introduces or changes features or functionality of the app, including other apps."


DEVOURED

AI Gives Us the Prototype. It Doesn't Give Us the Brand

Design aienterprise The Drum

AI competently handles 80% of design, creating generic prototypes, but fails to deliver the 20% that forms unique brand identity and emotional connection.

What: Yann Caloghiris, Executive Creative Director at Left Field Labs, references UX expert Natalie Levy-Acosta's findings that AI tools excel at 80% of DUX design, like structural scaffolding and prototypes. However, the remaining 20%—interaction feedback, visual language, and creative direction—which builds brand identity and emotional connection, is beyond AI. This leads to "design parity," where AI-generated interfaces look similar, causing brand erosion.

Why it matters: This exposes a critical limitation of current AI in creative fields: while it boosts efficiency for standard tasks, it struggles with the nuanced, human-centric elements essential for brand differentiation, pushing companies towards commoditization if they rely solely on AI output.

Takeaway: When using AI for design, reinvest the time it saves, along with significant human effort, into the crucial 20% of creative direction and brand-specific elements to avoid design parity and brand erosion.

Decoder

Digital User Experience (DUX) design: A design process focused on optimizing the overall experience a user has with a digital product or service.
Design parity: A situation where multiple products or interfaces, often due to over-reliance on similar AI tools or design patterns, end up looking and feeling indistinguishable from each other, lacking unique brand identity.


DEVOURED

Launch-ready Product Demo Videos (Website)

Design aiwebdevopsvideo Slideshot.ai

Slideshot.ai uses an AI agent to automatically navigate web apps and generate polished product demo videos from text descriptions, removing manual recording and editing.

What: Slideshot is an AI agent that creates product demo videos by following text instructions to interact with web applications, including authenticated areas. It offers an API and charges $0.90 per recording request, with typical runs completing in 5-10 minutes.

Why it matters: This represents an automation trend for repeatable marketing and documentation tasks, shifting from manual screen recording to declarative, AI-driven asset generation, addressing the pain point of keeping demos current with UI changes.

Takeaway: If your team frequently creates and updates product demo videos for web apps, investigate Slideshot.ai's API for automating this process.

Deep dive

Slideshot.ai is an AI agent that automates the creation of product demo videos for web applications.
Users provide a product URL and a text description of the feature flow they want to demonstrate.
The AI agent drives the web app in a browser, records the walkthrough, and returns a polished MP4 video.
It supports authenticated product areas by configuring credentials.
The service aims to eliminate the tediousness of manual screen recording and editing, especially when UIs change frequently.
Slideshot offers an API for integration into automated workflows, costing $0.90 per recording request with no monthly subscription.
Most recordings are completed within 5 to 10 minutes, with longer demos potentially taking 20+ minutes via an asynchronous API.
The platform is positioned as a solution for repeatable demo generation, unlike manual screen recorders such as Screen Studio or Loom.
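Because pricing is per request with no monthly subscription, the running cost of keeping demos current is simple arithmetic. A minimal sketch, assuming the $0.90 price quoted above and a hypothetical team that re-records every demo once per release; the function and parameter names are illustrative and are not part of Slideshot's API.

```python
PRICE_CENTS_PER_RECORDING = 90  # $0.90 per recording request, as quoted above


def monthly_cost_usd(demos: int, releases_per_month: int) -> float:
    """Estimated monthly spend if every demo is re-recorded once per release.

    Hypothetical usage pattern; the math is done in integer cents so the
    only float operation is the final division back to dollars.
    """
    requests = demos * releases_per_month
    return requests * PRICE_CENTS_PER_RECORDING / 100


# Example: 12 demos re-recorded across 4 releases a month = 48 requests.
print(monthly_cost_usd(12, 4))  # 43.2
```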

Decoder

AI agent: A software program that uses artificial intelligence to perform tasks autonomously, often by interacting with other systems or environments, in this case, a web browser.


DEVOURED

In 2026, here's what creative recruiters are looking for in juniors

Design careercreative-industry Creative Boom

Creative recruiters in 2026 prioritize a junior designer's original thinking, attitude, and ability to explain their creative process over a technically perfect portfolio, especially amidst increased competition from AI.

What: Recruiters from companies like PlayStation (Matt Redway), Noramble (Daniel Poll), and Wolff Olins (James McNaught) emphasize that junior designers need to demonstrate the "why" behind their work, a genuine curiosity, a positive attitude, and a willingness to learn. They value personality, collaboration, and adaptability more than just technical polish, as stated by Edward Dalton of HelloYes and Pablo Marques of Raw Materials.

Why it matters: The article highlights a significant shift in design recruitment, where soft skills, critical thinking, and a human-centric approach are becoming paramount. This is a direct response to AI's growing capabilities in execution, underscoring that human judgment and original ideas are what truly differentiate junior talent in a competitive market.

Takeaway: Junior designers should focus on articulating their creative process and showing their personality in interviews and portfolios, emphasizing curiosity and problem-solving over technical perfection.

Deep dive

Recruiters are seeing increased competition for junior design roles due to AI and economic factors.
The most crucial trait is the ability to explain the "why" behind design decisions, not just showcasing the final product.
Matt Redway of PlayStation, Daniel Poll of Noramble, and James McNaught of Wolff Olins stress the importance of unexpected, meaningful work and understanding the "big idea."
Attitude, passion, curiosity, and a willingness to learn are more valued than raw talent, according to Edward Dalton of HelloYes.
Pablo Marques of Raw Materials looks for good taste, willingness to listen, and fearlessness, valuing those who are aware of what they don't yet know.
Recruiters like Mélanie Hubert-Crozet of monopo london and Tom Muller of helloMuller look for unique styles and personal viewpoints in portfolios, not technical perfection.
James Le Beau-Morley encourages juniors to show "raw ideas" and "weird stuff" to avoid conforming too early.
How a candidate shows up (energy, thoughtful questions, humility, personality) is critical; Alex Dixon of Dacre recalls a student's phone call as refreshing.
Studios hire people, not just portfolios, and value candidates who are adaptable, collaborative, and can contribute to office culture, as explained by Rodd Chant and Chris Woodhams of Cafeteria.


DEVOURED

Pixar Ditches its 3D Look for the First Time – And It's Glorious

Design animationartfilm Creative Bloq

Pixar is abandoning its signature 3D animation for the first time with "Gatto," a hand-painted film set in Venice about a cat named Nero indebted to a feline mob boss.

What: Pixar's upcoming film, "Gatto," will be the studio's first hand-painted animated feature. It centers on a cat protagonist named Nero in Venice who owes a debt to a mob boss, moving away from Pixar's established CGI style.

Why it matters: This move by Pixar signals a willingness within major animation studios to explore diverse aesthetic approaches beyond the dominant CGI paradigm, potentially inspiring a resurgence of traditional animation techniques and expanding creative possibilities for storytelling.


DEVOURED

Reasonix (Website)

AI agentscodingopensource DeepSeek Reasonix

Reasonix is a DeepSeek-native coding agent designed for the terminal, optimized for low token costs across long sessions using prefix-cache stability.

What: Reasonix is a coding agent developed specifically for DeepSeek models, intended to be run in a terminal environment. Its core feature is "prefix-cache stability," which helps maintain low token costs even during extended coding sessions.

Why it matters: Reasonix shows developer tooling being designed around a specific model provider's cost mechanics: by keeping its prompts cache-friendly, it aims to make a persistent, terminal-based coding assistant cheap enough to leave running for long sessions.

Takeaway: Developers working with DeepSeek models who prefer a terminal-based coding assistant should investigate Reasonix for potentially more efficient and longer-running AI-assisted coding sessions.

Decoder

Coding agent: An AI agent designed to assist with or perform coding-related tasks, such as writing code, debugging, or refactoring.
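Prefix-cache stability: Keeping the beginning of each request (system prompt, tool definitions, earlier conversation) identical from one call to the next so that it keeps matching what the provider has already cached. With prefix caching, tokens that match a previously processed prefix reuse the cached computation instead of being reprocessed, which reduces latency and, where cache hits are billed at a lower rate, token costs.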
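A toy sketch of why prefix stability matters, assuming a cache that remembers only the previous request and using whitespace-split words as stand-in tokens. Real providers cache in blocks and price hits differently; the function names and the per-turn counter are hypothetical and do not describe how Reasonix is implemented.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def shared_prefix_len(prev: Tokens, curr: Tokens) -> int:
    """Number of leading tokens that are identical in both requests."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def stable_prompt(system: str, history: List[str], turn: int) -> Tokens:
    # Fixed system prompt first, then history appended in order, so each
    # request extends the previous one. `turn` is unused; it only keeps the
    # signature uniform with unstable_prompt.
    tokens = system.split()
    for msg in history:
        tokens.extend(msg.split())
    return tokens


def unstable_prompt(system: str, history: List[str], turn: int) -> Tokens:
    # A value that changes every turn (a hypothetical turn counter) sits at
    # the very front, so consecutive requests share no prefix at all.
    return [f"[turn={turn}]"] + stable_prompt(system, history, turn)


def simulate(builder: Callable[[str, List[str], int], Tokens],
             system: str, messages: List[str]) -> Tuple[int, int]:
    """Return (cached, uncached) token totals over a session."""
    cached = uncached = 0
    prev: Tokens = []
    for turn in range(len(messages)):
        prompt = builder(system, messages[: turn + 1], turn)
        hit = shared_prefix_len(prev, prompt)
        cached += hit
        uncached += len(prompt) - hit
        prev = prompt
    return cached, uncached


system = "you are a coding agent"
messages = ["fix the bug", "add a test", "run it"]
print(simulate(stable_prompt, system, messages))    # (19, 13)
print(simulate(unstable_prompt, system, messages))  # (0, 35)
```

Even in this three-turn toy, moving one changing value to the front turns most of the session from cache hits into freshly processed tokens, and the gap grows with every additional turn.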

Original article

Reasonix is a DeepSeek-native coding agent for the terminal. It is engineered around prefix-cache stability and designed to be left running. Token costs stay low across long sessions.

DEVOURED

David Sacks's 11th-Hour Plea Led to Trump's Backtrack on AI Executive Order

AI policystartup WSJ

Venture capitalist David Sacks persuaded President Trump to postpone an AI executive order, arguing that the mandatory regulations it could lead to would hinder the U.S. in its AI race with China.

What: David Sacks, a prominent venture capitalist, urged President Trump on a call to hold off on a broad executive order on AI dangers. Sacks argued that the order could lead to mandatory regulations that would slow the U.S. AI industry in its race against Chinese competitors. Trump said he shared concerns about China and about hindering AI investment, postponed the signing, and told reporters he wouldn't sign the order.

Why it matters: This incident highlights the significant influence of tech industry leaders, particularly venture capitalists like David Sacks, on major policy decisions, potentially shaping the regulatory landscape for AI in favor of rapid innovation over safety guardrails.

Original article

David Sacks, a venture capitalist, warned President Trump on a call that a long-awaited executive order on the dangers posed by artificial intelligence, which Trump was deliberating over, could lead to mandatory regulations that slow down the industry in its race with Chinese competitors. Trump responded that he shared concerns about China and was worried about hindering AI investment. He then postponed the signing and told reporters he wouldn't sign the order. The incident shows how powerful Sacks's influence is and marks a win for those opposed to strong guardrails limiting the risks posed by the technology.

DEVOURED

Paperwork is better when you can just talk through it

AI frontendux Thread Reader App

ChatGPT now allows users to upload form images and use voice commands or text to fill them out automatically, streamlining paperwork.

What: OpenAI's ChatGPT has introduced a new feature enabling users to upload an image of a form. Users can then provide details either through voice mode or text input, and the chatbot will automatically fill out the form for them, simplifying document completion.

Why it matters: This feature represents a practical application of multimodal AI, leveraging both vision and natural language processing to automate mundane tasks, indicating a trend towards more intuitive, conversational interfaces for interacting with traditional documents and workflows.

Takeaway: If you frequently deal with digital forms that require manual entry, try using ChatGPT's new image and voice input feature to automate the filling process.

Original article

Paperwork is better when you can just talk through it.

With Images in ChatGPT and voice mode, you can upload a form, say what to fill in, and get back a completed version.

You can do this without voice, too.

Upload a form image, add the details you want included, and ChatGPT can fill it out for you.


DEVOURED

Sundar Pichai Understands Why People Are Anxious About AI

Tech aipolicycareer The New York Times

Google CEO Sundar Pichai believes AI is humanity's most profound technology, acknowledging public anxiety and the industry's need to better showcase its benefits.

What: Google CEO Sundar Pichai shared in a New York Times interview that he considers AI the most profound technology humanity will ever work on, recognizing natural public anxiety due to its rapid progress and stressing the industry's responsibility to demonstrate its benefits. He discussed the future of Google Search, AI agents, and advice for students.

Why it matters: This reflects the ongoing public relations challenge and ethical considerations for major AI developers like Google, as they navigate rapid technological advancements with widespread societal implications and concerns.

Original article

Sundar Pichai believes that AI is the most profound technology humanity will ever work on. He says it is natural that people feel anxious about the future the technology will bring, especially with its extraordinary pace of progress. Pichai thinks the industry has to do a lot more work in showing the benefits that are possible with the technology. This article contains a transcript of an interview with Pichai where he discusses the future of Google Search, how he's using AI agents, and his advice for college students.

DEVOURED

Meet Mark Zuckerberg's Right-Hand Man Who's Unleashing AI at Meta

Tech aienterpriseleadership Wall Street Journal

Andrew Bosworth, a 20-year Meta veteran and close Mark Zuckerberg confidant, is leading the company's ambitious transformation into an AI-first organization.

What: Andrew Bosworth, Meta's CTO, is spearheading the company's shift to an AI-first approach and stands to earn nearly $1 billion if Meta's market cap increases by more than 500% in the next five years. His strategy focuses on getting employees to use AI extensively in their work and, where possible, handing tasks over to it entirely.

Why it matters: This indicates Meta's aggressive commitment to AI, not just as a product but as a fundamental operational shift, driven by high-stakes incentives for its leadership and internal restructuring.

Original article

Andrew Bosworth, a top lieutenant of Meta CEO Mark Zuckerberg for more than 20 years, is leading Meta's gargantuan efforts to transform itself into an AI-first company that can innovate as fast as nimble startups. Bosworth is set to make nearly $1 billion if he can help increase Meta's market cap by more than 500% in the next five years. His focus now is on getting workers to use AI more in their work, and when possible, hand tasks over to it entirely. This article takes a look at his career up until now.

DEVOURED

It's like the Olympics - except steroids are allowed

Tech policystartuphealthculture BBC

The controversial "Enhanced Games" is launching its inaugural event in Las Vegas, featuring elite athletes openly using performance-enhancing drugs for a chance at multi-million dollar prizes.

What: Founded in 2023 by Aron D'Souza and Maximilian Martin and backed by investors including Peter Thiel, the "Enhanced Games" is holding its first competition in Las Vegas. It offers $25 million in prize money, including $1 million bonuses for breaking world records in certain events, to athletes who may use legal, FDA-approved performance-enhancing drugs.

Why it matters: This event challenges the traditional anti-doping ethos of sports, pushing the boundaries of human performance and openly confronting the hypocrisy often associated with hidden doping, while simultaneously raising ethical and health concerns.

Deep dive

The Enhanced Games is a new sporting event that openly allows and encourages athletes to use performance-enhancing drugs (PEDs) that are legal and FDA-approved.
The inaugural competition is being held in Las Vegas, offering $25 million in prize money, with $1 million bonuses for world records in certain events.
Notable athletes participating include British swimmer Ben Proud and US sprinter Fred Kerley.
Strongman Hafthor Bjornsson (known as "The Mountain" from Game of Thrones) openly discussed his steroid use, which is accepted in professional strongman.
The founders, Aron D'Souza and Maximilian Martin, have attracted significant investors like Peter Thiel and Donald Trump Jr.
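Critics condemn the event as an affront to the spirit of sport: UK Anti-Doping (Ukad) calls it a "reckless venture", US Anti-Doping Agency (USADA) CEO Travis Tygart argues the answer to doping failures is reforming the system, and health experts warn of serious risks such as strokes and cardiovascular damage from anabolic steroids and growth hormones.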
Athletes like Ben Proud justify participation by highlighting the lucrative prize money, which far exceeds earnings from traditional Olympic sports.
Some athletes, like American swimmer Hunter Armstrong, plan to compete clean at the Enhanced Games with the intention of still participating in future Olympics, though traditional sports bodies like World Aquatics have threatened bans.
The company behind the games, Enhanced Group, recently began trading on the New York Stock Exchange and is exploring online sales of performance-enhancing supplements.
Concerns are raised about the cultural implications of normalizing PED use, especially given existing social media pressures on body image and the rise of "biohacking" culture.

Decoder

Performance-enhancing drugs (PEDs): Substances, often hormones or stimulants, used to improve athletic performance, banned in most traditional sports.
FDA (Food and Drug Administration): A U.S. federal agency within the Department of Health and Human Services that regulates food safety, prescription and over-the-counter drugs, vaccines, medical devices, dietary supplements, tobacco products, and related products.

Original article

It's like the Olympics - except steroids are allowed

Under the blazing Vegas sun, giant billboards advertise "Live Enhanced" as the baritone voice of a sports announcer pretends to introduce British swimmer Ben Proud and other athletes.

The announcer is practising at a new open air arena hosting one of the most controversial events in recent sporting history: the Enhanced Games.

Think Olympics on steroids. Literally.

The inaugural competition on Sunday will feature dozens of elite athletes using performance-enhancing drugs to try and break world records in track, weightlifting and swimming.

Some $25m (£18.6m) in prize money is up for grabs - with cash prizes for winners. World records in certain events, being eyed up by the likes of US sprinter Fred Kerley, pay a $1m (£740,000) bonus.

The drugs they use must be legal, and approved by the US Food and Drug Administration (FDA). But substances like testosterone and human growth hormone - banned by the World Anti-Doping Agency - are not only celebrated here, they're encouraged and for sale.

The project was founded by entrepreneurs Aron D'Souza and Maximilian Martin in 2023 and has attracted backing from prominent investors including billionaire Peter Thiel and Donald Trump Jr.

Health experts warn that anabolic steroids and growth hormones can cause strokes and cardiovascular damage, among other risks.

Event organisers claim Enhanced will push the limits of human performance while critics, especially in the Olympic movement, dismiss it as an affront to the spirit and founding principles of competitive sport.

'We're being up front and honest'

"You don't have to be pressured or use drugs in order to be the best," says Travis Tygart, CEO of the US Anti Doping Agency, USADA.

He tells the BBC that while there are clear failures with the Olympics' anti-doping protocols, the answer is reforming the system, not to dope.

Athletes, he says, need to be assured the Olympics are clean and cheats will not be tolerated.

"We don't want kids to have to say, 'in order to win an Olympic medal, when I'm 18 or 20 years old, I have to inject myself every day in the rear end with a potentially dangerous drug.'"

But Enhanced, the company behind the games, claims it is bringing out into the open what it says is an undercurrent of many athletes cheating and taking performance-enhancing drugs in the shadows.

Packed into a ballroom at Resorts World casino, Enhanced athletes answered media questions for two hours, but only one - strongman Hafthor Bjornsson who hopes to break his own deadlift record of 510kg (1,124.4 pounds) - would say which drugs he was taking. Other athletes were tight lipped.

Bjornsson, who played the Mountain in Game of Thrones, says he's open about his steroid use because it's accepted in the professional strongman world.

American sprinter Shania Collins says the fact that those taking part in the games admit to doping already gives them more integrity than cheaters.

"We're being up front and honest and transparent from the start," she tells the BBC. "So how can you challenge our integrity when we're forthright with the information?"

Some sporting governing bodies have publicly rebuked athletes for choosing to compete in the games.

UK Athletics' chief executive Jack Buckner said he was "appalled" when it was revealed former Great Britain sprinter Reece Prescod had signed up in January. UK Anti-Doping (Ukad) has called the event a "reckless venture".

Meanwhile, GB Aquatics has said British swimmer Ben Proud will not be selected again for Britain's Olympic team if he competes at the Enhanced Games.

Big money involved

Proud, who won the silver medal in the 50m freestyle at the Paris Olympics in 2024, is hoping to break the world record using performance-enhancing drugs and win a million dollars on Sunday.

If he wins the race but doesn't break the world record, he will still make $250,000 (£185,000).

"There's no money in sport," Proud told the BBC before the games. "I was 30 and had just come off a silver medal, what future path do I follow?"

Proud, who has been widely condemned for joining the Enhanced Games, has said it would take 13 years of winning World Championship titles to earn this kind of prize money.

Enhanced has already paid a doped-up swimmer a million dollars for breaking a record during one of the trials it hosted ahead of Sunday's competition.

Of the 42 athletes competing at the Enhanced Games on Sunday, most will be using testosterone and some will also be using human growth hormone and stimulants like Adderall.

But not everyone will be doping - some are competing clean.

American swimmer Hunter Armstrong has said he "definitely" doesn't want to dope for the games, adding: "I personally have taken pride in getting as far as I can on natural God-given talent."

He plans to compete clean for a shot at the money and then return to compete at the Los Angeles Olympics in 2028. Whether he can is unclear, given the outcry from many sports bodies responsible for selection.

However, the US Anti-Doping Agency's Tygart told the BBC as long as an athlete passes drugs tests to qualify for the Olympics, there's nothing to stop them from taking part from a doping perspective, but he points out that World Aquatics has already threatened to ban any swimmers competing in the Enhanced Games.

Wider worries for society?

Earlier this month, the Enhanced Group - the company behind the competition - began trading on the New York Stock Exchange.

And the competition is seemingly being treated as an opportunity for Enhanced to sell performance-enhancing medicine and supplements online.

This sparks broader concerns for some, at a time when social media is awash with offers to buy unregulated peptides and pressure on people to look a certain way.

Joe Vennare, founder of Fitt Insider, which analyses the health and wellness industry, feels normalising performance-enhancing drugs will bring unknown health and cultural consequences.

He says people have the right to use legal medical interventions, but is concerned some people are doing so at the expense of being fit and having a healthy diet.

"Kids are using social media filters, they're getting Botox injections," he tells the BBC. "They're having body dysmorphia - especially young men, in this case at record numbers."

Vennare says the Enhanced Games reflects those problems, but hasn't created them.

"That's a problem that parents and culture and society more broadly have to address."

Enhanced athlete James Magnussen agrees. The Australian swimmer says parents need to control what their kids watch and take personal responsibility - but he insists Enhanced is not "targeted at children".

"It's an entertainment company and product targeted at people looking at the longevity and human performance space."

None of these criticisms of the Enhanced Games are likely to go away any time soon.

Neither the athletes taking part, nor the invite-only crowd in Vegas, seem to be deterred.

Walk around here and you hear a lot about "biohacking", "human optimisation" and pushing the body beyond its natural limits.

So what's happening here may end up being much bigger than a niche sporting event. It's about whether sport is becoming a testing ground for a much bigger cultural shift.

DEVOURED

Meta Launches Forum, a New Reddit-Like App for Facebook Groups

Tech socialaimobileenterprise PCMag

Meta has quietly launched "Forum," a Reddit-like app for Facebook Groups, featuring anonymous posting with nicknames and an AI-powered "Ask" tab for curated answers.

What: Meta released "Forum," a new iOS app designed for Facebook Groups to foster deeper discussions, spotted by analyst Matt Navarra. It allows users to post under a nickname and includes an AI-powered "Ask" tab that provides answers drawn from real comments within Facebook Groups. Group admins also get an AI assistant for moderation.

Why it matters: This move signifies Meta's attempt to capture a segment of the forum/community market, traditionally dominated by Reddit, by integrating AI and offering more privacy options within its existing Facebook Groups ecosystem.

Original article

Meta has quietly launched a new app for Facebook Groups called Forum. The app didn’t get a formal launch but was spotted on the iOS App Store by analyst Matt Navarra.

The App Store description suggests Meta is building it as a rival to Reddit. Forum is “a dedicated space built for deeper discussions, real answers, and the communities you care about,” the company says.

Once you log in to the app using your Facebook account, you’ll be greeted with a feed of updates from Groups you’ve already joined. You can also search for and join new groups based on your interests.

What makes the app a bit more Reddit-like is that you can publish comments or posts under a nickname. Note that everything you share on Forum will also be visible to Group members via Facebook.

There’s also an AI-powered Ask tab for quick answers from groups across Forum. It is the second option from the right on the bottom navigation bar, and you can tap it to seek the AI's opinions and recommendations.

Similar to querying on chatbot apps, you can drop a question into Ask, and it will pull up curated responses based on comments made by “real people” across Facebook Groups. It will also let you join those groups.

Group Admins get an additional AI feature: an AI assistant. It can help them “manage groups, moderate content, and keep their groups healthy,” Meta says.

For now, the app and its features may not be available in all regions. “We test lots of new products publicly to see what people find interesting and useful to their experiences across our apps,” a company spokesperson tells Navarra. The analyst has also shared videos and screenshots of the app interface on X and Threads.

About Our Expert

Jibin is a tech news writer based out of Ahmedabad, India. Previously, he served as the editor of iGeeksBlog and is a self-proclaimed tech enthusiast who loves breaking down complex information for a broader audience.


DEVOURED

Choosing the Right Graph

Data database jessicatalisman.substack.com

RDF/OWL graphs excel for formal, interoperable knowledge and reasoning, while labeled property graphs are better for fast traversal and developer-friendly analytics, although RDF 1.2 is closing the feature gap.

What: The article compares RDF/OWL and labeled property graphs (LPGs). RDF/OWL is favored for governed, interoperable knowledge, formal semantics, reasoning, provenance, and linked-data publishing. LPGs are preferred for fast traversal, rich edge properties, and easier graph analytics, though RDF 1.2 with native statement annotations is making RDF more competitive.

Why it matters: The choice of graph database technology is often driven by the need for semantic rigor and interoperability versus operational performance and developer agility, indicating a continued tension between formal knowledge representation and practical application development.

Takeaway: If you need to integrate diverse data with formal meaning, consider RDF/OWL. If your priority is rapid graph traversal and analytics with rich relationships, a Labeled Property Graph might be more suitable.

Decoder

RDF (Resource Description Framework): A W3C standard for describing information as a graph of subject-predicate-object triples. It's a foundation for the Semantic Web.
OWL (Web Ontology Language): A W3C standard designed for representing rich and complex knowledge about things, groups of things, and relations between things. It builds on RDF and adds capabilities for defining classes, properties, and constraints.
Labeled Property Graph (LPG): A graph model where nodes and relationships (edges) can have properties (key-value pairs) and labels, making it flexible for many applications, popular in systems like Neo4j.
Provenance: Information concerning the origin and history of a piece of data or an object, including where it came from and how it was created, processed, and delivered.
Linked Data: A method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds on standard web technologies like HTTP and RDF.
Statement Annotations: A feature in RDF 1.2 that allows adding metadata to triples (statements), similar to how properties are added to edges in labeled property graphs, narrowing the functional gap between the two models.
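
Statement annotations can be sketched the same way, again with plain Python tuples rather than a real RDF library. Following RDF 1.2's reification model as we understand it, a reifier node is linked to a triple term through `rdf:reifies`, and statement-level metadata such as a start date or a provenance source is attached to that reifier. The identifiers `_:r1`, `ex:since`, `ex:source`, and `ex:hrDatabase` are hypothetical. In RDF 1.2 Turtle, the annotation syntax `ex:alice ex:knows ex:bob {| ex:since 2020 |} .` should expand to roughly this shape.

```python
# The base fact as a plain RDF triple.
fact = ("ex:alice", "ex:knows", "ex:bob")

# RDF 1.2 style (sketch): a reifier node ("_:r1") points at the triple term
# via rdf:reifies, and the statement-level metadata hangs off the reifier.
graph = {
    fact,
    ("_:r1", "rdf:reifies", fact),  # triple term used as an object
    ("_:r1", "ex:since", 2020),
    ("_:r1", "ex:source", "ex:hrDatabase"),
}


def annotations_of(g, triple):
    """Collect the properties attached to every reifier of `triple`.

    If several reifiers use the same predicate, the last one seen wins;
    that is a simplification for this sketch.
    """
    reifiers = {s for (s, p, o) in g if p == "rdf:reifies" and o == triple}
    return {p: o for (s, p, o) in g if s in reifiers and p != "rdf:reifies"}


# LPG equivalent: the same metadata sits directly on the edge.
lpg_edge = {
    "from": "alice",
    "type": "KNOWS",
    "to": "bob",
    "props": {"since": 2020, "source": "hrDatabase"},
}

print(annotations_of(graph, fact))
print(lpg_edge["props"])
```

Both structures end up holding the same edge metadata; the RDF version simply routes it through an extra node, which is the gap that native statement annotations are meant to make less cumbersome.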

Original article

RDF/OWL is better for governed, interoperable knowledge with formal meaning, reasoning, provenance, and linked-data publishing. Labeled property graphs are better for fast traversal, rich edge properties, and developer-friendly graph analytics, though RDF 1.2 narrows the gap with native statement annotations.

DEVOURED

Of Hammers and Nails: What AI Can and Cannot Do for a Data Analyst

Data aicareerllm adamwritesaboutdata.substack.com

AI assists data analysts in coding and data prep, but its inconsistency means human judgment, clean data, and deep context remain essential for reliable, trustworthy analytical insights.

What: The article explains that AI is effective at helping data analysts write code, prepare data, and draft analyses, speeding up their workflows. However, AI currently lacks the consistency and nuanced understanding needed to produce trusted ad hoc answers, and the article emphasizes that good analysis still fundamentally depends on clean data, human context, judgment, and domain knowledge.

Why it matters: This piece frames AI not as a replacement but as a productivity tool for data analysts, reinforcing the central role of human expertise in tasks that require critical thinking, context, and error mitigation, areas where AI models still struggle.

Takeaway: Treat AI tools as powerful assistants for repetitive or drafting tasks in data analysis, but always apply human critical judgment and verification to their outputs, especially for ad hoc insights.
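
One way to put that verification into practice is a small reconciliation check: before trusting an AI-drafted summary table, recompute it independently from the raw rows and flag anything that does not match. The sketch below is hypothetical; the function name, the `region`/`revenue` fields, and the tolerance are illustrative assumptions. A check like this catches arithmetic and data-quality slips, not a flawed framing of the question, which still needs human judgment.

```python
from collections import defaultdict


def verify_group_sums(rows, claimed, key, value, tol=1e-9):
    """Recompute per-group sums from raw rows and compare them with claimed totals.

    Returns a list of human-readable issues; an empty list means the
    claimed numbers reconcile with the data.
    """
    issues = []
    recomputed = defaultdict(float)
    for i, row in enumerate(rows):
        # Incomplete rows are reported rather than silently skipped.
        if row.get(key) is None or row.get(value) is None:
            issues.append(f"row {i}: missing {key!r} or {value!r}")
            continue
        recomputed[row[key]] += row[value]

    for group in sorted(set(recomputed) | set(claimed)):
        got, want = claimed.get(group), recomputed.get(group)
        if got is None:
            issues.append(f"group {group!r} missing from claimed result")
        elif want is None:
            issues.append(f"group {group!r} not present in the data")
        elif abs(got - want) > tol:
            issues.append(f"group {group!r}: claimed {got}, recomputed {want}")
    return issues


# Hypothetical raw data and an AI-drafted summary to check against it.
rows = [
    {"region": "EU", "revenue": 100.0},
    {"region": "EU", "revenue": 50.0},
    {"region": "US", "revenue": 70.0},
    {"region": "US", "revenue": None},
]
claimed = {"EU": 150.0, "US": 80.0}

for issue in verify_group_sums(rows, claimed, "region", "revenue"):
    print(issue)
```

Here the check reports the row with a missing value and the US total that does not reconcile, which are exactly the points where an analyst would need to go back to the data before sharing the numbers.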

Original article

AI helps data analysts write code, prep data, and draft analysis faster, but it is still too inconsistent for trusted ad hoc answers. Good analysis still needs clean data, context, judgment, and human knowledge.

DEVOURED

Staff Designers Aren't About Shipping the Best Work. That's the Point

Design careerenterprise The Designer's Field Guide

Staff designers provide direction and enable team output rather than creating individual designs, a challenging shift from senior individual contributor roles.

What: Staff designers add value by setting design priorities, quality standards, and system consistency, and by coaching other designers. This role requires moving beyond personally solving hard design problems to empowering the team to ship better work collectively.

Why it matters: This article highlights a common career transition challenge in design, where the skills required for senior individual contribution differ significantly from the leadership and enablement focus of staff-level roles.

Original article

Staff designers create value through direction rather than individual output, focusing on setting design priorities, quality standards, and system consistency while coaching others. The transition from senior to staff requires moving away from personally solving the hardest design problems and instead enabling the team to ship better work collectively. Many strong senior designers struggle with this shift because they must outgrow the individual contributor skills that made them successful.

DEVOURED

What we lost in the AI chat stream

Design aicareerresearch Medium

AI chat tools hinder critical thinking by trapping useful insights in long, unreviewed streams, reducing deliberate problem framing and reflection.

What: The article argues that while AI chat tools are useful for brainstorming, their chat histories are ineffective for preserving meaningful work because insights get buried in iterative conversations. Over-reliance on AI can diminish critical thinking and the traditional process of sketching and reflection that clarifies problems and intent.

Why it matters: This highlights a pedagogical and process challenge with integrating AI into creative and problem-solving workflows, suggesting that the tool's immediate convenience can inadvertently undermine deeper cognitive engagement necessary for quality output.

Takeaway: When using AI chat for creative or problem-solving tasks, actively extract and document key insights outside the chat stream, and ensure upfront human effort is dedicated to problem framing before engaging the AI for production.

Original article

AI chat tools are powerful for brainstorming and refining ideas, but chat histories are poor at preserving meaningful work because they trap useful insights inside long streams of iterative back-and-forth that people rarely revisit. Relying too heavily on AI can reduce critical thinking and problem framing, especially when users skip the deliberate sketching and reflection that traditionally helped shape ideas. AI works best as a production tool after humans have already clarified the problem, structure, and intent themselves.

DEVOURED

Frontier AI for Motion Design (Website)

Design aigraphics Motion.so

Motion.so launched an AI motion graphics studio, promising to generate and iterate on designs in minutes.

What: Motion is a new AI-powered web platform designed to quickly create and modify motion graphics.

Why it matters: This reflects the broader trend of AI tools democratizing creative production, allowing designers to rapidly prototype or non-designers to generate visual assets.

Original article

Motion is an AI motion graphics studio to create and iterate on graphics in minutes.

DEVOURED

Technical readiness and creative bravery: Instrument agency's formula for leading the charge in design

Design aistartupbranding It's Nice That

Design and technology company Instrument, founded in 2005, leverages AI for rapid prototyping and automation while emphasizing that human creativity, taste, and original perspective remain irreplaceable.

What: Instrument, a design and technology company founded in 2005 by Justin Lewis, Vince Laveccia, and JD Hooge, works with clients like Spotify, Google, and Ōura. CCO Nishat Akhtar states they use AI to accelerate prototyping and automate repetitive tasks but believes true creativity stems from human judgment and originality, which AI cannot provide.

Why it matters: This article provides a nuanced perspective from a seasoned design agency on integrating AI: it's a powerful tool for efficiency and exploration, but the core creative direction, taste, and unique voice remain human-centric, defining a practical boundary for AI in creative work.

DEVOURED

Why Workplace Design is Becoming Central to Business Performance

Design enterprisecareer Design Work Life

Modern workplace design has become a strategic business asset, moving beyond aesthetics to actively shape employee experience, collaboration, and organizational performance, especially in hybrid work models.

What: According to Zoe Santoro, workplace design has evolved from an afterthought into a strategic business advantage, emphasizing flexibility, employee well-being, and seamless technology integration. The article highlights how hybrid work requires offices to serve as vital hubs for connection and collaboration, prompting C-suite leaders to pay closer attention to design's impact on productivity and culture.

Why it matters: This shift reveals a growing understanding that physical environments are critical tools for talent attraction and retention, directly impacting business outcomes and requiring a more dynamic, adaptable approach to office spaces as work patterns continue to evolve.

Deep dive

Workplace design is no longer an afterthought but a strategic business advantage, directly influencing collaboration, focus, and organizational performance.
Modern offices must adapt to fluid work patterns, accommodating deep individual tasks, group brainstorming, virtual meetings, and informal interactions.
Key elements of modern design include true flexibility, spaces that spark creativity, effortless technology integration, comfort, and operational flow.
Employee experience is a top design priority, as the physical environment signals how much an organization values its people, impacting engagement, morale, and retention.
Hybrid work has transformed the office into a vital hub for human connection, brainstorming, and innovation, requiring adaptable zones for collaboration, focus, and casual interactions.
Technology integration is crucial for seamless hybrid meetings, cloud collaboration, and flexible connectivity, with designers building adaptability into core projects.
Progressive companies view design as integral to organizational strategy, evaluating how environments drive productivity, nurture innovation, and reinforce positive culture.
Flexibility is essential for responding to changing team sizes, technologies, and work preferences, while sustainability focuses on energy use, waste reduction, and healthier materials.
C-suite leaders are now directly engaged in workplace strategy discussions, recognizing its influence on team collaboration, innovation, and market adaptability.
Future workplaces will balance flexibility, collaboration, technology, well-being, and operational efficiency, serving as strategic tools for talent attraction and cultural strength.

DEVOURED

Apple's anniversary edition iPhone leaks in dreamy renders and I can't wait for its 2027 debut

Design hardwaremobile Digital Trends

Apple is reportedly developing a radically redesigned "iPhone XX" for 2027, featuring a heavily curved glass display and solid-state buttons.

What: Rumors suggest an anniversary edition iPhone, possibly called "iPhone XX" or "iPhone 20," launching in 2027 with a curved glass display, rounded chassis, under-display Face ID, smaller camera cutout, thinner display tech, solid-state buttons, and upgraded camera sensors.

Why it matters: If the rumors hold, a radical redesign for an anniversary iPhone would suggest that Apple reserves significant hardware innovation for special editions, possibly to differentiate them from its annual incremental updates.

Original article

Apple is reportedly developing a radically redesigned anniversary iPhone for 2027, unofficially dubbed the “iPhone XX” or “iPhone 20,” featuring a heavily curved glass display, rounded chassis, and a more futuristic look than current models. Rumors also point to under-display Face ID, a smaller camera cutout, thinner display tech, solid-state buttons, and upgraded camera sensors, though some prototypes oddly show only two rear cameras, suggesting it may be a special-edition device rather than a standard Pro model.

Devoured - May 25, 2026

deepseek v4 pro 75 percent price cut permanent

TL;DR

The 2026-07-28 MCP Specification Release Candidate

A Stateless Protocol

Before and after

The handshake and session are gone

Stateless protocol, stateful applications

Server-to-client requests, restructured

Routable, cacheable, traceable

Extensions Become First-Class

MCP Apps: server-rendered user interfaces

Tasks graduates to an extension

Authorization Hardening

Roots, Sampling, and Logging Are Deprecated

Full JSON Schema 2020-12 for Tools

How the Protocol Evolves From Here

Release Timeline and Validation

Looking Ahead

Evaluating Multi-Agent Systems at Scale

Anthropic plans Claude memory update with new Memory Files

A hacker group is poisoning open source code at an unprecedented scale

GitHub internal repositories exfiltrated via malicious VS Code extension

Designing end-to-end ingress request tracing for multi-tenant SaaS platforms

The observability problem

A product-led framework for ingress request tracing

3. Security-First Trace Metadata

4. Configuration-Only Telemetry Export

5. Non-Disruptive Failure Modes

Acceptance criteria as executable contracts

Quantifying business value

Understanding trace and span context

Operational impact

The hardest part is not technical

Replicating this framework

Conclusion

Migrating from Go to Rust

Is your SIEM actually ready? A new way to find out

The 58-Million-Key Freeze: What a HashMap Resize Taught Us About Memory Allocation at Scale

Plan Mode All the Time, Substrait over SQL, and the End of the DE Role ft

pg_infer 1.0.0 released -- transformer model knowledge as SQL relations

Same buffers, same instructions, same hardware. Where Is the JVM Tax?

SAM 3: Segment Anything with Concepts (GitHub Repo)

Cloud Native Computing Foundation Announces OpenTelemetry's Graduation, Solidifying Status as the De Facto Observability Standard

7 Temporal Blind Spots Breaking Enterprise RAG

Leading Design Through the AI Shift

Anthropic prepares Mythos 1 for Claude Code and Claude Security

Lance (Hugging Face Repo)

Anthropic's march to profitability

Bumblebee Goes Open Source

Gemini 3.5 Flash (Low)

SpaceX Launches 400-Foot-Tall Rocket That Will Help Define Its Future

China launches Shenzhou-23 mission with potential record one-year stay in orbit

Inside the World's Biggest Bet on Fusion Energy

auth.md (Website)

auth.md

Self-serve agent discovery

Choose the flows you support

Credentials you control

Get started

FAQs

Predicting AI job exposure

Don't Roll Your Own ...

Snap Specs True AR Glasses Reportedly Launch This Fall For Around $2500

The Eternal Sloptember

Introducing Pulumi Do: Direct Resource Operations for Any Cloud

Introducing pulumi do: Direct Resource Operations for Any Cloud

The problem: Sometimes IaC is more than you need

What it looks like

The command shape

Resource operations

Provider configuration

Designed for humans and agents

What’s next

Unified credentials with Pulumi ESC

Cross-resource references