DEVOURED

Stable Audio 3.0

AI audio opensource machine-learning Stability AI

Stability AI released Stable Audio 3.0, an open-weight model family trained on fully licensed data that generates music and sound effects up to 6 minutes 20 seconds long.

What: Stable Audio 3.0 comprises three open-weight models on Hugging Face (Small SFX, Small, Medium) and a Large model available through the Stability AI API. Medium and Large generate variable-length audio of more than six minutes (up to 6:20 for Medium), Small composes full music tracks on-device, and the release supports LoRA fine-tuning.

Why it matters: This release pairs long-form generative audio with open weights and commercial usage rights. Stability AI hopes to spark the kind of community-driven innovation that Stable Diffusion did for image generation, and says other open music models either restrict commercial use or carry the risks of training on unlicensed music.

Takeaway: Developers can download Stable Audio 3.0 Small SFX, Small, and Medium weights from Hugging Face to experiment with on-device music and sound effect generation.

Deep dive

Stability AI launched Stable Audio 3.0, a new family of models for generating music and sound effects.
Key features include variable-length generation up to 6 minutes and 20 seconds for Medium and Large models.
The 3.0 Small model is capable of full music composition directly on portable devices.
Three models (Small SFX, Small, Medium) are open-weight and available on Hugging Face, free to download and build on.
All models are trained on fully licensed data. Users own their outputs and can distribute and commercialize them under the Stability AI Community License (or the Enterprise License for organizations with more than $1M in annual revenue).
The architecture uses a novel semantic-acoustic autoencoder for more flexible generation.
The release includes documentation for LoRA training, an efficient fine-tuning method, alongside the weights for Small and Medium.
It also supports audio inpainting for editing and extending tracks.
Stable Audio 3.0 Large is available via the Stability AI API and for enterprise self-hosting.

Decoder

Open-weight models: AI models where the underlying parameters and architecture are publicly released, allowing users to inspect, modify, and run the models locally.
LoRA (Low-Rank Adaptation): An efficient fine-tuning method for large models that trains a small number of new parameters instead of the entire model, making customization faster and less resource-intensive.
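
To make "a small number of new parameters" concrete, here is a minimal sketch of the standard LoRA formulation: a frozen weight matrix W (d x k) is adapted by adding the product of two small trainable matrices, B (d x r) and A (r x k). The layer size and rank used below are illustrative assumptions, not values from Stable Audio 3.0.

```python
def full_param_count(d: int, k: int) -> int:
    """Parameters updated when fine-tuning a d x k weight matrix directly."""
    return d * k


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters in a rank-r adapter: B is d x r, A is r x k."""
    return r * (d + k)


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute (W + scale * B @ A) @ x without ever forming B @ A.

    W stays frozen; only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes (assumed): one 1024 x 1024 layer adapted at rank 8.
d, k, r = 1024, 1024, 8
trainable_fraction = lora_param_count(d, k, r) / full_param_count(d, k)
```

With these assumed sizes the adapter trains 16,384 parameters instead of 1,048,576, about 1.6% of the layer.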

Original article

Meet Stable Audio 3.0, the model family built for artistic experimentation with open-weight models

Key Takeaways:

We're releasing Stable Audio 3.0, a model family with open-weights music models that are trained on fully licensed data.
You own your outputs and can distribute and commercialize them under the Stability AI Community License, or the Enterprise License for organizations with more than $1M in revenue.
Key innovations include variable-length generation up to six minutes, and full song composition on portable devices.
Stable Audio 3.0 Small and Medium are available on Hugging Face. You can download the weights here.
Stable Audio 3.0 Large is available via the Stability AI API and self-hosting for enterprise deployments. Try it out here.

Today we're releasing Stable Audio 3.0, a model family trained on fully licensed data, designed to be the foundation for what the audio community builds next. Three of the models are open weights, free to download and build on.

Music has always evolved through the collective creativity of its community. Remix culture, interpolations, and mashups are how artists build on each other's work and push the art form forward. Generative audio will be no different. We want to foster the same kind of community-driven innovation in audio that we sparked in image generation with the launch of Stable Diffusion.

Stable Audio 3.0 is our open invitation to experiment with generative audio. We believe the best innovations are still waiting to be built.

Meet the Stable Audio 3.0 model family

We’re releasing four new models designed for different use cases and deployment options:

Stable Audio 3.0 Small SFX: Sound effects generation on-device, such as mobile phones and consumer-grade laptops.
Stable Audio 3.0 Small: Full music composition on-device.
Stable Audio 3.0 Medium: Higher musicality (i.e. structure, melodic coherence, and phrasing) and longer track length at up to 6:20.
Stable Audio 3.0 Large: The most advanced musicality in the family, built for music platforms and creative applications that need low-latency generation at high volume.

Open for experimentation, with ownership of what you create

All Stable Audio 3.0 models are trained on fully licensed data. Under the Stability AI Community License, you own your outputs and can distribute and commercialize them freely.

Organizations with more than $1M in annual revenue can get commercial coverage with our Enterprise license, which also offers legal indemnification.

3.0 Small SFX, 3.0 Small, and 3.0 Medium are all open-weights. To our knowledge, other open music models either restrict commercial use or carry the risks associated with being trained on unlicensed music.

Architectural advancements for variation and iteration

Stable Audio 3.0 is our next-gen architecture, built with a novel semantic-acoustic autoencoder that enables longer, more flexible audio generation. You can read the full research paper here.

Variable-length generation of more than six minutes. Stable Audio 3.0 introduces a new method for variable-length audio generation that enables you to generate exactly what you need, at per-second granularity.

3.0 Small generates up to two minutes, compared to 11 seconds from Stable Audio Open Small, and 47 seconds from Stable Audio Open. 3.0 Medium and 3.0 Large generate more than six minutes.

Full music composition on-device. To our knowledge, 3.0 Small is the only model capable of full music composition on-device. For the first time, on-device and offline audio generation isn't limited to short samples; it can produce complete musical tracks.

Customize the models on your own library with support for LoRA training. LoRA is an efficient fine-tuning method that was first made popular in image generation, and is now an emerging method for customizing audio generation models.

For the first time we're publishing documentation for LoRA training, alongside the weights for 3.0 Small and 3.0 Medium. For organizations with our Enterprise license, we offer the option of white-glove support with fine-tuning.

Audio inpainting options. Modify a segment of a track, rework part of a song, or extend your composition without starting over. Stable Audio 3.0 supports single-segment editing, multi-segment editing, and causal continuation (extending audio beyond its original endpoint).
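
The three editing modes above can be pictured as masks over the timeline that mark which seconds are regenerated and which are kept. The sketch below is only an illustration of that idea at per-second granularity; the function names and the mask representation are assumptions and do not describe the Stable Audio 3.0 API.

```python
def edit_mask(duration_s, segments):
    """Return a per-second mask: True = regenerate, False = keep original.

    One (start, end) pair models single-segment editing; several pairs
    model multi-segment editing. Segments are half-open, in seconds.
    """
    mask = [False] * duration_s
    for start, end in segments:
        if not 0 <= start < end <= duration_s:
            raise ValueError(f"segment {(start, end)} is outside 0..{duration_s}")
        for t in range(start, end):
            mask[t] = True
    return mask


def continuation_mask(original_s, extra_s):
    """Causal continuation: keep the whole original, generate the new tail."""
    return [False] * original_s + [True] * extra_s


single = edit_mask(10, [(2, 4)])            # rework seconds 2-3 only
multi = edit_mask(10, [(2, 4), (7, 9)])     # rework two separate regions
extended = continuation_mask(3, 2)          # extend a 3 s clip by 2 s
```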

Setting the stage for what’s next

Stable Audio 3.0 is the new architecture on which we're already building our next generation of fully licensed audio models for professionals.

While responsibly trained generative AI models are critical, they are not enough on their own. Artist-centric AI will only win if the product experience on a licensed platform is better than the experience on an unlicensed platform.

We're also working on a suite of new products for musicians. Join the waitlist to get early access.

In the meantime, you can learn more about our partnerships with Universal Music Group and Warner Music Group.

Get started with Stable Audio 3.0 now

Open weights: Download 3.0 Small SFX, 3.0 Small, and 3.0 Medium on Hugging Face. For organizations with more than $1M in annual revenue, contact us to discuss our Enterprise Licensing.

API: Stable Audio 3.0 Large is available via the Stability AI API.

Partner platforms: Stable Audio 3.0 will be available on ComfyUI and other platforms.

To stay updated on our progress, follow us on X, LinkedIn, Instagram, and join our Discord Community.


Anthropic to Pay SpaceX Nearly $45 Billion for Computing Deal

AI infrastructure cloud enterprise Bloomberg

Anthropic will pay SpaceX nearly $45 billion over three years, at $1.25 billion per month through May 2029, for compute resources at SpaceX's Colossus 1 and a second data center.

What: Anthropic agreed to a compute deal with SpaceX worth nearly $45 billion over three years, paying $1.25 billion per month until May 2029. Either party can end the agreement with 90 days' notice. The deal expands an earlier agreement for 300 megawatts of capacity at SpaceX's "Colossus 1" data center in Memphis to include capacity at a second facility.

Why it matters: At $1.25 billion a month, the deal shows the scale of infrastructure spending and power capacity behind advanced AI development, and it ties together two major companies from different industries.

Original article

Anthropic has agreed to pay SpaceX nearly $45 billion over the next three years for compute resources. It will pay $1.25 billion per month until May 2029, with either party being able to end the agreement with 90 days' notice. The companies had earlier this month inked a deal that gave Anthropic 300 megawatts of computing capacity from a large SpaceX data center in Memphis known as Colossus 1. This partnership has expanded to include capacity at a second SpaceX data center.
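
As a quick check on how the monthly figure relates to the headline number, the sketch below multiplies the stated payment by the number of months. The 36-month count is an assumption (three full years); the article says "nearly $45 billion", so the actual schedule may be slightly shorter.

```python
MONTHLY_PAYMENT_BN = 1.25   # from the article, in billions of dollars
ASSUMED_MONTHS = 36         # assumption: three full years of payments


def total_commitment_bn(monthly_bn: float, months: int) -> float:
    """Total paid if the agreement runs for the given number of months."""
    return monthly_bn * months


total_bn = total_commitment_bn(MONTHLY_PAYMENT_BN, ASSUMED_MONTHS)
```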


AI Solves a Longstanding Geometry Conjecture

AI research mathematics machine-learning OpenAI

An OpenAI reasoning model autonomously disproved a conjecture tied to the planar unit distance problem, which dates back to 1946, and external mathematicians independently verified the result.

What: An OpenAI reasoning model autonomously disproved a major conjecture in combinatorial geometry tied to the planar unit distance problem, an open question dating back to 1946. The proof introduced techniques from algebraic number theory, and external mathematicians independently verified it.

Why it matters: This is one of the first cases in which an AI system resolved a prominent unsolved mathematics problem, suggesting that reasoning models can contribute novel, verifiable proofs and not only calculations.

Decoder

Combinatorial geometry: A branch of mathematics that combines principles of combinatorics and geometry, focusing on finite arrangements of geometric objects like points, lines, and circles.
Planar unit distance problem: An unsolved problem in combinatorial geometry asking for the maximum number of pairs of points in a set of 'n' points in a plane that are exactly one unit apart.
Algebraic number theory: A branch of number theory that studies algebraic integers and their properties, often using methods from abstract algebra.
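
To make the quantity in the planar unit distance problem concrete, the following sketch counts unit-distance pairs in a small point set. It only illustrates the definition above; it has no connection to the proof, and the two point sets are examples chosen here.

```python
from itertools import combinations
from math import dist, isclose, sqrt


def unit_distance_pairs(points, tol=1e-9):
    """Count pairs of points whose Euclidean distance is 1 (within tol)."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), 1.0, abs_tol=tol)
    )


# Four corners of a unit square: the 4 sides are unit distances,
# the 2 diagonals (length sqrt(2)) are not.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Two unit equilateral triangles sharing an edge: 5 unit distances,
# with only the long diagonal (length sqrt(3)) left out.
h = sqrt(3) / 2
rhombus = [(0, 0), (1, 0), (0.5, h), (1.5, h)]

square_count = unit_distance_pairs(square)
rhombus_count = unit_distance_pairs(rhombus)
```

The same four points can therefore yield different counts depending on their arrangement, which is what makes the maximum over all arrangements of n points a hard question.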

Original article

An OpenAI reasoning model autonomously disproved a major conjecture tied to the planar unit distance problem, an open question in combinatorial geometry that dates back to 1946. The proof introduced techniques from algebraic number theory and was independently verified by external mathematicians, marking one of the first cases where an AI system resolved a prominent unsolved mathematics problem.


Introducing Agent Executor, Google's distributed Agent Runtime

AI agents devops opensource cloud Google Cloud Blog

Google introduced Agent Executor, an open-source runtime standard for reliably and efficiently deploying long-running AI agent workflows with features like durable execution, secure isolation, and connection recovery.

What: Google's Agent Executor is an open-source distributed agent runtime designed for reliable execution of complex, long-running agent workflows. It offers durable execution, secure sandboxed isolation, session consistency, connection recovery, and trajectory branching. It works together with Agent Substrate, a new open-source Kubernetes abstraction built with the Google Kubernetes Engine team to handle millions of sub-second agent tool calls.

Why it matters: This marks Google's significant push to standardize and productionize AI agent deployment, addressing core operational challenges like fragility and scalability. By offering an open-source runtime and integrating with Kubernetes, Google aims to make agent orchestration more robust and accessible for enterprises.

Takeaway: Developers building AI agent workflows should investigate Google's Agent Executor and Agent Substrate to improve reliability, isolation, and scalability, especially for long-running or distributed agent applications.

Deep dive

Agent Executor is Google's new open-source runtime standard for executing, resuming, and deploying AI agent workflows.
It addresses the challenges of fragile and hard-to-manage long-running agent tasks in production environments.
Durable execution ensures workflows can resume after outages or human-in-the-loop (HITL) interruptions, using event logs and snapshotting.
Secure isolation is provided through sandboxes, which prevent harmful side effects and help keep malicious activity from compromising the broader service. This is especially useful for code-generating agents and multi-tenant scenarios.
Session consistency is maintained in distributed workflows via a built-in single-writer architecture.
Connection recovery allows clients to reconnect after disconnections, backfilling responses for a better user experience.
Trajectory branching enables agents to test different decision paths from checkpoints without losing context.
Agent Executor works with Agent Substrate, a new open-source abstraction for Kubernetes built with the Google Kubernetes Engine team, designed to improve compute efficiency for massive agent deployments by handling millions of sub-second tool calls.
It is harness-agnostic, supporting Google's Antigravity, managed agents in the Gemini API, and custom agents built with LangChain/LangGraph, Agent Development Kit (ADK), or the Agent2Agent Protocol (A2A).
The project aims to prevent vendor lock-in by letting enterprises run agents on their own infrastructure and control data residency.
Agent Executor is available in preview on GitHub for developers to explore and provide feedback.
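
The durable execution idea in the list above (an event log plus snapshots) can be sketched generically: every completed step is appended to a log, a snapshot is saved occasionally, and recovery starts from the snapshot and replays only the newer events. This is a minimal illustration of the general technique, not Agent Executor's implementation; all class and function names are hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only record of completed steps plus the latest snapshot."""
    events: list = field(default_factory=list)              # (seq, step, result)
    snapshot: tuple = field(default_factory=lambda: (0, {}))  # (last seq, state)

    def append(self, step, result):
        seq = len(self.events) + 1
        self.events.append((seq, step, result))
        return seq

    def take_snapshot(self, state):
        self.snapshot = (len(self.events), copy.deepcopy(state))


def apply_event(state, event):
    """Fold one logged event into the workflow state."""
    _, step, result = event
    state[step] = result


def run_step(log, state, step, result):
    """Record the step durably first, then update in-memory state."""
    seq = log.append(step, result)
    apply_event(state, (seq, step, result))


def recover(log):
    """Rebuild state after a crash: load the snapshot, replay newer events."""
    snap_seq, snap_state = log.snapshot
    state = copy.deepcopy(snap_state)
    for event in log.events:
        if event[0] > snap_seq:
            apply_event(state, event)
    return state


log, state = EventLog(), {}
run_step(log, state, "plan", "3 steps")
run_step(log, state, "search", "12 hits")
log.take_snapshot(state)
run_step(log, state, "await_approval", "approved")  # e.g. a HITL confirmation

recovered = recover(log)  # what a restarted worker would reconstruct
```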

Decoder

Agent workflow: A series of automated steps and decisions carried out by an AI agent, often involving interactions with external tools, APIs, or human users.
Durable execution: A programming paradigm where a long-running process can reliably resume its state and continue execution after interruptions, crashes, or reboots.
Human-in-the-loop (HITL): A model where human interaction is required to complete or refine a process, often for validation, correction, or complex decision-making within an automated system.
Trajectory branching: In the context of AI agents, this refers to the ability to explore different possible decision paths or sequences of actions from a given state, similar to branching in version control systems, allowing for testing and evaluation without committing to a single path.
Agent Substrate: A new open-source abstraction layer for Kubernetes, built with the Google Kubernetes Engine (GKE) team, designed to optimize compute for AI agent workloads, particularly those involving a high volume of short-lived tool calls, by moving agents on and off compute capacity in real time.
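
Trajectory branching, as defined above, can be illustrated with a small sketch: a checkpoint is an independent copy of an agent's state and step history, and each branch starts from that copy so that alternative paths never interfere with one another. The class below is hypothetical and only demonstrates the concept.

```python
import copy


class Trajectory:
    """An agent's accumulated state plus the ordered actions that produced it."""

    def __init__(self, state=None, steps=None):
        self.state = copy.deepcopy(state) if state is not None else {}
        self.steps = list(steps) if steps is not None else []

    def act(self, action, **updates):
        """Record an action and merge its effects into the state."""
        self.steps.append(action)
        self.state.update(updates)

    def checkpoint(self):
        """Return an independent copy that later actions cannot mutate."""
        return (copy.deepcopy(self.state), tuple(self.steps))

    @classmethod
    def branch_from(cls, checkpoint):
        """Start a new trajectory with the checkpoint's context intact."""
        state, steps = checkpoint
        return cls(state, steps)


main = Trajectory()
main.act("read_ticket", ticket="login fails")
cp = main.checkpoint()

# Explore two alternative paths from the same point.
branch_a = Trajectory.branch_from(cp)
branch_a.act("patch_config", fix="config")

branch_b = Trajectory.branch_from(cp)
branch_b.act("rollback_release", fix="rollback")
```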

Original article

Introducing Agent Executor, Google’s distributed Agent Runtime

Jaana Dogan

Software Engineer

Ethan Bao

Engineering Director

As models and harnesses improve, agents are taking on increasingly complex tasks that can run for hours or even days. But as we push agents to do more, this has surfaced a new operational problem: long-running agent workflows are fragile and incredibly hard to manage reliably and efficiently in production.

Today, we’re introducing Agent Executor, Google’s open-source runtime standard for agent execution, resumption, and distributed deployment. Based on what we’ve learned from solving these challenges internally, we’ve built Agent Executor to have the following native capabilities:

Durable execution: Long-running execution requires the ability to resume after outages or agentic interruptions such as human-in-the-loop (HITL) confirmations. Agent Executor provides this backend resilience automatically for any actor (e.g., an agent, agent harness, skill, tool, or sandbox) through its event log and snapshotting.
Secure isolation: Agent Executor isolates components in secure-by-design sandboxes to prevent harmful side effects and help ensure malicious activity cannot compromise the broader service. Sandboxes are especially useful when agents generate code or handle multiple tenants or user data concurrently.
Session consistency: In distributed agent workflows, multiple components may attempt to update shared session state at the same time. Agent Executor’s built-in single-writer architecture helps maintain consistency and reduces the risk of corruption in that state.
Connection recovery: In long-running agentic execution, clients may disconnect for many reasons, including network outages. Agent Executor lets clients reconnect to agents and backfills responses from the last sequence seen by the client for a better user experience.
Trajectory branching: Checkpoints let you branch an agentic trajectory (its decision or workflow path) at any point, allowing agents to test or evaluate different paths without losing context or other state.
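
The connection recovery behavior described above, backfilling from the last sequence the client saw, can be sketched with a numbered response buffer. This is a generic illustration under assumed names, not the Agent Executor API.

```python
class ResponseStream:
    """Server-side buffer of agent responses, numbered from 1."""

    def __init__(self):
        self._responses = []  # index i holds the response with sequence i + 1

    def publish(self, payload):
        """Store a response and return its sequence number."""
        self._responses.append(payload)
        return len(self._responses)

    def backfill(self, last_seen_seq):
        """Return every (seq, payload) the client missed after last_seen_seq."""
        if not 0 <= last_seen_seq <= len(self._responses):
            raise ValueError(f"unknown sequence number: {last_seen_seq}")
        missed = self._responses[last_seen_seq:]
        return list(enumerate(missed, start=last_seen_seq + 1))


stream = ResponseStream()
for chunk in ["planning", "running tests", "tests passed", "done"]:
    stream.publish(chunk)

# The client saw sequences 1 and 2, dropped off the network, then reconnected.
missed = stream.backfill(last_seen_seq=2)
```

Because the client supplies the last sequence it saw, the server does not need to track per-client delivery state to resume the stream.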

In this blog, we’ll share more about Agent Executor and how you can get started.

Federate with Google’s agent runtime

Enterprise adoption of agents requires orchestration across deployment models. Some teams need on-prem infrastructure for proprietary workflows, performance, or compliance, while others prefer pre-built or custom managed agents for faster time-to-value. At Google I/O, we introduced a new suite of such solutions – including Antigravity 2.0 and the Managed Agents API – designed to accelerate how teams build and scale within the agentic enterprise.

Agent Executor bridges these deployment models, letting you mix-and-match between any or all of:

Google Antigravity, Gemini’s state-of-the-art agent harness
Google-built frontier agents, such as the latest Deep Research agent
Custom agents built by you and managed by Google (e.g., via the new Managed Agents in Gemini API)
Custom purpose-built agents, built with LangChain/LangGraph, Agent Development Kit (ADK), etc., and any agents using Agent2Agent Protocol (A2A)

Own your agents, models, and compute

With Agent Executor, enterprises have maximum flexibility to maintain sovereignty over workloads and keep proprietary workflows within their self-managed compute and custom sandboxes. Your internal development teams have much more flexibility over how agents are deployed and managed, and you can:

Prevent vendor lock-in: Deploy your agents on your own infrastructure without being tethered to a specific provider’s model or compute environment. This allows for full control over data residency, costs, and budgets.
Bring your own harness and agents: Agent Executor is designed to be harness-agnostic, allowing you to bring your own or use those made available by other vendors. It also supports agents developed with industry-standard frameworks and protocols providing a broad ecosystem of compatible agents.
Fully control execution: Agent Executor allows developers to run the entire agentic stack, including MCPs, skills, and other agents, directly on their own data plane. Developers can choose any compute with custom isolation boundaries and workload policy enforcement.

Scale agents up on Kubernetes with an agent-first compute layer

As agent workloads scale into the hundreds of millions and become increasingly long-running, our customers are hitting the limits of traditional compute abstractions. Unlike traditional software, agents are nonlinear programs that wait for external inputs. To solve this problem, we’ve partnered with the Google Kubernetes Engine team on Agent Substrate, a new open-source project also announced today.

Agent Substrate introduces a new level of abstraction for Kubernetes that moves agents onto and off of ready compute capacity in real-time, resulting in lower latency with higher scale and efficiency. While standard Kubernetes is optimized to handle thousands of long-running services, Agent Substrate is designed for the chatter of millions of sub-second tool calls that would otherwise overwhelm a standard control plane. Agent Substrate takes core secure runtime and snapshotting capabilities of existing sandbox infrastructure and pairs them with a minimal control plane designed to bypass some of the limitations of Kubernetes, without reinventing the rest of it. Working together, these layers enable you to:

Maximize compute efficiency: Agent Substrate introduces a new control plane designed to handle hundreds of millions of registered agents. Together with Agent Executor, Agent Substrate can provide a foundation for today’s largest agent deployments.
Stay within the Kubernetes ecosystem: Agent Substrate is built on top of Kubernetes and allows scheduling and horizontal scaling of compute with declarative configuration.

In the demo below, we showcase using Agent Executor together with Agent Substrate with a sample workload.

Get started today

Models, agents, harnesses, and the infrastructure around them are all evolving faster than ever. We’re building Agent Executor in the open so we can validate the design in the hands of real developers and improve based on your feedback.

Agent Executor is available now in preview. We invite you to explore the code, test it with your own workloads, and help shape the future of agent runtimes. Head over to our GitHub repo to get started today.


Gemini 3.5 Flash Developer Guide

Tech ai llm google agents Google AI Studio (via X)

Google's Gemini 3.5 Flash model is now generally available, introducing features like "Thinking," multimodal function responses, and robust agentic execution for coding and long-horizon tasks.

What: Google has made its Gemini 3.5 Flash model generally available. It features Thinking, structured outputs with tools, multimodal function responses, code execution with images, and combined tool use, but not Computer Use, and is positioned for agentic execution, coding, and long-horizon tasks at scale.

Why it matters: The focus on "Thinking" and enhanced tool use in Gemini 3.5 Flash reflects a clear industry direction towards more autonomous and capable AI agents that can tackle complex, multi-step tasks across different modalities, signaling Google's continued push in the highly competitive LLM space.

Takeaway: Developers can now explore the Gemini 3.5 Flash API and migration guidance to integrate its new features, especially for agentic workflows or tasks requiring multimodal understanding and structured outputs.

Decoder

Agentic execution: The ability of an AI model to autonomously plan, execute, and iterate on complex tasks, often by breaking them down into sub-tasks and using external tools.
Multimodal function responses: The ability for the results of a function (tool) call returned to the model to include non-text content such as images, not only text.
Long-horizon tasks: Complex tasks that require multiple steps, sustained reasoning, and often involve memory or planning over extended interactions.

Original article

Gemini 3.5 Flash is now generally available. This guide contains an overview of improvements, API changes, and migration guidance for the model. Gemini 3.5 Flash features Thinking, structured outputs with tools, multimodal function responses, code execution with images, and combined tool use, but not Computer Use. It delivers sustained frontier performance in agentic execution, coding, and long-horizon tasks at scale.


Coding is solved? Software is not

Tech ai backend career devops Arcplane

Even when AI tools like Claude Code write 100% of the code, the core challenge of software development remains understanding the problem, designing the change, and verifying it, not just implementing it.

What: Boris Cherny, the creator of Claude Code, says coding is "largely solved", describing a workflow in which Claude Code writes 100% of the code and Claude reviews every pull request while humans remain checkpoints. The article argues that software development is more than coding: it includes clarifying ambiguous intent, narrowing scope, and verifying outcomes, and agent output often fails to reduce that ambiguity.

Why it matters: This article provides a critical, nuanced perspective on AI's impact on software engineering, highlighting that while implementation speed increases, the intellectual and strategic challenges of building reliable systems persist and may even be exacerbated by "AI slop" without new workflows.

Takeaway: Developers adopting AI code generation tools should focus on developing new workflows that emphasize clear context, robust specifications, verifiable evidence, and human checkpoints, rather than solely relying on faster code output.

Deep dive

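Boris Cherny, creator of Claude Code, says "coding is largely solved", at least for the kind of programming he does, describing a workflow where Claude Code writes 100% of the code and Claude reviews every pull request while humans act as checkpoints.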
The article argues that while AI speeds up code implementation, it doesn't solve the broader challenges of software development, which include turning ambiguous intent into reliable systems.
AI agents can produce "AI slop," output that looks complete but doesn't genuinely reduce the complexity or "entropy" of the problem.
The bottleneck shifts from code writing to establishing context, creating precise specifications, verifying agent output, and instituting human checkpoints.
A vague task given to an AI agent can lead to a "finished" but incorrect implementation, as agents lack human judgment to question product decisions.
Trust becomes the expensive part when code is cheap to generate; reviewers need clear evidence of what the agent ran, what failed, and why.
The company Arcplane is developing a workflow tool to manage agentic software work on production codebases, focusing on structured context, persistent specs, verifiable evidence, and human review points.
This new workflow aims to allow agent-authored work to be reviewed as a real change rather than requiring decoding from chat transcripts.
Ultimately, the challenge is making AI-generated code hold up to scrutiny and fit correctly within the overall system.
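
One way to picture the evidence a reviewer needs (what the agent ran, what failed, and why) is as a structured record that travels with the change instead of living in a chat transcript. The sketch below is purely illustrative: the field names and the merge rule are assumptions and do not describe Arcplane's tool.

```python
from dataclasses import dataclass, field


@dataclass
class CommandResult:
    """One thing the agent ran, and how it went."""
    command: str
    passed: bool
    note: str = ""   # why it failed, or what it demonstrates


@dataclass
class ChangeRecord:
    """Context, spec, and evidence kept together with an agent-authored change."""
    task: str
    spec: list                     # acceptance criteria, in plain language
    context: list                  # docs or rules chosen on purpose for this task
    evidence: list = field(default_factory=list)
    human_approved: bool = False   # the human checkpoint

    def failures(self):
        return [e for e in self.evidence if not e.passed]

    def ready_to_merge(self):
        """Needs some evidence, no failures, and an explicit human sign-off."""
        return bool(self.evidence) and not self.failures() and self.human_approved


record = ChangeRecord(
    task="Add team invitations",
    spec=["Only organization admins can send invites"],
    context=["Tenant boundary rules"],
)
record.evidence.append(
    CommandResult("run invite tests", passed=False, note="non-admin could invite")
)
blocked = record.ready_to_merge()
```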

Decoder

AI slop: AI-generated output that appears complete or coherent but lacks true substance, accuracy, or usefulness, often adding to confusion rather than clarity.
Entropy reduction (in software development): A metaphor for the process of transforming a messy, ambiguous problem statement into a clear, verified, and shippable software change.

Original article

Boris Cherny, the creator of Claude Code, said in a recent talk:

…at this point, it’s safe to say that coding is largely solved - at least for the kind of programming that I do.

He described a workflow where Claude Code writes 100% of the code and Claude reviews every pull request, while humans still act as checkpoints for safety and quality.

The line works because AI coding tools can feel both magical and disappointing. A change that used to take an afternoon can arrive as a credible first draft in minutes, and then the team may still spend hours, sometimes days, deciding whether it was the right change to make.

If implementation is becoming abundant, why does building software still take so much time and effort?

Coding is not the whole job

“Coding is solved” is a provocative statement. It is also an incomplete one.

Models still hallucinate, and generated changes still need review. But the statement points at something real: for many software teams, writing code has stopped being the slowest part of building software.

And yet, software development does not feel solved.

Because coding does not equal software development.

Coding turns instructions into implementation. It remains important, and it is imperfect. But software development is larger than that: it turns ambiguous intent into a reliable system.

No matter what process a team follows, someone has to understand the problem before code exists. The team has to narrow the scope until “done” means something concrete.

After code exists, someone has to prove that the change belongs in the system, ship it safely, and keep owning the consequences.

This is where the promise frays. Implementation gets dramatically faster; the rest of software development does not disappear.

Software development reduces entropy

Not in the physics sense. But as a metaphor, it feels right.

A new feature often starts as a messy request: “Can we add team invitations?” At that point, there may not be an implementation to compare. The team is still figuring out which product behavior the request implies.

Product thinking reduces the mess first. Maybe “team invitations” means a simple email invite into an existing organization. Maybe role assignment can wait. A vague request becomes a narrower bet.

Design gives that bet a shape. The team decides who can send an invite, what an existing user sees, and what happens when an invite expires. Now there is proposed behavior, not just a product wish.

Implementation turns the behavior into a real change. Code gives the idea weight, but it also gives the team something new to distrust. The next question is no longer “can we build this?” It is “did we build the right thing, in the right way?”

Review and deployment close the loop. The change has to survive contact with the rest of the product and with real users.

At each step, software development narrows a messy space of possibilities until there is a change the team can verify. In that loose sense, software development is entropy reduction: turning confusion into a verified change.

The diagram below shows the clean version of that journey: intent becoming a shipped change the team can stand behind.

A diagram showing the software development process

But fast coding can add entropy too

At first, it feels like AI agents can own implementation. In more ambitious versions of the story, they may eventually own the whole loop. But in practice, we often find that agents are “too smart” for their own good.

The failure mode is subtle. A generated test suite can be large and still mostly confirm the implementation the agent already chose. A review thread can grow longer because the agent nitpicks around the core issue. A plan can sound thoughtful while leaving the actual product tradeoff undecided.

This is one form of “AI slop”: output that looks complete, but does not actually reduce the mess.
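
A small sketch of the test-suite failure mode: a test that restates the implementation's own logic will pass no matter what, while a test derived from the intended behavior can catch a wrong choice. The seven-day expiry rule and its boundary behavior are hypothetical, invented here to illustrate the point.

```python
from datetime import datetime, timedelta

INVITE_TTL = timedelta(days=7)  # hypothetical product rule: invites last 7 days


def is_expired(sent_at: datetime, now: datetime) -> bool:
    """Implementation under test. Under the hypothetical spec, an invite is
    expired once 7 full days have elapsed, so the boundary counts as expired."""
    return now - sent_at > INVITE_TTL  # bug relative to the spec: should be >=


def mirror_test_passes() -> bool:
    """Restates the implementation's own logic, so it cannot disagree with it."""
    sent = datetime(2025, 1, 1)
    for days in (1, 7, 8):
        now = sent + timedelta(days=days)
        if is_expired(sent, now) != (now - sent > INVITE_TTL):
            return False
    return True


def spec_test_passes() -> bool:
    """Checks the behavior the spec demands, written without reading the code."""
    sent = datetime(2025, 1, 1)
    return (
        not is_expired(sent, sent + timedelta(days=6))
        and is_expired(sent, sent + timedelta(days=7))  # boundary case
        and is_expired(sent, sent + timedelta(days=8))
    )
```

Here the mirrored test passes while the spec-derived test fails on the boundary case, so only the second one tells the reviewer anything about whether the change is right.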

After introducing AI agents, entropy can decrease in one part of the process and increase in another. The implementation arrives faster, but the team may spend more time reconstructing the agent’s intent and deciding how much of the evidence to trust.

The team produces code faster, but it does not necessarily trust the result sooner.

The missing piece: a new workflow

Once agents enter day-to-day work, the magic wears off a little. They start to feel more like capable junior teammates. The work starts to look more like mentoring:

You give them enough context to begin, then keep checking whether the work is heading toward the thing you meant.

In our team, the transition happened gradually.

At first, agents were personal assistants. They helped inside the developer’s existing loop, while the rest of the development process stayed mostly the same.

Then developers started delegating larger parts of implementation. Instead of writing most of the code by hand, they became editors of an agent’s proposed change.

That worked surprisingly well. It also made the surrounding workflow feel heavier.

Review started to include more archaeology. Context had to be repeated. Noisy tests had to be interpreted. Reviewers spent more time reconstructing what happened and why. None of it looked dramatic in isolation, but it changed the shape of the work.

Chat is useful while the task is still being discovered. But once a change needs review, the transcript becomes a poor source of truth: important decisions and concrete evidence are buried in the same stream as the back-and-forth that produced them.

When humans wrote most of the code, we tolerated a lot of workflow friction because implementation itself took time. Now the code arrives sooner, so the surrounding workflow gets exposed, and in some places the problems get worse.

That does not mean “coding is solved” is wrong. It means the bottleneck has moved.

For us, four problems keep coming back: context, specs, verification, and human checkpoints.

What needs to change?

We build and operate an auth product that manages millions of user identities. That makes us conservative about code written by agents. A change that looks local in the diff can still change who gets access to what, especially in a multi-tenant system.

So we cannot treat agents as a faster way to throw code over the wall.

Context chosen on purpose

A lot of agent work succeeds or fails before the agent writes code.

Large context windows help, but more context is not automatically better. A bloated prompt can bury the one rule that actually matters.

Most teams already have the needed context, but it is scattered across docs, old pull requests, chat, and things teammates remember.

For the invitation task, the useful context lives around membership and access: who can invite, where tenant boundaries are enforced, and whether an existing account accepts differently from a new one. Someone has to choose those pieces. If that choice stays in a developer’s head, the agent guesses. If it travels with the task, review starts from shared ground.

The context created during the work matters too. If a reviewer corrects the same mistake twice, that feedback should not stay buried in two separate pull requests. If a team introduces a new convention, future runs should be able to use it without every developer pasting the same reminder again.

That discipline helps agents. It helps the team too. The agent is just the pressure that makes the old context problem harder to ignore.

Specs that stay with the work

A vague task used to be less dangerous than it is now.

When a human engineer gets a vague task, they bring judgment with them. Sometimes that judgment shows up as a product question, a remembered edge case, or a refusal to implement the request as written.

An agent is much more willing to proceed. Give it a vague request, and it may still produce a full implementation. The result can look finished even when the interpretation was wrong.

That makes the spec matter more.

The invitation spec could still be short. An admin can invite someone by email. The invite expires after seven days. Existing users join after accepting. Role assignment waits for a later change, and cross-tenant access stays out of bounds. If review turns up a missing edge case, like a suspended user accepting an old invite, the spec should change before the agent keeps going.
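A spec this short can also be written down as a few executable checks. The following is only a sketch of that idea, not Arcplane's format: the names `Invite`, `can_invite`, and `can_accept` are hypothetical, and rejecting a suspended user is an assumption, since the text only says the spec should address that case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

INVITE_TTL = timedelta(days=7)  # from the spec: the invite expires after seven days


@dataclass(frozen=True)
class Invite:
    email: str
    tenant_id: str
    created_at: datetime


def can_invite(inviter_role: str) -> bool:
    # Spec: an admin can invite someone by email.
    return inviter_role == "admin"


def is_expired(invite: Invite, now: datetime) -> bool:
    return now - invite.created_at > INVITE_TTL


def can_accept(invite: Invite, now: datetime, user_suspended: bool = False) -> bool:
    # Assumption: a suspended user may not accept an old invite.
    if user_suspended:
        return False
    return not is_expired(invite, now)
```

Role assignment is deliberately absent here, matching the spec's decision to leave it for a later change.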

Most tasks only need enough shape for the risk involved. A small bug fix may only need the expected behavior and a reproduction case. As the risk goes up, the spec has to capture the boundaries that matter: user flow, permissions, constraints, and migration story.

The spec cannot disappear once the agent starts coding. The agent plans against it. The implementation is judged against it. If the team discovers a missing edge case, the spec changes and the agent continues with the updated intent.

That is the version of spec-in-the-loop we care about. The useful spec is the one that stays close enough to the work to argue with it.

Evidence reviewers can trust

When code is cheap to generate, trust becomes the expensive part.

Agents can write useful tests. They can also write tests that mostly confirm the implementation they already chose. Coverage goes up, while the reviewer still has to ask: did we actually prove the behavior we care about?

Verification has to be visible enough for reviewers to know what the agent ran, what failed, and what changed after the failure. They also need to know whether the passing command was actually the right command for this task.

Later, the reviewer should see evidence for the promises the spec made, not a generic wall of green checks. The run should show that admin and non-admin paths were exercised, expiry was covered, and acceptance worked for both a brand-new user and an existing account. The command or environment behind that evidence should be visible too.
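One way to picture that kind of evidence is as a small record per check, plus a function that reports which promised behaviors still have no passing check. This is a minimal sketch under assumptions: the scenario labels and the `CheckResult` shape are invented for illustration and do not describe any real tool's schema.

```python
from dataclasses import dataclass

# Hypothetical labels for the behaviors the invitation task promised.
REQUIRED_SCENARIOS = {
    "invite_as_admin",
    "invite_as_non_admin",
    "invite_expired",
    "accept_new_user",
    "accept_existing_user",
}


@dataclass(frozen=True)
class CheckResult:
    scenario: str
    command: str      # the exact command that produced this result
    environment: str  # where it ran, so the run can be reproduced
    passed: bool


def missing_evidence(results: list[CheckResult]) -> set[str]:
    """Return the required scenarios that have no passing check."""
    proven = {r.scenario for r in results if r.passed}
    return REQUIRED_SCENARIOS - proven
```

A reviewer reading an empty result from `missing_evidence` knows every promised path was exercised, and can see which command and environment backed each one.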

A small utility change may only need unit tests. A product flow is different: the real signal may come from exercising the experience end to end. For auth and permission changes, we usually want evidence from a reproducible environment, especially around database state and permissions.

The right checks vary by repo and team. What matters is that reviewers can inspect them. A reviewer should not have to dig through a long chat transcript to understand why the change is believed to be safe.

Agents are good at sounding confident. The workflow has to produce evidence.

Checkpoints where judgment matters

Humans should not sit in every loop forever. That defeats the point.

But some moments still need judgment.

Before implementation, someone needs to check whether this is worth building. A missing constraint or wrong scope can send the agent toward the wrong answer very efficiently.

This is where the human checkpoint may matter before any code exists. Someone has to decide whether role assignment belongs in scope, whether both owners and admins can invite, and how to handle an email that already belongs to another tenant. If a human punts on those questions, the agent can still ship clean code for the wrong product decision.

For some tasks, this checkpoint may matter more than code review.

Clean code cannot rescue a bad spec.

After implementation, the review shifts to the result. The question is whether the agent actually solved the problem in a way that fits the product. Test presence alone tells only part of the story; the tests have to mean something. Sometimes the risky part is a maintenance problem that appears later.

The depth of review should depend on risk. A copy update should not go through the same process as a permission change. As the system earns trust, some classes of work can run with less supervision. Others should stay tightly reviewed.
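As a rough illustration, risk-based review depth can be expressed as a rule table where the riskiest touched path decides the level. The path patterns and level names below are hypothetical; a real repository would define its own.

```python
from enum import IntEnum
from fnmatch import fnmatch


class Review(IntEnum):
    LIGHT = 1     # e.g. a copy update
    STANDARD = 2  # default for paths no rule mentions
    STRICT = 3    # e.g. a permission change


# Hypothetical path patterns, checked in order.
RULES = [
    ("auth/*", Review.STRICT),
    ("migrations/*", Review.STRICT),
    ("copy/*", Review.LIGHT),
]


def level_for(path: str) -> Review:
    for pattern, level in RULES:
        if fnmatch(path, pattern):
            return level
    return Review.STANDARD


def review_level(changed_paths: list[str]) -> Review:
    """The riskiest changed path decides how deep the review goes."""
    return max((level_for(p) for p in changed_paths), default=Review.STANDARD)
```

Taking the maximum means a change that mixes a copy tweak with an auth edit is still reviewed as an auth change.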

Those boundaries should be part of the workflow.

What we are building for the new workflow

Arcplane is our answer to that workflow gap.

Arcplane gives teams a place to run, review, and manage agentic software work on production codebases. It sits above tools like GitHub and gives agent-authored work a real lifecycle instead of leaving it as an unstructured chat-to-diff handoff.

In Arcplane, that same invitation task would begin with the chosen membership and permission context. The spec would stay with the run as it changes. The branch would carry evidence from checks that actually matched the behavior. Review would pause at the moments a human chose in advance, instead of hoping the important decisions survive in a chat transcript.

That is the workflow we want for our own team: agent work that can be reviewed as a real change, not decoded from a conversation.

Reusable instructions and agent skills help, but they are only ingredients.

A skill can encode repeatable team practice, such as migration review or the way risky auth changes are tested. But that practice still needs a place in the run and in review.

Code is getting easier to produce. The work now is making it hold up.



Years-long fight over users' right to tweak smart TV software heads to trial

Tech policy opensource consumer hardware Ars Technica

A California jury will decide in August if Vizio must release its Linux-based smart TV operating system's source code, a case that could profoundly impact user control over smart TVs.

What: The Software Freedom Conservancy (SFC) is suing Vizio to force the release of its Linux-based smart TV OS source code, citing violations of the GNU GPL and LGPL licenses. The trial is set for August 10 in California, and its outcome could affect user rights to modify software, block ads, and disable tracking on many smart TVs, including those by LG, Samsung, and Roku.

Why it matters: This landmark case could set a crucial precedent for consumer rights and open-source license enforcement in the smart device era, potentially empowering users to control their hardware beyond manufacturer-imposed limitations and shaping how companies comply with copyleft licenses in embedded systems.

Takeaway: If you own a Vizio smart TV or another Linux-based smart TV, keep an eye on the outcome of the August 10 trial, as it could grant you the legal right to request and modify your TV's operating system source code.

Deep dive

The Software Freedom Conservancy (SFC) filed a lawsuit against Vizio in 2021, alleging the company breached GPLv2 and LGPLv2.1 by not providing complete source code for its Linux-based Vizio OS.
The trial is scheduled for August 10 in Orange County Superior Court of California.
SFC argues that as owners of several Vizio TVs, they are third-party beneficiaries of the GPL licenses and have the right to demand the source code.
Vizio, which was acquired by Walmart in December 2024, has resisted, arguing that SFC is not an intended third-party beneficiary and that GPL is a license, not a contract.
Access to the full source code would allow users to make meaningful changes like limiting ads, deactivating automatic content recognition, and ensuring device longevity through community updates.
The Free Software Foundation (FSF) supports SFC's efforts, stating that access to source code is a precondition for software freedom under GPL.
A judge already ruled in December 2025 that Vizio is not required to guarantee a TV will function properly after a user reinstalls modified OS code, a stance supported by Linus Torvalds.
The case's outcome could impact other smart TV manufacturers using Linux-based operating systems like LG's webOS, Samsung’s Tizen, and Roku’s Roku OS.
Vizio's advertising business has been profitable ($115.8 million in the quarter before acquisition), making the ability for users to block ads a significant concern for the industry.
SFC dismisses concerns about DRM key exposure, stating they do not want these keys and Vizio is free to remove them from any modified TV OS.

Decoder

Software Freedom Conservancy (SFC): A US non-profit organization that promotes and provides legal support for free and open-source software projects.
GNU General Public License (GPL): A widely used free software license that guarantees end users the freedom to run, study, share, and modify the software. It is a "copyleft" license, requiring that derivative works also be released under the GPL.
GNU Lesser General Public License (LGPL): A free software license that allows linking to libraries under the LGPL from proprietary software, while still requiring modifications to the LGPL-licensed components themselves to be released under the LGPL.
Copyleft: A general method for making a program (or other work) free, and requiring all modified and extended versions of the program to be free as well.

Original article

For years, owners of Vizio smart TVs have had little control over the software running on their sets—software that can track viewing habits, push ads, and generally shape the experience of using the device.

The Software Freedom Conservancy (SFC), a US nonprofit that promotes and provides legal support for free and open source software projects, isn’t happy about that—so much so that it has spent eight years trying to force the release of the complete source code for Vizio’s Linux-based smart TV operating system.

Now, after numerous delays since the SFC filed suit in 2021, a California jury will decide in August whether Vizio must provide that code, in a form that can be compiled into an executable, to SFC and any Vizio TV owner who wants it.

The outcome could reverberate across the industry. Because many of today’s popular smart TV operating systems are Linux-based, the case may help determine how much control many owners have over their sets. Access to the full code would allow users to make meaningful changes to how their TVs work, including limiting ads or deactivating automatic content recognition.

Ahead of the trial, we spoke with an SFC executive about why it’s suing Vizio and what it hopes the case will accomplish.

Vizio and its parent company, Walmart, did not respond to multiple requests for comment. We reviewed filings from Vizio to understand why it doesn’t think the GNU General Public License (GPL) and its “Lesser” version (LGPL) require it to share the source code for Vizio OS (formerly SmartCast).

Software Freedom Conservancy sues Vizio

The Software Freedom Conservancy argues it has the right to Vizio OS’s source code because it owns several Vizio TVs and because the operating system is based on Ubuntu, a Linux distribution. (SFC employees bought seven Vizio TVs from 2018 to 2021 after getting complaints about Vizio not sharing its TVs’ source code, according to the complaint.) In general, the Linux kernel is provided under the terms of GPLv2, as noted by kernel.org, which is run by the Linux Kernel Organization.

SFC’s lawsuit alleges that Vizio breached GPLv2 and LGPLv2.1 by failing to make available the complete source code for Vizio OS. The case is currently in the Orange County Superior Court of the State of California. The lawsuit targets Vizio specifically, but the impact could extend to other Linux-based smart TV OSes such as LG’s webOS, Samsung’s Tizen, and Roku’s Roku OS.

“We expect all companies who distribute Linux and other software using right-to-repair agreements like the GPL in their products would comply with these agreements,” Denver Gingerich, the director of compliance at SFC, told Ars.

SFC sued Vizio specifically because the group received numerous reports from concerned users about the company’s TVs, Gingerich said. Vizio has shared some of its operating system’s source code, but SFC claims that code does “not include all files and scripts that would permit the code to be compiled into an executable form,” according to its amended complaint from 2024 (PDF).

“As a nonprofit charity with limited resources, we sadly cannot solve every violation of the GPL agreement, but we do work hard to solve those that are important to a wide variety of users, and the popularity of Vizio TVs suggested to us that resolving this case would be especially worth the effort,” Gingerich said.

The terms of GPLv2 say that “[f]or an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.”

FSF says there’s “no reason” for code to be withheld

Legal filings from both Vizio and SFC frame the Free Software Foundation (FSF) as the authority on the GPLs in question, as it’s the license steward and publisher of GNU licenses, including GPLv2 and LGPLv2.1.

FSF’s executive director, Zoë Kooyman, was deposed in the case in May 2025. When asked about the nonprofit’s stance, she said via email that the FSF supports SFC’s efforts and believes that “users should be free to enforce their right to source code under the GNU GPL licenses through any available legal mechanism.”

Vizio OS is believed to include at least two versions of the Linux kernel that are subject to GPLv2. The first appears to be tied to the Ubuntu distribution in the OS’s user interface and streaming platform, and the second seems to be tied to “a custom version supplied by VIZIO’s chip vendor for the lower-level operating system,” according to SFC’s amended complaint from 2024 (PDF).

In her email to Ars, Kooyman said:

In the definition of software freedom, which the FSF maintains, we explain the value of the four essential freedoms (run, study, modify, share). The definition clearly explains that access to the source code is a precondition for software freedom. Programs licensed under the GNU GPL can be assumed to have chosen this license to ensure users have these four essential freedoms, as *that is what the license was specifically designed to do*. There is no reason why these core requirements for software to be free would not need to be upheld.

Vizio OS also uses numerous other programs subject to the GPLv2, including BusyBox, dnsmasq, GNU Bash, GNU Tar, and SELinux. Other parts of Vizio OS, including DirectFB, FFmpeg, GNU C Library, SELinux, and systemd, are subject to the Lesser General Public License version 2.1 (LGPLv2.1).

Both GPLv2 and LGPLv2.1 are copyleft licenses, meaning that they grant “permission to freely use, modify, and redistribute the covered intellectual property—but only if the original license remains intact, both for the original project and for any modifications to the original project anyone might make,” as Ars alum Jim Salter explained.

Critical to SFC and Vizio’s dispute, the actual text of GPLv2 reads:

Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

LGPLv2.1 uses similar language.

SFC expects a ruling within three to six months of the conclusion of the trial, which is currently scheduled for August 10.

Vizio fights back

Absent a response from Vizio and Walmart, Vizio’s legal filings provide insight into the company’s defense.

In 2023, Vizio filed a motion for summary adjudication (PDF) seeking to avoid a trial (a judge denied the motion later that year (PDF)). In the motion, Vizio argued that “SFC is not an intended third-party beneficiary to GPLv2 or LGPLv2.1 and, thus, doesn’t have the right to sue Vizio to enforce license terms.”

According to the litigation filing:

… nothing in the text of the GPLs suggests that third parties have the right to enforce alleged violations of the GPLs. Further, the FSF has made clear that it never intended third-party enforcement, stating publicly that ‘the copyright holders of the software are the ones who have the power to enforce the GPL’… and that ‘[i]f you think you see a violation of the GNU GPL [or] LGPL . . . you should send a precise report to the copyright holders of the packages that are being wrongly distributed . . . [because] we cannot act on our own where we do not hold copyright.’

Vizio also argued that the GPL is a software license, not a contract, so the company has no contractual obligation to provide SFC with Vizio OS’s source code, even if SFC were considered a third-party beneficiary of GPLv2 and LGPLv2.1.

Broader implications for smart TVs

Because many other companies use Linux-based OSes, SFC’s lawsuit could have broad implications for smart TVs and their owners. The case stands out because it asserts that individual end users have rights as third-party beneficiaries of GPL licenses. Ultimately, the SFC hopes that filing the lawsuit as a third-party beneficiary of the GPL “will clearly demonstrate the feasibility of this approach,” Gingerich said.

Access to Vizio OS’s source code could allow users to tweak the software to add features like accessibility tools and avoid frustrations like user tracking and ads. Walmart acquired Vizio in December 2024 and is likely to make changes to Vizio OS aimed at furthering its advertising business. The company already requires some new Vizio TVs to be set up with a Walmart account to access smart features.

And Vizio OS’s source code could also be critical for users if Vizio ever goes out of business.

“[Access to the source code] would also allow for the maintenance of older models that are no longer supported by VIZIO. In these ways, purchasers of VIZIO smart TVs can be confident that their devices would not suffer from software-induced obsolescence, planned or otherwise,” SFC’s complaint reads.

Ads and tracking have been Vizio’s primary focus for years. Walmart doesn’t share Vizio’s financials, but in the quarter before its acquisition, the company’s ad business made $115.8 million, and its hardware business lost $6.7 million. While many Vizio TV owners are unlikely to have the time or technical ability to reduce ads or user tracking—even with access to Vizio OS’s source code—making that code available could still threaten Vizio’s business and profitability.

SFC’s Gingerich thinks these fears are “overblown,” and he compared them to 1980s concerns that VCRs would kill the US film business. But with virtually every smart TV brand increasingly focusing on monetizing software through ads and tracking, the entire industry is likely to resist anything that could significantly curb ad revenue potential.

Another concern is that people tinkering with their TVs’ software could gain access to digital rights management (DRM) keys that Netflix and other streaming companies provide to OS operators for decrypting protected videos, Gingerich said, adding:

We have repeatedly and emphatically informed Vizio that we do not want these keys, and they are free to delete any such key material from the TV before modified versions of Linux or other open source programs are installed onto the TV. They have not responded to this reply of ours.

Despite these obstacles, the SFC is doubtful that its lawsuit will push Vizio or other smart TV OS operators to adopt a completely closed OS that would make customization and repairs impossible for users.

“Because of how valuable and flexible Linux and other open source programs are, it is generally not practical to change to a fully proprietary operating system,” Gingerich said.

One ruling already made

A judge has already ruled that Vizio is not required to provide source code in a way that guarantees a TV will continue working properly if a user reinstalls a modified version of the OS. In a December 2025 ruling (PDF), Judge Sandy Leal wrote, “Nothing in the language of the Agreements requires Vizio to allow modified source code to be reinstalled on its devices while ensuring the devices remain operable after the source code is modified.”

She continued:

… the disputed language means that Vizio must provide the source code in a manner that allows the source code to be obtained and revised by Plaintiff or others for use in other applications.

In other words, Vizio must ensure the ability of users to copy, change/modify, and distribute the source code, including using the code in other free programs consistent with the Preamble and Terms and Conditions of the Agreements.

The SFC has said it “never” believed that any version of the GPL requires devices to function properly after someone installs a modified version of copyleft software.

The ruling even prompted a reaction from Linus Torvalds, the creator and lead developer of the Linux kernel. On a forum on kernel.org, Torvalds said he supported the judge’s finding, adding that it validates the premise that “GPLv2 is about making source code available, not controlling the access to the hardware that it runs on.”

“Vizio used Linux in their TVs without originally making the source code available, and that was obviously not ok,” he added.

Further, in December, Judge Leal issued a tentative ruling on the case, suggesting that Vizio may be obligated to share the Vizio OS source code, but that’s not a final decision.

“Personally, I’m excited to see [more] people interested in improving the devices they have, whether that be to extend their support lifetime for 10-plus years… so they can keep getting security updates after the manufacturer stops updating them, add new features like ad-blocking, or diagnose and fix hardware issues that require software changes,” Gingerich told Ars.

“The Vizio lawsuit is just one piece in the puzzle,” he said.


GitHub confirms breach of 3,800 repos via malicious VSCode extension

Tech security vscode BleepingComputer

GitHub confirmed a security breach affecting 3,800 repositories due to a malicious, unnamed VS Code extension that has since been removed.

What: GitHub confirmed that 3,800 repositories were compromised by an unnamed trojanized VS Code extension. The malicious extension was subsequently removed from the VS Code marketplace.

Why it matters: This incident highlights the growing supply chain risk introduced by third-party extensions in popular developer tools like VS Code, which can serve as vectors for sophisticated attacks targeting developer environments and codebases.

Takeaway: Review the extensions installed in your VS Code environment, especially if you have sensitive repositories on GitHub, and verify their legitimacy and permissions.

Decoder

Trojanized extension: A software extension that appears legitimate but contains hidden malicious code, similar to a Trojan horse.

Original article

GitHub has since removed the unnamed trojanized extension from the VS Code marketplace.


mondayDB 3 – Solving HTAP for a Trillion-Table System

Data database backend cloud duckdb mysql monday.com Engineering

monday.com replaced its MySQL, Cassandra, and Redis fleet with mondayDB 3, a custom HTAP system powered by DuckDB, achieving 5-20x performance improvements and 40-60% cost reductions for its trillion-table system.

What: monday.com transitioned its board reads from a MySQL + JSON architecture to mondayDB 3, a CQRS-based Lambda architecture. This new system stores immutable snapshots in object storage, uses an external WAL for real-time changes, and adds a soft-stateful serving layer that syncs and queries local DuckDB files on every read. It resulted in 5x faster board loads, 20x faster large boards, 50x faster aggregations, and a 40-60% reduction in infrastructure costs.

Why it matters: monday.com's experience with a highly flexible, multi-tenant system highlights the limitations of general-purpose OLTP databases for analytical workloads with dynamic schemas, even when augmented with JSON. It demonstrates how purpose-built systems leveraging columnar engines like DuckDB, combined with CQRS and external WALs, can offer significant performance and cost advantages by aligning the data architecture with actual access patterns. This architecture is now being extended for AI contextual layers.

Takeaway: If you are dealing with multi-tenant systems, dynamic schemas, and analytical workloads on operational data, consider a CQRS-based architecture with an embedded columnar database like DuckDB and an external WAL to achieve better performance and cost efficiency.

Deep dive

mondayDB 3 is an HTAP system designed to handle over a trillion dynamic tables with highly flexible schemas.
The previous architecture used MySQL with JSON blobs for schema flexibility, plus Cassandra and Redis.
This led to slow board loads (over 2 seconds for large boards), poor aggregation performance, and inefficient I/O due to row-oriented storage and multi-tenant shared indexes.
The new architecture is a CQRS-based Lambda architecture with durable snapshots in object storage (S3), an external write-ahead log (WAL) for real-time mutations, and a soft-stateful serving layer.
Serving nodes are Go processes on Kubernetes with local NVMe SSDs, acting as smart read-through caches.
Each serving node maintains an LRU cache of over 200,000 DuckDB files, one per board.
Before each query, the serving node loads the board's DuckDB file, syncs pending WAL entries, and executes the query in-process using DuckDB's vectorized C++ engine.
DuckDB's native WAL is disabled; monday.com uses its own distributed external WAL for durability and real-time changes.
Dynamic schema evolution is handled on the fly by issuing ALTER TABLE ADD COLUMN in DuckDB when a new column is referenced.
A custom routing layer, Ranja, uses Weighted Rendezvous Hashing with capacity-aware weights and hedged requests to ensure cache affinity and resilience.
The migration of over 1 million organizations took 18 months, using feature flags, dual-read validation, and per-account rollout with instant rollback capability.
The architecture is designed as a multi-entity platform, supporting new entity types via JSON-driven plugins.
monday.com plans to evolve mondayDB 3 into an AI contextual layer for text search, semantic retrieval, and RAG, leveraging its per-board file isolation and real-time freshness.
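The read path described in the list above (load the board's file, apply pending WAL entries, add columns on the fly, then query) can be sketched in a few lines. This is an illustration under loud assumptions, not monday.com's code: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies, and the WAL entry shape (`seq`, `item_id`, `column`, `value`) and the `items` table are invented for the example.

```python
import sqlite3


def existing_columns(conn: sqlite3.Connection) -> set[str]:
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute("PRAGMA table_info(items)")}


def sync_wal(conn: sqlite3.Connection, wal: list[dict], last_seq: int) -> int:
    """Apply WAL entries newer than last_seq; return the new high-water mark.

    Assumes `wal` is ordered by `seq` and that column names are trusted.
    """
    columns = existing_columns(conn)
    for entry in wal:
        if entry["seq"] <= last_seq:
            continue  # already applied to this local file
        col = entry["column"]
        if col not in columns:
            # Schema evolves on the fly when a new column is referenced.
            conn.execute(f'ALTER TABLE items ADD COLUMN "{col}"')
            columns.add(col)
        conn.execute("INSERT OR IGNORE INTO items (id) VALUES (?)", (entry["item_id"],))
        conn.execute(
            f'UPDATE items SET "{col}" = ? WHERE id = ?',
            (entry["value"], entry["item_id"]),
        )
        last_seq = entry["seq"]
    return last_seq


def query_board(conn: sqlite3.Connection, wal: list[dict], last_seq: int, sql: str):
    """Sync pending changes first, then run the query against the local file."""
    last_seq = sync_wal(conn, wal, last_seq)
    return conn.execute(sql).fetchall(), last_seq
```

Because the local file can always be rebuilt from the snapshot plus the WAL, losing it costs a cache miss rather than data, which is what makes the serving layer soft-stateful.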

Decoder

HTAP (Hybrid Transactional/Analytical Processing): A database system designed to efficiently handle both transactional (OLTP) and analytical (OLAP) workloads within a single platform.
CQRS (Command Query Responsibility Segregation): An architectural pattern that separates the model for updating information (the command side) from the model for reading information (the query side).
Lambda Architecture: A data processing architecture designed to handle massive quantities of data by combining batch processing with stream processing for real-time data.
DuckDB: An in-process SQL OLAP database management system designed for analytical queries, often described as SQLite for analytics.
WAL (Write-Ahead Log): A log of changes to data before they are applied, used to ensure data durability and atomicity.
Soft-stateful: Refers to a system where state is maintained for performance but can be rebuilt from a durable source of truth if lost, meaning no critical data relies solely on that local state.
Ranja: monday.com's custom routing layer. It uses Weighted Rendezvous Hashing, a distributed hashing algorithm for consistent load balancing and routing, so that a given key (e.g., tenant ID) consistently maps to the same node for cache affinity.
RAG (Retrieval-Augmented Generation): An AI technique that combines a language model with a retrieval system to fetch relevant information from an external knowledge base to improve the accuracy and relevance of generated responses.
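For readers unfamiliar with weighted rendezvous hashing, here is a minimal sketch of the general technique, not Ranja's implementation. Every node gets a score for a key, and the highest score wins; the score `-weight / ln(h)` is the standard weighted variant, where `h` is a hash mapped into (0, 1). The node names and weights are made up, and hedged requests and capacity-aware weight updates are not modeled.

```python
import hashlib
import math


def _unit_hash(key: str, node: str) -> float:
    """Deterministically map (node, key) to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def pick_node(key: str, weights: dict[str, float]) -> str:
    """Return the node with the highest weighted rendezvous score for key."""
    return max(
        weights,
        key=lambda node: -weights[node] / math.log(_unit_hash(key, node)),
    )
```

Because each node's score depends only on that node and the key, removing a node that did not win leaves the assignment unchanged, which is what preserves cache affinity when the fleet changes.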



WrenAI (GitHub Repo)

Data ai agents opensource database GitHub

WrenAI is an Apache-2.0 licensed open-source context layer for AI agents, enabling them to generate governed SQL by understanding business semantics over existing data stacks.

What: WrenAI, an open-source context layer released under Apache-2.0, helps AI agents understand business data semantics and generate reliable SQL queries across various data sources like Postgres, BigQuery, and Snowflake. It supports agent frameworks like LangChain and Pydantic. The Wren Engine merged into this repo on May 7, 2026.

Why it matters: This project highlights the critical need for "context layers" to bridge the gap between generic AI agent capabilities and specific enterprise business logic, moving beyond simple schema-based understanding to enable reliable, governed data interaction for agents.

Takeaway: If building AI agents that query business data, consider integrating WrenAI to provide structured context, semantic understanding, and governed SQL generation.

Deep dive

WrenAI acts as an open context layer, providing AI agents with business semantics, examples, memory, and governance capabilities beyond what database schemas offer.
It is open-sourced under the Apache-2.0 license, including its core engine, SDK, and skills.
The system uses a Modeling Definition Language (MDL) for defining models, columns, relationships, and access control.
The core engine is based on Apache DataFusion and supports over 22 data sources.
Memory and examples are backed by LanceDB, using hybrid retrieval, and are versionable.
It includes SDKs for popular agent frameworks like LangChain and Pydantic.
WrenAI supports governed execution primitives, including dry-plan validation and structured errors.
A quickstart guides users to install skill bundles for AI coding agents (e.g., Claude Code) to scaffold projects and onboard WrenAI.
The /wren-enrich-context skill helps enrich projects with business context via grill or auto-pilot modes, writing to MDL and memory in a Git-friendly way.
It aims to solve the problem of agents "rediscovering" business logic by providing a shared, vendor-agnostic interface.
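
The dry-plan validation and structured errors mentioned in the list above can be illustrated generically. The sketch below is not WrenAI's API: it uses Python's standard sqlite3 module and SQLite's EXPLAIN QUERY PLAN to plan a query without running it. The PlanResult type, the dry_plan function, and the hint strings are hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class PlanResult:
    """Structured outcome of a dry plan (hypothetical type, not WrenAI's)."""
    ok: bool
    plan: list = field(default_factory=list)
    error: str = ""
    hint: str = ""


def dry_plan(conn, sql):
    """Ask SQLite to plan the query without executing it."""
    try:
        rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    except sqlite3.Error as exc:
        message = str(exc)
        hint = ""
        # Turn common planner errors into a hint an agent can act on.
        if message.startswith("no such column"):
            hint = "check column names against the model definition"
        elif message.startswith("no such table"):
            hint = "check that the table is part of the modeled schema"
        return PlanResult(ok=False, error=message, hint=hint)
    # The last field of each plan row is the human-readable plan step.
    return PlanResult(ok=True, plan=[row[-1] for row in rows])
```

An agent loop could call dry_plan on each generated query and feed the error and hint back to the model before anything touches the warehouse.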

Decoder

Context layer: A software component that provides AI agents with a deeper understanding of business-specific semantics, rules, and memory beyond raw data schemas.
Modeling Definition Language (MDL): A domain-specific language used within WrenAI to define data models, relationships, metrics, and access control policies for business data.
Apache DataFusion: A high-performance, extensible query engine written in Rust, used as the core for WrenAI's semantic engine.
LanceDB: An open-source vector database designed for AI workloads, used by WrenAI for memory and example storage.

Original article

The open context layer for AI agents over business data.

Your agent doesn't know what your data means. We fix that.

Docs · Discord · Vision · Blog

📣 2026-05-07 — Wren Engine has merged into this repo under core/. The previous Canner/wren-engine repo is archived. The previous WrenAI GenBI app is preserved on the legacy/v1 branch (tag v1-final). Read the announcement →

What WrenAI is

WrenAI is the open context layer that gives your agents what schemas don't: business semantics, examples, memory, governance, and — soon — the unstructured corporate knowledge that lives in your docs, wikis, and chat threads. Built for the agent frameworks you already use.

Why agent builders pick WrenAI

Open by default — Open-sourced core, SDK, and skills through Apache-2.0 license.
Built for AI agents — Skills, agentic architecture, context retrieval are first-class. Ships as SDKs for the agent frameworks that engineers already use.
Correctness as primitives — rich schema retrieval, dry-plan validation, structured errors with hints, value profiling, eval runner. The agent orchestrates; the trace lives in the agent's reasoning.
Reviewable, reproducible context — every definition, example, and mapping is versionable and evidence-linked. Git-friendly.
Sits on top of your existing stack — warehouse, transformation pipelines, your existing semantic layer. Not another tool to maintain.

With & Without Wren AI

Agents are everywhere. Claude Code, Cursor, ChatGPT, Aider, LangChain pipelines, Pydantic AI flows, in-house copilots, customer-facing apps. None of them should have to rediscover your business logic from scratch. With Wren AI, "the context layer," they query through a standalone, shared interface usable by every agent and person, not gated behind a single vendor's UI and architecture.

Quickstart

WrenAI is agent-driven by design: you install the skill bundle once, then let your AI coding agent (Claude Code, Openclaw, Hermes, Codex, etc.) drive the rest — Python deps, DB connection, project scaffold, and first query.

1. Install the skill bundle

Skills are workflow guides that teach AI coding agents (Claude Code, Openclaw, Hermes, Codex, etc.) how to drive the Wren CLI for you.

npx skills add Canner/WrenAI --skill '*'

Have multiple AI coding agents installed and want the skills available in all of them? Pass --agent '*':

npx skills add Canner/WrenAI --skill '*' --agent '*'

Or via the install script:

curl -fsSL https://raw.githubusercontent.com/Canner/WrenAI/main/skills/install.sh | bash

See the Skills reference for the full list of skills installed and what each one does.

2. Ask your agent to set things up

Open your agent in a project directory and ask:

Use the /wren-onboarding skill to install and set up Wren AI.

The agent will check your environment, install wrenai, create a connection profile, scaffold the project, and run a first query — all in one flow.

3. (Optional) Enrich the project

Once onboarding finishes, give your project the business context schemas can't carry:

Use the /wren-enrich-context skill in grill mode.

Two modes: grill (one question at a time, you in the loop) or auto-pilot (agent reads <project>/raw/ and proposes). Both modes write to MDL, instructions, queries, and memory — all reviewable, all Git-friendly.

4. Ask questions

# Ask any question
"who are our top 10 customers by sales this quarter?"

Or just ask your agent in natural language — it uses the context layer to resolve schema, recall similar past queries, and write governed SQL.

Want to try it without your own database? Ask your agent to run /wren-onboarding with the bundled jaffle_shop sample dataset — same flow, but you'll be querying a real warehouse end-to-end in a couple of minutes.

Two beats: scaffold fast, enrich deep

/wren-onboarding         # Scaffold a Wren project from your DB (agent-driven)
/wren-enrich-context     # One skill, two modes: (Under development)
                         #   grill      — one question at a time, you in the loop
                         #   auto-pilot — agent reads <project>/raw/ and proposes
wren ask "..."           # Query through the context layer

Fast at first. Deep when you need it. Always reviewable and Git-friendly.

What's Included

Modeling Definition Language (MDL) — models, columns, relationships, views, cubes, metrics, row-level / column-level access control (RLAC / CLAC)
Engine — Apache DataFusion based, 22+ data sources
Memory & examples — LanceDB-backed, hybrid retrieval, versionable
Agent SDK — wren-langchain (LangChain / LangGraph), wren-pydantic; reference Python integration for other stacks
Governed execution primitives — functions, dry-plan, row limits, access control

What's next

Context enrichment skill — /wren-enrich-context (grill + auto-pilot modes) hardened across MDL, instructions, queries, and memory
End-to-end correctness primitives — value profiling, rich retrieval, structured errors, golden eval runner
Agent-native distribution — first-class SDKs across major agent frameworks; see GitHub Discussions for what's prioritized next
Full governed execution — audit logs, rate limits, approval workflow, data-flow inspector

Full roadmap and design notes: see the vision paper.

Documentation

Quickstart — from skill install to first answer
Concepts — what context is, what MDL is, how memory works
Connect a database — Postgres, BigQuery, Snowflake, DuckDB, and more
Agent SDKs — what's shipping today, what's next

Community

💬 Discord — chat with the team and other builders
🐙 GitHub Discussions — design conversations, RFCs, longer threads
🐦 Twitter / X — release notes and short updates
🗞 Blog — vision, post-mortems, deep dives

Contributing

We build in the open. Issues, PRs, connector contributions, SDK integrations, docs fixes — all welcome.

Contributor guide
Connector ecosystem program — three-tier ownership: official, community-blessed, community-owned
Architecture map — find the right place to land your change
Looking for somewhere to start? Try the good first issue label.

Project structure

core/
  wren-core/         Rust semantic engine (Apache DataFusion)
  wren-core-base/    Shared manifest types + MDL builder
  wren-core-py/      Python bindings (PyPI: wren-core)
  wren-core-wasm/    WebAssembly build (npm: wren-core-wasm)
  wren/              Python SDK and CLI (PyPI: wrenai)
  wren-mdl/          MDL JSON schema
sdk/
  wren-langchain/    Reference agent SDK integration
skills/              Agent skills for context authoring
docs/                Module documentation
examples/            Example projects

License

Apache 2.0. See LICENSE.

Come build the context layer with us.

If WrenAI helps you, drop a ⭐ — it genuinely helps us grow!

Protocols for transactional usage of object storage

Data database cloud storage performance backend Bits & Pages

Almog Gavra details how to achieve serializable online transactional processing (OLTP) on object storage by using specific atomic and conditional read/write primitives, despite the high cost of LIST operations.

What: In a post dated May 18, 2026, Almog Gavra explains protocols for building correct transactional applications on object storage, using primitives like atomic PUTs, conditional PUT If-Match/If-None-Match, and strongly consistent LISTs, along with atomic and conditional GETs. The article highlights that while GETs are cheap, LIST operations on S3 can cost nearly 12x more than GETs. The approach is motivated by SlateDB's transactional write module.

Why it matters: This article demonstrates how guarantees like serializability, traditionally associated with relational databases, can be engineered on highly distributed object storage by carefully combining a small set of low-level primitives (atomic and conditional writes, strongly consistent listings), extending what these systems can reliably support for transactional workloads.

Takeaway: If designing a data system on object storage requiring transactional guarantees, carefully evaluate the use of conditional writes and consistent listings, being mindful of the cost implications, especially for LIST operations on services like S3.

Deep dive

The article focuses on design patterns for using object storage (like S3) correctly for Online Transactional Processing (OLTP) use cases, aiming for serializable history.
It identifies three key write primitives: unconditional atomic PUTs, and conditional PUT If-None-Match / If-Match for compare-and-set operations.
Three read primitives are also crucial: atomic GETs, conditional GET If-None-Match (for cached data), and strongly consistent LIST requests.
A key performance consideration is that LIST operations on S3 are significantly more expensive than GETs (nearly 12x more).
The "baseline protocol" uses direct atomic PUTs, providing correctness but suffering from high latency.
The "simple conditional protocol" improves latency by batching writes into a single file and using PUT If-Match to resolve conflicts, but leads to 3-round-trip costs on contention. OpenData Buffer uses this.
The "sequenced write protocol" uses file names as guards, incrementing them with each write, reducing contention cost to two round-trips but requiring garbage collection.
For single-writer systems (like SlateDB), a conflict can simply fail the losing writer; such systems often encode "epochs" in files to distinguish roles (main writer, garbage collector, compactor).
Safe garbage collection requires "boundary files" to prevent old writers from inadvertently creating data branches after cleanup.
The author mentions that SlateDB's protocols are formally verified using the Fizzbee specification language.
The core tradeoff is safety versus contention cost, emphasizing the need for robust protocols to ensure consistency during failures.
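
The "simple conditional protocol" described above can be sketched against a toy in-memory store. This is a minimal illustration under stated assumptions, not the article's or SlateDB's code: InMemoryObjectStore, its put/get signatures, and append_batch are hypothetical names, and ETags are modeled as simple version counters.

```python
import itertools
from dataclasses import dataclass


class PreconditionFailed(Exception):
    """Raised when an If-Match / If-None-Match condition does not hold."""


@dataclass
class StoredObject:
    body: bytes
    etag: str


class InMemoryObjectStore:
    """Toy stand-in for an object store with conditional PUTs (hypothetical)."""

    def __init__(self):
        self._objects = {}
        self._versions = itertools.count(1)
        self.round_trips = 0  # counts simulated requests

    def get(self, key):
        self.round_trips += 1
        return self._objects.get(key)

    def put(self, key, body, if_match=None, if_none_match=False):
        self.round_trips += 1
        current = self._objects.get(key)
        if if_none_match and current is not None:
            raise PreconditionFailed(key)
        if if_match is not None and (current is None or current.etag != if_match):
            raise PreconditionFailed(key)
        etag = f"v{next(self._versions)}"
        self._objects[key] = StoredObject(body, etag)
        return etag


def append_batch(store, key, known_etag, known_body, batch):
    """Compare-and-set append: on a stale ETag, re-read and retry."""
    while True:
        try:
            if known_etag is None:
                etag = store.put(key, known_body + batch, if_none_match=True)
            else:
                etag = store.put(key, known_body + batch, if_match=known_etag)
            return etag, known_body + batch
        except PreconditionFailed:
            latest = store.get(key)  # second round trip: fetch the winner
            if latest is None:
                known_etag, known_body = None, b""
            else:
                known_etag, known_body = latest.etag, latest.body
```

In this sketch a writer holding a stale view pays for a failed PUT, a GET, and a second PUT, which is the three-round-trip contention cost noted above.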

Decoder

Object storage: A data storage architecture for handling large amounts of unstructured data, where data is managed as objects rather than files in a hierarchy (e.g., AWS S3, Google Cloud Storage).
Online Transactional Processing (OLTP): A type of data processing that facilitates and manages transaction-oriented applications, typically involving frequent, short, atomic transactions (e.g., bank transactions, e-commerce orders).
Serializable history: A strong isolation level in database transactions ensuring that the concurrent execution of multiple transactions produces the same result as if they were executed sequentially, preventing data inconsistencies.
PUT If-None-Match / If-Match: Conditional write operations in object storage. A PUT with If-None-Match: * succeeds only if the object does not already exist; a PUT with If-Match succeeds only if the object's current ETag (content hash/version) matches the supplied value. These are used for compare-and-set semantics.
Consistent LIST: A LIST operation on object storage that is guaranteed to return all objects that have been successfully written and acknowledged, reflecting the latest state.
Atomic PUT / GET: Write or read operations that guarantee either the entire operation completes successfully, or it fails entirely, without any partial or corrupted state.
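
The "sequenced write protocol" from the deep dive relies on If-None-Match alone: the file name is the guard. The sketch below is a toy illustration, not the article's implementation. CreateOnlyStore, slot_name, and sequenced_write are hypothetical names, and the sketch skips both garbage collection and any check of the winning write's contents, which a real protocol may need.

```python
class SlotTaken(Exception):
    """The sequence-numbered object already exists."""


class CreateOnlyStore:
    """Toy store exposing PUT If-None-Match and LIST only (hypothetical)."""

    def __init__(self):
        self._objects = {}
        self.round_trips = 0  # counts simulated requests

    def put_if_absent(self, key, body):
        self.round_trips += 1
        if key in self._objects:
            raise SlotTaken(key)
        self._objects[key] = body

    def list_keys(self, prefix):
        self.round_trips += 1
        return sorted(k for k in self._objects if k.startswith(prefix))


def slot_name(prefix, seq):
    # Zero-padded so lexicographic order matches numeric order.
    return f"{prefix}{seq:020d}.batch"


def sequenced_write(store, prefix, next_seq, body):
    """Claim the first free sequence number at or after next_seq."""
    seq = next_seq
    while True:
        try:
            store.put_if_absent(slot_name(prefix, seq), body)
            return seq + 1  # the caller's next expected slot
        except SlotTaken:
            seq += 1  # another writer won this slot; try the following one
```

Here a writer that loses a slot pays for one failed PUT plus one successful PUT. Old slots accumulate until something deletes them, which is why this style of protocol needs garbage collection.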

What data agent benchmarks do and don't tell us

Data ai research llm startup dbt Roundup

AI Council attendee Jason Ganz notes the collapsing divide between data and AI infrastructure, highlighting that agent benchmarks are evolving to emphasize statefulness and rich, cross-system context over isolated tasks.

What: Jason Ganz, reporting from AI Council (formerly Data Council) on May 17, 2026, observes that most companies now position themselves as AI infrastructure providers, focusing on context retrieval, orchestration, or inference. He notes that new systems like LanceDB are built specifically for LLM workloads. Benchmarking efforts, including dbt Semantic Layer tests and ADE-bench, show agents perform better with well-specified tasks and extensive context from systems like GitHub and Slack, with Izzy Miller's 90-day business simulation benchmark further demonstrating the value of stateful learning.

Why it matters: The convergence of data and AI infrastructure, coupled with the shift towards stateful, context-rich agent benchmarks, indicates a maturation of the AI agent ecosystem, moving from isolated experiments to integrated, long-running, and truly intelligent systems that operate across an organization's entire knowledge base.

Takeaway: If you are evaluating AI agents, prioritize benchmarks and solutions that emphasize stateful learning and the integration of diverse organizational context (e.g., dbt, GitHub, Slack) for more realistic performance assessments.

Deep dive

The AI Council (formerly Data Council) conference in 2026 showed a clear convergence of data and AI infrastructure.
Many companies now position themselves as AI infrastructure providers, typically as context providers, workflow orchestrators, or compute providers.
New databases like LanceDB are being designed specifically for AI-native, multimodal workloads, handling blob reads and embeddings.
Benchmarking for AI agents is evolving beyond simple question-answering, with efforts like dbt Semantic Layer tests showing performance improvements for LLM-generated queries.
ADE-bench evaluates agents' ability to build data pipelines, moving beyond just answering questions.
Opeyemi Fabiyi's research identifies specific activities that uplift agent performance with dbt projects.
Izzy Miller's "90-day simulation" benchmark emphasizes statefulness and learning from mistakes over time, a critical aspect of real-world agent operation.
Agents perform significantly better when given access to rich, cross-system context from sources like dbt projects, GitHub, Slack, and Notion, which is often underestimated in sandboxed benchmarks.
A major upcoming constraint and optimization area is token and compute efficiency, as agent usage costs are already high.
The article concludes that data agents are improving, and the industry needs better mechanisms to track performance with organizational context and memory, alongside a focus on efficient agent design.

Decoder

AI Council (formerly Data Council): A prominent conference for the data and AI community, covering trends and developments in both fields.
AI infrastructure company: A company that provides foundational technologies or services (e.g., context retrieval, orchestration, inference compute) to support the development and deployment of AI agents and applications.
LanceDB: An AI-native multimodal lakehouse database designed from scratch for LLM-shaped workloads, focusing on blob reads and embeddings.
dbt Semantic Layer: A component of dbt (data build tool) that defines consistent business metrics and semantics, improving the reliability of LLM-generated queries.
ADE-bench: A benchmark designed to measure the ability of coding agents to build data pipelines, extending beyond simple data question answering.
Statefulness: In the context of AI agents, the ability of an agent to retain and use information, memories, and learning from past interactions or tasks over extended periods, rather than starting fresh with each new task.

OpenAI Reportedly Moves Toward IPO

AI startup policy finance TechCrunch

OpenAI is reportedly preparing for an IPO as early as September 2026, working with Goldman Sachs and Morgan Stanley, a day after Elon Musk lost his lawsuit against the company.

What: OpenAI CEO Sam Altman reportedly hopes the company will be ready to go public by September 2026 and has been working with Goldman Sachs and Morgan Stanley. The move comes after Elon Musk lost his lawsuit challenging OpenAI's structure, leadership, and finances. The company may file IPO paperwork confidentially within days or weeks, per the Wall Street Journal.

Why it matters: This signals OpenAI's transition from its origins as a research-focused non-profit to a mature, financially driven enterprise, setting the stage for a major public valuation and intensifying the AI funding race.

Original article

OpenAI barrels toward IPO that may happen in September

A day after Elon Musk lost his lawsuit that threatened OpenAI’s structure, leadership, and finances, the AI giant is ready to move forward with its initial public offering, sources told the Wall Street Journal.

OpenAI chief executive Sam Altman reportedly hopes that his company will be ready to go public by September. The ChatGPT maker has been working with tech IPO powerhouse bankers Goldman Sachs and Morgan Stanley, and may file IPO paperwork confidentially with regulators within days or weeks, per the WSJ.

The news of OpenAI’s potential IPO, which by all accounts should be a blockbuster, comes as the world awaits the public disclosure of SpaceX’s IPO filings, which are expected to appear as soon as Wednesday, according to reports. Rocket-maker SpaceX is, of course, now one of OpenAI’s major competitors, after it consumed Elon Musk’s model maker, xAI.

Now that Musk failed to skewer OpenAI, the competitor he co-founded, through the heart with a lawsuit, it looks like the next Musk vs. Altman battle will take place in the world of finance. Which one will be the bigger IPO?

OpenAI did not immediately respond to a request for comment.

On Building Agents From First Principles

AI agents machine-learning research X (athleticKoder)

Mishra simplifies agent training, arguing that every agent-training system follows the same core loop (prompt, model action, environment, reward, gradient update) and demonstrating it by building a text-to-diagram agent.

What: Mishra deconstructs agent training into a universal loop: a prompt leads to a model action, the action is applied to an environment, the environment yields a reward, and the reward drives a gradient update to the model's parameters. He illustrates this by building a pure-Python text-to-diagram agent in which the model emits JSON `create_shape` and `connect` actions.

Why it matters: This first-principles approach demystifies complex agent frameworks, providing a clearer understanding of their fundamental mechanics and showing how to build intelligent systems from the ground up without relying on high-level abstractions.

Takeaway: If you're building an AI agent, focus on clearly defining the core loop of prompt, model action, environment interaction, reward function, and gradient update, as this is the fundamental architecture shared across all agent systems.

Deep dive

Mishra's article argues that despite varying frameworks, all agent-training systems fundamentally adhere to the same core loop.
This loop involves taking a prompt, having a model generate an action, applying that action to an environment, receiving a reward, and then using that reward to update the model's parameters via a gradient step.
He strips away higher-level abstractions like TRL (Transformer Reinforcement Learning), Unsloth, and PRIME-RL to highlight this commonality.
To demonstrate, Mishra constructs a simple text-to-diagram agent using pure Python.
This agent interprets text prompts and generates JSON actions such as create_shape and connect.
These actions are executed against a validating canvas.
The agent incorporates a reward function that evaluates several aspects: JSON validity, compliance with a schema, quality of the layout, and semantic coverage of the original prompt keywords.
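
A reward of the kind described above could look roughly like the following. This is a sketch under assumptions, not Mishra's code: the field names in ALLOWED, the 50-unit spacing rule, and the 0.25/0.5 weights are all invented for illustration, and the canvas-side check that connect targets exist is left out.

```python
import json

# Hypothetical schema: required fields per action type.
ALLOWED = {
    "create_shape": {"id", "kind", "label", "x", "y"},
    "connect": {"source", "target"},
}


def parse_actions(text):
    """Return a list of actions, or None if the text is not a JSON list."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, list) else None


def schema_ok(action):
    if not isinstance(action, dict):
        return False
    fields = ALLOWED.get(action.get("action"))
    if fields is None or not fields <= set(action):
        return False
    if action["action"] == "create_shape":
        return all(isinstance(action[k], (int, float)) for k in ("x", "y"))
    return True


def layout_score(shapes, min_gap=50):
    """Fraction of shape pairs at least min_gap apart on either axis."""
    pairs = [(a, b) for i, a in enumerate(shapes) for b in shapes[i + 1:]]
    if not pairs:
        return 1.0
    spaced = sum(
        1 for a, b in pairs
        if abs(a["x"] - b["x"]) >= min_gap or abs(a["y"] - b["y"]) >= min_gap
    )
    return spaced / len(pairs)


def coverage_score(shapes, keywords):
    """Fraction of prompt keywords that appear in some shape label."""
    if not keywords:
        return 1.0
    labels = " ".join(str(s["label"]).lower() for s in shapes)
    return sum(1 for k in keywords if k.lower() in labels) / len(keywords)


def reward(model_output, keywords):
    actions = parse_actions(model_output)
    if actions is None:
        return 0.0  # invalid JSON earns nothing
    if not actions or not all(schema_ok(a) for a in actions):
        return 0.25  # parsed, but empty or violating the schema
    shapes = [a for a in actions if a["action"] == "create_shape"]
    return 0.5 + 0.25 * layout_score(shapes) + 0.25 * coverage_score(shapes, keywords)
```

Tiering the score this way gives the model partial credit for syntactically valid output before layout and keyword coverage start to matter.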

Decoder

Agent: An autonomous software entity designed to perceive its environment, make decisions, and take actions to achieve specific goals, often involving interaction with an AI model.
TRL (Transformer Reinforcement Learning): Hugging Face's open-source library for post-training transformer language models with methods such as supervised fine-tuning and reinforcement learning.
Unsloth: An open-source library for fast, memory-efficient fine-tuning of large language models.
PRIME-RL: An open-source framework from Prime Intellect for large-scale reinforcement learning of language models.
Gradient update: The process in machine learning where the model's parameters are adjusted based on the calculated gradients of the loss function, aiming to minimize errors and improve performance.

Original article

Mishra strips away the TRL, Unsloth, and PRIME-RL framework abstractions to show that every agent-training system reduces to the same loop: prompt to model action to environment to reward to gradient update. He builds a toy tldraw-style text-to-diagram agent in pure Python where the model emits JSON create_shape and connect actions against a validating canvas, then layers a reward function combining JSON validity, schema compliance, layout quality, and semantic coverage of prompt keywords.
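
The loop itself fits in a few lines of pure Python. The sketch below is a deliberately tiny stand-in, not Mishra's agent: the "model" is a softmax over a fixed list of candidate outputs for a single prompt, and the update is plain REINFORCE with a running-mean baseline. The names train and softmax, the learning rate, and the step count are assumptions for illustration.

```python
import math
import random


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(candidates, reward_fn, steps=500, lr=0.5, seed=0):
    """REINFORCE over a categorical policy: one logit per candidate action."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)                                  # model
        choice = rng.choices(range(len(candidates)), probs)[0]   # action
        r = reward_fn(candidates[choice])                        # environment + reward
        advantage = r - baseline
        baseline += 0.1 * (r - baseline)                         # running-mean baseline
        for i in range(len(logits)):                             # gradient update
            # d log pi(choice) / d logit_i for a softmax policy
            grad_logp = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_logp
    return softmax(logits)
```

Calling train with candidates such as a malformed string and a valid JSON action, and a reward function that scores them, shifts probability toward the higher-reward output. Swapping the logit table for a language model and the list lookup for sampling gives the same five stages at full scale.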

A Bitter Lesson for Data Filtering

AI machine-learning research data arXiv

New scaling studies suggest data filtering might be unnecessary for large model pretraining, as ample compute allows models to benefit even from low-quality and distractor data.

What: Researchers Christopher Mohri, John Duchi, and Tatsunori Hashimoto found that in high compute, data-scarce scenarios, large parameter models benefit from including low-quality and distractor data. This challenges the common belief that strict data filtering is essential for pretraining.

Why it matters: This "bitter lesson" shifts the paradigm for large model training, suggesting that raw computational power can compensate for, and even benefit from, less curated datasets, potentially simplifying data pipeline efforts in resource-rich environments.

Takeaway: If you have access to significant compute resources for large model pretraining, reconsider aggressive data filtering strategies, as including lower-quality data might be more beneficial than previously thought.

Deep dive

A new research paper titled "A Bitter Lesson for Data Filtering" investigates the role of data filtering in large model pretraining.
The study focuses on scaling in a high-compute, data-scarce environment.
Contrary to prevailing wisdom, the authors found that the optimal data filter in such conditions is "no data filter."
Large parameter models, when sufficiently trained with ample compute, appear to not only tolerate low-quality and distractor data but actually benefit from it.
This suggests that computational resources can effectively mitigate the perceived negative impact of "poor" or unfiltered data.
The findings could simplify data preparation pipelines for organizations with substantial compute infrastructure.

Decoder

Pretraining: The initial phase of training a machine learning model on a very large and diverse dataset, allowing it to learn general features and representations before being fine-tuned for specific tasks.
Distractor data: Data that is not directly relevant to the primary learning objective or contains noise, but which the model might still derive useful information from when given sufficient capacity and training.

Original article

A Bitter Lesson for Data Filtering

Abstract: We investigate data filtering for large model pretraining via new scaling studies that target the high compute, data-scarce regime. In spite of an apparently common belief that filtering data to include only high-quality information is essential, our experiments suggest that with enough compute, the best data filter is no data filter. We find that sufficiently trained large parameter models not only tolerate low-quality and distractor data, but in fact benefit from nominally "poor" data.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2605.19407 [cs.LG] (or arXiv:2605.19407v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2605.19407 (arXiv-issued DOI via DataCite)

Submission history

From: Christopher Mohri

Tue, 19 May 2026 06:02:36 UTC (458 KB)

Lance Unified Multimodal Model (GitHub Repo)

AI multimodal opensource model GitHub

ByteDance released Lance, a 3B parameter unified multimodal AI model trained from scratch, demonstrating strong performance in image and video generation, editing, and understanding.

What: ByteDance's Lance is a 3B parameter multimodal model, supporting image and video understanding, generation, and editing within a single framework. It was trained entirely from scratch (except ViT and VAE encoders) using a 128-A100-GPU budget, and its GitHub repository provides code, model weights, and a Gradio demo.

Why it matters: This open-source release from a major player like ByteDance showcases the continued trend of powerful, albeit smaller, multimodal models being made available, pushing the boundaries of what's achievable with more constrained compute.

Takeaway: Developers interested in multimodal AI can explore the Lance GitHub repository to integrate its image and video capabilities into their projects.

Deep dive

Lance is a 3B parameter native unified multimodal model developed by ByteDance.
It supports image generation, image editing, video generation, video editing, image understanding, and video understanding.
The model achieves strong performance on benchmarks like DPG-Bench, GenEval, GEdit-Bench, and VBench despite its relatively small size of 3B active parameters.
It was trained entirely from scratch (excluding ViT and VAE encoders) using a 128-A100-GPU budget.
The GitHub repository provides installation instructions, a unified command-line interface for tasks, and a Gradio demo.
Model weights are available on Hugging Face.
It requires Python 3.10+ and CUDA 12.4+, plus a GPU with at least 40GB VRAM for inference.
The model achieves a 0.90 overall score on GenEval for image generation, a 7.30 average score on GEdit-Bench for image editing, and an 85.11 total score on VBench for video generation.
The authors highlight multi-task synergy as key to its unified capabilities.

Decoder

Multimodal model: An AI model capable of processing and generating content across multiple data types, such as text, images, and video, simultaneously.
ViT (Vision Transformer): A type of neural network that applies the transformer architecture, originally designed for natural language processing, to image recognition tasks.
VAE (Variational Autoencoder): A type of generative model that learns to encode data into a lower-dimensional latent space and then decode it back, used here for image compression/decompression.

Original article

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Fengyi Fu*, Mengqi Huang*,✉, Shaojin Wu*, Yunsheng Jiang*, Yufei Huo, Jianzhu Guo✉,§ Hao Li, Yinghang Song, Fei Ding, Qian He, Zheren Fu, Zhendong Mao, Yongdong Zhang
ByteDance
* Equal contribution ✉ Corresponding authors § Project lead

🌟 Highlights

Lance is a 3B native unified multimodal model that supports image and video understanding, generation, and editing within a single framework.

Efficient at 3B scale. With only 3B active parameters, Lance delivers strong performance across image generation, image editing, and video generation benchmarks.
Trained from scratch. Lance is built with a staged multi-task recipe. Apart from the ViT and VAE encoders, the model (including the transformer backbone) is trained entirely from scratch within a 128-A100-GPU budget.

We are actively updating and improving this repository. If you find any bugs or have suggestions, please feel free to open an issue or submit a pull request (PR) 💖.

🎨 Demo

[Demo videos not reproduced here: Text-to-Video, Video Editing, Multi-turn Consistency Editing, Intelligent Video Generation]

Video Understanding

Question: How many times did the person launch objects on the table? Options: (A) 3 (B) 2 (C) 4

Response: (A) 3

Question: The person makes sets of repeated actions. How many distinct repeated actions did the person do? Options: (A) 2 (B) 3 (C) 4

Response: (A) 2

Question: In which direction does the purple sphere move in the video? Options: (A) Down and to the right. (B) Up and to the left. (C) Up and to the right. (D) The object is stationary.

Response: (A) Down and to the right.

Question: What is the unrealistic phenomenon displayed in the video? Options: (A) The man can manipulate time via phone. (B) Man grabs an object through a phone screen. (C) Chocolate transforms into different objects. (D) Visible means of propulsion enables flight.

Response: (B) Man grabs an object through a phone screen.

Question: Offer a succinct account of the culinary process shown in this video.

Response: Add tomato puree and mix it well with chicken pieces.

Question: Provide a detailed description of the given video, capturing its key moments.

Response: In a sunlit meadow, a small tortoiseshell butterfly rests on a purple flower. A bee, with black and yellow stripes, lands on the same flower. The butterfly flaps its wings gently, while the bee busies itself, collecting nectar. The flower sways slightly in the breeze. The butterfly then takes off, and the bee follows, both heading to the next flower. The scene is a vivid display of insect interaction in a natural setting, with the colors of the butterfly and the bee contrasting against the green background of the meadow. The video captures this peaceful moment in a short 6-second duration.

[Demo images not reproduced here: Text-to-Image Generation, Image Editing]

Image Understanding

Question: Is the largest segment greater than sum of all the other segments?

Response: Yes, the largest segment in the pie chart is the blue one, which is the most populous segment.

Question: What percentage of respondents want better border security?

Response: 29%

Question: What is the license plate number of the car?

Response: The license plate number of the car is BX62 BFY.

Question: According to the data from the proprietary market research, how much amount was spent on the promotional meetings and events during 1998?

Response: According to the data from the proprietary market research, the total amount spent on the promotional meetings and events during 1998 was approximately $1.3 billion.

Question: What is the appearance of the Colosseum in Rome, Italy?

Response: The Colosseum in Rome, Italy is a magnificent ancient amphitheater made of stone and concrete. It is characterized by its large, arched openings and its semi-circular shape. The Colosseum has been partially restored, and its original color has faded, but the architectural details, like the columns and the semi-circular arches, are still visible. The lighting in the image highlights the grandeur and historical significance of the Colosseum, making it stand out against the backdrop of the evening sky.

Question: How does a total solar eclipse look like from Earth?

Response: A total solar eclipse looks like a dark circle with a bright, white outer edge. This is a result of the Earth's rotation and the Earth's shadow, which partially obscures the sun's light. The bright white edge of the eclipse is caused by the sun's high energy and its interaction with the Earth's atmosphere, while the dark part of the eclipse is due to the Earth's shadow and the surrounding air currents. The solar eclipse's shape, with its bright white edge and dark center, is similar to the shape of a full moon or a dark disk. It is a natural phenomenon that occurs in the atmosphere of the Earth and is an important part of the solar system.

🚀 Installation

Recommended Environment

Software: Python 3.10+, CUDA 12.4+ (required)
Hardware: A GPU with at least 40GB VRAM is required for inference

Installation Steps

bash ./setup_env.sh

Download Model Weights

Please download all necessary model checkpoints from Lance-3B on Hugging Face and place them in the downloads/ directory.

📚 Usage

Inference

We provide a unified command-line interface for all generation / editing / understanding tasks:

Option 1: Configure and Run the Unified Script

bash inference_lance.sh

Before running, please configure the inference parameters at the top of inference_lance.sh.
Supported tasks: t2i, t2v, image_edit, video_edit, x2t_image, and x2t_video. You can modify TASK_DEFAULT_CONFIGS in inference_lance.py to customize the default data samples for each task.
Note: For all tasks, we recommend following the prompt format used in the provided examples when writing input prompts, as this typically leads to better generation quality.

Option 2: Run Task-Specific Commands

We provide task-specific one-click commands for different generation, editing, and understanding tasks.

Text-to-Video Generation

bash inference_lance.sh \
  --TASK_NAME t2v \
  --MODEL_PATH downloads/Lance_3B_Video \
  --RESOLUTION video_480p \
  --NUM_FRAMES 121 \
  --VIDEO_HEIGHT 480 \
  --VIDEO_WIDTH 848 \
  --SAVE_PATH_GEN results/t2v

Text-to-Image Generation

bash inference_lance.sh \
  --TASK_NAME t2i \
  --MODEL_PATH downloads/Lance_3B \
  --RESOLUTION image_768res \
  --VIDEO_HEIGHT 768 \
  --VIDEO_WIDTH 768 \
  --SAVE_PATH_GEN results/t2i

Video Editing

bash inference_lance.sh \
  --TASK_NAME video_edit \
  --MODEL_PATH downloads/Lance_3B_Video \
  --RESOLUTION video_480p \
  --SAVE_PATH_GEN results/video_edit

Image Editing

bash inference_lance.sh \
  --TASK_NAME image_edit \
  --MODEL_PATH downloads/Lance_3B \
  --RESOLUTION image_768res \
  --SAVE_PATH_GEN results/image_edit

Video Understanding

bash inference_lance.sh \
  --TASK_NAME x2t_video \
  --MODEL_PATH downloads/Lance_3B_Video \
  --RESOLUTION video_480p \
  --NUM_FRAMES 50 \
  --SAVE_PATH_GEN results/x2t_video

Image Understanding

bash inference_lance.sh \
  --TASK_NAME x2t_image \
  --MODEL_PATH downloads/Lance_3B \
  --RESOLUTION image_768res \
  --SAVE_PATH_GEN results/x2t_image

Available Tasks

Task Name	Description	Example JSON
`t2v`	Text-to-Video generation	`config/examples/t2v_example.json`
`t2i`	Text-to-Image generation	`config/examples/t2i_example.json`
`image_edit`	Image editing	`config/examples/image_edit_example.json`
`video_edit`	Video editing	`config/examples/video_edit_example.json`
`x2t_image`	Image understanding	`config/examples/x2t_image_example.json`
`x2t_video`	Video understanding	`config/examples/x2t_video_example.json`

For understanding examples:

config/examples/x2t_image_example.json: image understanding examples for visual question answering and image-based reasoning.
config/examples/x2t_video_example.json: video understanding examples for video question answering and video captioning.

Parameters

You can configure the following hyperparameters at the top of the inference_lance.sh script:

Parameter	Default Value	Description
`MODEL_PATH`	`"downloads/Lance_3B"`	Path to the downloaded Lance model weights (`Lance_3B` or `Lance_3B_Video`).
`NUM_GPUS`	`1`	Number of GPUs to use for inference.
`VALIDATION_NUM_TIMESTEPS`	`30`	Number of denoising steps (e.g., 30 or 50).
`VALIDATION_TIMESTEP_SHIFT`	`3.5`	Timestep shift parameter for flow matching scheduling.
`CFG_TEXT_SCALE`	`4.0`	Classifier-Free Guidance (CFG) scale for text conditioning.
`VALIDATION_DATA_SEED`	`42`	Random seed for generation reproducibility.
`NUM_FRAMES`	`50`	Number of frames for video generation (Max: 121). Unused for image tasks.
`VIDEO_HEIGHT` / `VIDEO_WIDTH`	`768`	Spatial resolution. Unused for editing tasks (determined by input image/video).
`RESOLUTION`	`"video_480p"`	Base resolution preset (`image_768res` or `video_480p`).
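
To give a feel for what `VALIDATION_TIMESTEP_SHIFT` and `CFG_TEXT_SCALE` typically control, here is a minimal sketch. It assumes the timestep-shift formula commonly used by flow-matching samplers, t' = s*t / (1 + (s - 1)*t) with t = 1 meaning pure noise, and the standard classifier-free guidance blend. The source does not confirm that Lance implements either one exactly this way, and the function names `shift_timestep`, `apply_cfg`, and `make_schedule` are hypothetical.

```python
def shift_timestep(t: float, shift: float = 3.5) -> float:
    """Warp a timestep t in [0, 1]; with shift > 1 the warped value is pulled toward 1.

    Assumed formula (common in flow-matching samplers, not confirmed for Lance).
    """
    return shift * t / (1.0 + (shift - 1.0) * t)


def apply_cfg(uncond: list[float], cond: list[float], scale: float = 4.0) -> list[float]:
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward the text-conditioned one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def make_schedule(num_steps: int = 30, shift: float = 3.5) -> list[float]:
    """Uniform grid from 1.0 down to 0.0, warped by the timestep shift."""
    return [shift_timestep(1.0 - i / num_steps, shift) for i in range(num_steps + 1)]


schedule = make_schedule(num_steps=30, shift=3.5)
guided = apply_cfg(uncond=[0.0, 2.0], cond=[1.0, 2.0], scale=4.0)
```

Under these assumptions, a larger shift keeps more of the 30 steps in the high-noise region, and a larger CFG scale amplifies whatever the text conditioning changes in the prediction.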

Gradio

python lance_gradio_t2v_v2t.py --gpus 0 --server-port 7860

Benchmarks

DPG-Bench Evaluation

Models	# Params.	Global	Entity	Attribute	Relation	Other	Overall
Generation-only Models
SDXL	3.5B	83.27	82.43	80.91	86.76	80.41	74.65
DALL-E 3	-	90.97	89.61	88.39	90.58	89.83	83.50
SD3-Medium	2B	87.90	91.01	88.83	80.70	88.68	84.08
FLUX.1-dev	12B	74.35	90.00	88.96	90.87	88.33	83.84
Qwen-Image	20B	91.32	91.56	92.02	94.31	92.73	88.32
Unified Models
Janus-Pro-7B	7B	86.90	88.90	89.40	89.32	89.48	84.19
OmniGen2	4B	88.81	88.83	90.18	89.37	90.27	83.57
Show-o2	7B	89.00	91.78	89.96	91.81	91.64	86.14
BAGEL†	7B	88.94	90.37	91.29	90.82	88.67	85.07
InternVL-U	1.7B	90.39	90.78	90.68	90.29	88.77	85.18
TUNA	7B	90.42	91.68	90.94	91.87	90.73	86.76
TUNA-2	7B	89.50	91.40	92.07	91.91	88.81	86.54
🌟 Lance (Ours)	3B	83.89	91.07	89.36	93.38	80.80	84.67

† indicates methods that use LLM rewriters for prompt rewriting before generation.

GenEval Evaluation

Models	# Params.	1-Obj.	2-Obj.	Count	Colors	Position	Attr.	Overall
Generation-only Models
SDXL	3.5B	0.98	0.74	0.39	0.85	0.15	0.23	0.55
DALL-E 3	-	0.96	0.87	0.47	0.83	0.43	0.45	0.67
SD3-Medium	2B	0.99	0.94	0.72	0.89	0.33	0.60	0.74
FLUX.1-dev	12B	0.98	0.93	0.75	0.93	0.68	0.65	0.82
Qwen-Image	20B	0.99	0.92	0.89	0.88	0.76	0.77	0.87
Unified Models
Janus-Pro-7B	7B	0.99	0.89	0.59	0.90	0.79	0.66	0.80
OmniGen2	4B	1.00	0.95	0.64	0.88	0.55	0.76	0.80
Show-o2	7B	1.00	0.87	0.58	0.92	0.52	0.62	0.76
BAGEL†	7B	0.98	0.95	0.84	0.95	0.78	0.77	0.88
Mogao	7B	1.00	0.97	0.83	0.93	0.84	0.80	0.89
InternVL-U	1.7B	0.99	0.94	0.74	0.91	0.77	0.74	0.85
TUNA	7B	1.00	0.97	0.81	0.91	0.88	0.83	0.90
TUNA-2	7B	0.99	0.96	0.80	0.91	0.84	0.76	0.87
🌟 Lance (Ours)	3B	1.00	0.94	0.84	0.97	0.87	0.81	0.90

GEdit-Bench Evaluation

Models	# Params.	BC	CA	MM	MC	PB	ST	SA	SR	SRp	TM	TT	Avg/G_O
Generation-only Models
Gemini 2.0	-	-	-	-	-	-	-	-	-	-	-	-	6.32
GPT Image 1	-	6.96	6.85	7.10	5.41	6.74	7.44	7.51	8.73	8.55	8.45	8.69	7.49
Qwen-Image-Edit	20B	8.23	8.30	7.33	8.05	7.49	6.74	8.57	8.09	8.29	8.48	8.50	8.01
Unified Models
Lumina-DiMOO	8B	3.43	4.27	3.08	2.77	4.74	5.19	4.44	3.80	4.38	2.68	4.20	3.91
Ovis-U1	1.2B	7.49	6.88	6.21	4.79	5.98	6.46	7.49	7.25	7.27	4.48	6.31	6.42
BAGEL	7B	7.32	6.91	6.38	4.75	4.57	6.15	7.90	7.16	7.02	7.32	6.22	6.52
InternVL-U	1.7B	7.08	7.05	6.38	7.02	6.03	6.27	7.13	6.55	6.33	6.59	6.85	6.66
InternVL-U (w/ CoT)	1.7B	7.05	7.87	6.50	6.99	5.77	6.10	7.33	7.16	7.12	7.36	6.46	6.88
🌟 Lance (Ours)	3B	7.73	7.74	7.28	7.83	7.50	7.03	7.64	7.85	7.71	4.46	7.57	7.30

VBench Evaluation (Video Generation)

Type	Model	# Params.	Total Score ↑
Gen. Only	ModelScope	1.7B	75.75
Gen. Only	LaVie	3B	77.08
Gen. Only	Show-1	6B	78.93
Gen. Only	AnimateDiff-V2	-	80.27
Gen. Only	VideoCrafter-2.0	-	80.44
Gen. Only	CogVideoX	5B	81.61
Gen. Only	Kling	-	81.85
Gen. Only	Open-Sora-2.0	-	81.71
Gen. Only	Gen-3	-	82.32
Gen. Only	Step-Video-T2V	30B	81.83
Gen. Only	Hunyuan Video	-	83.43
Gen. Only	Wan2.1-T2V	14B	83.69
Unified	HaproOmni	7B	78.10
Unified	Emu3	8B	80.96
Unified	VILA-U	7B	74.01
Unified	Show-o2	2B	81.34
Unified	TUNA	1.5B	84.06
Unified	🌟 Lance (Ours)	3B	85.11

Running Benchmarks

Ready-to-run benchmark scripts are provided under benchmarks/:

Benchmark	Modality	Script
GenEVAL (image gen)	Image	`benchmarks/image_gen/GenEVAL/sample_GenEVAL.sh`
DPG (image gen)	Image	`benchmarks/image_gen/DPG/sample_DPG.sh`
GEdit (image edit)	Image	`benchmarks/image_gen/GEdit/sample_GEdit.sh`
VBench (video gen)	Video	`benchmarks/video_gen/Vbench/sample_vbench.sh`

📄 License

🙏 Acknowledgements

We would like to thank the contributors of BAGEL, Qwen2.5-VL-3B-Instruct, and Wan2.2 for their open research and contributions.

💖 Citation

If you find Lance useful for your project or research, please consider starring 🌟 this repo and citing our work using the following BibTeX:

@misc{fu2026lanceunifiedmultimodalmodeling,
      title         = {Lance: Unified Multimodal Modeling by Multi-Task Synergy},
      author        = {Fengyi Fu and Mengqi Huang and Shaojin Wu and Yunsheng Jiang and Yufei Huo and Hao Li and Yinghang Song and Fei Ding and Jianzhu Guo and Qian He and Zheren Fu and Zhendong Mao and Yongdong Zhang},
      year          = {2026},
      eprint        = {2605.18678},
      archivePrefix = {arXiv},
      primaryClass  = {cs.CV},
      url           = {https://arxiv.org/abs/2605.18678},
}

📞 Contact

For questions, issues, or collaborations, please contact Mengqi Huang and Jianzhu Guo.

DEVOURED

LiteFrame Scales Video LLM Efficiency

AI llmvideoperformance Jihwan Kim's Project Page

LiteFrame, a lightweight video encoder, significantly boosts Video LLM efficiency by resolving bottlenecks in vision transformers and language models, allowing 8x more frames to be processed.

What: LiteFrame, a video encoder, addresses the challenge of long-form video understanding in Video LLMs by reducing inefficiencies in both vision transformers and language model processing. It achieves up to a 35% reduction in inference latency, improves accuracy, and uses only 87M parameters compared to a 304M parameter teacher model.

Why it matters: This research highlights a critical shift in optimizing Video LLMs, moving beyond just token reduction to focus on the efficiency of the vision encoder itself. By doing so, it opens the door for practical, scalable long-form video understanding, which is essential for more sophisticated AI applications.

Decoder

Video LLM (Video Large Language Model): A large language model extended to process and understand video data in addition to text.
Vision Transformer (ViT): A neural network architecture that applies the Transformer model, initially developed for natural language processing, to visual tasks.
Compressed Token Distillation (CTD): A novel training framework used by LiteFrame where a compact student vision encoder learns to predict information-dense, spatio-temporally compressed representations generated by a larger teacher vision model.
Language Model Adaptation (LMA): A lightweight fine-tuning stage that aligns the compressed latent space from the vision encoder with the downstream large language model, enabling it to handle extended temporal contexts.

Original article

TL;DR: We propose LiteFrame, a highly efficient video encoder for Video Large Language Models that unlocks scalable, long-form video understanding by resolving inefficiencies in both the LLM and the ViT.

Abstract

The fundamental challenge in scaling Video Large Language Models (Video LLMs) to long-form video lies in managing the explosion of visual-token context length. Existing strategies predominantly focus on "post-hoc" token reduction—reducing visual tokens after feature extraction to alleviate the LLM's computational overhead. While these methods effectively reduce the number of visual tokens, we observe that the primary latency bottleneck then shifts from the LLM to the expensive per-frame processing of the vision encoder.

To address this, we introduce LiteFrame, a strong, yet highly efficient video encoder backbone for Video LLMs. To train LiteFrame, we propose Compressed Token Distillation (CTD), a novel training framework that teaches a compact student vision encoder to directly predict information-dense, spatio-temporally compressed representations produced by a large teacher vision model, effectively bypassing redundant computation. When coupled with further Language Model Adaptation (LMA), this approach results in a new latency-accuracy Pareto frontier. Our results demonstrate a new potential path to unlocking longer-form video understanding under fixed compute budgets.

Main Results

LiteFrame redefines the performance-latency trade-off across multiple video understanding benchmarks, including Video-MME, MLVU, and LongVideoBench.

Unlocking Frame Scaling: By offloading the prefilling bottleneck from the LLM and lowering visual encoding costs, LiteFrame enables the processing of 8x more frames within restricted computing budgets.
End-to-End Efficiency: LiteFrame achieves up to a 35% reduction in total inference latency (vision encoding + LLM prefilling) while consistently improving average video understanding accuracy.
Parameter Reduction: LiteFrame uses only 87M parameters, roughly 3.5x fewer than the teacher model's 304M.
Zero-Shot Spatial Resolution Scaling: LiteFrame's inherent token efficiency enables scaling in high-resolution videos, achieving a state-of-the-art score on HLVid without any high-resolution training.

Methodology

To train LiteFrame, we propose Compressed Token Distillation (CTD) and Language Model Adaptation (LMA).

Compressed Token Distillation (CTD): The student encoder is trained to directly predict information-dense, spatio-temporally compressed supervision targets generated by applying Weighted Average Pooling (WAP) to a large teacher model's output.
Language Model Adaptation (LMA): A lightweight fine-tuning stage aligns the compressed latent space with the downstream LLM, allowing it to seamlessly handle extended temporal contexts (up to 512 frames).
Spatio-Temporal Token Compressive Architecture: Our lightweight student encoder significantly reduces FLOPs and latency by employing depth-wise 1D convolutions for temporal modeling and strided convolutions for downsampling.
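
The CTD idea above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the source does not say how the WAP weights are computed (a softmax over hypothetical per-token `scores` is assumed here), pooling over consecutive groups of tokens stands in for real spatio-temporal windows, and mean squared error is assumed as the distillation loss.

```python
import math

Vector = list[float]


def weighted_average_pool(tokens: list[Vector], scores: list[float]) -> Vector:
    """Collapse a group of teacher tokens into one token.

    Weights are softmax(scores); how the real method derives its weights
    is not given in the source, so this is an assumption.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


def compress_teacher(tokens: list[Vector], scores: list[float], group_size: int) -> list[Vector]:
    """Pool consecutive groups of `group_size` tokens into compressed targets."""
    return [
        weighted_average_pool(tokens[i:i + group_size], scores[i:i + group_size])
        for i in range(0, len(tokens), group_size)
    ]


def distillation_loss(student: list[Vector], target: list[Vector]) -> float:
    """Mean squared error between student predictions and compressed teacher targets."""
    count = sum(len(v) for v in target)
    squared = sum((s - t) ** 2 for sv, tv in zip(student, target) for s, t in zip(sv, tv))
    return squared / count


teacher_tokens = [[0.0, 0.0], [2.0, 4.0], [1.0, 1.0], [3.0, 3.0]]
targets = compress_teacher(teacher_tokens, scores=[0.0, 0.0, 0.0, 0.0], group_size=2)
loss = distillation_loss(student=[[0.0, 2.0], [2.0, 2.0]], target=targets)
```

The point of the design is that the student emits the short, pooled sequence directly, so it never has to compute the full-length token sequence that the teacher produces before pooling.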

Citation

@article{kim2026liteframe,
  title={LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs},
  author={Kim, Jihwan and Parthasarathy, Nikhil and Qin, Danfeng and Hur, Junhwa and Sun, Deqing and Han, Bohyung and Yang, Ming-Hsuan and Gong, Boqing},
  journal={arXiv preprint arXiv:2605.17260},
  year={2026}
}

DEVOURED

Cheap AI could derail OpenAI and Anthropic's IPOs

AI startupmarketspolicy CNBC

OpenAI and Anthropic's anticipated IPO valuations, projected at over $800 billion each, are threatened by rapidly decreasing AI costs and fierce competition from cheaper, efficient models, particularly from Chinese labs.

What: OpenAI and Anthropic's expected IPOs face challenges as AI costs fall and competition intensifies. Chinese labs like DeepSeek, Kimi, and Zhipu offer models with comparable capabilities at a fraction of the price (e.g., on Artificial Analysis's evaluation suite, Anthropic's Claude costs nearly 9x as much as the cheapest Chinese alternative). U.S. competitors like Nvidia, Cohere, and Reflection are also pushing cheaper, open-source, and enterprise-focused alternatives, leading enterprises to adopt "advisor models" to curb costs.

Why it matters: This signals a potential shift in the AI market from a premium-pricing, frontier-model dominance to a more cost-sensitive, commoditized landscape. The rapid improvement and cost-effectiveness of smaller and open-source models, especially from China, are eroding the perceived "moat" of leading U.S. labs, forcing a reevaluation of business models and valuations.

Decoder

IPO (Initial Public Offering): The first time a company offers its shares for sale to the general public on a stock exchange.
Advisor model: A strategy where a cheaper, often open-source, AI model handles most tasks, but is configured to call upon a more powerful (and expensive) frontier model for assistance when it encounters tasks it cannot solve.
Frontier model: Refers to the most advanced and capable AI models available at a given time, typically developed by leading AI labs like OpenAI and Anthropic.
Capex (Capital Expenditure): Money spent by a business on acquiring or maintaining fixed assets, such as buildings, land, machinery, or in this case, AI training infrastructure and expensive chips.

Original article

This earnings season, the cost of AI started showing up in the numbers. Meta, Shopify, Spotify, and Pinterest all flagged rising AI and inference costs as a drag on margins. Shopify said economies of scale were "partially offset by increased LLM costs."

This is the bill coming due for the pricing model that underpins OpenAI's and Anthropic's expected IPO valuations, both projected north of $800 billion. Those numbers assume OpenAI and Anthropic will hold their market share and pricing power — that competitors can't easily catch up, and that enterprise customers will keep paying a premium because there's no real alternative.

But increasingly the data is pointing the other way. Cutting-edge AI is becoming abundant and cheap. Chinese labs are charging a fraction of what American labs do for comparable work, while a wave of Western challengers — Nvidia, Cohere, Reflection, Mistral — are building cheaper, smaller, more efficient alternatives for enterprises that won't touch a Chinese model. By the time OpenAI and Anthropic file their prospectuses, with OpenAI's confidential filing coming as soon as this week, the central premise of their valuations may already be gone.

The cost gap is wide and getting wider. Enterprise AI budgets have surged. Some 45% of companies surveyed by cloud cost firm CloudZero said they spent more than $100,000 a month on AI in 2025, up from 20% the year before. Where that money goes increasingly matters. AI benchmarking firm Artificial Analysis runs every major model through the same 10 evaluations and tracks the total cost. For each lab's most capable model: Anthropic's Claude came in at $4,811. OpenAI's ChatGPT: $3,357. DeepSeek: $1,071. Kimi: $948. Zhipu's GLM: $544. Claude is nearly nine times more expensive than the cheapest Chinese alternative for the same workload.
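
The "nearly nine times" figure follows directly from the Artificial Analysis totals quoted above. A short sketch of the arithmetic, using only those five numbers (the dictionary labels and the function name `cost_ratios` are ours, chosen for illustration):

```python
# Total cost in USD to run each lab's most capable model through the same
# 10 evaluations, as quoted in the article.
costs_usd = {
    "Claude (Anthropic)": 4811,
    "ChatGPT (OpenAI)": 3357,
    "DeepSeek": 1071,
    "Kimi": 948,
    "GLM (Zhipu)": 544,
}


def cost_ratios(costs: dict[str, float]) -> dict[str, float]:
    """Express every cost as a multiple of the cheapest entry, rounded to 2 decimals."""
    cheapest = min(costs.values())
    return {name: round(cost / cheapest, 2) for name, cost in costs.items()}


ratios = cost_ratios(costs_usd)
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the cheapest")
```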

Even Google is making the case. At its I/O developer conference this week, CEO Sundar Pichai said "many companies are already blowing through their annual token budgets, and it's only May," and pitched the company's cheaper Flash model as the answer. If the largest Google Cloud customers shifted 80% of their workloads from frontier models to Gemini 3.5 Flash, Pichai said, they would save more than $1 billion a year. The company is acknowledging that enterprises need cheaper options.

And the cheap alternatives are no longer a step behind. DeepSeek, the Chinese AI lab whose model triggered a U.S. tech selloff last year, released a preview of its next-generation model last month that matches or nearly matches the latest from OpenAI, Anthropic, and Google on coding, agentic, and knowledge benchmarks. Models from other Chinese labs, including Moonshot, Xiaomi, and Zhipu, have shipped at similar capability levels in the past four months.

Databricks CEO Ali Ghodsi has a real-time view of the shift. The company's AI gateway sits between thousands of enterprise customers and the models they're using, and Ghodsi said revenue from that product is climbing sharply.

The technique enterprises are deploying, he said, is called an "advisor model." A cheap open-source model handles the bulk of the work as the default. When it hits a task it can't solve, it's given a tool that lets it call out to a frontier model from OpenAI or Anthropic for help.

"You can curb costs really well this way," Ghodsi said.
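
The advisor pattern Ghodsi describes can be sketched roughly as follows. Everything here is a stand-in: `cheap_stub` and `frontier_stub` are hypothetical placeholders for real model calls, and the `"hard"` marker is a toy difficulty signal, since a real model would decide for itself when to use the tool.

```python
from typing import Callable

Tool = Callable[[str], str]


class CallCounter:
    """Wraps a model function and counts how often it is invoked."""

    def __init__(self, fn: Tool) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, task: str) -> str:
        self.calls += 1
        return self.fn(task)


def frontier_stub(task: str) -> str:
    # Stand-in for an expensive frontier-model API call.
    return f"frontier answer to: {task}"


def cheap_stub(task: str, ask_frontier: Tool) -> str:
    # Stand-in for a cheap open-source model that handles work by default and
    # is handed a tool for escalating tasks it cannot solve.
    if "hard" in task:
        return ask_frontier(task)
    return f"cheap answer to: {task}"


frontier = CallCounter(frontier_stub)
tasks = ["summarize ticket", "hard: refactor billing module", "classify email"]
answers = [cheap_stub(task, frontier) for task in tasks]
print(f"frontier calls: {frontier.calls} of {len(tasks)} tasks")
```

The cost saving comes from the counter: only the escalated tasks incur frontier pricing, while the default path stays on the cheap model.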

The speed of the shift is striking. On OpenRouter, a marketplace that lets developers access hundreds of AI models through a single interface, Chinese models went from about 1% of usage in 2024 to more than 60% in May.

And vendors are starting to sell cost reduction as a product. Figma CEO Dylan Field said companies are moving through three phases of AI adoption: first, nobody uses it; second, everyone has to, with some "literally holding competitions of who can spend the most with tokens." And third is the realization that "everyone's spending too much" and has to cut back. Many enterprises, he said, are now entering that third phase. Figma is selling features that cut customers' token consumption by 20 to 30%.

U.S. vs. China

The cost gap reflects how the two sides are built. American frontier labs are running on hundreds of billions of dollars in capex, training ever-larger models on the most expensive chips Nvidia sells, inside a U.S. power grid that can't add capacity fast enough. Those costs get passed through to customers. For Chinese labs, constraint has become the strategy. Working under chip export restrictions, they've been forced to optimize aggressively — training competitive models with less compute and running them more efficiently.

The American labs' best defense is trust. Cohere CEO Aidan Gomez, whose company sells AI models specifically to banks, defense agencies, and other regulated industries, says those buyers won't touch Chinese models regardless of price. Cohere's revenue grew sixfold last year selling into exactly that segment. But it's a relatively narrow slice of the broader enterprise market. Outside of regulated industries, where security and compliance rules are looser, the case for paying a premium gets harder to make.

The American response is taking shape. Nvidia, the company that has profited most from the AI boom, is now publicly pushing a different model, releasing its own AI systems that any company can download and run on its own servers, free of charge, as an alternative to both Chinese options and the locked-down models from OpenAI and Anthropic. Reflection AI raised at a multibillion-dollar valuation specifically to build American open-source models for enterprises that want a domestic alternative. Both are well-capitalized and explicitly targeting the same gap — capable models, cheaper than the frontier, deployed on infrastructure U.S. enterprises already trust.

The case against this shift has rested on national security. But the objection is dissolving in practice. Even the U.S. government's AI Safety Institute, which flagged DeepSeek models as lagging American ones on security and performance, documented that downloads have risen nearly 1,000% since the R1 release in January 2025.

And Anthropic itself acknowledges the pressure. In a policy paper released in May, the company said U.S. models are only "several months ahead" of Chinese ones, and warned that Beijing is "winning in global adoption on cost."

OpenAI sees it differently. A person familiar with the company's thinking said every release of a new frontier model, including GPT-5.5 last month, has driven a surge in API and product usage, with enterprise demand growing in what they described as a "vertical wall." Open source has a role in low-stakes tasks, this person said, but isn't eating into the company's core business. Pricing pressure isn't on the company's top ten list of concerns.

But an enterprise AI CEO, who asked not to be named to protect customer relationships, offered a different read. The growth is real — “but it would expand even faster for frontier if this technique wasn't used.”

This is the market OpenAI and Anthropic are expected to ask public investors to value. At nearly trillion-dollar valuations each, the S-1 has to show enterprise revenue growth and concentration that justifies the multiple. But the premium that justifies the valuation is eroding fastest in exactly the segments the labs need to dominate.

DEVOURED

Better Experiments with LLM Evals — A funnel, not a fork

AI dataexperimentationllm Spotify Engineering

Spotify's "evaluation funnel" strategy combines LLM evals with A/B testing to verify AI output quality before validating real user impact, improving experiment efficiency.

What: Spotify engineers Matilda Ankargren and Mårten Schultzberg describe how LLM evals (automated judges) verify qualities such as relevance and coherence before A/B tests validate how users respond, cutting down on experiments spent on candidates that fail quality standards. About 12% of Spotify's A/B tests end in a shipped positive result, and teams roll back about 42% of launched experiments to prevent regressions in secondary metrics.

Why it matters: This article illustrates a mature approach to integrating AI quality assessment into existing product development workflows, emphasizing calibration and the distinct roles of offline verification and online validation to prevent costly regressions.

Takeaway: If developing AI features, integrate LLM evaluations as a verification step before full A/B tests to catch quality issues early and inform experiment design.

Decoder

LLM eval: An automated judge, often powered by a Large Language Model, used to assess qualitative aspects of content or outputs, such as relevance, coherence, tone, or intent alignment, at scale.
Evaluation funnel: A concept where different evaluation methods are applied sequentially, with cheaper, faster methods (like LLM evals) used earlier to filter out poor candidates before more expensive, rigorous methods (like A/B tests) are used for final validation.

Original article

TL;DR LLM evals, automated judges that assess relevance, coherence, and quality at scale, are a powerful new tool. Paired with online experiments, they raise the hit rate of what we test and create a feedback loop that makes both evals and experiments smarter over time.

At Spotify, only about 12% of A/B tests end in a shipped positive result. Around 64% produce valid learning: a regression caught, an idea ruled out, a hypothesis refined. The win rate understates the value of experimentation.

Now we have a new capability. LLM evals can assess dimensions we couldn't scale before (relevance, coherence, tone, intent alignment) faster and cheaper than human annotation, on any data from test sets to A/B test variants. Evals and experiments measure different things. The right relationship is a funnel, not a fork. Schultzberg and Ottens (2024) call it an evaluation funnel, where evals belong before your experiment, not instead of it. A strong eval stack means you don't test to find out if the change does what you intend. Evals already told you that. You test to validate the intended change drives the business outcome it was meant to, and to bound the risk of harming the business.

What evals give us, and what they don’t

Schultzberg and Ottens distinguish verification from validation. Evals verify: does the output conform to quality standards? Experiments validate: do real users respond as predicted? Evals discard the non-promising candidates before they consume experiment bandwidth. They raise the hit rate of the experiments that follow.
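
As a rough illustration of the verification step, the sketch below filters candidate variants by eval score before any of them reaches an experiment. The variant names, score dimensions, and thresholds are invented for the example and are not Spotify's actual setup.

```python
def funnel(candidates: dict[str, dict[str, float]], thresholds: dict[str, float]) -> list[str]:
    """Return the candidates whose eval scores meet every quality bar.

    Only these survivors go on to consume A/B test bandwidth.
    """
    return [
        name
        for name, scores in candidates.items()
        if all(scores.get(dim, 0.0) >= bar for dim, bar in thresholds.items())
    ]


# Hypothetical judge scores per candidate variant, on a 0-1 scale.
eval_scores = {
    "variant_a": {"relevance": 0.90, "coherence": 0.80},
    "variant_b": {"relevance": 0.60, "coherence": 0.90},
    "variant_c": {"relevance": 0.85, "coherence": 0.75},
}
quality_bars = {"relevance": 0.80, "coherence": 0.70}

to_experiment = funnel(eval_scores, quality_bars)
print(to_experiment)
```

Passing the funnel only verifies that a variant meets the quality bars; whether users respond as predicted is still left to the experiment.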

Evals also generate hypotheses. Consider a team that builds an LLM judge to flag trust-breaking content, say a recommendation shared with a user it doesn't fit. The judge surfaces patterns the team didn't know to look for. Those patterns become product fixes. After the fix ships, the same judge can verify it worked: the flagged violations should drop. That's the eval doing two jobs: discovering what to improve, and confirming the improvement was realized.

What the eval can't tell you is whether users who received the improved version actually had better outcomes: whether the fix prevented the slow erosion of trust that eventually leads to churn. This question requires an experiment.

Beyond the dimensions you're measuring are the ones you aren't measuring. At Spotify, teams roll back about 42% of launched experiments to prevent regression in secondary metrics: session length dropping, crash rates climbing, retention eroding. No evals or offline evaluation flagged those. As we've described in our work on guardrail metrics, the point of a guardrail is to watch dimensions you care about but aren't optimizing for. An eval measures quality of implementation in one dimension. An experiment quantifies the impact on systems in production and end users.

Two calibration layers, one feedback loop

Evals are proxies. They substitute a score for an outcome you actually care about. That substitution is only valid as long as the score tracks the real outcome, the same dynamic we've described with proxy metrics.

Now LLM judges add a second calibration layer on top of traditional quantitative metrics (ranking scores, precision, recall). Both layers need validation against online outcomes. Both can drift. When the judge says Variant A is better, does it actually deliver a better user experience, or is the judge rewarding surface patterns that don't drive outcomes?

For example, when Anthropic released the Opus 4.5 model, Qodo's coding evals showed no improvement, yet the model had improved substantially on longer tasks, something a controlled experiment would have surfaced. Miscalibration runs both ways. Without offline-online signal calibration, our evals are opinions, not evidence. By construction, long-running tasks and long-term behaviour are challenging to capture with evals. By continuously adjusting the evals to improve their mapping to online outcomes, we make them better and better verification tools. We are not ruling out that in the future, as AI develops, evals could map well enough to start acting as validations. With the offline/online calibration loop in place, we have continuous transparency on what role evals can play in the evaluation funnel as AI keeps improving.

Teams under speed pressure sometimes call A/B tests "costly." We know from experience that shipping without an experiment can be incredibly costly, if a major regression in top business metrics goes undetected. The more complex the system, the more important it is to bound the risk.

Close the loop

Run evals early and often to find the best treatments. Then let the experiment validate that real users and systems respond as predicted, and monitor the metrics you didn't optimize for. Not every change needs the same evidence: quick directional tests for iteration and data gathering, rigorous tests for ship decisions.

Then: run your LLM evals on the A/B test data itself. Did the version the judge preferred actually perform better with users? This extends the traditional evaluation funnel. LLM judges let us ask not just "did the metric move?" but "did the qualitative aspects change?" When the gap between eval scores and experiment outcomes is large, that's diagnostic gold. Each cycle helps calibrate the next.
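
One simple way to quantify that gap is to check how often the judge and the experiment agree on direction. The sketch below is an assumption about how such a check could look, not Spotify's method, and the records are made-up numbers.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def judge_agreement(records: list[tuple[float, float]]) -> float:
    """Fraction of experiments where the judge's score delta and the online
    metric delta (treatment minus control) point in the same direction."""
    matches = sum(1 for judge_delta, metric_delta in records if sign(judge_delta) == sign(metric_delta))
    return matches / len(records)


# Hypothetical (judge score delta, online metric delta) pairs, one per past A/B test.
history = [
    (0.4, 0.02),    # judge preferred treatment, users did better
    (0.1, -0.01),   # judge preferred treatment, users did worse
    (-0.2, -0.03),  # judge preferred control, users did worse on treatment
    (0.3, 0.0),     # judge preferred treatment, no measurable change
]

agreement = judge_agreement(history)
print(f"judge/experiment agreement: {agreement:.0%}")
```

A low agreement rate is the calibration signal described above: the judge is measuring something, but not what drives the online outcome.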

Return to the trust-breaking recommendation team: the experiment is the final step. If users who received the improved version show better long-term engagement, the team has confirmed that what the judge measures actually matters. If the judge scores improved but user outcomes didn't, that's the calibration signal: the judge is capturing something, but not the thing that drives value. Both results make the system smarter.

Spotify already has a strong evaluation culture in the shape of experimentation. LLM evals extend that culture upstream, with a clear role in the funnel: find the best treatments before the experiment, and calibrate the judges after it. As Ankargren (2025) argues, success comes from doing the basics well at scale. The value compounds when the system is simple enough to use, and rigorous enough to trust.

DEVOURED

Alibaba unveils new AI chip in push for domestic alternatives

AI hardwarechinaenterprise Reuters

Alibaba's T-Head subsidiary unveiled the new Zhenwu M890 AI chip and a multi-year roadmap, tripling performance for AI agent workloads amid tightening US export curbs.

What: Alibaba's semiconductor unit T-Head released the Zhenwu M890 AI chip, which offers three times the performance of its predecessor, the Zhenwu 810E, and is built for the memory and communication demands of AI agent workloads. Alibaba, which last year pledged more than 380 billion yuan ($53 billion) over three years for cloud and AI infrastructure, also revealed a roadmap for successors: the V900 (Q3 2027) and the J900 (Q3 2028).

Why it matters: This move underscores China's aggressive push for AI self-sufficiency in hardware, driven by U.S. export restrictions, and signals increasing investment in specialized silicon for the growing demands of AI agents and large language models.

Decoder

AI agent: A software system capable of performing complex, multi-step tasks with limited human oversight, often by planning, reasoning, and interacting with its environment.
Zhenwu M890: Alibaba's new AI chip designed by its T-Head subsidiary, optimized for AI agent workloads.
Qwen 3.7-Max: The latest version of Alibaba's flagship large language model, engineered for advanced coding and long-running agent tasks.

Original article

BEIJING/SINGAPORE, May 20 (Reuters) - Alibaba Group on Wednesday unveiled a new AI chip, the Zhenwu M890, as the Chinese technology giant intensifies efforts to build domestic alternatives to Nvidia processors amid tightening U.S. export curbs.

The chip, developed by Alibaba's semiconductor design subsidiary ‌T-Head, delivers three times the performance of its predecessor, Zhenwu 810E. It is purpose-built for the emerging wave of AI "agents" — software systems that can carry out complex, multi-step tasks with limited human oversight.

Alibaba said the new processor is well-suited to handle the heavy memory and communication demands of agent workloads, where models must retain long stretches of context and coordinate with one another in real time.

The company also outlined a multi-year chip roadmap, saying it would follow the M890 with a successor called the V900 in the third quarter of 2027, and a further chip, the J900, in the third quarter of 2028. The V900 is expected to deliver another roughly threefold performance gain over the M890, Alibaba said, signalling a sustained cadence of in-house silicon upgrades.

The plan underscores China's growing efforts to produce locally developed AI chips as Washington bans the sale of the most powerful U.S. processors to Chinese customers, and follows a similar announcement by Huawei last year.

Hangzhou-based Alibaba last year pledged to spend more than 380 billion yuan ($53 billion) on cloud and AI infrastructure over three years, its largest-ever commitment to the sector.

The investment reflects a broader bet across China's technology industry that demand for AI computing power will continue to surge as enterprises adopt agent-based applications.

Alibaba unveiled the chip at its annual Alibaba Cloud Summit, alongside a new server system, the Panjiu AL128, which packages 128 of the accelerators into a single rack.

The system is available immediately to Chinese enterprise customers through Alibaba Cloud's domestic model platform, known as Bailian.

T-Head said it has shipped more than 560,000 Zhenwu units to date, with over 400 external customers across 20 industries, including automakers and financial services firms, having deployed the chips.

Alibaba also announced Qwen 3.7-Max, the latest version of its flagship large language model, which it said is engineered for advanced coding and long-running agent tasks. The company said the model can operate continuously for up to 35 hours without performance degradation.

DEVOURED

Google adds llms.txt check to Chrome Lighthouse

AI webpolicyfrontend Search Engine Land

Google has integrated an llms.txt check into Chrome's Lighthouse audits under "Agentic Browsing," aimed at improving how machines interact with websites.

What: Google now includes an llms.txt check in Chrome's Lighthouse auditing tool, found within the "Agentic Browsing" category. llms.txt is a proposed Markdown file, served from a site's root, that gives large language models and other AI agents a concise, curated overview of the site's content. Unlike robots.txt, it is not an access-control mechanism.

Why it matters: This development signals growing recognition from Google that websites are now read by AI agents as well as by people, and that site owners may need conventions for presenting content to those agents.

Takeaway: If you develop websites, investigate how llms.txt works and consider publishing one so that AI agents can find and interpret your key content as the proposal gains traction.

Decoder

llms.txt: A proposed standard Markdown file, placed at a site's root much like robots.txt, that gives Large Language Models (LLMs) and other AI agents a short summary of the site and links to its most useful content. It guides agents rather than restricting them.
Lighthouse: An open-source, automated tool from Google for improving the quality of web pages, running audits for performance, accessibility, SEO, and more.
Agentic Browsing: A category within Google Lighthouse audits that pertains to how automated agents, like LLMs, interact with and process web content.

Original article

Google introduced an llms.txt check in Chrome's Lighthouse audits under the "Agentic Browsing" category to enhance machine interaction with websites.
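
The source does not say what the Lighthouse audit actually inspects. As a rough illustration of the file format only, the sketch below checks the basic shape described in the public llms.txt proposal: an H1 title first, then optional sections whose list items are Markdown links. The function name `check_llms_txt`, the sample file, and the specific rules enforced are assumptions made for this example, not Lighthouse's logic.

```python
import re

# A made-up llms.txt file, used only to exercise the checker below.
SAMPLE = """\
# Example Project

> Example Project is a hypothetical library used here for illustration.

## Docs

- [Quick start](https://example.com/quickstart.md): Install and first steps
- [API reference](https://example.com/api.md)
"""

# A list item of the form "- [title](url)" with an optional ": note" suffix.
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line should be an H1 title")
    in_section = False
    for ln in lines[1:]:
        if ln.startswith("## "):
            in_section = True  # link lists are expected under H2 sections
        elif in_section and ln.startswith("- ") and not LINK_ITEM.match(ln):
            problems.append(f"list item is not a Markdown link: {ln!r}")
    return problems
```

Calling `check_llms_txt(SAMPLE)` returns an empty list, while a file that lacks a title or lists plain text under a section yields one message per problem.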

DEVOURED

OpenAI Is Preparing to File for an IPO Very Soon

Tech aistartupfinance Wall Street Journal

OpenAI is preparing to file for an IPO within weeks and aims to be ready to go public as early as September, despite recently missing internal revenue and user targets and facing doubts about whether its revenue can support its spending commitments.

What: OpenAI is working with bankers to file for an IPO in the coming days or weeks, with a potential public debut as early as September. Its plans remain fluid, and the company must still show it can generate enough revenue to support its spending commitments.

Why it matters: The push for an OpenAI IPO despite missed targets and open questions about revenue reflects the challenge even leading AI companies face in translating massive technological investment into sustainable, high-growth financial performance.

Original article

OpenAI has been working with bankers to prepare to file for an IPO in the coming days or weeks. The company aims to be ready to go public as early as September. Its plans remain fluid and could still change. The company still has to overcome concerns about whether it can generate enough revenue to support its spending commitments, among a host of other challenges. It recently missed multiple internal revenue and user targets.

DEVOURED

Chickens without eggs? De-extinction company creates artificial egg

Tech researchsciencebiotechhardware Ars Technica

Biotech startup Colossal, known for de-extinction, has developed an artificial eggshell allowing nearly full avian development, simplifying embryo manipulation and observation for research.

What: Colossal, the biotech startup co-founded by Ben Lamm, announced on Tuesday an artificial eggshell device that supports almost the entire avian developmental process outside the natural shell, enabling easier embryo manipulation and continuous observation. The device holds the entire egg contents, and embryos transferred on day one, before the circulatory system forms, developed into normal chicks.

Why it matters: Colossal built the device primarily for its dodo and moa de-extinction projects, but because it plans to give the technology away to research labs, it also offers developmental biologists a new tool for studying embryogenesis as a dynamic process, which could accelerate research in fundamental biology.

Takeaway: Developmental biologists working with avian species should reach out to Colossal if they believe this artificial egg technology could benefit their research, as the company plans to make it available for free.

Deep dive

Colossal, a biotech startup focused on de-extinction, has developed an artificial eggshell capable of supporting nearly the entire avian developmental process.
The device allows the transfer of egg contents within a day or two of laying, even before the circulatory system forms, leading to normal chick development.
This artificial environment makes it easier for scientists to perform genetic or surgical manipulations on embryos and continuously film and track cellular movements and rearrangements.
The 3D-printed support system maintains proper membrane tension and allows efficient oxygen exchange in a normal atmosphere, only requiring calcium supplementation.
Colossal intends to use this technology for its de-extinction efforts, particularly for species like the dodo and moa, which are significantly larger than related existing birds, necessitating external nutrient supplementation.
The company, led by Ben Lamm, plans to open-source the technology for research purposes, not charging labs for its use, indicating a broader scientific impact beyond de-extinction.

Decoder

De-extinction: The process of bringing an extinct species back to life, typically through genetic engineering and cloning.
Embryogenesis: The process by which an embryo forms and develops from a zygote.
Developmental biology: The study of the processes by which organisms grow and develop.

Original article

On Tuesday, biotech startup Colossal announced its newest development on the road to its stated goal: reversing the extinction of species, in this case, avian species. The development itself is essentially an artificial eggshell, one that allows almost the entire developmental process to occur without the shell. The company transferred the contents of eggs to its specially designed container within a day or two of laying and was able to have normal chicks walk away from it.

Beyond its potential utility for Colossal’s intended efforts, the work is personally interesting to me because it may solve a problem I faced in my research days. I’m going to start by describing the research problem that Colossal may have solved, before coming back to what it hopes to use its technology to do—and why the company still has a few key hurdles left to overcome.

Watching development

For part of my career, I studied the development of vertebrates using chickens. While they’re less closely related to us than something like mice, the basics of their development are largely the same. And, unlike mice, they develop outside of their mother’s body. If you’re careful, you can chip away a hole in the egg, perform manipulations on the developing embryo, and then seal it back up with some tape. The chicken embryo will keep developing, allowing you to see the impact of what you’ve done on normal development.

Manipulations include everything from surgically removing key tissues to implanting beads soaked with signaling molecules to injecting DNA into cells to instruct them to make a different set of proteins. Any of these can alter the development of the embryo, telling us things about the factors that are normally required.

While this has been incredibly powerful, it provides us with a limited view of key events. That’s because you’re only allowed two time points: the moment you perform the manipulations, and when you stop the experiment. You don’t have a complete picture of how things change in between the two. You can repeat the experiment and stop things at different time points, but you don’t really get a complete picture of what is a dynamic process.

This is especially true because development involves a lot of motion: cells move around, tissues rearrange and slide past each other. For example, the spinal cord starts out as a flat plate of neural tissue, but then rolls up into a tube. As the cells mature into neurons, they detach from the inner surface of the tube, move to new locations, and start sending out axons to connect with other neurons.

There was a potential solution to this. A researcher down the hall (Kat Hadjantonakis) developed a microscope system that automated taking repeated exposures of embryos over time, allowing her team to track how cells moved about during key developmental processes. It worked, but only if you could get the embryo to survive in culture. This could work for a day or two with mouse embryos, but chickens were really difficult.

There were two big problems. The chicken embryo is embedded in the membrane that encloses the yolk, and the tension on the membrane provided by the yolk is needed for the embryo to develop properly. Let the yolk leak out and the membrane will sag, leaving the embryo a crumpled, disorganized mess. (Colossal told Ars that the curvature of the container it developed had to be tweaked to maintain the proper tensions within the egg’s membranes.)

The other issue is that the embryo’s developing circulatory system extends deeply into the yolk. Most embryos I tried to image ended up being disorganized messes with no blood.

Development without the shell

Colossal has basically solved that problem. It made a structural support that supports the entire contents of the egg in a way that keeps everything intact so that the embryo develops normally. No problems with a lack of membrane tension or the loss of blood. In fact, the transfer of the egg contents to Colossal’s new device can take place before the circulatory system even forms. (In the work they’re describing, transfers are done on day one of development, when the embryo is largely a smudge of cells on the surface of the yolk.)

The support system is 3D-printed and lined with a special membrane that allows oxygen to be exchanged with the environment. Previous efforts to get this to work had to put the embryo in a high-oxygen environment, which increases the chance of DNA damage from reactive oxygen in the cells. The membrane is efficient enough that the embryo can develop in a normal atmosphere, though humidity has to be controlled. Colossal’s Ben Lamm told Ars that the only thing that needed to be added was calcium, as the embryo normally extracts a bit of that from the interior of the egg shell.

Colossal also confirmed that, due to density differences, the yolk naturally floats to the top of the container, with the embryo rotating to the top of that. So, once the egg is placed in this device, all the manipulations that biologists normally do should be possible. And, because it only requires a humidified chamber, it should be possible to film the embryo as it develops afterward and track any changes to cell movements and rearrangements. The company has even designed the container so that light can be diffused in from beneath for microscopy purposes.

In other words, Colossal seems to have solved a problem I no longer have (since I’m now a journalist) but is likely still an issue for biologists. However, the company did so purely as a necessary step for one of its de-extinction projects.

Not all eggs are created equal

Why does it need to externalize the contents of eggs? It comes back to two of its planned de-extinctions, the dodo and the moa. Both of these species are far, far larger than the nearest related species. In the moa’s case, it’s far larger than any existing birds. If you want to make something that big, then there’s simply no way of taking an egg from an existing species and putting a moa embryo in it. So, one of Colossal’s next steps will be to see if it can supplement an egg—do things like add enough nutrients to the yolk to support the growth of a larger embryo.

This likely can’t be done before the embryo is in place, as simply pumping more material into the yolk would likely cause the membrane enclosing it to burst. Instead, they’re likely to have to add or exchange material as the embryo is developing.

The other issue they’ll have to contend with is the fact that embryonic development starts while the egg is still inside its parent. So the team will have two choices. One option is that they will need to figure out how to get the first half day or so of development to proceed without an egg, and then transfer that growing embryo into an egg. The alternative is that they’ll have to figure out how to fertilize eggs after their contents have been transferred to this device.

But some of those are challenges specific to de-extinction. For any researchers who think this could benefit their work, the company would be happy to hear from you. “I believe there will be labs that want to use this just for research purposes, which is awesome,” Lamm told Ars. “And by the way, we’re not going to charge for that. We’re going to just give it away.”

DEVOURED

Making Our Monorepo Ergonomic for Agents

Tech agentsdevopsengineeringsoftware-development Basis (via X)

Basis successfully refactored its monorepo in three months to be "ergonomic" for code agents, using principles like verifiability and canonical context, unlocking substantial payoffs.

What: The company Basis spent three months making its monorepo ergonomic for code agents, discovering that the process is more complex and less intuitive than expected, but yields significant benefits by addressing agents' unique failure modes and context requirements. They achieved this by applying principles such as verifiability, interoperability, and canonical context.

Why it matters: As AI code agents become more prevalent in development workflows, this experience from Basis highlights a nascent but critical area of "agent-native" software engineering, suggesting that traditional repository structures and practices will need re-evaluation to maximize AI productivity.

Takeaway: Software engineering teams exploring code agents should anticipate significant effort in adapting their monorepo structures, focusing on making codebases verifiable and interoperable and on providing canonical context for agents.

Decoder

Monorepo: A single repository containing multiple distinct projects, often with shared code and dependencies, managed by a single team or organization.
Code agents: AI systems designed to understand, write, debug, and refactor code, often operating within a developer's workflow or directly on a codebase.
Ergonomic (for agents): Designed to optimize the efficiency and minimize the errors of AI code agents when interacting with a codebase, considering their unique needs for context, structure, and verifiability.

Original article

Code agents have their own failure modes, appetite for context, and demands on what counts as a well-organized repository. Companies need to take this seriously as the work to make a repository ergonomic for agents is bigger than expected, and the principles are non-obvious. However, the payoff is substantial. This article looks at how Basis made its codebase ergonomic for agents in three months using principles rooted in verifiability, interoperability, and canonical context.

DEVOURED

How I Choose Which Cloudflare Employees to Replace With AI

Tech aicareerinfrastructure WSJ

Cloudflare laid off more than 20% of its workforce two weeks ago, many of them in measurement roles that new AI tools now cover, while keeping a record number of open positions.

What: Cloudflare recently laid off more than 20% of its workforce, many in measurement roles. The company is using new AI tools to better measure business performance, allowing human teams to focus on value creation. Despite the layoffs, Cloudflare has a record number of open positions and expects overall employee growth.

Why it matters: This reflects an ongoing trend where AI isn't just automating tasks but also enabling companies to restructure, shifting human roles towards higher-value, creative, and strategic functions, even if it means initial workforce reductions in specific areas.

Original article

Cloudflare laid off more than 20% of its workforce two weeks ago, many of whom were responsible for various measuring tasks. The company now has better tools to measure exactly how the business is performing. Cloudflare has a record number of open positions, and it expects its number of employees to grow. AI is allowing the company to better measure itself so the humans on its teams can focus on creating and capturing value by building and selling.

DEVOURED

Jeff Bezos describes his $38B startup Prometheus for the first time: ‘Nothing to do with robotics’

Tech startupaihardwaredesign GeekWire

Jeff Bezos revealed his $38 billion startup, Project Prometheus, is building an "artificial general engineer" for designing physical objects, clarifying it's not a robotics company.

What: Jeff Bezos's Project Prometheus launched with $6.2 billion in funding and has roughly 120 employees drawn from firms including OpenAI, DeepMind, Meta, and xAI. It is building next-generation tools for designing physical objects, a modern take on Computer-Aided Design that it describes as an "artificial general engineer," to help companies like Blue Origin.

Why it matters: This highlights a significant investment into AI for physical product design and engineering, moving beyond software-centric AI applications to tackle complex real-world manufacturing and development challenges.

Decoder

Artificial general engineer (AGE): A hypothetical AI system capable of autonomously designing, simulating, and optimizing complex physical systems and objects, functioning across various engineering disciplines.

Original article

Jeff Bezos' startup, Project Prometheus, is developing an artificial general engineer and building next-generation tools for designing physical objects, similar to a very modern version of Computer-Aided Design. The tools Prometheus is building will help companies like Blue Origin immensely. Project Prometheus launched with $6.2 billion in funding and has roughly 120 employees from firms including OpenAI, DeepMind, Meta, and xAI. It was previously incorrectly reported to be an AI robotics company.

DEVOURED

The Evolution of Cassandra Data Movement at Netflix

Data infrastructurecloudcassandraspark Netflix Technology Blog

Netflix revamped its Cassandra data movement engine, now processing 3 PB/day by reading backups directly from S3 and converting them to Spark DataFrames, enabling optimized connectors for various data abstractions.

What: Netflix moved from an older Cassandra-to-Iceberg movement engine to a new layered platform. This platform reads Cassandra backups directly from S3, converts them into Spark DataFrames, and allows different data abstractions to build optimized connectors. The system processes about 3 petabytes of data daily, and the cutover required zero downstream code changes, relying on shadow validation, enhanced observability, and a Maestro Decider fallback to the prior solution.

Why it matters: This evolution demonstrates how large-scale data platforms like Netflix iterate on foundational data infrastructure to improve efficiency, flexibility, and maintainability for critical data movement tasks, moving towards more direct data access and standardized processing frameworks like Spark.

Takeaway: If you need to move large volumes of data from NoSQL databases to analytical stores, consider processing backups directly from object storage rather than live database instances to reduce load and improve efficiency, leveraging frameworks like Spark for transformation.

Decoder

Apache Cassandra: A free and open-source distributed wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
Apache Iceberg: An open table format for huge analytic datasets, designed to improve on existing table formats by supporting schemas, hidden partitioning, and schema evolution.
Spark DataFrames: A distributed collection of data organized into named columns, conceptually equivalent to a table in a relational database or a data frame in R/Python, offered by Apache Spark for structured data processing.
Maestro Decider: Likely a custom Netflix service or component used for decision-making and fallback logic within their operational orchestration system, Maestro.

Original article

Netflix replaced its Cassandra-to-Iceberg movement engine with a layered platform that reads backups directly from S3, converts them to Spark DataFrames, and lets each data abstraction build its own optimized connector. The engine moves about 3 PB/day. The migration used shadow validation, enhanced observability, and a Maestro Decider fallback to the prior solution, enabling a transparent cutover with zero downstream code changes.
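
The summary does not describe how Netflix's shadow validation or the Maestro Decider work internally. As a generic illustration of the pattern, the sketch below runs a prior engine and a new engine on the same input, compares their outputs regardless of row order, and falls back to the prior output on any failure or mismatch. The names `rows_match` and `choose_output`, and the idea of modelling an engine as a plain function, are assumptions made for this example.

```python
from collections import Counter


def rows_match(old_rows, new_rows):
    """Compare two result sets as multisets, so row order does not matter."""
    return Counter(map(tuple, old_rows)) == Counter(map(tuple, new_rows))


def choose_output(old_engine, new_engine, snapshot):
    """Run both engines on the same input and decide which output to serve.

    Returns (rows, reason). The prior engine's rows are served whenever
    the new engine fails or disagrees with the prior engine.
    """
    old_rows = old_engine(snapshot)
    try:
        new_rows = new_engine(snapshot)
    except Exception:
        return old_rows, "fallback: new engine failed"
    if rows_match(old_rows, new_rows):
        return new_rows, "new engine"
    return old_rows, "fallback: mismatch"
```

Downstream consumers only ever see the returned rows, which is one way a cutover can stay invisible to them while the new path is still being validated.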

DEVOURED

How We Cut BigQuery Slot Usage by 90% On One Of Our Most Resource Hungry Service After a Production Outage

Data cloudperformancebigquery Teads Engineering

Teads slashed BigQuery slot usage by over 90% on their Audience Planning service following an outage, by coalescing requests, optimizing SQL queries, and refining data models and partitioning.

What: Teads significantly reduced BigQuery slot consumption for their Audience Planning service. This was achieved through application-level changes like using Redis distributed locks to coalesce duplicate queries and implementing fail-fast validation for oversized filters, alongside SQL query rewrites (e.g., replacing large IN clauses with semi-joins). Data model improvements included data type compression, precomputing repeated work, and a better partitioning strategy, which collectively cut the effective table footprint by approximately 95%.

Why it matters: This case study illustrates how a combination of application logic optimization, SQL query tuning, and data modeling best practices can yield dramatic cost and performance improvements in cloud data warehouses like BigQuery, especially under load. It also underscores the importance of responding to production outages with holistic architectural reviews.

Takeaway: Regularly audit and optimize your application's interaction with cloud data warehouses: coalesce duplicate queries, validate inputs early, rewrite inefficient SQL patterns (e.g., large IN clauses), and ensure your data model and partitioning strategy are efficient for your query patterns.

Decoder

BigQuery: A fully managed, serverless enterprise data warehouse offered by Google Cloud that enables super-fast SQL queries using the processing power of Google's infrastructure. "Slots" represent units of computational capacity for queries.
Redis distributed locks: A mechanism using Redis to ensure that only one process or application instance can execute a specific block of code or access a resource at a time across a distributed system, preventing race conditions or duplicate work.
Semi-join: A type of database join operation that returns all rows from the left table for which there is at least one match in the right table, without including any columns from the right table.

Original article

Teads dramatically cut BigQuery slot usage by 90%+ on their Audience Planning service through application fixes (request coalescing with Redis distributed locks to eliminate duplicate queries, fail-fast validation for huge filters, and rewriting large IN clauses as semi-joins) combined with data model optimizations (compressing data types, precomputing repeated work, and an improved partitioning strategy), reducing the effective table footprint by ~95%.
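
Request coalescing can be sketched in a single process as follows. This is an illustrative stand-in rather than Teads' implementation: it uses an in-process `threading.Lock` where the article describes Redis distributed locks, and `Coalescer`, `run_query`, and `demo` are hypothetical names. The first caller for a given query key runs the query, and concurrent callers with the same key wait for that result instead of issuing duplicates.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class Coalescer:
    """Share one in-flight execution among concurrent identical requests."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._lock = threading.Lock()
        self._in_flight = {}  # query key -> Future holding the shared result

    def execute(self, key):
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._in_flight[key] = fut
        if leader:
            # Only the first caller runs the expensive query.
            try:
                fut.set_result(self._run_query(key))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
        # Followers block here until the leader publishes the result.
        return fut.result()


def demo():
    """Fire five identical requests at once and count real executions."""
    calls = []

    def slow_query(key):
        calls.append(key)
        time.sleep(0.3)  # stands in for an expensive warehouse query
        return f"rows for {key}"

    coalescer = Coalescer(slow_query)
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = list(pool.map(coalescer.execute, ["q1"] * 5))
    return results, len(calls)
```

In `demo`, all five callers receive the same rows while the slow query body runs once. A distributed version would also need lock expiry and a shared place to publish the result, which this sketch leaves out.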

DEVOURED

What's Easy Now? What's Hard Now?

Data aisoftware-developmentresearch Marc Brooker's Blog

Marc Brooker predicts that AI coding agents' long-term capabilities will hinge on effective feedback loops, making formally specifiable tasks "easy" and those needing subjective human feedback "hard."

What: Marc Brooker, an engineer at AWS, argues that the long-term capabilities of AI coding agents will depend more on the quality of their feedback loops than on open-loop model behavior. Tasks with accurate, automated feedback (like building a correct, high-performance database storage engine against a clear specification) will become "easy" for agents. Conversely, tasks that rely on slow, subjective human feedback (like building a delightful, ergonomic photo editing website) will remain "hard."

Why it matters: This perspective challenges the intuitive notion that AI will make front-end web development easier than system-level programming. It suggests a future where specification and rigorous testing frameworks will become paramount for leveraging AI agents effectively in software engineering.

Takeaway: Focus on improving and automating feedback loops in your development process, particularly for tasks that can be formally specified, as this will maximize the utility of AI coding agents in the future.

Decoder

LLM (Large Language Model): An AI model, typically deep learning-based, trained on vast amounts of text data to understand, generate, and process human language.
TLA+: A formal specification language used to design, model, document, and verify concurrent and distributed systems, ensuring correctness before implementation.
Verus: A tool that combines Rust with formal verification to prove the correctness of Rust code.
Property-based testing: A testing technique where tests are generated from a specification of properties that the code should satisfy, rather than specific example inputs.

Original article

Marc's Blog

About Me

My name is Marc Brooker. I've been writing code, reading code, and living vicariously through computers for as long as I can remember. I like to build things that work. I also dabble in machining, welding, cooking and skiing.

I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I work on databases, serverless, and serverless databases. Before that, I worked on EC2 and EBS.

All opinions are my own.

What’s Easy Now? What’s Hard Now?

Take it easy.

This is the fourth in a series about how AI is changing software development, after “It’s time to be right.”, “What about juniors?”, and “My heuristics are wrong. What now?”. It stands alone, but if you found this interesting you may also find those interesting.

I’ve been spending a lot of time thinking about the shape of the capabilities of coding agents. What they’re good at now, what they’re going to be good at. What they’re bad at now, how much of that is inherent and how much is transient. This is worth thinking about, because it’s the most important question shaping the future of software, and of software engineering. I don’t pretend to have an answer, but am coming to a conclusion that may be deeply counter-intuitive.

Coding agents are becoming very good indeed, and can build meaningful and correct software very quickly and at transformatively low cost. They have super-human abilities on some coding tasks. Of course, computer systems have had super-human abilities for at least 85 years[1]. I think we’re going to find, as we have over those nine decades, that this new technology we’re building is vastly super-human in some areas[2], and not nearly as capable as humans in others.

Which raises the important question of how, and why.

Feedback is powerful

Early on in my EE education, one of my professors drew a simple circuit on the board that’s been stuck in my mind ever since. It looked like this[3]:

[Figure: circuit diagram built from an opamp and an analog multiplier IC.]

Apply a voltage on the left, and on the right you get the square root of that voltage[4]. The two components are an opamp and an analog multiplier IC (e.g. the deeply obsolete MC1495). This simple circuit encapsulates possibly the most important idea in electrical engineering: feedback is uniquely powerful. Maybe unreasonably powerful. It’s the idea that makes nearly every electronic device work, it keeps planes in the sky, and stops your oven from burning your dinner.

Components inside feedback loops can be made to behave significantly differently from their basic open loop behavior. Excellent outputs can be extracted from poor components. Multipliers can become square rooters. Feedback changes everything.
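
The square-rooter idea can be imitated numerically. The sketch below is a discrete-time caricature and not a circuit simulation: a loop that can only multiply compares `y * y` with the input and nudges `y` until the two agree, at which point `y` is the square root. The function name, the `gain` and `steps` values, and the starting point are assumptions chosen for illustration, and the loop only settles when `gain` is small relative to `1 / sqrt(v_in)`.

```python
def sqrt_by_feedback(v_in, gain=0.05, steps=2000):
    """Find sqrt(v_in) using only multiplication, subtraction, and feedback."""
    y = 1.0  # arbitrary positive starting output
    for _ in range(steps):
        error = v_in - y * y  # compare the input with the multiplier's output
        y += gain * error     # nudge the output in the direction that shrinks the error
    return y
```

For inputs such as 2.0 or 9.0 the loop settles close to the true square root. Nothing inside the loop computes a root directly. The behaviour comes entirely from closing the loop around a multiplier.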

AI agents are just feedback loops. They’re built around a component with useful, but flawed, open loop behavior (an LLM), and use feedback to make that component able to do things that it’s not able to do without feedback. This is the basic idea behind the transformation that has happened in developer tooling in the last two years or so: a move from open loop AI (the smart autocomplete mode in IDEs) to agents. The moving of the feedback from the human developer (build, test, go back to IDE), into the agent itself (build, test, iterate).

Much of the conversation about long-term coding agent capabilities is about open loop model behavior. But that’s only half the picture. I may even stretch to saying it’s the less important half of the picture. Feedback is the thing that’s going to drive long-term capabilities.

The feedback loop hypothesis

In the long term, coding agents will find tasks with effective feedback ‘easy’, and tasks without effective feedback ‘hard’. The availability of accurate feedback will determine the limits on their capabilities.

On one hand, we should see this as uncontroversial. Anybody who has built code with agents knows that good error messages help keep agents unstuck. We’re seeing how tools like Rust guide agents towards writing correct code by providing explicit and immediate feedback about incorrectness of some kinds. We’re seeing agents be great at performance work, where good benchmarks exist. We’re seeing tools like property-based testing be uniquely valuable. We’re also seeing that agents aren’t great at architecture (where feedback tends to be of the ‘I know it when I see it’ kind), or writing concurrent programs (where feedback tends to be of the ‘it silently corrupted data at runtime’ kind).
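
To make the property-based testing point concrete, here is a minimal hand-rolled sketch that uses only the standard library rather than a dedicated library. It states one property, that decoding an encoded string returns the original, and checks it against many random inputs, returning a counterexample if one is found. The run-length codec and all function names are hypothetical and exist only to give the property something to test.

```python
import random


def rle_encode(s):
    """Run-length encode a string into (character, count) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(ch, n) for ch, n in runs]


def rle_decode(pairs):
    """Invert rle_encode."""
    return "".join(ch * n for ch, n in pairs)


def check_round_trip(trials=500, seed=0):
    """Property: decode(encode(s)) == s. Returns a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        s = "".join(rng.choice("ab") for _ in range(length))
        if rle_decode(rle_encode(s)) != s:
            return s  # precise, automatic feedback: the exact failing input
    return None
```

The useful part for an agent is the return value. A failing run hands back the specific input that broke the property, which is the kind of fast and accurate feedback the paragraph above describes.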

But let’s look forward a little bit, and compare two problems:

Building a delightful ergonomic photo editing website.
Building a correct high-performance database storage engine [5].

For open-loop models, the former is easier than the latter, at least in the sense that you’ll get closer to real success with a pure vibe coding workflow, and much closer after a single shot. The feedback loop hypothesis, however, makes me think that the latter is actually the easier long-term problem.

To understand why, consider their feedback loops. The website’s feedback loop, beyond maybe some automation that tests if the buttons do what they should, requires a human in the loop. It needs to be easy to use for humans, and humans are notoriously slow, squishy, and inconsistent feedback providers. The latter, however, has a rather simple specification, including the API, safety properties, and liveness properties. With the right tools in the feedback loop, iteration towards success requires no humans.
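To make the "no humans required" claim concrete, here is a minimal sketch of checking a storage component against its specification automatically. Everything in it is illustrative: `LogStore` is a toy append-only key-value store invented for this example, the specification is a plain dictionary, and the random operation generator uses only the standard library. Property-based testing libraries such as Hypothesis do this more thoroughly.

```python
import random
from typing import Dict, List, Optional, Tuple


class LogStore:
    """Toy append-only key-value store; deletes are recorded as tombstones."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, Optional[int]]] = []

    def put(self, key: str, value: int) -> None:
        self._log.append((key, value))

    def delete(self, key: str) -> None:
        self._log.append((key, None))  # None marks a tombstone

    def get(self, key: str) -> Optional[int]:
        # The newest entry for a key wins, so scan the log backwards.
        for logged_key, value in reversed(self._log):
            if logged_key == key:
                return value
        return None


def check_against_model(seed: int, steps: int = 500) -> bool:
    """Safety property: every read matches a plain dict given the same ops."""
    rng = random.Random(seed)
    store = LogStore()
    model: Dict[str, int] = {}
    for _ in range(steps):
        key = rng.choice(["a", "b", "c"])
        op = rng.choice(["put", "delete", "get"])
        if op == "put":
            value = rng.randint(0, 99)
            store.put(key, value)
            model[key] = value
        elif op == "delete":
            store.delete(key)
            model.pop(key, None)
        elif store.get(key) != model.get(key):
            return False  # the store disagreed with the specification
    return True


print(all(check_against_model(seed) for seed in range(20)))
```

A failing seed is exact, repeatable feedback that an agent can iterate against, which is the kind of signal the photo editing website cannot provide without a person in the loop.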

What does it mean?

I think this is different from the intuition many people have about coding agents. They see websites and UIs as ‘easy’ (see the SaaSpocalypse), and system software as ‘hard’. The feedback loop hypothesis says that this is backwards. That, in fact, we’re going to find that SaaS is ‘hard’ and system software is ‘easy’.

This is going to raise the importance of specification (the writing down of what good looks like to drive the feedback loop), and of tools that apply that specification to code. Compile-time tools like Rust, Hydro, and Verus. Modelling-time tools like TLA+ and P. Specification tools like Kiro’s spec analyzer. Testing tools, simulators, mocks, etc.

The future of software development is building these feedback loops. Many hard problems remain.

Footnotes

1. Dating back to the work of folks like Marian Rejewski in the 1930s.
2. The MacBook on my desk can add 64 bit numbers something like 100,000,000,000 times faster than I can.
3. Drawn with CircuitLab, and adapted from this Electronics StackExchange Answer. In reality, a few more passive components are needed.
4. If you’re not familiar with this stuff, here’s an intuition for how this works. The opamp (the triangle) tries to adjust its output (on the right) so the two inputs are the same. So if you take the output, and multiply it by itself, then feed it into one of the inputs, it’ll set the output to the square root of the input. If you are familiar with this stuff, I apologize deeply for that explanation.
5. I mean something on the scale of, say, RocksDB or InnoDB, not something on the scale of Aurora DSQL or even PostgreSQL. I think these large-scale distributed systems are going to be harder to hill climb to, at least for the future I can see.

The pipeline tax is breaking enterprise AI at agent scale

Data ai enterprise cloud CIO Dive (Sponsored by EDB)

Enterprise AI is hindered by a "pipeline tax" from excessive data movement across disparate systems, leading to latency, governance issues, and high costs, pushing towards bringing agents to data.

What: Quais Taraki, CTO of EDB, describes the "pipeline tax" as the overhead created by moving data multiple times (up to four copies) through various systems like warehouses, lakehouses, vector DBs, and RAG layers for enterprise AI. This leads to latency, governance drift, audit complexity taking weeks to resolve, and stalled AI projects. The solution advocated is to bring AI agents directly to the data layer, embedding governance natively.

Why it matters: This article signals a fundamental shift in enterprise AI architecture, moving away from complex, multi-hop data pipelines towards a more integrated, data-centric approach. It highlights the growing importance of data governance and real-time access as AI agents become more prevalent, suggesting that existing data infrastructure must adapt to new AI demands.

Takeaway: Re-evaluate your enterprise data architecture for AI workloads. Prioritize bringing AI agents and compute closer to your operational data sources, leveraging technologies like PostgreSQL for governance and Apache Iceberg for open table formats, to reduce data movement and its associated "pipeline tax."

Decoder

Pipeline tax: A term coined to describe the hidden costs and inefficiencies (latency, governance drift, audit complexity) arising from repeatedly moving and transforming data across multiple systems (e.g., data warehouses, lakehouses, vector databases) for enterprise AI applications.
RAG (Retrieval-Augmented Generation) layers: Components in an AI architecture that retrieve relevant information from a knowledge base to augment the context provided to a language model, improving its generation quality.
Apache Iceberg: An open table format for huge analytic datasets, designed to improve on existing table formats by supporting schemas, hidden partitioning, and schema evolution.
Model Context Protocol (MCP): An open protocol, introduced by Anthropic, that gives AI agents a standardized way to connect to external tools and data sources instead of relying on a custom integration for each application.
Postgres® (PostgreSQL): A powerful, open-source object-relational database system known for its reliability, feature robustness, and performance.

Original article

The pipeline tax is breaking enterprise AI at agent scale

Three months ago, the conversation I was having with enterprise technology leaders was about which model to fine-tune. Today, it’s about why the pipeline feeding that model is the reason their AI project is six months behind schedule. Or, more important, why adding more pipelines and more cloud capacity is not translating into measurable value from AI in production.

This is not a coincidence. The 2025 enterprise AI architecture—vector databases, RAG layers, orchestration frameworks and ingestion pipelines pulling from operational systems—was built on an assumption that does not survive contact with production: that enterprises can keep moving data fast enough to make AI agents useful in real time and then reconstruct governance downstream after every move.

That assumption came from pre-AI blueprints. It was like adding more horses versus building horsepower. AI in production needs brake-horsepower infrastructure that puts data and AI together in real time in a sovereign infrastructure, not in different places.

The next generation of successful enterprise architecture starts and finishes at the engine inside that vehicle: the data layer.

This is the new world of an engine, where all the parts fit and work together in real time. It is not a set of fragmented pieces, glued together with great intent but optimized only for reducing drag and friction. It means building a whole new sovereign systemic design for AI success.

That old assumption cannot hold. And the data layer is where it is breaking first.

The pipeline tax no one has on their balance sheet

Look at the architecture most large organizations actually run. Transactional systems feed pipelines, which feed warehouses, lakehouses, feature stores and models. Each hop is a translation. And each translation is a place where governance policies have to be reapplied, lineage gets murky and a masking rule defined in one system can silently fail to propagate to the next.

By the time data reaches an AI agent, it may have been copied four times and governed by three regimes, none of which fully agrees with one another. Then a regulator asks a simple question—“Can you show me where this customer’s data went and who touched it?”— and the answer takes six weeks and a consulting engagement.

This is the pipeline tax. It does not appear as a line item in any budget, but it shows up as audit findings; AI hallucinations; stalled migrations; and the reason why 95% of enterprises say they want to operate as their own sovereign AI and data platforms, while only 13% report they are actually thriving at it. Those figures come from EDB’s recent customer research, but the broader pattern is visible across the market: Gartner has tied GenAI project abandonment to poor data quality, inadequate risk controls, escalating costs and unclear business value. And McKinsey’s 2025 State of AI survey found that AI adoption is broadening, but most organizations have not yet scaled the technology into enterprise-wide impact.

The retreat from 2025’s architecture is happening at scale—and fast

The market has started to figure this out. The retreat from the RAG infrastructure enterprises spent 2025 building is real—VB Pulse found that organizations that “went wide on RAG in 2025” are now hitting a common failure point: architectures built for document retrieval do not hold at agentic scale. Single-method vector similarity is no longer enough for production agentic workloads that require accuracy, access control and context across systems.

Vector database categories are shifting as a result. The issue is not that retrieval is going away; it is that the simple RAG-to-vector-database pipeline is being rebuilt for a different era of AI. Hyperscalers are beginning to rebuild their data stacks around agents rather than pipelines. Even lakehouse incumbents are publishing research arguing that when queries span databases and documents, stronger models alone do not fix the problem; architecture does.

What is missing from most of those stories is the next move. If pipelines are the problem, what replaces them?

Always-on governance is the new model, at the data layer

The architectural answer now forming is straightforward: Stop moving the data and bring agents and AI to the data. Governance should live inside the data layer by design, not be bolted onto every downstream system after the fact.

Treat governance as a property of the architecture itself. Think of it like the human body: Organs perform different functions, but they are interdependent and governed by the same system 24x7x365. Enterprise AI needs the same principle. Different systems and agents may serve different purposes, but they have to operate from the same rules of governance, policy and sovereign control.

The pieces required to do this are no longer speculative. Postgres®, where much of the enterprise’s operational data already lives, can serve as a governance control plane, with row-level security, column masking and lineage native to the engine. Apache Iceberg has won the open table format argument. The Model Context Protocol gives AI agents a standardized, governed way to reach operational data without requiring a custom integration for every application.
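As a loose, in-memory analogy for governance living inside the data layer, the sketch below defines each policy once and applies it on every read, whoever the caller is. It is not Postgres and uses no database API. The `ROWS` data, the `POLICIES` table, the role names and `read_customers` are all hypothetical.

```python
from typing import Any, Dict, List

ROWS: List[Dict[str, Any]] = [
    {"customer_id": 1, "region": "EU", "email": "ana@example.com"},
    {"customer_id": 2, "region": "US", "email": "bo@example.com"},
]

# Hypothetical policies, defined once, next to the data they govern.
POLICIES: Dict[str, Dict[str, set]] = {
    "eu_support_agent": {"regions": {"EU"}, "masked": {"email"}},
    "auditor": {"regions": {"EU", "US"}, "masked": set()},
}


def read_customers(role: str) -> List[Dict[str, Any]]:
    """Every read goes through the same row filter and column mask."""
    policy = POLICIES[role]
    visible = []
    for row in ROWS:
        if row["region"] not in policy["regions"]:
            continue  # row-level filter: this role never sees the row
        visible.append({
            column: "***" if column in policy["masked"] else value
            for column, value in row.items()
        })
    return visible


print(read_customers("eu_support_agent"))
print(read_customers("auditor"))
```

Because no second copy of the rows exists, there is no downstream system where the masking rule could silently fail to propagate.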

None of this is a 2027 roadmap conversation. It is a procurement conversation happening now.

Migration is a capability, not a project

The same logic applies to the modernization backlog blocking everything else. Migration has historically been treated as a project: scope it, staff it, suffer through it and deliver it 18 months late.

The reason it remains painful is that the work itself—discovering schema dependencies, translating embedded business logic, validating functional equivalence—is exactly the kind of high-context, repetitive reasoning that coordinated AI agents are now genuinely good at.

The COBOL-translation demos getting attention this year are the leading edge of something larger: migration becoming an autonomous, continuously running capability rather than a one-off program. That changes the unit economics. It also changes the strategic question. The interesting question is no longer, “How long will this Oracle migration take?” It is, “How quickly can we evolve our entire platform strategy?”

The data layer is where the next decade gets decided

The vendors that win the next decade of enterprise infrastructure will not be the ones with the fastest query engine or the slickest notebook experience. They will be the ones that recognize data movement is breaking enterprise AI.

The pipeline tax has been paid long enough. The interesting work now starts at the data layer—and it starts when enterprises stop paying that tax.

The pipeline model breaks at agent scale. It was born of good intent, but in a world moving toward 1 billion agents delivering 217 billion instructions a day, it is architecturally medieval. The agentic era will be won at the data layer.

DEVOURED

Monitoring Cortex Agent Performance With Trace Data

Data ai devops performance cloud Monte Carlo

Monte Carlo details how to monitor Snowflake Cortex Agents in production by analyzing trace data for token consumption, latency, and errors using GET_AI_OBSERVABILITY_EVENTS.

What: Michael Segner and Virna Sekuj from Monte Carlo describe monitoring Snowflake Cortex Agents using native observability tables accessible via `SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_EVENTS`. Key metrics include total tokens, duration (latency), and span-level status codes (e.g., STATUS_CODE_OK, STATUS_CODE_ERROR) to detect issues like context bloat or slow tool calls. This article was updated on April 29, 2026.

Why it matters: As AI agents move into production, the need for robust observability and performance monitoring becomes critical, mirroring the evolution of traditional software systems and highlighting the maturity curve for managing AI-driven applications.

Takeaway: If you are deploying Snowflake Cortex Agents, leverage `SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_EVENTS` to build monitoring for token usage, latency, and error rates at the span level.

Deep dive

The article outlines how to monitor Snowflake Cortex Agents by querying structured observability events.
Snowflake Intelligence natively logs rich trace data, including conversation history, tool execution, LLM planning, and response generation.
This trace data is stored in native observability tables and accessible via the SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_EVENTS table function.
Each agent interaction is composed of hierarchical spans, with record_name identifying span types like chat, planning, response_generation, and tool_call.
Key performance metrics to monitor include total token consumption (prompt + completion tokens per span), duration/latency (P50, P90), and span-level status codes (STATUS_CODE_OK, STATUS_CODE_ERROR).
Common issues like token spikes can indicate changes in input length, context window accumulation, retrieval behavior changes, or increased tool call depth.
Monitoring completion rates (proportion of STATUS_CODE_OK spans) is crucial for catching silent failures.
High usage volatility in early deployments requires careful tuning of monitor sensitivity and consideration of filtering to business hours for cleaner baselines.
The article emphasizes combining signals across span types for effective troubleshooting, such as correlating high planning tokens with low tool call completion rates.
The underlying principle is to continuously watch agent behavior, learn what "normal" looks like, and surface deviations before users notice them at scale.
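The metrics above can be sketched over span rows that have already been pulled out of the observability events. The snippet makes no call to Snowflake. The `SPANS` rows and their field names (`prompt_tokens`, `completion_tokens`, `duration_ms`, `status`) are hypothetical, and the real event layout may differ. Only `record_name` and the status code strings come from the summary above.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical, already-flattened span rows; real columns may be nested differently.
SPANS: List[Dict[str, Any]] = [
    {"record_name": "planning", "prompt_tokens": 900, "completion_tokens": 120,
     "duration_ms": 800, "status": "STATUS_CODE_OK"},
    {"record_name": "planning", "prompt_tokens": 1100, "completion_tokens": 150,
     "duration_ms": 1200, "status": "STATUS_CODE_OK"},
    {"record_name": "tool_call", "prompt_tokens": 0, "completion_tokens": 0,
     "duration_ms": 300, "status": "STATUS_CODE_OK"},
    {"record_name": "tool_call", "prompt_tokens": 0, "completion_tokens": 0,
     "duration_ms": 2500, "status": "STATUS_CODE_ERROR"},
    {"record_name": "tool_call", "prompt_tokens": 0, "completion_tokens": 0,
     "duration_ms": 400, "status": "STATUS_CODE_OK"},
]


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(spans: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Per span type: total tokens, P50/P90 latency and completion rate."""
    by_type: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for span in spans:
        by_type[span["record_name"]].append(span)
    summary = {}
    for name, group in by_type.items():
        durations = [s["duration_ms"] for s in group]
        ok = sum(1 for s in group if s["status"] == "STATUS_CODE_OK")
        summary[name] = {
            "total_tokens": sum(s["prompt_tokens"] + s["completion_tokens"] for s in group),
            "p50_ms": percentile(durations, 50),
            "p90_ms": percentile(durations, 90),
            "completion_rate": ok / len(group),
        }
    return summary


summary = summarize(SPANS)
for span_type, metrics in summary.items():
    print(span_type, metrics)
```

Grouping by span type is what allows signals to be combined, for example reading planning token totals next to the tool call completion rate.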

Decoder

Cortex Agent: AI agents offered by Snowflake Intelligence, designed to perform tasks by interacting with data within the Snowflake ecosystem.
Trace data: Detailed logs of an agent's operations, capturing sequential events (spans) during a request, including planning, tool calls, and response generation.
Span: A single operation or step within a trace, such as an LLM planning step, a tool call, or a response generation step.
SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_EVENTS: A Snowflake table function used to access native observability events for Cortex Agents.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Context graphs and decision traces to the rescue

Data ai database enterprise research InfoWorld

A December 2025 Foundation Capital paper introduces "context graphs" and "decision traces" as a crucial missing provenance layer for enterprise RAG and AI agent workflows.

What: A December 2025 paper from Silicon Valley VC firm Foundation Capital, titled “AI’s trillion-dollar opportunity,” introduced the concept of a "context graph." This knowledge graph is designed to capture "decision traces," which record the full context, reasoning, and causal relationships behind critical business decisions to provide provenance for AI agents and RAG systems.

Why it matters: This signals a growing industry focus on the need for explainable AI and robust operational memory in enterprise environments. Generic AI models struggle with specific business contexts, highlighting that effective enterprise AI requires understanding not just data, but also the historical *how* and *why* of decisions to prevent AI from hallucinating.

Takeaway: If working on enterprise AI or agent systems, explore the concepts of context graphs and decision traces to consider how to log and structure operational memory for better AI reasoning and explainability.

Deep dive

A December 2025 paper by Foundation Capital, "AI’s trillion-dollar opportunity," introduced the concept of "context graphs" and "decision traces."
A context graph is a knowledge graph designed to capture "decision traces," which record the observable reasoning and causal relationships behind past business decisions.
This approach is seen as a missing provenance layer for retrieval-augmented generation (RAG) and AI agent workflows in enterprises.
It complements existing enterprise systems (like ERPs, CRMs) by storing operational memory: what evidence mattered, which relationships were relevant, policies applied, and exceptions made.
The article argues that AI needs all three types of human memory: episodic (decision traces), semantic (facts/schemas), and procedural (skills/how-to), to prevent hallucinations.
Graphs are crucial because enterprise context is fundamentally based on connections and relationships, which vector search struggles to capture.
A context graph can act as a "graph of graphs," guiding AI agents to the correct underlying data sources (e.g., accounting database for accounting questions).
GraphRAG, which improves retrieval from graphs for AI systems, is an essential part of this emerging ecosystem.
The market is evolving quickly, and enterprise AI teams should explore innovations like context graphs without setting architectures in stone yet.
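One way to picture a decision trace and the graph around it is the data-structure sketch below. It is an illustration only, not the paper's schema. The `DecisionTrace` fields, the `ContextGraph` class, the `lineage` method and the sample discount decisions are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecisionTrace:
    """One recorded decision plus the context that explains it."""
    decision: str
    evidence: List[str]
    policies_applied: List[str]
    exceptions: List[str] = field(default_factory=list)
    precedents: List[str] = field(default_factory=list)  # ids of earlier traces


class ContextGraph:
    """Traces are nodes; precedent references are edges between them."""

    def __init__(self) -> None:
        self.traces: Dict[str, DecisionTrace] = {}

    def add(self, trace_id: str, trace: DecisionTrace) -> None:
        self.traces[trace_id] = trace

    def lineage(self, trace_id: str) -> List[str]:
        """Follow precedent edges to collect every decision this one rests on."""
        seen, order, stack = set(), [], [trace_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            stack.extend(self.traces[current].precedents)
        return order


graph = ContextGraph()
graph.add("t1", DecisionTrace("Standard discount capped at 10%", ["pricing policy"], ["discount-cap"]))
graph.add("t2", DecisionTrace("Approved 15% for a strategic account", ["renewal risk memo"],
                              ["discount-cap"], exceptions=["VP approval"], precedents=["t1"]))
graph.add("t3", DecisionTrace("Approved 15% for a second account", ["similar renewal risk"],
                              ["discount-cap"], precedents=["t2"]))
print(graph.lineage("t3"))
```

An agent asked why the third discount was approved can follow `lineage` back to the original policy and the first exception, which is the provenance that vector similarity alone does not capture.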

Decoder

Context graph: A knowledge graph designed to capture and map the full context, reasoning, and causal relationships behind critical business decisions and organizational data.
Decision traces: Records of the observable reasoning behind how decisions were made, including rule application, exceptions, conflict resolution, approvals, and governing precedents.
GraphRAG (Retrieval-Augmented Generation in graphs): A technique that uses graph databases to store and retrieve structured knowledge, improving the ability of AI systems to access and utilize this knowledge for generating more accurate and contextually relevant responses.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Google's Newest App is an AI-powered Image Editor

Design ai frontend mobile Engadget

Google is launching "Google Pics," an AI-powered image editor built on Nano Banana, which allows users to manipulate image elements and translate text within photos.

What: Google Pics, a new AI image editing app based on Nano Banana, enables users to move, resize, and transform parts of images, and modify or translate text while preserving font style. It integrates with Workspace apps like Slides and Drive, aiming to compete with Canva, and will roll out globally this summer to Google AI Pro and Ultra subscribers.

Why it matters: Google is expanding its generative AI capabilities into consumer-facing creative tools, directly challenging established players like Canva and Adobe by leveraging its AI research for practical, integrated solutions within its ecosystem.

Takeaway: If you are a Google AI Pro or Ultra subscriber, expect access to Google Pics globally this summer.

Decoder

Nano Banana: Google's nickname for its Gemini image generation and editing models, which Google Pics is built on.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Airbnb gets into hotels, expands AI for host onboarding and customer support

Design startup ai travel enterprise TechCrunch

Airbnb is expanding beyond traditional home rentals by listing boutique hotels in 20 cities and significantly increasing its use of AI for host onboarding, customer support, and travel planning, with a voice AI assistant planned.

What: Airbnb is adding boutique hotels in 20 cities and new travel services like luggage storage and car rentals, consolidating them into a redesigned app. They are also expanding AI use for host onboarding, wishlist comparisons, review summaries, and customer support, where an AI chatbot already handles 40% of queries and a voice-based AI assistant is planned for later this year.

Why it matters: Airbnb is strategically broadening its market reach beyond its core home-sharing model to become a comprehensive travel platform, while simultaneously leveraging AI to automate and enhance various aspects of its operations and user experience. The hotel expansion also helps circumvent short-term rental regulations in key cities.

Takeaway: If you use Airbnb, expect new options for boutique hotel bookings, integrated travel services, and more AI-driven assistance for planning and support this summer.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Activity-focused Design

Design ux ixdf.org

Activity-focused design, which includes methods such as task analysis, prioritizes understanding user actions and goals to identify design improvements, emphasizing "what people do and how they do it."

What: Activity-focused design methodologies, such as task analysis, center on the actions users take to achieve specific goals. This approach involves identifying user goals, breaking them into activities and individual tasks, and documenting them to find opportunities for design optimization and improve user experience.

Why it matters: This design methodology provides a foundational framework for creating intuitive user experiences by systematically deconstructing user interactions, underscoring the importance of mapping user behavior to product functionality.

Takeaway: UX designers should consider employing task analysis to systematically identify and optimize user workflows, particularly when designing new features or improving existing task sequences.

Deep dive

Activity-focused design (AFD) centers on the actions people take to reach their goals, with the core unit of analysis being "what people do and how they do it."
Task analysis is one approach within AFD, complementing human-centered design, used to understand and improve user workflows.
The process of task analysis involves four main steps: determining the user's primary goals, identifying the specific tasks required to achieve those goals, documenting the goals and tasks (e.g., with diagrams), and analyzing them for improvements.
Designers must choose an appropriate level of specificity for tasks; for new designs, it's often better to describe tasks in an interface-independent way.
Strengths of task analysis include revealing gaps and optimization opportunities in task sequences, and its natural translation to digital experience design.
Weaknesses include potentially overlooking non-task-related factors like user emotions or social context, and a tendency to lead to backward-looking designs based on existing task flows rather than innovative new ones.

Decoder

Activity-focused design (AFD): A design methodology that prioritizes understanding the actions people take to achieve their goals.
Task analysis: A systematic method within activity-focused design used to break down user goals into a series of activities and individual tasks to identify design opportunities.
Contextual inquiry: A user research method where the designer observes users in their natural environment while they perform tasks to understand their workflow and challenges.
Think-aloud interviews: A user research method where participants vocalize their thoughts, feelings, and actions as they interact with a system, allowing designers to understand their cognitive processes.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

3D Gaussian Splatting Editor (Website)

Design graphics 3d web SuperSplat

SuperSplat introduces an advanced browser-based editor for manipulating and optimizing 3D Gaussian Splats, simplifying 3D content creation.

What: SuperSplat is a new web-based tool designed to edit and optimize 3D Gaussian Splats, offering an accessible way to work with this emerging 3D representation technology directly in a browser.

Why it matters: This tool democratizes access to advanced 3D content creation, potentially making Gaussian Splatting more widely adopted by developers and designers who lack specialized desktop software.

Takeaway: Explore SuperSplat if you're working with 3D models and need an accessible, browser-based solution for manipulating Gaussian Splats.

Decoder

3D Gaussian Splatting: A novel technique for real-time 3D rendering that represents scenes as a collection of 3D Gaussian functions, offering high quality and fast rendering compared to traditional mesh or volumetric methods.

Original article

SuperSplat is an advanced browser-based editor for manipulating and optimizing 3D Gaussian Splats.

DEVOURED

Generative Analytics Platform (Website)

Design ai data analytics saas Fusedash

Fusedash offers a no-code generative analytics platform that automatically builds interactive KPI dashboards and reports from raw data using AI, supporting models like Claude and GPT.

What: Fusedash provides an AI dashboard software that generates KPI dashboards, charts, and real-time reports from uploaded CSVs, REST APIs, or MCP-compatible AI models like Claude and GPT. It focuses on making analytics accessible without manual configuration or data warehouses.

Why it matters: This platform signals a move towards highly automated, AI-driven business intelligence, enabling non-technical users to generate complex analytics quickly and reducing reliance on specialized BI analysts for initial dashboard creation.

Takeaway: Evaluate Fusedash if your team needs to rapidly generate interactive dashboards and reports from various data sources without extensive coding or BI tool expertise.

Deep dive

Fusedash is a no-code generative analytics platform that builds interactive KPI dashboards, AI charts, and real-time reports.
It connects to data via CSV uploads, REST APIs, or any Model Context Protocol (MCP)-compatible AI model (e.g., Claude, GPT).
The platform aims to reduce the need for manual dashboard configuration, allowing teams to focus on decision-making rather than building.
Key features include KPI dashboards, data storytelling reports, AI chart generation, location intelligence maps, chat with data, and live monitoring.
It offers solutions tailored for e-commerce, SaaS, agencies, and roles like business leaders, analysts, and marketing teams.
The system is built on the Model Context Protocol (MCP), allowing users to choose their preferred AI model for tasks like generating visuals, summaries, and chat responses.
Fusedash differentiates itself by combining interactive dashboards, an AI chart generator, real-time data visualization, and data storytelling in one workspace.
Pricing uses token packs for AI-powered actions, ensuring core dashboards remain functional even when AI usage is throttled.
The company emphasizes that its "generative analytics" approach means the interface does the work, not the team, streamlining report creation.

Decoder

KPI (Key Performance Indicator): A measurable value that demonstrates how effectively a company is achieving key business objectives.
Model Context Protocol (MCP): An open protocol enabling analytics platforms to connect with and utilize various AI models (like Claude or GPT) for tasks such as data summarization and dashboard generation, without vendor lock-in.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Why Motion Design is Defining Modern Digital Communication

Design web marketing frontend ZillionDesigns

Motion design has become essential in modern digital communication, surpassing static content in effectiveness due to evolving audience expectations set by platforms like TikTok and Instagram Reels.

What: According to ZD Team on May 14, 2026, motion design is now a core marketing tool, driving engagement on digital platforms where video and animation are prioritized by algorithms. While static design retains its role for clarity, motion graphics excel at capturing attention, enhancing information processing, and increasing emotional impact.

Why it matters: The article highlights a fundamental shift in user experience expectations, where dynamic, interactive content is now the norm, pushing developers and designers to integrate motion not just as an aesthetic enhancement but as a critical functional element for engagement and communication.

Takeaway: When designing new web features or marketing content, consider how motion design can be integrated from the outset to improve user engagement and information retention, rather than treating it as an afterthought.

Deep dive

Motion design, or motion graphics, transforms visuals through animation, movement, timing, and sound, contrasting with static design's fixed visuals.
Platforms like TikTok, Instagram Reels, and YouTube Shorts have driven a shift towards motion-first content, influencing audience behavior and expectations.
Human attention is naturally drawn to movement, making motion design highly effective at interrupting fast-scrolling feeds and capturing user focus.
Motion design improves information processing by structuring content sequentially over time, simplifying complex ideas, and reducing cognitive load through digestible parts.
It enhances emotional connection and memory retention by creating rhythm, anticipation, and narrative flow, leading to stronger brand association.
Research in visual cognition and marketing behavior supports that motion increases attention capture, improves information retention, and drives engagement.
Social media algorithms explicitly favor video and animated content due to higher engagement and longer watch times, forcing brands to design for motion-first visibility.
Overuse of motion can lead to distraction, animation fatigue, accessibility concerns (motion sensitivity), and performance issues (slow load times).
The most effective digital experiences strategically combine both motion and static design, using motion for engagement and storytelling, and static for simplicity and clarity.
The U.S. Bureau of Labor Statistics projects employment for motion design professionals to grow faster than average through 2034, indicating its increasing importance.

Decoder

Motion Graphics: Animation combined with graphic design to create the illusion of motion or rotation, often with audio, for use in multimedia projects.
Kinetic Typography: An animation technique mixing motion and text, often used to convey emotion or enhance a message, commonly seen in commercials, title sequences, and explainer videos.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Which of the Following is Not True About Graphic Design? Common Myths Debunked

Design graphic-design Designbeep

Many widely held beliefs about graphic design are false, including that it's purely subjective, only for print, or that software skill makes a designer.

What: The article debunks common myths about graphic design, emphasizing it's visual communication with established principles like hierarchy and contrast, not just aesthetics or software use. It highlights that graphic design is a broad, ancient discipline, distinct from art, and that bad design has significant consequences, referencing the 2000 US election butterfly ballot.

Why it matters: This piece clarifies the fundamental nature of graphic design as a problem-solving communication discipline, countering common misunderstandings that can lead to ineffective or even harmful designs. It stresses that design choices are purposeful, grounded in human perception, and have real-world impact beyond mere aesthetics.

Takeaway: Understand graphic design principles like hierarchy and contrast, and apply them with purposeful communication in mind, rather than solely focusing on aesthetics or client preference.

Deep dive

Graphic design is defined as visual communication combining images, words, and ideas to convey information to an audience.
It is not purely subjective; it has principles like Hierarchy, Contrast, Alignment, Proximity, White space, and Repetition, grounded in human cognition.
Adding more design elements does not make a design more effective; often, minimalism is key to clarity.
Graphic design is not limited to print; it encompasses UI, motion graphics, environmental design, social media, and data visualization.
Proficiency in software like Photoshop does not equate to being a graphic designer; design thinking and understanding principles are crucial.
Good design's primary goal is effective communication with the intended audience, not merely client satisfaction.
The discipline is not new; while the term was coined in 1922 by William Addison Dwiggins, the practice is ancient, seen in cave paintings and early printing.
Bad graphic design has real consequences, costing money, confusing audiences, and potentially causing harm, as exemplified by the 2000 US election butterfly ballot.
Graphic design is distinct from art; art is self-expression, while design is problem-solving oriented towards a specific communication goal.

Decoder

William Addison Dwiggins: American typographer, calligrapher, and book designer who coined the term 'graphic design' in 1922.
Bauhaus: Influential German art school (1919-1933) known for its functionalist approach to design, merging craftsmanship with fine arts.
Swiss International Style: A graphic design movement developed in Switzerland in the 1950s, characterized by clean, readable, sans-serif typography, grids, and asymmetric layouts, emphasizing clarity and objectivity.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

WavFlow Generates Audio Directly in Waveform Space (GitHub Repo)

AI audioresearchopensource GitHub

Meta AI's WavFlow is a new flow-matching framework that generates high-fidelity audio directly from video and text inputs in raw waveform space, bypassing latent audio compression.

What: WavFlow is a flow-matching framework by Meta AI that generates synchronized audio from video and text inputs directly in raw waveform space. It uses waveform patchifying and amplitude lifting to enable stable flow matching on raw audio, achieving performance on par with latent-based methods on VGGSound (VT2A) and AudioCaps (T2A) benchmarks.

Why it matters: This research from Meta AI challenges the traditional reliance on latent audio compression in audio generation. It suggests that direct waveform generation can match latent-based methods in fidelity while simplifying the architecture.

Takeaway: Developers can train their own WavFlow models using the provided scripts, though production-trained checkpoints are not publicly released by Meta AI.

Decoder

Flow-matching framework: A generative modeling technique that learns a continuous-time transformation between a simple noise distribution and a complex data distribution, allowing for efficient and stable sampling.
Waveform space: The raw, uncompressed representation of an audio signal, typically consisting of a sequence of amplitude values over time.
Latent audio compression: The process of encoding raw audio into a lower-dimensional, abstract representation (latent space) which is then used by generative models to produce new audio, common in many existing audio AI systems.
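
To make the flow-matching entry concrete, here is a minimal sketch of one common formulation: a straight-line path between a noise sample and a data sample. It is a generic illustration, not WavFlow's code. The function names, the convention that t=0 is noise and t=1 is data, the x-prediction-to-velocity conversion, and the toy values are all assumptions.

```python
def interpolate(noise, data, t):
    """Point on the straight-line path: x_t = (1 - t) * noise + t * data."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Velocity of the straight-line path, constant in t: data - noise."""
    return [d - n for n, d in zip(noise, data)]


def velocity_from_x_prediction(x_t, x_pred, t):
    """Turn a predicted clean sample into a velocity: (x_pred - x_t) / (1 - t)."""
    return [(p - x) / (1.0 - t) for x, p in zip(x_t, x_pred)]


def euler_sample(noise, predict_x, num_steps):
    """Integrate from t=0 (noise) towards t=1 (data) with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # never reaches 1.0, so (1 - t) stays positive
        v = velocity_from_x_prediction(x, predict_x(x, t), t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check with a "perfect" predictor that always returns the true data.
noise = [0.5, -1.0, 0.25]
data = [0.1, 0.2, -0.3]
result = euler_sample(noise, lambda x, t: data, num_steps=8)
print(result)
```

In a real system the lambda would be a trained network conditioned on video and text. The perfect predictor here only shows that the sampler lands on the data point when the predictions are exact.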

Original article

WavFlow: Audio Generation in Waveform Space

Feiyan Zhou1,2 · Luyuan Wang1 · Shoufa Chen1,* · Zhe Wang1 · Zhiheng Liu1 · Yuren Cong1 · Xiaohui Zhang1 · Fanny Yang1 · Belinda Zeng1

1 Meta AI · 2 Northeastern University

🌐 Project Page · 📄 arXiv · 🛠 Training Guide

Overview

WavFlow introduces a paradigm for generating synchronized, high-fidelity audio from video and text inputs directly in the raw waveform space, bypassing latent compression entirely. Through waveform patchifying and amplitude lifting, WavFlow enables stable flow matching on raw audio via direct x-prediction. Evaluation on the VGGSound (VT2A) and AudioCaps (T2A) benchmarks shows that WavFlow delivers performance on par with established latent-based methods, proving that end-to-end waveform generation can match traditional frameworks in acoustic richness, fidelity, and synchronization.
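
As a rough illustration of what "waveform patchifying" could look like, the sketch below splits a mono waveform into fixed-length, non-overlapping patches so that a sequence model sees far fewer positions than raw samples. The patch size, the zero-padding, the clip length, and the function names are assumptions for illustration only. The README does not specify WavFlow's actual patch layout, and amplitude lifting is not shown.

```python
def patchify(waveform, patch_size):
    """Split a 1-D list of samples into non-overlapping patches, zero-padding the tail."""
    padded = list(waveform)
    remainder = len(padded) % patch_size
    if remainder:
        padded.extend([0.0] * (patch_size - remainder))
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


def unpatchify(patches, length):
    """Flatten patches back into a waveform and drop any padding."""
    flat = [sample for patch in patches for sample in patch]
    return flat[:length]


sample_rate = 16_000   # in line with the "16k" model names; illustrative
duration_sec = 8       # hypothetical clip length
patch_size = 256       # hypothetical patch length in samples

waveform = [0.0] * (sample_rate * duration_sec)
patches = patchify(waveform, patch_size)
print(len(waveform), "samples ->", len(patches), "patches of", patch_size)
```

Under these assumed numbers, 128,000 samples become 500 patches, which is the kind of sequence-length reduction that makes modeling raw audio tractable.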

Demo

🌳 Forest (natural): forest.mp4
🐸 Frog (animal): frog.mp4
🥁 Drum (music): drum.mp4
🛹 Skateboard (sport): skateboard.mp4

See the Project Page for 24+ samples and side-by-side benchmark comparisons.

Installation

git clone https://github.com/facebookresearch/WavFlow.git
cd WavFlow
bash scripts/setup.sh        # creates conda env 'wavflow' and installs everything
conda activate wavflow

Manual setup

conda create -n wavflow python=3.10 -y
conda activate wavflow
pip install -r requirements.txt
pip install -e . --no-deps
conda install -n wavflow -c conda-forge "ffmpeg<7" -y    # for torio video decoding

All required external weights (CLIP, Synchformer, the empty-string CFG embedding) are downloaded or computed automatically on first run and cached under ~/.cache/wavflow/.

Inference

⚠️ Due to organizational policy constraints, we are currently unable to release the production-trained checkpoints. We are working on a foundation checkpoint trained on fully open-source data; in the meantime you can train your own — see the training guide.

Once you have a trained checkpoint, run:

bash scripts/launch/predict.sh [--gpu N] [--config PATH]

The default config is wavflow/configs/infer.yaml. The input CSV (data.csv_path) accepts video, text, or both:

video_path,caption,video_exist,text_exist
/abs/path/sample1.mp4,a whistling rocket explodes,1,1
/abs/path/sample2.mp4,birds chirping in a forest,1,1
,a whistling rocket explodes,0,1
/abs/path/sample3.mp4,,1,0

The first two rows supply both video and text, the third is text-only, and the fourth is video-only.

Configuration reference

Launcher options

Flag / env	Default	Description
`--gpu N` (or `GPU=N`)	`0`	CUDA device index
`--config PATH` (or `CONFIG_PATH=...`)	`wavflow/configs/infer.yaml`	YAML config to load
`WAVFLOW_ENV`	`wavflow`	conda env name to auto-activate

Any extra positional argument is forwarded to python -m wavflow.infer.

Key fields in `infer.yaml`

Field	What to set
`data.csv_path`	the input CSV (above)
`model.name`	one of `medium_16k`, `medium_44k`, `large_16k`, `large_44k` (must match the trained ckpt)
`model.ckpt_path`	a `checkpoint_.pth` (full ckpt) or `ema_epoch_.pth` (EMA-only)
`model.use_ema`	`true` to load `model_ema1` from a full ckpt; `false` to use the live `model` weights
`inference.duration_sec` / `target_sample_rate`	output length and SR (must match model arch)
`inference.cfg`, `num_steps`, `noise_scale`, `noise_shift`, `prediction_type`, `seed`	sampling hyperparameters
`inference.batch_size`	rows per ODE batch
`inference.trim_to_duration`	trim output to `duration_sec`
`output.output_dir`	where wavs are written
`output.loudness_norm`, `loudness_target_lufs`	optional `pyloudnorm` post-processing

CSV semantics

video_exist=0 → uses learned empty CLIP/Sync tokens (no video decode)
text_exist=0 → uses learned empty CLIP-text token (caption ignored)
Optional id column; otherwise the wav file name is derived from Path(video_path).stem, falling back to row_<idx> for text-only rows
Captions with commas must be quoted
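
Because captions containing commas must be quoted, generating the input CSV with a CSV library is safer than assembling it by hand. Here is a minimal sketch using Python's standard csv module. The column names come from the example earlier in this section, while the file paths, captions, and the make_row helper are placeholders.

```python
import csv
import io

FIELDS = ["video_path", "caption", "video_exist", "text_exist"]


def make_row(video_path="", caption=""):
    """Build one row, deriving the exist flags from which inputs are present."""
    return {
        "video_path": video_path,
        "caption": caption,
        "video_exist": 1 if video_path else 0,
        "text_exist": 1 if caption else 0,
    }


rows = [
    make_row("/abs/path/sample1.mp4", "a dog barks, then a door slams"),  # video + text
    make_row(caption="birds chirping in a forest"),                       # text-only
    make_row("/abs/path/sample3.mp4"),                                    # video-only
]

# csv quotes only the fields that need it (here, the caption with a comma).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing csv_text to the file named in data.csv_path yields an input with the same columns as the hand-written example.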

EMA caveat

The EMA tensor stored as model_ema1 is updated with ema_decay = 0.9999 per step. After only a few hundred or a few thousand steps it still consists largely of random-init values and produces noise during inference. Set model.use_ema: false (or pass an ema_epoch_*.pth saved after enough steps) when sampling from a short or overfit run.
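
The arithmetic behind this caveat is easy to check. Assuming the usual EMA update, ema = decay * ema + (1 - decay) * weights (the README does not spell out the exact form), the share of the initial values still present after n steps is decay ** n:

```python
EMA_DECAY = 0.9999  # value quoted in the caveat above


def init_fraction(steps, decay=EMA_DECAY):
    """Fraction of the EMA tensor still attributable to its initial values."""
    return decay ** steps


for steps in (100, 1_000, 10_000, 50_000):
    share = init_fraction(steps)
    print(f"{steps:>6} steps: {share:.1%} of the initial values remain")
```

Under that assumption, roughly 90% of the initialization is still present after 1,000 steps and about 37% after 10,000. This is why a short run sampled through the EMA weights sounds like noise.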

Training

For feature extraction and training (single-node and multi-node), see TRAINING.md.

Citation

@misc{zhou2026wavflowaudiogenerationwaveform,
      title={WavFlow: Audio Generation in Waveform Space}, 
      author={Feiyan Zhou and Luyuan Wang and Shoufa Chen and Zhe Wang and Zhiheng Liu and Yuren Cong and Xiaohui Zhang and Fanny Yang and Belinda Zeng},
      year={2026},
      eprint={2605.18749},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2605.18749}, 
}

Acknowledgements

WavFlow builds on the open-source community. We gratefully acknowledge:

MMAudio — multimodal audio generation
JiT — Just Image Transformer
Synchformer — audio-visual synchronization

License

The majority of WavFlow is licensed under CC-BY-NC 4.0. Portions of the project are vendored from third-party open source projects under their original license terms (MIT, Apache 2.0, CC BY-NC 4.0, and Stability AI Community License). See NOTICE.txt for the full per-component breakdown and license texts.

DEVOURED

Mind-Blowing Growth Is About to Propel Anthropic Into Its First Profitable Quarter

AI startupfintechmarkets The Wall Street Journal

Anthropic's revenue is projected to more than double to $10.9 billion in the second quarter, which would give the company its first profitable quarter and outpace the pre-IPO growth of Google and Facebook.

What: Anthropic's projections, disclosed to investors as part of an ongoing funding round, show Q2 revenue more than doubling to $10.9 billion and mark its first profitable quarter. Its quarterly revenue is growing faster than Google's and Facebook's did in the run-up to their IPOs. However, the company plans to increase spending on compute, so it might not remain profitable for the full year.

Why it matters: This demonstrates the intense growth trajectory and significant revenue potential within the AI sector, even as costs for compute remain a major factor. It also highlights the strategic decision of AI labs to reinvest heavily into infrastructure, prioritizing future capabilities and market position over immediate, sustained profitability.

Original article

Anthropic's revenue is set to more than double in the second quarter to $10.9 billion. The projections, disclosed to Anthropic's investors as part of an ongoing funding round, show how the company's sales have exploded since the start of the year. Its quarterly revenue is now growing faster than Google's and Facebook's in the run-up to their initial public offerings. The company might not remain profitable for the full year as it plans to increase spending due to its vast need for compute.

DEVOURED

The Unsustainable Subsidy

AI startupbusinessllm Tom Tunguz

AI model prices are generally rising: Google's prices are tripling annually and OpenAI's rose after a period of apparent subsidy, as vendors prioritize margins over market share amid tight cash and record capex.

What: Tom Tunguz argues that subsidized AI model pricing is giving way to increases, with Google's AI prices tripling yearly and OpenAI's GPT-5.5 priced at $5.00/1M input and $30.00/1M output after an initial subsidy. Anthropic's Claude Opus 4.7 sits at $5.00/1M input and $25.00/1M output, while Google Gemini 3.1 Pro remains the cheapest at $2.00/1M input and $12.00/1M output.

Why it matters: The shift from subsidized pricing to higher costs indicates a market maturing where AI providers, facing immense capital expenditures, are prioritizing profitability and sustainable business models over aggressive market share acquisition.

Takeaway: When planning or scaling AI applications, factor in potentially increasing LLM API costs, especially if your current pricing seems heavily subsidized or below market averages.

Original article

Google’s AI triples in price each year.

OpenAI’s flagship model was seemingly subsidized for a while, before rising again.

Anthropic’s AI has been the same price for a little bit & decreased for the most powerful models.

Those are three very different pricing strategies. If we compare the absolutes, the data completes the picture.

Vendor	Model	Input ($/1M)	Output ($/1M)
Google	Gemini 3.1 Pro	$2.00	$12.00
Anthropic	Claude Opus 4.7	$5.00	$25.00
OpenAI	GPT-5.5	$5.00	$30.00

Google remains the low-cost player, increasing the price on all its models but still less than half of the competition. Anthropic had maintained a luxe pricing until late last year.

The pricing changes indicate changes in strategy: cuts when cash is plentiful & share matters. Increases when cash is tight & margins matter. The latter is the case for all three vendors now when capex spending continues to set records.
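
To see what the table's list prices mean for a budget, the sketch below prices one hypothetical monthly workload against all three models. The per-token prices are copied from the table. The token volumes and the monthly_cost helper are illustrative assumptions, and real bills may differ with caching, batching, or volume discounts.

```python
# Prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Gemini 3.1 Pro": {"input": 2.00, "output": 12.00},
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def monthly_cost(model, input_tokens, output_tokens):
    """Cost in USD for a given token volume at list price."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

costs = {model: monthly_cost(model, INPUT_TOKENS, OUTPUT_TOKENS) for model in PRICES}
for model, cost in costs.items():
    print(f"{model:<16} ${cost:>9,.2f}")
```

For this mix the totals are $2,200, $5,000, and $5,500. Gemini 3.1 Pro therefore costs 44% and 40% of the other two, consistent with the "less than half" observation.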

DEVOURED

The Secrets Revealed in SpaceX's IPO Filing

Tech startupfinancehardwarespace Wall Street Journal

SpaceX aims for a record-breaking June IPO on Nasdaq as SPCX, despite significant losses last year ($4.9 billion) and in Q1 ($4.3 billion) on substantial revenue.

What: Elon Musk's SpaceX is preparing for an IPO in June on Nasdaq under ticker SPCX, potentially making him the world's first trillionaire, despite reporting a $4.9 billion loss on $18.7 billion revenue last year and a $4.3 billion loss on $4.7 billion revenue in Q1.

Why it matters: This IPO reveals the immense capital requirements and long-term investment cycles inherent in advanced hardware and space exploration ventures, where profitability may lag well behind established operations and high revenue.

Original article

SpaceX's IPO is expected to set a record for the biggest stock debut ever and could make its founder, Elon Musk, the world's first trillionaire. The company is aiming to go public in June on Nasdaq under the ticker symbol SPCX. The company lost $4.9 billion last year on revenue of $18.7 billion. This year, in the first quarter, SpaceX lost $4.3 billion on $4.7 billion of revenue. The financials show an established business launching satellites and astronauts to space, though it is still unprofitable.

DEVOURED

Artificial Womb for Growing Mammals Is at 'One-Yard Line', Says Colossal CEO

Tech researchsciencebiotechhardware Decrypt

Colossal, the de-extinction startup, claims its artificial womb for mammals is nearing completion, already achieving a 100% development rate for its early stages.

What: De-extinction startup Colossal's CEO states they are "at the one-yard line" for an artificial womb capable of supporting mammal development, with the current platform achieving a 100% development rate, though it still needs to replicate complex chemical signals. This is separate from their late 2028 woolly mammoth birth plans.

Why it matters: Colossal's commitment to open-sourcing its de-extinction technologies, including the artificial womb, suggests a vision beyond commercial gain, aiming to accelerate broader scientific research and conservation efforts, potentially democratizing access to advanced biotechnologies.

Original article

DEVOURED

Zuckerberg warns ‘success isn’t a given’ after laying off 10% of Meta

Tech startupcareerai NBC News

Mark Zuckerberg confirmed that Meta laid off 10% of its workforce, approximately 8,000 employees, while shifting 7,000 employees into AI roles and warning that "success isn't a given."

What: Meta CEO Mark Zuckerberg confirmed the company laid off about 8,000 employees (10% of its workforce) and will not fill roughly 6,000 open positions, while redirecting 7,000 employees into AI roles. He told workers there shouldn't be additional cuts in 2026 and called AI the "most consequential technology of our lifetimes."

Why it matters: This signals a major strategic pivot for Meta towards AI, underlining the intense industry competition and the necessity for large tech companies to rapidly reallocate resources and talent to stay competitive in the evolving AI landscape, even at the cost of significant workforce restructuring.

Takeaway: If you are a Meta employee, understand that the company's strategic focus is heavily shifting to AI, meaning career paths and opportunities will increasingly align with AI-related initiatives.

Original article

As Meta cut 10% of its workforce on Wednesday — a move that had been anticipated for nearly a month — CEO Mark Zuckerberg addressed the tech giant’s transition in a companywide memo.

Zuckerberg thanked the impacted employees, stressed the importance of artificial intelligence, assured workers there shouldn’t be additional cuts in 2026 and laid out some of his vision for the future. Along with the cuts, Meta is redirecting 7,000 employees into AI roles, NBC News reported Tuesday.

“But success isn’t a given,” Zuckerberg warned. “AI is the most consequential technology of our lifetimes. The companies that lead the way will define the next generation,” Zuckerberg said in the note. The memo was posted on X by a New York Times reporter and a source familiar with the matter confirmed its authenticity to NBC News.

A leak last month about upcoming layoffs prompted Meta executives to share, shortly before reporting first-quarter 2026 earnings, that the tech giant was gearing up to lay off around 10% of the company’s employees.

The reorganization, which Meta first detailed in an internal memo in April, includes cutting about 8,000 employees and not filling approximately 6,000 open positions. Meta previously confirmed the April announcement’s authenticity to NBC News.

The employees affected by this restructuring were informed on Wednesday.

In April, Meta explained some of the calculus behind its shifting financials, noting that it was increasing 2026 capital expenditures to between $125 billion and $145 billion. That was because of “expectations for higher component pricing this year and, to a lesser extent, additional data center costs to support future year capacity,” according to Meta’s first-quarter 2026 report.

“This is the most dynamic I have seen our industry. I’m optimistic about everything we’re building to give billions of people the power to express themselves and connect with the people they care about,” Zuckerberg said in his Wednesday note. “We’re transforming our company to make sure it will always be the best place for talented people to have the greatest impact.”

Zuckerberg said he’s “grateful to those leaving today” and added that the company hasn’t been as transparent as he would have liked. He said that’s something he hopes Meta works on.

Employees in the United States who were laid off will receive severance including four months’ pay, with additional weeks for each year they were employed by Meta, according to the April memo sent by the head of people, Janelle Gale. Additional support, including for immigration and healthcare, is expected to be made available, as well.

DEVOURED

Returning to life!

Data aipolicycareer Tidydesign

AI offers genuine empowerment in areas like programming and learning but carries significant harms, including environmental costs, copyright issues, and wealth concentration, requiring data science leaders to engage critically.

What: The article argues that AI presents a dual nature: genuinely empowering capabilities in programming, translation, voice input, and broad learning, making them more accessible. Conversely, it causes harm through high environmental costs, copyright infringements, increased wealth concentration, encouragement of shallow thinking, and exacerbating unequal access to technology.

Why it matters: This piece reflects a growing tension within the tech community regarding the societal impact of AI, moving beyond pure technological advancement to confront its ethical and economic implications, particularly for data science professionals who must navigate these complexities.

Original article

AI is both genuinely empowering and genuinely harmful for data science. It makes programming, translation, voice input, and broad learning more accessible. However, it does harm through environmental cost, copyright issues, wealth concentration, shallow thinking, and unequal access. The tension cannot be neatly resolved, but data science leaders still need to engage with AI seriously so they can help people use it well.

DEVOURED

OpenData (Tool)

Data opensourceapi tryopendata.ai

OpenData is an open-core platform making public datasets accessible via a single API for search, query, and visualization.

What: OpenData provides an open-core platform designed to simplify access and interaction with public datasets, offering capabilities like search, join, query, visualization, and sharing through a unified API.

Why it matters: This reflects a growing trend towards abstracting away data complexity, making public data more usable for developers and data scientists without requiring deep domain expertise in diverse data formats.

Takeaway: Explore OpenData's API if you need to integrate or analyze public datasets without building custom scraping or parsing logic.

Decoder

Open-core: A business model where a software product's core features are open-source, while additional enterprise-grade features or services are offered commercially.

Original article

OpenData is an open-core platform that makes public datasets easy to search, join, query, visualize, and share through one clean API.

DEVOURED

SiteRows (Tool)

Data webapitools Siterows

SiteRows enables users to query websites directly with SQL for structured data extraction, eliminating the need for custom web scraping code.

What: SiteRows is a tool that allows users to interact with websites using SQL queries to extract structured data. It aims to simplify data retrieval from the web by providing a SQL interface, bypassing the need to write traditional web scraping scripts.

Why it matters: This tool signifies a broader movement towards democratizing data access and reducing the technical barrier for non-specialists to extract information from the web, leveraging familiar interfaces like SQL.

Takeaway: If you need to extract structured data from websites regularly but want to avoid writing and maintaining scraping code, evaluate SiteRows for its SQL-based approach.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

Apple might replace aluminum with titanium in future iPhones again, per leak

Design hardwaremobileapple 9to5Mac

Apple is reportedly re-evaluating titanium for future iPhones, like the iPhone Ultra, after thermal and weight issues led them to revert to aluminum for the iPhone 17 Pro.

What: Apple is researching a new, improved titanium alloy for future iPhones, including potential "iPhone Ultra" and Pro models. This comes after the iPhone 17 Pro allegedly switched back to aluminum due to titanium's challenges with heat conductivity and weight, according to leaker Instant Digital.

Why it matters: This reflects Apple's continuous pursuit of premium materials and lightweight design, indicating their commitment to overcoming technical hurdles for desired aesthetic and durability features, even if it means iterative changes in material choices.

Original article

Apple is reportedly researching a new and improved titanium alloy for future iPhones after switching the iPhone 17 Pro back to aluminum, largely because titanium caused thermal and weight challenges. According to leaker Instant Digital, Apple still sees titanium as a premium material and is exploring ways to improve its heat conductivity while keeping its durability and lighter feel, with possible future use in models like the iPhone Ultra and future Pro devices.

DEVOURED

Google's new app icons were desperately needed

Design frontendmobile Creative Bloq

Google is rolling out redesigned Workspace app icons that offer clearer distinctions and improved legibility after widespread criticism of their previous "unified" and indistinguishable designs.

What: Google is deploying new Workspace app icons that maintain a consistent gradient style but feature more unique shapes and colors, aiming for better clarity and accessibility. This update follows years of user complaints that the old "unified" icons were too similar, often prioritizing brand aesthetic over legibility.

Why it matters: This redesign demonstrates a company responding to user feedback regarding usability and accessibility, highlighting the critical balance between consistent brand identity and functional clarity in interface design, especially for widely-used products.

Original article

Full article content is not available for inline reading.

Read the original article →

DEVOURED

From faster pencil to AI Experience Architect: a designer's path

Design aicareer uxdesign.cc

AI is shifting the design profession from merely creating screens to a more strategic role of "AI Experience Architect," focusing on designing workflows and systems around AI.

What: The article suggests AI is transforming design, moving it beyond traditional screen and asset creation. Designers now have an opportunity to become "AI Experience Architects," shaping how AI integrates into products, teams, and business decisions, making the role more strategic and systems-focused rather than purely production-focused.

Why it matters: This signals an evolution in the design field, where practitioners need to expand their skills beyond UI/UX to include system-level thinking, AI integration, and strategic product development to remain relevant and impactful in an AI-driven future.

Takeaway: Designers should focus on understanding AI's capabilities and limitations to transition into roles that define AI's strategic application within products and organizations.

Original article

AI is pushing designers beyond simply creating screens and assets toward designing workflows, systems, and organizational processes around AI itself. The biggest opportunity isn't just working faster with AI tools, but becoming someone who shapes how AI fits into products, teams, and business decisions — turning design into a more strategic, systems-focused role rather than a purely production-focused one.

DEVOURED

A Visual Unicode Explorer (Website)

Design webtypography elastiq.ch

Charcuterie offers a visual web-based explorer for Unicode, allowing users to browse characters, discover related glyphs, and explore scripts and symbols.

What: Charcuterie is a new online tool from elastiq.ch that serves as a visual Unicode explorer, making it easier for designers and developers to navigate the vast array of Unicode characters, glyphs, and scripts.

Why it matters: This tool simplifies the process of working with international text and symbols, enhancing accessibility for developers and designers who need to integrate diverse character sets into their applications or websites.

Takeaway: Bookmark Charcuterie if you frequently work with character encoding, special symbols, or internationalization in your web development or design projects.

Decoder

Unicode: An international standard for encoding, representing, and handling text expressed in most of the world's writing systems.
Glyph: The actual shape or representation of a character. A single character (like 'a') can have multiple glyphs (e.g., in different fonts or styles).
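The distinction between a character and its encoding can be made concrete with Python's standard `unicodedata` module. This is a minimal sketch for illustration only, not how Charcuterie itself works; the `describe` helper and the sample characters are assumptions chosen for the example.

```python
import unicodedata


def describe(char: str) -> dict:
    """Return basic Unicode metadata for a single character."""
    return {
        "char": char,
        # Code points are conventionally written as U+ followed by hex digits.
        "code_point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char, "<unnamed>"),
        # Two-letter general category, e.g. "Ll" = lowercase letter.
        "category": unicodedata.category(char),
        # Number of bytes the character occupies when encoded as UTF-8.
        "utf8_bytes": len(char.encode("utf-8")),
    }


# Sample characters (hypothetical picks): a, e-acute, capital omega, right arrow.
for ch in ["a", "\u00e9", "\u03a9", "\u2192"]:
    print(describe(ch))

# Canonical decomposition (NFD) reveals related characters:
# e-acute splits into a base letter plus a combining accent.
parts = unicodedata.normalize("NFD", "\u00e9")
print([f"U+{ord(c):04X}" for c in parts])
```

The glyph itself never appears in this data: how each code point is drawn depends on the font that renders it.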

Original article

Charcuterie is a visual explorer for Unicode. Browse characters, discover related glyphs, and explore scripts, symbols, and shapes across the standard.

DEVOURED

The Click's clever university branding puts the 'I' in identity, individual, and Imperial College London

Design branding education It's Nice That

Imperial College London’s new branding, designed by The Click, gives each department a unique "I" logo to foster individual identity and community belonging.

What: UK branding studio The Click created a new visual identity for Imperial College London, introduced on May 20, 2026, which features 24 bespoke "I" logos for its various departments, such as a sleek "I" for aeronautics and a polymer-chain "I" for chemistry. The rebrand focuses on individuality and belonging rather than a louder university-wide logo.

Why it matters: This approach to branding reflects a growing understanding that strong organizational identity can be built through celebrating internal diversity and fostering individual departmental pride, rather than enforcing a singular, monolithic image.

DEVOURED

Arts and Culture Slows Down Ageing. So Why Aren't We Doing More of It?

Design research career Creative Bloq

A new University College London study found that engaging in arts and culture weekly slows biological aging twice as much as weekly exercise.

What: Research published in Innovation in Aging shows that people who participate in arts and cultural activities weekly are, on average, biologically a year younger than those who don't, whereas weekly exercise is associated with a difference of only six months. Author Tom May argues that people know the benefits yet often succumb to digital inertia.

Why it matters: This study provides a scientific basis for the long-suspected benefits of creative and cultural engagement, challenging the common modern habit of passive digital consumption over active, enriching experiences. It highlights the friction between immediate digital gratification and the long-term, tangible benefits of engaging with physical culture.

Takeaway: Schedule arts and cultural activities into your weekly routine, treating them like essential exercise or appointments, to actively combat digital inertia and improve well-being.

Deep dive

A University College London study, published in Innovation in Aging, found that regular engagement in arts and cultural activities slows biological aging.
Participants who engaged in activities like painting, singing, or visiting galleries at least once a week were, on average, biologically a year younger.
This effect is double that of weekly exercise, which only resulted in participants being six months younger biologically.
The anti-aging effect of arts and culture is comparable to the biological difference between smokers and those who have quit.
The article points out that many creative professionals, despite recognizing the value, struggle to prioritize these activities due to inertia and the addictive nature of digital scrolling.
Author Tom May suggests treating arts and culture as essential practices, similar to exercise, by scheduling them explicitly.

DEVOURED

Wikipedia's most underrated logo finally gets the love it deserves

Design branding web Creative Bloq

Wikimedia released limited-edition 'Wikipede' merch, capitalizing on fan love for the quirky, unused pixel-art logo concept, after an April Fools' joke went viral.

What: For Wikipedia's 25th anniversary, the Wikimedia Foundation is releasing merchandise featuring 'Wikipede,' an early-2000s pixel-art logo concept that was never officially adopted. Fan demand for the character surged after a viral April Fools' joke suggested it would become the new Wikipedia logo.

Why it matters: This move demonstrates how online communities can drive brand engagement, turning historical design artifacts into cultural touchstones and valuable merchandise. It highlights the power of nostalgia and niche fan obsessions in branding, particularly for widely recognized platforms like Wikipedia.

Deep dive

Wikipedia is celebrating its 25th anniversary.
The Wikimedia Foundation released a 'Baby Globe' mascot for the anniversary.
They also released limited-edition merchandise featuring the 'Wikipede' logo concept.
Wikipede is a quirky, unused pixel-art logo from the early 2000s that has gained significant fan adoration.
Fan interest in Wikipede spiked after Wikimedia played an April Fools' joke, announcing it as the new official Wikipedia logo.
The new merch collection, available on the Wikipedia Store, includes tote bags and mouse pads.

DEVOURED

E-Hiking Is Here. You Can Tell by My 1,000-Watt Hips

Tech hardware consumer design WSJ

Personal exoskeletons, previously niche for military and medical use, are now becoming lightweight and affordable for general consumer use, enabling "e-hiking."

What: Personal exoskeletons, such as the Hypershell X Ultra, are moving from specialized applications in the military, heavy industry, and rehabilitation to being accessible and affordable for everyday consumers, opening up new activities such as "e-hiking."

Why it matters: This indicates a broader trend of advanced assistive technology, once confined to high-cost or specialized fields, becoming consumerized and democratized, potentially leading to new forms of leisure and mobility.

Decoder

Exoskeleton: A wearable robotic device that supports and amplifies human movement, traditionally used for heavy lifting, rehabilitation, or military applications.

Original article

Personal exoskeletons, once reserved for military, heavy industry, and mobility rehabilitation, are now light and affordable enough for regular people who want to feel superhuman.

DEVOURED

New iPhone Ultra leaks cover release timing, display breakthrough, more

Tech mobile hardware 9to5Mac

Apple's rumored foldable iPhone, possibly named "iPhone Ultra," faces indefinite delays due to hinge issues despite reportedly entering trial production.

What: Apple's first foldable iPhone is reportedly in trial production, but significant problems with its hinge mechanism are reportedly delaying production indefinitely.

Why it matters: This reflects the ongoing engineering challenges in bringing foldable smartphone technology to mass market reliability, even for companies with Apple's resources, and indicates that consumer readiness for foldables may still be several years away.

Original article

Apple's first foldable iPhone is reportedly in trial production, but issues with its hinge are causing an indefinite delay to the production process.

DEVOURED

Freckles, tattoos, and imperfect hairlines: inside LEO, the men's hair loss brand that ditches the gloss

Design branding marketing Creative Boom

LEO, a new men's hair loss brand, launched a refreshing identity by Creative Spark that ditches typical glossy perfection for honest, relatable imagery and messaging.

What: Manchester agency Creative Spark developed a disruptive brand identity for LEO, a men's medicated hair loss subscription service. The campaign, launched around May 20, 2026, uses candid photography of real men with visible imperfections and a conversational tone, and centers on the tagline "Where's your head at?"

Why it matters: This branding strategy reflects a broader trend of brands moving towards authenticity and empathy in marketing, recognizing that consumers, particularly men on sensitive topics like hair loss, respond better to genuine connection than aspirational but unrealistic portrayals. It echoes Dove's "Real Beauty" campaign by valuing relatability over idealized perfection.

Original article

LEO launched a new brand identity by Creative Spark that rejects the glossy, hyper-masculine style typical of men's hair loss advertising in favor of honest, relatable messaging and real-looking imagery. Built around the line “Where's your head at?”, the campaign focuses less on selling perfection and more on helping men feel understood, using candid photography, straightforward language, and a more emotionally open approach to hair loss and self-esteem.

Devoured - May 21, 2026

Stable Audio 3.0

Meet Stable Audio 3.0, the model family built for artistic experimentation with open-weight models

Key Takeaways:

Meet the Stable Audio 3.0 model family

Open for experimentation, with ownership of what you create

Architectural advancements for variation and iteration

Setting the stage for what’s next

Get started with Stable Audio 3.0 now

Anthropic to Pay SpaceX Nearly $45 Billion for Computing Deal

AI Solves a Longstanding Geometry Conjecture

Introducing Agent Executor, Google's distributed Agent Runtime

Jaana Dogan

Ethan Bao

Federate with Google’s agent runtime

Own your agents, models, and compute

Scale agents up on Kubernetes with an agent-first compute layer

Get started today

Gemini 3.5 Flash Developer Guide

Coding is solved? Software is not

Coding is not the whole job

Software development reduces entropy

But fast coding can add entropy too

The missing piece: a new workflow

What needs to change?

Context chosen on purpose

Specs that stay with the work

Evidence reviewers can trust

Checkpoints where judgment matters

What we are building for the new workflow

Years-long fight over users' right to tweak smart TV software heads to trial

Software Freedom Conservancy sues Vizio

FSF says there’s “no reason” for code to be withheld

Vizio fights back

Broader implications for smart TVs

One ruling already made

GitHub confirms breach of 3,800 repos via malicious VSCode extension

mondayDB 3 – Solving HTAP for a Trillion-Table System

WrenAI (GitHub Repo)

The open context layer for AI agents over business data.

What WrenAI is

Why agent builders pick WrenAI

With & Without Wren AI

Quickstart

1. Install the skill bundle

2. Ask your agent to set things up

3. (Optional) Enrich the project

4. Ask questions

Two beats: scaffold fast, enrich deep

What's Included

What's next

Documentation

Community

Contributing

Contributors

License

Protocols for transactional usage of object storage

What data agent benchmarks do and don't tell us

OpenAI Reportedly Moves Toward IPO

OpenAI barrels toward IPO that may happen in September

On Building Agents From First Principles

A Bitter Lesson for Data Filtering

Lance Unified Multimodal Model (GitHub Repo)

Lance: Unified Multimodal Modeling by Multi-Task Synergy

🌟 Highlights

🎨 Demo

Text-to-Video

Video Editing

Multi-turn Consistency Editing

Intelligent Video Generation

Video Understanding

Text-to-Image Generation

Image Editing

Image Understanding

🚀 Installation

Recommended Environment