Tech: startup, infrastructure, ai, cloud, hardware, policy

The SpaceX IPO and Data Centers in Space

SpaceX is reportedly seeking a "nuts" $2 trillion valuation in its upcoming IPO, leaning on a speculative $26.5 trillion AI opportunity that could include space-based data centers, despite $18.67 billion in revenue and a $4.9 billion loss driven by $5.1 billion in AI R&D expense at xAI.

Stratechery

Summary

What: Ben Thompson analyzes SpaceX's S-1 filing, which projects a $28.5 trillion total addressable market (TAM), including a $26.5 trillion opportunity in AI. The TAM is part of the case for a $2 trillion valuation despite $18.67 billion in revenue and $4.9 billion in losses last year (driven by $5.1 billion in AI R&D expense after the addition of xAI). He argues that while technically challenging, space-based data centers for "agentic inference" could become viable, especially given terrestrial constraints like power and zoning.

Why it matters: The article explores how Elon Musk's companies "change the rules through scale" and how his ability to "transform shared delusion into mass market reality" lets them command valuations based on future potential rather than current financials. It also highlights the growing pressure on terrestrial data center expansion, suggesting space as a future alternative.

Deep Dive

SpaceX is preparing for an IPO, seeking a $2 trillion valuation with a projected $28.5 trillion total addressable market, primarily driven by a $26.5 trillion AI opportunity.
This valuation comes despite reporting $18.67 billion in revenue and $4.9 billion in losses last year, with xAI contributing $5.1 billion in R&D expenses.
Ben Thompson critiques the "absurd" numbers but acknowledges Elon Musk's track record of achieving ambitious goals like electric cars and reusable rockets.
The core of the AI opportunity is "agentic inference" in space, which refers to AI workloads where latency is less critical, allowing for different hardware architectures.
Thompson argues that individual Starlink-like satellites could function as "racks" for data centers, interconnected by lasers, with their own solar panels and radiator arrays.
Terrestrial constraints, such as power availability and increasingly difficult zoning for data centers, could make space a necessary alternative for future compute demand.
The article suggests that while highly speculative, the concept of space-based data centers addresses a critical future need for compute capacity and aligns with Musk's pattern of long-term, high-risk bets.

Decoder

Agentic Inference: An AI workload type where AI agents perform tasks without immediate human intervention, making latency less critical and allowing for different, potentially slower and cheaper, memory and compute architectures compared to training or human-facing inference.
S-1 Filing: A registration form filed with the U.S. Securities and Exchange Commission (SEC) by companies planning to go public, providing detailed information about the company's business, finances, and risks to potential investors.
TAM (Total Addressable Market): The total revenue opportunity that is available for a product or service if 100% market share is achieved.
xAI: Elon Musk's artificial intelligence company, which develops AI models and, per the article, has been added to SpaceX (bringing X with it).

Original Article

The SpaceX IPO and Data Centers in Space

It’s hardly the biggest problem in the world — or perhaps the height of privilege to consider it a problem at all — but one of the most annoying consumer experiences is booking an Uber Black and realizing you got assigned a Tesla Model Y (Uber finally stopped allowing new Model Y’s onto Black last year). Buckle up for an uncomfortable back seat, basic plastic finishes, and, all-too-often, potential car sickness from a driver who hasn’t completely mastered the Tesla’s aggressive regenerative braking.

Still, the fact that the Model Y ever made it to the Black level is a testament to the brand Elon Musk built. Back in 2016, when 300,000 people dropped $1,000 each in a matter of hours to reserve an as-yet-unreleased Model 3, I explained that the phenomenon was because It’s a Tesla:

The real payoff of Musk’s “Master Plan” is the fact that Tesla means something: yes, it stands for sustainability and caring for the environment, but more important is that Tesla also means amazing performance and Silicon Valley cool. To be sure, Tesla’s focus on the high end has helped them move down the cost curve, but it was Musk’s insistence on making “An electric car without compromises” that ultimately led to 276,000 people reserving a Model 3, many without even seeing the car: after all, it’s a Tesla.

This is the same brand halo that landed what is, if we’re honest, a pretty basic car on the Uber Black list. What actually makes these cars compelling is the extent to which they are computers on wheels: I know plenty of very rich people who drive a Tesla not for the finishes but rather the Full Self-Driving (Supervised); there is nothing like it on the market, at least when it comes to cars you can own.

Tesla appears to be doubling down on this point of differentiation: the company stopped production of the Models S and X earlier this year, focusing production resources on the CyberCab and robots; if you want your car to drive itself, you’ll get the same model as everyone else. It reminds me of Andy Warhol’s famous quote:

What’s great about this country is that America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke, Liz Taylor drinks Coke, and just think, you can drink Coke, too. A Coke is a Coke and no amount of money can get you a better Coke than the one the bum on the corner is drinking. All the Cokes are the same and all the Cokes are good. Liz Taylor knows it, the President knows it, the bum knows it, and you know it.

That “tradition” is scale, and America is indeed better at it than any other country in the world; and, amongst Americans, no one pursues and seeks to leverage scale quite like Musk.

Starlink and Airlines

From a press release from American Airlines:

American Airlines today announced a sweeping modernization of its narrowbody inflight customer experience with the installation of Starlink, the fastest Wi-Fi in the sky, on more than 500 narrowbody aircraft beginning in Q1 2027. Starlink is widely regarded as the world’s most advanced satellite constellation using a low Earth orbit to deliver broadband Internet capable of supporting inflight streaming, online gaming, collaborative meeting tools and more. With thousands of satellites in low Earth orbit, Starlink can deliver multigigabit connectivity to aircraft using its Aero Terminal, which can support up to 1 Gbps per antenna.

“As a premium global airline, we are continuously seeking out world-class partners like Starlink to deliver what our customers need and want,” said American Airlines Chief Customer Officer Heather Garboden. “The addition of Starlink solidifies American as a leading airline in keeping passengers connected in flight.” As part of American’s commitment to an elevated onboard experience, Starlink will enable seamless streaming, browsing and real-time communication capabilities across American’s domestic and short-haul international routes.

I linked to the press release just for the amusement of American Airlines, which has in recent years built its strategy around offering anything-but-premium on routes you need, billing their Starlink deal as a commitment to “an elevated onboard experience.” That may have been the argument for United’s Starlink deal when it was announced in 2024, but by this point it’s tablestakes, which is surely exactly how Musk wants it.

Starlink is the consumer-facing business of SpaceX, generating $8.7 billion in revenue last year and $4.4 billion in profit; while it’s not totally clear exactly how SpaceX accounts for launch costs, obviously Starlink benefits greatly from the fact that it has access to SpaceX’s launch capacity. That launch capacity has resulted in over ten thousand active satellites in low Earth orbit, delivering low latency high speed Internet anywhere in the world — including in the air. That’s the carrot for airlines; the stick is the prospect of everyone else having the same service, and customers making flight decisions based on the quality of Internet access available.

There is a similarity to Tesla in this way. Musk companies at their best don’t win the game; they change the rules through scale, such that billionaires buy economy cars because they actually drive themselves (with supervision), and airlines transform the consumer experience on their own dime. Musk makes all-in bets — whether that be in terms of launch capacity or in autonomous driving — not by making rational short-term business decisions, but by starting with the desired end state and working backwards.

SpaceX’s Silly S-1

Tech has a long history of silly charts — there is an entire category known as Bezos charts — and the SpaceX S-1 has one that made me laugh. It came in the discussion of SpaceX’s total addressable market:

We believe we have identified the largest actionable total addressable market (“TAM”) in human history. We estimate that our quantifiable TAM is $28.5 trillion, consisting of $370 billion in Space from space-enabled solutions; $1.6 trillion in Connectivity across $870 billion in Starlink Broadband and $740 billion in Starlink Mobile as well as additional opportunities in enterprise and government; $26.5 trillion in AI across $2.4 trillion in AI infrastructure, $760 billion in consumer subscriptions, $600 billion in digital advertising, and $22.7 trillion in enterprise applications. For illustrative purposes of sizing our addressable market opportunity, we exclude China and Russia from our global estimates.

This image is approximately to scale vertically, but certainly not horizontally: I could use the help in really wrapping my mind around the $26.5 trillion AI opportunity, given it’s more than 13 times the space and connectivity opportunity combined!
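The arithmetic behind the quoted TAM can be checked in a few lines. The following is only a sketch that re-adds the figures from the S-1 excerpt above; the variable names are illustrative, and the small gaps against the headline totals are presumably rounding in the filing.

```python
# All figures are in trillions of US dollars, copied from the S-1 excerpt.
space = 0.37
connectivity = 1.6
ai_headline = 26.5
ai_components = {
    "AI infrastructure": 2.4,
    "consumer subscriptions": 0.76,
    "digital advertising": 0.6,
    "enterprise applications": 22.7,
}

# Sum the AI sub-markets and compare against the headline AI number.
ai_component_sum = sum(ai_components.values())

# Sum the three top-level segments and compare against the headline TAM.
total_tam = space + connectivity + ai_headline

# How many times larger is AI than space and connectivity combined?
ai_vs_rest = ai_headline / (space + connectivity)

print(f"AI components sum: {ai_component_sum:.2f}T (headline {ai_headline}T)")
print(f"Total TAM: {total_tam:.2f}T (headline 28.5T)")
print(f"AI vs. space + connectivity: {ai_vs_rest:.1f}x")
```

The ratio comes out between 13 and 14, which matches the "more than 13 times" remark.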

In all seriousness, the numbers are obviously absurd, but then again, everything about this IPO is absurd. SpaceX is seeking a $2 trillion valuation on a mere $18.67 billion in revenue with $4.9 billion in losses last year, and growth actually slowed from 35% to 33%. That slowdown happened despite the addition of xAI (and thus also X), which tipped the company from a small profit to that massive loss, thanks to $5.1 billion in AI R&D expense. That R&D, keep in mind, went towards building a model that is in 5th place, and whose entire founding team recently left the company. But sure, $26.5 trillion AI opportunity!
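To put the valuation in proportion, here is a rough sketch of the implied multiples using only the figures in the paragraph above. The "result without AI R&D" line is a deliberate simplification (an assumption, not something the S-1 states): it just adds the AI R&D expense back to the loss and ignores any other xAI revenue or costs.

```python
# All figures are in billions of US dollars, taken from the paragraph above.
valuation = 2_000.0   # the $2 trillion target valuation
revenue = 18.67       # last year's revenue
net_loss = 4.9        # last year's loss
ai_rnd = 5.1          # AI R&D expense

# Price-to-sales multiple implied by the target valuation.
price_to_sales = valuation / revenue

# Loss as a fraction of revenue.
loss_margin = net_loss / revenue

# Simplifying assumption: add the AI R&D expense back to the loss.
result_without_ai_rnd = ai_rnd - net_loss

print(f"Price-to-sales: {price_to_sales:.0f}x")
print(f"Loss margin: {loss_margin:.1%}")
print(f"Result without AI R&D: {result_without_ai_rnd:+.1f}B")
```

Under that simplification the result is a small positive number, consistent with the description of xAI tipping the company from a small profit to a large loss, while the target valuation works out to roughly 107 times revenue.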

This is not to say that SpaceX won’t get its desired valuation. Tesla’s valuation never made any sense right up until the Models 3 and Y actually worked out, causing Tesla’s share price to soar (and even then it was hard to ever build a financial model that justified the new share price). Musk’s ability to make his own reality starts with investors; from 2021’s Mistakes and Memes and comparing Apple and Tesla:

This comparison works as far as it goes, but it doesn’t tell the entire story: after all, Apple’s brand was derived from decades building products, which had made it the most profitable company in the world. Tesla, meanwhile, always seemed to be weeks from going bankrupt, at least until it issued ever more stock, strengthening the conviction of Tesla skeptics and shorts.

That, though, was the crazy thing: you would think that issuing stock would lead to Tesla’s stock price slumping; after all, existing shares were being diluted. Time after time, though, Tesla announcements about stock issuances would lead to the stock going up. It didn’t make any sense, at least if you thought about the stock as representing a company.

It turned out, though, that TSLA was itself a meme, one about a car company, but also sustainability, and most of all, about Elon Musk himself. Issuing more stock was not diluting existing shareholders; it was extending the opportunity to propagate the TSLA meme to that many more people, and while Musk’s haters multiplied, so did his fans. The Internet, after all, is about abundance, not scarcity. The end result is that instead of infrastructure leading to a movement, a movement, via the stock market, funded the building out of infrastructure.

I explained in that Article why I generally did not cover Tesla’s financial results, and the reasoning extends to why I don’t expect to cover SpaceX’s: Musk is the master of memes, and is himself a meme. He offers a dream — Mars, fully autonomous vehicles, an addressable market of $28.5 trillion — and positions his companies and their stock as access to that dream, and through the alchemy of capital markets, transforms shared delusion into mass market reality.

Musk’s track record matters in this regard. Building an electric car company was possible, as was full self-driving (supervised); at the same time there were ever increasing government mandates and programs around decreasing emissions that acted as the stick to Tesla’s carrot. Similarly, landing rockets was possible, and the new market creation downstream from correspondingly lower launch costs was comprehensible. That Musk succeeded in both instances gives him the benefit of the doubt.

The question that matters, then, is not if the numbers make sense right now (they absolutely do not); what matters is if the dream is even possible, and if there are actual reasons to think it might happen. I think that data centers in space meet these conditions.

The Case for Data Centers in Space

The first question about data centers in space is if they are even possible, and I think the answer is clearly yes. The key thing to consider is that there is no requirement that these data centers look anything like data centers on earth. On earth we build massive buildings full of GPUs with massive infrastructure for cooling those GPUs and massive power plants (or a connection to a grid which connects to massive power plants) to power those GPUs. The idea of transporting these massive structures to space sounds implausible, and it is!

However, there is no reason that space data centers would look like data centers on earth. What makes far more sense is to think about an individual satellite as something akin to a rack. Right now the largest Starlink satellite in orbit is the V2 Mini Direct-to-Cell, which measures 7.4 meters by 2.7 meters by 0.3 meters (estimated); an NVL72 rack from Nvidia, meanwhile, measures 2.2 meters by 1.1 meters by 0.6 meters, so we’re already in the right size range. The V2 Mini Direct-to-Cell consumes (and dissipates) up to an estimated 25kW of power; the NVL72 up to 135kW, and it can fit a 1 trillion parameter model quantized to FP4.

The big shortcoming for a rack-satellite is power and its dissipation, but going from 25kW to 135kW is certainly within the realm of possibility — and given that you don’t need much of the cooling and power distribution usage on earth, something closer to 100kW might deliver similar performance. There are other issues to address, including the problem of radiation screwing with calculations, reliability, etc., although those two concerns could be addressed in part by using larger chips (which are less efficient, but also use less power); these rack-satellites will also be disposable, like Starlink satellites, ameliorating reliability issues. The key factor, however, is that a fleet of racks, interconnected with lasers (as Starlink’s already are), each with their own solar panels and radiator arrays for cooling (deploying 200+ square meters of radiators per rack will be a huge challenge), is possible.
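The radiator figure can be sanity-checked with the Stefan-Boltzmann law, since in vacuum a rack can only shed heat by radiating it. The sketch below is a back-of-the-envelope estimate, not the article's own calculation: the 300 K radiator temperature, the 0.9 emissivity, and the function name `radiator_area_m2` are all assumptions chosen for illustration, and it ignores heat absorbed from sunlight and from Earth, which would increase the required area.

```python
# Stefan-Boltzmann constant in W / (m^2 * K^4).
SIGMA = 5.670374419e-8


def radiator_area_m2(heat_watts, temp_kelvin=300.0, emissivity=0.9, sides=1):
    """Estimate the radiator area needed to reject heat_watts by radiation alone.

    Assumptions (illustrative only): a uniform radiator temperature, a fixed
    emissivity, and no absorbed sunlight or Earth infrared.
    """
    # Radiated power per square meter of panel, counting each radiating face.
    flux_w_per_m2 = emissivity * SIGMA * temp_kelvin**4 * sides
    return heat_watts / flux_w_per_m2


# Roughly 100 kW per rack-satellite, as suggested in the paragraph above.
one_sided = radiator_area_m2(100_000)
two_sided = radiator_area_m2(100_000, sides=2)

print(f"One radiating face:  {one_sided:.0f} m^2")
print(f"Two radiating faces: {two_sided:.0f} m^2")
```

Under these assumptions a single-faced radiator for 100kW comes out on the order of 240 square meters, in the same ballpark as the 200+ square meters cited above; running the radiator hotter or radiating from both faces shrinks the area considerably.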

The next question about data centers in space is if there is a use case for them — the carrot — and I already made the argument that there is in The Inference Shift. Specifically, there are three types of workloads developing around LLMs: training, answer inference, and agentic inference. From the section making the case for “agentic inference”:

Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine.

If delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away:

Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate.

Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here) will be very useful.

Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.

At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute.

It’s agentic inference that makes the most sense for racks in space, and conveniently enough, that is also the market that is likely to be the largest in the long run.

The third question about data centers in space is if there is a stick. Specifically, while I think that racks-in-space are both a lot more viable than people think, and a lot more relevant to agentic inference than current modes of compute, it is at the end of the day cheaper and easier to build on earth, all things being equal.

All things are not equal, however: right now we are at the very beginning of the AI buildout and already one of the biggest constraints is not just power (expected), but zoning (unexpected). I wrote in an Update last week:

That leads to an interesting contrast to globalization: when companies were closing down American factories and laying off workers and moving operations to China, none of the affected towns or workers had a say. They just suddenly no longer had a job, and a huge number of cities across the Rust Belt no longer had a reason to exist. People simply had to move, or worse, retreat to things like alcohol or drugs.

AI, however, is the opposite: building data centers requires permission, which is to say that people actually have a say. Again, I am not at all saying that these people are well informed about data centers, or about the economic impact on their communities, much less the economic impact of AI generally; what I am noting is that people who didn’t have a say in globalization are suddenly finding they do have a say about AI, and it’s not a surprise they are expressing their disapproval by blocking data centers.

In that Update I made the case that data center builders — and by extension the companies that use them — should straight up pay people for permission to build data centers in their communities. At a minimum, however, that increases the costs of terrestrial data centers. What seems very plausible in the long run is that the demand for compute ends up being so large that there eventually is nowhere left to build, making the vast expanses of space not just an alternative but in fact the only choice.

An IPO Worth Supporting

If all of this happens — and there are a lot of “if”s here! — then suddenly that $2 trillion valuation starts looking reasonable. SpaceX is already monetizing xAI’s first data center, Colossus 1, to the tune of $15 billion/year for 300MW of capacity; that’s 3,000 racks-in-space. Anthropic, meanwhile, will probably make 3x the revenue on that capacity; it remains to be seen if xAI can get back in the state-of-the-art game, but if so then the amount of revenue it can generate per rack-in-space will be commensurately higher. Even without xAI, however, SpaceX has the potential to be a monopoly provider of marginal compute capacity.
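The 3,000-rack figure follows from dividing Colossus 1's capacity by a per-rack power budget. The sketch below assumes roughly 100kW per rack-satellite (the figure floated earlier in the article) and treats the 3x multiplier as the article's own rough guess; the variable names are illustrative.

```python
# Figures from the paragraph above.
capacity_watts = 300_000_000          # 300MW of Colossus 1 capacity
annual_revenue_usd = 15_000_000_000   # $15 billion per year

# Assumption: about 100kW per rack-satellite, per the earlier estimate.
watts_per_rack = 100_000

# Number of rack-satellites needed to match the capacity.
racks = capacity_watts // watts_per_rack

# Revenue attributable to each rack-equivalent per year.
revenue_per_rack = annual_revenue_usd / racks

# The article guesses a frontier lab might earn about 3x on the same capacity.
frontier_multiplier = 3
frontier_revenue_per_rack = revenue_per_rack * frontier_multiplier

print(f"Rack equivalents: {racks:,}")
print(f"Revenue per rack per year: ${revenue_per_rack:,.0f}")
print(f"At {frontier_multiplier}x: ${frontier_revenue_per_rack:,.0f}")
```

At these numbers each rack-equivalent corresponds to about $5 million of annual revenue, which is the per-satellite figure any launch and hardware cost would have to be weighed against.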

There are, needless to say, a massive number of assumptions baked into this argument, including assuming a huge number of engineering challenges are solved, Starship actually works, SpaceX gets sufficient supply of the right kinds of chips, compute demand is massively larger, agentic inference unbundles current architectures, and data center opponents are successful. The risk attached to all of these assumptions should discount the valuation you put on this business, which is to say I still think this IPO is nuts.

At the same time, I’m glad it exists, for multiple reasons. The first one is the most obvious one: Musk, for all of his faults, has already pushed humanity forward on multiple vectors, including electric cars, self-driving, reusable rockets, satellite Internet, etc., and I’m excited to see him try and do more.

The second is that I am in fact concerned about our ability to muster enough compute to fully realize the gains from AI, and am very worried about a replay of nuclear power, where our failure to build denied us the opportunity to even imagine what could be invented in a world of unlimited energy; the fact Musk is proposing an alternative path to unlimited compute is a relief.

The third is that I appreciate the extent to which this IPO is a return to what an IPO should be: the opportunity for people to contribute capital to actually build the business, and to benefit if it works out. As I noted, I can’t make a financial model that necessarily justifies this valuation, particularly based on current financials, but neither can a VC investing in the Series A of a company. SpaceX has already invented a lot, and its early investors are going to make a lot of money with this IPO; at the same time, there is still so much more to invent that there remains a lot of upside — and, to be very clear, a lot of risk. It’s a testament to SpaceX’s ambitions that retail investors get to play VC.

And hey, you get Mars upside for free!

Tech: policy, infrastructure, security, hardware

US Space Force confirms SpaceX will build sensor-to-shooter targeting network

The US Space Force awarded SpaceX a $2.29 billion contract to build the "Space Data Network Backbone," a Starlink-based satellite network for real-time military data distribution, aiming for an operational prototype by late 2027.

Ars Technica

Summary

What: The US Space Force's Space Systems Command announced a $2.29 billion firm-fixed-price contract for SpaceX to develop the Space Data Network (SDN) Backbone, a low-Earth orbit (LEO) satellite constellation based on Starlink technology. This network will provide secure, high-speed tactical communications for US military forces, connecting sensors and weapons systems globally, with a prototype due by the end of 2027. The initiative follows stalled data transport efforts at the Space Development Agency (SDA), which has not announced plans to cancel its existing satellite contracts.

Why it matters: This contract solidifies SpaceX's critical role in US military infrastructure, moving beyond launch services to become a central provider of secure, space-based communication and targeting networks, indicating a strategic shift towards leveraging established commercial satellite technology for defense. It also raises questions about the future of the Space Development Agency's prior multi-vendor approach.

Decoder

Firm-fixed-price contract: A contract type where the price is not subject to adjustment based on the contractor's cost experience in performing the contract. It places maximum risk and full responsibility for all costs and resulting profit or loss upon the contractor.
Low-Earth Orbit (LEO): An orbit around Earth with an altitude between 160 kilometers (99 miles) and 2,000 kilometers (1,200 miles), often used by satellite internet constellations like Starlink due to lower latency.
Space Development Agency (SDA): A Pentagon office established in 2019 to rapidly procure, develop, and field new generations of missile-tracking and data-relay satellites.

Original Article

SpaceX has won a lucrative contract to provide the US military with a means of distributing space-based sensing and targeting data, forming the “backbone” of a rearchitected network after separate Pentagon initiatives stalled, officials announced Tuesday.

Space Systems Command, the Space Force’s primary procurement and acquisition center, announced the $2.29 billion firm-fixed-price agreement, confirming long-simmering reports that the Pentagon was likely to tap SpaceX for a new communications network in low-Earth orbit. SpaceX’s selection for the Space Data Network (SDN) Backbone contract “accelerates the delivery of a resilient, high-speed communications network in space,” Space Systems Command said in a statement.

The network will be based on technology originally developed for SpaceX’s Starlink global Internet constellation. SpaceX already builds and launches specially designed satellites, called Starshield, for military applications. The SDN Backbone network in low-Earth orbit (LEO) will presumably use the Starshield platform.

“This award will enhance the network with an expanded optically interconnected mesh of satellites delivering worldwide tactical communications and broadband communication services,” Space Systems Command said.

Col. Ryan Frazier, acting Space Force portfolio acquisition executive for Space-Based Sensing and Targeting, said the network “leverages the best of commercial innovation” and will be a “huge benefit and enabler” for US military forces. The network “acts as a core communications layer for the USSF war-fighting systems, ensuring our sensors and shooters are connected continuously, globally and securely,” Frazier said in a press release.

Changing midstream

This may sound familiar to anyone who has kept up with the evolution of a Pentagon office named the Space Development Agency. Established in 2019, SDA started launching prototypes for a constellation of missile-tracking and data-relay satellites in 2023. The idea was to rapidly procure, develop, and field new generations of tracking and data “transport” satellites every two years. SDA’s strategy was to cast a wide net across the US space industry, using satellites and sensors developed by many companies.

But SDA’s architecture stalled, and military officials blamed the delays on bottlenecks in satellite supply chains and difficulties integrating the network and its numerous contractors. The Government Accountability Office last year also identified technical problems that slowed the program’s development and adoption.

The first budget request from the second Trump administration last year revealed a change in the Pentagon’s thinking. In budget documents, White House officials mentioned a new program called “pLEO SATCOM” or “MILNET” while proposing to eliminate funding for the next tranche of data transport satellites from the Space Development Agency. MILNET has since been renamed the Space Data Network.

A stack of 21 SDA data transport satellites manufactured by York Space Systems was launched last September. Credit: York Space Systems

Lawmakers have voiced concerns about moving away from SDA’s original strategy, which leaned on competition and open architectures, and giving the network to a single company. Space Systems Command said Tuesday that the Space Data Network will work with “multiple vendors” with plans to “expand its participants over the summer.” These participants may include companies with their own budding broadband constellations, such as Amazon.

The command did not offer further details on how it will ensure competition or open standards, but officials said SDA’s previous data transport architecture, which still has satellites under construction awaiting launch, will “come together” with the Space Force’s Space Data Network procurement efforts.

“Our acquisition strategy is designed to foster competition and broaden our industrial base,” said Lt. Col. Jeffrey Fry, SDN Backbone system program manager, in a statement. “We aren’t trading speed for scale; we are demanding both.”

The other major line of effort at SDA is focused on deploying a constellation of low-Earth orbit satellites to detect and track missile launches. These tracking satellites will fly much closer to Earth than the Space Force’s legacy missile-warning satellites, with improved capability to monitor emerging threats like hypersonic missiles. SDA’s tracking and transport layers were intended to work together to detect missile threats and provide targeting data for interceptors to take them out. The program predated President Donald Trump’s announcement last year of plans for a Golden Dome missile shield, but SDA’s tracking and data transport network would underpin such a missile defense program.

The Pentagon has not announced any changes to SDA’s missile tracking layer, but the data transport network accounted for the majority of the agency’s satellites. The Space Force’s decision to turn over the data relay backbone to SpaceX will shrink SDA’s portfolio, raising questions about the agency’s long-term future.

While SDN Backbone is a new program, SpaceX brings many of the advantages of incumbency. SpaceX has more than 10,000 Starlink satellites in orbit, primarily for civilian use, and hundreds more Starshield satellites for military use. Starshield satellites already provide connectivity for various military weapons systems, including one-way attack drones used for attacks on Iran.

SDA has awarded contracts to date for approximately 340 data transport satellites. Those satellites, under development by York Space Systems, Lockheed Martin, Northrop Grumman, and Rocket Lab, carried an average cost of about $16 million per spacecraft, significantly more than the cost of a single satellite from SpaceX’s Starlink or Starshield assembly line. SDA has not announced plans to cancel any of its existing data relay satellite contracts.

Last year, when SDN was still known as MILNET, a Space Force commander responsible for operating the military’s communications satellites said the network would comprise some 480 satellites operated by SpaceX and overseen by a military mission director. Ars asked Space Systems Command for an update on the number of satellites for SDN Backbone and who will operate them, and we will provide an update when we receive a response.

Whatever it includes, SpaceX is required to deliver a “fully operational prototype capability” for the SDN Backbone by the end of 2027, the Space Force said. With this delivery, SpaceX will assume a larger role in direct combat support to go along with its position as the world’s leading commercial launch provider and satellite manufacturer.

Tech mobile frontend ai privacy

What Apple and Google are doing to your push notifications

Apple and Google are increasingly using on-device AI models like Apple Intelligence and Gemini Nano to parse, rank, summarize, and even rewrite user push notifications, significantly reducing sender control.

Jacques Corbytuech

Summary

What: Apple and Google have been actively intervening in push notifications since 2009, with recent years seeing on-device AI models summarizing, reordering, and rewriting notifications before they reach the user. Apple Intelligence (iOS 18.3) and Google's Gemini Nano stack (Android 14) use small adapters to specialize base models for tasks like summarization and prioritization, with defaults set to summarize.

Why it matters: This trend reflects a broader industry shift where platform owners increasingly mediate content delivery through AI, prioritizing user attention and platform control over direct sender-to-recipient communication. It also signals notifications' evolution into future triggers for AI agents, demanding a shift in how developers design them.

Takeaway: Developers should prioritize concise, fact-led notification titles and consider shifting marketing weight to owned in-app surfaces that bypass platform AI editing. Also, expose app actions via App Intents (iOS) or App Actions (Android) for future AI agent interaction.

Deep Dive

Apple and Google have progressively increased platform intervention in push notifications since 2009, initially for battery management, then for user controls like Android 8's notification channels (2017) and iOS 15's Focus modes (2021).
Recent changes, especially from 2024-2026, involve on-device AI models (Apple Intelligence, Gemini Nano) summarizing, reordering, and rewriting notifications before display, with defaults set to summarize.
The platforms use small, specialized AI adapters for tasks like summarization and prioritization, trained on diverse data mixtures including notification payloads.
These AI-driven edits happen on the device, after transport, making it opaque to senders whether a notification was summarized, deprioritized, or suppressed by user settings or AI.
Visibility for senders is poor; metrics like "delivered-to-device" often don't confirm "displayed-to-user," and there's no API to detect if AI modified a notification.
The article argues senders should focus on fact-led notification content (e.g., "Your delivery is 15 minutes away" instead of "We've got great news!").
Notifications are evolving into triggers for AI agents (e.g., Siri, Gemini) to act on users' behalf, meaning developers should expose app actions via frameworks like Apple's App Intents or Android's App Actions.
Developers are advised to reserve push for re-engaging dormant users and time-critical transactional alerts, shifting cross-sell and discovery to owned in-app surfaces.
Permission requests for notifications should be contextual, after demonstrating value, not at app launch, as opt-in rates have fallen sharply since Android 13 (2022) required explicit user grants.
Segmenting and personalizing notifications significantly improves engagement compared to broad broadcasts, which also risk higher opt-out rates.
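
As a rough illustration of the fact-led advice above, the sketch below builds an APNs-style JSON payload in Python. The `aps`, `alert`, `interruption-level`, and `relevance-score` keys are standard APNs payload keys; the helper name `build_delivery_alert`, the 40-character title budget, the `order_id` custom key, and the sample values are assumptions made for illustration, not details from the article.

```python
import json

MAX_TITLE_CHARS = 40  # assumed budget for this sketch; not a platform limit


def build_delivery_alert(minutes_away: int, order_id: str) -> dict:
    """Build an APNs-style payload whose title states the fact up front."""
    title = f"Your delivery is {minutes_away} minutes away"
    if len(title) > MAX_TITLE_CHARS:
        raise ValueError("title is longer than the assumed budget")
    return {
        "aps": {
            "alert": {"title": title, "body": f"Order {order_id} is on its way."},
            # Flags the alert as time-critical rather than promotional.
            "interruption-level": "time-sensitive",
            # Value between 0 and 1 that the system can use when ordering a summary.
            "relevance-score": 0.9,
        },
        # Hypothetical custom key for the receiving app; it lives outside "aps".
        "order_id": order_id,
    }


payload = build_delivery_alert(15, "A1234")
print(json.dumps(payload, indent=2))
```

Nothing in the payload can force the platform to show the text unmodified; the sketch only keeps the title short and factual so that a summary has little to distort.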

Decoder

APNs (Apple Push Notification Service): The proprietary service used by Apple to send push notifications to iOS, iPadOS, macOS, tvOS, and watchOS devices.
FCM (Firebase Cloud Messaging): A cross-platform messaging solution from Google that lets developers reliably deliver notifications to client apps on Android, iOS, and web.
Apple Intelligence: Apple's personal intelligence system for iPhone, iPad, and Mac, featuring generative AI capabilities including on-device summarization and prioritization of notifications.
Gemini Nano: Google's on-device AI model, part of the Gemini family, designed for efficient performance directly on Android devices, used for features like notification summaries and smart replies.
AICore: An Android system service introduced in Android 14 that hosts Gemini Nano and other AI models on the system partition, ensuring privacy and sharing weights across authorized apps.
LoRA (Low-Rank Adaptation): A technique used in machine learning to fine-tune large language models more efficiently by adding small, trainable matrices (adapters) rather than retraining the entire model.
App Intents (iOS): Apple's framework that allows developers to expose app actions and functionality to Siri and Apple Intelligence, enabling voice commands and automation.
App Actions (Android): Google's equivalent to App Intents, allowing developers to define app functionality that can be invoked by Google Assistant and Gemini, enabling users to accomplish tasks hands-free.
Live Activities (iOS): A feature in iOS 16.1+ that allows apps to display real-time information on the Lock Screen and Dynamic Island, bypassing standard notification summarization.
SKAdNetwork: Apple's privacy-preserving framework for attributing app installs and re-engagements without revealing user-level data, particularly relevant after App Tracking Transparency (ATT).
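
To make the LoRA entry above concrete, here is a minimal sketch of the low-rank update in plain Python. It follows the usual formulation, effective weight = W + (alpha / r) * B * A, with the base matrix W left frozen. The matrix sizes, the `alpha` value, and the helper names are assumptions chosen for illustration; this is not a description of how Apple or Google implement their adapters.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying the frozen W."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / rank
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


random.seed(0)
d_out, d_in, rank, alpha = 16, 16, 2, 4.0

# Frozen base weights.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter matrices: A starts small and random, B starts at zero.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

# With B initialized to zero, the adapter starts as a no-op.
assert lora_effective_weight(w, a, b, alpha, rank) == w

full_params = d_out * d_in
adapter_params = rank * (d_in + d_out)
print(f"full fine-tune: {full_params} params, adapter: {adapter_params} params")
```

The parameter count is the point: only A and B are trained, so several small adapters can specialize one shared base model for different tasks.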

Tech ai career productivity

Judgment is the skill that matters most in the AI era

Jim Grey contends that "judgment" – the ability to critically evaluate and refine AI output – is the most crucial skill in the AI era, not prompting.

Jim Grey's Blog

Summary

What: Jim Grey argues that while AI makes generating content like words, code, and images incredibly fast and easy, it does not lower the cost of being wrong. He emphasizes that the most valuable skill is critical engagement with AI's output, recognizing errors, superficiality, or when it deviates from truth, a skill he developed by editing authors like David Pogue.

Why it matters: This piece highlights a critical shift in the value chain of creative and technical work with the rise of AI, moving from pure creation to critical evaluation and refinement, suggesting that human discernment remains indispensable for quality output.

Takeaway: When using AI, prioritize spending time critically evaluating and refining its output, rather than passively accepting it, to build and sharpen your judgment skills.

Deep Dive

Jim Grey asserts that AI excels at quickly generating plausible output (words, code, images, analyses), but it does not make judgment easy or cheap.
The most valuable skill in the AI era is the ability to critically engage with AI's output, discerning when it's wrong, shallow, synthetic, or can be improved.
Grey's personal experience as an editor for authors like David Pogue taught him judgment through critically engaging with both excellent and flawed work.
He describes spending weeks in conversation with Claude on a personal essay, then heavily reworking the draft it produced: challenging its assertions, cutting sections, and adding details from his own life that the AI had missed.
Another example involved using Codex to generate technical documentation, where he and a colleague developed a prompt to make AI explicitly surface its inferences and prevent unverified claims.
Grey warns that AI takes away the "struggle" that traditionally builds judgment, and there's a risk that passive acceptance of AI output will hinder skill development.
He taught IT recruiters to use AI for marketing content, emphasizing editing skills over prompting, helping them recognize synthetic language and factual inaccuracies.
The core implication is that AI lowers the cost of creation but not the cost of being wrong, making critical judgment more important than ever.

Decoder

Codex: OpenAI's coding agent, used here to generate technical documentation for a codebase. The name was first used for the 2021 OpenAI code-generation model that powered the original GitHub Copilot.

Original Article

Judgment is the skill that matters most in the AI era

AI has made generating plausible output incredibly fast and easy. Words, code, images, presentations, analyses — ask for something and the machine will produce it right now.

What AI has not made fast and easy is judgment.

The people who will get the most value from AI are not the best prompters. They’ll be the ones who can critically engage with what comes back, recognizing when it’s wrong, when it’s shallow, when it’s synthetic, and when it can be improved.

I’ve spent the last year using AI heavily in my professional work, and here and there on my personal blog. I came to AI with good judgment built over decades as a writer and engineering leader. If anything, using AI has deepened and sharpened my judgment.

Where judgment comes from

Conventional wisdom says that judgment comes from years of making your own mistakes. The software developer who debugs their own bad code until the lesson is internalized. The writer who rewrites their own bad drafts until they develop an eye for what isn’t working. Failure as the teacher.

But judgment does not require creating the flawed work yourself. It requires sustained critical engagement with flawed work regardless of origin.

Did you know I never set out to be a writer? I fell into it when writing technical manuals was the best job I could get after engineering school. I did that for a few years and built baseline writing and editing skills.

Then I got a job editing technology books for a major publisher. I’ve written before about how I was David Pogue’s editor. His work was excellent and precise, requiring only a light hand and collaborative refinement. I can think of a couple other authors whose work required something closer to reconstruction. Every other author I edited fell somewhere in between.

Pogue showed me what good looked like and how to make it better. Those other authors showed me what broken looked like so I could learn how to fix it.

I learned some editorial judgment by writing my own bad drafts in that first job. But I learned it deeply by engaging critically with output that arrived in front of me.

I don’t want to overplay this hand. I was just 25 and 26 when I edited those books. My judgment has deepened over more than three decades of additional experience, some through making my own mistakes and some through working out the mistakes of others.

But the path to judgment in the AI era is going to be critically engaging with AI’s output.

What critical engagement looks like

Earlier this year I used Claude to help me write a personal essay about Carmel, Indiana. I’d been thinking for 30 years about why I never felt like I fit there. Finally I believed I was close to having the answer. I turned to Claude to help me finish thinking it through and work on an article explaining it.

The process was weeks of conversation, dropping observations and photographs and honest admissions about class and bias and values, asking Claude to hold it all simultaneously while I kept thinking.

Then I asked for a draft. Claude gave me something reasonable to start with, but it didn’t sound like me and made some assertions I wasn’t comfortable with.

I didn’t accept what came back — I wrestled with it. I pushed Claude hard on those assertions, leading me to remove some and sharpen others. I cut whole sections. I wrote new material from scratch. “Bulldozed bones” wasn’t in the draft; I wrote that. The Rottweiler wasn’t in the draft. I put her there. The VW Jetta and Kia Soul weren’t in the draft. Those details came from my life, and I knew they belonged in the article.

Judgment isn’t just about catching errors. It’s also about knowing when technically correct isn’t actually right, and knowing when the output has drifted from your truth or standard even when it hasn’t made a mistake.

The second example comes from my consulting work. I used Codex to generate technical documentation for an entire codebase. It would have taken me weeks, maybe months, to document it manually. But it would have also taken me weeks to months to verify all of AI’s output.

A colleague and I spent considerable time creating the prompt, iterating through failures and correcting what didn’t work. The prompt we arrived at does something specific: it makes Codex surface its inferences explicitly, and it prevents Codex from claiming more than the evidence supports. Documentation generated by that prompt will tell you “this function appears to handle authentication based on naming conventions, but the implementation is unclear” rather than asserting what it can’t verify.

That discipline — making AI transparent about what it doesn’t know — is itself a judgment call. We built it through critical engagement with output that failed, identified the failure mode, and wrote a constraint that prevented it.

A legitimate concern

The concern is that AI takes away the struggle that helps build judgment. The student who has AI write the essay never develops the writer’s eye. The developer who accepts AI-generated code without reviewing it never builds the intuition that catches subtle errors. The cultural pressure runs toward acceptance and speed, not critical engagement.

AI has a way of making mediocre work look finished. The corporate world in particular rewards throughput far more than discernment, and AI makes plausible output cheap.

Trouble is, AI can’t make errors cheap. Judgment remains crucial. Passive acceptance won’t build judgment. Critical engagement will. That’s a habit, and it’s teachable.

Not long ago I taught a group of IT recruiters how to use AI to generate marketing content. They’re not writers, but they don’t have a marketing department. The primary lesson wasn’t prompting. It was editing. How to read AI output and know when it sounds synthetic rather than like them. How to catch claims they can’t stand behind. How to pull a draft back toward something true and human. Through that process, they gradually develop a sharper eye for what good looks like.

That’s the same lesson I learned from David Pogue’s chapters and the difficult authors’ chapters 30 years ago.

The implication

AI lowers the cost and increases the speed of generating words, code, and all kinds of other output. It does not lower the cost of being wrong.

This kind of judgment is learnable and always has been. What’s changed is the source of the output. You will develop this judgment by engaging critically, every time, with everything AI gives you.

The recruiters I taught will get better over time, if they keep at it. The ones who engage critically will get better.

That’s always been how craft works.

I consult with engineering organizations on exactly these problems. If that’s useful to you, I’d enjoy talking. Reach me here.

AI agents startup enterprise

More Devins in More Places

Cognition, creators of the AI software engineer Devin, raised over $1 billion at a $26 billion valuation, with customers like Mercedes-Benz cutting project times drastically.

Cognition

Summary

What: Cognition secured over $1 billion in funding at a $26 billion valuation from investors including Lux Capital and General Catalyst. Their AI software engineer, Devin, has helped clients like Mercedes-Benz reduce an eight-month project to eight days, and Itaú fix 70% of security vulnerabilities automatically. Cognition's internal team reports 89% of code commits are by Devin.

Why it matters: This substantial funding and rapid adoption by large enterprises underscore a growing industry trend towards AI-powered autonomous agents significantly streamlining software development, suggesting a future where AI handles a large percentage of code commits.

Takeaway: Evaluate how AI agents like Devin could accelerate your team's software development lifecycle, particularly for legacy modernization or security vulnerability remediation.

Deep Dive

Cognition raised over $1 billion in funding, reaching a $26 billion valuation.
Investors include Lux Capital, General Catalyst, 8VC, Founders Fund, and others.
Devin, an AI software engineer, was launched two years ago.
Enterprise usage has grown over 10x this year, with run-rate revenue at $492 million.
Major clients include Citi, Mercedes-Benz, Goldman Sachs, Elevance, Dell, Santander, U.S. Army, and U.S. Navy.
Mercedes-Benz cut an eight-month legacy modernization project to eight days using Devin.
Itaú, Latin America’s largest bank, uses Devin to automatically fix 70% of security vulnerabilities.
Cognition operates as an "agent lab," collaborating with various foundation model labs to optimize model usage for specific tasks.
They recently launched SWE-1.6, a model known for cost-effectiveness and speed (up to 950 tok/s).
Cognition reports that 89% of the code committed by its own engineers is committed by Devin, with the rest coming from local agents in Windsurf.

Decoder

Agent lab: A company focused on developing and deploying AI agents that can autonomously perform complex tasks, often by orchestrating multiple underlying AI models.

Original Article

Cognition has raised over $1B at a $26B valuation led by Lux Capital, General Catalyst, and 8VC, with support from existing investors including Founders Fund, Elad Gil, Alpha Wave, Definition Capital, Positive Sum, Avenir, Vitruvian, Bain Capital Ventures, Conversion Capital, 137 Ventures, Soma Capital, and Omri Casspi. We’re also excited to welcome new investors including Ribbit Capital, Atreides, and Layer Global.

We launched Devin two years ago as the first AI software engineer. Since then, cloud agents have gone from niche to mainstream, and today they are the fastest growing way to create software. Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492M.

As we’ve scaled, Cognition has become a trusted partner for the world’s largest and most impactful organizations like Citi, Mercedes-Benz, Goldman Sachs, Elevance, Dell, Santander, the U.S. Army, and the U.S. Navy. Fast-growing startups like Exa, Modal, Eight Sleep, and OpenRouter have also made their software development lifecycle more autonomous with Devin.

Customers are delivering real outcomes with this leverage. Mercedes-Benz cut an eight-month legacy modernization project down to eight days. Systems integrators like Infosys and Cognizant have embedded Devin into how they deliver work to ship projects faster than ever before. Itaú, Latin America’s largest bank, fixes 70% of security vulnerabilities automatically with Devin.

The Independent Agent Lab

Cognition is an agent lab. We work closely with all of the foundation model labs to make sure every Cognition customer gets the best of all models available.

The value of independence is increasing as token usage grows exponentially across the industry.

Teams today care more than ever about the ratio of price to performance, which requires using the right model for the right task. We evaluate model performance across 100+ categories of software engineering tasks, and architect Devin to help engineering teams automatically manage spend.
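
One way to picture "the right model for the right task" is a router that picks the cheapest model clearing a quality bar for a given task category. The Python sketch below is only that picture: the model names, prices, scores, categories, and the `pick_model` helper are all hypothetical, and nothing here describes how Devin actually routes work.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    usd_per_million_tokens: float  # hypothetical price
    scores: dict  # task category -> evaluation score in [0, 1] (hypothetical)


def pick_model(options, category, min_score):
    """Return the cheapest model whose score for `category` meets `min_score`."""
    eligible = [m for m in options if m.scores.get(category, 0.0) >= min_score]
    if not eligible:
        raise LookupError(f"no model meets {min_score} for {category!r}")
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


# All names, prices, and scores below are made up for illustration.
OPTIONS = [
    ModelOption("small-fast", 1.0, {"rename-refactor": 0.92, "legacy-migration": 0.55}),
    ModelOption("mid-tier", 5.0, {"rename-refactor": 0.95, "legacy-migration": 0.78}),
    ModelOption("frontier", 25.0, {"rename-refactor": 0.97, "legacy-migration": 0.91}),
]

print(pick_model(OPTIONS, "rename-refactor", 0.90).name)   # small-fast
print(pick_model(OPTIONS, "legacy-migration", 0.90).name)  # frontier
```

With these made-up numbers, the routine refactor goes to the cheapest model and only the harder migration pays for the most expensive one, which is the price-to-performance trade the paragraph describes.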

Earlier this year, we started building out our model training program. We recently launched SWE-1.6, which has become the most used model in Windsurf, and which customers love for both its cost and speed (up to 950 tok/s).

What’s Next

We’re now shifting to a world of self-driving software development. Individual engineers are able to spend more of their time on the creative structuring of problems and tasks, and their army of Devins reliably executes. Our own engineering team provides some of the clearest evidence of this shift: at Cognition, 89% of code committed by our engineers is committed by Devin (and the rest by local agents in Windsurf).

We’re growing our team to match this moment. If you’re excited about building the future of software engineering in a world with many more software engineers, join us. And if you haven’t seen the latest in Devin firsthand, try it out now.

AI research biology opensource

Biohub releases a world model of protein biology

Biohub has open-sourced a "world model" of protein biology, including ESMC, ESMFold2, and ESM Atlas, enabling rapid protein structure prediction and design of new therapeutic binders.

Biohub

Summary

What: Biohub released an open discovery engine for protein biology comprising ESMC, a language model trained on 2.8 billion protein sequences; ESMFold2, a design engine that predicts 3D protein structures with state-of-the-art accuracy, outperforming AlphaFold 3 in antibody-antigen prediction; and ESM Atlas, which maps 6.8 billion protein sequences and 1.1 billion predicted structures. Researchers used ESMFold2 to design protein binders against five cancer and immunology targets in days, achieving high affinity and stability.

Why it matters: This open-source release fundamentally democratizes access to advanced protein prediction and design tools, significantly accelerating drug discovery and biological research by moving time-consuming experimental work into computational design.

Takeaway: Researchers in biochemistry and drug discovery should integrate Biohub's open models into their workflows to accelerate protein structure prediction and therapeutic binder design.

Deep Dive

Biohub released an open discovery engine for protein structure prediction, design, and biological discovery.
The engine consists of three main components: ESMC, ESMFold2, and ESM Atlas.
ESMC is a state-of-the-art language model trained on approximately 2.8 billion protein sequences, internalizing fundamental protein biology rules.
ESMFold2 is a design engine that transforms ESMC's sequence representations into atomically-resolved 3D structures of biomolecular complexes.
ESMFold2 demonstrated state-of-the-art accuracy in structure prediction, particularly for protein-protein and antibody-antigen interactions, outperforming AlphaFold 3 in certain benchmarks.
Researchers used ESMFold2 to computationally design protein binders against five cancer and immunology targets (EGFR, PDGFRβ, PD-L1, CTLA-4, CD45) in days, yielding functional, high-affinity, specific, and stable binders in lab experiments.
ESM Atlas makes ESMC's representations navigable across 6.8 billion protein sequences and 1.1 billion predicted structures, organizing proteins by learned relationships and surfacing unannotated biological connections.
All three models are freely available to the global scientific community via the Biohub Platform.
The goal is to accelerate the path from protein biology to binder design, transforming initial empirical screening into computation-guided design.
Dr. Priscilla Chan, Biohub Co-Founder, emphasizes the commitment to open science to accelerate discovery and personalized cures.

Decoder

Protein binder: A molecule, typically a protein or antibody, designed to specifically and tightly attach to another molecule (the target), often used in therapeutics to block or stimulate biological pathways.
Antibody-antigen interaction: The specific binding between an antibody and an antigen, crucial for immune responses and a key target in drug development.
AlphaFold 3: The 2024 version of AlphaFold, the AI system from Google DeepMind and Isomorphic Labs that predicts the structures of proteins and their complexes with other biomolecules.

Original Article

Biohub today announced the release of a world model of protein biology: a scientific engine for prediction, design, and discovery that can map proteins across the tree of life, predict their structures, and design new protein binders that function in laboratory experiments.

Proteins are the machinery of life. Nearly every function of the human body depends on them. They are among the most important targets in medicine, yet designing functional, stable proteins that work as intended in the body is an immense scientific and technical challenge.

Today, Biohub is making available to researchers everywhere an open discovery engine for protein structure prediction, design, and biological discovery built around three releases: ESMC, ESMFold2, and ESM Atlas.

The core scientific hypothesis of ESM is that training a language model across the sequences of all life will cause it to internalize the fundamental properties that govern protein biology — the rules underlying how proteins fold, interact, and function across all of life. At its foundation is ESMC, a state-of-the-art language model that represents proteins, trained on approximately 2.8 billion sequences drawn from across all of life.
ESMFold2 is the design engine built to transform ESMC’s sequence representations into atomically resolved 3D structures of biomolecular complexes. In experiments described in a preprint posted today, researchers used ESMFold2 to design protein binders against five targets central to cancer and immunology, a computational search completed in days rather than several months or years. The lab-validated binders exhibited high affinity, specificity, and stability (properties critical for clinical utility) and showed minimal similarity to sequences in public databases, suggesting the model is producing de novo solutions rather than retrieving known binders.
ESM Atlas makes ESMC’s representations navigable across 6.8 billion protein sequences and 1.1 billion predicted structures — the largest application of AI to protein biology to date. It organizes proteins by relationships the model has learned, surfacing connections that existing databases have not captured, including evolutionary links between gene-editing enzymes spread across distant branches of life. Much of that biology has never been annotated. For researchers working on diseases where the biology is poorly understood, it makes uncharacterized biology searchable.

All three are freely available to the global scientific community at Biohub Platform.

“Designing the interactions between proteins is a fundamental problem in biochemistry, and critical for the design of medicines. What we’ve shown is that these models have learned such a high-fidelity world model of biology that you can design protein interfaces computationally, take them into the laboratory, and they function as predicted.”
— Alex Rives, Head of Science, Biohub

ESMFold2: A faster path from protein biology to binder design

ESMFold2 is an open, state-of-the-art structure prediction model that translates knowledge of patterns across evolution encoded in ESMC into precise, atomic-resolution 3D models of proteins and their interactions. It leads across standard protein folding benchmarks at predicting protein-protein and antibody-antigen interactions.

Antibody-based therapies have become a cornerstone of modern medicine, accounting for roughly one quarter of all new FDA drug approvals, spanning cancers, autoimmune diseases, and conditions that once had few treatment options. Finding a viable therapeutic candidate depends on identifying molecules that bind tightly and specifically to a disease target; a single preclinical binder candidate typically takes three to four years to develop. ESMFold2, which predicts the structural configurations most likely to achieve high affinity for a given target, can move much of the initial search into computation, producing experimentally testable designs in days.

Biohub researchers used the model to design protein binders against five targets at the center of cancer and immunology research — EGFR and PDGFRβ (implicated in tumor growth), PD-L1 and CTLA-4 (immune checkpoints that cancer cells exploit to evade detection), and CD45 (a regulator of immune cell signaling). Designs achieved hit rates of 36–88% for compact minibinders and 15–29% for antibody-derived formats, with confirmed binding in laboratory experiments. For PD-L1, designed binders restored T cell signaling in laboratory tests, blocking the same pathway that approved checkpoint therapies target.

ESMFold2 changes the accuracy and speed of early therapeutic binder discovery, transforming the initial search from largely empirical screening into computation-guided design that takes hours or days.

“Biohub was built on the belief that open science accelerates discovery. Making these tools freely available means researchers everywhere can move faster toward personalized cures that work for individual patients, because they target the specific biology driving their disease.”
— Dr. Priscilla Chan, Biohub Co-Founder

A shared, open scientific ecosystem built on a world model of protein biology

The world model of protein biology is trained on the evolutionary record of life itself, billions of protein sequences spanning the full breadth of life, including bacteria in deep soil, organisms in extreme environments, and the more than 20,000 types of proteins found in the human body. Its training objective is simple: predict the amino acids that evolution selects. Because evolution tends to preserve proteins that are fit for purpose, the patterns preserved across billions of years of data implicitly encode the physical rules governing protein function. What this work shows is that from this training, a world model emerges — one that has internalized those rules deeply enough to generate functional proteins from scratch.
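
The release describes the training objective only as "predict the amino acids that evolution selects." A common way to set up such an objective is masked prediction: hide some residues and score the model on the probability it assigns to the true ones. The Python sketch below assumes that setup. The 15% mask rate, the `#` mask symbol, the example sequence, and the uniform stand-in model are all assumptions for illustration, not details from Biohub.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # placeholder mask symbol chosen for this sketch


def mask_sequence(seq, mask_rate, rng):
    """Hide a random subset of residues; return the masked sequence and hidden positions."""
    n_masked = max(1, round(len(seq) * mask_rate))
    positions = sorted(rng.sample(range(len(seq)), n_masked))
    masked = list(seq)
    for i in positions:
        masked[i] = MASK
    return "".join(masked), positions


def uniform_model(masked_seq, position):
    """Stand-in predictor: assigns equal probability to every residue."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def masked_loss(seq, masked_seq, positions, model):
    """Average negative log-probability the model assigns to the true hidden residues."""
    total = 0.0
    for i in positions:
        probs = model(masked_seq, i)
        total += -math.log(probs[seq[i]])
    return total / len(positions)


rng = random.Random(0)
sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence, not from the release
masked, hidden = mask_sequence(sequence, 0.15, rng)
loss = masked_loss(sequence, masked, hidden, uniform_model)
print(masked, hidden, round(loss, 4))
```

The uniform stand-in scores ln(20), about 3.0, on every hidden residue. Under this setup, training pushes the loss below that baseline by learning which residues tend to appear in which contexts.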

Biohub’s mission is to cure and prevent disease. We believe the path to that goal is understanding biology at its deepest level — and making the tools of that understanding available to every scientist. Together, ESMFold2, ESMC, and ESM Atlas constitute a state-of-the-art, openly available ecosystem for protein structure prediction and design — a shared foundation for any researcher working on fundamental biology or the development of new therapeutics.

###

About Biohub

Biohub is a 501(c)(3) biomedical research organization building the first large-scale initiative to combine frontier AI and frontier biology to solve disease. With its compute capacity, AI research and engineering, and state-of-the-art technology for measuring, imaging, and programming biology, Biohub is enabling scientists worldwide to use AI-powered biology to study how cells operate and organize as systems — ultimately understanding why disease happens and how to cure or prevent it. Learn more at biohub.org.

Press Contact
press@biohub.org

AI llm startup enterprise pricing

I think Anthropic and OpenAI have found product-market fit

Simon Willison argues that Anthropic and OpenAI have found product-market fit with coding and general-purpose AI agents, and are now charging enterprise customers for them at usage-based API rates.

Simon Willison’s Weblog

Summary

What: Simon Willison observes that both Anthropic and OpenAI have switched enterprise pricing from fixed seats to API token usage, Anthropic in November 2025 and OpenAI in April 2026. This aggressive pricing, combined with the higher token consumption of coding agents like Claude Code and Codex, is leading to significant enterprise costs. His own usage illustrates the gap: plans he pays $200/month for would cost over $2,000/month at API rates.

Why it matters: This shift from discounted enterprise seats to usage-based API pricing, driven by the token-hungry nature of coding agents, signals a crucial inflection point where frontier AI labs are finally converting widespread adoption into substantial, potentially profitable, revenue streams from enterprise customers.

Takeaway: Enterprises should audit their AI agent usage and re-evaluate their LLM spending, as the era of heavily discounted enterprise AI access may be ending, especially for coding-intensive workflows.

Deep Dive

Simon Willison posits that Anthropic and OpenAI have achieved product-market fit, particularly with coding and general-purpose AI agents.
Both companies have transitioned their enterprise pricing models from fixed-seat discounts to API token usage.
Anthropic made this change for its Enterprise plan around November 2025, and OpenAI followed in April 2026 for its various enterprise tiers.
New frontier models GPT-5.5 (April 23) and Opus 4.7 (April 16) also launched with higher API prices: 2x GPT-5.4 and around 1.4x Opus 4.6, respectively.
Willison pays $200/month in total for his Max and Pro plans ($100 each); the same 30 days of usage would have cost over $2,000 at API rates, highlighting the cost disparity.
He argues that coding agents are driving this revenue, as they consume vastly more tokens and are used by highly compensated professionals.
The article counters "AI failure stories," suggesting Uber's budget overruns and Microsoft's Claude Code cancellations are actually signs of high demand and aggressive pricing rather than disillusionment.
Anthropic's $1.25 billion per month compute agreement with SpaceX through May 2029 for inference capacity (COLOSSUS and COLOSSUS II) indicates the massive scale of their operations.
The shift to enterprise API revenue suggests AI labs are moving to cut out middlemen like Cursor and GitHub Copilot, which were historically among Anthropic's largest API customers.
April 2026 is identified as a new "inflection point" where the revenue implications of agent systems are becoming clear.
The author expects upcoming IPO S-1 documents from Anthropic and OpenAI to confirm these revenue trends.

Decoder

Product-market fit: The degree to which a product satisfies a strong market demand; a common startup milestone indicating a sustainable business model.
Coding agent: An AI system, like Anthropic's Claude Code or OpenAI's Codex, designed to perform programming tasks, from generating code to debugging and managing development workflows.
S-1 document: A registration form required by the U.S. Securities and Exchange Commission (SEC) for U.S. companies planning to go public, providing detailed financial and business information.

Original Article

I think Anthropic and OpenAI have found product-market fit

Anthropic are strongly rumored to be about to have their first profitable quarter. Stories are circulating of companies surprised at how expensive their LLM bills are becoming from usage by their staff. I think this is because OpenAI and Anthropic have both found product-market fit.

Enterprise customers are now paying API prices
I think they’ve found product-market fit
And they’re ramping up
The AI-failure stories around this are pretty thin
We also know the labs are spending a lot
API revenue is becoming less important
April is a new inflection point

Enterprise customers are now paying API prices

I currently subscribe to the $100/month Max plan from Anthropic and the $100/month Pro plan from OpenAI. If you are a heavy user of coding agents these plans are a fantastic deal. I just ran the ccusage tool on my laptop to get an estimate of how much I would have spent if I were to pay for API tokens in the past 30 days and got:

$1,199.79 for Anthropic Claude Code
$980.37 for OpenAI Codex

That’s $2,180.16 worth of tokens for $200—not bad at all! I’m a moderately heavy user of these tools, but I’m certainly not running agents every hour of the day and night.
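The comparison above can be reproduced in a few lines. This is only an illustrative sketch: the dollar figures are the ones quoted in the text, and the variable names are made up for the example.

```python
# API-equivalent cost over the past 30 days, as quoted above from ccusage.
api_cost = {"Anthropic Claude Code": 1199.79, "OpenAI Codex": 980.37}

# Two $100/month subscriptions (Anthropic Max and OpenAI Pro).
subscription_cost = 100 + 100

total_api = round(sum(api_cost.values()), 2)
multiple = total_api / subscription_cost

print(f"API-equivalent total: ${total_api:,.2f}")
print(f"Token value per subscription dollar: {multiple:.1f}x")
```

On these numbers the subscriptions deliver roughly eleven times their price in API-rate tokens.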

I had assumed that companies making extensive use of agents were getting similar discounts. It turns out I could not have been more wrong about that.

I haven’t been able to track down the exact date, but at some point in the last six months Anthropic switched their Enterprise plan (originally “Claude seats include enough usage for a typical workday” back in August 2025) to $20/seat/month plus API pricing for usage. This story about the change from The Information is dated Apr 14, 2026, but cites an Anthropic spokesperson claiming that the pricing change occurred in November 2025. Existing customers are finding out about the change as they renew their contracts.

OpenAI made a similar pricing change in April. The Codex rate card (Internet Archive copy) currently says:

Note: On April 2, 2026, we updated Codex pricing to align with API token usage, instead of per-message pricing. This change was applicable to new and existing Plus, Pro, ChatGPT Business and new ChatGPT Enterprise plans.

On April 23, 2026, we made this update for all existing ChatGPT Enterprise plans as well, inclusive of Edu, Health, Gov, and ChatGPT for Teachers.

It’s a little harder to decode as they quote prices in “credits”, but as far as I can tell those credit costs are an exact match for the API token costs listed for those models.

All of which is to say that as of April 2026 the “Enterprise” cost for both OpenAI Codex and Anthropic Claude Code/Cowork is the same as the listed API price.

GPT-5.5 (released April 23rd) is 2x the API price of GPT-5.4. Opus 4.7 (April 16th) is around 1.4x the price of Opus 4.6 when you take their new tokenizer into account.

So April saw both leading model companies release new frontier models with a higher API price, and both companies now have measures to lock their enterprise customers (who tend to sign year-long deals) at those API prices, not the previous extreme discounts.

I think they’ve found product-market fit

Why these sudden aggressive moves on pricing? Both Anthropic and OpenAI are planning to IPO, but I suspect there’s a more important factor here: I think they’ve finally found product-market fit, with the coding/general-purpose agent products embodied by Claude Code/Cowork and Codex.

Tools like ChatGPT are wildly popular, but that wild popularity has been difficult to turn into revenue. In February OpenAI boasted more than 900 million weekly active users for ChatGPT, but only 50 million—5.6% of that—were paying consumer subscribers.

Charging $10-$20/month per user is an OK business, but you’d need 1-2 billion subscribers sticking around for four years to cover $1 trillion in infrastructure.

Companies spending $200+/month/user will get you there a whole lot faster—and as noted above, as a power-user I’m at ~$1,000/month in API costs per vendor already.
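The break-even arithmetic in the two paragraphs above can be sketched as follows. It assumes revenue alone has to cover the $1 trillion, with no margins, churn, or discounting; the function name and the extra price points are illustrative, not from the article.

```python
TARGET_USD = 1_000_000_000_000  # $1 trillion in infrastructure
MONTHS = 4 * 12                 # subscribers stick around for four years

def subscribers_needed(price_per_month: float) -> float:
    """Subscribers required if each pays price_per_month for MONTHS months."""
    return TARGET_USD / (price_per_month * MONTHS)

for price in (10, 20, 200, 1000):
    millions = subscribers_needed(price) / 1e6
    print(f"${price}/month -> about {millions:,.0f} million subscribers")
```

Under these assumptions, $10 to $20 per month works out to roughly 2 billion down to 1 billion subscribers, matching the range quoted above, while $200 per month needs on the order of 100 million.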

Coding agents really did change everything. These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals. Right now that’s still mostly software engineers, but a coding agent is a tool that can automate anything you can do by typing commands into a computer... so they are clearly applicable to a much wider set of skilled knowledge workers.

As I’ve discussed on this site at length, the models released in November 2025 elevated agents to being genuinely useful. We’ve had six months to get used to that idea now—it’s no wonder companies are beginning to spend real money on this technology.

You could argue that ChatGPT achieved product-market fit when it became the fastest-growing consumer app in history back in February 2023... but it certainly wasn’t making any actual money back then. Coding agents plus enterprise pricing marks the point when these companies start making very real revenue. Maybe even enough to start covering their costs!

And they’re ramping up

As further evidence that enterprise agents represent product-market fit for these companies, consider their open job listings.

OpenAI have 703 open jobs right now, of which I’d categorize 229 (32.6%) as relating to enterprise sales and support—account executives, “Go To Market”, “Forward Deployed Engineers” and the like.

Anthropic have 390 open jobs, 105 (26.9%) of which look enterprisey to me.

It’s pleasingly ironic that these AI labs have picked a business model with such a heavy demand on human labor—enterprise sales contracts don’t close themselves without a whole lot of humans in the mix!

(I ran this analysis by scraping their job sites with Claude Code, then having it use Datasette’s JSON API to pipe that data into Datasette Cloud where I used Datasette Agent for the analysis, exported here. Dogfood!)

The AI-failure stories around this are pretty thin

I started digging into this in response to a growing volume of stories claiming that large companies were sounding the alarm because their AI usage costs had grown so large.

The most widely cited of these stories appear quite overblown to me.

The most discussed has been Uber, based on this report where CTO Praveen Neppalli Naga indicated that Uber had “maxed out its full year AI budget just a few months into 2026”, mostly thanks to Claude Code.

Given that Claude Code only got really good in November it’s entirely unsurprising to me that a budget set in 2025 may have failed to predict demand for that tool in 2026!

That Uber story was further fueled by comments made by Uber’s COO, Andrew Macdonald, on the Rapid Response podcast. I tracked down the segment and there really isn’t much there. Here’s what Andrew said:

But then you sometimes go and talk to your senior engineering leaders and you’re saying, OK, how many projects that were on the cutting room floor got moved above the line because of the productivity gains because 25% of our code commits were via Claude Code last quarter?

That link is not there yet, right? I think maybe implicitly there’s more that is getting shipped. But it’s very hard to draw a line between one of those stats and, OK, now we’re actually producing like 25% more useful consumer features, right? And that line is hard to draw.

Somehow this fragment turned into headlines like Uber’s COO says it’s getting harder to justify the money spent on AI tokenmaxxing, because the market for stories about AI failures remains enormous.

The other popular story around this is Microsoft starts canceling Claude Code licenses, ostensibly to encourage their engineers to dogfood their own Copilot CLI agent instead—but The Verge reporter Tom Warren says “sources tell me the decision is also a financial one”, triggered by the June 30th end of Microsoft’s financial year.

I think both of these stories support my “product-market fit” hypothesis. The best advice I ever heard on pricing a product was that your customer should suck air through their teeth and then say yes. Uber’s budget overrun and Microsoft’s seat cancellations look like that effect playing out in practice.

We also know the labs are spending a lot

The big AI labs spend billions of dollars on both training and inference. Credible figures are hard to come by, but we did get one huge hint as to the figures involved from, oddly enough, the recent SpaceX S-1:

[...] in May 2026, we entered into Cloud Services Agreements with Anthropic PBC (“Anthropic”), an AI research and development public benefit corporation, with respect to access to compute capacity across COLOSSUS and COLOSSUS II. Pursuant to these agreements, the customer has agreed to pay us $1.25 billion per month through May 2029 [...]

The Anthropic announcement said that this deal meant they could “increase our usage limits for Claude Code and the Claude API”, heavily implying that Colossus is being used for inference, not model training.

Anthropic already have vast amounts of compute from other providers. The fact that they’re willing to spend $1.25 billion per month for extra capacity from just one of their vendors hints at how big these inference budgets have become.
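For a rough sense of the total commitment, here is a back-of-the-envelope sketch. The excerpt gives the monthly rate and the end date but not the first billing month, so both start months below are assumptions, and the helper name is hypothetical.

```python
MONTHLY_USD = 1.25e9  # $1.25 billion per month, from the S-1 excerpt

def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months from start to end, counting both ends."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

# Assumed start months; the excerpt only says the deal was signed in May 2026.
for label, (year, month) in {"from May 2026": (2026, 5),
                             "from June 2026": (2026, 6)}.items():
    n = months_inclusive(year, month, 2029, 5)
    print(f"{label}: {n} months, about ${n * MONTHLY_USD / 1e9:.2f} billion")
```

Either way the commitment lands in the region of $45 billion over roughly three years.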

API revenue is becoming less important

Over the past two years my impression has been that OpenAI made more of their income from subscription revenue while Anthropic made more from their API.

Anthropic’s API revenue was historically quite dependent on a small number of large API customers—this VentureBeat story from August 2025 quotes “sources familiar with the matter” suggesting that just Cursor and GitHub Copilot were responsible for $1.2 billion of the company’s then-$4 billion revenue.

Today Anthropic are rumored to hit $10.9 billion in the second quarter, potentially even operating at a profit for the first time.

This pivot-to-Enterprise suggests that the labs have realized that the real money lies in cutting out the middlemen. Anthropic’s Claude Code directly competes with Cursor and Copilot. No wonder Cursor are investing in their own models!

April is a new inflection point

I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.

I think April 2026 is a new inflection point where the revenue implications of this have started to land, to the benefit of the frontier AI labs and with material impacts on the budgets of large companies.

We’ll know for sure how real this moment is when the S-1 documents for the upcoming Anthropic and OpenAI IPOs give us some real, audited numbers to get our teeth into.

AI research hardware computer-vision

NVIDIA's LocateAnything for Faster Grounding

NVIDIA introduces LocateAnything, a vision-language grounding framework that achieves over 10x faster decoding by predicting bounding boxes in parallel, rather than token-by-token.

NVIDIA Research

Summary

What: NVIDIA's LocateAnything is a new VLM (Vision-Language Model) framework leveraging Parallel Box Decoding (PBD) to decode entire bounding boxes or points as atomic units in a single step. This method, combined with the 138 million sample LocateAnything-Data dataset, significantly improves decoding throughput to 12.7 BPS (boxes per second) on an H100 GPU and enhances high-IoU localization quality across benchmarks like LVIS and COCO, outperforming Qwen3-VL and Rex-Omni.

Why it matters: This research addresses a fundamental bottleneck in VLMs by moving away from autoregressive token decoding for spatial information, highlighting the industry's drive for faster, more efficient, and robust multimodal AI, particularly for real-time applications like robotics and embodied agents.

Deep Dive

LocateAnything uses Parallel Box Decoding (PBD) to predict bounding boxes and points as atomic units in one step.
This contrasts with traditional VLMs that serialize 2D boxes into multiple 1D tokens, causing a sequential generation bottleneck.
Built on a Moon-ViT vision encoder and a Qwen2.5 language decoder, bridged by an MLP projector.
Achieves 12.7 BPS on an NVIDIA H100 GPU in the default Hybrid Mode, over 10x faster than Qwen3-VL (1.1 BPS) and 2.5x faster than Rex-Omni (5.0 BPS).
Improves mean F1 by +3.8% on LVIS and +1.8% on COCO compared to Rex-Omni.
Introduces LocateAnything-Data, a large-scale dataset with 138 million training samples and 785 million boxes for diverse localization tasks.
Supports flexible inference modes: Fast (MTP for max throughput), Slow (NTP for max stability), and Hybrid (defaulting to Fast, falling back to Slow for irregularity/ambiguity).
Excels in various tasks including general object detection, GUI element grounding, referring comprehension, text localization (OCR), and layout grounding.

Decoder

VLM (Vision-Language Model): An AI model capable of understanding and processing both visual information (images, videos) and natural language text, often bridging the two modalities.
Grounding: The task of connecting elements in a natural language query to specific regions or objects within an image.
Bounding Box: A rectangle, typically specified by its corner coordinates (x1, y1, x2, y2), that defines the location and extent of an object in an image.
IoU (Intersection over Union): A metric used to evaluate the accuracy of object detection models, measuring the overlap between a predicted bounding box and a ground truth bounding box.
Autoregressive: A model that predicts future values based on its own past outputs, generating sequences token by token.
Moon-ViT: A type of Vision Transformer, a neural network architecture for image recognition.
Qwen2.5: A large language model developed by Alibaba Cloud.
MLP Projector (Multi-Layer Perceptron Projector): A neural network component that maps features from one domain (e.g., visual tokens) to another (e.g., language decoder input).
BPS (Boxes Per Second): A measure of throughput for object detection and grounding models, indicating how many bounding boxes can be processed per second.
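As a concrete illustration of the IoU entry above, here is a minimal sketch of the standard computation for two axis-aligned boxes in (x1, y1, x2, y2) corner format. It is a generic textbook version, not NVIDIA's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap rectangle; width and height clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes offset by 5 pixels in each direction overlap in a 5x5 square.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A prediction counts as correct at a threshold such as IoU=0.95 only if this value meets or exceeds the threshold, which is why high-IoU results reward very tight boxes.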

Original Article

Overcoming Autoregressive Bottlenecks in VLM Grounding

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck due to strictly sequential generation.

We introduce LocateAnything, a unified generative grounding and detection framework based on Parallel Box Decoding (PBD). By decoding geometric elements such as bounding boxes and points as atomic units in a single step, LocateAnything preserves intra-box geometric coherence and unlocks substantial parallelism. We show that PBD improves both decoding throughput and localization accuracy.

We further develop a scalable data engine and curate LocateAnything-Data, a large-scale dataset with more than 138 million training samples, substantially increasing data diversity for high-precision localization. Extensive evaluations show that LocateAnything advances the speed–accuracy frontier, achieving significantly higher decoding throughput while improving high-IoU localization quality across diverse benchmarks. The results highlight the complementary benefits of Parallel Box Decoding and large-scale training data in enabling efficient and precise unified visual grounding and detection.

LocateAnything: Parallel Box Decoding

To reconcile high-throughput decoding with reliable localization, we propose LocateAnything, a unified framework for VLM-based visual detection and grounding built upon Parallel Box Decoding (PBD).

Box-Aligned Atomic Units

Input: An image and a natural language text query. The vision encoder extracts visual tokens at native resolution, preserving fine-grained spatial details for high-precision localization.
Parallel Decoding: LocateAnything treats each bounding box (or point) as an atomic unit of constant length and predicts the full coordinate set (x1, y1, x2, y2) in one parallel step, avoiding arbitrary chunking of coordinate tokens.
Architecture: Built upon a Moon-ViT vision encoder and a Qwen2.5 language decoder, bridged by an MLP projector, directly converting visual tokens into a sequence of box-aligned block-level predictions.
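To make the difference in sequential steps concrete, the toy sketch below counts decoding steps for the two formulations. It does not run a model: both functions simply replay known boxes, and it assumes one token per coordinate while ignoring label and separator tokens. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def decode_token_by_token(boxes: List[Box]) -> Tuple[List[Box], int]:
    """Toy stand-in for next-token decoding: one coordinate per step."""
    steps, out = 0, []
    for box in boxes:
        current = []
        for coord in box:
            current.append(coord)  # each coordinate waits on the previous one
            steps += 1
        out.append(tuple(current))
    return out, steps

def decode_box_parallel(boxes: List[Box]) -> Tuple[List[Box], int]:
    """Toy stand-in for Parallel Box Decoding: one whole box per step."""
    steps, out = 0, []
    for box in boxes:
        out.append(tuple(box))     # all four coordinates emitted together
        steps += 1
    return out, steps

boxes = [(10, 20, 110, 220), (5, 5, 50, 60), (0, 0, 3, 4)]
ntp_out, ntp_steps = decode_token_by_token(boxes)
pbd_out, pbd_steps = decode_box_parallel(boxes)
print(ntp_steps, pbd_steps)
```

Step counts are not wall-clock time, so the 4:1 ratio here should not be read as the measured speedup.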

Flexible Inference Modes

Fast Mode (MTP): Predicts full boxes in parallel for maximum throughput, suitable for latency- and compute-constrained settings such as on-device robotics and embodied agents.
Slow Mode (NTP): Decodes coordinate tokens autoregressively for maximum stability, appropriate for high-precision labeling, dataset curation, and accuracy-oriented offline evaluation.
Hybrid Mode: Uses Fast Mode by default and falls back to Slow Mode when format irregularity or spatial ambiguity is detected, preserving most speed gains while maintaining robust outputs.

On-Demand Inference: Corrected NTP Re-decoding

When parallel decoding encounters Format Irregularity (malformed syntax at category boundaries) or Spatial Ambiguity (intermediate coordinates between densely arranged objects), the compromised block is discarded and generation reverts to the last verified prefix. NTP then autoregressively generates tokens for the problematic block before switching back to MTP.
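The fallback logic described above can be sketched as a small control loop. This is a simplified illustration under stated assumptions: `fast_step` and `slow_step` are hypothetical stand-ins for the parallel and autoregressive decoders, and `is_well_formed` is a made-up check standing in for the irregularity and ambiguity detectors, whose actual criteria are not given here.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def is_well_formed(box: Optional[Box]) -> bool:
    """Hypothetical check: four coordinates with x1 < x2 and y1 < y2."""
    return (box is not None and len(box) == 4
            and box[0] < box[2] and box[1] < box[3])

def hybrid_decode(n_blocks: int,
                  fast_step: Callable[[List[Box], int], Optional[Box]],
                  slow_step: Callable[[List[Box], int], Box]):
    verified: List[Box] = []   # the last verified prefix
    fallbacks = 0
    for i in range(n_blocks):
        candidate = fast_step(verified, i)      # one parallel (MTP) step
        if not is_well_formed(candidate):
            # Discard the compromised block. `verified` is untouched, so the
            # slow (NTP) decoder resumes from the last verified prefix.
            candidate = slow_step(verified, i)
            fallbacks += 1
        verified.append(candidate)              # then continue in fast mode
    return verified, fallbacks

# Stub decoders: the fast path emits a malformed box for block 1.
truth = [(0, 0, 10, 10), (20, 5, 40, 30), (50, 50, 80, 90)]
fast = lambda prefix, i: truth[i] if i != 1 else (40, 5, 20, 30)
slow = lambda prefix, i: truth[i]

decoded, n_fallbacks = hybrid_decode(len(truth), fast, slow)
print(decoded, n_fallbacks)
```

Only the affected block pays the slow-path cost, which is how a hybrid scheme can keep most of the parallel speedup.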

138M Diverse Language Queries and 785M Boxes

To train a highly capable model for general-purpose visual detection and grounding, we curate LocateAnything-Data, a multi-domain dataset encompassing 12M unique images and massive, dense spatial supervisory signals.

General Object Detection

66.9% of queries and 83.1% of boxes. Provides essential bounding box supervision for precise and dense coordinate alignments.

GUI Element Grounding

16.5% of queries. Enables the model to support embodied agents and graphical user interface navigation tasks.

Referring Comprehension

7.3% of queries. Links complex natural language intents to specific spatial regions within images.

Text Localization (OCR)

3.6% of queries. Perceives and tightly grounds textual information within images.

Layout Grounding

3.5% of queries. Enriches the structural reasoning capabilities for document and scene layout understanding.

Point-Based Localization

2.2% of queries. Refines spatial precision for fine-grained coordinate predictions.
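The percentages above can be turned into approximate absolute counts. This sketch treats the totals as exactly 138 million queries and 785 million boxes, although the text says "more than 138 million", so the results are rough lower bounds; the variable names are illustrative.

```python
TOTAL_QUERIES_M = 138  # millions of queries (stated as "more than 138 million")
TOTAL_BOXES_M = 785    # millions of boxes

query_share_pct = {
    "General Object Detection": 66.9,
    "GUI Element Grounding": 16.5,
    "Referring Comprehension": 7.3,
    "Text Localization (OCR)": 3.6,
    "Layout Grounding": 3.5,
    "Point-Based Localization": 2.2,
}

approx_queries_m = {task: TOTAL_QUERIES_M * pct / 100
                    for task, pct in query_share_pct.items()}
detection_boxes_m = TOTAL_BOXES_M * 83.1 / 100  # detection holds 83.1% of boxes

for task, millions in approx_queries_m.items():
    print(f"{task}: about {millions:.1f}M queries")
print(f"General Object Detection: about {detection_boxes_m:.0f}M boxes")
```

The six query shares sum to 100%, so the listed categories account for the whole dataset.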

State-of-the-Art Visual Grounding & Detection

We report accuracy metrics and throughput (BPS, measured on a single NVIDIA H100 GPU) of LocateAnything under the default Hybrid Mode. LocateAnything achieves 12.7 BPS, over 10× faster than text-based Qwen3-VL (1.1 BPS) and 2.5× faster than quantization-based Rex-Omni (5.0 BPS).
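The speedup factors follow directly from the reported throughputs. A quick check, using only the BPS figures quoted above:

```python
bps = {"LocateAnything": 12.7, "Qwen3-VL": 1.1, "Rex-Omni": 5.0}

# Ratio of LocateAnything's throughput to each baseline's throughput.
speedup = {name: bps["LocateAnything"] / value
           for name, value in bps.items() if name != "LocateAnything"}

for name, ratio in speedup.items():
    print(f"vs {name}: {ratio:.1f}x")
```

This gives roughly 11.5x against Qwen3-VL and 2.5x against Rex-Omni, consistent with the "over 10×" and "2.5×" claims.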

High-Quality Multi-Object Detection

Results on LVIS and COCO. LocateAnything improves the mean F1 by +3.8% on LVIS and +1.8% on COCO compared to Rex-Omni at identical model size, with particularly strong gains at high IoU thresholds (31.1 vs. 20.7 at IoU=0.95 on LVIS).

Dense Object Detection. On dense detection benchmarks Dense200 and VisDrone, LocateAnything achieves 58.7 and 39.9 mean F1 respectively, outperforming Rex-Omni (58.3 / 35.8), with the larger margin on VisDrone, demonstrating superior boundary delineation in heavily overlapping environments.

Precise Open-World Localization

GUI Grounding (ScreenSpot-Pro). LocateAnything achieves a SOTA mean F1 of 60.3, surpassing generalist VLMs like Qwen3-VL-30B-A3B and specialized models such as GUI-Owl-32B, with particularly strong performance on icon-based queries.

Layout Grounding & OCR. LocateAnything establishes new standards on document understanding: 76.8 and 70.1 mean F1 on DocLayNet and M6Doc respectively, outperforming Rex-Omni by substantial margins (+6.1 / +14.5). On TotalText OCR, it achieves 43.3 mean F1, surpassing all compared methods.

Referring Expression Comprehension. LocateAnything seamlessly aligns nuanced human intents with visual regions, achieving 78.7 mean F1 on HumanRef and remaining highly competitive on RefCOCOg against top-tier models.

Point-Based Localization. Evaluation on point-based grounding across COCO, LVIS, Dense200, VisDrone, HumanRef, and RefCOCOg benchmarks.

Analyzing Design Choices and Decoding Efficiency

We conduct ablation studies on the COCO dataset to validate our core designs across coordinate representation, MTP formulation, decoding mode, box output order, and throughput scaling.

Coordinate Representation, MTP Formulation & Decoding Modes. (a) PBD (Slow Mode) achieves the highest F1 of 52.1, indicating that the box-aligned formulation provides stronger supervision than 1D serialization. (b) PBD dramatically outpaces structure-agnostic MTP methods (16.9 BPS vs. 5.5 BPS for SDLM-B6) while improving F1. (c) Joint training pushes Slow Mode to 52.1 F1; Hybrid Mode preserves most speed gains (13.2 BPS) at 51.6 F1.

Decoding Mode Comparison. Joint dual-formulation training successfully pushes the Slow Mode upper bound from 50.1 to 52.1 F1. Hybrid Mode seamlessly resolves the speed-accuracy trade-off, achieving robust high-precision localization while preserving most speed gains.

Box Ordering & Decoding Throughput. Left: X-Y Corner Order sorting yields the highest F1-score among four spatial ordering strategies. Right: As target boxes increase from 20 to 300, NTP methods suffer from a severe latency bottleneck, while Parallel Box Decoding achieves a 2× to 6× speedup, scaling throughput from 12 BPS to ~25 BPS in dense scenes.

High-Quality Grounding In The Wild

LocateAnything achieves precise visual grounding across document understanding, GUI interaction, and object detection tasks.

[Figure: qualitative LocateAnything results. Panels: Dense Object Detection, High-precision OCR, Referring Expression Comprehension.]

If you find LocateAnything's parallel box decoding useful for your research, please consider citing our work.

@article{wang2025locateanything,
  title   = {LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding},
  author  = {Shihao Wang and Shilong Liu and Yuanguo Kuang and Xinyu Wei and Yangzhou Liu and Zhiqi Li and Yunze Man and Guo Chen and Andrew Tao and Guilin Liu and Jan Kautz and Lei Zhang and Zhiding Yu},
  journal = {arXiv preprint arXiv:2605.27365},
  year    = {2026},
}

AI hardware policy startup enterprise

Nvidia bets $150B on Taiwan as Trump's plan to make US an AI hub backfires

NVIDIA CEO Jensen Huang announced that the company's spending in Taiwan is heading to $150 billion a year, including a new Taiwan headquarters expected to be operational by 2030, solidifying the island's role as the "epicenter" of AI manufacturing and potentially conflicting with President Trump's push to make the US the world's AI hub.

Ars Technica

Summary

What: Jensen Huang stated that NVIDIA will invest $150 billion annually in Taiwan to create a new headquarters and expand partnerships with TSMC, Foxconn, Wistron, and Quanta Computer. This move aims to leverage Taiwan's advanced packaging technology and supply chain for AI chips, ensuring NVIDIA can meet surging demand, especially for upcoming systems like Vera Rubin, despite Trump's earlier push for US domestic manufacturing.

Why it matters: This decision underscores the continued, arguably irreplaceable, strategic importance of Taiwan in the global semiconductor and AI supply chain, even as major nations like the US attempt to onshore production, revealing the deeply entrenched and specialized nature of advanced manufacturing.

Deep Dive

NVIDIA CEO Jensen Huang announced a $150 billion annual investment in Taiwan, including a new headquarters expected to be operational by 2030.
The investment aims to secure Taiwan's position as the "epicenter" of AI manufacturing.
Four to five years ago NVIDIA spent about $10-15 billion a year in Taiwan; Huang says it now spends about $100 billion, heading to $150 billion.
The move supports expansion of partnerships with TSMC and other key players like Foxconn, Wistron, and Quanta Computer.
This strategy leverages Taiwan's advanced packaging technology, which is not yet available at TSMC's US factories.
Huang's actions appear to prioritize global supply chain efficiency over President Trump's push for domestic AI chip manufacturing.
Trump's plan to take a 25 percent cut of certain NVIDIA chips sold to China has reportedly backfired: China has refused to buy the chips, reportedly because chips subject to the fee must be routed through the US.
NVIDIA fears supply chain constraints for its upcoming Vera Rubin AI system, making Taiwan crucial.
Despite US efforts to diversify the chip supply chain, Taiwan still produces over 90% of the world's most advanced semiconductor chips.

Decoder

TSMC (Taiwan Semiconductor Manufacturing Company): The world's largest dedicated independent semiconductor foundry, a critical supplier for companies like NVIDIA.
Advanced Packaging Technology: Sophisticated techniques for connecting and arranging semiconductor chips on a substrate to improve performance, power efficiency, and form factor, often crucial for high-performance AI chips.
Vera Rubin: NVIDIA's announced next-generation AI system, following current architectures like Blackwell.

Original Article

In a splashy move that signals that Taiwan remains irreplaceable to the AI industry’s short-term and long-term goals, Nvidia CEO Jensen Huang announced Wednesday that his chip company will invest $150 billion a year to make sure Taiwan remains at the “epicenter” of the “AI revolution.”

“This is where the chips come, packaging comes, this is where the systems are made, this is where AI supercomputers were created,” Huang said. “The number of partners we work with here in Taiwan, incredible.”

As Reuters reported, the substantial investments will be used to create a new Taiwan headquarters for Nvidia, which Huang expects will drive so much AI innovation that the partnership will cement Taiwan as “the world’s tech manufacturing hub for a long time.” That ambitious project will be operational by 2030, Nvidia anticipates, after breaking ground this year.

“Four years ago, five years ago, Nvidia was spending about 10, 15 billion dollars a year in Taiwan,” Huang said at a ceremony celebrating the launch of the company’s new Taiwan base. “Now we’re spending 100, going to 150 billion dollars in Taiwan each year.”

Nvidia is currently the world’s most valuable company, making history in 2025 after becoming the first company to reach a $5 trillion market capitalization. And Huang bragged that the Taiwan base will make sure Nvidia is “worth even more in three to five years.”

But Huang has so far not explained how Nvidia’s plans in Taiwan may potentially conflict with Donald Trump’s push to make the US the world’s AI hub.

Nvidia did not immediately respond to Ars’ request to comment on this seeming tension.

Nvidia needs Taiwan HQ to meet demand

Last April, Nvidia started producing AI chips on US soil for the first time. The move seemed designed to appease Trump, who had been pressuring US firms to increase domestic manufacturing, a top priority of his AI Action Plan.

At that time, Huang said that “the engines of the world’s AI infrastructure are being built in the United States for the first time,” because “adding American manufacturing helps us better meet the incredible and growing demand for AI chips and supercomputers, strengthens our supply chain, and boosts our resiliency.”

Over the next four years, he projected that Nvidia could produce up to half a trillion dollars of AI infrastructure in the US—but it was hard to see how Nvidia could race to achieve that result when the company still relied on shipping chips to Taiwan for advanced packaging.

Now, Huang seems to be confronting that reality head-on, prioritizing more investments and deepening partnerships in Taiwan at a time when Huang claims that overwhelming demand for agentic AI is accelerating AI factory buildouts “at extraordinary speed,” The Guardian reported.

While the US investments will surely factor into Nvidia’s growth, it’s the Taiwan HQ that seemingly matters most.

Tech giants collectively plan to spend $750 billion on AI infrastructure this year, with “a significant portion” of that expected to “go towards chips for data centers,” the Guardian noted, and Nvidia needs a plan to keep up with that rapidly spiking demand. Then there’s also Nvidia’s new AI system, Vera Rubin, to consider, which Huang claimed would be a “generational leap” that’s going to be “kicking off the greatest infrastructure buildout in history.” Nvidia fears it will face supply chain constraints “throughout the entire life of Vera Rubin,” Huang said.

Perhaps to Huang, the Taiwan base looks like a lifeline for that and future systems.

Before Trump’s AI Action Plan rolled out, Nvidia manufactured its AI chips exclusively in Taiwan. So, the firm is well acquainted with the benefits of working in that ecosystem.

With its Taiwan HQ, Nvidia hopes to expand its partnership with the Taiwan Semiconductor Manufacturing Company (TSMC), while benefiting from close proximity to advanced packaging technology not yet available at TSMC’s US factories. And Nvidia can also “boost its alliances” with other nearby partners playing “key roles in the build-out of AI servers and infrastructure,” like Foxconn, Wistron, and Quanta Computer, Reuters reported.

For Nvidia, the focus appears to be on expanding the AI ecosystem to further its bottom line. Earlier this month, Huang told CNBC that Nvidia would be “aggressively” expanding its supply chain and suggested that the “first priority for its growing cash pile was supporting suppliers amid surging demand.”

Trump’s plans for Nvidia chips backfired

Trump has not yet commented on Nvidia’s plans in Taiwan, but the US president has repeatedly praised Huang as brilliant, while consulting with Huang on AI industry and tariff questions. Over the past year, their ties have grown, with Huang making commitments to perhaps avoid the consequences of Trump’s tariff regimes. Last year, Huang paid $1 million to attend a Mar-a-Lago dinner, then promised to invest $500 billion in US data centers. Shortly afterward, Trump halted plans for export controls blocking some of Nvidia’s chips from China’s market.

But Huang may be too smart to be all-in on Trump’s AI plans, perhaps increasingly recognizing that Trump’s export controls and tariffs aren’t working as planned to ensure US dominance in AI.

Directly impacting Nvidia, Trump’s plan to give the US a 25 percent cut of sales of certain Nvidia chips to China seemingly backfired, since China has refused to purchase the chips. China’s refusal reportedly stems not from the fee itself but from a requirement that all chips subject to the fee be routed through the US. China seems worried that the US might tamper with chips sold in its markets, and Nvidia is pretty sure that Beijing won’t budge on buying its chips any time soon, so long as Trump’s policy remains in place.

For Huang, the goal remains to sell Nvidia chips in China’s market, which the company recently told investors it has “largely conceded” to Huawei. And about a month ahead of Trump’s meeting with China’s president, Xi Jinping, Huang told the US think tank the Special Competitive Studies Project that Trump’s export curbs blocking its chips from China have “already largely backfired.”

“Conceding an entire market the size of China probably don’t make a lot of strategic sense,” Huang said, whereas giving US chip companies access to China’s market where AI demand is spiking “makes a lot of sense.”

But Huang has to be careful navigating Trump, who likely still relies on Huang despite their perhaps disparate views on where the global AI hub should be. When Trump tapped Huang at the last minute to attend a summit with China’s president, Xi Jinping, in Beijing, Huang reportedly dropped everything to go, seemingly in the hopes that Trump would convince China to buy Nvidia chips.

However, experts agreed that Trump had little leverage at the summit, and it was later confirmed that US export curbs were not discussed. After the meeting, Trump confirmed that China had no plans to buy Nvidia’s chips because “they want to develop their own” and already have a chip that’s more advanced than Nvidia’s product, the H200.

Looming chip tariffs

Ultimately, the summit may have been a wasted trip for Huang, who might be tiring of Trump’s trade tactics, despite exemptions from tariffs that have seemingly benefited Nvidia.

So far, Trump has exempted semiconductors to be used in data centers from tariffs. But Nvidia likely knows that could change soon.

In July, official investigations into whether more tariffs are needed to protect national security will conclude. Among the most feared possibilities for the AI industry is that Trump “may issue ‘significant’ additional tariffs” on semiconductors used in data centers in order “to encourage domestic manufacturing,” the supply chain management newsletter Supply Chain Dive reported.

Currently, the US only fully manufactures about 10 percent of the chips it requires, a Trump proclamation read. That is “too low to meet projected national defense needs and to match the requirements of a growing commercial industry,” Trump said, ordering the probes to see if substantial tariffs might be needed to stop firms from relying so much on importing semiconductors.

Last week, US trade representative Jamieson Greer said that “the Trump administration continues to weigh US tariffs on imported semiconductors to boost domestic chip manufacturing, though there are no immediate plans to impose any new levies,” Bloomberg reported. However, “Greer stressed the importance of using import duties to bring chip production back to the US,” confirming that Trump’s goal is to “facilitate the reshoring” of the semiconductor supply chain.

Huang bets on Taiwan

For Nvidia, commitments to invest in the US may be enough to avoid future tariffs, Greer suggested. That makes it appear as if Huang has been successful at both influencing and staying ahead of Trump’s next moves.

But Trump seems unlikely to take kindly to Huang’s mission to ensure Taiwan maintains dominance in the semiconductor industry.

Trump has recently sent confusing signals on the US position on Taiwan, which he has irrationally accused of stealing the semiconductor industry from the US. Last October, Taiwan rejected Trump’s demands to move 50 percent of its chip production into the US or else lose US protection from a potential Chinese invasion. Although Trump recently approved the largest-ever weapons package to support Taiwan’s defense, he has said it’s up to Xi to decide if China will invade Taiwan or not, a remark that experts warned signaled US indifference.

Although the US likely needs more time than Trump’s presidency to achieve the goals of the AI Action Plan, Trump seemingly thinks that pressuring Taiwan to shift its production could be a shortcut.

Whether Taiwan will ever bend to that pressure remains to be seen, as it has sought to strengthen its own communications with the Trump administration. Experts have suggested that explosive AI demand will, over time, diminish the lead of Taiwan, which currently produces over 90 percent of the world’s most advanced semiconductor chips. Countries that have experienced global chip shortages have realized that it’s foolish to rely on one supplier, and it’s expected that the supply chain will diversify, as leading nations pioneering AI build up their own domestic manufacturing or seek to support allies doing the same.

Huang does not appear to expect Taiwan’s dominance to wane any time soon, though. He was born in Taiwan before emigrating to the US at the age of 9, and while he did not indicate exactly how long he intends to invest $150 billion a year into Taiwan projects, he did suggest that Nvidia’s future hinged on establishing a headquarters there, while seeming to take pride in Taiwan’s accomplishments.

“Taiwan is booming,” Huang said at the launch.

Data infrastructurekafkaperformance

Kafka Share Groups and Parallelizing Consumption — Tuning max.poll.records

Kafka Share Groups improve parallelism but require careful tuning of max.poll.records to prevent a few consumers from "greedy capturing" large batches and degrading throughput.

Jack Vanlightly

Summary

What: Jack Vanlightly's tests on Kafka 4.2.0/4.3.0 show that the default max.poll.records of 500 is often too high for Share Groups, leading to "greedy-capture" where a few consumers hog records. Optimal performance at 60K msg/s with 5ms processing required setting max.poll.records to 30 and adjusting consumer count.

Why it matters: This highlights that new Kafka features like Share Groups change core performance bottlenecks, requiring re-evaluation of long-standing configurations and potentially custom solutions to ensure fair resource distribution in distributed systems.

Takeaway: When using Kafka Share Groups, calculate group.share.partition.max.record.locks / consumers-per-partition and set max.poll.records to that value, then tune slightly lower for optimal throughput.

Deep Dive

Kafka Share Groups allow consumption parallelization beyond partition count, unlike traditional Consumer Groups.
The primary bottleneck with Share Groups shifts to the combination of max.record.locks (broker-side) and max.poll.records (consumer config).
The default max.poll.records of 500 can lead to "greedy-capture," where a few consumers receive large batches, exhausting the inflight record budget for a partition.
This causes other consumers to sit idle, reducing overall throughput, even if the theoretical maximum is much higher.
An "accidental fair-sharing" regime can occur at low loads, masking suboptimal max.poll.records settings, but it collapses under higher load or consumer restarts.
The article demonstrates how a system with 300 consumers peaking at 60K msg/s initially dropped to 4800 msg/s with default max.poll.records=500.
Tuning max.poll.records to 30 and increasing consumers to 312 restored the 60K msg/s throughput with stable end-to-end latency.
A rule of thumb for max.poll.records is group.share.partition.max.record.locks / consumers-per-partition, then tune slightly lower.
Increasing group.share.partition.max.record.locks (up to 10000) can also help by providing a larger inflight budget.
The author suggests Kafka could benefit from an alternative broker-side fair-sharing enforcement mechanism in future versions.
Dimster, a benchmarking tool, was used for live interaction to mutate workloads and observe config changes on the fly.
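The rule of thumb above can be sketched as a small calculation. This is a minimal sketch, not the author's tooling: the function names are hypothetical, and the lock budget of 2000 and the partition count of 6 are illustrative assumptions that do not come from the article. Only the 60K msg/s target, the 5ms processing time, and the consumer counts are taken from the summary.

```python
import math


def consumers_needed(target_msgs_per_sec: float, processing_ms: float) -> int:
    """Consumers required if each one processes a single record at a time."""
    per_consumer_rate = 1000.0 / processing_ms  # records per second per consumer
    return math.ceil(target_msgs_per_sec / per_consumer_rate)


def max_poll_records_ceiling(max_record_locks: int, consumers: int, partitions: int) -> int:
    """Rule of thumb: per-partition lock budget divided by consumers per partition."""
    consumers_per_partition = consumers / partitions
    return math.floor(max_record_locks / consumers_per_partition)


def busy_consumers(max_record_locks: int, max_poll_records: int,
                   consumers: int, partitions: int) -> int:
    """Upper bound on consumers holding records when every batch is full."""
    full_batches_per_partition = max_record_locks // max_poll_records
    return min(consumers, full_batches_per_partition * partitions)


# Assumed values for illustration only (not stated in the article).
ASSUMED_LOCKS = 2000
ASSUMED_PARTITIONS = 6

print(consumers_needed(60_000, 5))                                      # 300
print(max_poll_records_ceiling(ASSUMED_LOCKS, 312, ASSUMED_PARTITIONS)) # 38
print(busy_consumers(ASSUMED_LOCKS, 500, 300, ASSUMED_PARTITIONS))      # 24
print(busy_consumers(ASSUMED_LOCKS, 30, 312, ASSUMED_PARTITIONS))       # 312
```

Under these assumed values, full batches of 500 leave at most 24 consumers holding records, and at 200 msg/s each that caps throughput near 4800 msg/s. A batch size of 30 sits below the computed ceiling of 38, so every consumer can hold records at once.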

Decoder

Kafka Share Groups: A new Kafka consumer group type that allows multiple consumers within a group to process records from the same partition in parallel, providing finer-grained control over workload distribution compared to traditional Consumer Groups.
max.poll.records: A consumer configuration that determines the maximum number of records returned in a single call to poll().
group.share.partition.max.record.locks: A broker-side configuration that limits the total number of records that can be "in-flight" (locked by consumers for processing) simultaneously from a single partition within a Share Group.
Greedy-capture regime: A state where a few consumers receive very large batches of records, occupying a significant portion of a partition's inflight record budget, while other consumers remain idle, leading to reduced overall throughput.
Accidental fair-sharing regime: A temporary state at low producer rates where the broker doesn't have enough records to fill large max.poll.records batches for every consumer, unintentionally distributing records more evenly.
Dimster: A benchmarking and load testing tool for Kafka that allows real-time modification of workloads and configurations.

Original Article

Full article content is not available for inline reading.

Data databaseaivector-databasedistributed-systems

How CockroachDB Built Vector Indexing at Scale

CockroachDB developed C-SPANN, a novel distributed vector indexing system that treats index data as regular table data, overcoming scalability limitations of existing algorithms like HNSW and IVF.

ByteByteGo

Summary

What: CockroachDB’s engineering team built C-SPANN, a hierarchical K-means tree vector index stored as key-value rows, for real-time, sharding-compatible, distributed vector search. This custom solution addresses architectural requirements like no central coordinator, persistent storage, minimal network hops, and incremental updates, which popular methods like HNSW failed to meet.

Why it matters: This demonstrates the challenges of integrating advanced AI capabilities like vector search into existing, highly distributed, transactional database architectures, often requiring custom engineering rather than off-the-shelf solutions to maintain core guarantees.

Takeaway: If evaluating vector databases for a distributed transactional system with strict consistency and real-time update needs, consider systems that integrate vector indexing natively into their distributed storage model.

Deep Dive

CockroachDB needed vector search but existing algorithms (HNSW, IVF) didn't fit its distributed SQL architecture.
Key architectural requirements for the vector index included: no central coordinator, no large in-memory caches, minimal network hops, sharding compatibility, hot spot avoidance, and real-time incremental updates.
HNSW was ruled out due to its in-memory graph structure and resistance to sharding; IVF struggled with single-node assumptions and dynamic updates.
CockroachDB developed C-SPANN, combining ideas from Microsoft's SPANN/SPFresh and Google's ScaNN, adapted for its distributed environment.
C-SPANN uses a hierarchical K-means tree where vectors are grouped into partitions with centroids.
The critical design choice is storing each partition as self-contained key-value rows within CockroachDB, treating the index as ordinary table data.
This allows C-SPANN to leverage CockroachDB's existing distributed machinery for splitting, rebalancing, caching, and replication automatically.
Index maintenance, like splitting overly large partitions and merging small ones, happens incrementally in the background using "nearest partition assignment" for accuracy.
Vectors are compressed using RaBitQ (reducing 1536-dimension 2-byte floats to ~200 bytes per vector) to save storage and improve scan speed.
Quantization loss is compensated by a reranking step: searching compressed vectors for candidates, then fetching full-precision vectors for exact distance calculation.
Multi-tenancy is handled via prefix columns (e.g., user_id or geographical regions) on the vector index, creating separate K-means trees per tenant for performance and security.
C-SPANN offers real-time transactional freshness, native scaling, and vectors coexisting with transactional data.
Current limitations (as of the 25.2 preview) include Euclidean distance only, limited filtering on non-prefix columns, and not winning on pure latency benchmarks against specialized in-memory systems.
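The two-stage search described above (scan compressed vectors for candidates, then rerank with full-precision vectors) can be illustrated with a simplified sketch. This is not C-SPANN or RaBitQ: it uses plain sign bits and Hamming distance as a stand-in for the compressed scan, and it omits the random rotation and distance estimation that the real technique relies on. All names and the synthetic data are hypothetical.

```python
import math
import random


def sign_bits(vector):
    """Compress a vector to one bit per dimension (1 if positive, else 0)."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Count the positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean(a, b):
    """Exact Euclidean distance between two full-precision vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, vectors, codes, k=3, candidates=10):
    """Return the indexes of the k nearest vectors using scan-then-rerank."""
    query_code = sign_bits(query)
    # Stage 1: cheap scan over compressed codes to build a shortlist.
    shortlist = sorted(
        range(len(vectors)), key=lambda i: hamming(query_code, codes[i])
    )[:candidates]
    # Stage 2: fetch full-precision vectors and rerank by exact distance.
    reranked = sorted(shortlist, key=lambda i: euclidean(query, vectors[i]))
    return reranked[:k]


random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(64)] for _ in range(200)]
codes = [sign_bits(v) for v in vectors]

# A query that is a slightly perturbed copy of vector 17.
query = [x + random.gauss(0, 0.05) for x in vectors[17]]
result = search(query, vectors, codes)
print(result)
```

The shortlist size trades recall for cost: a larger `candidates` value fetches more full-precision vectors but is less likely to miss a true neighbor that the lossy scan ranked poorly.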

Decoder

Vector embeddings: Numeric representations of objects (like images, text, audio) where similar objects have similar vectors, enabling semantic search.
Approximate Nearest Neighbor (ANN) search: Algorithms that find vectors "close" to a query vector, sacrificing some accuracy for significantly faster search performance on large datasets.
HNSW (Hierarchical Navigable Small World): A popular graph-based ANN algorithm known for high accuracy, often used in vector databases, but typically relies on large in-memory structures and is difficult to shard.
IVF (Inverted File Index): An ANN algorithm that clusters vectors into a predefined number of centroids; queries search only a subset of clusters, faster than brute force but can struggle with dynamic updates in distributed settings.
C-SPANN: CockroachDB's custom-built distributed vector indexing system, based on a hierarchical K-means tree structure.
K-means tree: A hierarchical clustering structure where vectors are grouped into partitions with representative centroids, and those centroids are further grouped, forming a tree.
Centroid: The mean position of a cluster of vectors, representing its "center of mass."
Quantization: A technique to reduce the precision (and thus size) of vector dimensions, trading some accuracy for storage and performance gains.
RaBitQ: A specific quantization technique used by C-SPANN that reduces each vector dimension to a single bit.
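The "~200 bytes per vector" figure in the Deep Dive follows from simple arithmetic on the numbers given there. The sketch below is only that arithmetic, with a hypothetical helper name. The gap between 192 and roughly 200 bytes is presumably per-vector bookkeeping, which is an assumption rather than something the summary states.

```python
def vector_bytes(dimensions: int, bits_per_dimension: int) -> int:
    """Bytes needed to store one vector, rounded up to whole bytes."""
    return (dimensions * bits_per_dimension + 7) // 8


full_precision = vector_bytes(1536, 16)  # 2-byte floats
one_bit_codes = vector_bytes(1536, 1)    # one bit per dimension

print(full_precision)                  # 3072
print(one_bit_codes)                   # 192
print(full_precision / one_bit_codes)  # 16.0
```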

Original Article

Full article content is not available for inline reading.

Data clouddatabaseperformancedevops

I Inherited a $140K Snowflake Bill — Three Months Later It Was $38K. Here's Everything I Learned

A developer reduced a $140K Snowflake bill to $38K in three months by aggressively right-sizing warehouses, optimizing storage retention, and redesigning queries to leverage physical data layout and incremental pipelines.

Level Up Coding

Summary

What: The author successfully cut Snowflake costs by 73% from $140K to $38K monthly. Key strategies included right-sizing virtual warehouses, setting aggressive auto-suspend times (60 seconds), reducing storage by optimizing Time Travel and Fail-safe retention, and critical query optimizations like clustering data by common predicates and preferring incremental data pipelines.

Why it matters: This case study illustrates that while cloud data warehouses offer immense scalability, their cost efficiency heavily depends on understanding the underlying billing model and applying disciplined data engineering practices, especially around compute resource management and physical data organization.

Takeaway: Review your Snowflake (or similar cloud data warehouse) virtual warehouse sizes, auto-suspend settings, and data retention policies; audit top queries for SELECT *, function-wrapped predicates, and opportunities for clustering or incremental processing.

Deep Dive

Snowflake costs are driven by storage, compute (virtual warehouses), and cloud services.
The biggest initial savings came from optimizing compute:
Right-sizing virtual warehouses: Reduced from X-Large to Large for most, with a Small for simple reporting.
Aggressive auto-suspend: Set to 60 seconds for all warehouses, ensuring they shut down quickly when idle.
Multi-cluster warehouses: Enabled for high-concurrency workloads to scale query processing independently.
Storage costs were reduced by:
Optimizing Time Travel: Reduced retention from 90 days to 1 day for most tables (or 0 for temporary ones).
Fail-safe retention: Fixed at 7 days for permanent tables; transient and temporary tables have no Fail-safe period, so transient tables suit data that can be rebuilt.
Reducing staging data: Cleaned up unnecessary data in staging tables.
Cloud services costs were often a symptom of inefficient compute and storage, reducing as other areas improved.
Query optimization techniques for performance and cost:
Clustering: Used only when predicates heavily filtered on clustered columns; avoided for tables not frequently filtered this way.
Avoid SELECT *: Explicitly select columns to reduce I/O.
Avoid function-wrapped filters: WHERE DATE(column) = '2023-01-01' can prevent micro-partition pruning. Use WHERE column >= '2023-01-01' AND column < '2023-01-02'.
Incremental pipelines: Process only new or changed data instead of full reloads.
Pre-aggregation: Aggregate data before complex joins to reduce data volume.
UNION ALL vs. OR: UNION ALL can be more efficient than OR clauses across different datasets.
Zero-copy cloning: Useful for testing without duplicating storage.
Materialized Views: Can pre-compute frequently queried results for faster access.
The author used ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY and QUERY_HISTORY to identify cost drivers.
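The compute savings above come down to credits per hour multiplied by billed hours. A rough sketch, assuming Snowflake's standard warehouse credit rates (each size doubles the one below it): the function name and every hour figure are hypothetical placeholders, not numbers from the article.

```python
# Standard warehouse credit rates per hour; each size doubles the previous one.
CREDITS_PER_HOUR = {
    "X-Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "X-Large": 16,
}


def monthly_credits(size: str, busy_hours_per_day: float,
                    idle_billed_hours_per_day: float, days: int = 30) -> float:
    """Credits billed in a month: busy time plus idle time before auto-suspend."""
    billed_hours = busy_hours_per_day + idle_billed_hours_per_day
    return CREDITS_PER_HOUR[size] * billed_hours * days


# Hypothetical before/after: a long auto-suspend leaves 4 idle hours billed daily;
# a 60-second auto-suspend is assumed to cut that to about half an hour.
before = monthly_credits("X-Large", busy_hours_per_day=6, idle_billed_hours_per_day=4)
after = monthly_credits("Large", busy_hours_per_day=6, idle_billed_hours_per_day=0.5)

print(before)  # 4800
print(after)   # 1560.0
```

The sketch assumes queries take the same time on the smaller warehouse. In practice a smaller size can lengthen runtimes, so the busy hours should be re-measured from WAREHOUSE_METERING_HISTORY after each change.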

Decoder

Virtual warehouse: Snowflake's compute clusters, which execute queries. Their size (Small, Medium, Large, etc.) determines processing power and cost.
Auto-suspend: A setting for Snowflake virtual warehouses that automatically suspends the warehouse after a specified period of inactivity, stopping compute billing.
Time Travel: A Snowflake feature that allows accessing historical data (e.g., for querying past states of a table or recovering dropped tables) for a configurable duration.
Fail-safe: A non-configurable 7-day period in Snowflake after Time Travel expires that provides additional data protection for permanent tables but still incurs storage costs.
Micro-partitions: Snowflake's fundamental unit of storage, automatically created and optimized for columnar storage, compression, and querying.
Clustering: A Snowflake feature (Enterprise edition+) that physically co-locates data with similar values in specified columns within micro-partitions to improve query performance on those columns.
Zero-copy cloning: A Snowflake feature that allows creating copies of databases, schemas, or tables instantly without consuming additional storage until data is modified in the clone.

Original Article

Snowflake cost and performance hinge on three separable layers: storage, compute, and cloud services, with the biggest savings coming from right-sizing warehouses, aggressive auto-suspend, and reducing storage bloat from retention settings. The strongest optimization levers are physical data layout and query design: use clustering only when predicates match, avoid SELECT *, function-wrapped filters, and full reloads, and prefer incremental pipelines and pre-aggregation before joins.
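The advice on function-wrapped filters can be sketched as a helper that builds a half-open range on the bare column instead of wrapping it in DATE(). The helper name and the `created_at` column are hypothetical. The string interpolation is for illustration only, and real code should pass the dates as bind parameters.

```python
from datetime import date, timedelta


def day_range_predicate(column: str, day: date) -> str:
    """Build a half-open [day, day + 1) filter that leaves the column unwrapped."""
    next_day = day + timedelta(days=1)
    return f"{column} >= '{day.isoformat()}' AND {column} < '{next_day.isoformat()}'"


predicate = day_range_predicate("created_at", date(2023, 1, 1))
print(predicate)
# created_at >= '2023-01-01' AND created_at < '2023-01-02'
```

Because the column appears unmodified on the left side of each comparison, the engine can compare the literal bounds against per-partition minimum and maximum values and skip partitions that fall outside the range.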

Data aiagentsdatabasegraph

RushDB 2.0: Memory Infrastructure for the Agentic Era

RushDB 2.0 unifies graph storage, native semantic search, and an Ontology API into a single agent memory infrastructure, eliminating the need for separate vector stores.

RushDB Blog

Summary

What: RushDB 2.0 introduces native vector support for semantic search, an Ontology API for authoritative schema discovery by agents, MCP access with OAuth 2.0, pre-written "Skills" for common agent tasks, analytical queries with select + groupBy, and the option to "Bring Your Own Neo4j" instance.

Why it matters: This release indicates a maturation in the infrastructure for AI agents, moving towards integrated, schema-aware memory systems that simplify agent development and reduce hallucinations by providing structured context.

Takeaway: Developers building AI agents might investigate RushDB 2.0 for a unified memory solution that promises to reduce complexity compared to stitching together multiple data stores.

Deep Dive

RushDB 2.0 integrates native semantic search directly into its graph database, managing embedding indexes internally and supporting custom embedding models.
The new Ontology API provides agents with an authoritative schema, including labels, types, value ranges, and relationship maps, to prevent hallucinations.
An MCP (Model Context Protocol) server with OAuth 2.0 is included, exposing over 35 RushDB tools for discovery, read, write, AI, and analytics operations to agents.
"Skills" are pre-written knowledge files for AI coding agents, covering topics like agent memory patterns, query building, data modeling, and faceted search.
Analytical queries now support select + groupBy with functions like sum, avg, min, max, collect, and timeBucket, directly within the query layer.
Users can connect their own Neo4j instances to RushDB, allowing data to remain within their infrastructure using the LMPG model.
A new "Knowledge Unit" (KU) pricing model is introduced, reflecting data creation and querying, with a more generous free tier.

Decoder

MCP (Model Context Protocol): An open, standardized protocol for AI agents to interact with external tools, services, and data sources.
LMPG (Labeled Meta Property Graph): RushDB's data model, an extension of the labeled property graph model in which nodes and relationships carry explicit type labels, improving schema clarity and query performance.
Ontology API: An API that provides an authoritative, machine-readable schema of the data, helping AI agents understand available labels, types, and relationships.

Original Article

Full article content is not available for inline reading.

Data aiinfrastructuredatabasecaching

MurrDB (GitHub Repo)

MurrDB is a fast NVMe/S3-backed serving cache optimized for batch reads/writes of large tabular data for ML/AI inference, offering a cheaper, lower-latency alternative to Redis.

GitHub

Summary

What: MurrDB is a RocksDB-based cache designed for ML/AI inference workloads, providing tiered storage (memory, NVMe, S3), native batch operations over columnar data, a zero-copy wire protocol for Python (NumPy/Pandas/Polars/PyTorch), and stateless persistence on S3. Benchmarks show it's significantly faster and uses less RAM than Redis for typical ML ranking use cases.

Why it matters: This project addresses a common pain point in ML/AI serving: efficiently retrieving large, tabular feature data without keeping everything in expensive RAM. It indicates a move towards specialized data infrastructure tailored for the unique demands of AI inference.

Takeaway: If you are dealing with large tabular data for AI inference and finding Redis too slow or expensive for batch operations, consider evaluating MurrDB as a specialized serving cache.

Deep Dive

MurrDB operates as a caching layer between batch data pipelines and inference applications, with data tiered across memory, NVMe, and S3.
It is designed for batch-in, batch-out operations over columnar storage, allowing direct ingestion of Parquet/Arrow files.
The zero-copy wire protocol allows direct conversion to NumPy, Pandas, Polars, or PyTorch arrays in Python without additional parsing overhead.
MurrDB is stateless, persisting all data to S3, which allows it to self-bootstrap and avoid data loss on node eviction.
It is specifically not a general-purpose database, OLTP system, analytics database, or general-purpose cache.
Benchmarks indicate MurrDB is approximately 3x faster than Redis on packed-blob reads and 12x faster on HSET layouts for ML ranking, while using about 3x less RAM.
The project is still in early days but is actively developing features like Arrow Flight/gRPC APIs, more data types, and Apache Iceberg support.

Decoder

NVMe (Non-Volatile Memory Express): A specification for accessing non-volatile storage media attached via a PCI Express (PCIe) bus, offering much higher throughput and lower latency than traditional SATA SSDs.
RocksDB: An open-source, embeddable persistent key-value store optimized for fast storage and retrieval on modern hardware, developed by Facebook.
Zero-copy wire protocol: A communication protocol designed to transmit data without requiring the CPU to copy it from one memory location to another, reducing latency and CPU overhead.
Columnar storage: A database storage method that stores data tables by column rather than by row, optimized for analytical queries where aggregates are calculated over many rows for a subset of columns.
Parquet/Arrow files: Optimized columnar data formats used for efficient data interchange and storage in big data ecosystems.

Original Article

Full article content is not available for inline reading.

Tech hardwareairoboticsmanufacturing

Tesla's dedicated Optimus factory construction officially underway at Giga Texas

Tesla has begun construction on a dedicated Optimus robot factory at Giga Texas, aiming to produce 27,000 robots daily by Summer 2027, projecting Optimus to become Tesla's most valuable product.

Teslarati

Summary

What: Tesla started building a new 5.2 million square foot Optimus factory at Gigafactory Texas, with the first steel structure erected on May 27, 2026. Initial Optimus production will start in July or August 2026 in Fremont, with high-volume output of 10 million units annually (about 27,000 daily) expected from the Texas facility by Summer 2027.

Why it matters: This move signifies Tesla's aggressive shift beyond electric vehicles into humanoid robotics, positioning AI-powered machines as a core, potentially dominant, revenue stream and major contributor to its valuation, reflecting Elon Musk's long-stated ambition for Optimus.

Decoder

Optimus: Tesla's humanoid robot project, designed to perform general-purpose tasks.

Original Article

Tesla’s dedicated factory for building up to ten million Optimus units is officially under construction at Gigafactory Texas.

Drone footage released on May 27 by Giga Texas observer Joe Tegtmeyer captures the significant milestone of the first steel structure officially standing at Tesla’s new Optimus factory on the North Campus of the facility.

Phase two of land reclamation is advancing steadily, and the progress will let the new building extend nearly the full length of the main Giga Texas factory, potentially exceeding 4,000 feet, while measuring somewhere between 50 and 70 meters narrower. Extensive foundation work is proceeding as well.

Big news at the new Optimus 10m/y factory construction site today! The 1st steel structure has been erected & as expected the second phase of land reclamation is underway.

This will allow this new factory to grow to nearly the same length as the main Giga Texas factory,… pic.twitter.com/FidRLV6XpU

— Joe Tegtmeyer 🚀 🤠🛸😎 (@JoeTegtmeyer) May 27, 2026

This facility forms a central element of Tesla’s broader North Campus expansion at Giga Texas. The project will add more than 5.2 million square feet of new industrial space. It sits alongside other advanced developments, including a Terafab for next-gen AI chips. The scale reflects Tesla’s commitment to transforming humanoid robotics into a core pillar of the company’s future.

Musk has said that Optimus will be the biggest product in the world on several occasions. He believes it will be Tesla’s biggest valuation contributor.

Tesla plans to build about 10 million robots at the site annually once it is completed, which would be about 27,000 units each day.

The Optimus plant at Giga Texas is part of Tesla’s phased strategy for Optimus manufacturing. In an effort to start production of the robot well before the Giga Texas plant is complete, Tesla ended production of the Model S and Model X vehicles, which were built in Fremont, California, to make way for initial Optimus manufacturing efforts.

Production there will start in either July or August of this year, and early units will support internal factory tasks while the team gathers real-world data to refine processes. The Gigafactory Texas facility will house a second-gen production line. It targets high-volume output starting in Summer 2027.

Musk has repeatedly described Optimus as potentially more valuable than Tesla’s entire vehicle business. Current versions are already completing minor tasks around various facilities, while Tesla continues to refine its abilities and add new features.

Tesla’s total investment could reach several billion dollars. Significant challenges lie ahead, including the creation of an entirely new manufacturing ecosystem, the refinement of AI systems for dependable autonomy, and the development of reliable supply chains for actuators, sensors, and other components.

Nevertheless, the visible progress at Giga Texas highlights Tesla’s capacity to translate ambitious concepts into physical reality.

Tesla’s Optimus factory stands as much more than a simple expansion project, as it is quite literally the second phase of what could potentially be the biggest product ever. With construction beginning, 2027 is poised to become a transformative year for Tesla, as it evolves even further from an electric vehicle leader into a pioneer of intelligent, general-purpose machines.

SpaceX to become America’s Military data backbone for missiles, drones, and warfighters

The Space Force just handed SpaceX $2.29 billion to build the military’s space internet backbone.

The U.S. Space Force awarded SpaceX a $2.29 billion contract on May 26, 2026 to build the backbone of its Space Data Network, a satellite-based communications system designed to keep American military forces connected anywhere on Earth in real time. The contract is firm-fixed-price and requires SpaceX to deliver a fully operational prototype by the end of 2027.

In plain terms, the SDN Backbone is the plumbing behind the military’s space-based internet. It functions as a low Earth orbit satellite constellation providing robust, high-capacity, and low-latency data transport for the Joint Force, connecting sensors and weapons systems continuously, globally, and securely. Think of it as a private, hardened version of Starlink built specifically for battlefield communications, one that soldiers, ships, and aircraft can rely on even in contested environments where ground-based networks have been disrupted.

The Space Force was direct about why SpaceX was selected. “The SDN Backbone leverages the best of commercial innovation and delivers a strong foundation for the SDN mission set — a huge benefit and enabler for our warfighters,” said USSF Col. Ryan Frazier.

“We aren’t trading speed for scale; we are demanding both. By using rapid prototyping and Other Transaction Authorities, we are ensuring our advanced solutions are integrated and delivered to the warfighter as fast as possible,” added USSF Lt. Col. Fry, SDN Backbone system program manager.

The SDN Backbone will work alongside the Space Development Agency’s Transport Layer, with the two systems forming a unified open architecture to provide critical data transport for current and future Department of War missions.

As Teslarati has reported, this is not SpaceX’s first Space Force contract of 2026. In April, the Space Force awarded SpaceX $178.5 million to launch missile tracking satellites, and SpaceX is already embedded in the Golden Dome missile defense software group. The $2.29 billion SDN Backbone award puts SpaceX at the center of how the American military communicates in space, a position with direct implications for its reported $1.75 trillion IPO valuation as the company heads toward a public offering as early as June 2026.

Tesla teases going Plaid Mode with the Model 3

Tesla Vice President of Vehicle Engineering, Lars Moravy, recently revealed the company has thought about introducing a Plaid powertrain on the Model 3, but there could be some challenges involved.

On the Ride the Lightning podcast, Moravy revealed that he thinks about a Plaid Model 3 “all the time,” and it certainly has a place in Tesla’s potential lineup of future vehicles.

Now that the Plaid powertrain is technically defunct due to the newfound absence of the Model S and Model X, Tesla could find a way to reintroduce the lightning-quick trim level to its mass-market vehicles.

But there are going to be some challenges with it. Moravy said that the Model 3 Plaid would likely adopt the carbon-sleeved motors that the Model S Plaid had. However, packaging would be a major challenge: as Moravy said on the podcast, it would be a “tight engineering squeeze.”

It’s important to note that there are no active production plans for the Model 3 Plaid at this point. Still, with the Model S and Model X Plaid no longer available, Tesla would likely be willing to introduce something even more white-knuckle than the Model 3 Performance, which already boasts a 2.9-second 0-60 MPH time and a top speed of 163 MPH.

Of course, there is the Roadster, but we don’t know exactly when it will make it to market, and we know for sure that it will not be accessible to many.

Tesla has prided itself on building some of the best cars out there, but it is also interested in building cars that are simply fun to be in.

A Plaid Model 3 could truly push the limits and could end up being one of the best cars Tesla will ever build, especially if it can shave off at least half of a second from its 0-60 MPH time and increase its top speed slightly.

More than anything, the real changes will be in the ride and aerodynamics. Tesla improving things like the suspension, handling, and downforce will be the true trademarks of its Plaid powertrain; putting it in the Model 3 could be a great move for the company and for customers interested in high-end performance.

NASA’s first human outpost on the Moon starts now – SpaceX on deck

NASA named the rovers, landers, and vendors that will build America’s first Moon Base.

NASA has laid out its most detailed Moon Base plan to date, describing a permanent outpost near the Moon’s south pole that the agency intends to build over the coming decade as a direct stepping stone to Mars. “The Moon Base will be America’s and humanity’s first outpost on another celestial world,” NASA Administrator Jared Isaacman said, adding that every mission, crewed and uncrewed, “will be a learning opportunity as we return to the lunar surface, build the infrastructure to stay, and master the skills required to live and operate in one of the most demanding and dangerous environments imaginable.”

The plan is structured in three phases involving both uncrewed and crewed missions to deliver equipment, vehicles, and infrastructure to the surface, with the first three moon base missions targeted to launch before the end of 2026.

Moon Base I, targeting fall 2026, will use Blue Origin’s Blue Moon Mark 1 lander to deliver scientific instruments to the Shackleton Connecting Ridge, the same region where Artemis astronauts will land. Moon Base II will send Astrobotic’s Griffin lander carrying more than 1,100 pounds of cargo including Astrolab’s FLIP rover to begin developing mobility systems on the surface. Moon Base III will carry the Lunar Vertex science mission on Intuitive Machines’ Nova-C Trinity lander to study lunar swirls near the south pole, with ESA and Korean science payloads aboard.

On the rover side, NASA awarded Astrolab $219 million and Lunar Outpost $220 million to build the first phase of Lunar Terrain Vehicles, with both rovers targeted for deployment to the lunar surface by 2028. Astrolab’s crewed rover weighs roughly 2,000 pounds and can reach over 6 mph. Lunar Outpost’s Pegasus rover can operate autonomously or via remote control at over 9 mph. Blue Origin separately received $188 million with an option worth $280.4 million to deliver cargo landers for rover transport.

NASA also confirmed that MoonFall, a mission deploying four survey drones to scout Artemis landing sites, has selected Firefly Aerospace to build the transport spacecraft, with a 2028 launch target.

SpaceX sits at the center of that commercial layer. SpaceX holds the NASA Human Landing System contract for the Starship-derived lander that will put astronauts on the surface under Artemis IV, currently targeting 2028. Before that can happen, SpaceX must demonstrate in-orbit propellant transfer at scale, a process requiring multiple Starship tanker launches to fuel a single mission. Water ice at the lunar south pole is central to the base’s long-term viability, as it can be converted into drinking water, breathable oxygen, and rocket fuel, directly reducing dependence on Earth resupply. That resource loop becomes far more practical if Starship can land and be refueled on or near the Moon itself.

Elon Musk has publicly stated that Starship V3, which recently completed its first flight, should be capable enough for initial Mars missions. The Moon Base plan announced Tuesday is the infrastructure layer that connects everything between those two ambitions, and SpaceX is the only American company currently contracted to build the rocket that gets humans to either destination.

Tech securitymobileios

Apple Developing iPhone Anti-Snatching Feature That Locks Stolen Phones Instantly

Apple is developing a new iPhone feature that automatically locks a device if it detects a snatch-and-grab theft using sensors and a paired Apple Watch, similar to an existing Android capability.

MacRumors

Summary

What: Apple is working on an anti-snatching feature for iPhones, detected in Apple code by 9to5Mac. It will use the gyroscope, accelerometer, and other sensors, along with an Apple Watch connection, to instantly lock the iPhone and activate Stolen Device Protection if a theft is detected. The release date is unknown, but Android already offers a similar "Theft Detection Lock."

Why it matters: This feature aims to enhance iPhone security against opportunistic thieves who may observe passcodes before snatching devices, addressing a specific type of physical theft that existing security measures, like Stolen Device Protection, might not fully prevent when a passcode is known.

Decoder

Stolen Device Protection: An existing iOS security feature that adds biometric authentication requirements and time delays for sensitive actions (like changing Apple ID password) when the iPhone is away from familiar locations.

Original Article

Apple Developing iPhone Anti-Snatching Feature That Locks Stolen Phones Instantly

Apple is developing a new feature that will lock your iPhone if it's snatched from your hand by a thief, according to Apple code seen by 9to5Mac. The option will use the gyroscope, accelerometer, and other sensors to determine when an iPhone has been grabbed. It'll also rely on a paired Apple Watch to detect when the iPhone has suddenly moved away from the owner's wrist.

Once the iPhone is yanked from your hand, it will lock and activate Stolen Device Protection to prevent thieves from accessing information on it.

Stolen Device Protection adds extra security to your iPhone when you're away from familiar locations like home or work. It requires biometric authentication for actions like accessing stored passwords or credit cards, and there are built-in hour-long delays for actions like changing an Apple Account password.

The feature was originally designed to protect iPhone users from stealthy thieves who observe someone's passcode and then snatch an iPhone. With a passcode, thieves could get into apps and access bank account data and other sensitive information, but Stolen Device Protection prevents that from happening.

Android already has a Theft Detection Lock feature that locks a smartphone in a snatch-and-grab theft situation.

There is no word on when the new feature might be added to the iPhone.

Tech aillmagentsdevopstools

Beyond the Prompt: Claude Code

Arpan Patel argues Claude Code should be treated as an autonomous agent requiring careful configuration of `.claude` directories, skills, and subagents, rather than a simple chatbot, to maximize its effectiveness and prevent mistakes.

Arpan Patel (github.io)

Summary

What: Arpan Patel details how to effectively use Claude Code, emphasizing treating it as a programmable agent instead of a chatbot. Key practices include using "plan mode," referencing files directly (e.g., `@src/auth/login.py`), delegating tasks, and, crucially, making Claude "update CLAUDE.md" with rules after its mistakes. The `.claude` directory provides a layered configuration system for project-specific and global instructions, skills, agents, and rules, with `.claude/skills` being the preferred method for reusable commands.

Why it matters: This guide illustrates a shift in how developers interact with advanced AI coding assistants, moving from simple prompting to sophisticated configuration and meta-programming, where the AI is trained to learn from its own failures and adapt its behavior to specific project conventions and workflows. It highlights the emerging paradigm of "AI operations" (AIOps) for developer tools.

Takeaway: For Claude Code users, immediately start updating your `CLAUDE.md` and `CLAUDE.local.md` with specific project rules and personal feedback to improve the model's accuracy and adherence to best practices. Explore creating custom skills and subagents for repetitive tasks.

Deep Dive

Claude Code should be treated as an autonomous agent with guardrails, not a chatbot or fancier autocomplete.
The core principle is to give Claude a way to verify its own work, shifting from user feedback to Claude's internal iteration.
Recommended workflow includes "explore, then plan, then code," using Shift+Tab twice to enter read-only plan mode for larger changes.
Referencing files directly (e.g., @src/auth/login.py) and piping errors (cat error.log | claude) provides exact context.
Users should delegate tasks to Claude, providing crisp briefs, rather than pair-programming line-by-line.
Crucially, when Claude makes a mistake, the user should instruct it to "Update CLAUDE.md so you don't repeat this," which compounds its learning.
The .claude directory acts as a layered configuration system, with project-scoped files (committed to repo) and global-scoped files (local to machine).
CLAUDE.md provides instructions for every session; Boris Cherny advocates for keeping it short, focused on guardrails, and letting Claude write its own rules.
CLAUDE.local.md is for personal, gitignored notes and feedback, helping correct individual quirks and recurring PR comments.
"Skills" (folders under .claude/skills/) are reusable units of expertise, preferred over single-file commands for their ability to include supporting files and inline shell commands (!git diff HEAD).
Skills support progressive disclosure, where only frontmatter descriptions are read initially, pulling full instructions only when the skill fires.
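
To make the progressive-disclosure idea concrete, here is a minimal Python sketch. It is a toy model, not Claude Code's actual loader: the function names (`split_frontmatter`, `build_index`, `load_instructions`), the sample skill text, and the hand-rolled `key: value` parser are all assumptions made for illustration. Only the general shape (frontmatter with a name and description, body read later) comes from the summary above.

```python
# Illustrative only: a toy model of progressive disclosure,
# not how Claude Code itself is implemented.

# Hypothetical SKILL.md contents, kept in memory so the sketch is self-contained.
SAMPLE_SKILL = """---
name: review-diff
description: Review the current git diff for risky changes
---
Full instructions go here.
Inspect the diff and list risky changes.
"""


def split_frontmatter(text):
    """Split a SKILL.md-style document into (metadata dict, body string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: treat it all as body


def build_index(skills):
    """Cheap first pass: keep only each skill's frontmatter description."""
    return {name: split_frontmatter(text)[0].get("description", "")
            for name, text in skills.items()}


def load_instructions(skills, name):
    """Expensive second pass: pull the full body once a skill is selected."""
    return split_frontmatter(skills[name])[1]


skills = {"review-diff": SAMPLE_SKILL}
index = build_index(skills)
print(index)
print(load_instructions(skills, "review-diff"))
```

The point of the split is that the index stays small no matter how long each skill's instructions are; the body is only read for the one skill that actually fires.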

Decoder

Claude Code: An AI-powered coding assistant developed by Anthropic, designed to help developers with various programming tasks.
SKILL.md: A markdown file used within Claude Code's skill system to define a reusable command or set of instructions, often including frontmatter for metadata and inline shell commands.
Frontmatter: Metadata included at the top of a file, typically in YAML format, providing structured information about the content (e.g., skill description, name, allowed tools).

Original Article

Full article content is not available for inline reading.

Read the original article →

Tech startuphardwareaiautomotive

Self-driving, Tesla, and the influence of brand

Tesla's Full Self-Driving (FSD) is inaccurately perceived as market-leading due to strong brand influence, despite being a Level 2 system with significant competition and a declining brand image, especially in Europe due to the "Musk Effect."

Ian Betteridge

Summary

What: The article critiques the perception that Tesla's Full Self-Driving (FSD) is unique, clarifying it's a Level 2 driver-assist system like Ford BlueCruise and GM Super Cruise, requiring constant driver supervision. While FSD offers broad coverage, competitors like GM Super Cruise (750,000 miles of LiDAR-mapped roads) often provide a better experience within their mapped areas. Tesla's brand has become politically charged and less "cool" since 2022, leading to significant drops in recommendation scores (e.g., 8.2 to 4.0 in the US) and European registrations (e.g., 49% drop in Jan-Feb 2025).

Why it matters: This highlights how powerful brand perception can override factual product capabilities, especially in consumer markets like automotive. It also underscores the increasing impact of a CEO's public persona on brand value and sales, showing that even innovative tech companies are not immune to political and cultural shifts.

Decoder

Level 2 Autonomous Driving: Systems where the car can control steering, acceleration, and braking in certain conditions, but the human driver must remain fully engaged and supervise at all times. Examples include Tesla FSD, Ford BlueCruise, and GM Super Cruise.
Level 3 Autonomous Driving: Systems where the car can handle all aspects of driving in specific environments and conditions, and the driver can legally take their eyes off the road. The system is legally responsible for driving in these scenarios, though the driver must be ready to intervene when prompted. Mercedes Drive Pilot was the first certified Level 3 in the US.
LiDAR: Light Detection and Ranging, a remote sensing method that uses pulsed laser to measure distances, often used in autonomous vehicles for highly accurate environmental mapping.

Original Article

I've been a Stratechery subscriber since it started and one thing I have noticed about Ben is that he occasionally gets a little overexcited about stuff that is probably not totally close to his wheelhouse. This is one of those times.

In his opening about Tesla's brand halo, he notes:

I know plenty of very rich people who drive a Tesla not for the finishes but rather the Full Self-Driving (Supervised); there is nothing like it on the market, at least when it comes to cars you can own.

(Emboldening is mine).

The problem is, Ben is simply wrong to say "there's nothing like it on the market" and it's a perfect illustration of both the impact of Tesla's brand halo and why brand matters a lot in cars.

Where Ben is sort-of correct about FSD (Supervised) is that it does have a genuine and distinctive strength: breadth of coverage. Tesla doesn't limit FSD to a pre-mapped road network. If the cameras see usable lane lines and the system is confident enough, it will typically engage. FSD extends operation to surface streets, neighbourhood roads, and complex intersections, with over-the-air updates that can change behaviour overnight. No other mass-market consumer system matches that scope.

However... Tesla FSD (Supervised) remains a Level 2 technology. Systems such as Tesla FSD, Ford BlueCruise, GM Super Cruise, and BMW Highway Assistant all fall into this same category. The car can steer, accelerate, and brake in certain conditions, but the driver remains responsible and must supervise at all times.

And there is real competition within the Level 2 space. GM Super Cruise is available across more than 20 Chevrolet, Cadillac, GMC, and Buick models and is highly regarded for highway use. Super Cruise covers roughly 750,000 miles of LiDAR-mapped roads in the US and Canada. Its tradeoff is consistency within mapped territory versus Tesla's broader but less predictable coverage. But if you're in a covered area, the experience is likely to be much better than Tesla's.

Mercedes Drive Pilot was the first Level 3 system certified for consumer use in the US, which is technically more autonomous than FSD — at Level 3, the system, not the driver, is legally responsible for driving, and you can (legally) take your eyes off the road. However, the Drive Pilot Level 3 system would only work at low speeds on a few stretches of highway, making it look limited compared to Tesla FSD's ability to drive in almost all conditions including city traffic. For what it's worth, Mercedes has since abandoned Drive Pilot and is moving to a Level 2 approach, partly due to low adoption. Mercedes found that not many people were willing to pay that much for a feature that could only be engaged under strict conditions.

This, by the way, is a problem that Tesla will also face when moving from Level 2 to Level 3. They will have to charge a heck of a lot more for Level 3, and it's not clear that most/anyone will pay for it.

The claim is particularly shaky if you look outside the US. Car makers like BYD and companies like Wayve are increasingly offering high-quality self-driving capabilities for consumer vehicles in a similar vein to Tesla FSD. Chinese manufacturers are moving fast in this space. I have Chinese friends whose car drives them to work every day, supposedly with their supervision, but actually while they read emails. It's that reliable, at least in big cities.

But all of this sort-of underlines the point that Ben is ultimately making: Tesla's brand halo is such that people believe it's a leader in self-driving, even when it's actually not. As with most decisions connected to cars (something I learned working on car review brands) people only make semi-rational decisions even when they think they are being entirely rational. You like certain brands, and you look for rational reasons for that choice, even when that means ignoring areas where the competition is at the same level or better. Brand loyalty means an incredible amount in the car business.

The brand is a blessing and a curse

All of which is why I think Tesla is actually in big trouble.

Like a lot of new companies, the brand perception has moved through phases. First, maybe from 2008-2012, it was the scrappy innovator, a Silicon Valley startup that wanted to prove EVs weren't slow or dull. Elon Musk was seen as a combination of Edison and Steve Jobs, a genius who was going to have statues raised to him in the future. Then it was a luxury disruptor, thanks to the Model S. The Model Y then brought the brand within reach of a new audience, and it became a mass market brand. That takes us up to about 2022.

Since then, though, cracks have appeared. First, as legacy brands launched credible EVs, Tesla's technological lead began to narrow, and its relatively spartan interiors started to feel like a liability rather than a design statement. It had quality issues which were addressed slowly.

Consumer perception metrics were particularly troubling. The company's recommendation score in the US reached a new low of 4.0 out of 10, down from 8.2 in 2023. Scores for reputation, trust, and "coolness" declined particularly sharply in Europe and Canada.

And of course there is the most corrosive thing of all: The Musk Effect. Musk's political activities — from running DOGE in the Trump White House to endorsing far-right figures like Germany's AfD and Tommy Robinson in the UK — sparked consumer backlash that persisted throughout 2025, and continues now. As Forrester analyst Dipanjan Chatterjee put it, Musk's reputation "rubbed the sheen right out of what was once a darling and soaring automotive brand."

Despite increased EV registrations across the continent, Tesla registrations in Europe dropped by 49% through January and February 2025. In Sweden, Germany, and the UK, registrations fell by over 80%, 46%, and 68% respectively. Europe, where environmental and political values are especially salient to purchase decisions, has been the hardest hit.

It's not entirely one-way. While new-buyer metrics collapsed, Tesla's US loyalty score among existing owners actually increased slightly, from 90% to 92%. People who already own a Tesla still mostly intend to stick with the brand. The problem is with acquisition — convincing new buyers to choose Tesla over a rapidly widening field of alternatives. Five car makers now outrank Tesla in brand value: Toyota, Mercedes-Benz, Volkswagen, Porsche, and BMW, while BYD's brand value jumped 23% in the same period Tesla's fell 36%.

In short: Tesla went from underdog Silicon Valley insurgent, to aspirational luxury disruptor, to mainstream EV dominant, and has now arrived somewhere considerably more uncomfortable — a brand whose product is still respected by owners but which has become politically charged, competitively pressured, and no longer "cool" in the way it once was, particularly in Europe. And when purchases of cars are so heavily influenced by brand perception, that would be a big, big issue.

Tech securitypolicycryptodata

Google Employee Charged With Insider Trading on Polymarket

A Google employee, Michele Spagnuolo, has been charged with fraud and money laundering after insider trading on Polymarket using nonpublic search data.

The Wall Street Journal

Summary

What: Michele Spagnuolo, a Google employee, faces charges of fraud and money laundering. The charges stem from his alleged use of nonpublic Google search data to place bets on Polymarket, a decentralized prediction market.

Why it matters: This incident highlights the significant legal and ethical risks associated with employees having access to sensitive internal data, especially when combined with emerging decentralized platforms that can be used for financial speculation.

Decoder

Polymarket: A decentralized prediction market platform built on blockchain technology, where users can bet on the outcome of future events.
Insider trading: The illegal practice of using nonpublic, material information obtained through one's position to make trades for personal profit, giving an unfair advantage.

Original Article

Michele Spagnuolo has been charged with fraud and money laundering after making bets based on nonpublic search data.

Tech careerproductivityengineering

Fast is better than slow

Patrick Dubroy argues that being fast is intrinsically better for programmers because it accelerates learning, decision-making, and ability to try multiple solutions.

Dubroy.com

Summary

What: Patrick Dubroy, co-creator of Ohm, suggests that top programmers are fast, leading to quicker data collection, faster learning, and the ability to test multiple approaches. He advises practices like not delaying work, reclaiming small chunks of time, sharing work at 70% completion, asking for advice, picking battles in code reviews, and doing only what's required.

Why it matters: This perspective challenges the common notion that speed is a consequence of skill, suggesting it's a foundational practice that actively builds skill, advocating for efficiency and pragmatism over perfectionism in software development.

Takeaway: Consider adopting the "70% rule" for sharing work or pull requests to get feedback sooner, and focus on doing only what's required to avoid unnecessary speculative work.

Deep Dive

Patrick Dubroy observes that highly effective programmers are fast, a characteristic he believes enables them to be great programmers rather than being a result of it.
Speed allows for quicker data acquisition, leading to better decisions and faster learning over time.
Being fast also enables trying multiple approaches to a problem, increasing the likelihood of finding the best solution.
Dubroy differentiates this from "hustle culture," advocating for smarter, faster work habits rather than simply working longer hours.
Key suggestions include "don't delay" on problems, addressing them immediately rather than postponing them until later.
"Reclaim small chunks" by using fragmented time productively, rather than only relying on long, uninterrupted blocks to get work done.
"Don't worry about looking dumb" by sharing work early and often, embracing feedback on 70% complete work rather than striving for 100% perfection before sharing.
Ask colleagues for advice instead of trying to solve everything alone, emphasizing that software development is a team sport.
"Pick your battles" in code reviews, quickly addressing minor reviewer comments rather than arguing, and avoiding nitpicking when reviewing others' code.
"Do only what's required," focusing on the minimum necessary to fulfill a request to avoid wasted effort on speculative "above and beyond" work.

Original Article

Fast is better than slow

Don’t question why. Fast is better than slow. That’s just how it is. Your job is to take everything you can already do and do it faster.

If you can embrace the idea that fast is intrinsically better than slow, you’re halfway home. If you can get an entire team of players to embrace that idea, you’re going to win a lot of games.

All other things being equal, if I can get the ball from Point A to Point B with one touch, it is better than getting it there in two touches. Why? Because one touch is faster than two touches, and fast is better than slow.

About 10 years ago, I realized all the best programmers I had worked with had something in common: they were fast. By that I mean that they moved quickly: we’d discuss a problem and an hour or two later they’d already have a patch ready or a prototype to show off.

It took me a while, but eventually I realized: they weren’t fast because they were great programmers, they were great programmers because they were fast.

Think about it — if you’re fast, you get data more quickly. That helps you make better decisions, sooner. It also means you learn faster, and over longer periods it means you learn more. Being fast also means you can try out multiple approaches to a problem and pick the best one.

A lot of people push back on this because it sounds like hustle culture. But there are lots of ways to move faster that don’t involve working long hours. Jamie Brandon has written a pair of excellent posts on this: Speed matters and Moving faster. You should go and read those if you haven’t already.

I have a few suggestions of my own — things that are a bit more about the messy reality of working as a software engineer than they are about coding per se. And I’m slightly embarrassed to admit that, unlike Jamie, they took me more than a decade to learn.

Don’t delay. This is a big one. I’ve worked with many people who seem to move slowly out of habit. They learn about a problem at 4pm, and decide to tackle it tomorrow. Or next week, or next quarter.

I think this is often about avoiding discomfort. Getting started is hard: you often don’t know exactly what needs to be done, or where to begin. It’s comforting to believe that waiting will make it easier, but in my experience it rarely does.

Reclaim the small chunks. Some programmers have convinced themselves that they need long, uninterrupted work periods to get anything done. As I wrote in Getting things done (in small increments), I think this is more of a preference than a hard constraint. Most people could get better at this if they tried.

The wins can be surprisingly big. At many companies, you might be lucky to have a single uninterrupted block of 3–4 hours each day. Say you have another hour or two of meetings — that’s still 25–35% of your time lost to fragmentation. You can get a lot more done if you spend that time being productive rather than reading email or browsing Hacker News.
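
As a rough check on that figure, here is a small Python sketch of the arithmetic. It assumes an 8-hour workday, which the text above does not state, and the function name `fragmented_share` is made up for illustration. Under that assumption the leftover, fragmented time works out to somewhere between a quarter and a half of the day, so the 25–35% estimate sits toward the conservative end.

```python
# Assumption: an 8-hour workday; the article does not give a day length.
WORKDAY_HOURS = 8.0


def fragmented_share(focus_hours, meeting_hours, workday_hours=WORKDAY_HOURS):
    """Fraction of the day left over after the focus block and meetings."""
    leftover = workday_hours - focus_hours - meeting_hours
    return leftover / workday_hours


# Walk the ranges quoted in the text: a 3-4 hour block, 1-2 hours of meetings.
for focus in (3, 4):
    for meetings in (1, 2):
        share = fragmented_share(focus, meetings)
        print(f"focus={focus}h meetings={meetings}h -> {share:.1%} fragmented")
```

Changing `WORKDAY_HOURS` shows how sensitive the estimate is to the length of the day.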

Don’t worry about looking dumb. You probably already know that you should share your work early and often. But it’s uncomfortable, so it’s easy to put it off while telling yourself a story like “I have a high bar for quality.”

You’ll get results much faster if you learn to push through that discomfort. Oliver Burkeman talks about the 70% rule:

If you’re roughly 70% happy with a piece of writing you’ve produced, you should publish it. If you’re 70% satisfied with a product you’ve created, launch it.

[…]

Moving forward at 70% takes more guts, more strength of character, than holding out for 100%, because it entails moving forward amid uncertainty, anxiety, and the disagreeable feeling that comes with putting less-than-perfect work into the world.

The same goes for PRs. Don’t waste time polishing your code in hopes that your reviewer will find nothing wrong — push it now and accept the feedback. There are no points for getting your PR approved without comments.

Another way to move more quickly (and potentially look dumb in the process) is to ask your colleagues for advice. I’ve seen many developers who seem to think they’re required to come up with everything themselves. But software development is a team sport — don’t force yourself to go it alone.

Pick your battles. When it comes to collaboration, don’t waste time bikeshedding. If your PR reviewer wants you to change something, it’s almost always faster to do what they’re asking than to argue about it. 95% of the time, the differences are so minor that it’s barely worth discussing. Save your time and energy for the 5% that matter.

The same thing applies when you’re reviewing — don’t waste your time on inconsequential things. I’m definitely not saying you should rubber stamp everything; I often leave comments suggesting better names, or other ways to do things, but leave the final decision up to the author. I think at least 50% of my reviews are “LGTM with comments” — in my opinion, this gives you most of the benefits of code review without letting it suck up too much time (for you or your teammates).

Do only what’s required. In one of my first internships, the team lead gave me some advice: “When someone asks you to do something, do the absolute minimum that’s required. If you can do that consistently, everyone will think you’re a genius.” At the time, I thought it was cynical; but over the years, I’ve come to realize how wise it is.

If you try to go “above and beyond”, you’re almost always guessing — about what someone else wants, or what the system will require in the future. And the more inexperienced you are, the greater the chance that your guess is wrong.

To move faster, don’t waste time doing things that nobody asked for.

Don’t question why. Fast is better than slow. That’s just how it is. Your job is to take everything you can already do and do it faster.

AI audiocreative

ElevenLabs Music Generation Model

ElevenLabs launched Music v2, a new AI music generation model capable of seamless genre transitions mid-track, with pricing cuts up to 50% for developers and brands.

ElevenLabs

Summary

What: ElevenLabs released Music v2, an improved music-generation model offering better vocals, instrumentation, and arrangement, with features like inpainting for track sections and full song structure generation. It supports multilingual lyrics and allows genre switching within a single song while maintaining coherence. Pricing for Music v1 and v2 has been cut by up to 50% for ElevenAPI and 40% for ElevenCreative customers.

Why it matters: This release signals the rapid advancement and commercialization of generative AI in creative fields, particularly music, with a focus on both technical sophistication (genre switching, coherence) and market accessibility (lower pricing, licensed data).

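Takeaway: Developers building audio-enabled applications should evaluate ElevenAPI for custom, commercially cleared music generation, especially at the reduced pricing. Note that the announcement lists Music v2 as coming soon to ElevenAPI, with early access available through the Sales team.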

Deep Dive

ElevenLabs released Music v2, an upgraded music generation model.
It offers improved vocals, instrumentation, arrangement, and multilingual support.
New capabilities include switching genres mid-track without breaking musical coherence.
Music v2 allows "inpainting" to regenerate specific sections of a track (e.g., bridge without affecting chorus).
Users can build full songs section by section (intro, verse, chorus) maintaining structure.
The model powers three platforms: ElevenMusic (for creators), ElevenAPI (for developers), and ElevenCreative (for brands/ads).
Pricing for Music v1 and v2 has been reduced by up to 50% for ElevenAPI and 40% for ElevenCreative self-serve customers.
The model is trained only on licensed data, and generated tracks are cleared for commercial use, with no sync fees or clearance delays.

Decoder

Inpainting: In generative AI, the process of filling in or regenerating specific, designated parts of an existing output (e.g., an image, audio track) while preserving the surrounding content.

Original Article

Music v2 delivers better vocals, instrumentation, and arrangement across every genre, with improved multilingual support and a set of new capabilities.

Music v2 powers three ElevenLabs platforms, each built for a different use case:

ElevenMusic — listen, remix, and create tracks
ElevenAPI — embed music generation directly in your product
ElevenCreative — downloadable music for ads, branded content, and video

Alongside this release, we've cut both Music v1 and Music v2 pricing by up to 50% for ElevenAPI and up to 40% for ElevenCreative self-serve customers.

What’s new?

Music v2 introduces a new set of capabilities and improvements when compared to our previous model, from new ways to control and shape a track, to handling vocal and compositional complexity at a level that wasn't possible before.

A single song can move from opera to heavy metal and back, sustain fast rap and dense lyrical delivery, and embed non-musical sound effects directly within the track, all without breaking musical coherence.

Music v2 also gives you granular control over how you work with the model. With improved inpainting, you can select any section of a track and regenerate just that part, tweaking the bridge without touching the chorus and leaving everything else exactly as it is.

Composition can now go further too. Rather than generating short clips, Music v2 lets you build a full song section by section — intro, verse, chorus, and beyond — maintaining structure and continuity throughout. And across languages, lyrics, vocals, and arrangements now perform more reliably in the language you write in, with meaningful improvements across a growing list of supported languages.
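As a rough illustration of what section-level inpainting means as a data operation, here is a toy Python sketch. It is not the ElevenAPI: `generate_section`, `build_song`, and `inpaint` are hypothetical names, and the "audio" is just a list of random numbers. A real model would also condition the regenerated section on the surrounding music, which this sketch does not attempt.

```python
import random
from typing import Dict, List

Song = Dict[str, List[float]]  # section name -> fake audio samples


def generate_section(name: str, seed: int, length: int = 4) -> List[float]:
    """Hypothetical stand-in for a model call; returns deterministic fake samples."""
    rng = random.Random(f"{name}:{seed}")
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def build_song(section_names: List[str], seed: int) -> Song:
    """Build a song section by section, keeping the requested order."""
    return {name: generate_section(name, seed) for name in section_names}


def inpaint(song: Song, section: str, new_seed: int) -> Song:
    """Return a copy of the song in which only one section is regenerated."""
    if section not in song:
        raise KeyError(section)
    patched = {name: list(samples) for name, samples in song.items()}
    patched[section] = generate_section(section, new_seed)
    return patched


song = build_song(["intro", "verse", "chorus", "bridge"], seed=1)
revised = inpaint(song, "bridge", new_seed=2)
```

The property that matters is that every section other than the selected one is carried over unchanged, and the section order is preserved.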

Built for musicians, developers, and brands

Music v2 is the model powering products across three ElevenLabs platforms, each built for a different use case. Whether you're creating music, building a product, or making content at scale, there's a purpose-built platform for you.

For musicians and creators, ElevenMusic is your studio. Start from a lyric, a mood, or a reference track. Develop it into a full composition. Remix tracks you discover. Music v2 is the engine underneath.

For developers, ElevenAPI gives you direct access to the model. Generate, inpaint, and reference-match programmatically. Embed custom music generation wherever your product needs it.

For brands and content teams, ElevenCreative Music handles licensed music at scale. Brief it like a creative director — sonic mood, genre, tempo, brand voice — not just a text prompt. No sync fees. No clearance delays.

This release also builds on growing alignment across the music industry around licensed and rights-respecting AI music development, including recent licensing collaborations with Believe.

Available now

Music v2 is available today on ElevenMusic and ElevenCreative, and is coming soon to ElevenAPI. Reach out to our Sales team to enquire about early access. The model is trained only on licensed data and cleared for commercial use, so every track you generate is yours to use: no sync fees, no clearance delays, no restrictions on how you deploy it.

Try out Music v2.

AI llm infrastructure performance opensource

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Hugging Face introduced "Delta Weight Sync" in TRL, using a Hub "bucket" to drastically cut data transfer for asynchronous reinforcement learning by sending only changed model parameters.

Hugging Face

Summary

What: Hugging Face's TRL library now uses "Delta Weight Sync" to optimize asynchronous Reinforcement Learning (async RL). This method sends only the modified model parameters, rather than the entire model, reducing data transfer from gigabytes to megabytes per RL step. It leverages a Hugging Face Hub "bucket" for high-frequency object storage, allowing independent locations for the trainer and inference engine.

Why it matters: This technical optimization addresses a significant bottleneck in distributed AI training by improving efficiency and reducing bandwidth costs, making large-scale asynchronous reinforcement learning more practical and scalable.

Takeaway: For developers implementing async RL with large models, consider integrating "Delta Weight Sync" or similar delta-encoding strategies to reduce synchronization overhead and improve training performance.

Deep Dive

The blog post introduces "Delta Weight Sync" within Hugging Face's TRL library.
This method is designed to reduce the weight synchronization payload in asynchronous Reinforcement Learning (async RL).
Instead of transferring the full model weights at each RL step, only the changed parameters (deltas) are transmitted.
This significantly reduces data transfer volume, from gigabytes to megabytes.
A Hugging Face Hub "bucket" is used for high-frequency object storage to facilitate this.
The "bucket" allows the trainer and inference engine to operate from separate locations without direct communication, fetching only necessary updates.
The technique leads to substantial bandwidth savings and improves the efficiency of distributed training.
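The idea of sending only changed parameters can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TRL's implementation: weights are plain lists of floats, a "delta" is a list of (index, new value) pairs per tensor, and `compute_delta` and `apply_delta` are hypothetical names. The summary does not describe the encoding TRL actually uses.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]
Delta = Dict[str, List[Tuple[int, float]]]


def compute_delta(old: Weights, new: Weights) -> Delta:
    """Collect (index, new_value) pairs for every entry that changed."""
    delta: Delta = {}
    for name, new_values in new.items():
        changed = [
            (i, value)
            for i, (previous, value) in enumerate(zip(old[name], new_values))
            if previous != value
        ]
        if changed:
            delta[name] = changed
    return delta


def apply_delta(weights: Weights, delta: Delta) -> None:
    """Patch the receiver's copy in place using only the changed entries."""
    for name, changes in delta.items():
        for index, value in changes:
            weights[name][index] = value


# Toy example: one training step touches two of six values.
old = {"layer0": [0.1, 0.2, 0.3], "layer1": [1.0, 2.0, 3.0]}
new = {"layer0": [0.1, 0.25, 0.3], "layer1": [1.0, 2.0, 2.5]}

delta = compute_delta(old, new)
inference_copy = {name: list(values) for name, values in old.items()}
apply_delta(inference_copy, delta)
```

After `apply_delta`, the receiver's copy matches the trainer's new weights even though only two entries were transmitted. How much this saves in practice depends on how many parameters actually change per step, which this summary does not detail.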

Decoder

Async RL (Asynchronous Reinforcement Learning): A reinforcement learning setup in which sample generation and model updates run concurrently instead of in lockstep. Here, an inference engine generates samples while a separate trainer updates the model, so the inference side must periodically receive fresh weights.
Delta weight sync: A method in machine learning for synchronizing model parameters across distributed systems by only sending the differences (deltas) in weights since the last update, rather than the full set of weights.
TRL: Stands for "Transformer Reinforcement Learning," a library by Hugging Face designed to fine-tune large language models with Reinforcement Learning from Human Feedback (RLHF) and other RL algorithms.

Original Article

The blog post introduces a method to reduce the weight synchronization payload in async RL using "Delta Weight Sync," which transmits only changed model parameters between RL steps, significantly reducing data transfer from gigabytes to megabytes. A Hugging Face Hub "bucket" manages high-frequency object storage, enabling separate locations for the trainer and inference engine without direct communication, leading to substantial bandwidth savings.
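The storage-mediated pattern, in which the trainer and the inference engine never talk to each other directly, might look roughly like the sketch below. Everything here is an assumption for illustration: `InMemoryBucket` is a dictionary standing in for object storage (it is not the Hugging Face Hub API), and the `deltas/step-…` key layout is invented.

```python
from typing import Dict, List

Delta = Dict[int, float]  # flat parameter index -> new value


class InMemoryBucket:
    """Hypothetical stand-in for an object store; not the Hub API."""

    def __init__(self) -> None:
        self._objects: Dict[str, Delta] = {}

    def put(self, key: str, delta: Delta) -> None:
        self._objects[key] = dict(delta)

    def list_keys(self) -> List[str]:
        return sorted(self._objects)

    def get(self, key: str) -> Delta:
        return dict(self._objects[key])


def key_for(step: int) -> str:
    # Zero-padding keeps lexicographic order equal to step order.
    return f"deltas/step-{step:08d}"


def publish(bucket: InMemoryBucket, step: int, delta: Delta) -> None:
    """Trainer side: upload the delta for one step."""
    bucket.put(key_for(step), delta)


def catch_up(bucket: InMemoryBucket, weights: List[float], last_step: int) -> int:
    """Inference side: apply every delta newer than last_step, in order."""
    for key in bucket.list_keys():
        step = int(key.rsplit("-", 1)[1])
        if step > last_step:
            for index, value in bucket.get(key).items():
                weights[index] = value
            last_step = step
    return last_step


bucket = InMemoryBucket()
weights = [0.0, 0.0, 0.0, 0.0]

publish(bucket, 1, {0: 1.0})
publish(bucket, 2, {3: 2.0})
seen = catch_up(bucket, weights, last_step=0)
after_two_steps = list(weights)

publish(bucket, 3, {0: 5.0})
seen = catch_up(bucket, weights, last_step=seen)
```

Because the inference side tracks the last step it applied, it can fall behind and later catch up by fetching only the deltas it missed.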

AI security infrastructure cloud api

Secure MCP Tunnel

OpenAI is offering a "Secure MCP Tunnel" solution, enabling private on-premises servers to connect to OpenAI products like ChatGPT without exposing them to the public internet.

OpenAI

Summary

What: OpenAI's Secure MCP Tunnel allows private MCP (Model Context Protocol) servers to communicate with OpenAI products. A `tunnel-client` process establishes outbound HTTPS connections, handles requests locally, and returns responses through the same tunnel. This avoids opening inbound firewall ports and supports enterprise networking requirements.

Why it matters: This reflects OpenAI's push into enterprise solutions, addressing critical security and privacy concerns for companies wanting to integrate AI models with their internal systems while keeping sensitive data and infrastructure private.

Takeaway: If your enterprise wants to integrate private, on-premises data or models with OpenAI products, investigate the Secure MCP Tunnel to maintain network security.

Deep Dive

Secure MCP Tunnel connects private MCP servers to OpenAI products (ChatGPT, Codex, Responses API) without public internet exposure.
tunnel-client runs inside the customer's network, initiating outbound HTTPS connections to api.openai.com:443.
It long-polls OpenAI for queued MCP work, forwards JSON-RPC requests to the private MCP server, and returns responses.
The private MCP server does not need a public listener or open inbound firewall ports.
Supports enterprise networking features like outbound proxies, custom CA bundles, and mTLS.
Can also support narrowly scoped HTTP callouts into a customer network via an embedded MCP server, Harpoon.
Configuration is done in Platform tunnel settings, requiring a tunnel_id and a runtime API key.
Common deployment patterns include Kubernetes sidecar, dedicated Kubernetes deployment, or VM/systemd service.
Allows OAuth discovery to travel through the tunnel, keeping the MCP server private.

Decoder

MCP (Model Context Protocol): An open protocol for connecting AI applications to external tools and data sources; OpenAI products can call MCP servers through connectors.
JSON-RPC: A remote procedure call protocol encoding its calls and responses in JSON.
mTLS (Mutual Transport Layer Security): A two-way authentication method where both the client and server verify each other's identities using digital certificates.
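To make the JSON-RPC entry concrete, here is a small Python sketch that builds a JSON-RPC 2.0 request and a matching response. `tools/list` is a standard MCP method; the helper names `make_request`, `make_result`, and `matches` are hypothetical and exist only for this example.

```python
import json
from typing import Any, Dict, Optional


def make_request(request_id: int, method: str,
                 params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def make_result(request_id: int, result: Any) -> str:
    """Serialize a successful JSON-RPC 2.0 response for the given request id."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


def matches(request_json: str, response_json: str) -> bool:
    """A response belongs to a request when their ids are equal."""
    return json.loads(request_json)["id"] == json.loads(response_json)["id"]


request = make_request(7, "tools/list")
response = make_result(7, {"tools": []})
```

The `id` field is what lets a caller pair a response with its request, which matters when requests are queued and answered later, as they are in a polling tunnel.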

Original Article

Secure MCP Tunnel lets you connect private MCP servers to supported OpenAI products without opening inbound firewall ports or exposing those servers to the public internet. Run tunnel-client inside the network that can already reach your MCP server; it opens an outbound HTTPS path to OpenAI, pulls queued MCP work, forwards requests locally, and returns responses through the same tunnel.

What is an MCP tunnel?

An MCP tunnel is an outbound-only connection from a host inside your network to an OpenAI-hosted MCP endpoint. Use it when your MCP server is private, on-premises, or behind a firewall, but ChatGPT, Codex, the Responses API, or another supported OpenAI surface still needs to call it.

Secure MCP Tunnel keeps the MCP server private while giving supported OpenAI products a normal MCP request path. tunnel-client polls OpenAI for work, forwards MCP requests locally, and returns responses through the same tunnel.

Use Secure MCP Tunnel when

Your MCP server runs on a private network, on-premises, on a developer machine, or behind existing access controls.
You want ChatGPT, Codex, the Responses API, or another supported OpenAI surface to use that server without making the MCP server public.
Your network allows the host running tunnel-client to reach the private MCP server and to make outbound HTTPS requests to api.openai.com:443 by default, or to mtls.api.openai.com:443 when control-plane mTLS is configured.
Start with the MCP and Connectors guide for general MCP concepts.

How it works

Create or manage an OpenAI-hosted MCP tunnel endpoint in Platform tunnel settings.
Run tunnel-client inside the network that can reach your private MCP server.
Configure tunnel-client with the tunnel identity and the private MCP server address.
OpenAI products send MCP requests to the OpenAI-hosted tunnel endpoint.
tunnel-client long-polls for queued work, forwards each JSON-RPC request to the private MCP server, and posts the response back through the tunnel.

The private MCP server does not need a public listener. The OpenAI-hosted endpoint gives supported products a normal MCP request path, while the network initiation point stays inside your boundary. When a connector asks for streamed results, the tunnel path can forward intermediate server-sent events.
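The poll, forward, and respond cycle described above can be sketched with an in-memory simulation. This is not tunnel-client's code and makes no real network calls: `FakeControlPlane` is a hypothetical stand-in for the OpenAI-hosted endpoint, and `private_mcp_server` is a stub that echoes the method name.

```python
import json
import queue
from typing import Callable, List, Optional


class FakeControlPlane:
    """Hypothetical in-memory stand-in for the hosted tunnel endpoint."""

    def __init__(self) -> None:
        self.pending: "queue.Queue[str]" = queue.Queue()
        self.responses: List[str] = []

    def poll(self, timeout: float) -> Optional[str]:
        """Long-poll: block up to `timeout` seconds waiting for queued work."""
        try:
            return self.pending.get(timeout=timeout)
        except queue.Empty:
            return None

    def post_response(self, response: str) -> None:
        self.responses.append(response)


def private_mcp_server(request_json: str) -> str:
    """Stub for the private server: answers every request with an echo."""
    request = json.loads(request_json)
    result = {"echo": request["method"]}
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


def run_client(control_plane: FakeControlPlane,
               forward: Callable[[str], str],
               max_idle_polls: int = 1,
               poll_timeout: float = 0.01) -> int:
    """Pull work, forward it locally, post the response; stop after idle polls."""
    handled = 0
    idle = 0
    while idle < max_idle_polls:
        work = control_plane.poll(timeout=poll_timeout)
        if work is None:
            idle += 1
            continue
        idle = 0
        control_plane.post_response(forward(work))
        handled += 1
    return handled


control_plane = FakeControlPlane()
for request_id in (1, 2):
    control_plane.pending.put(
        json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})
    )

handled = run_client(control_plane, private_mcp_server)
```

Every connection in this loop is initiated by the client, which is why the private server needs no inbound listener. A real client would poll indefinitely instead of stopping after an idle poll.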

OpenAI products call the OpenAI-hosted tunnel endpoint; tunnel-client long-polls for queued work and returns the MCP response through the same tunnel.

Before you start

You need:

A tunnel_id from Platform tunnel settings.
A runtime API key for tunnel-client. The key principal needs Tunnels Read + Use for the target tunnel.
A tunnel manager with Tunnels Read + Manage if you need to create or edit tunnel metadata.
An MCP server that tunnel-client can reach over stdio or HTTP from inside your network.

Network requirements

tunnel-client does not need inbound internet access. It needs outbound HTTPS to OpenAI and local reachability to the private MCP server:

From	To	Used for
Host running `tunnel-client`	`api.openai.com:443` over HTTPS on `/v1/tunnel/*`	Default polling and response posting.
Host running `tunnel-client`	`mtls.api.openai.com:443` over HTTPS on `/v1/tunnel/*`	Polling and response posting when control-plane mTLS is configured.
Host running `tunnel-client`	The configured stdio command or MCP server URL	Forwarding MCP requests from inside your network.
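Before installing anything, the outbound rows of this table can be checked with a short Python script run on the intended host. This is a rough sketch: it only tests whether a direct TCP connection opens, so it says nothing about TLS, authentication, or the `/v1/tunnel/*` paths, and it does not apply if outbound traffic must go through a proxy. The function names are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

Endpoint = Tuple[str, int]

DEFAULT_ENDPOINTS = [("api.openai.com", 443)]
MTLS_ENDPOINTS = [("mtls.api.openai.com", 443)]  # only when control-plane mTLS is configured


def tcp_reachable(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a direct TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints: Iterable[Endpoint],
                    probe: Callable[[Endpoint], bool] = tcp_reachable) -> Dict[Endpoint, bool]:
    """Probe each endpoint and report reachability per endpoint."""
    return {endpoint: probe(endpoint) for endpoint in endpoints}
```

Calling `check_endpoints(DEFAULT_ENDPOINTS)` on the host that will run tunnel-client returns a mapping from each endpoint to `True` or `False`. The `probe` parameter exists so the check can be swapped out or tested without network access.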

Set up tunnel-client

Open Platform tunnel settings, then use the download link there or the latest public tunnel-client release from openai/tunnel-client. Keep your runbook pointed at the latest-release URL instead of hard-coding a specific release URL.

If you already have a binary, start with tunnel-client help quickstart. For a named local stdio profile, use:

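# Runtime API key for tunnel-client (placeholder value).
export CONTROL_PLANE_API_KEY="sk-..."

# Create a named profile from the local stdio sample.
tunnel-client init \
  --sample sample_mcp_stdio_local \
  --profile local-stdio \
  --tunnel-id tunnel_0123456789abcdef0123456789abcdef \
  --mcp-command "python /path/to/server.py"

# Validate the profile, then start the client and keep it running.
tunnel-client doctor --profile local-stdio --explain
tunnel-client run --profile local-stdio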

For an HTTP MCP server, use --mcp-server-url https://mcp.internal.example.com/mcp instead of --mcp-command.

Keep tunnel-client run ... healthy while you create or test the connector. Connector discovery and MCP tool calls depend on the running client.

The local admin UI at /ui shows whether the running client is healthy, ready, and connected before you test from ChatGPT, Codex, or an API flow.

Choose where to run tunnel-client

Run tunnel-client in the same trust boundary that can already reach the private MCP server. Common deployment patterns are:

Kubernetes sidecar: Run tunnel-client beside the MCP server in one Pod and connect over localhost.
Dedicated Kubernetes deployment: Run tunnel-client separately when the MCP server is already reachable through a private Service.
VM or systemd service: Run tunnel-client on a host that can reach the MCP server over private networking.

Connect from ChatGPT

Open ChatGPT connector settings, create a custom connector, and choose Tunnel under Connection. Select an available tunnel when ChatGPT lists it, or paste a valid tunnel_id if you already have one.

If the tunnel does not appear in ChatGPT, verify that the tunnel is associated with the target workspace and that the connector operator has Tunnels Read + Use.

Security and networking

The private MCP server stays inside the customer-controlled environment. tunnel-client reaches OpenAI over outbound HTTPS using the runtime API key and, when configured, control-plane mTLS.

The MCP server address stays private and is used only from inside the environment where tunnel-client runs.
tunnel-client authenticates to the OpenAI tunnel control plane; supported OpenAI products use the OpenAI-hosted tunnel endpoint.
Tunnel access follows the existing organization and workspace context instead of introducing a separate public ingress path.
tunnel-client supports enterprise networking requirements such as outbound proxies, custom CA bundles, control-plane client certificates, and MCP-side mTLS.

Advanced: allowlisted HTTP callouts

Secure MCP Tunnel can also support narrowly scoped HTTP callouts from supported agent or API flows into a customer network. tunnel-client includes an embedded MCP server, Harpoon, that exposes configured HTTP targets by label and lets callers invoke them through the tunnel with bounded request/response limits.

Use this when you need to reach a small set of private REST endpoints without exposing them publicly. Harpoon is not a general-purpose proxy: callers cannot choose arbitrary hosts, and requests are limited to the targets and methods configured by the customer.
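The difference between an allowlisted callout and a general-purpose proxy can be illustrated with a small Python sketch. This is not Harpoon's code or configuration format, which the article does not show: the `Target` structure, the `orders` label, and the example URL are all hypothetical, and the bounded request/response limits mentioned above are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Target:
    base_url: str
    methods: FrozenSet[str]


class CalloutNotAllowed(Exception):
    """Raised when a request falls outside the configured allowlist."""


def resolve_callout(targets: Dict[str, Target], label: str,
                    method: str, path: str) -> str:
    """Map (label, method, path) to a URL, or refuse if it is not allowlisted."""
    target = targets.get(label)
    if target is None:
        raise CalloutNotAllowed(f"unknown target label: {label!r}")
    if method.upper() not in target.methods:
        raise CalloutNotAllowed(f"{method} is not allowed for {label!r}")
    if not path.startswith("/") or ".." in path.split("/"):
        raise CalloutNotAllowed(f"unsupported path: {path!r}")
    return target.base_url.rstrip("/") + path


# Hypothetical customer configuration: one read-only internal endpoint.
TARGETS = {
    "orders": Target("https://orders.internal.example.com", frozenset({"GET"})),
}

allowed_url = resolve_callout(TARGETS, "orders", "get", "/v1/orders/42")
```

The caller supplies a label, never a host, so the set of reachable destinations is fixed by whoever wrote the configuration.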

Troubleshooting

Tunnel not visible in ChatGPT: Check the tunnel workspace scope and the connector operator’s Tunnels Use permission.
Connector discovery or tool calls fail: Confirm that tunnel-client run ... is still running, then re-run tunnel-client doctor --profile <name> --explain.
You can inspect a tunnel but cannot edit it: The operator likely has Tunnels Read but not Tunnels Manage.
tunnel-client exposes /healthz, /readyz, /metrics, and a local admin UI at /ui.
The admin UI is loopback-only by default. Expose it remotely only when you intentionally need an operator network to reach it.
Use those surfaces to confirm that the client is healthy, ready, and polling before testing from ChatGPT, Codex, or an API flow.
If the client is not connected, requests through the tunnel fail until tunnel-client reconnects.
Raw HTTP logging is disabled by default, and support exports are redacted.
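A pre-flight check against the `/healthz` and `/readyz` endpoints listed above could be scripted roughly as follows. This is a sketch with assumptions: the base URL (including the port) of the local client is not given in the article and must be supplied by the reader, and treating HTTP 200 as "OK" is a common convention that is assumed here and not confirmed by the source.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def client_status(base_url: str,
                  status_of: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Check the health and readiness paths; True means the endpoint returned 200."""
    base = base_url.rstrip("/")
    return {path: status_of(base + path) == 200 for path in ("/healthz", "/readyz")}


def safe_to_test(status: Dict[str, bool]) -> bool:
    """Only test from ChatGPT, Codex, or an API flow when every check passes."""
    return all(status.values())
```

Usage would be `safe_to_test(client_status(base_url))`, where `base_url` points at the loopback address and port of the running client.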

OAuth

OAuth discovery can travel through the tunnel path so the MCP server itself can remain private.
The tunnel preserves the upstream authorization server metadata needed for browser-facing OAuth flows.
The authorization server itself is not automatically tunneled. If it is unreachable from the public internet and from the tunnel-client host, the OAuth flow can still fail even when the MCP server is reachable.

Where to configure it

Manage OpenAI-hosted MCP tunnel endpoints in Platform tunnel settings.
Use a tunnel when creating a connector from ChatGPT connector settings.
For Codex or API flows, use the tunnel-backed MCP target exposed by the supported product surface.

Next steps

Create or manage the tunnel in Platform tunnel settings.
Validate your tunnel-client profile with tunnel-client doctor --profile <profile> --explain.
Connect the tunnel from ChatGPT connector settings or the supported OpenAI surface you are using.

Sanitized OpenAI Platform tunnel settings screenshot. — Create and manage OpenAI-hosted MCP tunnel endpoints from Platform tunnel settings.

Sanitized ChatGPT connector settings screenshot with Tunnel selected. — Select Tunnel when connecting a ChatGPT connector to a private MCP server.

AI opensource data python rust

LiteParse v2.0

LlamaIndex's LiteParse v2.0, a standalone open-source PDF parsing tool, has been rewritten in Rust for up to 100x faster processing and now supports Rust, JS/TS, Python, and WASM for browser and edge environments.

LlamaIndex

Summary

What: LiteParse v2.0, developed by LlamaIndex, is an open-source PDF parsing tool offering high-quality spatial text parsing with bounding boxes, screenshot generation, and multi-language support. The rewrite in Rust significantly improves performance, making it up to 100x faster, and adds native support for Rust, JavaScript/TypeScript, Python, and WASM.

Why it matters: This move towards highly optimized, local-first data processing tools written in Rust and supporting WASM demonstrates a growing trend to improve performance, reduce cloud dependencies, and enable client-side AI capabilities within the RAG ecosystem.

Takeaway: If you need fast, local PDF parsing for RAG applications, consider integrating LiteParse v2.0 into your Rust, Python, or JavaScript projects, or exploring its WASM capabilities for browser-based use.

Decoder

RAG (Retrieval-Augmented Generation): An AI framework that retrieves facts from an external knowledge base to ground large language models (LLMs) on accurate and up-to-date information.
WASM (WebAssembly): A binary instruction format for a stack-based virtual machine, designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.

Original Article

LiteParse v2.0 is out now, and it is blazing fast + runs everywhere!

We rewrote everything from scratch in Rust, and now:

up to 100x faster parsing
install natively in Rust, JS/TS, and Python
a custom WASM package enables browser and edge runtime usage

pip install liteparse

npm i @llamaindex/liteparse

npm i @llamaindex/liteparse-wasm

cargo install liteparse

Blog: llamaindex.ai/blog/liteparse-v2-0-runs-everywhere?utm_medium=socials&utm_source=twitter&utm_campaign=2026-may-

Repo: github.com/run-llama/liteparse

AI mobile frontend javascript

Introducing Apex: A Fast, Specialized Model for React Native

Callstack has introduced Apex, a specialized AI coding model based on Gemma 4 and fine-tuned for React Native development, offering faster output and lower inference costs for framework-specific tasks.

Callstack

Summary

What: Apex is a new AI coding model from Callstack, built on Gemma 4 and trained with Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) using curated React Native ecosystem data. It's designed to provide specific answers, faster output (2,000 to 4,000+ tokens per second on NVIDIA RTX PRO 6000 GPUs), and lower inference costs for React Native app development, addressing architecture decisions and framework-specific issues.

Why it matters: The emergence of highly specialized, smaller AI models like Apex, optimized for specific domains like React Native, indicates a shift towards more cost-effective and practically useful AI tools for developers, moving beyond general-purpose frontier models for high-volume, repetitive tasks.

Takeaway: If you are a React Native development team, consider applying for the private beta of Apex to explore how a domain-specific AI model can improve coding efficiency and reduce costs for your specific workflows.

Decoder

Gemma 4: A family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create Gemini models.
Supervised Fine-Tuning (SFT): A machine learning technique where a pre-trained model is further trained on a labeled dataset specific to a new task to adapt its capabilities.
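Group Relative Policy Optimization (GRPO): A reinforcement learning algorithm that samples a group of outputs for each prompt and scores each one relative to the group's average reward, which removes the need for a separate value (critic) model. Here it is used alongside SFT to fine-tune the model on React Native data.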
Inference Cost: The computational cost (e.g., GPU usage, energy) incurred when running a trained AI model to generate predictions or outputs.
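The "group relative" part of GRPO can be shown with a short Python sketch of the advantage calculation: several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This illustrates the general technique only; how Callstack defines rewards for Apex is not described in this summary, and the choice of population standard deviation plus a small `eps` is an assumption.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion earns the same reward.
    return [(reward - mu) / (sigma + eps) for reward in rewards]


# Four completions for one prompt; reward 1.0 could mean "passed a check".
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get positive advantages and the rest get negative ones, so the update favors relatively better answers without a learned value model. When every completion scores the same, all advantages are zero and the group contributes no learning signal.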

Original Article

Join Apex private beta

AI security agents

Finding high-severity security issues with publicly available models

Ramp used its Inspect coding agent to find high-severity security issues in its own backend by running 10,000 sessions with a minimal "find security issues" prompt.

X

Summary

What: Ramp deployed approximately 10,000 "Inspect" coding-agent sessions against its own backend over an 8-hour period using a simple prompt to "find security issues," successfully identifying high-severity vulnerabilities.

Why it matters: This demonstrates the potential for AI agents to automate and scale security testing and red-teaming, suggesting a future where AI-driven tools become a standard part of vulnerability discovery and defense.

Takeaway: Consider experimenting with AI agents for automated security vulnerability scanning in your own development environments.

Original Article

Ramp pointed roughly 10,000 Inspect coding-agent sessions at its backend in an 8-hour run with a minimal "find security issues" prompt.

AI enterprise cloud

Google expands Gemini for Business with shareable Projects

Google is enhancing Gemini for Business with new shareable Projects, enabling team collaboration within dedicated multi-surface workspaces, bridging the gap with the Enterprise tier.

TestingCatalog

Summary

What: Google is rolling out "Projects" for Gemini for Business, a feature previously exclusive to Gemini Enterprise. These projects provide dedicated multi-surface workspaces where team members can collaborate on chats, manage uploaded files, assign colors, define system instructions, and invite collaborators to work within the same chat, similar to Microsoft Copilot's group chat. Workflow agents, based on the Enterprise platform, are also coming to Gemini for Business.

Why it matters: This move indicates Google's strategy to position Gemini as a comprehensive, collaborative AI assistant for enterprises, directly competing with offerings from Microsoft, Anthropic, and OpenAI that focus on team-level orchestration and persistent AI agents.

Takeaway: If your team uses Gemini for Business, look out for the new "Projects" feature for enhanced collaborative AI workspaces and automated workflows.

Original Article

Google is continuing to push Gemini for Business deeper into team workflows, with several updates moving from internal development toward broader rollout. The most notable change centers on Projects, a feature that already exists in Gemini Enterprise but is being adapted for the Business tier, with a structure distinct from that of consumer Gemini.

These are true container projects where individual chats live inside dedicated folders, alongside uploaded files that can be managed within the same project, a setup that turns each project into a multi-surface workspace rather than a single chat thread.

Customization extends beyond organization. Users can assign a color to each project, define system instructions that apply across every chat inside it, and invite collaborators to work in the same workspace. That last point is where things get interesting: the collaboration model lets multiple people access and respond inside the same chat, mirroring the group chat pattern Microsoft introduced in Copilot but framed around business team contexts. This particular implementation looks unlikely to make its way to the consumer Gemini app, though further expansion across Business seats appears probable.

In parallel, Google is bringing workflow agents to Gemini for Business, building on the agent platform already available in the Enterprise edition. A reworked builder lets users configure automated, scheduled tasks that call connectors across the Google suite and beyond, covering Gmail, Drive, Calendar, and various third-party tools.

Availability follows the usual staged pattern, with some accounts already seeing these capabilities while others wait their turn. The overall trajectory positions Gemini for Business as a closer counterpart to Enterprise, narrowing the feature gap between the two tiers while keeping shared workspaces and agent orchestration as the central pitches for paying teams. Pairing project-level memory with scheduled agents also lines Google up against the always-on assistants taking shape across Microsoft, Anthropic, and OpenAI, where the competitive question is increasingly which platform can run reliable, multi-step work on behalf of a whole team rather than a single user.

Data infrastructurecloudstoragedesign

Design S3 Object Storage Like a Senior Engineer

S3-scale object storage relies on a flat, immutable namespace, separating metadata and data, and using distributed sharding, merged segment files, and object chunking for petabyte-scale performance.

Level Up Coding

Summary

What: Designing object storage like Amazon S3 for 100+ petabytes and hundreds of millions of objects requires architectural choices like a flat key-value namespace, decoupling metadata from data, distributed metadata sharding, merging on-disk segment files to prevent inode exhaustion, and chunking large objects for parallel I/O.

Why it matters: This article dissects the fundamental architectural decisions behind highly scalable, durable object storage, demonstrating how seemingly simple concepts like key-value stores become complex engineering challenges at extreme scales.

Takeaway: When designing large-scale storage systems, consider the implications of inode limits, metadata scalability, and the benefits of immutable objects for simplifying consistency and concurrency.

Deep Dive

S3-scale object storage (roughly 100PB and hundreds of millions of objects) deviates significantly from traditional file systems.
It uses a flat key-value namespace for objects, with no hierarchical directory structure.
Metadata and payload bytes are decoupled, allowing them to scale and be managed independently.
Object immutability simplifies caching, replication, and concurrency control.
Distributed metadata sharding is crucial for scalability, distributing the metadata workload across many nodes.
To avoid inode exhaustion on underlying file systems, object storage often merges many small object files into larger "segment files" on disk.
Large objects are chunked into smaller parts, enabling parallel reads/writes and efficient range requests.
Consistency models for object storage often lean towards eventual consistency for metadata updates to maximize availability and performance.
Versioning and lifecycle policies are critical features for managing object changes and retention at scale.
Strong encryption at rest and in transit is a standard requirement for secure object storage.
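
The segment-file and chunking ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the article's or S3's implementation: `SegmentStore`, `Location`, and `chunk_ranges` are hypothetical names, in-memory buffers stand in for on-disk segment files, and the 64-byte segment limit is chosen only to keep the example small.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where an object's bytes live: which segment, at what offset, how long."""
    segment_id: int
    offset: int
    length: int


class SegmentStore:
    """Packs many small objects into a few large append-only segments."""

    def __init__(self, max_segment_bytes=64):
        self.max_segment_bytes = max_segment_bytes
        self.segments = [io.BytesIO()]  # stand-ins for on-disk segment files
        self.index = {}                 # metadata (key -> Location), kept apart from data

    def put(self, key, payload):
        segment = self.segments[-1]
        # Roll over to a new segment when the current one would overflow.
        if segment.tell() > 0 and segment.tell() + len(payload) > self.max_segment_bytes:
            segment = io.BytesIO()
            self.segments.append(segment)
        offset = segment.tell()
        segment.write(payload)
        loc = Location(len(self.segments) - 1, offset, len(payload))
        self.index[key] = loc  # flat namespace: the key is an opaque string
        return loc

    def get(self, key):
        loc = self.index[key]
        data = self.segments[loc.segment_id].getvalue()
        return data[loc.offset:loc.offset + loc.length]


def chunk_ranges(size, chunk_size):
    """Split an object of `size` bytes into half-open (start, end) byte ranges."""
    return [(start, min(start + chunk_size, size)) for start in range(0, size, chunk_size)]


store = SegmentStore(max_segment_bytes=64)
for i in range(10):
    store.put(f"photos/2024/img-{i}.jpg", bytes([i]) * 20)

print(len(store.index), "objects in", len(store.segments), "segments")
print(chunk_ranges(10, 4))
```

Ten objects end up in four segments, so the underlying file system sees four entries instead of ten. The slashes in the keys are just characters; nothing in the index treats them as directories.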

Decoder

Object storage: A data storage architecture that manages data as objects, distinct from file systems (hierarchical) and block storage (raw disk blocks). Objects contain data, metadata, and a globally unique identifier.
Flat namespace: A storage model where all objects reside at the same logical level, identified by a unique key, without hierarchical directories.
Metadata sharding: Distributing the storage and management of object metadata (e.g., object names, sizes, creation dates) across multiple servers or partitions to scale horizontally.
Inode exhaustion: A problem in traditional file systems where the system runs out of available "inodes" (data structures storing file metadata), even if there is still free disk space.
Segment files: Large files on disk used by object storage systems to store multiple small objects, consolidating them to reduce the number of underlying file system entries (inodes).
Object chunking: Breaking down a large object into smaller, manageable pieces (chunks) for storage and transfer, enabling parallel I/O and efficient handling of partial object requests.

Original Article

S3-scale object storage hinges on a flat, immutable namespace: buckets hold objects identified by keys, while metadata is separated from payload bytes so the system can scale independently. At ~100PB and hundreds of millions of objects, the design requires distributed metadata sharding, merged on-disk segment files to avoid inode exhaustion, and chunking of large objects for parallel reads and range requests.

Data aisecuritypolicydesign

AI Risk Is an Architecture Problem

AI risk is primarily an architectural problem, not just a model problem, where system design controls what the AI sees, outputs, and acts upon, mitigating data exposure, incorrect outputs, and unintended actions.

Applied Ingenuity

Summary

What: AI risks, categorized as data exposure, incorrect output, and unintended action, manifest as brand, compliance, liability, operational, and commercial harms. The article emphasizes that architectural controls—like bounding AI permissions, adding human review loops, and deterministic validations—are the most effective ways to manage these risks, rather than solely focusing on model improvements.

Why it matters: This reframes the conversation around AI safety from an abstract, model-centric view to a practical, system-level design challenge, underscoring that engineering controls are crucial for responsible AI deployment in real-world applications.

Takeaway: When designing AI-powered systems, prioritize implementing architectural safeguards such as strict data access controls, output validation layers, and mandatory human-in-the-loop review for high-impact actions.

Deep Dive

AI risk should be evaluated at the system level, encompassing the entire application, not just the underlying AI model.
Three core "mechanism risks" are identified: data exposure, incorrect output, and unintended action.
These mechanism risks can lead to five "business harms": brand risk, compliance risk, liability risk, operational risk, and commercial risk.
The most critical control for AI risk is architecture:
What the AI can see: Control input data, access to internal systems, and external internet access.
What its output feeds into: Implement validations and filters on AI outputs.
What it can do without checks: Restrict actions and integrate human review loops.
Examples of architectural controls:
Input sanitization: Preventing prompt injection or sensitive data leakage.
Output filtering and validation: Ensuring AI responses meet criteria before use.
Human-in-the-loop: Requiring human approval for significant AI-generated actions.
Bounded permissions: Giving AI systems only the minimum necessary access to tools and data.
Deterministic safeguards: Hard-coding rules or checks that the AI cannot override.
These architectural controls can sharply reduce action risk even without changing the AI model itself.
The "risk surface" of an AI system is defined by its autonomy, scope of access, and impact of its actions.
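
As a rough sketch of how the controls listed above might combine, the following Python gate sits between a model's proposed action and its execution. The tool names, the refund limit, and the `gate` function are all hypothetical and chosen for illustration; the article does not prescribe a specific implementation.

```python
from dataclasses import dataclass, field

# All three constants are illustrative assumptions, not values from the article.
ALLOWED_TOOLS = {"lookup_invoice", "issue_refund"}  # bounded permissions
REFUND_LIMIT = 100.0                                # deterministic safeguard
NEEDS_REVIEW = {"issue_refund"}                     # human-in-the-loop actions


@dataclass
class ProposedAction:
    """An action the model wants to take, before anything has run."""
    tool: str
    args: dict = field(default_factory=dict)


def gate(action, approved_by_human=False):
    """Return 'execute', 'needs_review', or 'reject' for a model-proposed action."""
    # Bounded permissions: tools outside the allow-list are never run.
    if action.tool not in ALLOWED_TOOLS:
        return "reject"
    # Deterministic validation: a hard-coded rule the model cannot override.
    if action.tool == "issue_refund":
        amount = action.args.get("amount")
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        if not is_number or not 0 < amount <= REFUND_LIMIT:
            return "reject"
    # Human review: high-impact actions wait for explicit approval.
    if action.tool in NEEDS_REVIEW and not approved_by_human:
        return "needs_review"
    return "execute"


print(gate(ProposedAction("delete_user")))
print(gate(ProposedAction("issue_refund", {"amount": 40})))
print(gate(ProposedAction("issue_refund", {"amount": 40}), approved_by_human=True))
```

None of these checks depends on the model behaving well, which is the point of treating risk as an architecture problem: the same gate applies whichever model produces the proposal.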

Decoder

Mechanism risks: The direct technical failures of an AI system, such as exposing data, generating incorrect outputs, or taking unintended actions.
Business harms: The organizational consequences of mechanism risks, including damage to brand reputation, regulatory non-compliance, legal liability, operational disruptions, and financial losses.
Architectural controls: System-level design decisions and safeguards (e.g., input validation, output filtering, human review, bounded permissions) implemented around an AI model to manage its risks.
Human-in-the-loop (HITL): A system design pattern where a human explicitly reviews, approves, or modifies an AI's output or proposed action before it is executed.
Bounded permissions: Granting an AI system only the minimum necessary access rights or capabilities (e.g., to specific tools, data, or external APIs) to perform its intended function, limiting the scope of potential harm.

Original Article

AI risk should be assessed at the system level, not just the model level. The three mechanism risks of data exposure, incorrect output, and unintended action map to five business harms: brand, compliance, liability, operational, and commercial risk. The most important control is architecture: what the AI can see, what its output feeds into, and what it can do without checks. Adding human review, deterministic validations, and bounded permissions can sharply reduce action risk without changing the model.

Data aimachine-learningethics

Auditing Model Bias with Balanced Datasets with Mimesis

The Mimesis Python library enables auditing machine learning model bias by generating balanced, synthetic counterfactual datasets to test for discrimination.

KDnuggets

Summary

What: Iván Palomares Carrascosa demonstrates how the Mimesis library can create synthetic datasets with controlled variables (e.g., gender, income) to detect hidden biases in ML models like a loan approval classifier. It allows creating "clones" with identical financial profiles but different demographic characteristics to isolate bias.

Why it matters: This highlights a practical, privacy-preserving method for addressing ethical AI concerns, especially when real-world sensitive data cannot be used for bias detection, pushing for more responsible AI development.

Takeaway: Developers can use the Mimesis Python library to generate synthetic data for testing model bias without compromising real-world sensitive information.

Decoder

Counterfactual dataset: A dataset constructed to show what would have happened if a specific feature (e.g., gender) had been different while keeping all other features constant, used to isolate the impact of that feature.
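
A counterfactual audit of this kind can be sketched as follows. To keep the example dependency-free it uses Python's `random` module in place of Mimesis, which in the article's workflow would supply the synthetic fields. The field names, value ranges, and both toy classifiers are assumptions made for illustration only.

```python
import random


def make_counterfactual_pairs(n, seed=0):
    """Build pairs of applicants that differ only in the protected attribute."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        base = {
            "income": rng.randint(20_000, 120_000),
            "credit_score": rng.randint(300, 850),
        }
        pairs.append(({**base, "gender": "female"}, {**base, "gender": "male"}))
    return pairs


def biased_model(applicant):
    """Toy classifier with a deliberately planted bias, standing in for a real model."""
    threshold = 600 if applicant["gender"] == "male" else 650
    return applicant["credit_score"] >= threshold


def fair_model(applicant):
    """Toy classifier that ignores the protected attribute."""
    return applicant["credit_score"] >= 620


def flip_rate(model, pairs):
    """Fraction of pairs whose decision changes when only gender changes."""
    flips = sum(model(a) != model(b) for a, b in pairs)
    return flips / len(pairs)


pairs = make_counterfactual_pairs(500)
print("fair model flip rate:  ", flip_rate(fair_model, pairs))
print("biased model flip rate:", flip_rate(biased_model, pairs))
```

Because each pair shares an identical financial profile, any nonzero flip rate can only come from the protected attribute.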

Data aienterprisedevops

Open Data Product SDK: Turning Data Product Ideas Into Standard YAML With AI Models

The Open Data Product SDK now uses AI models to convert free-form text descriptions into standards-compliant YAML for data product catalogs, simplifying metadata creation.

Open Data Products Blog

Summary

What: The Open Data Product SDK integrates AI to transform natural language descriptions and Markdown into machine-readable YAML, defining data product catalogs, item-level specifications, and ODPG graph context. This automates the process of capturing business objectives and use cases for data products, creating ODPC Catalog YAML and related metadata.

Why it matters: This move signifies an effort to bridge the gap between business stakeholders and technical data product definitions, streamlining governance and discoverability in enterprise data ecosystems through AI-assisted automation.

Takeaway: Data teams looking to standardize and automate data product cataloging might explore the Open Data Product SDK's new AI capabilities for converting natural language into YAML specifications.

Decoder

Data product: A reusable, well-defined, and standardized dataset or data service designed to meet specific business needs, often treated as a product with its own lifecycle and governance.
YAML: (YAML Ain't Markup Language) A human-friendly data serialization standard often used for configuration files and data exchange between languages.
ODPC (Open Data Product Catalog): A standard for defining and cataloging data products, making them discoverable and understandable across an organization.
ODPG (Open Data Product Graph): A graph-based representation of data products and their relationships, providing context and lineage.

Original Article

Open Data Product SDK now supports AI-assisted conversion of free-form text and Markdown into standards-ready YAML for data product catalogs, item-level specs, and ODPG graph context. The workflow captures product descriptions, use cases, business objectives, and signals, then generates ODPC Catalog YAML and connected portfolio metadata. The goal is to replace manual metadata editing with a standards-first path from stakeholder language to machine-readable data product definitions.

Data pythondataframe

Announcing Polars 1.41

Polars 1.41 boosts performance for wide Parquet tables by up to 3.29x and deepens query optimization with nested common subplan elimination.

Polars

Summary

What: Polars 1.41, released by Thijs Nieuwdorp on May 26, 2026, significantly speeds up Parquet footer decoding, showing a 3.29x improvement for 10,000 columns (from 117.4 ms to 35.74 ms). It also enhances the query optimizer for `LazyFrame` by deduplicating common subplans across all nesting depths and introduces `LazyFrame.gather()` for integer-based row selection.

Why it matters: This update signals Polars' ongoing commitment to high performance and sophisticated query optimization, positioning it as a strong contender against other data manipulation libraries, especially for complex analytical tasks and large datasets.

Takeaway: If you use Polars, upgrade to 1.41 to automatically benefit from faster Parquet scans and more efficient lazy query execution, particularly for wide tables or complex nested operations.

Deep Dive

Polars 1.41, released on May 26, 2026, focuses on performance and query optimization for data analysis.
Parquet metadata decoding is significantly faster, achieving up to 3.29x speedup for tables with 10,000 columns (from 117.4 ms to 35.74 ms).
This speedup is due to replacing a generic, auto-generated Thrift decoder with a hand-written, specialized decoder for Parquet's stable metadata schema.
The query optimizer now performs common subplan elimination at every nesting depth, preventing redundant computations in complex LazyFrame queries.
Previously, subplans shared at higher levels in the query plan might still be re-evaluated; this is now fixed by traversing into cache nodes.
LazyFrame.gather() has been introduced, allowing row selection by integer index without first materializing the LazyFrame into a DataFrame.
These improvements are automatic for existing scan_parquet calls and LazyFrame operations.
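
To make common subplan elimination concrete, here is a toy sketch in plain Python. It is not Polars' optimizer: the plan format (nested tuples) and the `evaluate` function are invented for illustration, and it memoizes results at evaluation time rather than rewriting the plan, which is a simplification of what a real query optimizer does.

```python
# A plan node is a nested tuple: (op, child_or_arg, ...). Tuples are hashable,
# so structurally identical subplans are equal and can share one cache entry.

def evaluate(plan, cache, stats):
    """Evaluate a plan; pass cache=None to disable subplan sharing."""
    if cache is not None and plan in cache:
        return cache[plan]
    op = plan[0]
    stats[op] = stats.get(op, 0) + 1  # count how often each operator really runs
    if op == "scan":
        result = list(range(plan[1]))
    elif op == "filter_even":
        result = [x for x in evaluate(plan[1], cache, stats) if x % 2 == 0]
    elif op == "sum":
        result = sum(evaluate(plan[1], cache, stats))
    elif op == "count":
        result = len(evaluate(plan[1], cache, stats))
    elif op == "pair":
        result = (evaluate(plan[1], cache, stats), evaluate(plan[2], cache, stats))
    else:
        raise ValueError(f"unknown operator: {op}")
    if cache is not None:
        cache[plan] = result
    return result


# The same filtered scan appears three times, at two different nesting depths.
shared = ("filter_even", ("scan", 10))
plan = ("pair", ("sum", shared), ("pair", ("count", shared), ("sum", shared)))

naive_stats, shared_stats = {}, {}
naive_result = evaluate(plan, None, naive_stats)
shared_result = evaluate(plan, {}, shared_stats)

print(naive_result == shared_result)
print("scans without sharing:", naive_stats["scan"])
print("scans with sharing:   ", shared_stats["scan"])
```

The results are identical, but the scan runs three times without sharing and once with it, including for the copy nested one level deeper.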

Decoder

Polars: A high-performance DataFrame library for Rust and Python, designed for speed and efficiency on large datasets.
Parquet: A columnar storage file format optimized for analytics, often used with big data processing frameworks.
Thrift: A lightweight, language-independent software stack for point-to-point RPC, used by Parquet to describe its metadata.
LazyFrame: A Polars data structure that represents a sequence of operations to be performed later, enabling query optimization.
Common subplan elimination: A query optimization, analogous to common subexpression elimination in compilers, where identical subplans that appear more than once in a query are computed once and the result is reused.

Design aianimationenterprise

Reallusion Pairs 3D Control with ByteDance's Seedance 2.0

Reallusion launched AI Studio, integrating its iClone 3D animation tools with ByteDance's Seedance 2.0, providing filmmakers precise spatial control that text-only AI video generators lack.

The Next Web

Summary

What: Reallusion's AI Studio combines iClone 3D animation with generative AI video models, notably ByteDance's Seedance 2.0, allowing artists to define camera paths, character positions, and motion in 3D before AI handles visual rendering. It's a multi-model platform also supporting Veo 3 and Kling AI.

Why it matters: This hybrid approach addresses a key limitation of pure text-to-video AI – the lack of precise spatial and directorial control – indicating that the future of professional AI video production will likely involve integrating AI into existing 3D workflows rather than replacing them entirely.

Takeaway: If you're a 3D animator or filmmaker using iClone, explore Reallusion AI Studio to integrate generative AI into your workflow while maintaining precise control over scene composition and camera.

Deep Dive

Reallusion's AI Studio provides a hybrid workflow for video creation, combining traditional 3D animation with generative AI.
It integrates Reallusion's iClone 3D animation tool with ByteDance's Seedance 2.0, the current top-ranked AI video model on Artificial Analysis.
The core benefit is offering filmmakers precise spatial control over character motion, camera choreography, and scene layouts, which pure text-prompt AI video generators struggle with.
Artists build 3D scenes in iClone, defining camera paths, character positions, and lighting, which then acts as a "precision control layer" for the AI.
Seedance 2.0 handles visual rendering, textures, and cinematic quality, interpreting the 3D data for more intentional motion dynamics and camera work.
AI Studio is a multi-model platform, also supporting Flux, Nano Banana (image), Kling AI, Veo 3, Wan, LTX, and Scail (video), allowing studios to switch models based on shot requirements.
This approach offers stability for creators, as 3D scene data and assets are stored locally in iClone, mitigating risks associated with single AI platform dependencies (like OpenAI's Sora shutdown).
The company, founded in 1993, extends its existing ecosystem into generative video, recognizing that useful AI creative tools are often hybrid systems augmenting professional workflows.
The platform targets professional filmmakers who prioritize spatial precision and frame-level control over pure AI convenience, betting that this gap won't fully close with language-driven generation alone.

Decoder

iClone: Reallusion's real-time 3D animation software for character animation, scene design, and virtual production.
Seedance 2.0: A generative AI video model developed by ByteDance, known for its spatial intelligence and ability to interpret exact scene layouts from control data.
Generative AI video models: AI systems that can create video content from various inputs, such as text prompts or 3D data.

Design aiux

Should I design for humans or machines?

Future UX design systems will need to support both human interpretation and machine-readable specifications, as AI becomes a critical "user" requiring explicit structure and logic.

UXDesign.cc

Summary

What: Design systems are evolving beyond human-centered documentation to accommodate AI and machine "users," necessitating two parallel layers: descriptive guidance for humans and precise, executable, machine-readable specifications defining design decisions and behaviors.

Why it matters: The increasing integration of AI into design and development workflows means that design artifacts and systems must become consumable by algorithms, pushing designers to think about explicit logic and structure in addition to aesthetics and usability for human users.

Takeaway: Consider how your design system documentation could be structured to be machine-readable, potentially by using more explicit component properties, tokens, and logic that AI tools could interpret and generate from.

Original Article

UX and design systems are evolving from being built solely for human interpretation to also serving AI and machine “users,” exposing how traditional, flexible, human-centered documentation often fails when machines require explicit structure, logic, and rules. Future design systems will need two parallel layers: descriptive guidance for humans and machine-readable specifications that define decisions, constraints, and behaviors in precise, executable formats.
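
One way to picture the machine-readable layer is a component spec that a tool can check automatically. The sketch below is an assumption-laden illustration: the `BUTTON_SPEC` structure, its field names, and the "one primary button per view" rule are hypothetical and not taken from the article.

```python
# Hypothetical machine-readable spec for one component. A human-facing guideline
# might say "use primary buttons sparingly"; here the same decision is a number.
BUTTON_SPEC = {
    "component": "Button",
    "variants": ["primary", "secondary"],
    "tokens": {"padding_px": 12, "radius_px": 6},
    "max_primary_per_view": 1,
}


def check_view(buttons, spec=BUTTON_SPEC):
    """Return a list of rule violations for the buttons placed in one view."""
    problems = []
    for button in buttons:
        if button["variant"] not in spec["variants"]:
            problems.append(f"unknown variant: {button['variant']}")
    primary_count = sum(button["variant"] == "primary" for button in buttons)
    if primary_count > spec["max_primary_per_view"]:
        problems.append(f"{primary_count} primary buttons, limit is {spec['max_primary_per_view']}")
    return problems


print(check_view([{"variant": "primary"}, {"variant": "secondary"}]))
print(check_view([{"variant": "primary"}, {"variant": "primary"}, {"variant": "ghost"}]))
```

The descriptive guidance for humans can stay alongside a spec like this; the difference is that an AI tool or linter can apply the structured version without interpreting prose.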

Design careerux

Closing the Loop: What to do After a Design Critique Ends

Effective design critiques require crucial follow-up communication, beyond the session itself, to maintain stakeholder engagement and ensure feedback visibly influences design changes.

NNGroup

Summary

What: Design critiques often fail because teams neglect follow-up. Effective follow-up involves an immediate recap of next steps and a deeper communication showing how specific feedback led to design changes, attributing improvements to stakeholder input.

Why it matters: Neglecting to "close the loop" erodes trust and engagement, as stakeholders perceive their contributions as inconsequential, leading to less thoughtful input in future sessions and undermining the value of the critique process itself.

Takeaway: After a design critique, send a concise recap outlining decisions and next steps. For deeper follow-up, explicitly show "before and after" designs, attributing changes to specific feedback, and explain why certain feedback was not acted upon.

Deep Dive

Design critiques are intended to improve design but often fail due to a lack of proper follow-up after the session.
The absence of follow-up makes participants feel their feedback is not valued or acted upon, leading to decreased engagement and less prepared input in future critiques.
There are two key types of follow-up: an immediate post-session recap and a deeper follow-up after the design has evolved.
The immediate recap can be a short message (e.g., Slack) summarizing decisions, items for investigation, and feedback not pursued.
The deeper follow-up builds trust by explicitly connecting feedback to design changes, often through "before and after" visuals with annotations.
Strong follow-up attributes changes to specific feedback, making participants feel like contributors.
It's also important to acknowledge feedback that was not acted upon, providing clear reasons (e.g., out of scope, usability testing results) to build credibility.
Triggers for follow-up include significant design changes, before handing off to development, and when feedback influences a major directional decision.
Skipping follow-up gradually erodes engagement and can make critique sessions feel like a formality over time.

Design frontendwebux

Hero Image Best Practices: How to Design High-Impact Hero Sections

High-impact hero images are large, above-the-fold visuals that instantly communicate brand value, set the mood, and guide user action, requiring careful attention to clarity, performance, and accessibility.

Webflow

Summary

What: Hero images are critical large visuals at the top of a webpage, typically spanning full width, featuring headlines and CTAs. They are crucial for creating a first impression, strengthening branding, forging emotional connections, and communicating value without extensive text.

Why it matters: With diminishing user attention spans, effective hero sections serve as a primary conversion mechanism, proving that visual impact and immediate value proposition are paramount in guiding user experience and preventing bounces.

Takeaway: When designing hero sections, prioritize clear visual hierarchy with ample negative space, optimize image/video file sizes, ensure mobile responsiveness, and always include sufficient color contrast and accessibility options like captions for videos.

Deep Dive

Hero images are large visuals at the top of a webpage, typically spanning the full width, containing a headline, supporting text, and a call to action (CTA).
They serve as the first impression for a website, influencing visitor perception of the brand and encouraging further exploration.
Effective hero images enhance branding, create emotional connections, and communicate the value proposition instantly, reducing the need for visitors to read extensive text.
Key best practices include setting the appropriate brand mood through aesthetics and emotion (e.g., calm for telehealth, vibrant for events).
Design for clarity by using layout, contrast, and spacing, particularly negative space, to draw focus to the headline and CTA.
Use motion and video intentionally to demo products or add emotional depth, but be mindful of file size for performance and offer reduced-motion options for accessibility.
Avoid generic stock imagery; instead, use custom photography or brand-specific illustrations to build authenticity and connect visuals to actual offerings.
Optimize file size using modern formats like WebP or AVIF and compression tools to ensure fast load times, which impacts both UX and SEO.
Design for responsiveness, ensuring hero images appear correctly and text remains readable across various devices, especially mobile, given over half of web traffic comes from phones.
Ensure accessibility by providing sufficient color contrast for text and offering captions or transcripts for videos.
Test hero images across multiple devices, browsers, and assistive technologies, and A/B test variations to measure effectiveness with the target audience.
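
The color contrast check mentioned above can be automated. The sketch below assumes the WCAG 2.x contrast-ratio formula and its 4.5:1 minimum for normal-size text, neither of which the article names explicitly. The function names are illustrative, and real hero text over a photo would need to be checked against the actual pixels behind it rather than a single background color.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_normal_text(foreground, background, minimum=4.5):
    """True if the pair meets the assumed 4.5:1 threshold for normal-size text."""
    return contrast_ratio(foreground, background) >= minimum


white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 2))
print(passes_normal_text((118, 118, 118), white))
print(passes_normal_text((119, 119, 119), white))
```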

Decoder

Hero image: A large, prominent image, video, or graphic section located at the top of a webpage, typically "above the fold," designed to capture attention and convey key information or branding immediately.
Call to action (CTA): A prompt on a website that encourages users to take a specific desired action, such as "Sign Up," "Learn More," or "Buy Now."
Above the fold: The portion of a webpage that is visible without scrolling, a term originating from newspaper design where important news was placed on the top half of the front page.

Design frontendaiweb

Modify Website Elements Directly in the Browser (Website)

Retune allows designers to visually tweak website elements directly in the browser, with an AI agent (like Claude Code) automatically generating the corresponding source code changes.

Retune

Summary

What: Retune is a desktop tool by Sujan Khadgi that enables direct-in-browser visual editing of websites, like adjusting padding or colors. It generates structured diffs that AI coding tools such as Claude Code or Cursor then use to write and commit the changes to source code (Next.js, Vite, Remix, Tailwind, CSS Modules).

Why it matters: This indicates a significant step towards "vibe coding" where designers can intuitively manipulate interfaces, abstracting away the code-writing process to AI agents, potentially accelerating frontend development workflows.

Takeaway: Developers can npm install retune and integrate it with AI coding tools like Claude Code or Cursor to enable visual-first design iteration on web projects.

Deep Dive

Retune is a visual design tool that runs locally on the user's desktop.
It allows users to select any element on a live webpage and adjust its styling (e.g., padding, border-radius, color) directly in the browser.
Changes are previewed instantly without requiring code modifications first.
Instead of writing code, Retune generates a "structured diff" detailing the visual changes.
This structured diff is then sent to an AI coding assistant, such as Claude Code or Cursor (via an MCP client).
The AI agent is responsible for translating these visual changes into actual code (e.g., modifying CSS, Tailwind classes, or component properties) and writing them to the source files.
Retune supports popular frontend frameworks and styling methods including Next.js, Vite, Remix, Tailwind CSS, CSS Modules, and plain CSS.
It automatically detects the styling approach and identifies React components for better context for the AI agent.
The tool integrates via a component added to the layout, which automatically hides in production builds.
Users can review and approve code changes before they are committed by the AI tool, maintaining human oversight.
Sujan Khadgi developed Retune.

Decoder

MCP client: A client for the Model Context Protocol (MCP), an open protocol that lets AI coding tools (like Claude Code or Cursor) communicate with external tools such as Retune.
Structured diff: A detailed, machine-readable description of changes, not just a plain text difference, providing context like component names, selector paths, and exact before-and-after values for properties.
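
To give a feel for what a structured diff might carry, here is a hypothetical Python sketch. It is not Retune's actual format: the `StyleChange` fields simply mirror the kinds of context described above (component name, selector path, before and after values), and the JSON layout is an assumption.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StyleChange:
    """One visual tweak, with enough context for a coding agent to locate it."""
    component: str  # e.g. the React component that renders the element
    selector: str   # selector path to the element that was edited
    prop: str       # the style property that changed
    before: str
    after: str


def build_diff(changes):
    """Drop no-op edits and serialize the rest as JSON for an AI coding tool."""
    real_changes = [asdict(c) for c in changes if c.before != c.after]
    return json.dumps({"changes": real_changes}, indent=2)


diff_json = build_diff([
    StyleChange("PricingCard", "main > section.pricing > div.card", "padding", "16px", "24px"),
    StyleChange("PricingCard", "main > section.pricing > div.card", "color", "#111", "#111"),
])
print(diff_json)
```

Unlike a plain text diff, each entry names what changed and where, which leaves the agent free to decide how to express it in the codebase, for example as a Tailwind class or a CSS Modules rule.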

Design aibackendfrontendcareer

How I Rebuilt My Portfolio with Claude Code

Product designer Patrick Morgan rebuilt his entire portfolio in a weekend using Claude Code, demonstrating that thorough preparation and workflow, not just prompting, is key to AI-assisted development.

Unknown Arts

Summary

What: Patrick Morgan migrated his portfolio from Framer to a custom Astro and Tailwind CSS stack on GitHub Pages in a single weekend using Claude Code. Success was attributed to extensive preparation: screenshots, a design brief, a detailed build plan, and a CLAUDE.md file guiding the AI agent.

Why it matters: This case study illustrates how AI coding assistants can drastically accelerate development for non-coders, enabling full ownership of the codebase and integration into a standard Git workflow, suggesting a new paradigm for designers to build production-ready sites.

Takeaway: When using AI for coding, prioritize detailed upfront preparation, including visual references, a clear build plan, and a CLAUDE.md file to guide the AI, over simply trying to write clever prompts.

Deep Dive

Patrick Morgan, a product designer, decided to rebuild his portfolio site using an AI coding assistant.
He switched from Framer, a low-code design tool, to a custom Astro, Tailwind CSS, and shadcn/ui stack deployed on GitHub Pages.
The entire rebuild, including infrastructure setup and content migration, was completed in a single weekend.
Morgan attributes the project's success not to advanced prompting, but to meticulous "mise-en-place" or upfront preparation.
This preparation included curating screenshots of the existing site, writing a detailed design brief, outlining project markdown files, and preparing image assets.
A crucial step was co-creating a comprehensive build plan with Claude, including a tech decision table, file tree, content architecture, schema definitions, and six implementation phases.
He also established a CLAUDE.md file at the repository root, which the AI agent reads at the start of every session to understand project context, architectural standards, naming conventions, and specific instructions.
Claude Code assisted with environment configurations, troubleshooting, content migration, styling, and page assembly.
Morgan's role involved reviewing AI output, enforcing design patterns, making aesthetic decisions, and performing periodic cleanup passes.
The resulting site enables a GitHub-native workflow, where new features follow a loop of GitHub Issues, AI-assisted implementation plans, feature branches, and pull requests.

Decoder

Astro: A modern web framework for building content-focused websites, known for shipping zero JavaScript by default.
Tailwind CSS: A utility-first CSS framework for rapidly building custom designs.
shadcn/ui: A collection of reusable components built with Tailwind CSS, designed for easy theming and integration into projects.
Framer: A design and prototyping tool that also allows for building interactive websites with visual editors.
CLAUDE.md: A plain text file placed at the root of a repository that an AI agent, like Claude Code, reads at the start of a session to understand project context, architectural standards, naming conventions, and specific instructions.


Design ai research software

The Interface is No Longer the Product

Mozilla.ai posits that traditional human interfaces are no longer the product, arguing that future AI-native applications will center on structured data that agents directly read and modify.

Mozilla.ai Blog

Summary

What: Alejandro Gonzalez from Mozilla.ai argues that software categories like spreadsheets and slide decks are "accidents of interface history." He contends that AI agents don't need graphical UIs; they need structured data representations as the source of truth, with human interfaces becoming mere renderings or outputs, not the primary interaction surface.

Why it matters: This perspective suggests a fundamental shift in software architecture, where the underlying structured data model, inspectable by agents, becomes paramount. It implies that current tools built for human interaction will be vulnerable to agent-native designs that abstract away the interface, potentially redefining productivity and software development.

Deep Dive

The article, written by Alejandro Gonzalez for Mozilla.ai, argues that the traditional human interface is no longer the central product in an AI-driven world.
It suggests that AI agents do not require visual interfaces like menus, canvases, or mouse interactions.
Instead, agents need structured data representations that they can directly read, reason about, and modify.
Current software categories (e.g., spreadsheets, slide decks, CRMs) are considered "accidents of interface history," designed around human interaction limitations rather than fundamental requirements.
In agent-native applications, the structured internal representation of work (the "artifact") will be the source of truth, not the human-facing interface.
Traditional interfaces will become "renderers" or "outputs" of this underlying structured data, adapted for human consumption.
The author notes that code has always worked this way, being text with clear semantics that tools can parse and transform without visual rendering, explaining why agents are already capable in this domain.
The "bridge" phase involves agents learning to use existing human-centric apps, but the "destination" is applications rebuilt to be agent-native.
Agent-native apps will feature structured internal representations, renderers for human-friendly views, validators for coherence, diff and approval systems, and import-export capabilities for legacy formats.
The scarce resource will shift from abundant code and interfaces to the structured understanding of the work itself and how it changes.
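
The components listed above (structured artifact, renderer, validator, diff, approval) can be illustrated with a minimal Python sketch. This is not code from the Mozilla.ai article; the `Deck` and `Slide` types and every function name here are hypothetical, chosen only to show how a human-facing view can be derived from a structured source of truth.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Slide:
    title: str
    bullets: tuple[str, ...] = ()


@dataclass(frozen=True)
class Deck:
    """Structured artifact: the source of truth an agent reads and edits."""
    slides: tuple[Slide, ...] = field(default_factory=tuple)


def render_text(deck: Deck) -> str:
    """Renderer: one human-facing view derived from the artifact."""
    lines = []
    for number, slide in enumerate(deck.slides, start=1):
        lines.append(f"{number}. {slide.title}")
        lines.extend(f"   - {bullet}" for bullet in slide.bullets)
    return "\n".join(lines)


def validate(deck: Deck) -> list[str]:
    """Validator: coherence checks run before a change is accepted."""
    problems = []
    for number, slide in enumerate(deck.slides, start=1):
        if not slide.title.strip():
            problems.append(f"slide {number} has no title")
    return problems


def diff(old: Deck, new: Deck) -> list[str]:
    """Diff: a reviewable summary of which slides an edit changed."""
    changes = []
    for index in range(max(len(old.slides), len(new.slides))):
        before = old.slides[index] if index < len(old.slides) else None
        after = new.slides[index] if index < len(new.slides) else None
        if before != after:
            changes.append(f"slide {index + 1}: {before} -> {after}")
    return changes


def apply_if_approved(old: Deck, new: Deck, approved: bool) -> Deck:
    """Approval gate: keep the old artifact unless the edit is valid and approved."""
    if validate(new) or not approved:
        return old
    return new


# An agent edits the structure directly; the rendering is just an output.
deck = Deck(slides=(Slide("Q3 results", ("Revenue up",)),))
edited = replace(deck, slides=deck.slides + (Slide("Next steps"),))
print(render_text(apply_if_approved(deck, edited, approved=True)))
```

In this sketch the agent never touches the rendered text. It proposes a new `Deck`, and the validator, diff, and approval step decide whether that proposal becomes the new source of truth.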

Decoder

Agent-native applications: Software applications designed from the ground up to primarily interact with and be manipulated by AI agents through structured data, rather than through traditional human-centric graphical user interfaces.


Tech career software-engineering productivity management

The Best Engineers Write Less Code

The best engineers focus on building only necessary features and reducing complexity, as every line of code adds ongoing costs and becomes a liability if it doesn't solve the right problem.

Maxim Shvets (github.io)

Summary

What: The article argues that software development is expensive due to ongoing costs like maintenance, debugging, and infrastructure. Great engineers create value by understanding what *not* to build and by actively deleting unnecessary code. It emphasizes aligning with stakeholders on "why" a feature is needed and using agile feedback loops to avoid building the wrong things.

Why it matters: This piece provides an editorial insight into the true value proposition of senior engineering roles, shifting the metric from code output to strategic impact, cost reduction, and problem-solving. It also implicitly pushes back against the idea that AI-generated code will diminish the need for critical engineering judgment.

Takeaway: Before starting a new feature, have a clear conversation with stakeholders to ensure alignment on the core problem being solved and its necessity, potentially tweaking requirements for faster (cheaper) implementation. Regularly review existing codebases for opportunities to delete unnecessary complexity.

Original Article

Coding is expensive and time consuming

We are living in an age of mass denial. Some very smart people are telling us that we no longer need to write good software because LLMs can generate code now. But terrible software is still terrible software, whether it was written by Claude Code or by a human who did not know what they were doing. The computers running it are still real. Latency is still real. Complexity is still real. Maintenance is still real. All the old rules still apply. There is going to be an enormous amount of work fixing the slop now being pushed into production. But I digress.

How much does software cost

Good software takes time, and businesses measure everything in it. You see, the time it takes to make something directly impacts how much the thing costs to produce. This is why your manager/director/VP is so interested in how long it will take. Time is literally money. The company is cutting checks twice a month to everyone involved. This is why they are so excited about AI, because it is faster (cheaper).

Be a great employee

The best thing you can do for your stakeholder is avoid building unnecessary things. Every feature carries ongoing costs: maintenance, debugging, infrastructure, support, future migrations, and cognitive overhead. Senior engineers understand this instinctively. Figuring out what should not be built is one of your biggest contributions. On the road to becoming a senior engineer, you will go from “I can do it boss, it will take a couple of days” to “Hey, why are we building this in the first place”. That second question is very very important. Remember kids: code is a liability unless it solves the right problem.

This is not what I meant.

If you take one thing away from this post, it is this. Building the wrong thing for your stakeholder is the worst. You just spent time (which is money), on making something. It is not what they wanted. Now they have to see how much of what you made they can keep. Then more time to do something else. Don’t get me wrong. This kind of thing is inevitable. Even if you make exactly what they wanted two months ago, business needs may change. But there is a mitigation strategy.

Yes, It’s Agile

Forget about ceremonies and pantomime. The important part is balancing production with feedback from stakeholders. If you are asked to make something, first make sure both you and the stakeholder are aligned on why it is needed. Have a little chat. You may even tweak some requirements to make it happen faster (cheaper). Then pick an interval, not too long. Show your progress. Get feedback. This way you won’t be making the wrong thing (spending company money) for too long. Great engineers are not measured by how much code they produce. They are measured by how much value they create relative to the complexity they leave behind.

What should you do with the free time you have? Go delete some code.

Tech ai startup career

Avoiding Death on the Yellow Brick Road

A common misconception is that large AI labs will automate all software; however, the real value lies in building specialized, trustworthy AI solutions for vertical industries.

Joe Schmidt IV (X.com)

Summary

What: The article argues against the belief that large AI labs will automate all software development. Instead, it suggests significant opportunities exist in developing "scaffolding" around AI models to make their output trustworthy, compliant, and operational within specific industries, rather than focusing solely on core AI research.

Why it matters: This highlights a growing understanding that horizontal AI tools need vertical integration and specialization to deliver real-world value, indicating a shift from generic AI applications to industry-specific solutions that address complex problems.

Takeaway: Consider specializing in vertical AI applications rather than generic AI tools, focusing on compliance, trust, and operational integration for specific industries.

Original Article

It is possible that the big AI labs will eventually automate software away in every industry. However, the reality is that the world is full of complex, often vertical problems that can't simply be solved with a horizontal tool with access to standard tools and computer use. The value will come from the scaffolding around models that makes their output trustworthy, compliant, and operational inside a specific industry. There are plenty of opportunities in AI outside of the core research lab niche.

Tech startup business ai branding

Meta's Premium Branding Mess

Meta's new "Meta One" premium subscription strategy is a confusing mess, introducing seven different tiers and prices (from $2.99/month to $49.99/month) for social features, AI capabilities, identity protection, and marketing advantages under inconsistent naming.

Spyglass

Summary

What: Meta is rolling out a confusing array of seven premium subscription tiers, ranging from WhatsApp Plus ($2.99/month) for social features to Meta One Advanced ($49.99/month) for marketing boosts. The "Meta One" brand is used inconsistently for AI features (Meta One Plus, Meta One Premium), identity verification (Meta One Essential), and marketing (Meta One Advanced), with varying prices and benefits that overlap poorly.

Why it matters: This illustrates the challenges large tech companies face in monetizing their vast product portfolios beyond advertising, often resulting in fragmented, complex, and user-unfriendly pricing and branding strategies as they experiment with new revenue streams like AI and premium features.

Original Article

And there I was, ready to make fun of Google's rather ridiculous branding for 'Google AI Ultra'. One tidbit out of Google I/O last week was there are now two tiers for the highest end of Gemini access and usage – but they're both called the same thing. Google easily could have slotted in 'Max' to denote the new $100/month plan, as it would have worked quite nicely with 'Plus', 'Pro', 'Max', and 'Ultra' (thanks in large part to Apple's consumer hardware branding efforts). But no, there are now two 'Ultra' tiers, one at $100/month and one remaining at $200/month, seemingly just to confuse people.

Anyway, that's silly – to the point where Google has already had to clarify the plans. But into the bar walks Meta, going full-on "hold my beer".

It starts out simple enough. Meta, as has long been rumored (and tested – humorously by force in Europe), is getting into the subscription game for their myriad social services. So now worldwide we're getting:

Facebook Plus – $3.99/month
Instagram Plus – $3.99/month
WhatsApp Plus – $2.99/month

While the earlier test versions were focused on fewer (or un-targeted) ads, these seemingly each will have new features and functionality. They're decidedly not for the masses, but instead for power users of each product. They're basically Meta's way to milk the whales. Can you actually milk a whale? Meta is going to find out!

With user growth continuing to slow, and in some markets retract, this is the old tried and true playbook of profit harvesting. It perhaps points to peak ad saturation. Or simply the need for new revenue streams for Meta – which is still beholden to one business and model far more than any of their peers.

Anyway, all of that makes sense and is relatively straightforward. It would probably make sense to be able to bundle such offerings all together for, say, $9.99/month (about a buck off), but that will undoubtedly be tested too. I mean, they're already using the 'Meta One' moniker... But from here, for some reason, ideas and models keep flowing and going...
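
The "about a buck off" figure can be checked with a few lines of Python. The three Plus prices come from the list above; the $9.99 bundle is only the hypothetical price floated in this paragraph, not an announced plan. Prices are kept in cents to avoid floating-point rounding.

```python
# Announced monthly prices for the three social plans, in cents.
PLUS_PLANS_CENTS = {
    "Facebook Plus": 399,
    "Instagram Plus": 399,
    "WhatsApp Plus": 299,
}

# Hypothetical bundle price suggested in the text, not an announced plan.
BUNDLE_CENTS = 999

separate_total = sum(PLUS_PLANS_CENTS.values())
savings = separate_total - BUNDLE_CENTS

print(f"Bought separately: ${separate_total / 100:.2f}/month")
print(f"Bundle savings:    ${savings / 100:.2f}/month")
```

Bought separately, the three plans come to $10.97/month, so a $9.99 bundle would save $0.98.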

Alongside the launch, Meta says it will begin testing even more subscription plans, which is where things start to get confusing. For Meta AI users, it will test two plans, Meta One Plus ($7.99/mo) and Meta One Premium ($19.99/mo), with the same features, but the Premium plan unlocks more capacity on higher compute queries. That means the Premium plan would offer deeper reasoning for complex tasks (i.e., more of “thinking mode” in the Meta AI app or on the web). It would also offer more video and image generation capabilities across Meta’s apps.

Okay, so much for my 'Meta One' $9.99/month plan when 'Meta One Plus' costs $7.99/month. And it seemingly has nothing to do with any of the social 'Plus' plans, but instead is only about AI. But it's also not called 'Muse Plus' – the new branding for Meta's latest AI efforts (RIP 'Llama'). Nor is it called 'Meta AI Plus', which would seemingly be more straightforward (in line with Google's premium naming for AI). 'Meta One Premium' offers a weird jump from $7.99/month to $19.99/month (why not $14.99/month?) and not only buys more compute, but also gets new capabilities across the aforementioned social apps.

But wait, there's more:

The Meta One Essential plan ($14.99/mo) will offer the Verified badge, impersonation protection, and an enhanced linksheet where users can link out to their online presence across social channels and the web, similar to Meta Verified. The more expensive Meta One Advanced plan ($49.99/mo) will include the Essential plan benefits, as well as the ability to be featured in the Facebook feed, appear higher in Facebook and Instagram search results, gain attention with a bold “Follow” button on Reels, and automatically send “follow” invitations to people who engage with your content.

I mean, what? Now I'm legitimately confused. Seemingly the reason why 'Meta One Premium' isn't $14.99/month is because that's the price of 'Meta One Essential'. But that gives you totally different things than the other 'Meta One' plans, namely identity protection. That's great (and, of course, already offered), but maybe call it something else given these other offerings?

'Meta One Advanced' takes the confusion to another level by adding in the ability to be featured in the Facebook feed and Instagram search algos. This is basically the growth-hacky "Shit X Does Plan".

Again, fine. But it's going to be confusing as hell when it's all branded as 'Meta One' alongside the premium AI stuff. Let's just spell out the offerings here in ascending order:

WhatsApp Plus (Social) – $2.99/month
Facebook Plus (Social) – $3.99/month
Instagram Plus (Social) – $3.99/month
Meta One Plus (AI) – $7.99/month
Meta One Essential (Identification) – $14.99/month
Meta One Premium (AI) – $19.99/month
Meta One Advanced (Marketing) – $49.99/month

So that's three social 'Plus' offerings, all collectively under the 'Meta One' idea, but not called that. Plus another 'Plus' that is called ‘Meta One’ but has nothing to do with social, but is about AI. Meanwhile, the other, more expensive AI offering does have some interaction with the social offerings under the 'Premium' moniker. Which is more expensive than the 'Essential' version of the same 'Meta One' brand, because that's only about identity verification. With the last and most expensive 'Meta One' product being all about brands (and I suppose individuals) being able to spam – er, market to – the various Meta social products.

Got all that? I'm not certain I do. We're not quite in Microsoftian territory, but we're close! Maybe Meta should just stick with ads.

AI agents backend enterprise

Building self-improving tax agents with Codex

OpenAI details how it used Codex to build self-improving tax agents for Thrive Holdings, allowing the AI to prepare increasingly complex tax returns and learn from production failures.

OpenAI

Summary

What: OpenAI developed a self-improving AI agent using Codex for Thrive Holdings to automate tax return preparation. The system is designed to learn from discrepancies between lab and production behavior, and from "critical failures" like incorrect calculations or missing deductions, enabling it to handle more complex tax scenarios over time.

Why it matters: This case study demonstrates a practical application of autonomous AI agents in a complex, high-stakes domain like tax preparation, highlighting how feedback loops from real-world failures can drive continuous improvement and expand an agent's capabilities beyond initial training.

Takeaway: If building AI agents for critical, real-world tasks, implement robust feedback mechanisms to identify and learn from production failures, enabling continuous self-improvement.

Decoder

Self-improving agent: An AI agent designed to enhance its performance and capabilities over time by learning from its experiences, often through feedback loops from real-world interactions and observed failures.

Original Article

Real-world systems often behave differently in production than they do in the lab. Teams often discover these failures after launch, then spend weeks fixing them. That feedback loop is slow and manual. Today, it is possible to build agents that self-improve. This post looks at how OpenAI used Codex to build this type of agent at Thrive Holdings, resulting in an AI that can prepare increasingly complex tax returns.

AI policy web media

YouTube Expands Automatic AI Video Labeling

YouTube will now automatically label videos containing "significant photorealistic AI content," reducing reliance on creator self-disclosure and making these labels more prominent for viewers starting May 2026.

YouTube Blog

Summary

What: Starting May 2026, YouTube is rolling out internal signals to auto-detect and label videos with significant photorealistic AI content if creators fail to disclose it. These labels will be more visible, appearing directly below the video player for long-form videos and as an overlay on Shorts, simplifying the disclosure process previously introduced in 2024.

Why it matters: This shift towards automated detection and more prominent labeling reflects a growing industry-wide effort to combat misinformation and increase transparency around AI-generated media, acknowledging the challenges of relying solely on creator honesty.

Takeaway: Creators publishing on YouTube should be aware of the new automatic AI detection and prominent labeling, ensuring they accurately disclose AI use in their content or risk incorrect automatic labeling.

Decoder

C2PA metadata: Metadata defined by the Coalition for Content Provenance and Authenticity (C2PA) standard, which provides a tamper-evident history of media content, including information about its creation and edits, such as AI generation.

Original Article

We've heard consistently from our community that they value transparency when it comes to generative AI content. That’s why since 2024, we've been labeling content when creators disclose they've used AI tools.

We've learned in that time about what people find useful when it comes to AI disclosures, and today we're making two updates that we think will make this process much simpler and more intuitive for creators and viewers on YouTube.

More visible, simplified labels

We’re moving the disclosure label for photorealistic and meaningfully AI altered or generated content to a more prominent position.

For Long-form Videos: The label will now appear directly below the video player, above the description.
For Shorts: The label will appear as an overlay on the video itself.

By moving these labels onto the main stage, viewers get the context they need at a glance. This is now the single label format for all photorealistic and meaningfully AI altered or generated content on YouTube.

For content that is unrealistic, animated, or slightly altered, viewers can find this disclosure in the expanded description.

Introducing automatic AI detection

While we still require creators to manually disclose when they use realistic AI, we want to make the process more seamless and reliable. Starting in May 2026, we’re rolling out new internal signals to help identify AI-generated content.

If a creator doesn’t specify whether or not they used AI, but our systems detect significant photorealistic AI use, we will now automatically apply a label.

As this technology continues to improve, creators remain in control. If a creator thinks their content was incorrectly identified as AI-generated, they can update the disclosure status in YouTube Studio. However, disclosures will remain permanent in a handful of cases, including:

Content created using YouTube’s own AI tools, like Veo or Dream Screen.
Content containing C2PA metadata indicating it was fully generated by AI.
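
The labeling rules described above can be summarized as a small decision function. The following Python sketch is only an illustration of the rules as stated in this post, not YouTube's implementation. All field and function names are hypothetical, and it assumes that an explicit "no AI" disclosure from the creator overrides automatic detection except in the permanent cases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Placement(Enum):
    NONE = "no label"
    PROMINENT = "below the player (long-form) or overlay (Shorts)"
    EXPANDED_DESCRIPTION = "expanded description"


@dataclass
class Video:
    # True/False is the creator's disclosure; None means they did not specify.
    creator_disclosed: Optional[bool]
    # Hypothetical flag: content is photorealistic or meaningfully AI altered.
    photorealistic: bool
    # Hypothetical output of the internal detection signals.
    auto_detected: bool = False
    made_with_youtube_ai_tools: bool = False
    c2pa_fully_generative: bool = False


def is_permanent(video: Video) -> bool:
    """Disclosures the creator cannot change."""
    return video.made_with_youtube_ai_tools or video.c2pa_fully_generative


def has_label(video: Video) -> bool:
    if is_permanent(video):
        return True
    if video.creator_disclosed is not None:
        # Assumption: an explicit creator choice wins over detection.
        return video.creator_disclosed
    return video.auto_detected


def placement(video: Video) -> Placement:
    if not has_label(video):
        return Placement.NONE
    # Detection targets significant photorealistic use, so treat it as realistic.
    if video.photorealistic or video.auto_detected:
        return Placement.PROMINENT
    return Placement.EXPANDED_DESCRIPTION


def creator_can_update(video: Video) -> bool:
    return not is_permanent(video)


undisclosed = Video(creator_disclosed=None, photorealistic=True, auto_detected=True)
print(placement(undisclosed).value, creator_can_update(undisclosed))
```

The sketch keeps "is there a label", "where does it appear", and "can the creator change it" as separate questions, because the post answers each of them with a different rule.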

Our commitment to responsibility

These changes are designed to balance transparency with creator control. It’s important to note that a disclosure label alone does not change how a video is recommended or whether it’s eligible to earn money. In a world where AI is changing what’s possible, our goal is simple: make it as easy as possible for creators and viewers to have the right information.

AI startup computer-vision

Former Google and Apple researchers launch Trajectory to enhance AI feedback loops

Former Google and Apple researchers have launched Trajectory, a Palo Alto startup focused on building AI systems that can accurately perceive and interpret the physical world.

CryptoBriefing

Summary

What: A new Palo Alto startup, Trajectory, founded by former Google and Apple researchers, aims to develop AI systems with enhanced capabilities for "seeing and interpreting the physical world," focusing on improving AI feedback loops.

Why it matters: This signals a trend towards more specialized AI ventures focusing on fundamental perceptual capabilities, potentially addressing current AI limitations in understanding real-world contexts and improving grounding for agents.

Original Article

The new Palo Alto-based startup aims to make AI systems that can actually see and interpret the physical world.

AI research policy

Google DeepMind's Hassabis: AGI is 3 to 4 years away

Google DeepMind CEO Demis Hassabis has revised his AGI prediction, now stating it could be achieved as early as 2029-2030, a significant acceleration from his previous 2030-2035 estimate.

Sherwood News

Summary

What: Google DeepMind CEO Demis Hassabis, speaking in May 2026, shortened his prediction for Artificial General Intelligence (AGI) from 2030-2035 to 2029-2030, citing the rapid acceleration in AI agent capabilities. Other predictions vary, with Ilya Sutskever (Safe Superintelligence Inc. CEO) estimating 2030-2045 and Nvidia CEO Jensen Huang claiming AGI has already been achieved.

Why it matters: The constantly shifting timelines from prominent AI leaders reflect the rapid and unpredictable pace of AI development, creating both excitement and uncertainty about its near-term societal impact and the industry's ability to forecast its own progress.

Decoder

Artificial General Intelligence (AGI): A hypothetical type of AI that can understand, learn, and apply intelligence across a wide range of tasks at a human-like level, rather than being specialized for a single task.

Original Article

Google DeepMind CEO and Nobel Prize winner Demis Hassabis shortened his prediction for when the era of AGI would be upon us.

Last year, we gathered up the many predictions we were hearing from tech leaders about when we should expect artificial general intelligence to arrive.

There was no shortage of opinions then, and the range of predictions was pretty wide.

Recently, Google DeepMind CEO Demis Hassabis revised his prediction, shortening the time frame significantly. Last June, Hassabis predicted that AGI may be achieved between 2030 and 2035. Last week, the window was narrowed to 2029-30.

During the Google I/O conference, Hassabis said, “When we look back at this time, I think we all realize that we were standing in the foothills of the singularity. It will be a profound moment for humanity.”

After the keynote, Axios’ Ina Fried spoke with Hassabis, and he said that with the acceleration of agents, we may also be faster to achieve AGI: “We can see agents really happening now and imagine what they will be in another year, and how useful they’ll be.” He reportedly predicted we would have AGI by 2029 or 2030.

We also noted a newer AGI prediction from former OpenAI cofounder Ilya Sutskever (now the CEO of Safe Superintelligence Inc.). Way back in 2017, before ChatGPT came out, Sutskever predicted we would’ve already had AGI by 2019 or 2021. But last November, at the dawn of the agentic AI boom, Sutskever made a much wider estimate: as soon as 2030 or up to 2045.

Nvidia CEO Jensen Huang kind of made a new prediction back in March while on Lex Fridman’s podcast. He said we had already achieved AGI, but then hedged a bit. “I think it’s now. I think we’ve achieved AGI.”

We’ll keep this tracker updated. Seen one that we’ve missed? Send them our way: keegan@sherwoodmedia.com!

Anthropic raises $65 billion at a $965 billion valuation, releases a more “honest” Claude Opus 4.8

Anthropic’s monster $965 billion valuation puts it firmly ahead of OpenAI’s $850 billion valuation as the rivals head toward expected IPOs later this year.

Report: Microsoft tries to get back in the AI coding game with new model

Microsoft wants to fight its way back into the AI coding field by releasing a new model next week at its annual Microsoft Build developer conference, The Information reports.

The company is expected to announce a new family of models as Microsoft AI CEO Mustafa Suleyman seeks to shore up the company’s own AI offerings and gradually wean it off OpenAI’s technology over the remainder of their $13 billion partnership.

Microsoft was initially well positioned to meet software developers with AI-enhanced tools. It owns GitHub, the most popular platform for hosting and sharing code, and GitHub’s Copilot AI-powered coding tool was released months before OpenAI’s ChatGPT debuted in 2022.

But it fumbled one of the biggest first-mover advantages in history as Anthropic’s Claude Code, OpenAI’s Codex, and Cursor rolled out coding tools that developers loved.


Waymo to launch free robotaxi rides in its new Ojai vans

The new vehicles are less expensive — which is important for the service to really scale.

Report: Tesla’s Robotaxi trainers don’t think it’s ready for prime time

If you listen to Tesla CEO Elon Musk, you might think rapid expansion of the company’s Robotaxi service is right around the corner. If you listen to the people tasked with reviewing the footage and training its AI, that future is a long way off.

An in-depth report from Reuters that interviewed nine former “data labelers” and a former Tesla self-driving engineer paints a picture of highly massaged safety stats, vehicles failing to execute basic driving functions, and a behind-the-scenes reality where the supposedly “autonomous” tech relies heavily on the exact kind of localized, labor-intensive mapping and training Musk has publicly mocked. The skepticism runs so deep that one former insider told reporters they wouldn’t ride in a Robotaxi “if you f---ing paid me.”

Currently, the service is operating about 30 unsupervised vehicles across three Texas cities — a much more circumscribed execution than Musk had initially planned. The problem, for Tesla, is that the success of its Robotaxi business is now integral to the company’s value proposition.



AI mobile llm

Anthropic to expand Claude Voice Mode to more languages

Anthropic is preparing a significant update for Claude's mobile voice mode, adding support for 18 new languages and a push-to-talk feature, allowing users to switch languages mid-conversation.

TestingCatalog

Summary

What: Anthropic is updating Claude's mobile voice mode, introducing 18 new languages (including German, Portuguese, Chinese, Japanese, Russian, Ukrainian) and a push-to-talk option. Users will be able to ask Claude to change languages on the fly during a conversation. This update also includes a refreshed UI and new voices for most languages, despite still relying on external text-to-speech providers like ElevenLabs.

Why it matters: This expansion helps Claude catch up to competitors like ChatGPT and Gemini, which already offer multilingual voice capabilities, making Anthropic's conversational AI more accessible globally and competitive in the mobile AI assistant market.

Takeaway: If you use Claude's mobile app and require multilingual interaction, anticipate an update adding support for 18 new languages.

Decoder

Full-duplex streaming: A communication system that allows for simultaneous two-way conversation, where both parties can speak and hear each other at the same time, unlike turn-based systems.


Data ai agents opensource

I battletested 5 open source analytics agents

A battle test of five open-source "analytics agents" found that only some are actually designed for analytics, with reliability depending more on robust business context than the agent interface itself.

The New AI Order

Summary

What: The author tested LangChain, Wren AI, nao, LibreChat, and Vercel's AI analytics template, concluding that their functionalities vary widely and many aren't true analytics agents. The reliability of answers correlated with how well business context (via prompts, semantic models, or tooling layers) was integrated, not just the agent's LLM.

Why it matters: This critical review highlights the hype versus reality in the "AI agent" space, especially for analytics, showing that successful deployments still hinge on disciplined data modeling and context provision, rather than relying solely on the LLM's intelligence.

Takeaway: Before adopting an "analytics agent," thoroughly evaluate its core purpose and assess how it integrates and leverages your existing business context and semantic layer for reliable insights.

Deep Dive

The article critically reviews five open-source "analytics agents": LangChain, Wren AI, nao, LibreChat, and Vercel's template.
A primary finding is that many tools marketed as "analytics agents" are not specifically built for robust data analysis.
LangChain: While a framework for building agents, it requires significant custom development to function as an analytics agent.
Wren AI: Designed as a semantic layer, it offers strong analytics capabilities by integrating structured business context.
nao: Focused on data exploration and visualization, it performs well for its intended purpose.
LibreChat: Primarily a chatbot interface for various LLMs, not an analytics agent in itself.
Vercel's AI analytics template: A starter template, useful for demonstrating concepts but needs extensive work for production analytics.
The reliability of an agent's answers is less about the agent's "intelligence" and more about the quality and accessibility of the business context it can access.
This context can be provided through:
Well-crafted prompts for LLMs.
Integration with semantic models or business glossaries.
Structured markdown files or documentation.
The underlying MCP (Model Context Protocol) or tooling layer.
The article implies that a robust semantic layer (like Wren AI's approach) is critical for consistent and accurate analytics agent performance.
The term "analytics agent" is often loosely applied, requiring users to understand the specific problem each tool is designed to solve.

Decoder

Analytics agent: An AI system (often powered by a Large Language Model) designed to interpret natural language queries, interact with data sources, and provide analytical insights or visualizations.
LangChain: A framework for developing applications powered by large language models, allowing chaining together different components.
Wren AI: An open-source semantic layer platform designed to help LLMs understand and query structured data, often used for analytics.
nao: An open-source analytics agent tool specifically focused on data exploration and visualization.
LibreChat: An open-source self-hosted chatbot client that supports various LLMs, primarily focused on conversational interfaces.
Vercel's AI analytics template: A starter project or example application provided by Vercel demonstrating how to build an AI-powered analytics interface.
Semantic layer: A business-friendly data layer that translates complex database structures into common business terms, allowing users to query data using familiar language and concepts.
MCP/tooling layer: MCP is the Model Context Protocol, an open protocol for connecting LLM applications to external tools and data sources. Here the term covers the MCP servers and other tooling through which an analytics agent reaches data and business context.
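
To make the "where business context lives" point concrete, here is a minimal sketch of a semantic layer, assuming a toy model with two metrics. Every name in it (`Metric`, `SEMANTIC_MODEL`, `compile_metric_query`, the `orders` table and its columns) is hypothetical and is not taken from the article or from any of the five tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """One business metric: its name, how to compute it, and what it means."""
    name: str
    sql_expression: str
    table: str
    description: str


# Hypothetical semantic model: the single place where business definitions live.
SEMANTIC_MODEL = {
    "revenue": Metric(
        "revenue", "SUM(order_total)", "orders",
        "Gross value of all orders, before refunds.",
    ),
    "active_customers": Metric(
        "active_customers", "COUNT(DISTINCT customer_id)", "orders",
        "Customers with at least one order.",
    ),
}


def compile_metric_query(metric_name: str, group_by: Optional[str] = None) -> str:
    """Turn a metric name into SQL using the shared definition, not an LLM guess."""
    metric = SEMANTIC_MODEL[metric_name]  # raises KeyError for undefined metrics
    select = f"{metric.sql_expression} AS {metric.name}"
    if group_by:
        return f"SELECT {group_by}, {select} FROM {metric.table} GROUP BY {group_by}"
    return f"SELECT {select} FROM {metric.table}"


def context_for_prompt() -> str:
    """Render the same definitions as text that could be placed in a prompt."""
    return "\n".join(f"- {m.name}: {m.description}" for m in SEMANTIC_MODEL.values())
```

In this sketch the agent only picks a metric name and a grouping; the definition of "revenue" comes from the model, so two differently worded questions resolve to the same SQL.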

Original Article

Open-source “analytics agents” are often grouped together, but LangChain, Wren AI, nao, LibreChat, and Vercel's template solve very different problems, and only some are actually built for analytics. Reliable answers depend less on the agent interface and more on where business context lives, whether that's prompts, semantic models, markdown files, or the underlying MCP/tooling layer.

Data ai database postgresql

Scaling AI-Driven Marketing Processes with PostgreSQL

PostgreSQL offers a reliable central data layer for scaling AI-driven marketing workflows by combining relational tables, JSONB, full-text search, and pgvector capabilities.

Cybertec PostgreSQL

Summary

What: Marketing teams can use PostgreSQL to manage AI workflows by using ENUMs for workflow state, combining relational tables with JSONB for flexible data, connecting campaigns and performance data, and leveraging full-text search and pgvector for semantic context.

Why it matters: This shows how established relational databases like PostgreSQL are evolving, through extensions and built-in features, to meet the demands of modern AI-driven applications well beyond traditional OLTP.
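
The pattern in the summary can be sketched as a small schema plus one query, held here as Python strings so the example stays self-contained. This is an illustration under stated assumptions, not the article's schema: all table, column, and state names are hypothetical, the embedding dimension of 3 is only for readability, and the `%(name)s` placeholders assume a psycopg-style driver. The helper shows what pgvector's `<=>` operator (cosine distance) computes.

```python
import math

# Hypothetical schema: names are invented for illustration; only the
# PostgreSQL and pgvector syntax itself is standard.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

-- Workflow state as an ENUM, so invalid states are rejected by the database.
CREATE TYPE workflow_state AS ENUM ('draft', 'in_review', 'approved', 'published');

CREATE TABLE campaign (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name        text NOT NULL,
    state       workflow_state NOT NULL DEFAULT 'draft',
    attributes  jsonb NOT NULL DEFAULT '{}'::jsonb   -- flexible, schema-less fields
);

CREATE TABLE asset (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    campaign_id  bigint NOT NULL REFERENCES campaign (id),
    body         text NOT NULL,
    body_tsv     tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
    embedding    vector(3)                           -- pgvector column
);
"""

# Keyword filter (full-text search) combined with semantic ranking (pgvector).
SEARCH_SQL = """
SELECT a.id, a.body
FROM asset AS a
JOIN campaign AS c ON c.id = a.campaign_id
WHERE c.state = 'approved'
  AND a.body_tsv @@ plainto_tsquery('english', %(keywords)s)
ORDER BY a.embedding <=> %(query_embedding)s::vector
LIMIT 5;
"""


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as computed by pgvector's <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Keeping state, flexible attributes, text search, and embeddings in one database means a single query can filter on workflow state, match keywords, and rank by similarity without a second system.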

Original Article

Marketing teams can scale AI workflows reliably by using PostgreSQL as their central data layer via workflow state management (using ENUMs), combining relational tables with JSONB for flexibility, connecting campaigns/assets/performance data, and leveraging full-text search and pgvector for semantic context.

Data database algorithms

Deconstructing Data Sketches

Data sketches are probabilistic algorithms that estimate expensive metrics like distinct counts from small samples, trading perfect accuracy for significant speed and compute savings.

Lezwon's Substack

Summary

What: Data sketches, such as those that store the lowest K hashed values, provide approximate answers to queries like distinct counts or quantiles without scanning an entire dataset. They are valuable for large-scale dashboards, reports, and distributed data aggregation by reducing computational load.

Why it matters: This shows how fundamental computer science concepts are being applied to manage the scale and cost challenges of modern data systems, where exact answers are often less critical than fast, approximate ones.

Decoder

Data sketch: A probabilistic data structure or algorithm that uses a small amount of memory to summarize a larger dataset, enabling approximate answers to queries (e.g., distinct count, quantiles) much faster than exact methods.
Distinct count: The number of unique items in a set of data.
Quantiles: Values that divide a set of data into equal-sized, contiguous subgroups (e.g., quartiles divide into four, deciles into ten).
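
The "lowest K hashed values" idea is commonly known as a K-Minimum-Values (KMV) sketch. The following is a minimal sketch of it, not the article's code: the class name `KMVSketch`, the choice of SHA-256 as the hash, and the default `k` are all assumptions. If hashes are spread uniformly over [0, 1), the k-th smallest hash sits near k/n for n distinct items, which gives the standard estimate (k - 1) / (k-th smallest hash).

```python
import hashlib
import heapq


def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps only the k smallest distinct hash values seen so far."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # max-heap of kept hashes, stored negated
        self._members = set()  # same hashes, for duplicate detection

    def add(self, item: str) -> None:
        h = hash_to_unit(item)
        if h in self._members:
            return  # duplicate item: same hash, nothing to do
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept hash: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        """Combine two sketches: keep the k smallest hashes of their union."""
        merged = KMVSketch(min(self.k, other.k))
        for h in sorted(self._members | other._members)[: merged.k]:
            heapq.heappush(merged._heap, -h)
            merged._members.add(h)
        return merged
```

The `merge` method is what makes sketches useful for distributed aggregation: each worker builds a sketch over its own partition, and only the small sketches are shipped and combined. Memory stays bounded by `k` regardless of input size, and the relative error shrinks roughly as 1/sqrt(k).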

Original Article

Data sketches estimate expensive metrics like distinct counts by storing a small probabilistic sample, such as the lowest K hashed values, instead of scanning every row. They trade perfect accuracy for huge speed and compute savings, making them useful for large-scale dashboards, reports, and distributed aggregation.

Data hardware ai gpu visualization

Visualize the Brrr (Website)

Brrrviz.com offers interactive visualizations to demystify GPU architecture and how it accelerates AI workloads for developers.

Brrrviz

Summary

What: Brrrviz.com is a website providing visual explanations of GPU internals and their role in AI, aiming to demystify the "hidden engines" that most developers treat as mysterious, costly accelerators.

Why it matters: As AI adoption grows, understanding the underlying hardware like GPUs becomes crucial for developers to optimize performance and cost, moving beyond treating them as black boxes.

Takeaway: Explore brrrviz.com to gain a better intuitive understanding of GPU architectures and their computational patterns, which can help in optimizing AI models and hardware utilization.

Original Article

GPUs are the hidden engines driving today's AI revolution, but most developers treat them as mysterious, costly accelerators.

Design mobile media social product

Spotify now lets you 'clip' moments from your favorite podcast

Spotify is rolling out a "Podcast Clips" feature enabling users to easily capture, trim, and share short audio moments from podcasts, making it easier for viral content to spread.

TechCrunch

Summary

What: Spotify launched a new "Podcast Clips" feature, accessible via a scissors icon in the "Now Playing" view, allowing users to trim and share segments of podcasts to social media. This follows strong engagement with the "Chapters" feature, which sees over 2 million saves/playlist additions monthly.

Why it matters: Podcasts are increasingly a source of breaking news and significant discussions, particularly from tech and AI executives. This feature makes those key moments more shareable and discoverable across platforms.

Takeaway: If you consume podcasts on Spotify, look for the new scissors icon to easily share impactful moments, potentially boosting visibility for creators.

Original Article

Full article content is not available for inline reading.

Read the original article →

Design mobile ai

iOS 27 to feature upgraded Camera interface and Photos app: Here's what's rumored

iOS 27 is rumored to introduce a more customizable Camera app and new AI-powered photo editing features like Visual Intelligence integration, though some AI features might face delays.

9to5Mac

Summary

What: Rumors suggest iOS 27 will offer a revamped Camera app with customizable controls and direct integration of Visual Intelligence for scanning nutrition labels or business cards. New Photos app editing tools include Extend, Enhance, and Reframe, but their reliability and release schedule are uncertain.

Why it matters: Apple is likely pushing to enhance its native photo and camera experience with more AI capabilities and user customization, aligning with broader industry trends toward intelligent image processing and personalized interfaces.

Original Article

iOS 27 is rumored to bring a more customizable Camera app and new AI-powered Photos features. Users may be able to rearrange camera controls, use Visual Intelligence directly in the app for tasks like scanning nutrition labels or business cards, and access new editing tools such as Extend, Enhance, and Reframe. However, some of these AI features are still unreliable and could be delayed or scaled back before release.

Design opensource web

Open-source Design Editor (Website)

OpenPencil is a new open-source design editor that directly opens and maintains fidelity with Figma's proprietary .fig files.

OpenPencil

Summary

What: OpenPencil is an open-source design editor that natively opens Figma's .fig files. It uses a Kiwi binary codec to ensure "round-trip fidelity," meaning a file can be opened, edited, and saved without losing data or formatting.

Why it matters: This project challenges Figma's dominance by offering an open-source alternative that aims for direct compatibility with their proprietary file format, potentially democratizing design tool access and interoperability.

Takeaway: Designers working with Figma files might want to explore OpenPencil as an open-source alternative or complementary tool that offers direct file compatibility.

Deep Dive

OpenPencil is an open-source design editor.
Its primary feature is the native ability to open .fig files, the proprietary format used by Figma.
The tool uses a "Kiwi binary codec" to ensure "round-trip fidelity."
Round-trip fidelity implies that files can be opened, edited, and saved in OpenPencil and then reopened in Figma without degradation or loss of information.
This project positions itself as an alternative or complementary tool for designers heavily reliant on the Figma ecosystem.

Decoder

Figma .fig files: The proprietary file format used by Figma, a popular web-based interface design and prototyping tool, for storing design projects.
Kiwi binary codec: A specific binary encoding and decoding system used by OpenPencil to accurately read and write the complex data structure of Figma's proprietary .fig files, preserving design details.

Original Article

OpenPencil is an open-source design editor that natively opens Figma .fig files. The tool uses a Kiwi binary codec to ensure round-trip fidelity when working with files.

Design branding ethics

As Violence Breaks Out Over Design Drops, Who's Taking Responsibility?

Luxury brands like Swatch and Burberry are facing violence at product launches due to scarcity-driven marketing, with people getting hurt and police intervening.

Creative Bloq

Summary

What: Violence has occurred at product launches for brands including Audemars Piguet x Swatch Royal Pop, Burberry x Supreme, Vans, and Pop Mart Labubu dolls, leading to arrests and store closures since 2025. This chaos is attributed to deliberate scarcity marketing, as seen with Pop Mart's 185% revenue increase in 2025.

Why it matters: This highlights how successful marketing tactics, particularly those creating artificial scarcity and FOMO, can have severe unintended social consequences like public disorder and physical harm, raising ethical questions for designers and brands about their responsibility beyond profit.

Decoder

FOMO: Fear of Missing Out, a psychological phenomenon where people feel anxiety about missing out on desirable experiences or products.
Product drop: A marketing strategy where a new product is released in a limited quantity at a specific time, often creating high demand and urgency.

Original Article

Full article content is not available for inline reading.

Read the original article →

Tech ai opinion career

The Copy and the Guru

AI's best role is not to replace human creativity but to serve as a "muse," sharpening thinking and revealing the evolution of one's own ideas through archival analysis.

Om Malik

Summary

What: Arjun Moorthy and Om Malik suggest that AI excels as a "muse" to refine human thinking and track the development of personal ideas over time. Rather than fully automating content creation, AI helps users find insights from their archives and observe how their own thinking has evolved.

Why it matters: This perspective challenges the common narrative of AI as a replacement for human intellect, instead framing it as a tool for augmentation and self-reflection, particularly valuable for knowledge workers and content creators.

Takeaway: Consider using AI tools as personal analytical aids to review your past work, identify patterns in your thinking, and refine new ideas, rather than solely for generative tasks.

Decoder

Digital twin: In this context, it refers to a digital replica of an individual's intellectual output or online persona, potentially manipulated by AI, rather than a physical object's digital counterpart.

Original Article

The digital twin concept is the ultimate expression of media manipulation becoming the primary currency instead of authenticity.

AI research data

Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings

"Epicure" is a new family of three skip-gram ingredient embeddings, trained on 4.14 million multilingual recipes, designed to map food ingredient relationships based on co-occurrence and chemical properties.

arXiv

Summary

What: Researchers Jakub Radzikowski and Josef Chen introduced "Epicure," a set of three skip-gram ingredient embeddings. The models were trained from scratch on a corpus of 4.14 million recipes spanning seven languages, with ingredients normalized to 1,790 canonical entries using an LLM-augmented pipeline. The three variants (Cooc, Chem, Core) differ in their random-walk schema over ingredient co-occurrence graphs and typed FlavorDB ingredient-compound graphs.

Why it matters: This research exemplifies the application of advanced AI techniques like embeddings to highly specific, real-world domains like culinary science, demonstrating how complex relationships (flavor, chemistry) can be modeled to uncover novel insights or power recommendation systems.

Decoder

Skip-gram: A neural network architecture used in word embedding models (like Word2Vec) that predicts context words given a target word, allowing it to learn vector representations (embeddings) of words based on their co-occurrence in text.
Embeddings: Numerical representations (vectors) of text, images, or other data that capture semantic relationships, allowing machines to process and understand them.
NPMI graph (Normalized Pointwise Mutual Information): A graph where nodes are ingredients and edges represent the statistical association or co-occurrence strength between them, often used in natural language processing to understand word relationships.
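
As a rough illustration of how NPMI edge weights can be computed from recipe co-occurrence, here is a minimal sketch using the standard definition, NPMI(a, b) = PMI(a, b) / (-log p(a, b)) with PMI(a, b) = log(p(a, b) / (p(a) p(b))). It is not the paper's pipeline: the function name `npmi_edges` and the toy recipes are hypothetical, and probabilities are estimated simply as the fraction of recipes containing an ingredient or pair.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes):
    """Return {(ingredient_a, ingredient_b): npmi} for every co-occurring pair.

    Keys are alphabetically ordered pairs. Values range from -1 (never
    together) through 0 (independent) to 1 (always together); pairs that
    never co-occur are simply absent from the result.
    """
    n = len(recipes)
    ingredient_counts = Counter()
    pair_counts = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))  # ignore repeats within one recipe
        ingredient_counts.update(items)
        pair_counts.update(combinations(items, 2))

    edges = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        if p_ab == 1.0:
            edges[(a, b)] = 1.0  # both appear in every recipe; avoid dividing by log(1)
            continue
        p_a = ingredient_counts[a] / n
        p_b = ingredient_counts[b] / n
        pmi = math.log(p_ab / (p_a * p_b))
        edges[(a, b)] = pmi / -math.log(p_ab)
    return edges
```

For example, with the toy corpus `[["tomato", "basil"], ["tomato", "basil"], ["flour", "butter"], ["flour", "butter", "tomato"]]`, flour and butter always appear together and score close to 1, while butter and tomato co-occur less often than chance and score below 0.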

Original Article

Epicure is a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus.

Design ai research

AI in Design Report 2026 (Website)

A new "AI in Design 2026" report aims to comprehensively analyze how AI is reshaping the entire tech design process and designer workflows.

State of AI in Design

Summary

What: The "AI in Design 2026" report is being compiled to document and understand the transformative impact of AI on technology design, specifically focusing on how it affects individual designers' work and team dynamics.

Why it matters: This initiative highlights the growing importance of AI in creative fields, signaling a need for structured analysis to track its evolution and impact on professional roles and industry practices.

Takeaway: Designers interested in the intersection of AI and design might want to follow the "AI in Design 2026" report for insights into industry trends and best practices.

Original Article

AI in Design 2026 aims to capture how AI is transforming tech design across designers' desks and within their teams.

Design ai policy

Decimal brands CCAI, the coalition advocating for responsible AI use

Decimal designed a grassroots-inspired identity for CCAI, the Coalition for Responsible AI, deliberately avoiding a corporate aesthetic to emphasize human creativity.

The Brand Identity

Summary

What: Decimal developed a flexible, community-focused brand identity for the Coalition for Responsible AI (CCAI), incorporating organic watercolor gradients and a collaborative drawing tool to highlight human creativity, openness, and responsible AI development over a slick corporate image.

Why it matters: This branding approach reflects a broader industry movement to humanize AI and foster trust by emphasizing ethical development and user-driven creation, moving away from potentially intimidating, hyper-futuristic AI imagery.

Decoder

CCAI: Coalition for Responsible AI, an advocacy group focused on promoting ethical and human-centric approaches to AI development and deployment.

Original Article

Decimal created a flexible, grassroots-inspired identity for CCAI that avoids a polished corporate AI aesthetic by combining a community-focused logomark, organic watercolor-style gradients, and a collaborative drawing tool that lets members create their own visuals. The branding emphasizes human creativity, openness, and responsible AI development, framing AI as a tool shaped by people rather than a dominant technological force.

Design sustainability urban-planning

Venice Innovation Design 2026: Water as a Design Framework

Venice Innovation Design 2026 focused on water as a design framework for public space, sustainability, and urban care on San Servolo Island from May 23-24.

Designwanted

Summary

What: The seventh Venice Innovation Design (VID) event, held on San Servolo Island May 23-24, 2026, explored water management, public space quality, and environmental fragility. Speakers included Giulio Lo Iacono (ASviS) on SDG 6 and Gaetano Cascini (Politecnico di Milano) on rainwater reuse. The event also highlighted ongoing regeneration projects on the island, including an amphitheater by Mario Cucinella built using 3D printing in 2025.

Why it matters: This reflects a growing trend in design and urban development to integrate environmental challenges, particularly water management and climate resilience, into foundational design frameworks rather than treating them as add-ons, showcasing tangible, localized innovation.

Decoder

SDG 6: Sustainable Development Goal 6, one of 17 goals set by the United Nations, focused on ensuring availability and sustainable management of water and sanitation for all.
Vaporetto: A public waterbus used for transport in Venice.

Original Article

Venice Innovation Design 2026 took place May 23–24 on San Servolo Island.

Design career portfolio

Lots of creatives have a password-protected part of their portfolio

Many creative professionals use password-protected portfolios for confidential work, and can supplement them with passion projects or updated older work to showcase skills.

It's Nice That

Summary

What: Katie Cadwell, co-founder of Lucky Dip and The NDA Podcast, advises designers on handling confidential client work in portfolios. She suggests using password-protected sections for select studios, discussing work in interviews without leaving digital copies, or listing client names (like Apple) to demonstrate experience. Passion projects and updated older work are encouraged to show unconstrained creativity.

Why it matters: This highlights the persistent challenge for creative professionals in showcasing their best work while respecting NDAs, pushing them to find alternative strategies that balance confidentiality with career progression in a competitive industry.

Takeaway: If you're a designer with confidential client work who is applying for new roles, consider creating a password-protected section of your portfolio to share directly in interviews, and actively develop passion projects to showcase your skills.

Decoder

NDA: Non-Disclosure Agreement, a legal contract preventing the sharing of confidential information.

Original Article

Confidential work is a common portfolio limitation in design, so it's acceptable to discuss it privately while using passion projects and refreshed older work publicly to showcase your creativity and skills.

Digest devoured!

May 28
