May 13
AI · infrastructure · gpu · serverless

How to achieve truly serverless GPUs

Modal cut GPU inference server boot time 40x by snapshotting CUDA device state into host memory, taking replica spin-up from 2000 seconds to 50 seconds across 15 million production snapshot restores.

Summary

What: Modal engineers Charles Frye, Jonathan Belotti, Erik Bernhardsson, and Akshat Bubna built a system combining GPU buffers, lazy filesystem loading via ImageFS, CPU checkpoint/restore with gVisor runsc, and CUDA memory snapshotting to reduce inference server cold starts from tens of minutes to tens of seconds. The system restored 15 million GPU snapshots and 35 million CPU snapshots across February-April 2026, with customers like Reducto seeing 6x improvements (70s to 12s cold starts).
Why it matters: This reveals how serverless economics for AI inference depend on raising GPU Allocation Utilization, not just Model FLOP/s Utilization: spiky, user-driven demand means fixed allocations commonly achieve only 10-20% utilization, making fast replica scaling the critical bottleneck for profitable inference services at scale.
Takeaway: If you're running inference servers at scale, Modal offers $30/month free credits at modal.com. For DIY implementation, their CUDA checkpoint/restore approach uses recent Nvidia driver features for device memory snapshotting combined with gVisor runsc for host-side state.

Deep Dive

  • Modal reduced GPU inference server replica spin-up from 2000+ seconds to ~50 seconds through four architectural optimizations spanning the entire stack
  • Cloud buffers: Maintain idle, health-checked GPUs ready for immediate scheduling, removing tens of minutes of instance allocation latency from the hot path; GPU failure rates are significant enough to require active boot checks and weekly dcgmi diagnostics
  • Custom filesystem (ImageFS): Built with libfuse to lazily load container images from a content-addressed, multi-tier cache (page cache → SSD → AZ cache → CDN → blob storage); most container files are never read, so lazy loading with metadata-first start cuts boot from minutes to ~100ms
  • Disabled gzip compression (single-threaded DEFLATE bottlenecks at ~100 MB/s), tuned read_ahead_kb to 32 MB, achieving multi-GB/s throughput from cache
  • CPU checkpoint/restore: Using gVisor's runsc runtime (emulates Linux kernel in userspace), they serialize container state to disk and restore 10x faster than cold start; snapshots are host-sensitive (e.g., AWS g6.12xlarge lacks pclmulqdq instruction), requiring multiple snapshots per deployment
  • GPU memory snapshotting: Recent Nvidia drivers checkpoint device memory into host memory, which then gets checkpointed to disk, enabling full device state restoration including CUDA graphs and Torch compiler artifacts
  • Typical speedups are 4-10x for GPU workloads — vLLM boot latency dropped from 95.7s to 13.8s mean, SGLang from 83.7s to 17.5s for 1 GiB models (Qwen 3 0.6B) across tens of thousands of cold starts
  • GPU snapshots require adjustments: weight offloading before checkpoint, KV cache recreation, and currently limited to single-GPU (multi-GPU nccl programs deadlock on pause)
  • Weight loading bottleneck: billions to trillions of bytes at a few GB/s means seconds to hundreds of seconds; loading could be >3x faster with an AZ-local weight server and >10x with RDMA over RoCE/InfiniBand
  • Production usage Feb-April 2026: ~35M CPU snapshot replicas (>5M GPU-hours), ~15M CPU+GPU snapshot replicas (>2M GPU-hours) across hundreds of organizations
  • Reducto case study: Document processing platform (known for indexing Jeffrey Epstein's JMail correspondence) scales to thousands of GPUs on-demand for enterprise jobs with tens-of-minutes deadlines
  • System aggregates capacity from multiple cloud providers for low cost and high availability, requiring snapshot compatibility across heterogeneous instance types
  • GPU Allocation Utilization (GPU-seconds running code ÷ GPU-seconds paid for) is distinct from nvidia-smi metrics and is the critical bottleneck for inference economics: industry average is <70% at peak, often 10-20% in practice (see the sketch after this list)
  • Spiky demand from external user behavior creates high peak-to-average ratios that wreck economics with fixed allocations but are profitable with fast auto-scaling
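
To make the utilization metric concrete, here is a minimal Python sketch of the GPU Allocation Utilization arithmetic from the last two bullets; the traffic numbers and the 20% auto-scaling overhead are illustrative assumptions, not Modal's figures.

# GPU Allocation Utilization = GPU-seconds running application code / GPU-seconds paid for.

def allocation_utilization(busy_seconds: float, paid_seconds: float) -> float:
    """Fraction of paid GPU time actually spent running application code."""
    return busy_seconds / paid_seconds

# Fixed allocation: 8 GPUs held around the clock, but spiky demand only keeps
# them busy ~3.5 hours per day on average (assumed numbers).
paid = 8 * 24 * 3600
busy = 8 * 3.5 * 3600
print(f"fixed allocation: {allocation_utilization(busy, paid):.0%}")        # ~15%

# Fast auto-scaling: replicas exist only while there is work, plus an assumed
# ~20% overhead for spin-up and idle timeout.
print(f"auto-scaled:      {allocation_utilization(busy, busy * 1.2):.0%}")  # ~83%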

Decoder

  • GPU Allocation Utilization: GPU-seconds spent running application code divided by GPU-seconds paid for — distinct from nvidia-smi "utilization" (fraction of time kernel code runs) and Model FLOP/s Utilization (algorithmic operations divided by arithmetic bandwidth)
  • CRIU (Checkpoint/Restore In Userspace): Linux transparent checkpoint/restore system that serializes running process state (heap, threads, file descriptors) to recreate processes from storage faster than re-executing
  • gVisor/runsc: Google's container runtime that emulates the Linux kernel in userspace for security isolation, enabling checkpoint/restore without host kernel cooperation; its nvproxy component proxies GPU driver communication
  • libfuse: Library for writing Linux filesystems in userspace with kernel intermediation, simpler than kernel modules but with double context switches
  • CUDA graphs: GPU execution graphs made up of pointers to tensors and kernels with no native serialization, expensive to recreate (tens of seconds to minutes) during inference engine startup
  • Content-addressed cache: Cache indexed by file content hash rather than path, enables reuse when same files appear in different container layers or images
  • vLLM/SGLang: Popular LLM inference servers that perform model loading, CUDA graph capture, and Torch compilation during startup

Original Article

Full article content is not available for inline reading.

Read the original article →

AI · claude

Fast mode for Claude Opus 4.7

Anthropic launches Claude Opus 4.7 fast mode in research preview for API and seven developer tools, opt-in now but set to become default.

Summary

What: Anthropic released fast mode for Claude Opus 4.7 in research preview. Available in the API, Claude Code, Cursor, Emergent, Factory, v0, Warp, and Windsurf. Currently opt-in via waitlist, will become default eventually.

Original Article

Full article content is not available for inline reading.

Read the original article →

AI · llm · tokenization · scaling

Compute Optimal Tokenization

Training 1,300 models revealed the industry-standard 20 tokens per parameter scaling law is a BPE tokenizer artifact, not a universal constant.

Summary

What: Tomasz Limisiewicz, Artidoro Pagnoni, Mike Lewis, Luke Zettlemoyer, and team trained nearly 1,300 models to derive compression-aware neural scaling laws. Published May 12, 2026 on arXiv (2605.01188v1), the work challenges the Chinchilla heuristic of 20 tokens per model parameter, showing this ratio is specific to Byte-Pair Encoding tokenizers. They propose scaling training data in bytes rather than tokens, with optimal compression varying by compute budget.
Why it matters: This reveals that fundamental scaling laws used across the industry have been inadvertently biased by tokenization choices. Byte-based scaling could reshape compute allocation for multilingual models, where different languages have vastly different compression rates under the same tokenizer.
Takeaway: If you're planning large-scale pre-training, review the compression-aware scaling framework at co-tok.github.io before allocating compute budgets.

Deep Dive

  • The authors trained 1,300 models across varying model sizes and compression rates (bytes per token) to systematically derive tokenizer-agnostic scaling laws
  • The widely-cited Chinchilla law (20 tokens per parameter) is shown to be an artifact of specific BPE tokenizers, not a fundamental property of neural scaling
  • Training data should scale proportionally to model parameters in bytes, not tokens, to account for variable information density across languages and tokenizers
  • Optimal compression rate is compute-dependent: as FLOP budgets increase, lower compression (fewer bytes per token) becomes more efficient
  • This framework provides a path to more efficient multilingual foundation models by optimizing tokenization as a dynamic scaling variable rather than a static preprocessing choice
  • Project code and interactive tools available at co-tok.github.io for exploring compression-compute tradeoffs

Decoder

  • Byte-Pair Encoding (BPE): A tokenization algorithm that iteratively merges the most frequent pairs of bytes/characters into single tokens. Used by most LLMs including GPT and LLaMA.
  • Chinchilla scaling laws: Research from DeepMind establishing that optimal LLM training uses roughly 20 tokens per model parameter. Named after their 70B parameter Chinchilla model.
  • Compression rate: In this context, the information density of tokens measured in bytes per token. Higher compression means each token encodes more raw information.
  • FLOP budgets: Total floating-point operations allocated for training. Determines the compute cost/scale of a training run.

Original Article

Compute Optimal Tokenization

Authors: Tomasz Limisiewicz, Artidoro Pagnoni, Srini Iyer, Mike Lewis, Sachin Mehta, Alisa Liu, Margaret Li, Gargi Ghosh, Luke Zettlemoyer

Paper: https://arxiv.org/abs/2605.01188v1

Code: https://co-tok.github.io

TL;DR

WHAT was done? The authors systematically derived compression-aware neural scaling laws by training nearly 1,300 models to determine how information granularity (bytes per token) impacts optimal compute allocation.

WHY it matters? This work proves that the widely accepted heuristic of scaling models by 20 tokens per parameter is an artifact of specific subword tokenizers. Establishing a tokenizer-agnostic scaling law based on bytes provides a robust framework for maximizing compute efficiency across diverse languages and modalities.

Executive summary: For research teams optimizing large-scale pre-training runs, the tokenization scheme is often treated as a static preprocessing step. This paper reframes tokenization as a dynamic scaling variable. By optimizing the "compression rate" (information density), the authors demonstrate that training data should scale proportionally to model parameters in bytes, not tokens. Furthermore, they reveal that the optimal compression rate is compute-dependent, requiring lower compression as FLOP budgets scale up, thus offering a new blueprint for training highly efficient, massively multilingual foundation models.

Details

The Tokenization Artifact Bottleneck in Neural Scaling

Foundation model scaling is largely governed by established scaling laws, most notably the heuristic derived in Training Compute-Optimal Large Language Models (Chinchilla), which posits an optimal ratio of approximately 20 training tokens per model parameter. However, a critical blind spot in this heuristic is its reliance on a fixed tokenization scheme. Expressing data volume strictly in tokens ignores the variable information density that each token represents, essentially binding fundamental scaling behavior to the arbitrary mechanics of Byte-Pair Encoding (BPE) tokenizers. This study isolates the token as a variable to identify the true invariant in scaling behavior, exposing the extent to which popular tokenizers inherently skew compute allocation.
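
As a rough illustration of why the denominator matters, the sketch below contrasts a token-denominated budget with a byte-denominated one; the 4 and 2 bytes-per-token compression rates are assumed figures for illustration, not numbers from the paper.

# Hypothetical arithmetic: the same token-denominated "compute-optimal" budget
# maps to very different amounts of raw data depending on the tokenizer's
# compression rate (bytes per token).
params = 7e9                      # model size in parameters
chinchilla_tokens = 20 * params   # Chinchilla-style heuristic: ~20 tokens per parameter

bytes_per_token_high = 4.0        # assumed: BPE on well-covered English text
bytes_per_token_low = 2.0         # assumed: same tokenizer on a poorly covered language

print(chinchilla_tokens * bytes_per_token_high / 1e9)   # ~560 GB of raw text
print(chinchilla_tokens * bytes_per_token_low / 1e9)    # ~280 GB of raw text

# A byte-denominated law ("bytes per parameter") would prescribe the same raw
# data volume regardless of how the tokenizer segments it.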

AI · llm · research · reinforcement-learning

Reinforcing Recursive Language Models

RL fine-tuning closes the gap: Daniel Kim and Rehaan Ahmad's 4B Qwen matches Claude Sonnet 4.6 on evidence selection, 8x faster.

Summary

What: Daniel Kim and Rehaan Ahmad RL fine-tuned Qwen3.5-4B as a recursive language model (RLM) using a single shared policy for parent and child agents. After cold-start SFT on Qwen3.5-397B traces and GRPO training on 1000 evidence-selection queries, the model approached Claude Sonnet 4.6's 0.607 rubric score while completing queries in 7 seconds versus 60+ for Sonnet.
Why it matters: This demonstrates that task-specific RL fine-tuning can enable small models to match frontier model performance on specialized inference strategies, suggesting a path toward deployable, cost-efficient agentic systems that don't require massive models for production use.
Takeaway: Code, training scripts, and RLM scaffold implementation are available on SkyRL for experimenting with RL fine-tuning your own task-specific RLMs.

Deep Dive

  • Uses GRPO (Group Relative Policy Optimization) where child RLM rollouts inherit the advantage of parent rollouts, enabling single-policy training instead of separate parent/child models (a minimal sketch follows this list)
  • Training pipeline requires cold-start SFT on teacher traces from Qwen3.5-397B-A17B before RL because small models cannot bootstrap RLM behavior from prompting alone (0 pass@16 without SFT)
  • Step-wise training treats each turn as a separate sample since the RLM scaffold rewrites the user prompt per turn rather than accumulating chat history, so an N-turn rollout produces N training samples
  • Rubric-based LLM judges for reward assignment proved more robust than F1 span-overlap metrics, which were too noisy when multiple valid answer spans exist for the same question
  • REPL environment exposes built-in functions (list_papers, search, extract_section, get_paper_abstract) and RLM-specific calls (FINAL, rlm_query, rlm_query_batched for parallel sub-agent spawning)
  • Evidence selection task: given a question and up to 10 arXiv papers, return variable-length verbatim text spans that answer the question (RAG unsuitable due to dynamic span requirements)
  • Training objective includes normalization term (1/k_g) when summing child contributions to prevent gradient domination when many children spawned, keeping contributions balanced across RLM tree depth
  • Dataset consists of 1000 synthetically generated queries over paper groups, with a frontier model generating questions and selecting relevant paragraphs as ground truth from semantically similar high-upvoted papers
  • Models see noisy PDF-parsed text at test time (not clean OCR) to mimic production settings where running OCR on every new document is prohibitively expensive
  • Ablation with reduced prompt (200 vs 1500 tokens describing task strategy) converges slightly lower and less stably, suggesting current models still need explicit strategy guidance but future RLM-native models could discover strategies autonomously
  • Training performed on single 8xH200 node with batch size 16 and 8 samples per prompt, supporting up to 512 concurrent rollouts across parent and child RLMs
  • Recursive extension supports arbitrary RLM depth using recursive subtree loss formulation where each node contributes its own loss plus averaged contributions from all children
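
A minimal sketch of how group-relative advantages with parent-to-child inheritance and the 1/k child normalization could be wired up, assuming one scalar reward per parent rollout; the names and data layout are illustrative, not taken from the SkyRL implementation.

import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantage: each reward normalized against its group's statistics."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def assign_advantages(parent_rollouts):
    """Parent rollouts get group-relative advantages; child RLM rollouts inherit
    the parent's advantage scaled by 1/k so that spawning many children does not
    let their gradient contribution dominate the parent's."""
    advantages = group_relative_advantages([r["reward"] for r in parent_rollouts])
    samples = []
    for rollout, adv in zip(parent_rollouts, advantages):
        samples.append({"tokens": rollout["tokens"], "advantage": adv})
        k = max(len(rollout["children"]), 1)
        for child in rollout["children"]:
            samples.append({"tokens": child["tokens"], "advantage": adv / k})
    return samples

# Toy group of two parent rollouts, one of which spawned two child RLM calls.
rollouts = [
    {"reward": 0.9, "tokens": "...", "children": [{"tokens": "..."}, {"tokens": "..."}]},
    {"reward": 0.2, "tokens": "...", "children": []},
]
print(assign_advantages(rollouts))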

Decoder

  • RLM (Recursive Language Model): Inference strategy where a language model operates in a programmatic environment (typically a REPL) and can spawn copies of itself as sub-agents to decompose long-context or complex tasks, recursively calling itself with different prompts and contexts
  • REPL (Read-Eval-Print-Loop): In RLM context, a Python environment where the model writes code each turn, the system executes it, and results are returned as the next user message—code becomes the primary interface for inspecting and transforming data, not just another tool
  • GRPO (Group Relative Policy Optimization): RL algorithm that computes advantages for each rollout relative to a group of sampled rollouts (group statistics) rather than using a learned value function baseline
  • Advantage inheritance: Training technique where child RLM rollouts inherit the sequence-level advantage computed for their parent rollout, enabling single-policy training across the entire RLM tree depth without separate reward signals for each child trajectory
  • Step-wise training: Each turn in a multi-turn RLM rollout becomes a separate training sample (necessary when turns don't share prefixes because user prompt is rewritten per turn), with the final turn's advantage broadcast backward to previous turns

Original Article

Full article content is not available for inline reading.

Read the original article →

AI · opensource · python

Cactus Needle (GitHub Repo)

Cactus Compute distilled Gemini 3.1 into a 26M-parameter model that outperforms Qwen-0.6B at function calling and runs at 6,000 tokens/sec prefill on the Cactus runtime.

Summary

What: Cactus Needle is a 26M-parameter Simple Attention Network distilled from Gemini 3.1, trained on 16 TPU v6e for 200B tokens over 27 hours. Runs at 6,000 tokens/sec prefill and 1,200 decode, with fully open weights on GitHub. Beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling.
Why it matters: Shows distillation from frontier models into sub-30M specialists is production-ready, enabling a shift from cloud APIs to local-first AI on consumer devices.

Deep Dive

  • 26M-parameter Simple Attention Network distilled from Gemini 3.1 using encoder-decoder architecture (12 encoder, 8 decoder layers)
  • Encoder uses self-attention with GQA and RoPE but no feedforward network; decoder adds cross-attention and tool calling head
  • Pretrained on 200B tokens over 27 hours on 16 TPU v6e, then post-trained on 2B tokens of function call data in 45 minutes
  • Runs at 6,000 tokens/sec prefill and 1,200 decode on Cactus hardware; weights fully open on GitHub
  • Beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling for personal AI
  • Includes web playground UI for testing and finetuning on custom tools at the click of a button
  • Designed for consumer devices (phones, watches, glasses) prioritizing local inference over conversational scope
  • Architecture uses ZCRMSNorm, gated residuals, tied embeddings, d=512, 8 heads/4 KV heads, 8192 BPE vocab
  • CLI supports finetuning on custom JSONL data, data generation via Gemini, evaluation, and TPU management
  • Authors acknowledge small models can be "finicky" and recommend testing on own tools before deployment

Decoder

  • Simple Attention Network: Novel architecture class designed for efficient inference on consumer devices, using encoder-decoder structure without traditional feedforward networks in the encoder
  • GQA (Grouped Query Attention): Memory-efficient attention mechanism in which groups of query heads share key-value pairs, reducing KV cache size (see the arithmetic sketch after this list)
  • RoPE (Rotary Position Embeddings): Position encoding technique that applies rotation matrices to queries and keys, enabling better length extrapolation
  • ZCRMSNorm: Zero-centered Root Mean Square normalization, a normalization layer variant
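
For a sense of what the 8-head/4-KV-head choice buys, here is a back-of-envelope Python sketch of per-token KV-cache size using Needle's published dimensions; the fp16 assumption is ours, not stated in the repo.

d_model, n_heads, n_kv_heads, bytes_per_value = 512, 8, 4, 2   # fp16 assumed
head_dim = d_model // n_heads                                  # 64

# Per token, per decoder layer: keys + values stored for each KV head.
kv_full_mha = 2 * n_heads * head_dim * bytes_per_value      # 2048 bytes with 8 KV heads
kv_gqa = 2 * n_kv_heads * head_dim * bytes_per_value        # 1024 bytes with 4 KV heads
print(kv_full_mha, kv_gqa)   # GQA halves the KV cache relative to full multi-head attention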

Original Article

Needle

We distilled Gemini 3.1 into a 26m parameter "Simple Attention Network" that you can even finetune locally on your Mac/PC. In production, Needle runs on Cactus at 6000 toks/sec prefill and 1200 decode speed. Weights are fully open on Cactus-Compute/needle, as well as the dataset generation.

d=512, 8H/4KV, BPE=8192
                                  ┌──────────────┐
                                  │  Tool Call   │
                                  └──────┬───────┘
                                        ┌┴──────────┐
                                        │  Softmax  │
                                        └─────┬─────┘
                                        ┌─────┴─────┐
                                        │ Linear (T)│  ← tied
                                        └─────┬─────┘
                                        ┌─────┴─────┐
                                        │ ZCRMSNorm │
                                        └─────┬─────┘
                                     ┌────────┴────────┐
                                     │ Decoder x 8     │
                                     │┌───────────────┐│
                                     ││ ZCRMSNorm     ││
                                     ││ Masked Self   ││
                                     ││ Attn + RoPE   ││
                                     ││ Gated Residual││
                                     │├───────────────┤│
  ┌──────────────┐                   ││ ZCRMSNorm     ││
  │ Encoder x 12 │──────────────────────▶Cross Attn   ││
  │              │                   ││ Gated Residual││
  │ ┌──────────┐ │                   │└───────────────┘│
  │ │ZCRMSNorm │ │                   └────────┬────────┘
  │ │Self Attn │ │                      ┌─────┴─────┐
  │ │ GQA+RoPE │ │                      │ Embedding │  ← shared
  │ │Gated Res │ │                      └─────┬─────┘
  │ │          │ │                    ┌───────┴────────┐
  │ │ (no FFN) │ │                    │[EOS]<tool_call>│
  │ └──────────┘ │                    │ + answer       │
  │              │                    └────────────────┘
  └──────┬───────┘
         │
    ┌────┴──────┐
    │ Embedding │
    └────┬──────┘
         │
    ┌────┴──────┐
    │   Text    │
    │  query    │
    └───────────┘
  • Pretrained on 16 TPU v6e for 200B tokens (27hrs).
  • Post-trained on 2B tokens of single-shot function call dataset (45mins).

Needle is an experimental run for Simple Attention Networks, geared at redefining tiny AI for consumer devices (phones, watches, glasses...). So while it beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function calling for personal AI, those models have more scope/capacity and excel in conversational settings. Also, small models can be finicky. Please use the UI in the next section to test on your own tools, and finetune accordingly, at the click of a button.

Quickstart

git clone https://github.com/cactus-compute/needle.git
cd needle && source ./setup
needle playground

Opens a web UI at http://127.0.0.1:7860 where you can test and finetune on your own tools. Weights are auto-downloaded.

Usage (Python)

from needle import SimpleAttentionNetwork, load_checkpoint, generate, get_tokenizer

params, config = load_checkpoint("checkpoints/needle.pkl")
model = SimpleAttentionNetwork(config)
tokenizer = get_tokenizer()

result = generate(
    model, params, tokenizer,
    query="What's the weather in San Francisco?",
    tools='[{"name":"get_weather","parameters":{"location":"string"}}]',
    stream=False,
)
print(result)
# [{"name":"get_weather","arguments":{"location":"San Francisco"}}]

Finetuning

# Playground (generates data via Gemini, trains, evaluates, bundles result)
needle playground

# CLI (auto-downloads weights if not local)
needle finetune data.jsonl

CLI

needle playground                  Test and finetune via web UI
needle finetune <data.jsonl>       Finetune on your own data
needle run --query "..." --tools   Single inference
needle train                       Full training run
needle pretrain                    Pretrain on PleIAs/SYNTH
needle eval --checkpoint <path>    Evaluate a checkpoint
needle tokenize                    Tokenize dataset
needle generate-data               Synthesize training data via Gemini
needle tpu <action>                TPU management (see docs/tpu.md)
Citation

@misc{ndubuaku2026needle,
  title={Needle},
  author={Henry Ndubuaku, Jakub Mroz, Karen Mosoyan, Roman Shemet, Parkirat Sandhu, Satyajit Kumar, Noah Cylich, Justin H. Lee},
  year={2026},
  url={https://github.com/cactus-compute/needle}
}

AI · agents · openai · automation

Building Self-Repairing Agent Loops

OpenAI demonstrates agent loops that automatically fix stale code examples through structured review, repair, and validation cycles until tests pass.

Summary

What: A cookbook tutorial showing a three-phase agent workflow for repairing outdated Jupyter notebooks: review (inspect and return structured findings), repair (apply focused edits using findings and validation feedback), validate (run checks and report remaining issues). Uses OpenAI's Codex CLI in headless mode with JSON schemas for each phase. Three example notebooks iterate 1-3 times with models gpt-5.4-mini and gpt-5.5 until validation passes. Includes business rules defining 'good' (modern API patterns like client.responses.create, qdrant.query_points, current Evals workflows).
Why it matters: This pattern makes agent outputs reliable by closing the loop: validation failures become structured input for the next repair iteration rather than a dead end. The notebook repair is the example, but the architecture applies wherever output can be measured with trustworthy feedback (unit tests, schema validation, policy checks, simulations, human approval).
Takeaway: The full working notebook with companion data is at developers.openai.com/cookbook/examples/codex/build_iterative_repair_loops_with_codex. The three-phase pattern (structured review → focused repair → measured validation → repeat) is reusable for any domain where you can define correctness and measure progress toward it.

Deep Dive

  • Three-phase loop architecture: review inspects without editing and returns structured findings; repair applies focused changes using findings plus the latest validation feedback; validate runs checks and reports what still needs work (a simplified sketch of this loop follows the list)
  • Uses Codex CLI (@openai/codex npm package version 0.130.0) in headless mode so repair steps run from Python cells instead of interactive chat UI
  • Structured outputs with JSON schemas at each phase: findings schema (artifact, issue_type, severity, description, suggested_fix_direction), fix schema (changes_made, unresolved_items, updated_artifact_path), validation schema (overall_passed, cases array, remaining_delta)
  • Business rules define what good looks like: current API patterns (client.responses.create not chat.completions.create, modern tools schema not legacy function-calling, qdrant.query_points not qdrant.search, current Evals API not oaieval CLI), runnable local setup, preserved teaching goals
  • Three example notebooks with increasing repair depth: one-pass (modernize Qdrant query path), two-pass (modernize Evals flow then remove result-log brittleness using validation feedback), three-pass (modernize model/API shape, tighten runnable setup, restore full retrieval teaching flow)
  • Validation executes each repaired notebook end-to-end; failures become structured feedback for next iteration rather than terminal errors
  • Models: defaults to gpt-5.4-mini for speed (configurable via REPAIR_MODEL env var), uses gpt-5.5 for COOKBOOK_CHAT_MODEL, supports REPAIR_REASONING_EFFORT setting
  • Separation of concerns keeps each phase focused: review doesn't edit files, repair doesn't validate, validate doesn't prescribe fixes
  • Working Python code loads sample notebooks, extracts metadata (target_iteration, repair_depth), runs review/repair/validate cycle tracking iterations until validation passes or limit reached
  • Setup: npm install -g @openai/codex, set OPENAI_API_KEY, download companion data/docs folder with pre-repair sample notebooks
  • Pattern is domain-agnostic: substitute notebook execution with your validation method (unit test suite, schema validator, policy checker, simulation harness, approval workflow)
  • Structured handoffs between phases make the loop debuggable, rerunnable, and adaptable to other artifact types beyond notebooks
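
The control flow is straightforward to sketch. The Python below is a simplified stand-in, not the cookbook's code: run_review, run_repair, and run_validate are hypothetical wrappers around headless Codex CLI calls, stubbed here so the loop is runnable end-to-end.

MAX_ITERATIONS = 3

def run_review(notebook_path):
    """Phase 1: inspect without editing, return structured findings."""
    return [{"artifact": notebook_path, "issue_type": "stale_api", "severity": "high",
             "description": "uses chat.completions.create",
             "suggested_fix_direction": "migrate to client.responses.create"}]

def run_repair(notebook_path, findings, validation):
    """Phase 2: apply focused edits using findings plus the latest validation feedback."""
    return {"changes_made": ["migrated to client.responses.create"],
            "unresolved_items": [], "updated_artifact_path": notebook_path}

def run_validate(artifact_path):
    """Phase 3: execute the repaired artifact and report what still needs work."""
    return {"overall_passed": True, "cases": [], "remaining_delta": None}

def repair_until_valid(notebook_path):
    findings = run_review(notebook_path)
    validation = {"overall_passed": False, "cases": [], "remaining_delta": None}
    for iteration in range(1, MAX_ITERATIONS + 1):
        fix = run_repair(notebook_path, findings, validation)
        validation = run_validate(fix["updated_artifact_path"])
        if validation["overall_passed"]:
            return {"status": "passed", "iterations": iteration}
    return {"status": "failed", "iterations": MAX_ITERATIONS, "validation": validation}

print(repair_until_valid("examples/stale_notebook.ipynb"))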

Decoder

  • Codex CLI: OpenAI's agent orchestration tool (npm package @openai/codex), distinct from the deprecated Codex code-generation model. Runs agent workflows programmatically with structured JSON schemas for input/output rather than free-form chat, enabling headless automation.

Original Article

Full article content is not available for inline reading.

Read the original article →

AI · research · world-models

AI for the Real World: A conversation with Yann LeCun

Yann LeCun argues LLMs will never reach human-level intelligence because language represents only a tiny fraction of human understanding.

Summary

What: LeCun states that today's LLMs, while commercially valuable, cannot achieve AGI through text prediction alone. Future AI systems must use world models that learn abstract representations of physics, causality, and consequences to enable planning and reasoning in real-world environments like robotics, healthcare, and industrial systems.
Why it matters: Signals a major research direction shift from scaling language models to embodied AI and physics-based reasoning, challenging the prevailing bet that more data and compute on text will lead to AGI.

Decoder

  • World models: AI systems that learn abstract representations of how the physical world works (physics, causality, cause-and-effect) rather than just predicting text sequences. Enables AI to simulate outcomes and plan actions in real environments.

Original Article

Full article content is not available for inline reading.

Read the original article →

AI · llm · mobile · android

Gemini Intelligence Comes to Android

Google's Gemini Intelligence brings autonomous task agents to Android that shop across apps, browse the web, and create widgets from natural language.

Summary

What: Google announced Gemini Intelligence at Android Show: I/O Edition on May 12, expanding agentic capabilities first introduced at the Samsung Galaxy S26 launch earlier this year (ordering food, booking rides). New features include cross-app task automation (copying grocery lists from notes to shopping carts with confirmation before checkout), auto-browse for booking appointments, Gemini in Chrome arriving late June for page summarization, Personal Intelligence form autofill (opt-in), Rambler dictation in Gboard that transcribes and formats speech by removing filler words, and natural language widget creation (example: 'Suggest three high-protein meal prep recipes every week'). Features roll out to Samsung Galaxy S26 and Google Pixel devices in summer 2026, expanding to other Android devices later in 2026.
Why it matters: Google is betting that the next phase of mobile AI is autonomous task execution embedded throughout the OS, not chatbots. This positions Android to compete with Apple Intelligence by making AI a background automation layer rather than a foreground assistant.

Deep Dive

  • Google unveiled Gemini Intelligence at Android Show: I/O Edition with autonomous AI features that perform multi-step tasks across Android apps
  • Cross-app automation example: press power button, describe task like "copy my grocery list and add items to my shopping cart," Gemini completes it with confirmation before checkout
  • Auto-browse feature (experimentally introduced in January) now rolling out to Android for booking appointments and completing web-based tasks
  • Gemini in Chrome coming to Android in late June for page summarization and Q&A, matching desktop functionality
  • Personal Intelligence form autofill learns user details and fills forms automatically (opt-in, can be disabled in settings)
  • Rambler feature in Gboard uses multimodal AI for dictation that transcribes speech, removes filler words, and formats text
  • Natural language widget creation lets users build custom Android widgets by describing them (e.g., "Suggest three high-protein meal prep recipes every week")
  • Widget creation similar to tool released by hardware startup Nothing last year
  • All features follow Material 3 design language and launch first on Samsung Galaxy S26 and Google Pixel devices in summer 2026
  • Broader Android device rollout scheduled for later in 2026

Decoder

  • Agentic AI: AI systems that autonomously complete multi-step tasks across applications without constant user guidance, acting as agents on the user's behalf (e.g., booking appointments, managing shopping workflows).
  • Vibe-coding: Creating software, UI components, or widgets by describing them in natural language rather than writing code, named for specifying the desired "vibe" or characteristics you want.

Original Article

Google announced a number of new Gemini Intelligence-branded AI features at its Android Show: I/O Edition event on Tuesday. These include the ability for AI to complete tasks across apps, browse the web, fill out forms, dictate speech, and even allow you to vibe-code your own Android widgets.

Gemini gets more powerful

The company had already introduced some agentic capabilities, such as ordering food or booking a ride, to Gemini at the Samsung Galaxy S26 launch earlier this year. There, Google announced that Gemini would soon be able to perform more complex tasks, like booking a front-row bike for a spin class, finding a class syllabus in Gmail, and then searching for books related to that topic.

Now Google's AI assistant will be able to handle a multistep process, like copying a grocery list from your notes app, then adding items to the cart in your shopping app. To use this feature, you'll press the phone's power button and describe the task. Meanwhile, the content on the phone's screen acts as the context for the assistant. Google noted that Gemini will wait for your final confirmation to complete the checkout.

In addition, a feature first introduced in January had allowed Gemini to browse the web for you and complete tasks like booking an appointment, as part of an experimental rollout. Today, Google said this auto-browse feature is making its way to Android, too.

In late June, Android devices will also get Gemini in Chrome, an AI feature that will help users summarize content or ask questions about what is on the web page, similar to how Gemini in Chrome works on the desktop.

Another small but useful addition is that Gemini will be able to fill out forms on your behalf after learning details about you through Personal Intelligence. (Google said this feature is opt-in, and you can turn it off via settings anytime.)

Plus, Gemini will come to Android's Gboard keyboard. Google is using Gemini's multimodal capabilities by introducing a feature called Rambler in Gboard, which is similar to those found in other AI-powered dictation apps. The feature will let you speak in your own tone, transcribe the speech, and format it by removing filler words.

Vibe-code your own Android widgets

Vibe-coding apps are picking up pace, and Google wants to give Android users a taste of this, too.

The company is introducing a way for users to build Android widgets by describing them in natural language. For example, users can build a meal-planning widget using query text like, "Suggest three high-protein meal prep recipes every week."

The idea of creating a widget is not novel to Gemini. Notably, the hardware startup Nothing also released a similar tool last year.

Google said that Gemini Intelligence will follow the company's Material 3 expressive design language in its features.

The company said that these AI-powered features will first make their way to the latest Samsung Galaxy and Google Pixel devices this summer and will be available across other Android devices later this year.

AI · opensource · legal · agents

Claude for Legal (GitHub Repo)

Anthropic released Claude for Legal on GitHub with 12 practice-area plugins, 100+ workflow agents, and a security-reviewed marketplace for community legal skills.

Summary

What: Anthropic open-sourced Claude for Legal, a repository with 12 practice-area plugins (commercial, corporate, employment, privacy, product, regulatory, AI governance, IP, litigation, plus legal clinic and law school modules) containing 100+ named workflow agents. Includes MCP connectors for Thomson Reuters CoCounsel, Ironclad, DocuSign, iManage, Everlaw, and CourtListener. Runs as Claude Cowork/Code plugins or deploys via Managed Agents API. Apache 2.0 licensed.
Why it matters: This shows Anthropic moving beyond general-purpose assistants into vertical AI stacks with domain-specific guardrails (mandatory source attribution, conservative defaults on privilege, explicit review gates before filing). The legal-builder-hub creates a trust layer for community skills with security scans, allowlists, and license gates, suggesting a model for safe AI skill marketplaces in regulated domains.
Takeaway: Study https://github.com/anthropics/claude-for-legal as a reference architecture for vertical AI applications with domain-specific guardrails and MCP connector patterns.

Deep Dive

  • 12 practice-area plugins each include a cold-start interview that learns your playbook and writes a practice profile (CLAUDE.md) that every skill in that plugin reads from
  • Named workflow agents include Vendor Agreement Reviewer, DSAR Responder, Claim Chart Builder, Termination Reviewer, NDA Triager, Board Consent Drafter, and 100+ others
  • MCP connectors wire Claude to legal systems: Thomson Reuters CoCounsel (Westlaw Deep Research), Ironclad (contract lifecycle management), DocuSign, iManage (document management), Everlaw (e-discovery), CourtListener (federal dockets)
  • Dual deployment: install as interactive Claude Cowork/Code plugin OR deploy via Managed Agents API for scheduled/headless workflows (renewal watcher, docket watcher, regulatory feed monitor)
  • Built-in guardrails reflect attorney-review requirement: source attribution on every citation, conservative defaults on privilege and subjective legal calls, jurisdiction assumptions surfaced, explicit gates before anything is filed or sent
  • Microsoft 365 integration: contract review skills output Word tracked changes preserving styles and numbering, diligence skills output Excel workbooks with citation columns
  • legal-builder-hub provides a trust layer for community skills: injection detection, hidden-content scans, source allowlists, license gates (personal/firm/commercial), freshness tracking for bundled regulatory content, install audit logs
  • Scheduled agents run on cron cadence: renewal watcher scans contract registers for cancel-by deadlines, docket watcher monitors court filings, reg-change monitor polls regulatory feeds and writes Monday digests
  • All skills are markdown files with YAML frontmatter, no build step — customize by editing practice profiles or forking skills directly
  • Example workflows: tabular M&A diligence (one row per document, every cell cited to source), element-by-element patent claim charts, privacy triage (PIA vs mandatory DPIA vs proceed), AI impact assessments across regulatory regimes

Decoder

  • DSAR: Data Subject Access Request — GDPR right for individuals to request a copy of their personal data a company holds
  • PIA: Privacy Impact Assessment — internal review of how a project affects user privacy
  • DPIA: Data Protection Impact Assessment — mandatory GDPR assessment required for high-risk data processing activities
  • DPA: Data Processing Agreement — contract between data controller and processor defining how personal data is handled under GDPR
  • FTO: Freedom To Operate — patent clearance analysis to determine if launching a product would infringe existing patents
  • VDR: Virtual Data Room — secure online repository for sharing confidential documents during M&A due diligence
  • MCP: Model Context Protocol — Anthropic's standard for connecting Claude to external data sources and tools

Original Article

Full article content is not available for inline reading.

Read the original article →

AI · research · image-generation · multimodal

Qwen-Image-2.0 Technical Report

Qwen-Image-2.0 generates slides, posters, and comics from 1K-token prompts with accurate multilingual typography and photorealistic rendering in a unified model.

Summary

What: The Qwen team released Qwen-Image-2.0, a multimodal diffusion transformer coupling Qwen3-VL as a condition encoder to handle prompts up to 1,000 tokens. The model generates text-rich content (slides, posters, infographics, comics) with improved multilingual typography, photorealism, and instruction following, outperforming prior Qwen-Image models in human evaluations.
Why it matters: This signals image generation models evolving from simple synthesis to full-stack visual content creation, handling typography and layout tasks traditionally requiring Photoshop or Canva.

Deep Dive

  • Qwen-Image-2.0 unifies high-fidelity image generation and precise editing in a single framework, addressing previous model limitations
  • Architecture couples Qwen3-VL (vision-language model) as condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling
  • Supports instruction prompts up to 1,000 tokens, enabling complex compositional requirements beyond what previous models handled
  • Generates text-rich content including presentation slides, marketing posters, data infographics, and comic panels with accurate embedded text rendering
  • Significantly improves multilingual text fidelity and typography across different scripts and languages, a persistent challenge in image generation
  • Enhances photorealistic generation with richer details, more realistic textures, and coherent lighting compared to prior Qwen-Image models
  • Training uses large-scale data curation and a customized multi-stage pipeline to balance multimodal understanding with flexible generation and editing capabilities
  • Addresses key limitations: ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment
  • Human evaluations show substantial improvements over previous Qwen-Image models in both generation and editing tasks across diverse scenarios
  • Handles compositionally complex scenarios better than existing models, particularly in text-rich and multilingual contexts
  • 57 co-authors from the Qwen team contributed to the technical report
  • Submitted to arXiv on May 11, 2026 as cs.CV (Computer Vision and Pattern Recognition)

Decoder

  • Diffusion Transformer: Neural architecture combining diffusion models (which generate images by iteratively denoising random noise) with transformer attention mechanisms for better quality and control
  • Condition encoder: Component that processes input instructions or reference images into embeddings that guide the generation process
  • Qwen3-VL: Qwen team's vision-language model that understands both images and text, used here to interpret user prompts and reference images

Original Article

Qwen-Image-2.0 Technical Report

We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenarios. Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a customized multi-stage training pipeline. This enables strong multimodal understanding while preserving flexible generation and editing capabilities. The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, infographics, and comics, while significantly improving multilingual text fidelity and typography. It also enhances photorealistic generation with richer details, more realistic textures, and coherent lighting, and follows complex prompts more reliably across diverse styles. Extensive human evaluations show that Qwen-Image-2.0 substantially outperforms previous Qwen-Image models in both generation and editing, marking a step toward more general, reliable, and practical image generation foundation models.

Tech · ai · mobile · android · gemini

Android is getting a big AI overhaul in 2026

Google bypasses Android 17 for its AI features, shipping app automation, Gemini widgets, and Auto Browse through Play Services and app updates instead.

Summary

What: Google announced Android AI features ahead of I/O 2026: app automation expanding beyond DoorDash/Uber, Chrome Auto Browse for Android 12+ in late June, AI-generated widgets ("Create My Widget"), Rambler voice cleanup in Gboard, and Android Auto video playback in select vehicles. Most ship via Play Services, not Android 17.
Why it matters: Shipping features through Play Services instead of OS updates signals Google's shift away from traditional Android release cycles, prioritizing feature velocity over OS cohesion and reducing dependence on OEM update schedules. This fragments the platform definition but extends new capabilities to older devices (Android 12+) without waiting years for manufacturer updates.
Takeaway: Test your web app with Chrome Auto Browse when it reaches Android 12+ in late June to ensure form flows and navigation patterns work with AI automation.

Deep Dive

  • Google announcing major Android AI features ahead of I/O 2026, most shipping through Play Services rather than Android 17 OS release
  • App automation expanding beyond initial DoorDash/Uber test (launched earlier 2026), now handling multi-step cross-app tasks like extracting course books from Gmail syllabus and adding to shopping cart
  • Chrome Auto Browse (launched on desktop months ago) coming to Android 12+ in late June, allows AI to navigate mobile web pages for multi-step tasks
  • Autofill with Google getting Gemini integration to handle more complex data like license plates beyond standard name/address fields
  • "Create My Widget" lets users generate custom widgets via prompts (e.g., meal plan schedule, event countdown with weather data), fully Material themed
  • Rambler voice input feature in Gboard cleans up spoken text, removing "ums" and "uhs" while preserving context and tone
  • Android Auto getting adaptive display support for non-standard screen shapes, Material 3 Expressive themes, Immersive Navigation, and home screen widgets
  • Video playback coming to Android Auto (parked only, YouTube initially) in select vehicles from BMW, Ford, Genesis, Hyundai, Kia, Mercedes-Benz, Renault, Škoda, Volvo, and others
  • Android 17 (June launch) features mostly limited to camera improvements for flagship devices, Instagram integration with Ultra HDR and Night Mode, enhanced lost device security requiring PIN plus biometric, session-only precise location access with new indicator, and Pause Point 10-second cooldown for distracting apps
  • New 3D emoji design rolling out to Pixel devices this summer, other Android 17 devices later this year
  • Most new Android features bypassing OS updates entirely, shipping via Play Services and app updates to reach broader device base including Android 12+

Decoder

  • Auto Browse: Chrome feature where Gemini AI autonomously navigates web pages to complete multi-step tasks, operating visibly or in background until requiring user authorization for sensitive actions
  • Immersive Navigation: Android Auto feature announced earlier in 2026 that integrates vehicle camera feeds with Google Maps to provide enhanced lane-level turn guidance
  • Create My Widget: Gemini-powered Android feature allowing users to generate custom home screen widgets via text prompts, combining data sources like weather, countdowns, and meal plans in Material-themed layouts
  • Play Services: Google's proprietary background service layer for Android that enables feature updates independent of OS version, allowing new capabilities to reach devices without waiting for manufacturer OS updates

Original Article

Google's I/O conference is next week, and we expect to hear a lot about the company's AI endeavors. The company says there's so much to talk about that it's spilling the Android beans a little early, and yes, a lot of AI is involved. In the coming months, Google will roll out more smartphone AI features under the Gemini Intelligence banner, bringing more automation and customization to your phone.

App automation will be a major element of Android going forward, Google says. Automation for apps is expanding after Google began testing it earlier in 2026 with DoorDash and Uber on Pixel and Samsung phones. It was a very frustrating experience at launch, but Google says it has spent the intervening months fine-tuning the system.

Google promises that Android will be able to handle more complex automations across apps. For example, the robot could find a course syllabus in Gmail and then hop to a shopping app to add the necessary books to your shopping cart. Google also suggests taking a picture of a travel brochure and telling Gemini to book something similar in the Expedia app.

This could theoretically reduce busy work, but that's only true if it works and your task takes advantage of the right apps. Android won't just automate any old app on your phone. The automation will only work in select apps, mostly limited to food and grocery ordering and ride-hailing. For everything else, there's Chrome.

The Gemini-powered Auto Browse feature that debuted on desktop Chrome several months ago will launch on Android toward the end of June for all Android 12 and higher devices. This feature uses powerful cloud-based Gemini models to parse webpages and handle multi-step tasks for you.

We were not overly impressed with the speed or accuracy of Auto Browse in desktop Chrome, but simpler, mobile-optimized pages might be a little more usable for the robot. Like the desktop version, you'll be able to watch the AI navigate the web for you or let it do its thing in the background until your authorization is needed for something sensitive.

Similarly, the Autofill system in Android is getting an AI automation upgrade. Google says Autofill with Google will soon plug into Gemini's Personal Intelligence, allowing it to fill in more information when you encounter an online form. It can still handle your name, address, and other established personal details, but it may also be able to add data like your car's license plate. Google says the feature is opt-in, so you can keep the traditional autofill experience.

Gemini is all ears

Gemini Intelligence is also powering some new convenience features in Android, including those rumored AI-generated widgets. Google calls that "Create My Widget," but don't expect miracles here. These widgets appear mostly to be about displaying data from your account or around the web.

For example, Google says you might want a widget that recommends meal plans on a set schedule or sets a countdown to an important event. You can do that with Create My Widget. You can even mix and match the kinds of included data. Android will offer suggested recipes for new AI widgets, but you can also simply enter a prompt describing what you're looking for. Perhaps you want to see a countdown widget with specific weather metrics—that should be possible with Gemini-powered widgets. No matter how you make them, the widgets are fully Material themed and resizable.

Google is also bringing more AI to voice input with a feature called Rambler, which is integrated with Gboard. Plenty of people already use AI to polish text in emails or other documents before sending them, and this is essentially the same thing for voice input. With Rambler, you can just start talking—or rambling, if you will—and the AI will get the gist of what you say. All the "ums" and "uhs" will go away, and the final result will essentially become a summary of what you said.

The company claims that Rambler will understand the context and nuance of what you're saying, so the end product still sounds like you. There will be a prominent indicator when Rambler is enabled, and Google promises that no audio or text will be retained.

Android in cars

Plugging your Android phone into a car that supports Android Auto will be different soon, too. For starters, Google says Android Auto will now adapt to varying display sizes and shapes. So even if your car has a weird polygon screen, Android should fill it completely.

What you see on that screen will be different, too. Android Auto is getting a makeover with greater support for Material 3 Expressive themes and a new navigation experience. Yep, Immersive Navigation, which Google announced earlier this year, is almost ready to actually roll out to users. Accessing data from other apps in the car will be easier as well due to the addition of widgets. Google says there will be widgets for contacts, weather, and select third-party apps. For cars with Google built in, the vehicle's cameras will plug into Maps to provide more accurate lane guidance. Gemini will also be able to answer questions about the vehicle's status, including warning lights and cargo capacity.

Android Auto media apps have hardly evolved over the years, but 2026 will bring some notable changes. Popular apps like YouTube Music and Spotify are getting design overhauls that make them easier to use in the car. Video playback is also coming to Android Auto for the first time. Naturally, this will only work when you're parked and using a supported app like YouTube.

Google says Android Auto will switch seamlessly to audio-only mode when you start driving, but this requires buy-in from automakers for safety and technical reasons. Video will only be available in supported cars from BMW, Ford, Genesis, Hyundai, Kia, Mahindra, Mercedes-Benz, Renault, Škoda, Tata, and Volvo. More vehicles may come later.

What about Android 17?

Google has announced all these new Android features while barely mentioning Android 17, which is slated to launch in June. Almost everything new in Android will arrive via Play Services, app updates, or on specific devices (like Pixel and Samsung Galaxy) through partnerships.

There are a few tidbits for Android 17 itself. Google says flagship Android 17 devices will see some changes to the camera experience, including better video quality in social media apps like Instagram and "screen reaction" overlays in video. There will also be native support for Ultra HDR, native stabilization, and Night Mode in the Instagram Edits app on Android 17 devices.

On the security front, Google will enhance lost device features to require both a PIN and biometric unlock to better prevent bad actors from using your device. This will disable quick settings and block new Wi-Fi and Bluetooth connections. Android 17 will also get a new option for location access, allowing you to share your precise location with an app only for the current session. A new location indicator, similar to the ones for the camera and microphone, will make it clear when an app is accessing your location.

There's also Pause Point, which lets you add a 10-second cooldown timer to apps you've labeled as distracting. This will be bundled into the existing Digital Wellbeing suite.

Lastly, Google has redesigned its emoji yet again. No, the blobs aren't coming back. Emoji now have a more detailed 3D appearance, but you'll see them first on Pixel devices over the summer. Other Android 17 devices will have to wait until "later this year." Most device makers craft their own emoji, so you may never see Google's new smileys outside of apps like YouTube and Gmail.

Tech · opensource · database · redis · nosql

Redis and the Cost of Ambition

Redis Inc's 2024 licensing betrayal and decade of feature bloat spawned Valkey, a fork that wins by ignoring AI hype and optimizing what made 2011 Redis indispensable.

Summary

What: Charles Leifer argues Redis lost its identity chasing features (search, streaming, time-series, graph, AI context engine) while abandoning what made it successful: simplicity, speed, a clean protocol. Redis Inc dropped the BSD license in 2024, took over antirez's trademark, and positioned itself as an AI company. Valkey emerged as a performance-focused fork improving multi-threading, memory efficiency, and cluster reliability instead of adding features.
Why it matters: This reveals a common open source failure mode where successful tools chase feature parity with specialized competitors (ElasticSearch, Kafka, InfluxDB) instead of doubling down on original strengths. Users adopted Redis precisely because they didn't want separate systems for caching, queues, and counters. The "astronaut mode" development (building without real use cases, like Disque) predictably leads to abandonment.

Deep Dive

  • Redis succeeded in 2011 as a "memcached but way better" remote dictionary server with tasteful design: single-threaded for atomic operations, event-driven with non-blocking I/O, a simple RESP protocol codeable in an hour (see the sketch after this list), and well-chosen primitives (lists, hashes, sets, sorted-sets) that covered 80% of web app needs
  • Over the next decade Redis chased every database trend: JSON documents (vs MongoDB), full-text search (vs ElasticSearch), graph databases, event streaming (vs Kafka), strong consistency (vs etcd), time-series (vs InfluxDB), and now vector storage for AI
  • The author identifies this as ignoring two realities: (1) the simplicity and orthogonality made Redis indispensable, (2) anyone serious about search/streaming/strong-consistency wants the real thing, not a half-baked Redis module inheriting all of Redis's limitations
  • Redis Inc (originally Garantia Data, a generic NoSQL hosting service) signed antirez to legitimize themselves, took over trademark rights, then dropped BSD licensing in 2024 for AGPLv3. When this blew up they offered tri-licensing but damage was done
  • RESP3 protocol breaks fundamental RESP2 assumption of request/reply, adds client-side caching (Redis the cache now needs caching), represents classic second-system failure per Brooks
  • Disque (2015) exemplified the problem: antirez built it in "astronaut mode" without a real use case, admitting it was response to what he saw people doing rather than his own need. Predicted outcome: abandonment. Disque sat at 8K GitHub stars but became abandonware immediately, later rewritten as Redis module also abandoned 7-8 years ago
  • Aphyr's Redis-Raft analysis found 21 issues in initial build including unavailability, eight crashes, stale reads, lost updates, infinite loops, corrupt responses. "Essentially unusable" - illustrates the disconnect between Redis as cache/data-structure server vs. Redis as strongly-consistent distributed database
  • Valkey's existence and rapid adoption is the market's verdict. Instead of chasing features, Valkey invested in unglamorous work: multi-threaded performance, memory efficiency, cluster reliability and throughput. Performance benchmarks target the 80% who want 2011 Redis features
  • Current Redis positioning (2026): "The Real-Time Context Engine for AI Apps" with "Try Redis for Free" and "Get a Demo" buttons signals full enterprise sales transformation
  • Recent antirez PR adding array type to Redis (despite having hashes, lists, streams, JSON arrays, time-series, sorted-sets) shows continued feature accumulation while project is "in a bit of a crisis"
  • Author clarifies this isn't criticism of antirez's talent/taste but of ambition disconnected from what made Redis successful: developer ambition to solve complex problems, ambition to be everything to everyone, landlord ambition to extract maximum revenue before AWS/GCP finish them off

Decoder

  • RESP/RESP2/RESP3: REdis Serialization Protocol, the wire protocol Redis clients use to communicate with the server. RESP2 was simple request/reply. RESP3 added complexity like push messages and client-side caching support.
  • Valkey: Open source fork of Redis created after Redis Inc's 2024 licensing change, now developed under Linux Foundation. Focuses on performance improvements rather than new features.
  • antirez: Salvatore Sanfilippo, original creator of Redis, who led the project from 2009 to 2020. Known for elegant system design and transparent communication.
  • astronaut mode: Building software without a concrete use case, driven by what seems intellectually interesting rather than solving real problems. Term implies over-engineering.
  • Disque: Abandoned distributed message broker antirez created in 2015, later rewritten as Redis module also abandoned. Example of feature built without real need.

Original Article

Redis and the Cost of Ambition

And they said, Go to, let us build us a city and a tower, whose top may reach unto heaven; and let us make us a name, lest we be scattered abroad upon the face of the whole earth.

I recently skimmed antirez's patch for adding an array type to Redis. The patch itself is not particularly noteworthy except as an example of how AI-assisted tooling can augment the abilities of a talented and tasteful systems engineer. What got me thinking was antirez's prose at the top of the pull request, explaining the rationale for an array type:

Hashes give you random lookups, but you have to store an index as a key, and have no range visibility. Lists give you appending and trimming, but what is in the middle remains hard to access. Streams give you append-only events, which is another (useful, indeed) beast.

He could have also mentioned Redis' other array-ish faculties like JSON which has arrays natively, time-series, and sorted-sets, which can be made to behave like an array in some situations. Here we are in 2026, Redis is in a bit of a crisis, and yet is sitting with a massive PR to add a new array type. What is going on?

Let us make us a name

Redis has been through a lot over the last decade, driven partly by enterprise DBaaS dynamics, partly by second-system effects:

  • Licensing: Redis Inc waged a scorched-earth campaign against its users by dropping BSD in 2024. When this blew up in its face, a strategic retreat was called and a parley offered: a tri-licensing choose-your-own-adventure with AGPLv3 as the lone OSI option (AGPL allows Redis Inc to claim being open source, but it's very different from BSD). The VC-backed company behind Redis is an interesting story in itself. Originally named Garantia Data, they were basically another NoSQL cloud hosting service. They got into Redis hosting, started calling themselves Redis Something-or-other, and eventually signed antirez to legitimize themselves. After a couple of years they took over the trademark rights, which set the stage for the rugpull later. This post and the comment replies aged about as well as you might expect.
  • Bloat and lock-in: Redis began with a handful of useful data-types. Over time the feature-set has grown (and grown) to include exotic data-structures, complex stateful systems (streams), semi-proprietary-ish modules (depending on what version you run). I was amazed when I pulled up Redis' landing page today and read that their positioning in 2026 is The Real-Time Context Engine for AI Apps. Additionally notable are the "Try Redis for Free" and "Get a Demo" buttons in this screenshot. I'm not sure which is more surprising - "for Free" or the enterprise sales-coded "demo" CTA.
  • The "How many times do we have to tell you we are a web-scale database" dynamic. This is exemplified by the story around Sentinel, Cluster, Redis-Raft, and enterprise features like active-active geo-distribution®, Redis Flex®, Redis-on-Flash®, and whatever else.
  • Protocol: RESP3 has a lot of sharp edges and breaks the fundamental assumption in RESP2 of request/reply. The new protocol is in my opinion a classic second system failure mode straight out of Brooks.
  • First-class client-side caching support. In a kind of reductio-ad-absurdum, Redis (lately the cache) now needs a new protocol to support caching on the client side as well.

What happened to dear old Redis, I wondered. And the more I thought about it, a satisfying explanation started to coalesce which explains all the above phenomena. To me, the picture that emerges is that of a solution that lost its identity through ambition.

The noble Brutus
Hath told you Caesar was ambitious.
If it were so, it was a grievous fault,
And grievously hath Caesar answered it.

Vernal delight and joy

I would put it some time around 2011...that time of excitement when so many new ideas were coming into vogue among web and web-adjacent developer circles. When NoSQL was blasting off into its hype cycle, web scale was not used ironically, and the Bigtable and Dynamo papers were still widely read and discussed. Alongside this were Ruby on Rails, elegance (a desirable property for CRUD apps), web 2.0, REST and JSON. Redis perfectly captured the zeitgeist, and found its way into everyone's stack practically overnight. A capture from late 2011 has Redis describing itself as an advanced key-value store and data structure server. Notably absent is the word database.

Prior to Redis, memcached was that one indispensable piece of infrastructure running quietly on most web servers. In every deployment I've ever seen, memcached typically handled a variety of ad-hoc usages in addition to just caching: locks, counters, rate-limits, stuff like that. So when Redis landed, the story at that time was something like "memcached but way better". Redis' name, Remote Dictionary Server, emphasized that it was a fast in-memory dictionary that could be used by all your services. In addition to blobs of bytes, Redis operated on a handful of tastefully-chosen data-structures (linked-list, hash-table, set, sorted-set), which vastly expanded the kinds of ad-hoc use-cases such a service could provide.
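
To make the "memcached but way better" point concrete, here is a minimal redis-py sketch of those ad-hoc patterns; the key names, TTLs, and limits are illustrative, not taken from any particular deployment.

import redis

r = redis.Redis()  # assumes a Redis (or Valkey) server on localhost:6379

# Cache: a plain string value with an expiry
r.set("page:/home", "<html>...</html>", ex=60)

# Counter: atomic increment, no read-modify-write race
r.incr("hits:/home")

# Rate limit: allow at most 100 requests per user per minute
count = r.incr("rate:user:42")
if count == 1:
    r.expire("rate:user:42", 60)  # first request in the window starts the clock
allowed = count <= 100

# Lock: SET NX with a TTL so a crashed holder cannot block others forever
got_lock = r.set("lock:reindex", "worker-1", nx=True, ex=30)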

I want to call attention to some of the specific design considerations that I think Redis nailed perfectly, and which were instrumental in its initial success:

  • Protocol: The beauty of Redis' wire protocol is that it is simple enough that it can be understood and coded in an hour, while being expressive enough that it can represent a number of rich data-types. Building a client library is simple and clean, and the protocol felt right. Anecdotally, my most popular blog post is a write your own Redis tutorial written nearly 10 years ago, which walks through building the protocol and a simple server. (A minimal sketch of the encoding appears after this list.)
  • Single-threaded, event-driven, in-memory: these go together because they combine in a really purposeful way. By being single-threaded, all operations are guaranteed atomic, full-stop. This eliminates a huge class of complexity and makes Redis easy to reason about. In order to make single-threaded work, the server needed to be implemented using non-blocking I/O. Operations on the data itself needed to be fast as hell. Put it together and you have a fast key/value store that can handle tons of clients and do it all from a single thread.
  • Data structures: the primitives were chosen well and were suited to a web application's most common needs. Cache? Just use strings and an expiry. Queue? Use a list. Structured data? Use a hash. Locks, counters, rate-limiting, liveness, monitoring, leaderboards, whatever - all easy with the builtin data types.
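
Here is that minimal sketch of the RESP2 encoding, a rough illustration of how a command is framed as an array of bulk strings; a real client would also parse bulk and array replies, handle errors, and pipeline requests.

def encode_command(*parts: str) -> bytes:
    # A command is an array ("*<count>") of bulk strings ("$<length>").
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)

# SET greeting hello -> b"*3\r\n$3\r\nSET\r\n$8\r\ngreeting\r\n$5\r\nhello\r\n"
payload = encode_command("SET", "greeting", "hello")

# Replies are just as small: "+OK\r\n" is a simple string, ":42\r\n" an integer,
# "-ERR ...\r\n" an error, and "$5\r\nhello\r\n" a bulk string.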

Adoption grew incredibly quickly and the project was, deservingly, a huge success. At some point along the way, the ambitions of the project changed. Redis took up the mantle of being a database:

Ambition

With diadem and sceptre high advanced,
The lower still I fall; only supreme
In misery; such joy ambition finds.

Some features have been genuinely useful additions, such as BZPOPMIN added in 5.0, which allows a blocking-pop to be performed on a sorted-set (very nice when using sorted-sets as a priority queue). Others struck me as being extremely un-Redis-y, like ACLs. But mostly, there seems to be a desire to make Redis be everything for everyone. The addition of these features closely tracks the "latest cool thing" developers are talking about on HN over the last decade.
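
For what it's worth, the sorted-set-as-priority-queue pattern that BZPOPMIN enables really is pleasant; a minimal redis-py sketch (key and job names are illustrative):

import redis

r = redis.Redis()

# Producers enqueue jobs with a numeric priority as the score (lower runs first).
r.zadd("jobs", {"job:rebuild-index": 1, "job:send-email": 5})

# A worker blocks until a member is available, then pops the lowest-scored one.
# redis-py returns a (key, member, score) tuple, e.g. (b"jobs", b"job:rebuild-index", 1.0).
key, member, score = r.bzpopmin("jobs", timeout=0)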

The problem with this mindset is two-fold. First, it ignores the factors that made Redis an indispensable part of everyone's stack in the first place. Redis was simple, the commands were orthogonal and tightly scoped, the protocol was clean, and it was conceptually coherent. Second, it ignores the fact that anyone who is serious about integrating full-text search / event stream processing / strong-consistency kv / time-series / vector storage is going to want the real thing, not some half-baked Redis module that inherits all of Redis' restrictions. Because, at the end of the day, the HA story on Redis is complicated. The persistence story is nuanced and there are important tradeoffs. The protocol pain and client fragmentation is a real hurdle. Redis does not aim to replace Postgres in your stack, and I would argue that ElasticSearch / RabbitMQ / etc / etc are similarly foundational pillars of any system.

Here is a quote from Aphyr's analysis of the initial development build of Redis-Raft:

...we found twenty-one issues, including long-lasting unavailability in healthy clusters, eight crashes, three cases of stale reads, one case of aborted reads, five bugs resulting in the loss of committed updates, one infinite loop, and two cases where logically corrupt responses could be sent to clients. The first version we tested (1b3fbf6) was essentially unusable...

Redis the cache and data-structure server is a fundamentally different proposition from "Redis the etcd" or any of the other databases named above. This is the disconnect.

He heard the sound of the trumpet, and took not warning

When antirez announced Disque back in 2015, I wrote a short piece explaining why I won't use it. My reasoning hinged on this comment by antirez:

Disque was designed a bit in astronaut mode, not triggered by an actual use case of mine, but more in response to what I was seeing people doing with Redis as a message queue and with other message queues.

I read that admission as being predictive of one outcome: abandonment. Projects developed in "astronaut mode", as personal challenges, as learning exercises are wonderful. Without a solid use-case, though, will the maintainer retain interest and focus in order to solve the long-tail of hard problems that crop up as soon as people start using it? While also maintaining Redis? HA message delivery is legitimately difficult to solve well, and whatever side of the CAP theorem you optimize for, you will be forced to make tradeoffs and solve some difficult problems.

Furthermore, I believed nobody would adopt it. There were many mature message brokers in 2015. People used Redis as a message broker because they were already using Redis and it was good enough and simple. The need wasn't for a new message broker, nor was the need for Redis to become a more complex message broker. The project misread why people use Redis as a broker in the first place. People use Redis as a message broker specifically because they don't want to use something else.

I believe my predictions held true - Disque became abandonware almost as soon as it was announced, despite sitting at 8K stars on GitHub. Some time later it was rewritten as a Redis module, but that project is also sitting abandoned for the last 7 or 8 years.

I want to be clear that none of this discussion should be taken as overt or implied criticism of antirez. I have enormous respect for his talent and his taste. The main force I see at work in the development of Redis is, as I mentioned in the beginning, ambition. The ambition of a developer to solve complex problems, the ambition to be everything to everyone, the ambition of Redis' landlords to find a way to extract maximum revenue before AWS and GCP finish them off for good. There is nothing inherently wrong with ambition. The problem is when the ambition leads you to lose sight of what made you successful in the first place.

Valkey's existence and adoption is the wider market's final verdict on this dynamic. Rather than chase features and bullet points, Valkey has invested in the un-glamorous work of improving multi-threaded performance, memory efficiency, cluster reliability and throughput. Valkey's performance benchmarks are impressive and aimed squarely at the 80% of Redis users who just want the same features Redis shipped with back in 2011. There's no need for a new array type in Valkey's world.

Tech ai agents enterprise workplace

Amazon employees are “tokenmaxxing” due to pressure to use AI tools

Amazon employees are automating busywork with MeshClaw AI agents just to hit token consumption targets tracked on internal leaderboards.

Summary

What: Amazon deployed MeshClaw internally, an AI agent that can deploy code, triage emails, and control Slack. The company set targets for 80%+ of developers to use AI weekly and tracks token consumption on leaderboards. Despite saying metrics won't affect performance reviews, employees report managers monitor the data, leading some to automate unnecessary tasks to boost their stats. Amazon expects to spend $200 billion on capex this year, mostly for AI infrastructure.
Why it matters: This reveals how AI adoption metrics create perverse incentives. Companies spending billions on AI infrastructure pressure employees to justify the investment through usage targets, but tracking drives gaming behavior rather than productive use. Meta employees are doing the same thing.
Takeaway: If you're building internal AI tools, avoid gamifiable usage metrics. Track outcomes like time saved, errors prevented, or features shipped, not token consumption or usage frequency.

Deep Dive

  • Amazon's MeshClaw is an internal AI agent platform inspired by OpenClaw (which went viral in February 2026)
  • The tool allows employees to create agents that connect to workplace software and act autonomously
  • Capabilities include initiating code deployments, triaging emails, monitoring deployments during meetings, and interacting with Slack
  • Amazon introduced targets requiring more than 80% of developers to use AI tools each week
  • The company tracks AI token consumption on internal leaderboards, recently restricted to show only to individual employees and their managers
  • Some employees are "tokenmaxxing" by using MeshClaw to automate unnecessary tasks purely to increase their token usage statistics
  • Amazon claims token statistics won't be used in performance evaluations, but multiple employees say managers are monitoring the data
  • More than three dozen Amazon employees worked on building MeshClaw
  • Internal docs describe the bot as dreaming overnight to consolidate learning and working autonomously
  • Multiple employees raised security concerns about granting AI agents permission to act on their behalf, citing risks of errors or unintended actions
  • Meta employees have engaged in similar "tokenmaxxing" behavior on their internal leaderboards
  • Amazon's 2026 capex is expected to reach $200 billion, with the vast majority going to AI and data center infrastructure

Decoder

  • Token: Units of data processed by AI language models, roughly equivalent to 3-4 characters of text. Used to measure and bill API usage.
  • Tokenmaxxing: Gaming AI usage metrics by generating unnecessary AI activity to increase token consumption statistics, similar to how gamers "max out" stats.
  • MeshClaw: Amazon's internal AI agent platform that allows employees to create autonomous agents connected to workplace tools.
  • OpenClaw: Open-source AI agent framework that went viral in February 2026, allows running agents locally on personal hardware.

Original Article

Amazon employees are using an internal AI tool to automate non-essential tasks in a bid to show managers they are using the technology more frequently.

The Seattle-based group has started to widely deploy its in-house "MeshClaw" product in recent weeks, allowing employees to create AI agents that can connect to workplace software and carry out tasks on a user's behalf, according to three people familiar with the matter.

Some employees said colleagues were using the software to automate additional, unnecessary AI activity to increase their consumption of tokens—units of data processed by models.

They said the move reflected pressure to adopt the technology after Amazon introduced targets for more than 80 percent of developers to use AI each week, and earlier this year began tracking AI token consumption on internal leader boards.

"There is just so much pressure to use these tools," one Amazon employee told the FT. "Some people are just using MeshClaw to maximize their token usage."

Amazon has told employees that the AI token statistics would not be used in performance evaluations. But several staff members said they believed managers were monitoring the data.

"Managers are looking at it," said another current employee. "When they track usage it creates perverse incentives and some people are very competitive about it."

Silicon Valley groups are pushing to increase adoption of generative AI tools, as companies seek to demonstrate returns on vast spending commitments to AI infrastructure and embed the technology more deeply into day-to-day work.

Amazon this year is expected to spend $200 billion in capital expenditure, the vast majority of which will go toward AI and data center infrastructure.

The e-commerce group had posted team-wide statistics on AI usage by its staff, but recently limited access so that only employees themselves and managers can view their stats. Managers are discouraged from using token use to measure performance, according to a person familiar with the matter.

Meta employees have similarly engaged in so-called "tokenmaxxing" to improve their standing on internal leader boards.

The MeshClaw tool that some employees have used to increase their statistics was inspired by OpenClaw, which became a viral sensation in February. OpenClaw allows users to run agents locally on their own hardware, including computers and laptops.

Amazon's MeshClaw can initiate code deployments, triage emails, and interact with apps such as Slack, according to people familiar with the matter.

The company said in a statement that the tool enabled "thousands of Amazonians to automate repetitive tasks each day" and was one example of the group "empowering teams" to experiment and adopt AI tools.

"We're committed to the safe, secure, and responsible development and deployment of generative AI for our customers," it added.

More than three dozen Amazon employees worked on the in-house tool, according to internal documents. One recent memo describing the bot said: "It dreams overnight to consolidate what it learned, monitors your deployments while you're in meetings, and triages your email before you wake up."

Multiple Amazon employees said they were concerned about the security risks of an AI tool that was granted permission to act on a user's behalf. This risks situations where the agent may make errors or undertake unintended actions.

"The default security posture terrifies me," one employee said. "I'm not about to let it go off and just do its own thing."

Tech ai llm ddd

What Is Code?

Code's value shifts from machine instructions to conceptual models as LLMs generate syntax but risk accumulating 'cognitive debt' without understanding.

Summary

What: Unmesh Joshi, Distinguished Engineer at Thoughtworks, argues on martinfowler.com (May 12, 2026) that LLMs commoditize code generation but make code's role as conceptual model more critical. Introduces 'cognitive debt' as the gap when teams use LLM-generated vocabulary faster than they build understanding. References Domain-Driven Design's bounded contexts and ubiquitous language, TDD, and formal specs like TLA+ as tools for discovering vocabulary.
Why it matters: Signals that as LLMs lower the cost of generating syntax, the industry must shift focus to harder problems: discovering domain vocabulary, building conceptual models through iteration with domain experts, and using programming languages as thinking tools. Suggests practices like TDD, DDD, and close collaboration become more important in the LLM era, not less.

Deep Dive

  • Code serves two purposes: machine instructions (being commoditized by LLMs) and conceptual models for human understanding (increasingly valuable)
  • Coding is fundamentally vocabulary building—mapping domain concepts (retail, banking) to technical constructs (web, infrastructure) using programming language features
  • Frameworks like Spring are codified vocabularies for technical domains, but business domains require locally-discovered abstractions within bounded contexts
  • Domain-Driven Design's bounded contexts define where particular vocabularies are valid; ubiquitous language emerges from developer-domain expert collaboration
  • Programming languages (Go's channels, Rust's ownership, Java's OOP) are thinking tools that shape design discovery through their constraints
  • LLMs work best with clear vocabulary because training teaches them relationships between names, APIs, and patterns—vague prompts force guessing
  • Cognitive debt accumulates when LLMs generate plausible code with vocabulary faster than teams build conceptual understanding
  • Writing code actively (not passively reviewing) is necessary for thinking deeply about design—generation without engagement loses this benefit
  • Well-designed foundational code with clear abstractions acts as harness and context that constrains LLMs and makes output reliable
  • Good abstractions enable DSLs and natural-language interfaces (example: PlantUML) that LLMs can map to reliably with executable behavior as guardrails
  • TDD helps discover right vocabulary iteratively by forcing continuous feedback between model and behavior
  • Formal specifications like TLA+ can clarify thinking when natural language is vague and code is too verbose
  • The future of coding is building better conceptual models, vocabularies, and foundations for both humans and LLMs to work on

Decoder

  • Cognitive debt: The gap that accumulates when LLMs generate code with technical vocabulary (controllers, repositories, factories) faster than developers build conceptual understanding of what those structures mean and why they exist.
  • TLA+: Temporal Logic of Actions Plus, a formal specification language for designing and verifying concurrent and distributed systems with precise mathematical constraints.

Original Article

What Is Code?

What is code? At a high level, the answer to this question seems obvious. Code is what developers write: instructions expressed in a programming language that tell machines what to do. For years, writing code meant typing it out, word by word. Progress was measured by how efficiently code could be produced, compiled, tested and deployed. With modern LLMs we no longer need to type every word to produce code. Large amounts of executable code can now be generated from high-level descriptions. This forces a deeper question: If producing code becomes cheaper, what remains valuable about code?

Two Aspects of Code

Code has always served two distinct but intertwined purposes.

First, code is a set of instructions to a machine. It directs computation, moves data, interacts with storage, and coordinates execution. In the era of LLMs, this is the part being commoditized.

Second, code is a conceptual model of the problem domain. This is the "design" aspect. A well-designed codebase does not only contain instructions for the machine; it also contains concepts for humans and tools to reason with.

The activity we call coding is where these two aspects meet. We are shaping the concepts, names, boundaries, and relationships through which the system is understood.

Conceptual Models and Vocabulary

Making the conceptual model explicit is the deeper aspect of coding, driven by the domain and the use cases the system is meant to address. Every domain comes with established processes, practices, and more importantly a shared vocabulary. That vocabulary is where the conceptual model becomes visible.

Vocabulary is usually understood as the set of words used in a particular language or subject. I am able to write this article because I know the vocabulary of English. The reader can read it because they share that vocabulary with me.

But to understand this article, knowing English alone is not enough. This article is about software development. Software development is a broad, mature field with its own technical vocabulary. When I use a word like abstraction, I am not merely using an English word. I am referring to a specific software development concept, with its own meaning, history, and implications. A reader unfamiliar with software development may understand the word at the surface level, but miss its deeper meaning in this context. The mature areas with their own established vocabulary are called domains.

This is true of all serious domains. Communication depends on shared vocabulary. Whether we are communicating with a person, a framework, or an LLM, the words we use must map to concepts that the receiver can understand and act upon.

Vocabulary in Code

A well-designed codebase is a representation of a certain vocabulary. Where does this vocabulary in code come from? This is where the unique nature of software development truly shines. Software development works at the intersection of various domains. At one end we have domains such as banking, finance, retail, inventory, healthcare, insurance, and so on. On the other, we have domains like web, infrastructure, AI, and data engineering.

Someone doing web development needs to have a strong grasp of web architecture, the semantics of web methods, the universal caching potential of GET, and the implications of those semantics. Someone who does not know that will not architect complex systems well. The same is true in other domains. Vocabulary is not just a collection of labels. It carries meaning, constraints, and design consequences.

Consider a retail domain. When we write code for retail we talk about customers, products, orders, shipments, payments, and so on. When we are doing web development for the retail domain, the code contains concepts that map the retail domain to the web domain. For example, a Catalog is a resource, and we can use the GET, POST, PUT, and DELETE HTTP methods to perform operations on it. Someone writing code needs to be familiar with both vocabularies.

Coding for a domain is fundamentally an act of translation. The developer maps the domain vocabulary onto the vocabulary of technical domains. In doing so, a new vocabulary is also built using the constructs provided by a programming language. There are concepts like logs, repositories, quorums, transactions, and specific concepts like money. Concepts become types, relationships become interfaces, rules become invariants, and workflows become compositions.
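
A small illustration of that translation, with names invented for this sketch rather than taken from any real codebase:

from dataclasses import dataclass

# Concept as a type: Money is a first-class domain concept, not a bare float.
@dataclass(frozen=True)
class Money:
    amount_cents: int
    currency: str

    # Rule as an invariant: amounts in different currencies cannot be summed.
    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount_cents + other.amount_cents, self.currency)

# Workflow as composition: an order total is a fold over its line item prices.
def order_total(prices: list[Money]) -> Money:
    total = Money(0, prices[0].currency)
    for price in prices[1:]:
        total = total.add(price)
    return total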

The precise names of variables, the boundaries of methods, and the hierarchy of classes are discovered step by step. The right abstraction often is not obvious upfront; it reveals itself only as you continually mold and refactor the code against real-world constraints. When used well, the code slowly becomes a readable, highly specific representation of the domain itself.

For the technical domains we typically find frameworks and libraries which provide the base implementation patterns. Frameworks and libraries are codified vocabularies. They capture the most common patterns of usage. That is why ecosystems such as the Spring Framework exist for building enterprise applications involving the web, integration, and related concerns. Different programming languages bring their own flavor, along with specific design constraints that get reflected and codified in their frameworks and libraries.

Bounded Contexts and Local Vocabularies

Frameworks work well when a domain has stable, recurring structures with broadly shared semantics. But something like "online retail" or "stock exchange" is different. Those are not just technical stacks. The main reason there is no universal high-level framework is that the vocabulary is not stable enough across all instances of the domain. Attempts to find universal abstractions become either too generic to be useful or too opinionated to be widely applicable. The closer you get to the core business model, the more the abstractions must be discovered locally. This is why the idea of a bounded context in Domain-Driven Design is so important. A bounded context marks the boundary within which a particular vocabulary and model are valid. The same word may mean different things in different contexts, and each context needs its own abstractions, rules, and language.
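
A minimal sketch of that idea, with module and field names invented for illustration: the word "Product" can legitimately mean different things in a catalog context and a shipping context, and each bounded context owns its own type.

from dataclasses import dataclass

# Catalog context: a "product" is something customers browse and buy.
@dataclass
class CatalogProduct:
    sku: str
    title: str
    price_cents: int

# Shipping context: a "product" is something with physical handling constraints.
@dataclass
class ShippingProduct:
    sku: str
    weight_grams: int
    hazardous: bool

# A translation at the boundary keeps each context's vocabulary intact.
def to_shipping(p: CatalogProduct, weight_grams: int, hazardous: bool) -> ShippingProduct:
    return ShippingProduct(sku=p.sku, weight_grams=weight_grams, hazardous=hazardous)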

How do we build these local abstractions and vocabulary? A lot of this vocabulary is built through iterative sessions where we write code and reflect on it. Techniques like TDD are excellent for this iterative development of the vocabulary. They help us discover the right names, the right abstractions, and the right boundaries by forcing continuous feedback between the model and its behavior.

Coding can not happen in isolation. There must be close collaboration between domain experts, users and developers. This collaboration is necessary to build these local abstractions and vocabulary.

This connects directly to the lessons of agile software development. The emphasis on individuals and interactions, customer collaboration, working software, and responding to change is not just process advice. It is a way of discovering and refining vocabulary through feedback.

Domain-Driven Design makes this more explicit through the idea of a ubiquitous language: a shared language developed by developers and domain experts and tested continuously against working software.

Programming Languages As Thinking Tools

Building vocabulary through code requires active engagement in writing and reshaping code, not just passive review of generated code. The very act of thinking deeply about code often happens only when we are actively engaged in writing it. Programming languages, with their constructs and constraints, themselves become thinking tools. The design constraints provided by different programming languages help shape our thinking. The channels and lightweight threads of Go, the object-oriented model of Java, or the ownership model of Rust all push us to see structure, boundaries, and trade-offs in particular ways. In that sense, programming languages do not just help us express a design. They also help us discover it. I recently had to design a custom Future implementation for asynchronous programming examples. One of the important aspects of a Future API is designing compositions that can express a sequence of actions:

// Compose a sequence of asynchronous actions: each step runs when the previous one completes.
var future1 = action1();
future1.thenCompose(val1 -> action2(val1))
       .thenCompose(val2 -> action3(val2));

Knowing the concepts and vocabulary of functional programming is crucial to implementing this API well. Not knowing those concepts results in awkward implementation and usage.

Sometimes, the programming language syntax can become too verbose and hide the underlying structure of the solution. For example, recently while working with a snapshot isolation implementation for my workshop, describing the essential requirements in plain English was a bit vague and putting it in Java code was too verbose. More constrained formal specifications like TLA+ would have helped. But even writing a single-page pseudo-formal spec helped significantly.

Begin(T, coord):
  R(T) := HLC(coord).now()
  writeSet(T) := {}

Read(T, N, key):
    N.HLC.tick(R(T)) //HLC advanced. So any write or commit after this is guaranteed to be at a higher ts
    return latest committed version of key with ts <= R(T)

Write(T, N, key, value):
    N.HLC.tick(R(T))
    if LatestCommittedVersion(key).ts > R(T):
        abort T
    place provisional intent for (key, value)
    writeSet(T) := writeSet(T) union {key}

This pseudo-formal spec helped clarify my thinking and served as a good basis for further discussions, implementation and validations through tests.

Working with LLMs

Considering 'Coding as Vocabulary Building' has important implications for LLMs. LLMs are trained on vocabulary from a large body of text and code. They learn recurring relationships between names, APIs, libraries, frameworks, idioms, design patterns, and implementation structures. When they see words such as Controller, Repository, Reducer, ConsensusModule, or TransactionLog, those names are not just labels. They carry associations with known code structures and expected behavior.

This is why vocabulary matters when working with LLMs. If our prompts use vague or inconsistent language, the model has to guess the design we intend. If our codebase uses unclear names and inconsistent concepts, the model has little stable structure to follow. But when the vocabulary is precise, consistent, and embodied in the code, the LLM can map our intent more reliably to useful implementation.

Cognitive Debt

This also explains a particular danger of LLM-assisted coding: cognitive debt. Cognitive debt accumulates when words, abstractions, and structures are used without their meaning being well understood by the people working with them. LLMs amplify this risk because they can generate large amounts of plausible code very quickly. The generated code may contain controllers, repositories, reducers, factories, transactions, schedulers, or other familiar-looking structures. The code may compile. It may even pass basic tests. But if the team does not understand the conceptual model behind those structures, the codebase has gained vocabulary without shared understanding. The problem is not that the LLM generated code. The problem is that the code introduced vocabulary faster than the developers built understanding. That gap is one of the major contributors to cognitive debt.

Code as a shared Conceptual Model

As we discussed in Designing Abstractions with LLMs, writing code has two deeply interwoven activities: discovering and applying abstractions. Discovering the abstractions is where we are developing the vocabulary. Once a strong vocabulary is built, it represents a shared conceptual model. Once that model exists, much of coding becomes using that conceptual model to build use cases. This is where good libraries and good foundational code shine. This is the part where we try to hide the intricacies of the programming language and environments, and give a more and more English-like interface to the vocabulary we have built. A typical way this works is to build a DSL to make using this vocabulary or these abstractions easy and close to natural language.
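
A rough sketch of what such a layer can look like, with a pricing vocabulary invented purely for illustration: each rule below is ordinary executable code, so anything written in this vocabulary either runs or fails loudly.

# Invented example: a tiny internal DSL over a pricing vocabulary.
def percent_off(rate):
    return lambda total: total * (1 - rate)

def free_shipping_over(threshold, shipping_cost):
    return lambda total: total if total >= threshold else total + shipping_cost

def apply_rules(total, rules):
    for rule in rules:
        total = rule(total)
    return total

checkout_rules = [
    percent_off(0.10),
    free_shipping_over(50, shipping_cost=7),
]

print(apply_rules(60, checkout_rules))  # 54.0: discounted, and no shipping charge added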

LLMs are excellent at this. They provide a natural-language interface to your abstraction vocabulary. The best part is that if you have executable code behind your vocabulary of abstractions, it itself acts as an excellent guardrail for the LLM to fix its mistakes. Good abstractions, executable behavior, tests, types, and invariants all help constrain the model and make its output more useful.

We can use the vocabulary to build an external DSL. LLMs work very nicely as a natural language interface on top of the external DSL. I have been using LLMs to great success with tools like PlantUML. And it's not a surprise, since LLMs by design work best to map vocabularies.

In that sense, strong foundational code becomes even more important in the age of LLMs. Once the vocabulary of a system is well formed, coding becomes less about producing raw syntax and more about using a well-developed conceptual language to build reliable software.

Code itself as Harness and Context

A lot of the discussion on context engineering and harness engineering treats code as a black box, with responsibility for generating it correctly managed externally: provide the right context in prompts, or construct the right harness of specs, tests, and static validations to make sure the output is structured and works as intended. But well-structured code, with abstractions forming a well-defined vocabulary, is itself the most important part of the harness and context. I have repeatedly seen well-designed code work very well with LLMs. More importantly, when the code is built on stable abstractions with clear semantics, you get some freedom in which LLM you use and do not need to worry as much about how accurate your prompts are. Code structure and accompanying tests themselves provide the context and harness that make the LLM's output reliable and useful.

Conclusion

The role of coding is not disappearing. But it is changing.

As LLMs make code generation cheaper, the mechanical act of writing instructions becomes less central. What becomes more important is making the conceptual model explicit, discovering the right vocabulary, and refining that vocabulary through iteration, domain expertise, and feedback. This is also why programming languages continue to matter deeply. We are not meant to be passive reviewers of generated code. The act of writing code is itself part of our thinking.

Code is still instructions for a machine. But it is also a model of understanding. In the LLM era, that second role becomes even more important. The future of coding is not just writing more code faster. It is building better conceptual models, better vocabularies, and better foundations on top of which both humans and LLMs can work.

DevOps testing performance kubernetes

AI-assisted testing, extensions updates, and more: k6 2.0 is here

Grafana k6 2.0 embeds a Model Context Protocol server so AI assistants like Claude Code can write, run, and validate performance tests programmatically.

Summary

What: Grafana released k6 2.0 with built-in Model Context Protocol server, AI agent bootstrap tools (k6 x agent, k6 x mcp, k6 x docs, k6 x explore), Playwright-compatible browser testing, expect() assertions with auto-retry, subcommand extensions framework, JSON summary output, native OpenTelemetry support, and stable k6 Operator 1.0 for distributed Kubernetes testing.
Why it matters: Load testing tools are adapting to AI-driven development by becoming programmatically accessible to agents. k6's MCP server and agent workflows position it as the first major performance testing tool designed to be operated by AI assistants, reflecting how validation must scale alongside AI-accelerated code generation.
Takeaway: If you use Claude Code, Cursor, or another MCP-compatible AI assistant, run k6 x agent to enable performance testing workflows in your coding environment.

Deep Dive

  • k6 2.0 introduces four AI-focused commands: k6 x agent sets up configuration and skills for AI coding assistants, k6 x mcp runs a Model Context Protocol server giving agents tools to validate scripts and inspect results, k6 x docs provides CLI documentation access without web searches, and k6 x explore browses the extension registry with automatic resolution
  • Browser module significantly expands Playwright API compatibility, making it easier to adapt existing Playwright tests to k6 and progress from functional correctness testing to load testing the same user flows
  • New Assertions API introduces expect() with two forms: non-retrying assertions for static values like HTTP status codes and JSON payloads, and auto-retrying assertions for browser tests where elements may take time to appear or become interactive
  • Extensions framework expanded with subcommand extensions that add custom commands under k6 x namespace for test authoring, documentation, result processing, or internal tooling (the four AI commands use this same mechanism)
  • Extensions catalog consolidated to distinguish official Grafana-maintained extensions from community extensions, with clear compatibility expectations and registry requirements including documentation, build instructions, and version compatibility
  • xk6 evolved from build tool to full extension development toolbox with xk6 new for scaffolding, xk6 lint for compliance checking, and xk6 test for running test suites with TAP or CTRF JSON output
  • JSON summary output provides machine-readable end-of-test results for CI/CD systems and AI agents, eliminating the need to scrape terminal output for automated decision-making
  • Native OpenTelemetry output enables k6 results analysis alongside application telemetry in existing observability platforms
  • k6 Operator 1.0 reached stable release, supporting distributed test execution on Kubernetes for production-scale load testing
  • Official and community extensions support automatic resolution for protocols beyond HTTP including k6/x/faker, k6/x/mqtt, k6/x/sql, k6/x/dns, k6/x/sse, and k6/x/kafka

Decoder

  • Model Context Protocol (MCP): Standard protocol developed by Anthropic that lets AI assistants connect to external tools and data sources through a server-client architecture, enabling agents to invoke commands and read resources programmatically.
  • k6 Operator: Kubernetes operator that manages distributed k6 test execution across cluster nodes, enabling production-scale load testing by coordinating multiple test runners and aggregating results.
  • Auto-retrying assertions: Assertions that repeatedly evaluate a condition until it becomes true or a timeout expires, essential for browser testing where DOM elements may not be immediately available due to rendering, network requests, or JavaScript execution.
  • CTRF: Common Test Report Format, a JSON-based standard for test results that enables consistent reporting across different testing tools and CI/CD platforms.
  • Subcommand extension: k6 plugin type that adds custom CLI commands under the k6 x namespace rather than JavaScript imports, used for workflows, documentation, tooling, and agent integration.

Original Article

AI-assisted testing, extensions updates, and more: k6 2.0 is here

For years, teams have relied on k6 to take a more proactive approach to performance testing, ensuring they can catch issues early and deliver more reliable user experiences. That approach has helped make k6 one of the most widely used performance testing tools in the open source community today, with more than 30k stars on GitHub.

Last year, we introduced k6 1.0, a major release that brought TypeScript support, native extensions, revamped test insights, and production-grade stability guarantees.

Now, we've reached another milestone for the OSS project: k6 2.0 is generally available.

This latest release builds on k6 1.0 to better support faster, more automated software delivery lifecycles. We've introduced AI-assisted testing workflows, broader Playwright compatibility in the browser module, a new Assertions API, and more. Overall, the release makes it easier to author, validate, automate, and scale performance tests, especially as AI becomes a more integral part of your development workflows.

Even with these advancements, existing k6 users should still feel right at home: scripts, checks, thresholds, scenarios, and CI/CD workflows remain core to the testing experience.

Read on to learn more about what's new, and be sure to check out the k6 2.0 talk from GrafanaCON 2026 in the video below.

AI-assisted workflows for faster, scalable testing

AI is changing how software gets written. Developers can generate, refactor, and review code faster than ever, but faster output also raises the bar for validation.

As more teams bring AI assistants into their development workflows, testing needs to become easier to author and automate, and easier for both humans and agents to interpret. k6 2.0 is built around that shift: it helps teams create tests faster, express expectations more clearly, and scale validation from local development to production-like environments.

The release includes four new commands that enable deeper integration with AI workflows and help teams use k6 programmatically:

  • k6 x agent helps developers bootstrap agentic testing workflows in AI coding assistants like Claude Code, Codex, Cursor, and more. It sets up the configuration, skills, and references an agent needs to use k6 to write correct, idiomatic, and modern tests; turn requirements and expectations into a testing strategy; and build out a test suite.
  • k6 x mcp exposes k6 through a built-in Model Context Protocol server, giving compatible agents the tools and resources they need to work effectively with k6. Agents can validate and run scripts, inspect results, iterate quickly on the tests they write, and tap into k6 resources and best practices along the way.
  • k6 x docs gives agents and developers CLI access to k6 documentation, API references, and examples without leaving the session, or having to perform web searches.
  • k6 x explore lets agents and developers browse the k6 extension registry from the CLI, filtering by type or tier and surfacing the imports, subcommands, and outputs each extension provides. Combined with automatic extension resolution, agents can discover the right extension for a testing scenario and pull it into a script without leaving the session.

These commands also reflect how k6 2.0 extends beyond test scripts. They are built on the same subcommand extension model now available to extension authors, which we'll cover in the next section.

Extensions updates to expand the reach of k6

Extensions help you extend core k6 functionality with new features to support your specific reliability testing needs. The 2.0 release expands on extensions in multiple ways: it provides a consolidated catalog of official and community extensions, makes it easier to test more systems and protocols, and introduces a way to extend the k6 CLI itself.

A curated extensions catalog

In k6, official extensions are those owned and maintained by Grafana Labs, with defined compatibility expectations and support across a range of k6 versions. Community extensions are built and maintained by k6 contributors and members of our OSS community.

With k6 2.0, these extensions are consolidated into a single catalog that makes it easier to discover and use them, and more clearly defines the boundaries between them. Community extensions, for example, are clearly identified as community-maintained and must follow registry requirements before being included.

This distinction matters. Extensions can add new protocols, clients, outputs, and CLI workflows to k6, so teams need to understand what is maintained by Grafana Labs, what is maintained by the community, and what guarantees apply before adding an extension in their testing workflows.

The catalog also gives extension authors a clearer path to contribute. Public community extensions can be submitted for inclusion if they meet the registry requirements, including documentation, build instructions, usage guidance, and k6 version compatibility.

Test more systems and protocols

Modern systems consist of so much more than HTTP services and browser frontends. Teams also need to test databases, message queues, streaming APIs, DNS, event-driven systems, and other infrastructure components that sit on the critical path.

Official extensions maintained by Grafana Labs, including k6/x/faker, k6/x/mqtt, k6/x/sql, and k6/x/dns, sit alongside community extensions like k6/x/sse and k6/x/kafka to help with these needs.

For cataloged extensions that support automatic resolution, you can reference the extension in your script and let k6 handle the rest. For custom extensions or extensions outside automatic resolution, xk6 is still available.

xk6 as an extension development toolbox

Extensions are only as healthy as the tooling around them. In k6 2.0, xk6 grows from a custom k6 build tool into a full extension development toolbox.

Extension authors can scaffold a new project from official templates with xk6 new, build and run k6 with an in-development extension in one step, check a project against the registry's compliance requirements with xk6 lint, and run a suite of k6 scripts against the extension with xk6 test, reporting results in TAP or CTRF JSON for CI/CD pipelines.

The result is a shorter path from idea to a published, catalog-ready extension, and a consistent baseline of quality across official and community extensions alike.

Subcommand extensions

Not every extension needs to be something you import in a test script. k6 2.0 introduces subcommand extensions, a new way to add custom commands under the k6 x namespace.

This means teams can build workflows around test authoring, environment setup, documentation, result processing, mocks, internal tooling, or anything else they need close to the k6 runtime.

We're already using this model internally at Grafana Labs: k6 x agent, k6 x mcp, k6 x docs, and k6 x explore are all built as subcommand extensions. The same mechanism that powers these AI-assisted workflows is now available to extension authors.

Writing familiar browser and assertion tests

k6 2.0 significantly expands compatibility between the k6 browser module and the Playwright API, making it easier for teams to apply existing browser testing knowledge and adapt existing Playwright tests to k6.

This is important because browser testing is often where functional correctness, user experience, and performance meet. With a more familiar API surface, teams can progress more easily from "does this user flow work?" to "how does this user flow behave under load?"

k6 2.0 also introduces a new Assertions API. The expect() API brings a Playwright-inspired assertion style to k6 scripts, with expressive matchers for both protocol and browser testing.

Assertions come in two forms:

  • Non-retrying assertions, which evaluate whether a condition is true immediately. They're useful for static values such as HTTP status codes, response headers, JSON payloads, and configuration.
import http from 'k6/http';
import { expect } from 'https://jslib.k6.io/k6-testing/0.6.1/index.js';

export default function () {
  const response = http.get('https://quickpizza.grafana.com/');
  expect(response.status).toBe(200);
  expect(response.body).toBeDefined();
}
  • Auto-retrying assertions, which hold the execution of the test until a condition becomes true or a timeout is reached. They're especially useful for browser tests where elements may take time to appear, update, or become interactive.
import { browser } from 'k6/browser';
import { expect } from 'https://jslib.k6.io/k6-testing/0.6.1/index.js';

export const options = {
  scenarios: {
    ui: {
      executor: 'shared-iterations',
      options: {
        browser: {
          type: 'chromium'
        }
      },
    },
  },
};

export default async function () {
  const page = await browser.newPage();
  await page.goto('https://quickpizza.grafana.com/');
  await expect(page.locator('h1')).toContainText('Welcome to QuickPizza!');
}

Assertions complement existing k6 checks. Checks are still a great fit for load testing because they continue execution and emit metrics for threshold evaluation. Assertions are designed for use cases where a failed expectation should stop the test because the scenario is no longer valid.

From AI-authored tests to production-scale validation

A locally run test is a useful starting point for evaluating performance. But as teams bring testing into AI-assisted workflows and CI/CD pipelines, results need to be machine-readable and test execution needs to scale beyond a single machine.

k6 2.0 adds a new JSON summary output, making end-of-test results easier for CI/CD systems and AI agents to consume. Instead of scraping terminal output, tools can read structured results and make decisions based on them.
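
As a rough sketch of what that enables, a CI step could gate the pipeline on the summary file. The field names below are placeholders rather than the documented k6 2.0 schema, so treat this as an illustration of the pattern only.

import json
import sys

# Hypothetical CI gate over a k6 JSON summary. The "metrics"/"thresholds"
# layout below is a placeholder, not the documented schema.
with open("summary.json") as f:
    summary = json.load(f)

failures = [
    f"{metric}: {expr}"
    for metric, data in summary.get("metrics", {}).items()
    for expr, passed in data.get("thresholds", {}).items()
    if not passed
]

if failures:
    print("failed thresholds:")
    print("\n".join(failures))
    sys.exit(1)  # fail the build without scraping terminal output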

For real-time observability, native OpenTelemetry output makes it easier to analyze k6 results alongside the application telemetry teams already use.

And for teams that need production-scale load, k6 Operator 1.0 is now stable. The operator lets teams run distributed k6 tests on Kubernetes, closer to the environments where their applications already run.

Getting started with k6 2.0

Here are a few ways to try k6 2.0 today:

Thank you to the k6 community!

To everyone in the community who contributed features, filed issues, fixed bugs, wrote extensions, tested early builds, or pushed for more reliable software: thank you. k6 2.0 would not be possible without you.

You can learn more in our k6 documentation, and we'd love to hear what you think on GitHub.

Happy testing!

DevOps cloud database aws graviton

Amazon Redshift introduces AWS Graviton-based RG instances with an integrated data lake query engine

AWS eliminated Redshift Spectrum's $5/TB data lake scanning fees while launching Graviton-based RG instances that run 2.2x faster than RA3 at 30% lower cost per vCPU.

Summary

What: AWS launched Amazon Redshift RG instances powered by Graviton processors, delivering up to 2.2x faster performance than RA3 instances at 30% lower cost per vCPU. The new instances eliminate the previous $5/TB data lake scanning fees charged by Redshift Spectrum and include an integrated data lake query engine running up to 2.4x faster for Apache Iceberg queries and 1.5x faster for Apache Parquet. Available now in 24 AWS regions with no code changes required.
Why it matters: AWS is explicitly positioning this release to handle the operational cost challenges of AI agents querying data warehouses at scale, which the announcement states can dwarf typical human usage and cause spiraling costs. The move to eliminate per-terabyte scanning fees while improving performance suggests cloud providers are adapting pricing models to accommodate the query volume patterns of agentic AI workloads.
Takeaway: If you're running Redshift RA3 instances, migrate to RG via elastic resize (10-15 min downtime) or snapshot/restore. Use the AWS Pricing Calculator to estimate savings based on your workload patterns.

Deep Dive

  • AWS launched Redshift RG instances powered by Graviton processors in 24 regions, replacing the previous generation RA3 instances with significant performance and cost improvements
  • Performance gains: up to 2.2x faster for data warehouse workloads, 2.4x faster for Apache Iceberg queries, 1.5x faster for Apache Parquet queries compared to RA3
  • Cost reduction: 30% lower price per vCPU compared to RA3, plus elimination of the $5/TB Redshift Spectrum scanning fees that previously added to total costs
  • The integrated data lake query engine now runs on cluster nodes within your VPC boundary using existing IAM roles, removing the need for Redshift Spectrum
  • Migration paths: elastic resize (10-15 min downtime) for compatible configurations, or snapshot/restore for those wanting to make configuration changes during migration
  • No application changes required - external tables, schemas, and query syntax remain unchanged, including existing Spectrum queries
  • Instance comparison: rg.xlarge replaces ra3.xlplus (4 vCPU, 32GB), rg.4xlarge replaces ra3.4xlarge (16 vCPU vs 12, 128GB vs 96GB)
  • AWS explicitly designed this generation to handle high query volumes from AI agents, which can query data warehouses at scale that dwarfs typical human usage
  • The March 2026 performance update already improved BI dashboard and ETL workload query speeds by up to 7x for low-latency SQL queries
  • Pricing options: On-Demand Instances with hourly billing or Reserved Instances for additional cost savings

Decoder

  • AWS Graviton: Amazon's custom ARM-based processor designed for cloud workloads, offering better price-performance than x86 alternatives
  • Apache Iceberg: Open table format for huge analytic datasets in data lakes, providing ACID transactions and schema evolution
  • Apache Parquet: Columnar storage file format optimized for analytics, commonly used in data lakes
  • Redshift Spectrum: AWS's previous external data query service that charged $5/TB for scanning data in S3, now replaced by the integrated query engine in RG instances

Original Article

Amazon Redshift introduces AWS Graviton-based RG instances with an integrated data lake query engine

Since 2013, Amazon Redshift has given the full power of a data warehouse in the cloud, at a fraction of the on-premises cost. Every architectural generation—from dense compute to Amazon RA3 instances, from provisioned to Amazon Redshift Serverless—has made each query cheaper, faster, and more efficient than the last.

For over a decade, as data volumes have grown and analytics requirements have evolved, organizations increasingly leverage both data warehouse tables for structured, frequently-accessed data and data lakes for cost-effective storage of diverse datasets. Add AI agents to the mix and they query your data warehouse at a scale that dwarfs typical human usage, leading to spiraling operational costs.

Amazon Redshift has doubled down on its core strengths to meet the demands of any workload — whether driven by humans or AI agents. For example, in March 2026, Amazon Redshift improved the performance of business intelligence (BI) dashboards and ETL workloads by speeding up new queries by up to 7 times. This significantly improves the response times of low-latency SQL queries, such as those used in near-real-time analytics applications, BI dashboards, ETL pipelines, and autonomous, goal-seeking AI agents.

Today, we're announcing Amazon Redshift RG instances, a new instance family powered by AWS Graviton. RG instances deliver better performance, running data warehouse workloads up to 2.2x as fast as RA3 instances at 30% lower price per vCPU. Their integrated data lake query engine lets you run SQL analytics across your data warehouse and data lake from a single engine with performance up to 2.4x as fast as RA3 for Apache Iceberg and up to 1.5x as fast as RA3 for Apache Parquet. This blend of speed, cost efficiency, and an integrated data lake query engine makes Redshift RG instances well-suited to handle the high query volumes and low-latency requirements of today's analytics and agentic AI workloads.

You can compare new RG instances and current RA3 instances:

Current RA3 instance | Recommended RG instance | vCPU             | Memory                  | Primary use case
ra3.xlplus           | rg.xlarge               | 4                | 32 GB                   | Small-cluster departmental analytics
ra3.4xlarge          | rg.4xlarge              | 12 → 16 (1.33:1) | 96 GB → 128 GB (1.33:1) | Standard production workloads, medium data volumes

This approach reduces total analytics costs for customers running combined data warehouse and data lake workloads, while simplifying operations through a single system for querying both warehouse tables and Amazon Simple Storage Service (Amazon S3) data lakes. We recommend using the AWS Pricing Calculator with your specific workload patterns to estimate savings.

Getting started with Amazon Redshift RG instances

You can launch new clusters or migrate existing clusters through the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS API. The integrated data lake query engine is enabled by default.

In the Amazon Redshift console, you can choose new RG instances when you create a cluster.

You can migrate previous-generation instances to RG instances, with recommended migration paths based on your cluster configuration to help you estimate costs, validate compatibility, and automate execution.

  • Elastic Resize—in-place migration with 10-15 minutes downtime for compatible configurations (a scripted sketch follows below)
  • Snapshot and Restore—create an RG cluster from an RA3 snapshot. This is best for customers who want to make configuration changes during the migration
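
For the elastic-resize path, the change can also be scripted against the standard Redshift ResizeCluster API. A minimal sketch using the AWS SDK for JavaScript v3; the cluster identifier, node count, and region are placeholders, not values from the announcement:

import { RedshiftClient, ResizeClusterCommand } from "@aws-sdk/client-redshift";

const client = new RedshiftClient({ region: "us-east-1" });

// Elastic resize to the recommended RG node type; the cluster stays online
// apart from the 10-15 minute cutover window described above.
await client.send(
  new ResizeClusterCommand({
    ClusterIdentifier: "analytics-prod", // placeholder cluster name
    NodeType: "rg.4xlarge",
    NumberOfNodes: 2,
    Classic: false, // false requests an elastic (in-place) resize
  })
);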

Your external tables, schemas, and query syntax—including existing Spectrum queries—remain unchanged. There is no need to recreate external tables or modify application code. To learn more, visit the Redshift Management Guide.

Amazon Redshift now executes data lake queries on cluster nodes—the same compute that processes data warehouse workloads. As a result, Amazon Redshift Spectrum is no longer required. Data lake queries stay within your VPC boundary, use existing IAM roles, and incur zero per-terabyte scanning charges. This removes the $5/TB Spectrum scanning fees that previously added to total Redshift costs.

Now available

Amazon Redshift RG instances are now available in the following AWS Regions: US East (N. Virginia, Ohio), US West (N. California, Oregon), Asia Pacific (Hong Kong, Hyderabad, Jakarta, Malaysia, Melbourne, Mumbai, Osaka, Seoul, Singapore, Sydney, Taiwan, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, Milan, London, Paris, Spain, Stockholm), and South America (São Paulo). For Regional availability and a future roadmap, visit the AWS Capabilities by Region. For Redshift Provisioned, you can select On-Demand Instances with hourly billing and no commitments or choose Reserved Instances for cost savings. To learn more, visit the Amazon Redshift Pricing page.

Give RG instances a try in the Redshift console and send feedback to AWS re:Post for Amazon Redshift or through your usual AWS Support contacts.

DevOps · data, infrastructure, mysql, cdc

Migrating Data Ingestion Systems at Meta Scale

Meta's zero-downtime migration of petabyte-scale MySQL ingestion swapped production and shadow jobs mid-flight to enable instant rollback.

Summary

What: Meta migrated its MySQL-powered social graph data ingestion system—processing several petabytes daily across tens of thousands of jobs—from customer-owned pipelines to a self-managed architecture. Used a three-phase migration: shadow testing in pre-production, reverse shadow (swapping production and shadow roles for instant rollback), and cleanup. Built automated data quality tools with continuous checksum and row count validation.
Why it matters: The reverse shadow technique—where the new system becomes production while the old system becomes the shadow—is a clever solution for zero-downtime migrations at scale. It provides continuous validation after rollout and enables instant rollback without reconfiguring the old system. This pattern could apply to any large-scale data pipeline migration.

Deep Dive

  • Meta's data ingestion system scrapes several petabytes of social graph data daily from MySQL into their data warehouse for analytics, ML training, and product development
  • Legacy customer-owned pipeline became unstable at hyperscale under strict data landing time requirements
  • Three-phase migration lifecycle: (1) Shadow phase in pre-production with continuous row count and checksum validation (2) Reverse shadow phase where shadow job output goes to production table while production job becomes the shadow (3) Cleanup after verification
  • Success criteria per job before promotion: matching checksums and row counts between systems, no landing latency regression, no resource utilization regression
  • Custom data quality tooling read mismatch logs from Scuba hourly, identified example problematic rows via queries, logged debugging info for fast root cause analysis
  • CDC bad data propagation prevention: system marked partitions with quality issues in metadata, stopped landing new data if delta partition affected, merged older partitions with new deltas if target partition affected
  • Automated migration tooling for tens of thousands of jobs: continuously sent status signals to Scuba, external tools auto-promoted or demoted jobs between lifecycle stages based on whether criteria met
  • Batched migration due to capacity limits: categorized jobs by throughput, priority, and special cases; excluded jobs with known issues to reduce noise; avoided creating shadow jobs during bug fixes to prevent unnecessary full dumps
  • Reverse shadow phase enabled early signals before bad data reached consumers via backfill testing, and fast rollback by querying metadata for marked partitions
  • Authors: Zihao Tao, Mohan Perumal Swamy, Grace Gong, Ailyn Tong, Peishan Wang, Nilay Kapadia, Md Mustafijur Rahman Faysal, Saurav Sen, Jameel Mohamed

Decoder

  • Change Data Capture (CDC): Technique for tracking and capturing database changes incrementally rather than full dumps, using delta tables to record only modified rows between snapshots
  • Shadow testing: Running a parallel system with production data but isolated output tables to validate behavior before switching production traffic
  • Reverse shadow: Migration technique where the new system's output is switched to the production table while the old system continues running in shadow mode for validation and fast rollback capability
  • Scuba: Meta's internal real-time data management and analysis system, used here for logging migration metrics and querying job status signals
  • Full dump vs. delta: Full dump is complete snapshot of source database; delta captures only changes since last snapshot, making incremental updates more efficient but propagating bad data if initial snapshot has issues

Original Article

  • Meta's data ingestion system, which our engineering teams leverage for up-to-date snapshots of the social graph, has recently undergone a significant revamp to enhance its reliability at scale. 
  • Moving from our legacy system to our new architecture required a large-scale migration of our entire data ingestion system. 
  • We're sharing the solutions and strategies that enabled a successful large-scale system migration, as well as the key factors that influenced our architectural decisions.

At Meta, our social graph is powered by one of the largest MySQL deployments in the world. Every day, our data ingestion system incrementally scrapes several petabytes of social graph data from MySQL into the data warehouse to power the analytics, reporting, and downstream data products that teams across the company utilize for tasks ranging from day-to-day decision-making to machine learning model training and product development.

We've recently revamped our data ingestion system's architecture to significantly enhance its efficiency and reliability. The new architecture moves away from customer-owned pipelines, which functioned effectively at a small scale, to a simpler self-managed data warehouse service that still operates efficiently at hyperscale.

We've successfully transitioned 100% of the workload and fully deprecated the legacy system. But migrating a data ingestion system of this scale was a major challenge. Several important solutions and strategies helped make a migration of this scope successful.  

The Migration Challenge

As our operations grew in scale, our legacy data ingestion system began to show signs of instability under the increasingly strict data landing time requirements. We knew we needed to migrate to a new system. But we also knew that meant facing challenges around not only how to make sure each job would be migrated seamlessly, but also how to perform the large-scale migration itself.

Ensuring a Seamless Transition

Ensuring a seamless migration meant we had to effectively track the migration lifecycle for thousands of jobs and put robust rollout and rollback controls in place to handle issues that might arise during the migration process.

The Migration Lifecycle

Our first step was to establish a clear migration job lifecycle to ensure data integrity and operational reliability throughout the process. 

Each job needed to be verified for correctness and had to meet defined success criteria before moving to the next step of the migration lifecycle:

  • No data quality issues. There is no difference between the data delivered by the old system and the new system. We verify this by comparing both the row count and the checksum of the data, ensuring complete consistency between the two systems.
  • No landing latency regression is observed. The data delivered by the new system should exhibit improved landing latency, or at minimum, match the performance of the old system.
  • No resource utilization regression is observed. The compute and storage usage of the job running in the new system should be improved, or at minimum, be comparable to that of the old system. 
  • For the critical table migration, we defined and agreed on extra migration criteria with the teams who were reliant on the service.

Phase 1: The Shadow Phase

In the first step of the lifecycle, we set up shadow jobs in the pre-production environment to be delivered via the new system. This is essentially a production-realistic test in which each shadow job consumed the same source as the production job but delivered data to a different table, called the shadow table. This setup can help reveal issues because it exposes the new system to real production data and behavior, while still providing an isolated place to inspect outcomes and deploy fixes quickly.

We continuously monitored row count and checksum mismatches between the production jobs and the shadow jobs. When mismatches occurred, we quickly investigated the root cause and deployed fixes to the pre-production environment, then verified that the mismatch was resolved.

During this step, we also measured the compute and storage quotas for the shadow jobs to ensure that the production environment had sufficient resources before proceeding. 

If the shadow job met the above criteria, it was promoted to the production environment, where we made sure it could still run reliably before moving to the next step.

Phase 2: The Reverse Shadow Phase

Once the production job and the shadow job were running reliably in the production environment, we began the reverse shadow phase. In this phase, the shadow job's data was written to the production table, effectively making the shadow job the new production job. Meanwhile, the production job's data was written to the shadow table, so the original production job then acted as the shadow job.

This approach provided two key benefits. First, we could still get ongoing data-quality signals after rollout by continuing to compare outputs from the two systems. Second, we could roll back fast if discrepancies were detected, without needing to recreate or reconfigure the old system job.

Phase 3: Migration Cleanup

We continued to monitor and compare the data delivered by both jobs. If no discrepancies were detected, the shadow job, now running on the old system, was removed. The new system then took over and continued delivering data through the production job, marking the completion of the migration.

Custom Data Quality Analysis Tooling

We also built a comprehensive set of debugging tools to help team members efficiently identify and resolve issues that might arise during the migration.

We developed a data quality analysis tool to ensure that edge cases across jobs are effectively captured and addressed. For each landed shadow table partition, the system would read the corresponding production table partition and compare both the row count and checksum. Any mismatches were logged to Scuba, Meta's data management system for real-time analysis. Every hour, the data quality analysis tool read the logs from Scuba, ran queries to identify example rows causing mismatches, and logged detailed debugging information back to Scuba. This process enabled team members to quickly determine the root cause of issues and assess whether they were already known and being addressed.
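
Meta doesn't publish the tool's internals, but the core comparison is straightforward to picture. A rough sketch, assuming rows have already been read out of the production and shadow partitions (the row layout and delimiter are stand-ins):

import { createHash } from "node:crypto";

interface Fingerprint {
  rowCount: number;
  checksum: string;
}

// Hash each row and XOR-fold the digests so the fingerprint does not depend
// on the order in which the two systems landed their rows.
function fingerprintPartition(rows: string[][]): Fingerprint {
  const acc = Buffer.alloc(32);
  for (const row of rows) {
    const digest = createHash("sha256").update(row.join("\u0001")).digest();
    for (let i = 0; i < acc.length; i++) acc[i] ^= digest[i];
  }
  return { rowCount: rows.length, checksum: acc.toString("hex") };
}

// A partition matches only if both the row count and the checksum agree.
function partitionsMatch(prodRows: string[][], shadowRows: string[][]): boolean {
  const prod = fingerprintPartition(prodRows);
  const shadow = fingerprintPartition(shadowRows);
  return prod.rowCount === shadow.rowCount && prod.checksum === shadow.checksum;
}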

This same data quality analysis tool is still in use after the migration as part of the release validation process.

Handling Rollout and Rollback

Both our legacy and new data ingestion systems used change data capture (CDC) to incrementally ingest data into the target table. Each data ingestion job has its own internal table for a full dump of source databases (full dump), an internal table for capturing changes of source databases (delta), and the target table consumed by the data customers. All the information about job entities, including table names and table schemas, is saved and managed by the central management service.
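
As a rough illustration of that table layout (not Meta's actual code), a CDC refresh amounts to applying the delta's upserts and deletes on top of the previous snapshot to produce the next target partition:

type Row = Record<string, unknown>;
type Delta = { op: "upsert" | "delete"; key: string; row?: Row };

// Apply a delta partition on top of the previous full-dump/target snapshot,
// keyed by primary key, to produce the next target partition.
function applyDelta(snapshot: Map<string, Row>, deltas: Delta[]): Map<string, Row> {
  const next = new Map(snapshot);
  for (const d of deltas) {
    if (d.op === "delete") next.delete(d.key);
    else if (d.row) next.set(d.key, d.row);
  }
  return next;
}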

Because this is a CDC process, data generated by the system is reused to generate new data. If previously landed data has any issues, the problematic data is carried forward into newly landed data. If issues were to happen after the migration, we'd need to perform a rollback to fix the landed data and stop the bleeding.

To reduce the risk, we focused on two solutions:

  1. Getting early signals before problematic data reaches data consumers.
  2. Stopping the bleeding quickly during a rollback.

Early Signals After Rollout

Rather than waiting for data consumers to discover problematic data, we received early signals indicating whether the migration was successful. As previously mentioned, after the rollout the migration entered the reverse shadow phase: the shadow job's data was written to the production table, effectively making the shadow job the new production job, while the production job's data was written to the shadow table, so the original production job then acted as the shadow job.

To get these early signals, we triggered backfills on both the production and shadow jobs. If the backfill results still matched, it indicated the migration was successful. If they did not match, the job would be rolled back immediately and data consumers would not be impacted.

Stopping the Bad Data Propagation Quickly During Rollback

As previously mentioned, one characteristic of the CDC process is that problematic data can propagate to newly generated data. Quickly stopping the spread of bad data not only makes the migration more robust but also improves reliability after the migration is complete.

During the reverse shadow phase, if any data quality issues were detected in a specific partition, that partition would be marked in its metadata as having bad data quality. If this partition was a delta partition, then new data would stop landing, and an alert would be sent to a team member. If this partition was a target partition, the system would instead select an older partition and merge it with more deltas.

In this way we could stop bad data propagation quickly. For rollback, we could quickly query the metadata to find all partitions that were marked with bad data quality and fix them with backfill.

How We Executed the Large-Scale Migration

After successfully migrating a small batch of jobs, we were confident we could perform the full migration. The challenges in doing so roughly fell into two buckets:

  1. How to monitor and migrate large numbers of jobs automatically. (This challenge is intensified by the sheer volume of jobs to migrate).
  2. How to do effective shadow testing with limited capacity.

Monitoring With Automated Tooling

With tens of thousands of ingestion jobs to migrate we developed tooling that automated the entire process and minimized friction.

Running shadow tests and addressing edge cases across such a large job set requires robust automation and thorough validation to ensure reliability and correctness. 

Since we established a clear migration job lifecycle and job promotion criteria, the system continuously sent job status signals to Scuba, including data related to the lifecycle promotion criteria and the job's current stage in the migration lifecycle. We built external migration tools that continuously monitored signals from each job and automatically promoted or demoted jobs between stages of the migration lifecycle, depending on whether they met (or no longer met) the migration criteria. We also built system-level and job-level dashboards so engineers could quickly track the overall migration progress as well as monitor and debug individual jobs.

Planning With Limited Capacity

Because migration capacity was limited, we could not run all shadow jobs at once. Instead, we migrated the jobs in batches. Migration efficiency depends heavily on how jobs are selected for each migration batch.

We categorized jobs according to various features such as throughput, priority, and special cases. Engineering teams worked to ensure that the environment was properly prepared before creating a batch. For instance, they established selection criteria to exclude jobs with known issues that were still being resolved, thereby reducing noise caused by duplicate issues. Jobs were also prioritized based on business need and teams who were reliant on the service were notified ahead of migration.

We avoided creating new shadow jobs with known issues until those issues were resolved. When an issue was detected we removed any potentially affected jobs from the migration list and held them until a fix was in place.

As noted above, due to the system's CDC design, a new job's first snapshot was landed via a full dump, which is typically slow and expensive. If we detected data quality issues in a landed snapshot, we also triggered another full dump to land a corrected snapshot after the underlying bugs were fixed. Creating shadow jobs while known issues were still present would therefore trigger a lot of unnecessary full dumps, both at job creation time and again during data-quality remediation. By avoiding creating those jobs, we avoided large amounts of extra full dump work and improved migration efficiency. We also built creative solutions, like reusing snapshot partitions delivered by the old system as the initial snapshot, to reduce the full dump load.

DevOps · infrastructure, quic, rust

When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug

Cloudflare spent weeks finding a 3-line QUIC bug: a ported TCP idle optimization measured from sends, not ACKs, trapping bandwidth at minimum after loss.

Summary

What: Esteban Carisimo and Antonio Vicente from Cloudflare debugged a CUBIC bug in quiche (their open-source QUIC implementation in Rust) where the congestion window got stuck at 2 packets after packet loss, causing 60% of integration tests to time out. The bug stemmed from a 2017 Linux kernel optimization by Eric Dumazet, Yuchung Cheng, and Neal Cardwell that adjusted epoch timing for idle TCP connections. When ported to quiche in 2020, it measured idle time as now - last_sent_time instead of now - last_ack_time, creating a death spiral: at minimum cwnd, every ACK-send cycle looked like a 14ms idle period (one RTT), pushing the recovery boundary into the future and preventing cwnd growth. The fix changes 3 lines to measure from max(last_ack_time, last_sent_time).
Why it matters: TCP kernel optimizations assume timing signals that don't translate directly to userspace QUIC implementations, and minimum-cwnd corner cases after severe loss are undertested in congestion control despite being the regime CCAs exist to handle.
Takeaway: Add congestion control tests that drive cwnd to minimum with heavy loss then stop loss entirely to verify recovery—Cloudflare's bug was invisible to normal throughput tests.

Deep Dive

  • Cloudflare discovered a bug in quiche's CUBIC implementation where congestion window (cwnd) got stuck at 2 packets (2700 bytes) after severe packet loss, never recovering
  • Test scenario: 10MB HTTP/3 download over localhost with 10ms RTT, 30% packet loss for the first 2 seconds, then zero loss—60% of runs failed to complete within the 10-second timeout
  • Expected behavior: CUBIC should reduce cwnd during loss, then ramp up once loss stops and complete download in 4-5 seconds
  • Actual behavior: After loss stopped at T=2s, cwnd stayed pinned at minimum and CUBIC oscillated between recovery and congestion avoidance states 999 times over 6.7 seconds (one transition every ~14ms, matching the RTT)
  • Root cause traced to a 2017 Linux kernel CUBIC fix for idle connections by Eric Dumazet, Yuchung Cheng, and Neal Cardwell—when app goes idle then resumes, delta_t becomes huge and CUBIC inflates cwnd unreasonably
  • Kernel fix: shift epoch forward by idle duration (measured from last send time) to preserve CUBIC growth curve shape
  • When ported to quiche in 2020, the idle detection check went into on_packet_sent() (QUIC lacks TCP's kernel CA_EVENT_TX_START callback), using bytes_in_flight == 0 as idle signal
  • The bug: measured idle time as now - last_sent_time, but at minimum cwnd (2 packets), this captures the full RTT (~14ms) instead of actual idle gap (effectively zero)
  • Death spiral mechanics: (1) send 2-packet window, (2) after 1 RTT both ACKed so bytes_in_flight = 0, (3) next send sees bif==0 and calculates ~14ms "idle" time, (4) shifts congestion_recovery_start_time forward by 14ms, often into the future, (5) ACKs processed while in_congestion_recovery()==true skip cwnd growth, (6) cwnd stays at 2 packets, repeat
  • A followup kernel fix one week later addressed the future epoch issue in TCP CUBIC by noting that tracking idle time from send events is imprecise since epoch_start is set during ACK processing
  • Fix for quiche: track last_ack_time and measure idle from max(last_ack_time, last_sent_time) instead of just last_sent_time—captures true idle gap for minimum-cwnd case while preserving epoch-shift behavior for genuinely idle connections
  • Code change was just 3 lines despite weeks of investigation using qlog instrumentation and visualization
  • Bug only triggered when three conditions met simultaneously: (1) connection exited slow-start, (2) running in congestion avoidance, (3) cwnd collapsed to 2-packet floor
  • Validation: 100% test pass rate restored, cwnd grows along expected CUBIC curve, downloads complete in 4-5 seconds
  • Key lessons: "idle" is contextual and harder to define than simple checks suggest; minimum-cwnd dynamics are a unique corner case invisible at high speeds; tiny timing assumptions from kernel TCP don't always transfer to userspace QUIC

Decoder

  • CUBIC: Congestion control algorithm that grows the congestion window along a cubic curve instead of linearly; standardized in RFC 9438, it is the default in Linux TCP and in quiche's QUIC stack
  • quiche: Cloudflare's open-source Rust implementation of QUIC and HTTP/3 protocols
  • epoch_start: Reference timestamp CUBIC uses to anchor its cubic growth curve; reset after loss events to restart bandwidth probing
  • qlog: Structured logging format for QUIC connection diagnostics and performance analysis
  • App-limited exclusion: RFC 9438 rule that CUBIC should not increase cwnd when the application isn't sending enough data to fill available window (connection is app-limited, not network-limited)
  • BBRv3: Bottleneck Bandwidth and RTT version 3, a model-based congestion control algorithm that estimates available bandwidth directly rather than inferring it from packet loss

Original Article

When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug

CUBIC, standardized in RFC 9438, is the default congestion controller in Linux, and as a result governs how most TCP and QUIC connections on the public Internet probe for available bandwidth, back off when they detect loss, and recover afterward. At Cloudflare, our open-source implementation of QUIC, quiche, uses CUBIC as its default congestion controller, meaning this code is in the critical path for a significant share of the traffic we serve.

In this post, we'll tell the story of a bug in which CUBIC's congestion window (cwnd) gets permanently pinned at its minimum and never recovers from a congestion collapse event.

The story starts with a Linux kernel change aimed at bringing CUBIC into line with the app-limited exclusion described in RFC 9438 §4.2-12 — a fix to a real problem in TCP that, when ported to our QUIC implementation, surfaced unexpected behaviors in quiche. It has a happy ending: an elegant (near-)one-line fix that broke the cycle.

CUBIC's logic in a nutshell

Before we dive into the core problem, a quick refresher on Congestion Control Algorithms (CCAs) may help to set the stage.

The central knob a CCA turns is the congestion window (cwnd): the sender-side cap on how many bytes can be in flight (sent but not yet acknowledged) at any moment. A larger cwnd lets the sender push more data per round trip; a smaller cwnd throttles it. Every loss-based CCA, CUBIC included, is ultimately a policy for how to grow cwnd when the network looks healthy and how to shrink it when it doesn't.

In essence, CCAs aim to maximize data transfer by inferring the "available bandwidth" of the network; because no one wants to pay for a 1 Gbps subscription and only use a fraction of it. The family of loss-based algorithms, to which CUBIC belongs, operate on a fundamental premise: (1) if there is no packet loss, increase the sending rate (i.e. increase the bandwidth utilization); (2) if there is loss, loss-based algorithms assume that the network's capacity has been exceeded, and the sender must back off (i.e. decrease the bandwidth utilization).

This logic is built on several assumptions that have been revisited over the years. However, we'll save that discussion for another time.

The symptom: a test that fails 61% of the time

Our investigation started with the report of unexpected failures in our ingress proxy integration test pipeline. This erratic behavior appeared in tests where CUBIC was evaluated in a scenario of heavy loss in the early part of the connection.

Recovery after congestion collapse is an uncommon regime, but it is exactly the regime a congestion controller exists to handle. Most congestion control tests exercise the steady-state and growth phases of an algorithm; far fewer probe what happens at minimum cwnd, after the connection has been beaten down. Bugs in this corner of the state space are invisible in throughput dashboards, undetectable by static review, and only surface when you deliberately drive a CCA into it and watch whether it can climb back out — which is exactly what this test did.

The simulated test setup includes the following details:

  • Quiche HTTP/3 client and server running locally (localhost)
  • RTT = 10ms (set up in the configuration)
  • A 10 MB file download over HTTP/3
  • Using CUBIC congestion control
  • With 30% random packet loss injected during the first two seconds
  • After two seconds, loss stops entirely
  • The test has a generous 10-second timeout to complete the download, which is expected to be completed in four or five seconds

The expected behavior is straightforward: CUBIC should take some hits during the loss phase, reduce its congestion window, and once loss stops, steadily ramp up and finish the download well within the timeout. Instead, across multiple batches of 100 runs, we observed that around 60% of our tests were not able to complete the download within the generous 10-second timeout.

The anomaly: 999 state transitions with zero loss

We instrumented quiche's qlog output with packet loss events and built visualizations to understand what was happening inside the congestion controller.

After the two-second (2000 ms) mark, packet loss stops entirely. However, the number of bytes in flight remains flat, which contradicts the core logic of the CUBIC algorithm: in the absence of loss, apply more gas to increase throttle (more bytes in our world). This raises the question: if the network is no longer dropping packets, why is the congestion window failing to grow?

When we zoom into that region, our analysis shows that CUBIC enters a rapid oscillation, shown in our plot as an extended recovery phase, between congestion avoidance state (the operational regime phase) and recovery state (the packet loss recovery state) — 999 transitions in approximately 6.7 seconds. That's one transition every ~14ms — suspiciously close to the connection's RTT (10ms). Throughout this entire period, cwnd is locked at the minimum floor: 2700 bytes, or two full-size packets.

Clearly something in CUBIC's logic is misinterpreting the state of the connection. The key clue is the oscillation period: ~14ms matches the RTT. Whatever is triggering the recovery/avoidance flip is happening once per round trip, in lockstep with connection's ACK clock; the self-clocking rhythm in which each round-trip's ACKs from the client trigger the server's next send. Because this is a download (server to client), the ACKs in question travel client to server, and CUBIC's state machine runs on the server side: every time those ACKs land, bytes_in_flight drops to zero and the server sends the next two-packet burst, which is what triggers the bug.

To confirm this behavior was CUBIC-specific, we ran the same test with Reno, another member of the loss-based family but with a different growth rate. The results were conclusive: 100% pass rate, showing Reno recovered cleanly after the loss phase, and revealing that this is a CUBIC-related bug.

Tracing the root cause

Loss-based algorithms have two pedals, gas and brake, and differ mainly in how they accelerate. CUBIC, however, comes with some extra features. Here we are going to focus on one of them: the handling of bytes_in_flight == 0.

TCP CUBIC after idle (Linux, 2017)

To understand the bug, we first need to understand the optimization it came from. In 2017, an issue was found with Linux kernel's CUBIC implementation. The commit message explains:

The epoch is only updated/reset initially and when experiencing losses. The delta "t" of now - epoch_start can be arbitrary large after app idle as well as the bic_target. Consequentially the slope (inverse of ca->cnt) would be really large, and eventually ca->cnt would be lower-bounded in the end to 2 to have delayed-ACK slow-start behavior.

This particularly shows up when slow_start_after_idle is disabled as a dangerous cwnd inflation (1.5 x RTT) after few seconds of idle time.

The epoch is the reference timestamp CUBIC uses to anchor its growth curve: W_cubic(delta_t) is parameterized by delta_t = now - epoch_start, and the epoch is reset whenever CUBIC restarts its growth function — most notably after a loss event reduces cwnd. Between resets, delta_t grows monotonically with wall-clock time.

When an application goes idle (stops sending) for a while and then resumes, the CUBIC growth function W_cubic(delta_t) computes delta_t as now - epoch_start. Since the epoch wasn't updated during idle, delta_t is huge, producing an enormous target window — and CUBIC would immediately try to inflate cwnd to an unreasonable value.
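
To see why a huge delta_t is so damaging, here is a rough sketch of the standard window function from RFC 9438 (RFC default constants, with the window in packets and time in seconds; this is not quiche's exact code):

const C = 0.4;          // RFC 9438 scaling constant
const BETA_CUBIC = 0.7; // multiplicative decrease factor

// W_cubic(t) = C * (t - K)^3 + W_max, where K is the time needed to grow back to W_max.
function cubicWindow(deltaT: number, wMax: number): number {
  const k = Math.cbrt((wMax * (1 - BETA_CUBIC)) / C);
  return C * Math.pow(deltaT - k, 3) + wMax;
}

// With W_max = 100 packets, K is roughly 4.2s. Folding a long idle gap into
// delta_t pushes the target far above W_max, which is the cwnd inflation the
// 2017 kernel fix was guarding against.
console.log(cubicWindow(5, 100));  // ~100 packets: normal growth just past K
console.log(cubicWindow(15, 100)); // ~600 packets: runaway target after a long "idle" gap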

Jana Iyengar's initial fix was to reset `epoch_start` when the application resumes sending. But Neal Cardwell pointed out the flaw in that approach:

…it would ask the CUBIC algorithm to recalculate the curve so that we again start growing steeply upward from where cwnd is now (as CUBIC does just after a loss). Ideally we'd want the cwnd growth curve to be the same shape, just shifted later in time by the amount of the idle period.

The elegant solution, authored by Eric Dumazet, Yuchung Cheng, and Neal Cardwell, was to shift the epoch forward by the idle duration rather than resetting it. This preserves the shape of the CUBIC growth curve — just sliding it in time so that the algorithm picks up where it left off.

The port to quiche (2020)

When CUBIC was first implemented in quiche, this idle-period adjustment was ported. However, QUIC, which runs in the user space, doesn't have TCP's kernel-level CA_EVENT_TX_START callback. Instead, the quiche implementation checks for the idle condition inside on_packet_sent():

// cubic.rs — on_packet_sent() (simplified)
/// Updates the state when a packet is sent.
fn on_packet_sent(&mut self, bytes_in_flight: usize, now: Instant, ...) {
    // If the sending burst is restarting (i.e., bytes_in_flight was zero before this send),
    // adjust the congestion recovery start time to account for the gap in sending.
    if bytes_in_flight == 0 {
        let delta = now - self.last_sent_time;
        self.congestion_recovery_start_time += delta;
    }
    // Record the time of this send event.
    self.last_sent_time = now;
}

Where it breaks: the QUIC difference

The fix ported to quiche carried over a bug in the original kernel change, one that was fixed by a followup change to the kernel CUBIC module about a week later. The commit message for the second fix explains:

tcp_cubic: do not set epoch_start in the future Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start is normally set at ACK processing time, not at send time.

Doing a proper fix would need to add an additional state variable, and does not seem worth the trouble, given CUBIC bug has been there forever before Jana noticed it.

Let's simply not set epoch_start in the future, otherwise bictcp_update() could overflow and CUBIC would again grow cwnd too fast.

As mentioned in the commit message, recovery start time is set during ACK processing, and the computation of the adjustment based on sent times can push the recovery start time into the future. This explains the oscillation between recovery and congestion avoidance seen on our test. The trap only consistently triggers when every incoming ACK drives bytes_in_flight all the way to zero — which in practice means cwnd has collapsed to its minimum (two packets) and the application has data ready to send another full window the moment an ACK arrives. Outside this regime, bytes_in_flight == 0 is less likely to hold on every send, so it is less likely to trigger the bug.

Why doesn't this also happen at connection start? The bug only triggers when the connection exits slow-start and switches over to congestion avoidance. Before exiting slow-start, congestion_recovery_start_time is not set, so the buggy branch in on_packet_sent has no recovery boundary to advance. During slow start CUBIC's cwnd grows by the same Reno-style ack-based rule shared by all loss-based CCAs — the cubic curve and its sensitivity to congestion_recovery_start_time only enter the picture once the connection is in congestion avoidance, meaning the trap needs three things at once: a real loss event to set the recovery boundary, congestion avoidance to be running, and cwnd collapsed to the two-packet floor.

At a minimum cwnd (two packets), the dynamics of the connection shift into a "death spiral" where the idle period optimization becomes a self-fulfilling prophecy. This trap operates in a continuous loop:

  1. Send and ACK packets: The sender transmits the entire two-packet window. After one RTT (~14ms), both packets are ACKed, causing bytes_in_flight to drop to zero.
  2. False idle detection: When the next burst is sent, on_packet_sent() sees bytes_in_flight == 0 and assumes the connection was idle, but it was congestion limited.
  3. Inflated delta: The calculation uses now - last_sent_time to determine the idle duration. When the congestion window (cwnd) is at its minimum, last_sent_time is the timestamp of the start of the previous RTT cycle. Therefore, the resulting delta is approximately 14ms (the connection's RTT + additional rounding errors). This RTT-sized delta is incorrectly applied as the "idle" time. The actual time the connection was idle (the processing gap between the last ACK arriving and the next packet being sent) is effectively 0. By measuring the full RTT instead of the true gap, the delta is inflated significantly, aggressively shifting the recovery start time forward, possibly into the future.
  4. Perceived recovery: Because the recovery start time is now in the future, the in_congestion_recovery() check returns true for every incoming ACK. Processing of the next ACK exits recovery and sets the recovery start to the ACK time which is larger than last_sent_time, making it likely for the congestion controller to push the recovery time into the future when doing the next send.
  5. Stagnation: Since CUBIC skips cwnd growth for any packet perceived to be in a recovery period, the window remains pinned at two packets — ensuring the pipe drains completely on the next ACK and restarting the cycle.

And this loop repeats for thousands of cycles until the accumulation of small deviations — from scheduler jitter and ACK processing variance — lets the <= boundary in in_congestion_recovery() slip behind the next packet's send time, breaking the cycle.

The fix: measuring idle from the right moment

Fixing the death spiral involves measuring the idle duration from when bytes_in_flight actually transitioned to zero (the last ACK processed) rather than the last packet sent.

The code change

  1. Add last_ack_time timestamp to the CUBIC state.
  2. Update that timestamp when ACKs arrive.
  3. Use it for the idle delta computation:
// cubic.rs — on_packet_sent() (simplified)
fn on_packet_sent(&mut self, bytes_in_flight: usize, now: Instant, ...) {
    // Check if the connection was idle before this packet was sent.
    if bytes_in_flight == 0 {
        if let Some(recovery_start_time) = self.congestion_recovery_start_time {
            // Measure idle from the most recent activity: either the
            // last ACK (approximating when bif hit 0) or the last data
            // send, whichever is later. Using last_sent_time alone
            // would inflate the delta by a full RTT when cwnd is small
            // and bif transiently hits 0 between ACK and send.
            let idle_start = cmp::max(self.last_ack_time, Some(self.last_sent_time));

            if let Some(idle_start) = idle_start {
                if idle_start < now {
                    let delta = now - idle_start;
                    self.congestion_recovery_start_time =
                        Some(recovery_start_time + delta);
                }
            }
        }
    }

    // Record the time of this send event.
    self.last_sent_time = now;
}

With the delta now reflecting the actual gap since the last ACK, the recovery boundary stops chasing the send time.

For genuinely idle connections, last_ack_time is far in the past, so the same expression captures the full idle duration and the original epoch-shift behavior is preserved.

Validation

With the fix applied, the 100% pass rate of our quiche testing suite was restored.

We don't worry about the losses at the end of the connection — that's expected because we fully utilized the router's allocated buffer. In other words, we are fully utilizing the available bandwidth in this test case.

Takeaways

  • "Idle" is harder to define than it sounds. Normal pipeline delays at small windows can look like idleness to simple checks.
  • Minimum-cwnd dynamics are a unique corner case. The bug was invisible at high speeds and only triggered after severe loss.
  • The fix was surprisingly small compared to the complexity of the behavior. After weeks of instrumenting qlogs and analyzing visualizations to find the root cause, the solution required changing just three lines of code. As we noted during the investigation: the effort to find the bug was massive, but the fix itself was basically one line of logic.

The fix described in this post has been contributed to cloudflare/quiche, Cloudflare's open-source implementation of QUIC and HTTP/3. Our CCA efforts go beyond loss-based algorithms: we also use quiche's modular congestion control design to experiment with and tune our model-based BBRv3 implementation, now enabled for a growing percentage of our QUIC deployments. Stay tuned for further updates on QUIC congestion control implementation and performance.

If you're interested in congestion control, transport protocols, or contributing to open-source networking code, check out the quiche repository. We're always looking for talented engineers who love digging into problems like these, please explore our open positions.

DevOps · infrastructure, cloud, kubernetes

With faster node startup for GKE, say goodbye to cold-start latency

Google cut GKE Autopilot node startup by 4x for GPU workloads through VM provisioning architecture changes, eliminating cold-start over-provisioning.

Summary

What: Google rolled out up to 4x faster node startup for GKE Autopilot through architectural changes to VM provisioning. Available now for NVIDIA L4 (G2), A100 (A2), RTXPRO6000 (G4), H100 (A3) nodes and general-purpose Autopilot compute. Coming soon: H200 (A3 ultra), B200 (A4), Cloud TPUs. Automatic with no config changes.
Why it matters: Teams over-provision expensive GPU nodes to avoid startup lag during demand spikes. Faster provisioning lets autoscalers react in real-time, reducing idle buffer capacity costs—critical for AI inference where GPU scarcity and cost are major constraints.
Takeaway: If using GKE Standard clusters, enable Autopilot ComputeClass for GPU workloads to get 4x faster startup without migrating your entire cluster.

Deep Dive

  • Google rebuilt VM and node provisioning using compute buffers, fast-starting VMs, and a control plane that resizes VMs instantly without rebooting
  • Startup improvement is up to 4x faster compared to previous GKE versions for qualifying nodes
  • Targets cold start latency where teams over-provision to avoid waiting for new nodes during demand spikes
  • Particularly impacts AI inference and batch processing workloads with fluctuating demand
  • Available immediately for GKE Autopilot workloads including those running inside Standard clusters
  • Supported hardware: NVIDIA L4 (G2), A100 (A2), RTXPRO6000 (G4), H100 (A3), and general-purpose Autopilot compute
  • Coming soon: NVIDIA H200 (A3 ultra), B200 (A4), Cloud TPUs
  • No configuration changes required—automatic improvement for supported instance types
  • GKE Standard users can use Autopilot for specific workloads via ComputeClass without full cluster migration

Decoder

  • GKE Autopilot: Google's managed Kubernetes mode where Google handles node provisioning, scaling, and configuration automatically, versus GKE Standard where users manage nodes directly
  • ComputeClass: GKE resource that defines the compute configuration for Pods, allowing Standard cluster workloads to use Autopilot provisioning for specific Pods

Original Article

With faster node startup for GKE, say goodbye to cold-start latency

We've rolled out a significant update to Google Kubernetes Engine (GKE) that solves one of the most annoying problems in cloud infrastructure: cold start latency. GKE now has up to 4x faster node startup times compared to previous versions for qualifying nodes, allowing customers to provision quickly and efficiently. This isn't a setting you have to toggle or a config file you need to patch. It's an architectural upgrade to how we provision infrastructure, meaning your nodes just start faster, out of the box. This translates directly into enhanced agility and cost-efficiency for your cloud operations with a significant impact on a wide range of use cases, from rapid deployment of models for AI inference to dynamic scaling of accelerated and general-purpose nodes.

The problem we set out to tackle: the "cold start" tax

If you run workloads with fluctuating demand, especially AI inference or batch processing, you know the pain of waiting for a new node to spin up. When demand spikes, your autoscaler requests a node. Then you wait. To avoid that wait, and the resulting latency for your users, many teams resort to over-provisioning, keeping expensive nodes running "just in case." You end up paying for idle compute just to buy yourself insurance against startup lag. That insurance is especially expensive when it comes to scarce accelerators.

The solution: a complete rework of node provisioning

To address this, we rebuilt the provisioning logic for VMs and GKE nodes. At a high level, we are using a combination of intelligent compute buffers, specially designed fast-starting virtual machines, and a new control plane architecture that allows VMs to resize instantly without rebooting. While the technical details are complex, the benefit to you is simple: your GKE clusters now scale inherently faster and are more efficient, allowing you to shift precious resources to where they are needed.

What this means for you

  • Less over-provisioning: Because nodes come online faster, you can trust your autoscaler to react in real-time rather than keeping a buffer of idle nodes.
  • Better AI inference: For models running on GPUs, faster node provisioning reduces the time between a request spike and the model serving traffic.
  • No "Ops" overhead: This works automatically. You don't need to change your Terraform or YAML files to take advantage of it.

Availability

The accelerated provisioning is live right now for workloads running in GKE Autopilot — including Autopilot workloads running inside Standard clusters — using the following hardware:

  • NVIDIA L4 (G2)
  • NVIDIA A100 (A2)
  • NVIDIA RTXPRO6000 (G4)
  • NVIDIA H100 (A3)
  • General-purpose Autopilot compute

Coming soon, we will continue to roll this out to more machines, including the following, so stay tuned:

  • NVIDIA H200 (A3 ultra)
  • NVIDIA B200 (A4)
  • Cloud TPUs

How to try it

If you already use GKE Autopilot on the supported instance types, you've probably already noticed the improvement.

And if you're running a GKE Standard cluster, you can now use Autopilot specifically for these workloads without migrating your whole cluster. Just point your Pods to the Autopilot ComputeClass, and they will inherit these startup speeds while living alongside your standard nodes.

You can read the full technical documentation on fast-starting nodes here.

What's next

Learn how you can leverage these new improvements to boost your workload responsiveness with these resources.

DevOps · database, backend, duckdb, http

Quack: The DuckDB Client-Server Protocol

DuckDB launched Quack, an HTTP-based client-server protocol that beats PostgreSQL at small writes and transfers 60 million rows in under 5 seconds.

Summary

What: DuckDB released Quack on May 12, 2026, an HTTP-based client-server protocol available in v1.5.2 as a core_nightly extension. It supports multiple concurrent writers using token authentication on port 9494. Benchmarks on AWS m8g.2xlarge instances show 60M row transfers in 4.94 seconds (vs Arrow Flight SQL's 17.40s and PostgreSQL's 158.37s) and 5,434 tx/s for small writes with 8 threads (vs PostgreSQL's 4,320 tx/s).
Why it matters: This signals that even purpose-built in-process analytical databases need multiplayer capabilities for centralized state management. DuckDB chose custom serialization over Arrow Flight SQL to maintain control over protocol evolution and achieve single round-trip execution, prioritizing iteration speed over interchange format adoption.

Deep Dive

  • DuckDB announced Quack, an HTTP-based client-server protocol enabling multiple DuckDB instances to communicate with concurrent write support, available in v1.5.2 as a core_nightly extension with production release planned for DuckDB v2.0 in fall 2026
  • Marks a significant architectural shift from DuckDB's in-process-only model (similar to SQLite) to support centralized state use cases like multi-process observability collection and live dashboards
  • Protocol built directly on HTTP using application/duckdb MIME type, leverages DuckDB's internal serialization primitives battle-tested in Write-Ahead Log implementation
  • Authentication uses randomly-generated tokens by default, authorization callbacks customizable via SQL macros or user functions, server binds to localhost on port 9494 by default, recommends nginx with Let's Encrypt for public exposure
  • Optimized for single round-trip query execution and efficient bulk transfers: 60M rows transferred in 4.94s vs Arrow Flight SQL (17.40s) and PostgreSQL (158.37s) on AWS m8g.2xlarge instances with 0.280ms ping latency
  • Small write throughput reaches 5,434 tx/s with 8 parallel threads vs PostgreSQL's 4,320 tx/s, though PostgreSQL scales better beyond 8 threads due to current DuckDB concurrent insertion limits per table
  • Team rejected Arrow Flight SQL to maintain control over data types and protocol evolution without external standards constraints, and to enable single round-trip execution (Arrow Flight SQL requires minimum two round trips per query)
  • DuckDB acts as both client and server, enabling browser-based DuckDB-Wasm to connect directly to remote DuckDB instances via HTTP
  • Future roadmap includes DuckLake integration for remote catalog server, auto-installation of extension, improved syntax via new parser, transaction throughput scaling beyond 8 threads, extension-defined protocol messages, and replication protocol for read replicas
  • DuckCon #7 on June 24, 2026 will feature State of the Duck talk on Quack adoption; acknowledgements to Boaz Leskes (MotherDuck) and Philip Moore (GizmoSQL) for sharing client-server DuckDB lessons

Decoder

  • In-process database: Database architecture where the engine runs as a library inside the application's process (like SQLite or original DuckDB) rather than as a separate server, eliminating network overhead but limiting multi-process concurrent writes to the same database file

Original Article

Full article content is not available for inline reading.

Read the original article →

DevOps · ai, cloudflare, agents

Cloudflare Launches “Artifacts” Beta, Introducing Git-Like Versioning for AI Agents

Cloudflare launched Artifacts beta on May 8, bringing Git-style version control to AI agent outputs for rollback, audit trails, and collaborative governance.

Summary

What: Cloudflare announced Artifacts beta on May 8, 2026, a system that applies Git-style version control to AI agent outputs (code, configs, reasoning steps). It enables teams to track changes, compare versions, roll back outputs, and enforce governance policies for autonomous workflows.
Why it matters: As AI agents move from isolated tools to stateful production components, the lack of reproducibility and audit trails becomes a critical gap. Cloudflare is betting that treating AI outputs as first-class versioned assets will become as essential as version control for code.

Deep Dive

  • Artifacts brings Git-style version control to AI agent outputs, creating persistent records of generated code, configurations, and reasoning steps
  • Solves the reproducibility problem inherent in non-deterministic AI outputs that traditionally lack clear lineage or audit trails
  • Enables rollback, change tracking, and version comparison for agent-generated assets, similar to Git for source code
  • Targets multi-step autonomous workflows where agents iteratively refine outputs, providing visibility into both results and the process that produced them
  • Supports collaborative scenarios where multiple agents and humans interact with shared outputs, with policy enforcement and review workflows
  • Addresses enterprise compliance and governance requirements by making AI outputs traceable and reversible
  • Competes with OpenAI and Anthropic's conversation state management, LangChain and LlamaIndex's workflow persistence, and Weights & Biases' experiment tracking
  • Differentiates by treating AI outputs as version-controlled assets rather than logs, experiments, or conversation histories
  • Signals industry shift toward treating AI agents as stateful production components requiring the same engineering rigor as traditional software

Original Article

Cloudflare Launches "Artifacts" Beta, Introducing Git-Like Versioning for AI Agents

Cloudflare has announced the beta release of Artifacts, a new system designed to bring Git-style version control to AI agents, enabling developers to track, manage, and evolve agent-generated outputs with the same rigor as traditional code. The launch addresses a growing challenge in AI development: how to reliably manage the outputs, state, and behavior of increasingly autonomous agents operating in production environments.

Artifacts introduces a structured way to store and version agent outputs, such as generated code, configurations, or intermediate reasoning steps, allowing teams to trace changes, compare versions, and roll back when needed. Much like Git transformed software development, Cloudflare aims to provide similar guarantees for AI-driven workflows, where outputs are often non-deterministic and difficult to reproduce.

As AI agents become more capable, they are increasingly tasked with generating and modifying assets over time. However, unlike traditional software systems, these outputs are often ephemeral, lacking clear lineage or auditability. Artifacts addresses this by creating a persistent, versioned record of agent activity, enabling developers to understand how outputs evolve and ensuring that changes can be reviewed and governed.

The system is particularly relevant for teams building multi-step or autonomous workflows, where agents may iteratively refine outputs or interact with external systems. By capturing each step as a versioned artifact, developers gain visibility into both the final result and the process that produced it, an essential requirement for debugging, compliance, and trust.

Cloudflare positions Artifacts as a foundation for collaborative AI development, where multiple agents and humans can interact with shared outputs. Teams can review changes, enforce policies, and integrate artifact management into existing workflows, bringing AI development closer to established software engineering practices.

This also introduces a layer of governance and accountability, addressing concerns around the unpredictability of AI systems. By making outputs traceable and reversible, Artifacts helps organizations manage risk while still benefiting from the speed and flexibility of agent-driven automation.

The release reflects a broader shift in the industry as AI systems move from isolated tools to stateful, evolving components of production systems. Traditional tooling has struggled to keep up with this shift, particularly when it comes to tracking and managing non-deterministic outputs.

By applying version control principles to AI artifacts, Cloudflare is tackling a key gap in the AI development lifecycle: the lack of reproducibility and control. This is especially critical in enterprise environments, where auditability and compliance are essential.

Artifacts signals an emerging paradigm where AI outputs are treated as first-class assets, requiring the same level of management as source code. As organizations adopt more advanced AI workflows, the need for tooling that supports versioning, collaboration, and governance will only grow.

Other platforms are beginning to address the same problem - bringing structure, versioning, and governance to AI-generated outputs - but approach it from different angles depending on where they sit in the stack.

For example, OpenAI and Anthropic have introduced capabilities within their respective ecosystems (such as tool usage tracking and conversation state management) that allow developers to retain context and replay interactions, but these are typically tied to prompt/response histories rather than full artifact versioning. Similarly, orchestration frameworks like LangChain and LlamaIndex provide ways to persist intermediate steps and workflows, enabling some level of traceability, but they often rely on external storage or logging systems rather than offering a native, Git-like version control model for outputs.

On the more engineering-centric side, platforms such as Weights & Biases and Databricks focus on experiment tracking and data/version lineage, particularly for machine learning models and datasets. While these tools provide strong reproducibility and audit trails, they are typically optimized for model training workflows rather than dynamic, agent-driven output generation.

Cloudflare's Artifacts sits in a slightly different space, closer to software development practices, by treating AI outputs as version-controlled assets, aiming to unify traceability, collaboration, and rollback capabilities in a way that mirrors traditional code workflows but is purpose-built for autonomous agents.

AI llmmeta

Meta to release Muse Spark in Voice Mode and Meta Glasses

Meta rolled out Muse Spark, a compact foundation model powering voice AI across WhatsApp, Instagram, Facebook, and Ray-Ban Meta glasses with real-time camera recognition.

Summary

What: Meta launched Muse Spark, a foundational model now powering Meta AI across WhatsApp, Instagram, Facebook, Messenger, Threads, and the Meta AI app. Features include voice conversations with natural interruption support, shopping mode aggregating Facebook Marketplace and internet-wide listings with map-based browsing, and live camera recognition for real-time object and landmark identification. Initial rollout targets US and Canada users, with expansion planned for Ray-Ban Meta and Oakley Meta glasses. Built by Meta Superintelligence Labs.
Why it matters: Meta is embedding AI across their social and hardware ecosystem using a purpose-built compact model optimized for speed and multimodal perception, competing with OpenAI and Google's general-purpose large model approach by prioritizing real-world contextual understanding in consumer applications.

Original Article

Meta has launched Muse Spark, the foundational model now powering Meta AI across the Meta AI app and meta.ai, with expanded integration into WhatsApp, Instagram, Facebook, Messenger, Threads, and AI glasses. This rollout introduces faster voice responses, smarter shopping assistance, and real-time visual recognition through the device camera.

Today we're introducing Meta AI Voice Conversations powered by Muse Spark that let you talk naturally to Meta AI (interrupt, switch topics, or swap languages), and as you talk, Meta AI can generate images and pull up recommendations from Reels, maps, and more. We're also bringing…

Voice conversations enable users to speak naturally with Meta AI, switch topics or languages mid-discussion, and receive on-the-fly image generation and relevant recommendations. Shopping mode now aggregates Facebook Marketplace and internet-wide listings, offering map-based browsing, price and style filters, and direct brand content access in a grid layout. Live AI features allow users to point their camera at objects or landmarks for immediate context and help.

The initial rollout targets users in the US and Canada, with gradual expansion to Ray-Ban Meta and Oakley Meta glasses and broader app integration. Muse Spark is designed as a compact, fast model capable of advanced reasoning in fields like science, math, and health, including visual coding and multimodal perception. This distinguishes it from previous Meta models and competitors by enabling real-world contextual understanding and multitasking through subagents.

Early reactions from AI experts highlight Muse Spark's step toward more contextual personal assistants. Meta Superintelligence Labs, the team behind this release, rebuilt the AI stack for faster, more capable models, aiming to usher in personal superintelligence while implementing safety and privacy safeguards.

AI cloud

Google Eyes AI Data Centers in Space

Google and SpaceX are discussing orbital AI data centers as SpaceX prepares a $1.75 trillion IPO claiming space compute will soon undercut ground facilities.

Summary

What: Google and SpaceX are negotiating to launch orbital data centers for AI compute, according to the Wall Street Journal. SpaceX is preparing a $1.75 trillion IPO later in 2026, pitching investors that space-based facilities will be the cheapest option for AI infrastructure within a few years. Google announced Project Suncatcher, planning to launch prototype satellites by 2027. The discussions follow Anthropic's deal with SpaceX to use xAI's Memphis data center (SpaceX acquired xAI in February 2026).
Why it matters: The serious consideration of orbital data centers despite higher costs signals that AI compute demand has hit hard limits in both ground capacity and community acceptance of new facilities.

Original Article

Report: Google and SpaceX in talks to put data centers into orbit

SpaceX starship fully stacked

Google and SpaceX are in talks to launch orbital data centers in space, reports The Wall Street Journal, citing sources familiar with the matter.

The potential deal comes as SpaceX gears up for its $1.75 trillion IPO later this year, selling investors on the idea that data centers in space will be the cheapest place to put AI compute within the next few years. It also follows Anthropic's deal with SpaceX last week to use computing resources from xAI's data center in Memphis, Tennessee, with the potential to work together on orbital ones in the future. (SpaceX acquired xAI in February.)

Google is reportedly talking to other rocket-launch companies as well. The company also plans to launch prototype satellites by 2027 as part of an initiative called Project Suncatcher, announced late last year.

Elon Musk has created hype for orbital data centers, claiming they are cheaper to operate. Advocates also point out they are free from the local backlash that U.S. ground-based buildouts attract. However, as TechCrunch recently reported, today's terrestrial data centers are much cheaper than those in orbit once satellite construction and launch costs are factored in.

Google invested $900 million in SpaceX in 2015, according to regulatory filings.

TechCrunch has reached out to Google and SpaceX for comment.

AI hardwaresemiconductor

Semis Memo: Supply Chain Inheritance

Texas Instruments is raising prices instead of expanding capacity as AI data centers inherit the power supply chain built for EVs.

Summary

What: Analog semiconductor makers Texas Instruments (TXN) and NXP Semiconductors (NXPI) are raising prices instead of expanding capacity as AI infrastructure demand creates shortages in Multilayer Ceramic Capacitors and other power components. Nvidia's May 2025 blog on 800V DC rack architecture credits the technology to the electric vehicle and solar industries, revealing how AI is inheriting supply chains built for EVs and solar.
Why it matters: This shows how AI infrastructure can absorb overcapacity from adjacent industries. Burned by post-COVID oversupply, semiconductor makers now prioritize pricing power over volume, creating supply constraints AI demand cannot quickly resolve.

Decoder

  • MLCC (Multilayer Ceramic Capacitor): Small ceramic components that filter electrical noise and smooth voltage in power circuits, now in shortage as AI data centers require significantly more power management than previous workloads.

Original Article

Semis Memo: Supply Chain Inheritance

Power & Analog Semis, CPUs in the Agentic Era, Neoclouds, Material Bottlenecks, Korea Unlocked

Introduction

For the first innings of the AI Infrastructure trade, it was simple enough just to know the basics. Large Language Models run on GPUs, buy Nvidia. AI compute will lift optics out of the telecom doghouse and cause significant growth for interconnects, buy the optical interconnect names. Every iota of AI compute demand in the agentic era must inevitably flow through the memory OEMs, buy Micron and SK Hynix.

It seemed a lot more difficult than that at the time, but it was pretty much that simple. The reasons for that were twofold. First, not everyone bought into the massive growth in data centers that would be required for AI to proliferate and actualize. Second, between the post-COVID supply chain glut and numerous other headwinds to semiconductor and adjacent names, valuations remained quite forgiving in all but the most obvious first-order beneficiaries.

That's begun to change, and with it, outperforming in the AI infrastructure complex requires in-depth understanding beyond just identifying current bottlenecks. Understanding the roadmap for the future requires a bit more technical competence, which is why in January 2026 we began our Semis Memo series – guided by our semis analysts Zephyr and Jukan.

While the landscape has evolved, our framework stays the same. Begin with the macro. Find areas where forecasts are still reflecting overhangs from non-AI related headwinds and determine whether AI demand can overcome them in a way that makes estimates too low.

In this issue, we're covering the following places that meet our criteria:

  • Analog and Power Semis: Supply Chain Inheritance
  • CPUs in the Agentic Era
  • Neoclouds: The Inference Shortage
  • AI Materials Bottlenecks
  • Korea Unlocked
  • Updating Previous Semis Memo Ideas

We end with Some Thoughts on Where We're Going…

Analog and Power Semis

We first flagged the likelihood that AI demand would overwhelm the headwinds currently being experienced by the analog and power semi sector in our 25 Trades for 2025, specifically as it related to the upcoming Multilayer Ceramic Capacitors (MLCC) shortage.

Components integral to power quality management systems address common issues such as voltage sags, harmonics, and transients, thereby ensuring the reliable operation of electrical and electronic equipment. These include capacitors, inductors, diodes, power ICs, surge protectors, filters, transformers, and uninterruptible power supplies (UPS).

Discrete power semiconductors (like MOSFETs and diodes) will also benefit as they are integral to creating efficient, stable power rails. Filters, ferrite beads, and connectors may see growth, but the clearest secular uplift is likely in capacitors and inductors given their centrality to power conversion in AI-driven, high performance computing environments.

These names have begun to outperform, and we feel it's directly related to another framework we've posed for 2026 – "Post-Traumatic Supply Disorder". The companies dealing with power semis have had to contend with a barrage of headwinds – the COVID supply glut, competition from Chinese analog semis, the anemic EV and automotive cycle…the list goes on. However, they're beginning to see data center revenues climb. And they're not rushing to add capacity, having been burnt one too many times.

Take a look at the capex intensity (capex / revenue) for Texas Instruments (TXN US):

It's typically this part of the cycle that results in supply ramping, but instead, TXN and peers like NXP Semiconductors (NXPI US) are content to raise prices.

Now, we're at an inflection point, and these companies are letting ASPs go up rather than flood the market. Up until now, however, we've been mostly focused on the rack-internal story. Companies like Murata Manufacturing (6981 JP), Vishay Intertechnology (VSH US) and Samsung Electro-Mechanics (009150 KS) have taken off as the crowd recognizes exactly how short on MLCCs we are.

For our first and highest-conviction section, we're glad to say that you don't have to be a semiconductor expert to understand the rack-external power semis story. It's a pretty cut-and-dried setup. The capex that burned them once was actually the exact infrastructure necessary for this part of the cycle.

While we've long been waiting for the automotive overhang to lift the fog off of the names in the analog and power semis space, we're now realizing something more significant. It doesn't really need to – rather, the AI capex buildout is simply inheriting the EV buildout supply chain.

Supply Chain Inheritance

In Nvidia's May 2025 technical blog on 800V DC rack architecture, they credit the underlying technology to "the electric vehicle and solar industries." That's the trade…

AI researchml

What Parameter Golf taught us

AI coding agents contributed meaningfully to OpenAI's Parameter Golf, where 1,000+ participants submitted 2,000 model optimization entries.

Summary

What: OpenAI's Parameter Golf challenge attracted over 1,000 participants who submitted 2,000 entries focused on minimizing loss on a constrained dataset. Participants used techniques including quantization, careful tuning, and novel modeling approaches, with AI coding agents contributing to submissions.
Why it matters: AI coding agents participating meaningfully in research competitions suggests they're transitioning from tools to research collaborators, potentially accelerating ML optimization work.

Decoder

  • Parameter Golf: A competition format where participants minimize model loss on a dataset while staying within strict parameter count constraints, similar to golf's goal of using the fewest strokes.

Original Article

Parameter Golf attracted over 1,000 participants and 2,000 submissions focused on minimizing loss on a dataset within strict constraints. Participants leveraged a range of techniques, including careful tuning, quantization, and novel modeling ideas, with AI coding agents playing a significant role. This challenge revealed new talent and highlighted the evolving role of AI agents in research competitions.
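
For a feel of what a strict parameter budget means, here is a back-of-the-envelope count for a tiny decoder-only transformer; the layer sizes and the 1M-parameter budget are made up for the example and ignore biases, layer norms, and the output head.

```python
# Hypothetical architecture and budget, purely for illustration.
vocab, d_model, d_ff, n_layers = 2_000, 128, 256, 4

embedding = vocab * d_model                    # token embedding table
per_layer = (
    4 * d_model * d_model                      # Q, K, V and output projections
    + 2 * d_model * d_ff                       # MLP up- and down-projections
)
total = embedding + n_layers * per_layer

budget = 1_000_000
print(f"total parameters: {total:,}")          # total parameters: 780,288
print(f"within budget:    {total <= budget}")  # within budget:    True
```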

AI agentssearchrag

Agentic search models

SID and Glean released specialized search LLMs (SID-1, Waldo) that orchestrate simple retrieval tools instead of building complex search pipelines.

Summary

What: SID released SID-1 and Glean released Waldo, LLMs trained specifically for search orchestration rather than adapting frontier models like GPT-5. Doug Turnbull argues these handle domain-specific nuances—like knowing 'bistro tables' in furniture search means outdoor furniture, not restaurant equipment.
Why it matters: This suggests search infrastructure will unbundle into simple retrieval backends orchestrated by vertical-specific LLMs, similar to the proliferation of domain-tuned embeddings, rather than relying on expensive context engineering around GPT-5.

Original Article

In retrieval, we know the lego pieces well. Embeddings. Rerankers. Query understanding. BM25. All the parts we put together in the standard stack.

But a new power is rising: agentic search models. LLMs trained specifically on controlling the search task. Compared to GPT-5 and pals, these models might focus on tasks / domains. They can be smaller, faster, and easier to deploy in our search infrastructure.

Let's set the stage.

With the traditional lego pieces, we build something like this thick search monolith:

That's what we've built for 1-2 decades. Queries flow into the system. We apply business rules and classify queries. We search one or more retrieval backends. We post-process and rerank what comes back.

It's all very manual, programmatic, and bespoke. No single piece sees the whole; each one (rerankers, query classifiers, etc.) focuses on its own part of the problem, ignorant of the rest.

Agentic search orchestrates the pieces

Agentic search unbundles the pieces to see the whole. An agent built on GPT-5 or Sonnet knows it has tools. It's given knowledge / context to solve the user's query.

The agent helps the user: driving a few simple retrieval primitives, exploring candidates to return what's relevant. Instead of thick, monolithic search, the underlying search tools become thin wrappers on our backend indices.

Unlike the traditional stack, the agent sees the whole process. It's not a series of reductive steps. Instead the model orchestrates a solution for the user using what's available - retrieval tools, other subagents, or a knowledge base to guide the user.

In other words, agentic search unbundles the retrieval stack. The parts still matter. But the whole can be managed by a single, intelligent model.

GPT-5 doesn't really know search

Frontier lab models (GPT-5, Sonnet and pals) do well in the 80% case. They understand the queries with general knowledge. They surface defensible search results.

But it's the last 20% that moves the needle.

What's in the last 20%? It's the non-obvious stuff in our users / domain. GPT-5 doesn't know that in our furniture store a search for "bistro tables" actually means "small outdoor tables" not restaurant equipment. GPT-5 doesn't know our fashion search users click on dark or plain patterns over complex ones.

Why not? Isn't GPT-5 itself also trained on search?

Yes, of course.

GPT-5 and pals think of 'search' as web search. Mainline models expect search tools that work near flawlessly. To give them answers they don't have. Unfortunately, our search isn't Google. We work on smaller, focused teams: building simpler search backends for our narrow domains. We need agents trained to reason and orchestrate simpler retrieval systems.

Currently, as I teach in Cheat at Search with Agents, we build constraints + checks around the model. But this intense level of context engineering eats up tokens and costs $$.

Agentic search models for the last 20%

What if we could train an LLM to focus on search specifically? Even better: search in our domain, for our users?

We're beginning to see models in this direction: SID was first with their SID-1 model. We've since seen Glean release Waldo. Startups like Charcoal tailor to your corpus.

These teams pitch their models as replacements trained specifically on document search: they focus on that last 20% needed to find useful results. SID, for example, points to smaller size and lower latency compared to GPT-5 for agentic search.

Think about the implications. I've described a future where we don't build a complex query + reranking pipeline. Instead, agents orchestrate simpler retrieval primitives. A basic keyword search or embedding model with a few filters. Basic, scalable, and simple retrieval tools.

Then we deploy an agentic search model like SID-1. It knows what's relevant in our domain. It gets our users. Now it can orchestrate the tools to find relevant search results. Currently, sure, the focus begins with RAG and classic passage retrieval (ie chunks). That will surely expand to a family of models, targeting different domains from e-commerce to job search to ??.
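
A minimal sketch of that shape, assuming the retrieval backend is reduced to a thin keyword-search function and the agentic model's domain knowledge is faked with a hard-coded rewrite; the catalog, tool, and rewrite below are invented for illustration, not SID-1's or Waldo's actual interface.

```python
# Illustrative only: a thin retrieval primitive plus an "agent" whose domain
# knowledge is stubbed out with one hard-coded query rewrite.
DOCS = {
    "sku-101": "folding patio table, small outdoor table",
    "sku-102": "bistro-style restaurant prep counter, commercial kitchen equipment",
}

def keyword_search(query: str) -> list[str]:
    """Thin wrapper over the index: return ids whose text contains any query term."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in DOCS.items()
            if any(term in text for term in terms)]

def agentic_search(user_query: str) -> list[str]:
    # A real agentic search model would learn this rewrite from the domain;
    # here the furniture-store reading of "bistro" is hard-coded.
    if "bistro" in user_query.lower():
        user_query = "small outdoor table"
    return keyword_search(user_query)

print(keyword_search("bistro tables"))   # ['sku-102'] -- restaurant equipment
print(agentic_search("bistro tables"))   # ['sku-101'] -- the outdoor table users meant
```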

Today we have scores of embedding models on huggingface, targeting financial, legal, e-commerce. Why wouldn't we also have many agentic search models? Tuned to domains?

An embedding only solves a narrow portion of the problem (similarity). An agentic search model could encompass the entire process from query understanding to hybrid search. Yes they're too slow today to drive site search. But that will surely change in the years ahead.

It's a future I'm excited about. Search will look radically different.

AI video

Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google

Perceptron's Mk1 video model prices 80-90% below OpenAI, Anthropic, and Google

Summary

What: Perceptron announced Mk1, a video analysis AI model priced 80-90% cheaper than offerings from Anthropic, OpenAI, and Google while claiming high performance.
Why it matters: Suggests incumbents have significant margin in video AI pricing, potentially triggering competitive price cuts across multimodal models.

Original Article

Mk1 is a video analysis AI model priced 80-90% cheaper than rivals like Anthropic, OpenAI, and Google.

Tech aihardwareandroidgemini

Google's Android-powered laptops are called Googlebooks, and they're coming this year

Google announced Googlebooks, Android laptops that activate Gemini AI with cursor wiggles and stream phone apps to the desktop.

Summary

What: Google announced Googlebooks, Android-powered laptops launching later this year from Acer, Asus, Dell, HP, and Lenovo. The devices feature 'Magic Pointer' (wiggle cursor to activate full-screen Gemini), phone app streaming to laptop windows, seamless file transfer from phones, and an illuminated 'Glowbar' on the lid.
Why it matters: This signals Google's shift from web-first (Chromebooks) to AI-first computing, betting that contextual AI will define the next laptop generation, though Microsoft's Recall failure and Magic Cue's limited impact on Pixel phones suggest uncertain demand for screen-context AI features.

Original Article

Google took its first swing at laptops with Chromebooks way back in 2011. These web-first laptops have seen success over the years, mostly in enterprise and education. Google insists Chromebooks aren't going away, but the company's focus has shifted to something new: Googlebooks. That's what Google has decided to call the new line of Android-powered laptops, which will begin shipping later this year.

If you thought other Google products were steeped in Gemini, you haven't seen anything yet.

Google says it designed Googlebooks from the ground up with Gemini Intelligence, and it all starts with the cursor. Google calls this the Magic Pointer. Just wiggle the cursor back and forth, and it will activate a full-screen Gemini experience. The AI will see what's on your screen so it can make contextual suggestions and pull in data from multiple apps.

What can you do with that? Well, it's all a bit vague. Google's demos show how Magic Pointer can be used to select multiple images and instantly combine them with Nano Banana. Google also says you can use the cursor in AI mode to do things like suggest a calendar appointment simply by pointing it at the date in an email. Magic Cue, which has been available on Pixel phones since last year, will also be part of Googlebooks. This feature can recommend actions and surface information based on context like messages and emails.

There's definitely a problem with discoverability in AI features, but it's uncertain how many useful things generative AI can do with screen context. The best Microsoft could manage was Recall, and we all know how that went. So far, Google's Magic Cue on phones hasn't been a game changer—in fact, it rarely shows up at all. Can a laptop do any better?

Google's AI-generated widgets from Android phones will also come over to Googlebooks. The widgets are more limited than you might expect, though. They can collect data from the web, as well as certain content from your Google apps, to create a "personalized dashboard" for your home screen. The format and style will be adapted to the laptop form factor.

Phone apps and not phone apps

Google seems to be avoiding an explicit mention of Android when discussing Googlebooks, but that's the underlying software. That gives the devices access to a wide variety of apps—Google tried for years to shoehorn Android apps into Chrome OS with limited success, but it should be easier with laptops that run the apps natively.

These devices will have the Play Store, of course, but the rest of the software situation is hazy. Google is in the process of certifying third-party app stores for Android while also clamping down on sideloaded APKs, and we don't know where Googlebooks will end up in the openness spectrum. Google has refused to comment on specifics right now, saying only that it will have more to share regarding its "app ecosystem partners" closer to launch.

You might not have to install very many apps on a Googlebook, though. The platform will integrate deeply with your Android phone, allowing you to stream apps right to your laptop. A dedicated button in the taskbar lists all the apps on your phone. Click one, and it will appear on the Googlebook in a floating window. It's similar if you need a file from your phone—Googlebooks can seamlessly transfer files from your phone when you need them.

Glowing up later this year

Google has not discussed any plans to build its own Googlebook. Instead, most of the OEMs that have been making Chromebooks will also offer Googlebooks when they launch, including Acer, Asus, Dell, HP, and Lenovo. You can expect devices with varying prices and hardware configurations, but you'll know they're Googlebooks from the Glowbar on the lid.

This illuminated design feature is reminiscent of the bar on some older Google devices like the Pixel C tablet and Chromebook Pixel. On those devices, the light bar would indicate the battery level. Google says the bar on Googlebooks is both "functional and beautiful," but it hasn't explained the functionality yet. We've asked for details.

Tech hardwarespacex

Once again, SpaceX has set a new record for the tallest rocket ever built

SpaceX's Starship V3 cleared fueling tests for a May 19 launch, standing 408 feet tall with 18 million pounds of thrust—both new records.

Summary

What: SpaceX completed a launch rehearsal Monday for Starship Version 3, loading over 11 million pounds of propellant at Starbase, Texas. The rocket stands 408 feet tall (the tallest ever built) and generates 18 million pounds of thrust from 33 Raptor 3 engines—10% more power than V2. This marks the 12th Starship test flight overall and first from a new launch pad, targeting May 19.
Why it matters: Version 3 shifts SpaceX's focus from proving Starship can reach orbit to actually using it—this is the first iteration designed for in-orbit refueling experiments, which are required before Starship can fly to the Moon for NASA's Artemis missions or venture beyond low-Earth orbit.

Original Article

For the third time in three years, SpaceX has stacked a new version of its enormous Starship rocket on a launch pad in South Texas, just a few miles north of the US-Mexico border. The newest-generation Starship, known as Starship Version 3, is taller and more powerful than the ones that came before it.

The upgrades on Starship are numerous. Perhaps the most notable changes are higher-thrust, more efficient Raptor engines on the Super Heavy booster and Starship upper stage, a new reusable lattice-like structure at the top of the booster for hot staging, and three—not four—modified grid fins to help bring the first stage back to Earth for recovery and reuse.

If all goes according to plan, this is the version of Starship that SpaceX will use to begin experimenting with in-orbit refueling, a capability engineers must master before sending ships anywhere farther than low-Earth orbit. In the near-term, refueling will enable Starships to fly to the Moon to serve as landers for NASA's Artemis program. Starship remains an iterative development program, and new versions are in the pipeline, but Starship V3 should mark a step toward SpaceX actually using Starships in space, rather than solely proving they can get there and get home.

But SpaceX must first do just that with Starship V3. The company has not officially announced a target launch date. Airspace and maritime warning notices released in the last few days suggested the upgraded rocket could lift off as soon as Friday evening from SpaceX's Starbase launch site on the Gulf Coast east of Brownsville, Texas, but that was before a day-and-a-half delay in launch preps over the weekend.

A fresh set of maritime warnings issued late Monday indicated SpaceX is now targeting a launch attempt on Tuesday, May 19.

Final steps

Ground crews at Starbase lifted the Starship upper stage atop its Super Heavy booster Saturday, assembling a fully stacked Starship V3 for the first time. The rocket has a height of 408 feet (124 meters), a few feet taller than the previous version of Starship.

On Monday, SpaceX's launch team loaded more than 11 million pounds (more than 5,000 metric tons) of super-cold methane and liquid oxygen into both stages of the rocket after halting a previous fueling attempt Saturday night due to a technical issue. The launch rehearsal followed a test-firing of the booster's 33 Raptor engines at the launch site on May 6, the first time SpaceX ignited a full complement of uprated Raptor 3s.

At liftoff, the rocket is expected to produce some 18 million pounds of thrust, about 10 percent more than the previous generation of Super Heavy boosters, according to specifications previously released by SpaceX. The scale is staggering. For example, in Version 3, the internal transfer tube that channels methane fuel from the top of the booster to the engine compartment is about the same size as the first stage of SpaceX's workhorse Falcon 9 rocket, which is roughly 12 feet (3.7 meters) in diameter.

The upcoming flight will also mark the first liftoff from a new launch pad at Starbase, about 1,000 feet (300 meters) west of the departure point for all of SpaceX's past Starship test flights. This will be the 12th full-scale Starship test flight, and the first since last October, after delays in readying V3 for its first launch.

Like most prior Starship flights, the upper stage of the rocket will target a controlled splashdown in the Indian Ocean a little more than an hour into the mission. On future flights of Starship V3, SpaceX will attempt to bring the ship back to Starbase for a catch by the launch tower's mechanical arms, as the company has already demonstrated with the rocket's massive Super Heavy booster.

One change SpaceX is introducing on this launch is a more southerly flight path over the Gulf of Mexico, taking the rocket between the northeastern coast of the Yucatan Peninsula and the western tip of Cuba, instead of over the Florida Straits.

What's left before Starship V3 is ready to fly? On the SpaceX side, workers must install hardware for the rocket's self-destruct system, pyrotechnics that would blow up the vehicle if it deviated from its flight plan. This will require the removal of the ship from the booster. A launch license from the Federal Aviation Administration is still pending.

Tech cloudinfrastructurespacexsatellite

SpaceX and Google Are in Talks to Launch Data Centers in Orbit

SpaceX pitched orbital data centers to IPO investors as Google pursues Project Suncatcher to launch cloud infrastructure in space.

Summary

What: Google is negotiating with SpaceX and other rocket launch companies to deploy Project Suncatcher, its orbital data center initiative announced in 2025. SpaceX independently featured orbital data centers in its investor pitch ahead of a planned IPO.
Why it matters: Represents a bet that orbit can solve fundamental data center constraints like power, cooling, and real estate while positioning SpaceX beyond launch services into infrastructure ownership.

Original Article

Google is in talks with SpaceX and other rocket-launch companies about a launch deal. Last year, Google announced Project Suncatcher, a moonshot initiative aimed at launching orbital data centers. Many industry leaders see orbital computing as a solution to the limitations of Earthbound data centers. SpaceX is also looking to launch orbital data centers. The speculative technology was the center of SpaceX's pitch to investors ahead of its planned IPO.

Tech careerai

Why senior developers fail to communicate their expertise

The phrase 'Can we try something quicker?' helps senior developers reframe complexity management as uncertainty reduction for stakeholders.

Summary

What: Tuhin Nair argues senior developers manage complexity to maintain stability while businesses reduce uncertainty through speed to market, creating conflicting goals. Senior devs should reframe their expertise (reducing, reusing, avoiding features) using 'Can we try something quicker?' to acknowledge stakeholder needs. With AI enabling rapid but unstable development, Nair proposes decoupling into a 'Speed' version for experimentation and a 'Scale' version senior devs stabilize, acting as editors rather than gatekeepers.
Why it matters: As AI tools accelerate feature development, senior developers must explicitly own the stabilization role rather than appearing as obstacles, transforming from writers to editors who review and harden what works from rapid experiments.
Takeaway: When stakeholders request features, respond with 'Can we try something quicker?' to acknowledge their uncertainty-reduction goal while proposing simpler alternatives.

Original Article

"AI agents are the future of software development. We won't need developers anymore to slow down the progress of a business."

If you're a senior developer and you think this is true, I'm somewhat suspicious of your expertise (I'll explain why; I'm not needlessly antagonistic).

But if you're not a senior developer and you think this is true, I think you're probably right.

Huh? What's going on here?

Copywriting is, in its essence, about matching a message to an audience.

And so, to me, a copywriter, what's happening here is that the same message is meaning two different things to two different audiences.

If you're a senior developer, and if you've played with the agents and skills and models and all the other things that are blowing people's minds, and if your intuition is still telling you something is off in how people are proclaiming your job obsolete, then here, in this post, I'm going to try and put words to your intuition (as a good copywriter does).

But wait a minute! Many seasoned and famous developers are also proclaiming the death of the developer.

How's that? Whose intuition is right? And what's causing this split?

When I join a team there are two kinds of senior developers I meet.

The first kind says things like:

"I found this new tool and it's pretty cool …"

"This company <company totally unlike the one we're in> does things this way, so …"

"Here, look at this HackerNews post that says this is best practice, we should probably …"

I don't like this kind of senior developer. A little self-protective, lots of time spent in the industry, probably a good people person.

But not my wavelength.

Then there's also this kind of senior developer:

"Do we really need that?"

"What happens if we don't do this?"

"Can we make do for now? Maybe come back to this later when it becomes more important?"

Ah, baby, this is my senior developer. The avoider, the reducer, the recycler. They want to avoid development as much as they can.

Why? Because they hunt a singular monster in professional software development: complexity.

Special cases, if conditions, new database tables, new components. All yuck yucks. The senior developer wants as little of this as possible, spending lots of time making sure they absolutely need to add more code.

Because adding to a system is risking more complexity.

Yes, yes, of course this is simplistic. There are senior developers who excel at taking on unsolved problems and finding new creative designs.

But eventually, if you're taking responsibility for a working system, you're scared of complexity.

Now, why is that? What's the downside of complexity? And why doesn't anybody else get it?

We're going to be simplifying what a business is using two loops.

This is the first loop; marketers, salespeople, product managers, the CEO, they all live here:

The main goal of this loop is to try and learn. The business wants to take things to market and then get feedback on whether they've got something valuable or not.

The monster, for people in this loop, is uncertainty.

And uncertainty is cruel because no strategy is guaranteed to work. When combined with time (compensation for marketing/sales, or payroll for founders, or data for product managers) it can feel like taking things to market as fast as possible is the only way to reduce uncertainty before a deadline. The more you can take to the market, the more you can get feedback from it, the more you can (potentially) reduce uncertainty.

This loop, and all companies start with this loop, is about pure, raw, speed.

But what happens when a business gets customers?

Ah, now, here's our second loop. People paying for a service.

This is the loop where a lot of senior developers find themselves. The main goal in this loop is the continuation and guarantee of service.

Keep things working, keep things understandable, keep things debuggable, keep things fixable, keep things teachable, keep things stable.

Senior developers worry about stability because they take responsibility for the business to continue serving customers.

And what risks all of that?

Complexity.

It makes a system less understandable, less debuggable, less fixable, less teachable, and ultimately, less stable.

Rising complexity = lowering stability = senior developer failing responsibility = bad bad not nice, payments interrupted, everybody sad.

So, if the first loop's goal was uncertainty reduction, the second loop's goal is complexity management.

But why does this lead to communication failure?

Because once you have customers, both loops are running simultaneously. A business needs to both explore possibilities and serve customers at the same time.

Ok, now you might be able to spot my answer to the question in the title of this post.

Depending on which loop you spend your time on, your problem is framed differently (which is why I think developers get split in their opinions on AI; some work more on one loop than the other).

This was the story of the people in the first loop:

But this was the story of the senior developer in the second loop:

The stories don't match.

The more requests to build and add to the system the senior developer gets, the more the senior developer wants to respond with "uhhh, no complexity … maintenance costs … understandability … speed of continuing development … productivity over time …".

But that does nothing to address the rest of the business's need for reducing uncertainty.

The copywriter's diagnosis: You can't explain away someone else's problem using your own problems.

And the copywriter's prescription: You need to describe your solution as a solution to their problem as well.

Senior developers fail to communicate because they express their problems in terms of complexity management when they should be expressing their solutions in terms of uncertainty reduction.

By acknowledging that what the rest of the company is seeking is uncertainty reduction, the senior developer can use their expertise to help.

And what's the most useful skill a senior developer has? The reluctance to build what's not necessary; the ability to spot an opportunity to re-use something already built.

Need to collect survey data? Google forms, baby.

Need to build a whole new feature to test it? Have you tried putting a button in the existing UI and seeing if people click it?

Need new analytics service? What's the most important decision we need analytics for? Can we start with one decision, one chart, one metric?

You want to bake me a whole birthday cake? Just put a candle on my sandwich.

This is what senior developers learn to do: they learn how to give people what they want by being resourceful with existing software.

But how do you communicate this without sending people whole essays?

Copywriters love boiling down multiple signals into singular phrases. And so, here's the magical phrase every senior developer must learn: 'Can we try something quicker?'

The use of 'quicker' acknowledges what they're really looking for; 'something' implies another way of achieving it; 'try' implies imperfection, but also the possibility of it being good enough.

It perfectly cuts down to the requirement of the rest of the company, speed to reduce uncertainty, while allowing the senior developer to exercise their expertise: reduce, re-use, and if life is truly a blessing, avoid.

That's it. That's my answer to the title of the post: senior developers talk in terms of complexity when everyone else is worried about uncertainty.

But! Big but!

AI now seems to make all of this pointless, doesn't it? Why reduce? Why re-use? Why avoid? The AI can build so much in so little time.

Ah, well, it can't yet do the one thing senior developers still do.

Take responsibility.

Senior developers care a lot about understanding the system because understanding allows fixing it when things go wrong. It allows extending it intelligently when the system needs to grow. It allows, more than anything, the continued, reliable servicing of paying customers.

AI threatens this understandability. It is incredible at improving the speed of taking things to the market, but it also affects the other loop, the one the senior developers are responsible for.

If you have a bunch of AI agents, junior developers, non-developers, and your investors and their mothers adding code into the system, you get a system that overcompensates for speed by giving up stability.

This was the business in two loops:

And this is how AI affects the two loops:

Forget maintaining stability, AI is a downright destabilizer. It worsens understandability, fixability, debuggability, teachability, guaranteability, all the bloody bilities.

AI does this and takes no responsibility.

Not nice. This is the senior developer's main worry that's being brushed away.

Luckily, senior developers have a few tricks up their sleeve.

Namely: decoupling.

For the longest time, software developers were the only ones who could build software. They were responsible for both loops.

That's one system supporting two goals.

What if we had two systems, one for each goal?

An analogy: a fiction writer rushes to complete a first draft (often called a vomit draft) and later extracts what's working and gets rid of what's not. There's an editing process after the first initial rapid write. The editor's job is to take the bits that are working well and shape it all into a cohesive whole.

What if we had one system just for speed? Everyone focused on bringing things to life could work here. AI agents, our own generated and unreviewed code, junior devs, marketing etc.

We could call this the 'Speed' version of the system. It's not meant to be understandable, the goal is getting things good enough to take it to the market for feedback.

And then what if we had a second system focused on stability?

We could call this the 'Scale' version of the system. It's designed by senior developers to be stable, understandable, and scalable.

The 'Speed' version allows the rest of the business to continue learning from the market, as the senior developers build a trailing version of the system that's well-reviewed and understandable.

Plus, the design of the 'Scale' version is influenced by what worked and what doesn't work in the 'Speed' version of the system.

Features get built on 'Speed' but then stabilized on 'Scale'.

What this looks like in practice might be unclear, but the idea is to have a well-communicated de-coupling that explains that there's a difference between going for speed and going for stability.

Imagine you get asked to build something ambitious, and you say:

"Sure, I'll have the Speed version ready in 3 days. Then the Scale version in about 6 weeks."

They get what they want, speed and momentum. You get what you want, observation and design.

Maybe?

Your thoughts, senior software developer?

Or should I say, senior software editor?

Tech mobileiosapple

Apple Plans Customizable Camera for Pros, Siri Design Changes in iOS 27

Apple will let users fully customize the iPhone Camera app in iOS 27, with movable controls and customizable feature placement targeting pro photographers.

Summary

What: Apple will make the iPhone Camera app fully customizable in iOS 27, letting users choose which controls appear and where they're placed. The update will also bring system-wide design changes including new animations, redesigned tab bars, and a revamped Siri. Apple will unveil iOS 27 at WWDC on June 8, 2026.
Why it matters: This signals Apple moving away from its traditional locked-down interface approach, giving pro users granular control over a core system app for the first time.

Original Article

Apple plans to upgrade its Camera app to make it fully customizable. Users will be able to choose which features appear and where they're placed. The next major iPhone update will include other design changes, including system-wide changes such as new animations and redesigned tab bars. The update, which will include a revamped Siri, will be unveiled on June 8 at WWDC.

Tech startuphardwarespacexipo

SpaceX eyes global spaceports as Starship launch ambitions grow ahead of IPO

SpaceX eyes global spaceports for thousands of annual Starship launches ahead of June IPO targeting $75B at $1.75T valuation.

Summary

What: Elon Musk announced SpaceX is scouting US and overseas locations for spaceports to support thousands of Starship launches annually. The company is targeting a June IPO to raise up to $75 billion at a $1.75 trillion valuation, which would be the world's biggest IPO. Starship's 12th test flight is scheduled for May 19, debuting next-generation rocket, Super Heavy booster, and Raptor engines.
Why it matters: SpaceX is shifting from proving rocket reusability to industrializing it, building global infrastructure for airline-like launch cadences. The $1.75T valuation indicates investors now treat rapid reusability as a proven business model, not moonshot R&D.

Decoder

  • Starship: SpaceX's fully reusable heavy-lift rocket system designed to carry 100+ metric tons, central to Mars colonization and satellite deployment plans.
  • Super Heavy: First-stage booster for Starship, designed for rapid reuse with minimal refurbishment.
  • Raptor engines: SpaceX's methane-fueled rocket engines powering both Starship and Super Heavy.

Original Article

SpaceX eyes global spaceports as Starship launch ambitions grow ahead of IPO

SpaceX's Elon Musk gives an update on Starship

May 12 (Reuters) - SpaceX is scouting potential U.S. and overseas locations to build "spaceports", CEO Elon Musk said in a post on X on Tuesday, as the company prepares for a future in which its massive Starship rocket could launch thousands of times a year.

Unlike traditional expendable rockets, Starship is designed for rapid reuse with minimal refurbishment, a capability SpaceX says is essential for lowering launch costs and enabling thousands of flights annually.

The fully reusable rocket system, designed to carry more than 100 metric tons of cargo, is central to Musk's long-term plans for Mars colonization, satellite deployment and rapid global transportation.

"It's no secret that we intend to launch Starship a lot, targeting thousands of flights per year," the company also said in a statement posted online, adding that such a cadence would require launches from "many different locations."

Musk had previously said future spaceports could operate more like airports, handling multiple launches a day with rapid rocket turnaround times.

SpaceX is targeting a June listing, which is expected to be the world's biggest initial public offering, as the rocket maker seeks to raise as much as $75 billion at a valuation of roughly $1.75 trillion, sources had earlier told Reuters.

It currently launches Starship test flights from its Starbase facility in Texas and is developing additional infrastructure in Florida. The expansion push comes as commercial launch activity surges globally, straining capacity at major launch sites.

SpaceX said Starship's twelfth test flight, scheduled as soon as May 19, would debut next-generation versions of the rocket, Super Heavy booster and Raptor engines as it tests upgrades aimed at enabling full and rapid rocket reusability.

Tech aienterpriseinfrastructureeconomics

AI economics

Despite 900M users at OpenAI and enterprise deals at Anthropic, no AI lab generates enough profit to self-fund its next cluster.

Summary

What: Sriram Krishnan analyzes AI lab economics: OpenAI (900M+ MAU, consumer-heavy, low revenue per token), Anthropic (300K+ businesses, 1,000+ spending >$1M/year, GPU-constrained), xAI (massive Colossus GPU supply, zero demand, partnered with Cursor). Introduces metrics: tokens per watt-year (supply efficiency), revenue per token (demand quality), revenue per watt-year (profitability). No lab has reached the self-funding threshold.
Why it matters: Reveals the AI industry's fundamental profitability problem: despite billions in investment, every lab depends on external capital because inference costs scale indefinitely while revenue per token remains low. Cloud providers (Microsoft, Amazon) win regardless of which model succeeds.

Deep Dive

  • OpenAI faces a usage mix problem: 900M+ monthly active users (mostly consumers) generate high volume but low revenue per token, with many free and paid users sending low-value prompts that compound inference costs
  • Anthropic has the opposite problem: 300K+ businesses (1,000+ spending >$1M/year) create high revenue per token and predictable usage, but demand exceeds GPU capacity, causing constant rate-limiting
  • xAI owns massive GPU supply through Colossus but has zero model utilization or developer demand, leading to partnerships like Cursor and leasing spare capacity to Anthropic
  • Google owns the full stack (TPUs, data centers, cloud, applications) with low AI cost structure, playing defensive via Gemini to protect Search ads and offensive via TPUs/Cloud
  • Meta generates $200B in ad revenue immune to AI disruption; open-sourced Llama (one-time training cost, zero inference cost) to force OpenAI and Anthropic's prices down, trading researcher appeal for market pressure
  • Microsoft invested billions in OpenAI (now worth hundreds of billions), locked in multi-billion Azure commitments, and wins regardless of model race outcomes through cloud infrastructure revenue
  • Amazon invested in both OpenAI and Anthropic simultaneously with multi-billion AWS deals, building Titanium chips with Anthropic, completely indifferent to model race winner
  • Training is a one-time capital investment (build the factory once); inference runs 24/7 and compounds with scale, consuming compute on every prompt and response
  • Tokens per watt-year measures supply efficiency (output per unit of sustained power per year through better hardware, batching, and inference stacks)
  • Revenue per token measures demand quality (price × utilization × usage mix across consumer, business, enterprise, and API tiers)
  • Revenue per watt-year combines both to determine whether a lab can self-fund its next cluster, balancing against cost per watt-year
  • Jensen Huang introduced 'Tokens per Watt × Available Watt' at GTC 2026, but this only addresses supply without guaranteeing revenue follows
  • No AI lab has crossed the self-funding threshold yet; all still depend on outside capital from investors, IPOs, cloud partners, or sovereign funds

Decoder

  • Colossus: xAI's massive GPU cluster that currently has excess capacity and is being leased to other labs
  • Titanium chips: Amazon's custom AI accelerator chips co-developed with Anthropic as an alternative to NVIDIA GPUs

Original Article

AI economics

The Players

Every AI lab has its own identity/profile:

  • OpenAI - 900M+ monthly active users, mostly consumers
  • Anthropic - 300K+ businesses, 1000+ spending >$1M/year
  • xAI - massive GPU supply via Colossus but zero demand

This creates different problems:

  • OpenAI - usage mix problem. High consumer volume and a mix of free and paid users mean low revenue per token and high inference costs to serve. Many users like you and me sending way too many low value prompts compounds fast
  • Anthropic - GPU supply problem. An enterprise heavy customer base means a great usage mix. High revenue per token and predictable usage patterns mean demand exceeds capacity. Constant rate-limiting
  • xAI - demand problem. Huge supply with Colossus without model utilization

tldr: Anthropic needs more GPUs. OpenAI needs better quality demand. xAI has supply and no demand.

xAI figured it was easier to generate developer demand through the Cursor partnership/acquisition and monetize extra capacity by leasing some Colossus GPUs to Anthropic.

Meta, Google, Microsoft and Amazon are different beasts:

  • Google - owns the full stack: TPUs, data centers, cloud and applications (Search, YouTube). Low AI cost structure because it does everything in-house. But every query that goes to ChatGPT or Claude is a lost ad impression. Strategy: (1) defensive via Gemini to protect Search ads (2) offensive via TPUs and Cloud to monetize AI infra demand
  • Meta - ad revenue ($200B) on platforms AI doesn't threaten. Invested a lot in GPU supply, but it's used for internal ad ranking and recommendations. Open-sourced its own Llama model: one-time training cost, zero ongoing inference cost. The bet: if open-source Llama is 60-90% as good, it creates market pressure that forces OpenAI and Anthropic's inference prices down. The tradeoff is talent: the best researchers want frontier compute, not boring ad optimization. Hence the outsized comp to attract talent
  • Microsoft - cloud infra provider. Bet early by investing billions in OpenAI now worth hundreds of billions. OpenAI is committed to purchasing billions of cloud (Azure) services so Microsoft gets paid regardless of who wins the model race. Strategy: (1) defensive via Copilot embedded across Office, Teams and Windows to protect enterprise software (2) offensive via Azure (Cloud) to monetize infra demand
  • Amazon - cloud infra provider. Same playbook as Microsoft but bet on both leading AI labs simultaneously. Invested in OpenAI and Anthropic and struck multi-billion AWS cloud deals with both. Indifferent to who wins the model race. Building its own Titanium chips with co-development from Anthropic. Focus is on cloud infrastructure usage. No search, no enterprise or consumer applications to defend or worry about.

The Economics

Training vs Inference

Training is a one-time capital investment. You build the factory once and it's a sunk cost. Inference is the factory running 24/7. Every user prompt and model response consumes compute. This compounds with scale and never stops.

The core metrics:

  • Tokens per watt-year is a supply-side metric. It tells you how much output you're squeezing from physical infrastructure, i.e. output per unit of sustained power per year. Better hardware, smarter batching, and efficient inference stacks all show up here
  • Revenue per token is a demand-side metric: price × utilization × mix. That means (1) price per million tokens (consumer, business, and API pricing), (2) usage mix (free vs paid vs enterprise users vs API calls), (3) % utilization of compute (like hotel occupancy, but for GPUs and servers)

Revenue per watt-year is the product of the two: what you actually earn per unit of physical infrastructure. It is one of the two numbers that determine whether a lab can self-fund its next cluster; the other is cost per watt-year, which matters because each model generation requires more compute than the last.
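To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. All figures are illustrative assumptions, not numbers from the article; it simply multiplies the supply-side and demand-side metrics together and compares the result against an assumed cost per watt-year:

```python
# Back-of-the-envelope unit economics for an inference cluster.
# All numbers below are illustrative assumptions, not real lab figures.

tokens_per_watt_year = 2_000_000   # supply side: tokens produced per watt of sustained power per year
price_per_million_tokens = 1.50    # blended price in dollars across consumer, business, and API usage (the "mix")
utilization = 0.40                 # fraction of compute actually serving paid demand ("hotel occupancy")

# Demand side: revenue earned per token, after utilization.
revenue_per_token = (price_per_million_tokens / 1_000_000) * utilization

# What a watt of infrastructure earns per year = supply-side metric x demand-side metric.
revenue_per_watt_year = tokens_per_watt_year * revenue_per_token

cost_per_watt_year = 2.00          # assumed all-in cost (capex amortization + power + ops) per watt-year

print(f"revenue per watt-year: ${revenue_per_watt_year:.2f}")
print(f"cost per watt-year:    ${cost_per_watt_year:.2f}")
print("self-funding" if revenue_per_watt_year > cost_per_watt_year else "still needs outside capital")
```

With these made-up inputs the cluster earns $1.20 per watt-year against a $2.00 cost, which is the article's point: efficiency or pricing or utilization has to move before a lab can self-fund the next cluster.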

Jensen Huang from NVIDIA introduced 'Tokens per Watt × Available Watt' at GTC 2026, but this only addresses the supply side; revenue per token addresses demand. Just because you can build more efficient infrastructure and secure more watts doesn't mean revenue will automatically follow. Look at xAI.

Every lab is playing the same game: squeeze more tokens per watt, price them well, and acquire more power.

The question is who funds the next cluster - outside capital (investors, IPOs, cloud partners, sovereign funds) or self-funding through operating profit. No lab has crossed the self-funding threshold yet.

Tech mobileopensourceandroidmacos

Massive leak reveals Google's Aluminium OS with a 16-minute video

Mystic Leaks posted a 16-minute Aluminium OS demo showing Link to iOS integration and web-wrapped Google apps ahead of Google's I/O announcement.

Summary

What: Mystic Leaks posted a 16-minute Telegram video of Aluminium OS running on a MacBook Pro via UTM emulator, showing virtual desktops, a Link to iOS app, and web-wrapped Google apps instead of native desktop software. Key UI elements include a bottom dock, side-sliding Quick Settings, and an Android phone-like setup wizard.
Why it matters: Google's decision to ship web-wrapped apps instead of native desktop software suggests they're testing desktop Android viability with minimal investment, hedging against ChromeOS while pursuing cross-platform integration that includes even iOS.

Decoder

  • Aluminium OS: Google's unreleased Android-based desktop operating system
  • UTM emulator: macOS virtualization software that allows running different operating systems on Apple Silicon and Intel Macs
  • Samsung DeX: Samsung's desktop mode for Galaxy devices that provides a PC-like interface when connected to a monitor
  • Link to iOS: Google's app for integrating iPhone functionality with Aluminium OS devices, similar to Windows Phone Link

Original Article

Massive leak reveals Google's Aluminium OS with a 16-minute video

Screenshots and hands-on footage show off a familiar setup wizard, virtual desktops, and even "Link to iOS" integration.

TL;DR

  • Leaker Mystic Leaks shared screenshots and a 16-minute video of Google's Aluminium OS ahead of its official debut.
  • Key features include a bottom app dock, compact side-sliding Quick Settings, virtual desktops integrated into the Recents view, and a "Link to iOS" app.
  • The OS currently feels like "plain Android" optimized for larger screens, with many Google apps appearing as web-wrapped versions rather than native desktop software.

Google is hosting The Android Show: I/O Edition later today, where we hope to learn more about Aluminium OS and Google's plan for Android on desktops. If you can't wait, this new leak gives us a thorough look at Aluminium OS before Google gets around to it.

Leaker Mystic Leaks has shared an extensive leak about Aluminium OS on their Telegram channel. The leak includes details about the OS, screenshots, and even a 16-minute-long hands-on video of Aluminium OS.

Gallery: six early hands-on screenshots of Aluminium OS from the leak.

This build of Aluminium OS is running on a MacBook Pro through the UTM emulator.

The leaker notes that Google's Aluminium OS is "essentially plain Android" but with several desktop-experience features, such as:

  • Desktop folders
  • Virtual desktops
  • Optimized Quick Settings and Notifications Panel
  • Optimized apps like Task Manager
  • Ecosystem between your laptop and mobile devices (including Apple iPhones through Link to iOS).

Going a step further, the leaker calls the current Aluminium OS experience an upgraded version of Samsung DeX instead of an actual desktop-class OS. The experience in its current form lacks mouse and keyboard-optimized apps — even the Google apps pictured are web versions wrapped in a window.

There's more information on the Aluminium OS experience in the leaked 16-minute video (skip to the 2:15 mark to avoid the long boot time).

The setup screen will allow users to set up Aluminium OS hardware for work and personal needs through the regular Google setup wizard we're familiar with on our Android phones. Once you complete the setup and land on the home screen, you see an app dock at the bottom (with an app drawer button), the familiar Google Search bar, icons for the Play Store, and a folder full of Google apps.

Screenshots from the MysticLeaks video of Aluminium OS running on a Mac (around the 4:57, 5:08, and 5:10 marks).

The Quick Settings panel slides down from the side in a compact form factor when you tap and pull on the battery indicator in the status bar. A similarly compact Notification Panel slides down when you click on the notification icon on the status bar. Even the Settings app and the lock screen are quite similar to those on Android phones and tablets.

The video shows Recent apps being assigned as a shortcut for the bottom-right corner. When Recent apps are accessed, users will also be able to access virtual desktops, letting them organize different workspace setups and easily switch between them. We also see the Link to iOS app installed on the system.

So far, based on this very limited hands-on leak, Aluminium OS feels underwhelming. Hopefully, Google has more tricks up its sleeve to pique people's interest in this new software endeavor. We hope to learn more soon, either through leaks or straight from the company.

DevOps securityvaultldap

LDAP secrets management now available in IBM Vault Enterprise 2.0

HashiCorp Vault Enterprise 2.0 eliminates high-privilege master accounts in LDAP through self-managed credential rotation with centralized scheduling.

Summary

What: HashiCorp Vault Enterprise 2.0 introduces LDAP secrets management with a centralized rotation manager, configurable scheduling, pause controls, automated migration from legacy systems, and self-managed credential rotation that removes dependence on high-privilege master accounts.
Why it matters: This signals a broader industry shift toward zero-standing-privilege models where even infrastructure systems shed permanent admin credentials in favor of automated, scoped rotation that narrows the blast radius of compromised accounts.

Decoder

  • Static role: In HashiCorp Vault, a preconfigured service account whose credentials are automatically rotated on a schedule, contrasted with dynamic credentials that are generated on demand and revoked immediately after use.

Original Article

Vault Enterprise 2.0 modernizes LDAP secrets management by integrating static roles into a centralized rotation manager with configurable scheduling, retries, pause controls, initial password setup, and self-managed credential rotation that removes reliance on high-privilege master accounts. Automated background migration from legacy systems preserves operational continuity while improving compliance, reducing manual overhead, and strengthening identity security through standardized credential lifecycle automation.
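As a rough sketch of what this workflow looks like in practice, here is a minimal Python example against Vault's HTTP API using the existing LDAP secrets engine paths (ldap/config, ldap/static-role, ldap/static-cred). The hostnames, DNs, and role names are hypothetical, and the Enterprise 2.0 rotation-manager specifics (central scheduling, pause controls, automated migration) are not reproduced here, so treat this as an illustration rather than the new feature's exact API:

```python
import os
import requests

VAULT_ADDR = os.environ.get("VAULT_ADDR", "https://vault.example.com:8200")  # hypothetical address
HEADERS = {"X-Vault-Token": os.environ["VAULT_TOKEN"]}

def vault_write(path: str, payload: dict) -> requests.Response:
    """POST to a Vault API path and raise on error."""
    r = requests.post(f"{VAULT_ADDR}/v1/{path}", json=payload, headers=HEADERS)
    r.raise_for_status()
    return r

# Enable the LDAP secrets engine (not the LDAP auth method).
vault_write("sys/mounts/ldap", {"type": "ldap"})

# Point it at the directory. With self-rotating static roles, the bind account
# no longer needs to be a long-lived, high-privilege master account.
vault_write("ldap/config", {
    "url": "ldaps://ldap.example.com",                 # hypothetical host
    "binddn": "cn=vault,ou=svc,dc=example,dc=com",
    "bindpass": os.environ["LDAP_BIND_PASSWORD"],
})

# A static role: Vault rotates this service account's password on a schedule.
vault_write("ldap/static-role/payments-svc", {
    "dn": "cn=payments-svc,ou=svc,dc=example,dc=com",
    "username": "payments-svc",
    "rotation_period": "24h",  # scheduled rotation; 2.0 adds richer scheduling, retries, and pause controls
})

# Applications read the current credential instead of hard-coding it.
cred = requests.get(f"{VAULT_ADDR}/v1/ldap/static-cred/payments-svc", headers=HEADERS)
cred.raise_for_status()
print(cred.json()["data"]["username"])
```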

DevOps cloudaws

Amazon CloudWatch Logs Insights supports querying by log group tags

CloudWatch Logs Insights now queries by tags instead of explicit group names, auto-updating as your tagged infrastructure changes.

Summary

What: Amazon CloudWatch Logs Insights added tag-based querying on May 4, 2026, available in all commercial AWS regions. Query using tags like Environment:Production or Application:PaymentService instead of listing log group names explicitly.
Why it matters: Part of AWS making services more tag-aware to match infrastructure-as-code workflows where resources are dynamically tagged and grouped rather than statically enumerated.
Takeaway: If you use CloudWatch Logs Insights, update your queries to use log group tags for automatic scope updates as you add or remove tagged resources.

Original Article

Amazon CloudWatch Logs Insights supports querying by log group tags

Amazon CloudWatch Logs Insights query language now supports querying log groups using tags, making it easier to analyze logs without listing the log groups explicitly. In addition to querying logs by log group names, data sources, and facets, customers can now query using log group tags.

Tags are key-value pairs that customers can assign to log groups to categorize them — for example, Environment: Production, Application: PaymentService, or Owner: TeamName. With this launch, customers can run a query across all log groups that share common tags. As log group tags are added or removed, queries automatically reflect the matching log groups, reducing operational overhead as environments grow.

Querying by log group tags is available today in all commercial AWS Regions. To learn more, see the Amazon CloudWatch Logs documentation.
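For orientation, here is a minimal Python/boto3 sketch of how an Insights query is scoped today by enumerating log group names; with this launch that explicit list can instead be replaced by a tag-based selection such as Environment:Production. The exact tag selector syntax is defined in the CloudWatch Logs documentation and is only referenced in a comment below, and the log group names are hypothetical:

```python
import time
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Before: scope the query by enumerating log group names explicitly.
# After this launch, the scope can instead be expressed via log group tags
# (e.g. Environment:Production), so the query follows tagged resources
# automatically -- see the CloudWatch Logs Insights docs for the exact syntax.
query = logs.start_query(
    logGroupNames=[
        "/aws/lambda/payment-service",   # hypothetical log groups
        "/aws/lambda/checkout-service",
    ],
    startTime=int(time.time()) - 3600,   # last hour (epoch seconds)
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | filter @message like /ERROR/ | limit 20",
)

# Poll until the query completes, then print matching events.
while True:
    result = logs.get_query_results(queryId=query["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({f["field"]: f["value"] for f in row})
```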

Design careercomplexityarchitecture

The Complexity of Simplicity

Bryan Cantrill argues complexity spreads like a virus in software, requiring leaders who know what to say 'No' to so that systems don't accrete unnecessary features.

Summary

What: Jim Nielsen's May 6 blog post cites Bryan Cantrill's talk on how complexity spreads virally in software when teams lack principles to say 'No'. Cantrill argues technical education should teach humility about machines working at all, and that revolutionary software fails because users' mental models aren't ready for change.
Why it matters: Simple systems require organizational discipline to say 'No', not better tools—most complexity stems from lack of unifying principles, not technical limitations.

Original Article

Bryan Cantrill:

I actually view the primary role of a technical education, in computer engineering or in computer science / software engineering, as giving you humility that anything works at all […] that once you understand how extraordinary it is for a machine to work at all, you're humbled.

More 👏 humility 👏 please.

Cantrill makes a good point that "complexity is contagious". It doesn't merely accrue, it can go viral and spread to everything around it when not properly managed.

When simplicity in the abstraction is a non-goal, you don't know what to say "No" to.

When constructed systems are done well, there is someone at the helm, or a group at the helm, or unifying principles, that allow them to know what to say "No" to and give them permission to say "No".

New software is so hard because people aren't ready for it. No matter when you ship it:

The most calcified software of all: the software in our minds.

Which is why:

The biggest problem for a revolutionary system is to stay funded.

When you're drowning in complexity, just know that it can get better — it will get better! How do you know that? Because revolutionary ideas can change the abstractions we build on.

When you are mired in accreted systems, that is when revolutionary systems form. That is a solace we can take when we are in extremely complicated, accreted systems.

It's very complicated to make things simple — and it's very simple to make things complicated.

Design frontendreactai

The most important Design System in 2026 that designers missed was built by a developer

shadcn/ui, built by a developer, became the default design system for Figma Make, Cursor, and Claude's AI-generated code while designers weren't looking.

Summary

What: shadcn/ui, a copy-and-paste React component library, has become the foundation for AI-generated interfaces in Figma Make, Cursor, and Claude. Unlike traditional design systems, it gives developers full ownership and customization of accessible components that fit AI coding workflows.
Why it matters: This marks a fundamental shift in design tooling: modern design systems are now shaped by developer ecosystems and AI coding tools rather than traditional design platforms, with designers discovering the change only after developers and AI tools had already standardized on it.

Decoder

  • shadcn/ui: A React component library that provides copy-and-paste code snippets rather than npm packages, allowing developers to own and modify components directly in their codebase instead of depending on external dependencies.
  • Figma Make: Figma's AI-powered code generation feature that converts designs into working code.

Original Article

shadcn/ui has quietly become the default foundation behind many AI-generated user interfaces, powering the patterns and components produced by tools like Figma Make, Cursor, and Claude. Its popularity comes from offering clean, accessible, copy-and-paste React components that developers fully own and customize, making it an ideal fit for AI coding workflows. The broader shift highlights how modern design systems are increasingly shaped by developer ecosystems, AI tooling, and code-first infrastructure rather than traditional design platforms, with many designers only later realizing how deeply these defaults had already spread across the industry.

Tech enterprisestartup

eBay rejects GameStop's $56 billion takeover bid, calling it ‘neither credible nor attractive'

GameStop CEO Ryan Cohen's audacious $56 billion bid to acquire the much-larger eBay was rejected as neither credible nor attractive.

Summary

What: Ryan Cohen offered $125/share in cash-and-stock to acquire eBay ($48B market cap) despite GameStop's $10.3B market cap. eBay's board rejected it, citing financing uncertainty—Cohen had $20B from TD Securities plus $9B cash but a substantial funding gap remained.

Original Article

eBay has several concerns with the offer, including uncertainty around GameStop's financing, operational risks, and the debt load that would result from the proposed transaction.

Tech infrastructurecloud

Amazon launches 30-minute delivery across the US

Amazon undercuts DoorDash and Instacart with $3.99 30-minute delivery, expanding to tens of millions of Prime members by year-end.

Summary

What: Amazon launched Amazon Now, a 30-minute delivery service, in Atlanta, Dallas-Fort Worth, Philadelphia, and Seattle, expanding to tens of millions of customers by year-end. Prime members pay $3.99 per order versus $13.99 for non-Prime, covering thousands of items including groceries, electronics, and household essentials.
Why it matters: This shows Amazon using its massive fulfillment infrastructure to compete directly with venture-backed quick-commerce startups (DoorDash, Instacart, Uber Eats) by absorbing delivery costs into Prime membership, making the $139/year subscription increasingly sticky while starving competitors of market oxygen.

Original Article

Amazon Now is now widely available in Atlanta, Dallas-Fort Worth, Philadelphia, and Seattle.

Design mobileiosui

WhatsApp is getting iOS 26's Liquid Glass glow-up, and it's surprisingly gorgeous

WhatsApp is adopting Apple's iOS 26 'Liquid Glass' design language with transparency, blur, and fluid animations in its iPhone app redesign.

Summary

What: WhatsApp is rolling out a redesign for iPhone matching iOS 26's 'Liquid Glass' design language, adding transparency, blur effects, and fluid animations to navigation, keyboard, buttons, and menus. The rollout is gradual, with some elements like the chat bar still using the older flat style.

Original Article

WhatsApp is rolling out a redesigned iPhone app inspired by Apple's iOS 26 “Liquid Glass” aesthetic, bringing more transparency, depth, blur effects, and fluid animations throughout the interface. The update refreshes elements like the bottom navigation bar, keyboard, buttons, and context menus with semi-transparent, layered visuals that better match the latest iOS design language, though the rollout is gradual and some parts — like the chat bar — still retain the older flat style.

Design aicreativesurvey

Have your views on AI changed? Here's what our creative community had to say

A year after peak AI anxiety, creatives who tested tools like Gemini 3 on real client work mostly found them not ready for production.

Summary

What: Creative Boom surveyed members of The Studio, their private creative community, on how AI views have evolved since May 2025. Photo retoucher Sandrine Bascouert tested Gemini 3 and Flux Kontext on a major brand portrait job, burning through 4,000 credits on 10 images with most work redone manually. Responses split between firm opponents citing copyright theft and environmental costs, pragmatic testers who found tools unreliable for production work, and selective users who apply AI only to administrative tasks while keeping creative work human.
Why it matters: The shift from debating whether AI is coming to learning where tools actually work versus where they fail suggests the creative industry has moved past denial into empirical testing, with early professional use revealing a performance ceiling well below the marketing hype.

Decoder

  • OFFF Barcelona: Annual design and creativity conference in Barcelona, one of the major events where designers discuss industry trends and evaluate new tools.
  • Flux Kontext: Black Forest Labs' generative image model for context-aware image generation and retouching, available as one of Photoshop's credit-based AI features.
  • The Studio: Creative Boom's private, paid community for creative professionals to discuss industry issues.

Original Article

The conversation around AI in creative industries has shifted from pure anxiety to a more complicated debate about how to live and work with a technology that is now impossible to ignore. Many creatives still oppose generative AI because of concerns over copyright theft, environmental costs, and job losses, while others who tested the tools professionally found them unreliable and not yet ready for high-end creative work. At the same time, some designers and artists see value in AI for repetitive or administrative tasks, while arguing that human creativity, taste, and originality remain essential and difficult to replace.

Design aistartup

Create Winning Short-form Content in Seconds (Website)

VC-backed Fastlane generates a month of TikTok content from your URL in 30 seconds, one user claimed 31.8M views from a single video.

Summary

What: Fastlane auto-generates TikTok, Instagram Reels, and YouTube Shorts videos from a website URL, offering 'Blitz Mode' swipe approval, AI influencer creation, 500+ AI avatars, 2,000+ human UGC videos, and direct platform scheduling. The VC-backed company has 10,000+ users and charges $29-149/month, with a free tier and managed 'done-for-you' posting service.
Why it matters: Represents the commoditization of social media marketing as AI platforms automate the full content pipeline, turning distribution from a creative skill into a volume-driven subscription service.

Original Article

Fastlane is a platform that helps users create viral short-form content for TikTok, Instagram Reels, and YouTube Shorts in seconds. Users can enter their website URL, browse thousands of auto-generated content options, and schedule posts across multiple social platforms with performance tracking included.

Design mobileproductivity

Reference Board (Website)

Reference Board auto-tags moodboard images by color, style, and OCR'd text across iPhone, iPad, and Mac without manual sorting.

Summary

What: Reference Board is a native moodboard app for iPhone, iPad, and Mac. It automatically tags saved images by color, style, scene type, and embedded text (via OCR), making inspiration libraries searchable without manual organization.

Original Article

A native moodboard app for iPhone, iPad, and Mac that auto-tags everything you save by color, style, scene, and even the words inside images, so your inspiration library stays searchable without manual sorting.

Design ux

User Journey Maps: How UX Teams Turn Friction Into Better Products

User journey maps reveal why users abandon products, but only deliver value when built from user research instead of internal assumptions.

Summary

What: User journey maps visualize every action, emotion, and pain point users encounter while achieving a goal. UX Crush's guide covers 9 core components, 4 map types, and a 5-step creation process including research-backed personas, touchpoint mapping, emotion curves, and prioritized opportunities. Organizations acting on research-based maps see stronger marketing returns and faster sales cycles.

Decoder

  • User journey map: A visualization documenting every step, emotion, and pain point a user experiences while trying to accomplish a specific goal with a product.
  • Persona: A fictional but research-based character representing a segment of users, including demographics, behaviors, and motivations.
  • Touchpoint: Any interaction point between a user and a product, service, or brand (e.g., landing page, email, support chat, checkout flow).

Original Article

User journey maps visualize every action, emotion, and pain point a user encounters while achieving a goal, explaining why users drop off rather than just where. Building one involves five steps: defining a research-backed persona, setting a specific scenario, mapping all touchpoints, capturing an emotion curve, and converting pain points into prioritized opportunities. Organizations that act on these maps see compounding business gains, including stronger marketing returns and faster sales cycles — but only when maps are grounded in real user research, not internal assumptions.

Design dataux

Designing Data-intensive Applications — Advice for Interaction Designers

After solo-building a data product, this 30-year design veteran's key lesson: data structure should drive UI, not decorate it.

Summary

What: A designer with 30 years of experience shares 10 lessons from solo-building a data-intensive product. Core principle: treat data structure as the interface driver, not an afterthought. Practical tips include learning Python to understand data models, working with realistic data from day one, bridging user mental models with system data models, and designing for empty states.
Why it matters: Signals a shift in design thinking as applications become more data-heavy: designers who can't read code or understand data models will struggle to create effective interfaces.

Original Article

A 30-year design veteran shares 10 hard-earned lessons from a year of solo-building a data-intensive product. The core insight is treating data as the interface itself — letting its structure drive page layout, navigation, and interaction rather than decorating a pre-designed shell. Practical tips include learning Python, working with realistic data, bridging user mental models with system data models, and designing thoughtfully for empty states.
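As a toy illustration of the "data is the interface" principle (my own sketch, not code from the article), the snippet below derives table columns and the empty state from a Python data model instead of decorating a pre-designed shell with data afterwards:

```python
from dataclasses import dataclass, fields

# The data model comes first; the interface is derived from it.
@dataclass
class Invoice:
    number: str
    customer: str
    amount_due: float
    status: str  # e.g. "draft", "sent", "paid"

def table_columns(model) -> list[str]:
    """Column headers fall out of the data model instead of a static mockup."""
    return [f.name.replace("_", " ").title() for f in fields(model)]

def render(rows: list[Invoice]) -> str:
    if not rows:
        # Designing the empty state is part of the interface, not an afterthought.
        return "No invoices yet. Create your first invoice to see it here."
    header = " | ".join(table_columns(Invoice))
    lines = [" | ".join(str(getattr(r, f.name)) for f in fields(Invoice)) for r in rows]
    return "\n".join([header, *lines])

print(render([]))  # empty state, considered with realistic (absent) data from day one
print(render([Invoice("INV-001", "Acme Corp", 1200.0, "sent")]))
```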

Design hardwarecreative-toolsapple

Top 10 Cool Design Gadgets for Creative Professionals in 2026

Apple launches M5 chip across iPad Pro, MacBook Pro, and Vision Pro with up to 6.7x faster 3D rendering, headlining a design hardware roundup.

Summary

What: We and the Color's 2026 roundup features Apple's M5-powered iPad Pro (3.5x faster AI than M4, 6.7x faster Octane X rendering than M1) and MacBook Pro, Wacom Cintiq 24 (first update since 2019 with Pro Pen 3 at $1,299.95), reMarkable Paper Pro e-ink tablet (12ms pen latency), Elgato Stream Deck + workflow controller ($199), Logitech MX Master 3S mouse, Sony WH-1000XM6 headphones, DJI Osmo Pocket 4 4K camera, XPPen Artist Ultra 16 4K OLED display, and Elgato Key Light MK.2.
Why it matters: Design hardware is maturing past the performance arms race into tools that reduce workflow friction - physical shortcut controllers, distraction-free capture devices, and environment management now rank alongside raw processing power in professional creative setups.

Original Article

The best design gadgets for creative professionals in 2026 represent a fundamental shift in how tools relate to the creative process.

Design mobileios

iOS 26 adds fun way to customize your iPhone wallpaper, here's how to use it

iOS 26's Spatial Scenes turns any photo in your library into a 3D wallpaper with depth and motion effects.

Summary

What: iOS 26 introduces Spatial Scenes, a wallpaper feature that adds 3D depth effects to photos as the device moves. Users can apply the effect to Apple's suggested wallpapers or manually enable it on nearly any photo through Wallpaper settings, where iOS automatically generates the 3D scene. The effect also integrates with the Photos app and Home Screen widgets.

Original Article

iOS 26 adds a new “Spatial Scenes” feature that brings a 3D effect to iPhone wallpapers, making photos appear more immersive and dynamic as you move the device. The effect, which is also integrated into the Photos app and Home Screen widgets, works especially well as a wallpaper by adding depth and subtle motion to personal images. Users can choose from Apple's suggested spatial wallpapers or manually enable the effect on nearly any photo in their library through the Wallpaper settings, where iOS automatically generates the 3D scene before saving it.

Design industrial-design

New York Design Week is Here

NYCxDESIGN transforms New York into North America's design capital May 14-20 with ICFF furniture fair and 70-designer light exhibition at The Seaport.

Summary

What: NYCxDESIGN Festival runs May 14-20 across Manhattan and Brooklyn, anchored by the International Contemporary Furniture Fair at Javits Center (May 17-19). Featured events include SHINE, a 70-designer light exhibition curated by Harry Allen at The Seaport, Future Now AI-design conference on May 19, and Dumbo Design Day open studios on May 20.

Original Article

New York Design Week runs May 14-20, transforming the city into one of the world's most design-saturated destinations through the NYCxDESIGN Festival. The week features dozens of events across Manhattan and Brooklyn, including the SHINE light exhibition at The Seaport and an AI design conference. The International Contemporary Furniture Fair at the Javits Center (May 17-19) serves as the week's anchor event, functioning as North America's leading contemporary design fair.

Digest devoured!
